Note: This article was originally published in May 2009

When people start wondering about the origins of a discipline, it is on one hand a proof of maturity for that discipline (a good sign), but also a sign that we need to step back because, after so much agitation, we no longer know where we are (maybe a less good sign…).

On the French forum “Méthode SOA“, Pierre Bonnet asks about the origins of the term “Master Data Management” while writing his new book “MDM and semantic modelling“.

Managing master data is far from new, and it goes by different names depending on the industry. Better known as “Reference Data Management” in the financial industry before becoming “Master Data Management”, the latter expression has spread widely since 2004, with the first publications by analysts (Ventana Research, IDC Group, Bloor Research, and later Gartner and Forrester), consulting firms (Logica Management Consulting) and software vendors (Orchestra Networks).

After a few searches on Google, the oldest reference I found to the term “Master Data Management” dates back to July 2002 at Rohm and Haas (chemicals), in an internal note praising the benefits of a master data project that grouped Customer, Supplier and Material data together. The context was transverse, cutting across the company’s business processes and entities, and multi-object oriented: neither a pure CDI (Customer Data Integration) nor a pure PIM (Product Information Management) issue, both of which already existed elsewhere. Note that the underlying technical infrastructure was an SAP R/3 ERP.

Speaking of SAP, note that the German software vendor was also the first market player to communicate about the term “Master Data Management”: at the SAPPHIRE conference in Lisbon in September 2002, SAP announced the imminent release of a product called SAP Collaborative Master Data Management, which would later become SAP NetWeaver MDM. It also seems that the first developers joined the SAP (C)MDM project as early as July 2001. According to the press release, SAP was pushed by its customers (Nokia, Motorola Semiconductors and Dow Chemical) to start developing such a product. SAP therefore integrated the Master Data Management concept into its strategy as early as 2002.

Until evidence to the contrary surfaces, we can date the origins of the term “Master Data Management” (in its current meaning) to around mid-2001 to early 2002, through SAP and its customer base.

If anybody has proof of an earlier use of the term, the MDM community (and I) will be happy to hear about it!

PS: As a matter of interest, looking at registered patents, the oldest patent mentioning the term “Master Data Management” was filed in … 1996 by Sharp: an ingenious data replication system in which a local terminal synchronizes itself with a “Master” database… but that may be out of scope… or is it?


One year ago, a post announced to the world the death of Semantic Web Technologies. Today, all clinical signs corroborate that announcement: Semantic Web Technologies will not revolutionize Data Management and Information Management in enterprises.

Promised to a bright future
Born in 2001, the Semantic Web – or Web 3.0 – was promised the same bright future as its sibling the World Wide Web, and the stack of technology standards enabling the use of this interlinked data – RDF, OWL, SPARQL, etc. – was expected to revolutionize the way data is managed in our digital world, including in the enterprise.
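
To make the stack concrete, here is a minimal sketch of the triple model using the open-source Python library rdflib; the namespace, resources and properties are invented for illustration and are not taken from any of the projects mentioned here.

```python
# A minimal sketch of the RDF triple model using the open-source rdflib library.
# The example.org namespace and the resources below are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")  # hypothetical vocabulary
g = Graph()

# Each statement is a (subject, predicate, object) triple.
g.add((EX.acme_corp, RDF.type, FOAF.Organization))
g.add((EX.acme_corp, FOAF.name, Literal("ACME Corp")))
g.add((EX.acme_corp, EX.headquarteredIn, EX.london))

# Serialize the graph as Turtle, one of the Semantic Web exchange formats.
print(g.serialize(format="turtle"))
```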

A whole community of enthusiasts tried to apply this technology stack to every possible use case in search of the “killer app”, the holy grail that would act as a beachhead from which the revolution could spread globally.

  • Big players – like Thomson Reuters with its Open Calais project, or Oracle with its dedicated triple-store repository – put research projects in front of the public.
  • Technology providers in various industries, like PolarLake, tried to leverage Semantic Web standards to solve existing problems better, benefiting from an ecosystem of existing tools to model data faster, design taxonomies and ontologies, query and join data, translate data and derive facts (see the SPARQL sketch after this list). Data integration and relational databases no longer made sense; Semantic Web technologies would save the billions of dollars firms were wasting on these activities.
  • Semantic Web standards have been used to structure the huge amounts of metadata generated to give meaning to unstructured content, as at the BBC, enabling more powerful and more natural search.
  • The health industry uses them to derive better diagnoses from observations and symptoms. They are also used in life sciences to analyze DNA.
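
As an illustration of the “query and join data” promise mentioned above, here is a hedged sketch of a SPARQL query run over a small in-memory rdflib graph; the data and vocabulary are invented for the example.

```python
# Illustrative only: joining facts with SPARQL over an in-memory rdflib graph.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:acme_corp ex:headquarteredIn ex:london .
ex:london    ex:locatedIn       ex:uk .
""", format="turtle")

# Join two triples to find the country each organization is based in.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?org ?country WHERE {
        ?org  ex:headquarteredIn ?city .
        ?city ex:locatedIn       ?country .
    }""")

for org, country in results:
    print(org, country)  # http://example.org/acme_corp http://example.org/uk
```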

Twelve years later, the revolution still has not happened: the Web of Data remains a pilot project (Linked Data) and the technology, apart from these few niches, remains confined to experiments in research labs. Semantic Web technologies died before wide adoption; we will never see them adopted in the enterprise world: the revolution of information management will not be “semtech”.

The murderer? There is more than one…

The Big Internet Guys killed the Semantic Web
First suspects: the schema.org initiative from Google, Yahoo & Co. clearly killed any hope of developing RDF, the language used to describe anything on the web as a [subject, relation, object] triple. The big search engines decided that microformats would define how data is embedded and indexed within web pages. Exit RDF for the Web; exit the whole technology stack built on top of it.

Big Data technologies hijacked firms’ attention
For many, the complexity of RDF/OWL and triple stores severely limited their utility beyond very specific niches, and the community never found a problem that only semantic technologies could solve.

Performance was also a constant concern when it came to deriving facts from data, so much so that benchmark contests were held every year to show that triple-store repositories were getting faster. It wasn’t enough.

Clearly, Big Data and NoSQL technologies demonstrated more practical uses and were easier to implement: they killed any remaining hope of finding the long-awaited “killer application” that would have brought semantic technologies inside the walls of the enterprise.

No acquisition of semantic technology providers
One of the worst signs is that, after 12 years, no significant acquisition has been made by any major firm. And for those who saw hope last year in Bloomberg investing in semantic technologies through its acquisition of PolarLake, the story may be more complicated than that: while PolarLake offered a semantic-technology-based reference data management solution for financial firms, its product portfolio also included an advanced data exchange solution and RDAM, a module designed to optimize data acquisition costs, helping firms reduce what they spend on… Bloomberg data.

What’s next?
Too complicated, no support from influential players, too many actors to bring together, no killer application. Of course, there is still some business to be done in very specific niches, and the W3C continues to develop the Semantic Web standards.
However, the research community will need to take these lessons on board and adapt to the new situation:

  • adapt to the need for speed of a user community influenced by the Big Data buzz
  • provide technology that is easier to implement, to facilitate adoption
  • combine the growing interest in social media, mobile and big data to improve real-time contextual analysis, sentiment analysis and new information discovery
  • continue developing the few use cases that make Semantic Web technologies unique: classification and taxonomies, metadata for unstructured content, enterprise-level search, automated derivation of complex expertise (well, this last one is not very new)

With the success of online crowdsourcing platforms like Amazon’s Mechanical Turk, and with data cleansing seen as a non-differentiating capability by most firms, could crowdsourcing be a viable alternative?

USAID’s crowdsourcing experiment
Last year, in June 2012, USAID reported that it had called for online volunteers to cleanse 20,000 records before publishing the data on data.gov, the portal of the US government’s open data initiative.

The operation took place over a weekend with volunteers who were not experts, and who managed in 16 hours to cleanse 20,000 non-machine-readable records by determining whether each record was at city or state level. “A big group of people can do a little piece of the puzzle and create the bigger whole,” says a USAID spokesman.

Even automated data contains errors
While USAID admits that errors were probably made because non-specialists were used, a spokesman points out that even automated data contains errors.
The article describing USAID’s experiment reports that an accuracy assessment will be conducted to evaluate the credibility of the crowdsourced data and the limits of the model.
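
As a rough idea of what such an assessment could look like, here is a minimal sketch assuming each record was labelled by several volunteers and a small expert-reviewed sample exists; the names and figures are hypothetical, not USAID’s actual method.

```python
# Hypothetical sketch: majority-vote the crowd labels for each record, then
# measure accuracy against a small expert-reviewed sample. Names are invented.
from collections import Counter

def majority_label(labels):
    """Most frequent label ('city' or 'state') given by volunteers for one record."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(crowd_labels, expert_sample):
    """crowd_labels: {record_id: [labels]}, expert_sample: {record_id: label}."""
    checked = [rid for rid in expert_sample if rid in crowd_labels]
    correct = sum(majority_label(crowd_labels[rid]) == expert_sample[rid] for rid in checked)
    return correct / len(checked) if checked else 0.0

crowd = {"rec-1": ["city", "city", "state"], "rec-2": ["state", "state", "state"]}
gold = {"rec-1": "city", "rec-2": "state"}
print(accuracy(crowd, gold))  # 1.0 on this toy sample
```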

Data vendors crowdsourcing their database
Another, similar experiment was conducted last year by Avox, a US data vendor specialized in legal entity data.

Called Wiki-data, the initiative was based on a wiki model, allowing any individual to complete and suggest updates to Avox’s directory of company profiles. The submitted data is then checked by Avox, integrated into Avox’s database and made available to Avox’s customers via an API or traditional exports.
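
Roughly, such a wiki-style pipeline could be sketched as below; the class names, statuses and review step are purely illustrative and are not Avox’s actual system.

```python
# Purely illustrative sketch of a wiki-style suggest/review/publish pipeline.
from dataclasses import dataclass, field

@dataclass
class Suggestion:
    entity_id: str
    field_name: str
    new_value: str
    status: str = "pending"  # pending -> approved or rejected

@dataclass
class EntityDirectory:
    records: dict = field(default_factory=dict)  # entity_id -> {field: value}

    def review(self, suggestion: Suggestion, approved: bool) -> None:
        """A data steward validates the crowd submission before it is published."""
        suggestion.status = "approved" if approved else "rejected"
        if approved:
            self.records.setdefault(suggestion.entity_id, {})[suggestion.field_name] = suggestion.new_value

    def export(self) -> dict:
        """Approved data only, as it would be served via an API or file export."""
        return dict(self.records)

directory = EntityDirectory()
directory.review(Suggestion("entity-123", "registered_name", "ACME Holdings Ltd"), approved=True)
print(directory.export())
```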

Trend: Crowdsourcing for data collection at speed, breadth and lower costs
While concerns about accuracy are inherent to the crowdsourcing model, these two experiments show the value of crowdsourcing in terms of speed, cost and volume. If accuracy is a concern, the submitted data needs to be validated in-house, but non-critical data (lists of prospects, etc.) may not need that overhead.

Trend: The value of data vendors and outsourcers shifting toward data quality
I believe that crowdsourcing platforms like Mechanical Turk or CloudCrowd (with workflows and quality controls as work items) will create new data offerings at very competitive prices, shifting the value of established data vendors and outsourcing providers from data coverage toward higher data quality.

Trend: Outsourcing of data maintenance activities, not quality controls
While outsourcing data management appeals to companies for which these activities are not a differentiating capability, crowdsourcing, like any outsourcing model, will still require in-house data quality controls for mission-critical data. This again stresses the importance of establishing a Data Governance body within the enterprise to ensure quality controls where necessary and to reduce the overall cost of data maintenance activities.

If data is an asset, who better than banks to secure it, secure its transfer and generate revenue from it? This is the very interesting proposition put in front of the whole banking industry in October 2012 by Innotribe, an innovation incubator backed by SWIFT.

The Middle Age of the Data Economy
As the number of data sources continues to explode to fuel the growing appetite for data, new businesses are being created to fill the void: with new data marketplaces and data brokers opening worldwide almost daily, the digital ecosystem looks more and more like our physical world. But we are still somewhere in the Middle Ages, with data exchanged peer to peer, the challenge of identifying trusted data sources, the need to secure transactions, and the need for a regulatory framework to protect the participants of this ecosystem.

Banks as a platform to facilitate exchanges of digital assets
Called the Digital Asset Grid, the project announced by Innotribe invites the banking industry to play the same central role for digital assets as it plays today in our economy: facilitating secure exchanges of financial assets between parties. And this might revolutionize the way we use and exchange data tomorrow.


In this vision, banks would provide data accounts where organizations and individuals store their data in a secure vault: corporate data, legal documents, personal attributes. When you authorize it, banks would give trusted, authorized third parties access to your digital assets on your behalf. This system would facilitate exchanges of information between parties and improve our real-world transactions (e.g. buying a car, getting it registered and insured) by replacing costly and error-prone exchanges of emails, faxes, paper, etc.
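
A hedged sketch of the access model this vision implies: the bank releases an asset only when its owner has granted that specific third party access. All names and the consent mechanism below are assumptions for illustration, not part of the Digital Asset Grid specification.

```python
# Purely illustrative: consent-based access to a digital asset hosted by a bank.
from dataclasses import dataclass, field

@dataclass
class DataAccount:
    owner: str
    assets: dict = field(default_factory=dict)  # asset_id -> content
    grants: set = field(default_factory=set)    # (third_party, asset_id) pairs

    def authorize(self, third_party: str, asset_id: str) -> None:
        """The owner explicitly allows one party to read one asset."""
        self.grants.add((third_party, asset_id))

    def fetch(self, third_party: str, asset_id: str):
        """The bank releases the asset only if the owner's consent is on record."""
        if (third_party, asset_id) not in self.grants:
            raise PermissionError(f"{third_party} has no grant for {asset_id}")
        return self.assets[asset_id]

account = DataAccount(owner="alice")
account.assets["driving-licence"] = {"number": "D-123", "valid_until": "2030-01-01"}
account.authorize("insurer-x", "driving-licence")
print(account.fetch("insurer-x", "driving-licence"))
```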

But beyond charging for hosting data accounts and the related access services, the real value for banks, according to Innotribe, is that they would be in a unique position to act as a platform for the creation of new applications and services fuelled by the data they host.

Banks monetizing digital assets: a valuable proposition
The value proposition goes beyond the open model promoted by the Linked Data movement by addressing the monetization of data services. Not to mention that the two propositions are radically opposed in intent: one promotes open, freely available data; the other focuses on ownership and monetization.

This initiative would, however, introduce very limited disruption, in that banks already secure the trading and settlement of dematerialized financial instruments and currencies; it would simply extend the current ecosystem to manage digital assets.

While initiatives of this kind are not new, banks would readily find their place as established corporations whose reputation is known to individuals and businesses alike, speeding up the establishment of trust. On the infrastructure side, SWIFT already operates the communication channels between banks and could play a similar role in this new data business proposed by Innotribe.

The challenges of setting up a whole ecosystem
The main challenges will be similar to those involved in adopting any innovation that requires a wide variety of actors: the banks themselves, the parties who must trust banks to host their data and pay for it, and the third parties who must trust banks to deliver trusted data. It is a challenge similar to the one faced by the introduction of credit and payment cards, with some successes and some failures (like Moneo in France, a micro-payment card system).

Another challenge will be positioning the Digital Asset Grid against existing global firms like Google, Facebook, Apple or the telecom companies, which already act as trusted sources and generate revenue from the data they host.

First use case?
I would expect the initiative to require the successful implementation of pilot use cases within the next 3 to 5 years, before wider adoption in the following decades. A first use case illustrated by Innotribe concerns the exchange of dematerialized paper documents between corporations, where paperwork is error-prone and very costly. Afterward, the scope of digital assets could be extended to a wider audience, with very few technical limitations.

Innotribe reports this quote from a bank involved in the project:

“the Digital Asset Grid could double the size of the whole banking system in its entirety”.