Tamr Catalog: Organize Your Metadata

Much of the roughly $3 to 4 trillion invested in enterprise software over the last 20 years, according to Gartner, has gone toward building and deploying applications to automate and optimize key business processes for specific functions (sales, marketing, manufacturing) and/or geographies (countries, regions). The result: The data tied to these investments is extremely heterogeneous and siloed.

Companies are investing heavily in Big Data Analytics – $44 billion in 2014 alone according to Gartner – in an attempt to benefit from all this data. But Big Data Analytics requires clean, unified data that spans all the various silos. Most companies are finding that this heterogeneity is a massive roadblock to getting high-quality data efficiently into their state-of-the-art analytics and visualization tools. In fact, only 10 to 12% of the available data within an enterprise is used for analytics.

The problem runs deeper. Not only are enterprises unable to use this heterogeneous data for analysis, but it is also far too complex and time-consuming to even identify and locate all of the other "dark" data relevant to analysis. Knowledge about the data is so fragmented across the enterprise that the average CIO lacks an inventory of all of the company's data sources and attributes, let alone an understanding of how they relate to logical entities (e.g., customers, suppliers). And some of the most useful attributes remain underused because departments do not share what they know about their data.

+ IDC has stated that up to 90% of big data is dark data.[1]

+ An Informatica study showed that only 16% of respondents believed they knew where all sensitive structured data was located, and a very small percentage (7%) knew where unstructured data resided.[2]