Winning With Big Data – Is Your Organization Operating At Only 20% Capacity?

Posted by at 19:53h

Discovering Big Data

Big Data Has Always Been the Future

Over the past decades there have been many predictions about the value that data would bring in helping companies compete. Famously, the editor in charge of business books at Prentice Hall in 1957 commented: “I have travelled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won’t last out the year.”

Fast forward nearly 60 years to 2016, and it’s now clear that innovative companies such as Amazon, Uber, Facebook and Priceline, to name just a few, were able to initially disrupt and then dominate their respective industries because of their aggressive use of data. Peter Sondergaard, Senior Vice President and Head of Research at Gartner, compares data to a new source of power, stating “Information is the oil of the 21st century, and analytics is the combustion engine.”

Big Data Is Available, But Why Is Its Use Limited?

Unfortunately, most companies are utilizing only a small fraction of the data and data relationships available to them. Our research has found that, in practical terms, the typical company uses only about 20% of its enterprise data because it relies upon the data explicitly known at the metadata level of the company’s databases. There is nothing inherently wrong with this, as the metadata are the specified data elements that the database architects design into your system to store and manage the information the application requires. For example, if we were to create an application to manage sales of Rock Music Albums, we might create database elements that include Album Name (‘Let It Be’), Artist Name (Beatles), Record Label (Apple), Release Date (May 8, 1970) and Number of Albums Sold (545 Million). [NOTE: If you want to appear more contemporary, you can substitute the Beatles with Justin Timberlake or Drake!]

Inevitably, as time goes by, the needs of the Rock Music Album business will continue to grow. The business will request that its software developers build additional functionality into their applications so that it can better compete against new competitors and enter new markets. The software developers will happily comply, and will develop new software programs that introduce new data and data relationships into the software, but these are often not reflected at the metadata level in the database. These new data elements and data relationships created by the programmers might include very useful information such as Producer (Phil Spector), Lead Singer (Paul McCartney), Songwriter (John Lennon, Paul McCartney), Highest Chart Position (1), Weeks on Chart (59), Digital Albums Sold (on iTunes, Amazon, Other), CD Albums Sold (at Amazon, Walmart, Target, Best Buy, Other), Vinyl Albums Sold (at Amazon, eBay, Best Buy, Other), and much more. Over time, these new data elements can come to comprise up to 80% of your enterprise data.
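To make the gap between declared metadata and application-level data concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes, purely for illustration, that the developers stored their new fields as JSON in a catch-all column; the table name `albums`, the column name `extras` and the helper functions are hypothetical and are not taken from any real system.

```python
import json
import sqlite3

# Hypothetical schema: the five elements the architects designed,
# plus a catch-all "extras" column that developers later filled with JSON.
DECLARED_SCHEMA = """
CREATE TABLE albums (
    album_name   TEXT,
    artist_name  TEXT,
    record_label TEXT,
    release_date TEXT,
    albums_sold  INTEGER,
    extras       TEXT
)
"""


def declared_columns(conn, table):
    """Columns visible at the metadata level of the database."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def hidden_fields(conn, table, blob_column):
    """Field names that exist only inside the JSON stored in blob_column."""
    keys = set()
    for (blob,) in conn.execute(f"SELECT {blob_column} FROM {table}"):
        if blob:
            keys.update(json.loads(blob).keys())
    return sorted(keys)


conn = sqlite3.connect(":memory:")
conn.execute(DECLARED_SCHEMA)

# Data elements added by the application but never declared in the schema.
extras = {"producer": "Phil Spector", "highest_chart_position": 1, "weeks_on_chart": 59}
conn.execute(
    "INSERT INTO albums VALUES (?, ?, ?, ?, ?, ?)",
    ("Let It Be", "Beatles", "Apple", "1970-05-08", 545_000_000, json.dumps(extras)),
)

print(declared_columns(conn, "albums"))
print(hidden_fields(conn, "albums", "extras"))
```

The database metadata reports only the six declared columns, while the producer and chart fields are visible only to code that knows how to read the `extras` payload.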

The Challenges of Data Discovery

So, in this hypothetical case, what does this mean for the business with regard to understanding its data assets and its ability to leverage them? Unfortunately, the news is not good. Developers often do a poor job of documenting their work, and over time many of them leave the company. The new data elements and data relationships they developed are never added to the metadata and are effectively ‘hidden’, used by the program solely for its computing tasks. The business is left performing data management and analysis with only the 20% of its data visible at the original database metadata level, and is not able to effectively utilize the other 80% of its ‘hidden’ data assets. Discovering and identifying these undocumented data elements and data relationships is a huge challenge. Significant effort is required to address it, causing delays in time-to-market and/or deployments with substandard products or incorrect information. This puts the business at a significant competitive disadvantage to other, more data-savvy companies.

The Solution To Tackle Data Discovery

Fortunately, there are several ways for the business to address this. A common solution is an expensive manual approach, whereby the business assigns specific project teams composed of the company’s software developers, data architects and subject matter experts (often supported by teams from outside technology consulting firms) to manually review its entire data environment to discover and document the hidden data elements and their data relationships. Alternatively, the business can partner with a technology firm like ROKITT that specializes in automated data discovery.
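As a rough illustration of what automating part of that review could look like, the sketch below applies one simple, generic heuristic: if nearly all distinct values of one column also appear in a unique (key-like) column of another table, the pair is flagged as a candidate relationship. This is an assumption-laden toy, not a description of how any particular product works; the function name, the 0.9 threshold and the sample tables are all invented for the example.

```python
def candidate_relationships(tables, threshold=0.9):
    """Flag column pairs that look like undeclared foreign keys.

    tables maps a table name to a list of row dicts. A pair is reported
    when the parent column is unique within its table and at least
    `threshold` of the child's distinct values also occur in the parent.
    """
    # Profile every column: its distinct values and whether it is unique.
    profiles = {}
    for table, rows in tables.items():
        for column in {key for row in rows for key in row}:
            values = [row[column] for row in rows if row.get(column) is not None]
            if values:
                distinct = set(values)
                profiles[(table, column)] = (distinct, len(distinct) == len(values))

    candidates = []
    for (child_t, child_c), (child_vals, _) in profiles.items():
        for (parent_t, parent_c), (parent_vals, parent_unique) in profiles.items():
            if child_t == parent_t or not parent_unique:
                continue
            score = len(child_vals & parent_vals) / len(child_vals)
            if score >= threshold:
                candidates.append((f"{child_t}.{child_c}", f"{parent_t}.{parent_c}", score))
    return sorted(candidates)


# Made-up sample data: no relationship between the two tables is declared.
tables = {
    "albums": [
        {"album_id": 1, "title": "Let It Be"},
        {"album_id": 2, "title": "Abbey Road"},
    ],
    "sales": [
        {"sale_id": 101, "album_ref": 1, "channel": "Vinyl"},
        {"sale_id": 102, "album_ref": 2, "channel": "CD"},
        {"sale_id": 103, "album_ref": 1, "channel": "Digital"},
    ],
}

print(candidate_relationships(tables))
```

On real data a heuristic like this produces false positives (for example, two unrelated columns of small integers), so its output is best treated as a list of candidates for a subject matter expert to confirm rather than as documented fact.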

If you are embarking on a data discovery project, consider ROKITT as your strategic partner of choice. ROKITT has worked in some of the most demanding IT environments in the world, and our people bring the agility, skill and experience required to build a best-in-class, automated data discovery product. ROKITT ASTRA uses machine learning, heuristics and deep domain knowledge to automatically discover and self-learn data relationships with up to 90%+ accuracy, helping organizations quickly and accurately baseline and understand their enterprise data landscape.