The future looks rosy for companies that take advantage of what strategic data management can do. But the specter of needing a team of people to handle on-premises hardware, and the cost implications of doing so, continues to make organizations hesitant to move forward with a new data strategy.
Here are a handful of factors to consider when weighing the costs versus benefits of implementing a big data strategy in your organization.
In 2012, I conducted a study that compared the cost of managing data with traditional data warehousing assets, such as Oracle, to the cost of managing that same data with an open-source software framework, such as Hadoop.
At the end of the day, even with a 60% discount off list price for the Oracle hardware and software licenses, managing a 16-terabyte configuration with traditional assets cost $26,000 per terabyte, compared to $400 per terabyte with an open-source framework.
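To put those per-terabyte figures in perspective, here is a minimal sketch of the arithmetic for the 16-terabyte configuration. The per-terabyte rates come from the study cited above; the totals simply multiply them out.

```python
# Minimal sketch of the cost math for the 16 TB configuration.
# Per-terabyte figures come from the 2012 study; the rest is arithmetic.

TB_IN_CONFIG = 16
COST_PER_TB_TRADITIONAL = 26_000   # Oracle-style stack, after a 60% discount
COST_PER_TB_OPEN_SOURCE = 400      # Hadoop-style open-source framework

traditional_total = TB_IN_CONFIG * COST_PER_TB_TRADITIONAL   # $416,000
open_source_total = TB_IN_CONFIG * COST_PER_TB_OPEN_SOURCE   # $6,400

print(f"Traditional stack: ${traditional_total:,}")
print(f"Open-source stack: ${open_source_total:,}")
print(f"Cost ratio: {traditional_total / open_source_total:.0f}x")   # ~65x
```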
The reason there wasn’t a mass exodus from Oracle to Hadoop in 2012 is that you have to consider the total cost of ownership. You have to ask, “Does my organization have the skills to manage this new technology environment? Is my existing Business Objects universe investment compatible with the back end?” In 2012, the answer was no.
Today, you can connect your existing Business Objects universe investment to Hadoop on the back end. Then you can take all that data out of Oracle, expose it through Hive tables where it can be accessed, and get an environment that performs even faster than Oracle does, for pennies on the dollar. Pennies! Why wouldn’t you do that?
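As a rough illustration of what that back-end connection looks like, the sketch below queries a Hive table from Python using the PyHive library. The host name, credentials, and table name are hypothetical placeholders; a BI front end such as Business Objects would typically connect through a JDBC/ODBC driver instead, but the idea is the same: familiar SQL running against data that now lives in Hadoop.

```python
# Illustrative only: query data that has been landed in Hadoop and exposed
# through a Hive table. Host, port, username, and table name are hypothetical.
from pyhive import hive

conn = hive.Connection(host="hadoop-edge-node.example.com", port=10000,
                       username="analyst", database="warehouse")

cursor = conn.cursor()
# The same kind of SQL a BI front end would issue against the old warehouse
# now runs against the Hive-backed tables.
cursor.execute("SELECT region, SUM(sales_amt) FROM fact_sales GROUP BY region")

for region, total_sales in cursor.fetchall():
    print(region, total_sales)
```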
It goes something like this: “Well, if my competitor is running their data warehouse for $4 million a year on a legacy technology stack, and I can ‘lift and shift’ my data warehouse to a technology stack that I can run for $40,000 a year, who’s going to gain a competitive advantage?”
In the TV series “How to Get Away with Murder,” investigators perform a forensic analysis of a suspect’s cell phone data that had been backed up to his computer, supplemented by records supplied by his telecom provider.
Because of the GPS service on the suspect’s phone, the detectives were able to reconstruct his entire route from one state to another: how long he was in motion, how long he was stopped, when he started moving again, and how many minutes his phone sat in any particular location. They created a geospatial plot of his path, all from the data stream his mobile phone produced as he drove with the phone on his person.
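That kind of analysis is straightforward once you have the raw data stream. Here is a minimal sketch of the idea, using a handful of made-up (timestamp, latitude, longitude) points and an assumed speed threshold to split the trace into time in motion and time stopped.

```python
# Sketch: classify a GPS trace into time in motion vs. time stopped.
# The sample points and the 0.5 m/s threshold are illustrative assumptions.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

# (seconds since start, latitude, longitude) -- hypothetical trace
trace = [(0, 39.9526, -75.1652), (60, 39.9580, -75.1700),
         (120, 39.9580, -75.1700), (180, 39.9650, -75.1790)]

moving_s = stopped_s = 0
for (t0, la0, lo0), (t1, la1, lo1) in zip(trace, trace[1:]):
    speed = haversine_m(la0, lo0, la1, lo1) / (t1 - t0)   # meters per second
    if speed > 0.5:
        moving_s += t1 - t0
    else:
        stopped_s += t1 - t0

print(f"in motion: {moving_s}s, stopped: {stopped_s}s")
```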
This brings us to another important point about data today: we’re living in a world of mashups. There’s an opportunity to subscribe to a Twitter feed and mash it up with an email address linkage in a way that would reveal my behavior and thought processes.
Everything that lives in the Twitter space or in my Facebook posts can be analyzed. Mashing up these many sources of data into a single, powerful analytic platform has become easy to accomplish, but not if you lack a strategy for how you’re going to manage the data.
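In practice, a “mashup” often comes down to joining data sources on a shared key. The sketch below joins hypothetical social posts to a hypothetical customer record by handle; every name and value here is made up purely for illustration.

```python
# Sketch of a "mashup": join social posts to a customer record by a shared
# key (here a social handle). All data and column names are hypothetical.
import pandas as pd

tweets = pd.DataFrame({
    "handle": ["@jane_doe", "@jane_doe", "@sam_r"],
    "text":   ["Loving my new running shoes", "Training for a 10k", "New phone day"],
})

crm = pd.DataFrame({
    "handle":  ["@jane_doe", "@sam_r"],
    "email":   ["jane@example.com", "sam@example.com"],
    "segment": ["fitness", "electronics"],
})

# The join links public behavior (posts) to an identity and segment (CRM).
mashup = tweets.merge(crm, on="handle", how="left")
print(mashup[["email", "segment", "text"]])
```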
Sam Walton’s objective with his fledgling Walmart stores was to always know what the customer wanted to buy and always have it on the shelves when he or she walked into the store.
Back in the 1980s, Walmart used Teradata technology to build a database that collected all of its point-of-sale data, which it then used to calculate how many units to ship to each store so it wouldn’t have to carry surplus inventory. The rest is history. The database ultimately became far more valuable to Walmart than the solution to the inventory-carrying-cost problem it was built for. And now Walmart is a half-trillion-dollar-a-year global company.
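A greatly simplified sketch of that kind of replenishment calculation follows. This is not Walmart’s actual system; the stores, sales figures, lead time, and safety-stock rule are assumptions chosen only to show how point-of-sale data turns into a shipping decision.

```python
# Simplified illustration (not Walmart's actual system): use point-of-sale
# data to decide how many units to ship so stores hold minimal surplus.
daily_pos_sales = {            # units sold per store over the last 7 days
    "store_101": [14, 12, 15, 13, 16, 18, 12],
    "store_102": [5, 4, 6, 5, 7, 6, 5],
}
on_hand = {"store_101": 40, "store_102": 30}   # current shelf inventory
LEAD_TIME_DAYS = 3                             # assumed shipping lead time
SAFETY_STOCK_DAYS = 2                          # assumed buffer

for store, sales in daily_pos_sales.items():
    avg_daily = sum(sales) / len(sales)
    target = avg_daily * (LEAD_TIME_DAYS + SAFETY_STOCK_DAYS)
    ship = max(0, round(target - on_hand[store]))
    print(f"{store}: ship {ship} units (target {target:.0f}, on hand {on_hand[store]})")
```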
Amazon is another huge data success story. As you know, they started as an online bookseller and didn’t make much money selling books online. But what they were able to do was get consumers to come to their portal, interact, and leave data behind.
They were very successful in leveraging that data, and from it they have grown into a company with over $100 billion in sales. And now, of course, Amazon sells everything.
Amazon is using the highest level of analytics: predictive analytics. In fact, they recently filed for a patent on an analytic model that can predict what you’re going to buy before you buy it. Predictive analytics tells them there’s a good chance you’re going to purchase a particular product in the next 24 to 48 hours.
They’re so confident in the accuracy of their algorithm that they plan to ship you the product before you even buy it. Say something from Amazon shows up on your doorstep that you didn’t order, but it’s something you wanted; then you’ll pay for it. This isn’t yet a production feature of amazon.com, but keep your eye on the bouncing ball!
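To make the idea of predictive analytics concrete, here is a toy sketch, emphatically not Amazon’s model, that scores the chance a customer buys within the next 48 hours from a few made-up behavioral signals, using a simple logistic regression.

```python
# Toy illustration of predictive analytics (not Amazon's model): estimate the
# probability that a customer buys a product in the next 48 hours from a few
# behavioral signals. All data and features are invented for the example.
from sklearn.linear_model import LogisticRegression

# features: [product page views, times item added to cart, days since last order]
X = [[1, 0, 30], [4, 1, 5], [7, 2, 2], [0, 0, 60], [5, 1, 3], [2, 0, 20]]
y = [0, 1, 1, 0, 1, 0]          # did the customer buy within 48 hours?

model = LogisticRegression().fit(X, y)

new_customer = [[6, 2, 1]]      # heavy browsing, item in cart, ordered yesterday
prob = model.predict_proba(new_customer)[0][1]
print(f"chance of purchase in the next 48 hours: {prob:.0%}")
```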
The future belongs to companies whose data game is completely integrated into the foundation of how they do business in the marketplace. And because companies like Amazon know so much, have such diverse revenue, and manage data so well, they are now even in the data hosting and data enrichment services business.
They are selling their data and hosting applications on an infrastructure that exists because of their desire to manage data and their ability to do it effectively.
If you look at where venture capital partners are investing their money today, you’ll see that it’s in companies busy creating that layer of integration between the front end and the back end, because they have determined that the benefits of having a big data strategy greatly outweigh any costs.