According to IDC, data volumes within enterprise data centers will grow by a factor of 50 over the next decade. And C-level executives have come to appreciate the potential within those multiple terabytes: in a 2012 survey by the McKinsey Global Institute, more than half of the respondents said that big data and analytics were a top-ten digital priority. We asked four IT leaders to outline their journeys toward extracting business value from big data.
Dr David Stephenson
Head of business analytics, eBay Classifieds Group
Online ads business serving 1,000+ cities
At eBay, we are pushing the boundaries of big data, having developed systems and tools to handle staggering amounts of data. Even a year ago, we were taking in 50 terabytes (TB) a day and processing over 100 petabytes (PB). One of our systems alone, “Vivaldi,” touches about one terabyte every second.
For our big data needs, we have a multi-PB Teradata enterprise data warehouse. It is blindingly fast on structured SQL and provides great concurrency, but it is not inexpensive and meets just a portion of our analytical needs. What we were missing was flexibility when dealing with semi- or unstructured data and the power to run complex algorithms. The cost meant that, until recently, we were not able to keep all of the information we’d like to. We would sample 1% to 10% of the 50TB coming through our systems daily and throw away the rest. As an Internet company dependent on data, we simply hated throwing any of it away.
The challenge is that we have hundreds of trillions of records of user behavior and clickstream data that we are trying to keep and analyze. So that led us to develop a system that projects structure onto this kind of unstructured data at run-time. Not an easy thing to do.
The “Singularity” project, as we call it, has given us the flexibility to query semi-structured data in a relatively low-cost way (it runs on commodity hardware) and to scale up from roughly half a dozen PB of retained data to nearly ten times that amount, so we are now able to store and analyze all of our usage data. The type of query we might ask of that semi-structured data is, “What were the top items displayed on the site on a given day?” This might scan 5 billion page views and 40 billion page impressions, with 135 million unique items appearing, and generate an answer in just 30 seconds.
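A query like that relies on schema-on-read: structure is projected onto the raw records only at the moment the question is asked, rather than enforced when the data is stored. A minimal sketch of the idea in Python, with an invented record format and a tiny in-memory log standing in for the real clickstream:

```python
import json
from collections import Counter

# Hypothetical raw clickstream: schemaless JSON lines whose fields vary per event.
RAW_LOG = [
    '{"ts": "2013-04-02", "event": "impression", "item_id": "A17"}',
    '{"ts": "2013-04-02", "event": "impression", "item_id": "B42", "referrer": "search"}',
    '{"ts": "2013-04-02", "event": "impression", "item_id": "A17"}',
    '{"ts": "2013-04-03", "event": "impression", "item_id": "C09"}',
    '{"ts": "2013-04-02", "event": "click", "item_id": "B42"}',
]

def top_items(raw_lines, day, n=3):
    """Project a (day, item_id) structure onto the raw JSON at query time
    and return the n most-displayed items for that day."""
    counts = Counter()
    for line in raw_lines:
        record = json.loads(line)  # structure is applied at read time, not at load time
        if record.get("event") == "impression" and record.get("ts") == day:
            counts[record["item_id"]] += 1
    return counts.most_common(n)

print(top_items(RAW_LOG, "2013-04-02"))
# [('A17', 2), ('B42', 1)]
```

Because no schema is enforced up front, records with extra or missing fields (like the `referrer` key above) flow through without breaking the load, at the cost of doing the parsing work on every query.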
Sir Tim Berners-Lee
Inventor of the World Wide Web and director of the W3C
International governing body of the web
CIOs are split on the idea of opening up their vast stores of internal data to consumers and partners. On the one hand, they say they should not be giving out detailed data about things like products; on the other, they are paying out millions to share as much data as possible about those products so consumers or potential partners can find and buy them.
Restricting what people can read about your products makes no business sense. We have to understand that sharing data brings real benefits. I think the principle of “progressive competitive disclosure” applies here: the more information you are prepared to share with customers or suppliers, the more likely it is they will deal with you. And that is starting to happen in data. Look at the example of Best Buy. They’re now using RDFa [Resource Description Framework in Attributes, the W3C-developed semantic web standard for embedding rich metadata in web documents] so a customer or partner can go to any Best Buy product page online and slurp up all the embedded data about it. Best Buy channels can pull in data about who’s selling what products all over the Best Buy network.
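RDFa works by attaching machine-readable attributes, such as `property` and `content`, to ordinary HTML elements, so a crawler can extract structured facts from a page meant for people. A minimal sketch of pulling those attributes out with Python's standard-library HTML parser; the page fragment and property names are illustrative, loosely modeled on the GoodRelations vocabulary Best Buy used:

```python
from html.parser import HTMLParser

# Hypothetical fragment of an RDFa-annotated product page; names are invented.
PAGE = """
<div typeof="gr:Offering">
  <span property="gr:name" content="Blu-ray Player BDP-S590"></span>
  <span property="gr:hasCurrencyValue" content="128.99"></span>
  <span property="gr:hasCurrency" content="USD"></span>
</div>
"""

class RDFaScraper(HTMLParser):
    """Collect property/content attribute pairs embedded in the markup."""
    def __init__(self):
        super().__init__()
        self.facts = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs and "content" in attrs:
            self.facts[attrs["property"]] = attrs["content"]

scraper = RDFaScraper()
scraper.feed(PAGE)
print(scraper.facts["gr:name"], scraper.facts["gr:hasCurrencyValue"])
```

A full RDFa processor resolves vocabulary prefixes and emits RDF triples; this sketch only shows why embedding the data in the page makes it trivially harvestable.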
At the moment there is a big open data push by some governments, but it's not just about transparency: open data is about economic benefits. For example, if I want to travel across Europe by public transportation, surely all the bus, ferry and train schedules I need from all the transportation companies should be available. The only way that can happen is if governments at a European level require those companies to publish their data.
Robert Schmid
CIO, Activision
World’s largest online games publisher with revenues of $4.8 billion
Our games are intrinsically very social: you talk, you interact with other gamers, and you even connect with us socially for customer service. But that means one of our big challenges is unstructured data. What we found is that when we take unstructured big data [from Activision social apps, Twitter, Facebook, email, etc] and try to turn it into structured data for reporting, a lot of data quality issues arise. That data is changing in real time, and by the time you squeeze it into the structured model, it breaks on a regular basis. So we find ourselves now with a data mart that's structured but that needs to be tuned, adjusted and worked on very frequently. We really want to understand what gamers do and how they play our games, and integrate that big data with the structured data, like their names, in order to make their experience better, so we can understand how best to improve our games.
We even use it to find people cheating in competitions. The trick is to be able to make use of that data in real time, not tomorrow or the day after. Waiting 24 hours to find out if someone cheated while playing Call of Duty is too late. Waiting 24 hours to respond to someone who is not enjoying Skylanders because they are using the wrong mode isn't going to help them, and they might be gone by then. But merging structured and unstructured data, getting insight on such huge volumes and then serving it up in real time is super difficult. It takes people who really know the analytics, who really know how to dig into the data. We have a whole group of PhDs who do nothing else but that. They are very analytical, very statistical and very smart people: real data nerds.
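As an illustration of the real-time requirement, here is a deliberately simplified sketch of one classic streaming technique: flagging a player whose event rate within a sliding window is implausibly high, at the moment the events arrive rather than in a nightly batch. The thresholds, event shape and player IDs are all invented:

```python
from collections import defaultdict, deque

# Invented thresholds: more than 20 score events in any 10-second window is suspicious.
WINDOW_SECONDS = 10
MAX_EVENTS_PER_WINDOW = 20

class CheatDetector:
    """Sliding-window rate check over a stream of per-player events."""
    def __init__(self):
        self.events = defaultdict(deque)  # player_id -> recent event timestamps

    def observe(self, player_id, timestamp):
        """Record one event; return True if the player now looks suspicious."""
        q = self.events[player_id]
        q.append(timestamp)
        while q and q[0] <= timestamp - WINDOW_SECONDS:
            q.popleft()  # discard events that fell out of the window
        return len(q) > MAX_EVENTS_PER_WINDOW

detector = CheatDetector()
# 25 score events arriving 40 ms apart from one player trips the flag immediately.
flags = [detector.observe("player_1", t * 0.04) for t in range(25)]
print(flags[-1])
# True
```

Real systems run far richer statistical models over merged structured and behavioral data, but the shape is the same: the decision is made per event, while the player is still in the game.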
Update: Since January 2013, Robert Schmid has been a technology strategist for the Salesforce.com Foundation.
Adam Gade
CIO, Maersk Line
Danish shipping business operating more than 600 vessels and 3.8m containers around the world
Maersk Line is a network business, but one operating in an industry that is immature when it comes to deploying technology. We are presented with a fantastic opportunity, because shipping, after all, is an optimization business. We need to optimize our networks, our container flows, our yields — everything around our business. And until a few years ago, we hadn’t spent a lot of time trying to do that.
Maersk Line needs an improvement engine and that engine is the capability to embrace big data and work out how we can use it to optimize our business. We may be one global network company, but we have a hugely complex IT landscape, with diverse solutions and data silos across regions, functions and projects.
During the past three years we have begun to correct all of that. It is a strategy that calls for a high level of data integration, providing the ability to share, analyze and optimize data across the company, and a high level of business process standardization. We now have a business intelligence (BI) vision: a data foundation that is the single source of truth.
For example, we operate 200,000 refrigerated containers around the world. As part of our Remote Container Management project, we can give customers access — via our data warehouse — to data relevant to these containers. This allows them to, say, vary the temperature inside while the cargo is in transit.
So if a consignment of ripening fruit is on its way to St. Petersburg from South America and the customer decides to discharge it early in Rotterdam, they can re-set the temperature remotely so the cargo is mature when it arrives in Rotterdam. In such situations, IT is being proactive and partnering with the business in a collective way, with commitment made at the highest levels of the organization. That’s the engine that’s going to drive improvement for us.
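The in-transit change described above might be sketched as follows; the container registry, the safe temperature band and the audit log are all hypothetical, invented here for illustration:

```python
# Assumed safe operating band for a refrigerated (reefer) unit; invented values.
ALLOWED_RANGE_C = (-30.0, 25.0)

# Hypothetical stand-ins for the data-warehouse records the customer can reach.
containers = {"MSKU1234567": {"setpoint_c": 4.0, "destination": "St. Petersburg"}}
audit_log = []

def reset_setpoint(container_id, new_temp_c, new_destination=None):
    """Validate and apply a customer-initiated temperature change in transit."""
    lo, hi = ALLOWED_RANGE_C
    if not lo <= new_temp_c <= hi:
        raise ValueError(f"setpoint {new_temp_c} C outside safe range {ALLOWED_RANGE_C}")
    box = containers[container_id]
    audit_log.append((container_id, box["setpoint_c"], new_temp_c))  # old -> new
    box["setpoint_c"] = new_temp_c
    if new_destination:
        box["destination"] = new_destination

# Warm the consignment so the fruit ripens for an early discharge in Rotterdam.
reset_setpoint("MSKU1234567", 12.0, new_destination="Rotterdam")
print(containers["MSKU1234567"])
```

The validation-plus-audit shape matters more than the details: a customer-facing control over physical cargo needs hard limits and a record of who changed what.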
Update: Robin Johnson replaced Adam Gade as CIO of Maersk Line in August 2012.