How big data raises the stakes in the security arena
Organizations are changing their approaches to data security and governance to match their big data ambitions.
As an Internet analytics company, comScore must crunch copious amounts of data to provide its clients with meaningful market insights. In fact, the Reston, Virginia-based company ingests a whopping 60 billion new data events every day. With such vast volumes, you’d think the storage of data and its processing would be foremost concerns for comScore. But in a world where hackers are exposing spy programs and threatening major motion picture studios, it’s data security – not storage – that’s keeping comScore’s IT team on high alert.
“It’s an ever-changing game of cat and mouse,” says Michael Brown, chief technology officer at comScore.
Brown is part of a growing band of C-level executives relying on innovative approaches and solutions to stay one step ahead of predators. Ad hoc networks, field-level encryption, fuzzification – these are just a few of the new tactics that companies such as comScore are now employing to protect their mammoth data assets.
Data security has always ranked high on CIOs’ priority lists but a number of factors are creating an even greater sense of urgency. For one, there’s the emergence of big data. According to research firm IDC, the world will be producing an unprecedented 44 zettabytes (or 44 trillion gigabytes) of data annually by the year 2020.
Big data, big risks
“With big data comes a lot more risk,” warns Brian Bourne, president of New Signature Canada, a Toronto-based technology consultancy, and founder of SecTor, the largest security education conference in Canada. The ability to amass ever-larger volumes of personal information, customer records, credit card numbers, buying patterns, location data and sensor data increases the potential security risk substantially, he notes.
But it is not just the ballooning nature of data assets that matters here. Another factor raising the stakes on security is open data. Initiatives, such as President Barack Obama’s signing of the Open Data Executive Order in May of 2013, are increasing the availability of public data sets – information that can be used to build new businesses, generate revenue or develop new products. The concern, however, is that multiple sources of public data can be pieced together to yield confidential information about individuals.
It’s something Peter Wood, chief executive officer of First Base Technologies, a UK-based security consultancy, has been observing. “As we consolidate more and more data into one big repository, there is a danger in that when it’s interrogated, it can create other data sets that have a completely different security risk level than the actual source data,” says Wood. For example, linking the name and email address of an individual from one data set to the home address and credit rating gleaned from another data set may actually reveal an individual’s identity, creating unanticipated security risks.
Elephant in the computer room
Then there’s the growing popularity of Hadoop to contend with. This open-source software framework stores and processes massive amounts of structured and unstructured data across low-cost commodity servers. Yet despite its ability to parse huge volumes of disparate data, many organizations keep the technology in sandbox settings rather than real-world production environments. That’s because Hadoop’s distributed architecture makes encrypting data and enforcing role-based access controls more difficult than it would be on a single, centralized system. Indeed, in a recent survey by DataGuise, 77% of respondents said it is important to protect access to the sensitive data stored in their Hadoop environments.
Despite these risk factors, savvy CIOs are finding new and inventive ways to protect their data while continuing to take advantage of up-and-coming big data tools like Hadoop. In the case of comScore, which uses the MapR Hadoop distribution, Brown says creating one set of networks for the capture of data and a separate set of networks for the processing of data has helped to create a secure Hadoop environment. “This allows us to isolate [our data] and provide a separation that’s useful from a security perspective,” he says. “If the networks were together, someone could compromise the machine that’s receiving data and get direct access to the core asset which is a risk.”
Another way comScore protects its data is through a process it calls ‘fuzzification.’ Essentially, fuzzification involves sifting through the data being captured, detecting sensitive strings and stripping confidential details, such as personal identifiers, from them. “We modify every IP address we capture and adjust the last couple bits of it so that we’re not actually logging the real IP address,” says Brown.
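Brown’s description, zeroing out the low-order bits of each captured IP address before it is logged, can be sketched in a few lines. This is a minimal illustration only: the function name and the number of bits masked are assumptions, since comScore has not published its exact scheme.

```python
import ipaddress

def fuzzify_ip(ip: str, mask_bits: int = 8) -> str:
    """Zero out the low-order bits of an IPv4 address before logging.

    mask_bits=8 (the whole last octet) is an assumed default; comScore
    says only that it adjusts "the last couple bits."
    """
    addr = ipaddress.IPv4Address(ip)
    # Build a mask that keeps the high (32 - mask_bits) bits and clears the rest.
    mask = (2**32 - 1) ^ (2**mask_bits - 1)
    return str(ipaddress.IPv4Address(int(addr) & mask))

print(fuzzify_ip("203.0.113.57"))  # -> 203.0.113.0
```

The fuzzified address still supports coarse geographic and network-level analysis, but no longer identifies a specific host.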
comScore also applies field-level encryption to sensitive data, encrypting information such as credit card numbers and financial figures in specific data fields. Drilling down deep into data sets, and encrypting data based on attributes rather than the environment in which it’s stored, will become an “essential” practice among security-minded CIOs, according to Wood. “One [technique] that’s currently getting quite popular is called attribute-based encryption,” he says. “Instead of worrying about encrypting an entire database, you actually encrypt the individual data by field level. It’s quite a radical approach; most businesses have not even thought about it.”
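The pattern Wood describes, encrypting individual fields rather than a whole database, might look like the sketch below. The field names are hypothetical, and the XOR routine is a deliberately toy stand-in so the example stays self-contained; a real deployment would use an authenticated cipher such as AES-GCM (for instance via the `cryptography` package) with keys from a key-management service.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stand-in cipher for illustration ONLY. XOR with a repeating
    # key is NOT secure; substitute a real authenticated cipher.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Assumed field names; each organization would classify its own schema.
SENSITIVE_FIELDS = {"card_number", "annual_income"}

def encrypt_record(record: dict, key: bytes) -> dict:
    """Encrypt only the sensitive fields, leaving the rest queryable."""
    return {
        field: xor_cipher(str(value).encode(), key).hex()
        if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }

def decrypt_field(record: dict, field: str, key: bytes) -> str:
    return xor_cipher(bytes.fromhex(record[field]), key).decode()

key = b"demo-key"  # in practice, fetched from a key-management service
row = {"name": "A. Customer", "card_number": "4111111111111111"}
enc = encrypt_record(row, key)
```

Because non-sensitive fields stay in the clear, analysts can still query and join on them, which is the practical appeal of field-level over whole-database encryption.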
People and processes
Yet it takes more than technology to safely navigate today’s big data universe. People and processes are also key to keeping cybercriminals at bay. For instance, Bourne says it’s critical that CIOs not only focus on securing data but also the insights they glean from the data, and the algorithms being used to give it real value. “What you want to protect are the conclusions you reach from working on that data or the formulas you’re using,” says Bourne. “What you want to secure is what you consider valuable and that’s not always the data itself.”
It also pays to assess the strengths and weaknesses of in-house IT talent. After all, says Wood, delivering big data security requires “unusual and new skills. We’re talking about people with data analysis skills that typically aren’t used in a normal networking environment.”
Bourne agrees. “You should have analytics staff that understand security thinking – what are the assets, what are the attack vectors and how should we configure our environment to implement security?”
Not surprisingly, data security is also baked into comScore’s organizational structure. “We make sure everyone understands the terms and conditions of our data and every employee has to acknowledge those terms and conditions every year,” says Brown. “Every employee in the company, from finance to sales to HR. Everybody.”
To underscore the importance of data security, comScore requires employees to take mandatory training on data security and privacy. Specific policies and criteria determine who is – and who is not – permitted to access certain sources of data. And IT operations and development teams function as separate entities so that developers must meet a certain set of criteria before pushing code into the production environment.
In the meantime, vendors such as MapR, Zettaset, Cloudera and Hortonworks are fast developing tools for managing and controlling Hadoop clusters. And training programs are emerging to get technology professionals up to speed on deploying and managing Hadoop securely. Together, this blend of tools, processes and best practices promises to help companies take advantage of big data while minimizing security risks.