Dr Bissan Al-Lazikani, head of data science at The Institute of Cancer Research, reveals how the capability to integrate and analyze vast volumes of data is playing a vital role in the discovery of effective cancer drugs.
Cancer is a very human challenge. During their lifetime almost 40% of the world’s population will develop some form of the disease. But many in the cancer research community now believe a cure is likely to emerge not from pure biomedical research but from the intersection of cancer biology, mathematics, machine learning and data analytics. So it might not be surprising to find a data scientist leading a team of researchers, scientists and developers to find the drugs to effectively treat – and ultimately defeat – cancer.
The leap from biology to information technology isn’t that large. In fact, Dr Bissan Al-Lazikani of The Institute of Cancer Research notes that her field of computational biology has been around for more than 50 years. She says: “It’s come from the idea that in biology there is so much information. It became very clear that what we need to do is start applying mathematics and data analysis to the information that we’re collecting from biology.”
And as head of data science at the ICR, Al-Lazikani works with huge amounts of data generated by patients during their cancer diagnosis and treatment. “We are now capable of collecting amounts of data that we never thought were possible before,” says Al-Lazikani. She reveals that if her team were able to collect all the information they need about a patient to allow for tailored treatment, it would be around 50 terabytes per person. To put that in perspective, she explains: “That compares with 45 terabytes of data generated by the Hubble Space Telescope in 20 years of its operation.”
And while Al-Lazikani and her team might not yet be collecting those kinds of volumes for each patient, they are dealing with petabytes of data pulled from a variety of sources, including patient samples, genomic sequencing, medical images, lab results, experimental data and pharmacological data.
Al-Lazikani is far from concerned about the possibility of drowning in data. In fact, she says: “What’s great about data science and data scientists is that we are infinitely data hungry. You can’t give us too much data.”
Smarter, faster decisions
The benefits of collecting and analyzing increasingly large amounts of data far outweigh the technical challenges, such as storing it. Al-Lazikani says: “You can start becoming a lot smarter about what kind of treatment you can provide to the patients in order to address the specific disease that they have. We are starting to really understand the complexities of cancer at a level that is unprecedented. This has really revolutionized the way we do drug discovery.”
It is Al-Lazikani’s computational background that is the key to helping solve the problem of finding effective drugs to treat cancer, as her expertise could help speed up the process of cancer drug discovery. She explains that big data analysis is now used to decide where scientists should invest their efforts: “We have so many possible avenues to explore for drug discovery and any one of them could take us down many years and many hundreds of hours of scientists’ time – and we could end up at a dead end. So the question we asked is: ‘Is there a way to use the entirety of the public knowledge that we have to help us make better decisions and really focus our efforts on avenues that are more likely to deliver for cancer drug discovery?’ And so the first challenge was actually bringing all of this data together.”
Al-Lazikani and her team developed the world’s largest cancer disease knowledge base, canSAR, to house all the information they have collected. The platform – essentially a public service engine on cancer – collates and integrates the data taken from different areas of science. This means staff can ‘ask’ the platform questions and receive answers rapidly, rather than spending the weeks it would have taken to work them out manually. canSAR is a public resource and has been accessed by almost 200,000 scientists worldwide.
Data sharing
The practice of sharing patient data is often met with reservation at best and panic and refusal at worst. Al-Lazikani admits it is not always seen in the best light: “Data sharing can end up, especially the way that it’s represented sometimes in the media, as a total Pandora’s box of nightmares.”
However, in reality it isn’t such a dramatic challenge in the cancer drug discovery field. Al-Lazikani says data issues typically come down to two concerned parties: the patients whose data is being used and the pharmaceutical companies who want the IP for a drug. Because the ICR is an academic not-for-profit institute, it can work precompetitively and share its knowledge with the community.
She says: “There is so much to do in cancer research and cancer drug discovery, we can’t afford to be protective. And the nice thing is that over the past five years even pharmaceutical companies are now increasingly working in this precompetitive mode. I think there is now a wide acceptance, certainly in drug discovery, that it’s really important to share this early knowledge and focus the IP much later downstream with the drugs themselves.”
On patient data, Al-Lazikani says a lot of the research undertaken by the ICR is based on anonymized data. She has found that patients are often keen to release their data for analysis, as they can see the benefit to themselves and the wider community. In fact, it’s often researchers who are more sensitive about compliance issues. Looking to the future, Al-Lazikani believes the answer is to involve patients more and more in the research and to have open conversations about the use of data. She says: “The more data we are gathering, the more patients we are profiling, the smarter the computer algorithms: the better we are becoming at discovering drugs for cancer.”