The Icahn Institute for Genomics and Multiscale Biology’s Eric Schadt joins our experts discussing the untapped potential of data analysis in medicine, education, and elsewhere, along with the pitfalls that may lie ahead.
How does supercomputing prove useful in medical research?
There are two main paths. One is in managing the amount of data that can be generated today in the medical arena, through things like DNA sequencing. For example, a whole genome sequence of a cancer patient would generate a terabyte of data. If you imagine sequencing many hundreds of thousands of individuals, you’re now into the peta- and even exabyte scales of data. Managing and processing that information down to something that can be medically actionable requires supercomputing infrastructure and expertise.
And then the other path would be coming up with predictive models of disease, based on the subtype of disease you have and which treatments may best target it, employing very sophisticated mathematical algorithms that require supercomputing to execute in a timely fashion.
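To make the scale in the first path concrete, here is a rough back-of-the-envelope sketch in Python. It takes the figure of about one terabyte per genome from the answer above as given; the cohort sizes and the decimal unit definitions (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) are illustrative assumptions, not figures from the interview.

```python
# Back-of-the-envelope storage estimate for whole genome sequencing.
# From the interview: about 1 terabyte (TB) of data per genome.
# Assumption for illustration: decimal units, 1 PB = 1,000 TB, 1 EB = 1,000,000 TB.

TB_PER_GENOME = 1
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000


def describe_scale(num_genomes: int) -> str:
    """Return a human-readable storage estimate for a cohort of the given size."""
    total_tb = num_genomes * TB_PER_GENOME
    if total_tb >= TB_PER_EB:
        return f"{total_tb / TB_PER_EB:g} EB"
    if total_tb >= TB_PER_PB:
        return f"{total_tb / TB_PER_PB:g} PB"
    return f"{total_tb:g} TB"


# Hypothetical cohort sizes, chosen only to show where the unit boundaries fall.
for cohort in (1_000, 100_000, 1_000_000):
    print(f"{cohort:>9,} genomes -> {describe_scale(cohort)}")
```

Under these assumptions, a hundred thousand genomes already comes to about 100 petabytes, and a million genomes reaches an exabyte.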
How does that change the role of the individual doctor and the patient-doctor data relationship?
It’s pretty fundamental. What’s different with the kind of approach we’re taking is that we’re going way deeper into you as an individual, not as a member of a population. Take something like diabetes. There may be 100 different subtypes of diabetes, different reasons why you’re diabetic vs. your neighbor. You may have beta cell dysfunction in your pancreas. You may have uptake receptors in your muscles that don’t effectively take up the glucose. You may have enzymes in the liver that don’t metabolize the glucose efficiently, and so on. And those different reasons may demand different treatments.
What the doctor sees is that end stage, but what they’re now able to see through all of these higher-resolution technologies are the upstream drivers of those downstream consequences. Those drivers were invisible to doctors until very recently. There are millions of variables, and the human mind can’t comprehend that many.
Those are human shortcomings that math can help with. Where are the shortcomings in the math that need a human touch?
The game is to present the information in a way that engages the human mind, which is a pretty amazing pattern recognition machine. It’s very much a partnership at this point. Maybe 10 or 20 years down the road, computers like Watson are going to be good enough that less human intervention is needed. But today that’s not true.
Should we be concerned about organizations collecting medical data for their own uses?
If we really want to have an impact on human well-being, these data and models have to be open and accessible to all. This happens in physics; all the data from the Large Hadron Collider is publicly available. The question, of course, is the protection of privacy.
Are there technical solutions to the privacy issues?
Certainly one can protect data, house it in secure compute environments, and employ lots of security protocols to ensure the data aren’t compromised. But the one thing we know is that with any form of high-dimensional data, there’s really no way to anonymize it. It’s almost like a photograph; there’s no reasonable expectation of privacy around how you look because it’s out in the open. You can’t hide it. I think DNA and other such molecular dimensions are ultimately going to fall under the same kind of ruling, just because the technologies will be good enough that sequencing your genome will be as easy and as cheap as taking a photograph.
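The point about high-dimensional data can be illustrated with a toy simulation. The sketch below is not an analysis described in the interview; it assumes a hypothetical population of 1,000 people, each described by independent yes/no attributes generated at random, and counts how many people have a combination of attributes that nobody else shares.

```python
import random
from collections import Counter


def unique_fraction(num_people: int, num_attributes: int, seed: int = 0) -> float:
    """Fraction of simulated people whose attribute combination is one of a kind."""
    rng = random.Random(seed)
    # Each person is a tuple of random yes/no (1/0) attributes (an assumption;
    # real attributes are correlated and not evenly split).
    people = [
        tuple(rng.randint(0, 1) for _ in range(num_attributes))
        for _ in range(num_people)
    ]
    counts = Counter(people)
    unique = sum(1 for person in people if counts[person] == 1)
    return unique / num_people


for k in (5, 10, 20, 40):
    print(f"{k:>2} attributes: {unique_fraction(1_000, k):.1%} uniquely identifiable")
```

In this toy setup, a handful of attributes leaves almost everyone sharing a profile with someone else, while a few dozen make nearly every profile unique. Real molecular data behave differently in detail, but the sketch suggests why stripping names does little for records with millions of variables.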
What types of analyses have you held back on for ethical reasons?
We published a paper showing that lots of data that wasn’t protected in the same way that DNA is protected was being released into the public domain, and that we could uniquely identify individuals in those studies based on non-DNA-based information. There may be enough genetic information contained within, say, the morphological features of your face that we could bar-code you with a DNA tag. With just a photograph, we could derive a DNA bar code and match it to whatever genome that face belongs to.
How do you deal with information overload in your personal life?
You can’t always be immersed in the big data. I get out snowboarding or racing motorcycles, things that help you unwind and engage the more primitive components of the mind.
For more conversation and video, visit: www.businessweek.com/fix-this/big-data.