Creating the ‘Cancer Worldwide Web’
(This is part 1 of 2 of a series of articles excerpted from "Georgetown on Front Line of Effort to Create 'Cancer Worldwide Web'" in the Spring/Summer 2009 issue of Georgetown Medicine Magazine)
Today’s molecular-based cancer research generates vast amounts of proteomic and genetic information. As researchers laboriously gather these data from tens of thousands of patients, the growing mountain of knowledge could soon surpass their ability to effectively analyze or use it.
How will physicians and scientists manage and access such precious and potentially life-altering information?
Georgetown University is tackling this question through its participation in an initiative directed by the National Cancer Institute and the Center for Bioinformatics. This program, known as caBIG (Cancer Biomedical Informatics Grid), seeks to transform the way cancer research is translated into therapies. The project, which combines the expertise of more than 1,000 participants from some 80 organizations, is creating a network that will openly connect the cancer community to speed the discovery of novel approaches for cancer prevention and treatment.
“The grid is being described as the ‘cancer worldwide web,’ because it will allow people to connect the way we connect on the internet,” said Robert Clarke, PhD, DSc, professor of oncology and physiology & biophysics at Lombardi Comprehensive Cancer Center. He also serves as interim director of Georgetown’s Biomedical Graduate Research Organization (BGRO). “So conceptually, it’s big — an information grid that links cancer centers all over the country in real time.”
The initiative is best envisioned as an open-access infrastructure that allows people from multiple institutions to connect virtually through tools that are aggregated on one grid, according to Clarke.
“It will allow physicians, researchers, patients and their families to share data, to manipulate data, and to work together in ways that we couldn’t up to this point,” he said.
Lombardi, along with the university’s Protein Information Resource (PIR) and Advanced Research Computing (ARC) divisions, is helping to provide research infrastructure for the project. Housed at Georgetown, PIR is an integrated public bioinformatics resource that supports genomic and proteomic research, and it is extending access to its database to more researchers through caBIG. ARC, a division of University Information Services, is responsible for hosting the grid infrastructure that enables data-sharing.
ARC Program Director Steve Moore describes caBIG as an information network with over 40 highly specialized toolsets that allow researchers to share data and knowledge.
“CaBIG combines interoperable software tools, data standards and a computing infrastructure with many tools developed specifically by and for researchers,” said Moore. CaArray, a microarray data management system, and caTissue, an inventory of biospecimens, are among the first to have been implemented at Georgetown, he added.
In a sense, caBIG could be thought of as a container holding all the tools researchers and physicians need to store and share information, accessible online to others who can benefit.
“These tools are allowing us to see that nature is more complex than we thought, and while we don’t yet know what the overarching biological rules are — such as the interrelationship between multiple signaling pathways that can lead to cancer development — we are trying to play the game like we do,” said Clarke.
Recognizing Patterns in Enormous Data Sets
Empirical research demands vast amounts of data to produce valid, evidence-based conclusions. These data are often mined for specific findings and then left unused. Sometimes they are rediscovered later by another researcher who has already spent several years collecting the same data to test a different hypothesis. This inefficiency has created the need for a tool that can store information so more time is spent solving problems instead of understanding them. This evokes the idea of data mining, a technique that focuses on searching previously collected data for specific patterns.
“The sorts of data that we are generating in molecular medicine are huge data sets,” said Clarke. “There are tens of thousands of measurements on a single sample; a single specimen from a cancer patient. When you have a large study with hundreds of these and you have tens of thousands of measurements in each of those hundreds or thousands of specimens, you can get an idea of how big a single data set can be from one study.”
CaBIG will enable researchers to analyze and build upon existing data, rather than constantly looking for newer information.
“The answers to our questions are probably there in the data,” said Clarke, “but the issue is whether we can get to them using these complex tools and, also, how we will know they are right when we see them.”
Moore said the continuous movement forward is a critical aspect of systems-based medicine, the new prevailing health care vision at the Medical Center.
“Systems medicine not only looks at one area, it looks at the entire system of the patient. The model itself perpetuates the advancement of information,” he said.
The daunting range of variables that distinguishes one patient from another makes the old method of determining illness and treatment a guessing game compared to the highly personalized objectives of caBIG.
“Previously we’d span an entire population and conclude that, if you as a patient behaved as the mean of that population, this is your outcome. But how many of us conform to the mean?” said Clarke.
It therefore becomes necessary to accurately interpret this mass of information and formulate a hypothesis that could potentially change years of dogma. Among the most essential skills, Clarke said, is the ability to recognize patterns. The objective is to find a pattern exclusive to each patient, allowing physicians to determine the ideal treatment or predict the patient’s outcome more accurately.
Even with all the data housed in one organized database, finding these patterns is no simple feat, and unconventional skill sets might therefore become necessary. While the task of isolating patterns in cancer research has traditionally fallen to biostatisticians who have experience working with large epidemiologic studies, Clarke said engineers, computer scientists and other similarly trained professionals will play an important role.
“Georgetown doesn’t have an engineering school,” said Clarke. “So we work with the engineering school at Virginia Tech. Virginia Tech doesn’t have a medical school, so the collaboration is a genuinely effective exchange of capabilities.”
Georgetown’s resources were expanded last year with the hiring of Subha Madhavan, MS, PhD, who was instrumental in building the data integration platforms for caBIG at the NCI. Lombardi Director Louis M. Weiner, MD, recruited Madhavan from the NCI as Lombardi’s first director of clinical research informatics. Besides continuing her work on caBIG at Georgetown, Madhavan is helping to transfer the lessons and best practices from caBIG and other large-scale informatics projects to the creation of the Georgetown Database of Cancer, or G-DOC.
By Christine Cascella Reider and Yuse Lajiminmuhip, excerpted from the Spring/Summer 2009 issue of Georgetown Medicine Magazine.

