The brain is the most complex organ in the human body — and the least understood. Disorders of the brain or central nervous system (CNS) can be broadly divided into two camps: neuropsychiatric and neurodegenerative. Neuropsychiatric maladies cover depression, schizophrenia, bipolar disorder, and myriad anxiety disorders, while neurodegenerative diseases include Lou Gehrig’s, Alzheimer’s, Parkinson’s, and Huntington’s diseases. Strokes, brain tumors, head traumas, migraines, and spinal cord injuries also fall within the CNS realm.
Brain disorders represent a terrible drain on society. In the United States, they afflict tens of millions of people at an estimated cost of US$600 billion a year. Scores of startups and large pharmaceutical companies are investing in functional genomics, drug development, devices, and surgical techniques that could treat these disorders. In the past year alone, venture capitalists invested $240 million in CNS research. Brain disorders are especially vexing because unlike many other diseases, such as cancer, their incidence is not dropping, and the disorders render people profoundly and progressively disabled for the rest of their lives.
Clinical neuroscience aims to improve our understanding of the nosology, etiology, and pathophysiology of brain disorders in order to provide better methods of diagnosis, prognosis, and therapy. This would result in more accurate and less invasive procedures, reduced morbidity in disease treatment, reduced costs of care, and ultimately an enhanced quality of life. Improved patient care can only arise from better clinical research, which in turn requires more efficient hypothesis formulation and evaluation, and the integration of information from the level of the gene to the level of behavior.
At each of these diverse levels, there has been an explosion of information in the past few decades. The range of data acquisition devices now available in clinical neuroscience is simply staggering. These include conventional clinical procedures such as lab tests and neuropsychological exams; structural imaging techniques such as magnetic resonance imaging (MRI) and angiography, X-ray computed tomography, and electron microscopy; functional and metabolic imaging methods such as positron emission tomography, magnetic resonance spectroscopy, functional MRI, and optical imaging; and high-throughput genomic techniques such as DNA microarrays.
The sheer complexity of operating these devices and interpreting the resulting information is inevitably leading to a concomitant specialization of neuroscientists, clinicians, and bioinformatics researchers. Meanwhile, clinical neuroscience studies are spawning vast heterogeneous databases organized in different — and often incompatible — ways. As a result, it is becoming increasingly difficult for anyone to maintain an integrated sense of clinical neuroscience and relate his or her narrow findings to this whole cloth.
Although the explosion of information is daunting, solutions to this problem can be found in enterprise information technologies. For decades, clinical and research communities have focused on developing informatics tools for processing and analyzing discipline-specific data. The resulting databases are restricted to providing a vertical view of biomedical data associated with a specific project, laboratory, or department. A more powerful horizontal, or enterprise-wide, view cuts across such boundaries. An interesting example of this horizontal approach in bio-IT infrastructure is deCODE Genetics, the Icelandic genomics company recently granted a license to create and manage a database of the entire nation’s medical and genetic records. The time is ripe to apply this enterprise-wide view to create integrated IT solutions for clinical neuroscience.
There are two major motivations for merging enterprise solutions into clinical neuroscience. The first is the need to scale up the capacity for data management, the rate of scientific discovery, and the complexity of computational modeling. The second is economic: data sharing, software reuse, and a common infrastructure build-out all help to reduce costs.
The problems facing the clinical neuroscience community are similar to those that confronted health science and medical imaging communities decades ago in scaling up their data management, operations, and productivity. The time and money spent in collecting, reformatting, verifying, organizing, and archiving data for every new research study or clinical endeavor would be better spent on experimental hypothesis testing. Despite the complex and diverse data types and applications in clinical neuroscience, its infrastructure development can still benefit from decades of enterprise integration experience in the clinical and imaging IT communities, especially electronic medical record (EMR) systems and picture archiving and communication systems (PACS).
These are the major infrastructure challenges facing the neuro-IT community:
Data Acquisition: The array of data-generation devices today ranges from the level of the gene and protein to that of the organ and subject. Fortunately for neuroimaging and clinical information systems, the adoption of interface standards such as DICOM (Digital Imaging and Communications in Medicine) and HL7 (Health Level 7) should facilitate integration.
Data Archive: The growth of biomedical database repositories has paralleled advances in data acquisition capabilities.
A genomic informatics system typically reaches about 1 terabyte (TB), whereas proteomic data sets reach 50 to 100TB. A mid-size, 600-bed hospital seeking to capture all diagnostic, therapeutic guidance, and pathologic images would require 4 to 5TB of digital data sets yearly. A medical center or a clinical neuroscience institute will likely need data storage in the petabyte range.
Hierarchical storage is commonly used in PACS and EMR systems to stratify data across different types of storage according to price, performance, and frequency of access. The life science communities, on the other hand, have been taking relatively new technologies such as SANs (storage area networks) and scaling them up to extremely large sizes. For instance, Celera Genomics Group has 120TB of data stored on SANs, all set up using off-the-shelf commercial products.
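The hierarchical storage idea described above can be illustrated with a minimal Python sketch. It is not how any particular PACS or EMR product works: the tier names, the day thresholds, and the `Study` record are hypothetical, and recency of access stands in for the fuller price, performance, and frequency trade-off.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from fastest and most expensive to slowest.
# Each entry is (tier name, maximum idle days allowed on that tier).
# The thresholds are illustrative assumptions, not figures from the article.
TIERS = [
    ("online_disk", 30),
    ("nearline_optical", 365),
]
ARCHIVE_TIER = "offline_tape"  # everything older falls through to here


@dataclass
class Study:
    """A stored data set, e.g. one imaging study (hypothetical record)."""
    study_id: str
    days_since_last_access: int


def assign_tier(study: Study) -> str:
    """Return the first tier whose idle-day limit the study still meets."""
    for tier_name, max_idle_days in TIERS:
        if study.days_since_last_access <= max_idle_days:
            return tier_name
    return ARCHIVE_TIER


def plan_migration(studies):
    """Group study IDs by the tier they should live on."""
    plan = {}
    for study in studies:
        plan.setdefault(assign_tier(study), []).append(study.study_id)
    return plan


if __name__ == "__main__":
    demo = [Study("mri-001", 3), Study("pet-002", 120), Study("ct-003", 900)]
    for tier, ids in plan_migration(demo).items():
        print(tier, ids)
```

A real policy would also weigh data size, retrieval cost, and regulatory retention periods; the sketch only shows the stratification step.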
Database Management Systems (DBMS): It is not practical for any single database to capture the diversity of data needed by the clinical neuroscience community. The databases already in use include file-based systems such as EMBL and GenBank, message-based systems such as PACS, and relational systems such as hospital information systems (HIS) and microarray databases. Many commercial and research middleware systems use the mediator, data-wrapper, or natural language approach to query a variety of databases and data sources and return a single result. These approaches, however, have produced mixed results when applied to the related health science domain. The optimal approach for integrating diverse neuroscience-related biomedical databases into one logical structure that allows efficient search and retrieval remains to be identified.
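The mediator and data-wrapper approach mentioned above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any existing middleware: the class names, the record fields, and the sample data are all hypothetical. Each wrapper hides the native format of one source, and the mediator fans a query out to every wrapper and merges the answers into one result.

```python
class ImagingWrapper:
    """Wraps a toy imaging archive: a dict of subject ID -> list of modalities."""

    def __init__(self, archive):
        self._archive = archive

    def query(self, subject_id):
        # Translate the native structure into a common record shape.
        return [
            {"source": "imaging", "subject": subject_id, "modality": modality}
            for modality in self._archive.get(subject_id, [])
        ]


class SequenceWrapper:
    """Wraps a toy flat file with one 'subject<TAB>gene' entry per line."""

    def __init__(self, flat_text):
        self._lines = flat_text.strip().splitlines()

    def query(self, subject_id):
        records = []
        for line in self._lines:
            subject, gene = line.split("\t")
            if subject == subject_id:
                records.append(
                    {"source": "sequence", "subject": subject, "gene": gene}
                )
        return records


class Mediator:
    """Sends one query to every wrapped source and returns one merged list."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def query(self, subject_id):
        merged = []
        for wrapper in self._wrappers:
            merged.extend(wrapper.query(subject_id))
        return merged


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    imaging = ImagingWrapper({"S1": ["MRI", "PET"], "S2": ["CT"]})
    sequence = SequenceWrapper("S1\tHTT\nS2\tAPOE\n")
    for record in Mediator([imaging, sequence]).query("S1"):
        print(record)
```

The caller sees a single query interface and a uniform record shape; reconciling vocabularies, identifiers, and units across sources, which is where such systems tend to struggle, is deliberately left out of the sketch.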
User Interface: Most bioinformatics tools and applications are developed for scientists, who generally tolerate cryptic or buggy user interfaces. The users involved in the clinical neuroscience enterprise come from more diverse backgrounds and are less forgiving, which makes it essential to develop an effective user interface that delivers information in a uniform and intuitive manner. Decades of user interface work have gone into soft-copy image review and medical record access, but the diversity and complexity of clinical neuroscience data pose an altogether different scale of visualization and user-interface challenge to the bio-IT community.
Interoperability: The sheer diversity and scale of contemporary clinical neuroscience research present enormous challenges for data sharing. There are two different kinds of interoperability challenges: the ability to share data among databases containing the same level of biological data and organization, and the ability to aggregate data from databases describing different levels of biological data and organization. Several XML (Extensible Markup Language) schemas have been proposed to address the first kind of interoperability, but little attention has been paid to solving the cross-level interoperability problem.
Architecture: Innovations in software architecture are required to integrate different devices and systems in the enterprise seamlessly into one coherent system. Like other enterprise systems, the neuro-IT infrastructure will require a scalable, modular, and open architectural framework. Three major architectural frameworks can be found in clinical enterprises: centralized data storage, such as HIS and PACS; mediator or broker middleware, such as multitier electronic medical records; and data warehouses, which extract data from operational databases into a separate database for research analysis and data mining. Whether any one of these, a combination of them, or a brand new architecture will serve as the right framework for neuro-IT infrastructure has yet to be determined.
Quality Assurance: Many biomedical databases contain a certain amount of inaccurate or misrepresented data because of poor annotation and quality control. Medical record and drug trial databases, governed by federal regulations and privacy concerns, generally maintain better data quality but still need improvement. The clinical neuroscience enterprise needs to develop a comprehensive and reliable quality assurance process that can be integrated into its information infrastructure.
Standards: Common data and communication standards accelerate the construction of information infrastructure. The adoption of DICOM, HL7, TCP/IP, and relational databases accelerated the formation of PACS and EMR as enterprise solutions for health-care institutions. The establishment of SRS (sequence retrieval system) expedited the formation of public genomic databases. Current efforts to standardize data exchange methods, database schemas, and acquisition protocols often compete with each other. For example, GEML (Gene Expression Markup Language) and MAML (Microarray Markup Language) compete as standards for gene expression data sharing. As the field progresses, we will see a shakeout and consolidation of competing proposals, moving neuro-IT infrastructure development forward.
Legal/Privacy: Information security and privacy is another area where clinical neuroscience communities have the most demanding requirements. Many pharmaceutical companies still transport floppy or optical disks by courier rather than trust the Internet. With the exception of drug and clinical trial projects, technological and sociological safeguards are generally absent in the laboratories, potentially exposing the data to misinterpretation, misuse, and misappropriation. As studies move from bench to bedside, this culture must change. In addition, federal regulations, such as HIPAA (Health Insurance Portability and Accountability Act), further complicate the development and sharing of patient data. Indeed, controversies over exposing a person’s credit card information pale next to the issues raised by exposing someone’s potential genetic predisposition to disease and other medical conditions. The impact of such privacy and regulatory constraints on the neuro-IT infrastructure development must be addressed carefully.
The field should adopt an enterprise-wide perspective and learn from the experience of related application domains, namely the health science and medical imaging communities. The motivation for developing the neuro-IT infrastructure is to scale up data management, operations, discovery, and productivity. The neuroscience and bioinformatics communities have been extremely productive in making tools and applications in individual disciplines during the past few decades. We now need a horizontal view to build the infrastructure that can scale up operations and discoveries to better combat brain diseases and improve the quality of patient care.
Stephen T.C. Wong is an assistant professor in the departments of radiology, neurology, and bioengineering at the University of California at San Francisco.