To streamline drug discovery and development in the face of mind-numbing amounts of data, the need for integration and interoperability among applications, databases, and hardware is greater than ever before, according to an array of speakers at the BioSilico 2002 conference, which opened Tuesday in New York.
At the conference, vendors outlined problems facing IT in the life sciences and touted new products aimed at solving them.
One of the most basic problems in the life sciences today is that both expertise and data exist in silos, according to panelists here. Researchers need to be familiar with different disciplines related to biology and chemistry, share their research, and have access to more robust tools that integrate disparate pools of data, they said.
“Genome sequencing is complete – so what?” said Paul Caron, head of informatics for Vertex Pharmaceuticals Inc. in Cambridge, Mass. Sequencing the genome has actually added to the complexity of life sciences research, according to Caron. “We have a longer list of potential (drug) targets, and we already had a lot.”
Though the recent sequencing of the human genome has fueled general interest in life sciences and funding for startups, according to speakers and industry insiders here, the growing complexity facing life science researchers, in particular those involved in drug discovery, is daunting.
“Complexity is something we all struggle with,” said Richard Klausner, a keynote speaker, former director of the U.S. National Cancer Institute and now senior fellow and special advisor to President George Bush for counter-terrorism.
“Part of the problem is that disease develops in the context of an individual, with extraordinary filters of common genetic variations,” Klausner said. “We’re just beginning to develop ways to study the ways these filters modify … the disease process.”
Industry insiders said that life science requirements would be the driving force for IT development in the years ahead.
“Physics drove computing for the last 20 years; biology will take the lead for the next 20 years,” said Bill Blake, a BioSilico speaker and vice-president of the high performance technical computing group at Compaq Computer Corp. in Houston, Texas.
One of the problems that IT has had up to now, a lack of integration among applications and hardware platforms, mirrors an issue within pharmaceutical and biotech companies: expertise often exists in isolated groups, according to speakers.
Pharmaceutical companies are structured in much the same way that auto makers were 20 or 30 years ago, according to Srini Chari, senior manager for solution architecture and strategy in the Life Sciences Solutions unit at IBM Corp. of Armonk, N.Y.
While automakers used to have, for example, separate design, manufacturing and marketing units, they are now organized around the concept of concurrent engineering, where a team takes a car from concept to test to marketing.
“Pharma is not yet at that stage,” Chari said.
However, streamlining drug development, at least up to the clinical stage, is of utmost importance, considering the vast amount of resources that are being spent, he said. Pharmaceutical companies may spend 15 per cent to 30 per cent of revenue on research and development, compared to six per cent or seven per cent of revenue at IT vendors, he noted.
In the face of enormous data sets, finding patterns that lead to meaningful discoveries in emerging areas of expertise is difficult.
“Molecular profiling is important but in its infancy,” said Klausner. One of the main problems is “the ability to discern meaningful patterns when the dimensionality of data is enormously high.”
Making this even more difficult is that data often reside in disparate and geographically dispersed applications and databases.
“Application and data integration is a crucial need,” IBM’s Chari said.
One of IBM’s main offerings to life science researchers is the DiscoveryLink program, which eliminates the need for SQL (Structured Query Language) ability on the part of users, according to Chari. Users can pose complex, natural-language questions. The software breaks each query down into SQL commands and runs them against multiple databases on heterogeneous computing systems, he said.
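As a rough illustration of the idea Chari describes, in which one question is broken into SQL commands that run against separate databases, the following Python sketch uses two in-memory SQLite databases as stand-ins for independent data sources. It is not DiscoveryLink's implementation or API. The table names, sample rows, and the `federated_query` function are all hypothetical, and the natural-language step is left out entirely.

```python
import sqlite3


def make_source(schema, insert_sql, rows):
    """Create an independent in-memory database (stand-in for a remote source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source 1: gene expression measurements.
expression_db = make_source(
    "CREATE TABLE expression (gene TEXT, tissue TEXT, level REAL)",
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("TP53", "liver", 8.2), ("EGFR", "lung", 9.1), ("BRCA1", "breast", 3.4)],
)

# Hypothetical source 2: compounds and the genes they target.
compound_db = make_source(
    "CREATE TABLE compound (name TEXT, target_gene TEXT)",
    "INSERT INTO compound VALUES (?, ?)",
    [("cmpd-A", "EGFR"), ("cmpd-B", "TP53"), ("cmpd-C", "KRAS")],
)


def federated_query(min_level):
    """Answer: which compounds target genes expressed above min_level?"""
    # Sub-query 1 runs against the expression source only.
    genes = {
        gene: (tissue, level)
        for gene, tissue, level in expression_db.execute(
            "SELECT gene, tissue, level FROM expression WHERE level > ?",
            (min_level,),
        )
    }
    if not genes:
        return []
    # Sub-query 2 runs against the compound source, restricted to those genes.
    placeholders = ",".join("?" for _ in genes)
    rows = compound_db.execute(
        "SELECT name, target_gene FROM compound "
        f"WHERE target_gene IN ({placeholders}) ORDER BY name",
        list(genes),
    ).fetchall()
    # The join across the two sources happens here, outside either database.
    return [(name, gene, *genes[gene]) for name, gene in rows]


if __name__ == "__main__":
    for row in federated_query(5.0):
        print(row)
```

The point of the sketch is only that the user asks one question while the software decides which sub-query goes to which source and combines the answers.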
To permit complex queries across different data sets, applications must incorporate the technology specifications and protocols that are becoming standard as IT vendors seek to make diverse applications and hardware interoperate across the Web, noted Chari.
These technology standards include XML (Extensible Markup Language), SOAP (Simple Object Access Protocol), and UDDI (Universal Description, Discovery, and Integration), as well as Sun Microsystems Inc.’s Java programming tools, according to Chari and other speakers.
Lion Bioscience AG, based in Heidelberg, Germany, for example, has incorporated XML into the latest versions of its software, according to Manuel Glynias, senior vice-president of strategic planning at the company. The software includes the SRS (Sequence Retrieval System) database query program and the DiscoveryCenter data integration and workgroup software, which it acquired with its recent purchase of Netgenics Inc. XML enhances the software’s ability to integrate with a variety of data sources, particularly over the Web, he noted.
DiscoveryCenter can be used to share information across an enterprise by allowing users to share worksheets listing areas of interest, thus reducing, for example, the chance that researchers in different parts of a company might end up performing the same analysis or running the same tests.
The latest version of PathwayPrism, announced at BioSilico Tuesday, also offers features that allow for data sharing among users, according to a statement from the company, based in Princeton, New Jersey. PathwayPrism 2.0, like earlier versions, is designed to allow users to model, simulate and annotate biochemical pathways and molecular signaling networks. The upgrade adds the ability for users, through user-access controls, to share information over an intranet and merge pathways they have created. The software also now includes an interface to MATLAB, from The MathWorks Inc., which provides users with analytical and graphical tools for data analysis, and algorithm and application development. Pricing for PathwayPrism was not immediately available.
Gene Logic Inc., of Gaithersburg, Maryland, is approaching the problem of data integration in two ways. It is building up its own reference library, which helps researchers discriminate between normal and abnormal gene expression, and working with third-party databases and programs, according to Jeffrey Cossman, the company’s vice president and medical director.
Gene Logic works with hospitals around the world to collect tissue samples, the data from which are incorporated into its GeneExpress multimodule reference library. But it has also, for example, obtained an exclusive license to BioCarta Inc.’s collection of signal transduction pathways, in order to let researchers visualize protein interactions linked to data in GeneExpress. A new version of BioCarta is due out in April, according to Cossman.
In addition to database and application interoperability, there is a growing demand for computing power and the ability to harness computing clusters and grid networks to solve massive data analysis problems. In grid networks, multiple computers, often geographically dispersed, work in parallel on a computation problem, either by taking spare CPU cycles from workstations or using networked supercomputers.
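The split-compute-combine pattern behind such grids can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: local threads stand in for geographically dispersed machines, the workload (counting G and C bases in a DNA string) is invented for the example, and the function names are hypothetical. A real grid scheduler also handles data transfer, failures and queuing, none of which appears here.

```python
from concurrent.futures import ThreadPoolExecutor


def split(sequence, n_chunks):
    """Cut a sequence into at most n_chunks pieces of near-equal size."""
    size = max(1, -(-len(sequence) // n_chunks))  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def gc_count(chunk):
    """Work done by one node: count G and C bases in its chunk."""
    return sum(1 for base in chunk if base in "GC")


def gc_fraction(sequence, workers=4):
    """Scatter chunks to workers, gather the partial counts, combine them."""
    if not sequence:
        return 0.0
    chunks = split(sequence, workers)
    # Threads are a stand-in for grid nodes working in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(gc_count, chunks))
    return sum(partial_counts) / len(sequence)


if __name__ == "__main__":
    print(gc_fraction("GGCCAATT"))
```

Each chunk can be processed without reference to the others, which is what makes a problem of this shape suitable for spare CPU cycles on many machines.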
Managing all this hardware can be tricky.
“At Celera (Genomics Group) they had US$60 million of computing power … and researchers were still complaining that wasn’t enough,” noted Yury Rozenman, chief scientific officer for Platform Computing Inc. of Markham, Ont. “In reality what really was needed was to utilize resources correctly.”
Platform on Tuesday announced the beta version of Platform Globus. The Globus tool set was developed by the Globus Project, a research organization; Platform, however, has commercialized the tool set, which offers features to fine-tune management of geographically dispersed and local grid computers. The suite includes database support tools and integration with J2EE (Java 2 Enterprise Edition).
Despite the growing number of computing tools available for drug discovery and development, IT is at the early stages of helping life science researchers integrate the various areas of expertise and technology resources to solve increasingly complex problems.
“Are we at a transition point in biomedical research? Often I think we are,” said keynote speaker Klausner. “But … the transition from science into science-based medical exploration may be overstated … It’s a greater challenge than what we often talk about.”