British naturalist Charles Darwin is credited with the theory of evolution, but a crucial part of his theory came from his mentor, a lesser-known but equally bright Cambridge University professor.
Darwin’s teacher, John Stevens Henslow, might have been consigned to permanent obscurity were it not for database analytics. For more than two years, a team of researchers worked to uncover a secret hidden in Henslow’s vast, brittle collection of plants, known as an herbarium, which dates back more than 160 years.
Working from the original collection at the University of Cambridge in England, researchers transferred its data into an early beta version of Microsoft Corp.’s SQL Server 2005, said Mark Whitehorn, a database expert who worked on the project. He said the project illustrates SQL Server’s cost efficiency, and Microsoft has been actively promoting the case study.
Microsoft was unaware that a beta version of SQL Server 2005 was being used for the project until the work was already under way, Whitehorn said.
Henslow’s herbarium consists of 3,500 sheets holding more than 10,000 dried plant specimens collected by Henslow and others. At the time of collection, Henslow carefully documented each specimen’s origin, quantity, collection date and species, among other data.
Once the data was in the database, scientists whose expertise lay outside computer science could pursue their own inquiries. The team included Professor David Kohn, a Darwin scholar at Drew University in New Jersey; Professor John Parker, a botanist at Cambridge; and Gina Murrell from Cambridge’s herbarium.
Using database analytics, researchers graphed different sets of data and rearranged them until patterns emerged, Whitehorn said.
Researchers were able to draw connections between the plant samples, for example identifying when two collectors may have been traveling together. Plant species could also be plotted against time to highlight collection trends in particular years.
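The article does not describe the team’s actual queries, which were run with SQL Server 2005’s analytics and a ProClarity querying tool. As a rough illustration only, the kind of cross-referencing described above can be sketched in Python: treat specimens sharing the same collection date and locality as one collecting event, count how often each pair of collectors appears at the same event, and tally a species’ specimens by year. The `Specimen` record, its field names and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import NamedTuple


class Specimen(NamedTuple):
    """One herbarium record (hypothetical fields, not Henslow's actual schema)."""
    collector: str
    date: str      # ISO date string, e.g. "1830-06-14"
    locality: str
    species: str


def co_collection_pairs(specimens):
    """Map each pair of collectors to the number of collecting events
    (same date, same locality) at which both gathered specimens."""
    events = defaultdict(set)
    for s in specimens:
        events[(s.date, s.locality)].add(s.collector)

    pairs = defaultdict(int)
    for collectors in events.values():
        # Sort so each pair is always counted under one canonical ordering.
        for a, b in combinations(sorted(collectors), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


def species_count_by_year(specimens, species):
    """Count specimens of one species per collection year."""
    return dict(Counter(int(s.date[:4]) for s in specimens if s.species == species))


if __name__ == "__main__":
    records = [
        Specimen("Collector A", "1830-06-14", "Locality X", "Species 1"),
        Specimen("Collector B", "1830-06-14", "Locality X", "Species 2"),
        Specimen("Collector C", "1830-06-15", "Locality Y", "Species 1"),
        Specimen("Collector A", "1831-05-02", "Locality Y", "Species 1"),
        Specimen("Collector B", "1831-05-02", "Locality Y", "Species 1"),
    ]
    print(co_collection_pairs(records))
    print(species_count_by_year(records, "Species 1"))
```

With these made-up records, Collectors A and B turn up together at two separate events, which is the sort of pattern that would prompt a researcher to ask whether they were traveling together. Matching on exact date and locality is a simplifying assumption; real herbarium labels would likely need fuzzier matching.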
Manually extracting the statistics would have been possible, but arduous. Professional scientists have spent their lives “fighting data,” Whitehorn said. “What we were able to do was just remove the barriers.”
A striking conclusion became evident. The concept of variation, meaning the differences among members of a species that are necessary for the survival of the species as a whole, was first observed by Henslow, Whitehorn said.
“What is now very clear to us is that Henslow started studying this variation quite systematically in about the 1820s,” Whitehorn said. “He actually trained Darwin to observe variations between the species.”
Microsoft is hosting an event Wednesday in Cambridge showing how the researchers performed the experiment. Whitehorn said the team will soon publish a paper with more detailed data and analysis of the work.
Henslow helped Darwin secure a berth on HMS Beagle for the voyage that departed in 1831 and later took him to the Galapagos Islands. The voyage strongly influenced Darwin’s writing of “On the Origin of Species,” his 1859 work setting out his theory of evolution.
Why use Microsoft’s SQL Server 2005? Whitehorn, a self-described “database geek,” said the analytics capabilities of the product have improved dramatically.
Microsoft’s analytical engine allowed for a transparent, organized walk through the data, Whitehorn said. The researchers used a product from ProClarity Corp., which Microsoft on Monday agreed to acquire, as a querying tool.
Databases from Oracle Corp. and IBM Corp. could have made the same discovery, but at a higher cost, Whitehorn said.
Microsoft, which had held solid ground among smaller businesses with SQL Server 2000, pushed into the enterprise market last fall with the release of SQL Server 2005 and its upgraded capabilities.