Australia manages the Antarctic data flow

Storing and maintaining the mountains of scientific data Australia’s Antarctic researchers generate is not a job for the faint-hearted. But keeping tabs on each and every piece of information while making it accessible to the public is the challenge the Australian Antarctic Division (AAD) Data Centre’s database management team has undertaken.

The aim of the data center is to fulfill the AAD’s obligation under the Antarctic Treaty, which specifies that all scientific information collected from the region be made publicly available.

To meet this obligation, the data center runs an online central repository for all of its information. Based on Oracle Corp.’s database suite, the repository sits on a Sun Microsystems Inc. server running Solaris 8. In all, the database holds about 0.5TB of data, all stored in-house.

AAD data center manager Lee Belbin said that, when the central database system was being set up some years ago, an Oracle database was selected on the basis of three key factors: stability, usability and interoperability. At that time, it was the only product on the market with an effective Web-based design, he said. Until then, according to AAD applications programmer/database management Kirk Mower, users had to connect to each AAD database separately via a Telnet connection and a text-based interface.

The team’s decision to base selection on stability and interoperability has borne fruit and there are now some 30 databases covering a plethora of scientific and geographic subjects, including flora and fauna species, marine science, biodiversity, weather and geographic information systems. While most of the data is from Australian researchers or expeditions, a small proportion comes from international sources.

With the central database, almost all of the information is accessible via the Web, Mower said.

“This means that remote users can query data, as well as international users.”

Belbin said this was particularly important in providing real-time, 24/7 access to AAD’s international users.

In addition, other top international data centers use Oracle, he said, and being able to communicate and collaborate with them was a key reason for choosing the proprietary database over freeware solutions.

The data center’s database strategy has four parts: metadata, data, linkages and analysis, Belbin said.

The first, metadata, refers to the raw data submitted by Antarctic expeditioners and scientists, also known by the data center as “datasets”. Initially, metadata is recorded in Microsoft Excel or Access format and posted online as independent pieces of information. The collective metadata, however, is catalogued much as books are in a library.

By posting this raw data online, the team provides quicker access for users to information, Belbin said.

The data center team then performs regular reviews of these datasets to see whether any of them are “linkable”.

“If we have 10 or more datasets dealing with something similar, we will pull them into one database,” Belbin said.
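The rule of thumb Belbin describes can be sketched in a few lines of Python. This is only an illustration, not the data center's actual review process: the threshold of 10 comes from the quote above, while the record layout (a "name" and a "topic" per dataset) and the function name find_linkable_topics are assumptions made for the example.

```python
from collections import defaultdict

# "10 or more datasets" is taken from the article; everything else is hypothetical.
LINK_THRESHOLD = 10


def find_linkable_topics(datasets, threshold=LINK_THRESHOLD):
    """Group dataset records by topic and keep topics with enough datasets to merge.

    Each record is assumed to be a dict with "name" and "topic" keys.
    Returns a dict mapping each qualifying topic to its dataset names.
    """
    by_topic = defaultdict(list)
    for record in datasets:
        by_topic[record["topic"]].append(record["name"])
    return {
        topic: names
        for topic, names in by_topic.items()
        if len(names) >= threshold
    }


if __name__ == "__main__":
    # Made-up catalogue entries, purely for demonstration.
    catalogue = [{"name": f"penguin_count_{i}", "topic": "penguins"} for i in range(12)]
    catalogue += [{"name": f"lichen_survey_{i}", "topic": "lichens"} for i in range(3)]
    for topic, names in find_linkable_topics(catalogue).items():
        print(f"{topic}: {len(names)} datasets are candidates for one database")
```

In practice, deciding that datasets deal with "something similar" would presumably take a reviewer's judgment rather than an exact topic match; the sketch only shows the counting step.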

The team has also developed linkages to cross-reference and query data across all its databases, mostly using the ColdFusion Web application language.

Belbin said more recent changes to the center’s central database system have included moving its entire database front end onto a ColdFusion platform. The back end of the system remains on Oracle.

One of the key reasons for the transition across to ColdFusion was its quicker development time, Belbin said.

“It (ColdFusion) is faster than PL/SQL, which is what Oracle uses, and is faster and more robust than Java,” he said.

“Java still has problems – it’s a bit flaky. We have used Tomcat on the Java/Web component, but it falls over regularly, so (now) we mainly only use it for the graphics component.”

The newer version of ColdFusion MX also now includes Java support, allowing the team to write applications using the ColdFusion programming language but have them translated into Java, he said.

But while the database system appears well established, Belbin said his team continues to try to improve its usability and usefulness. This has prompted the data center’s latest major initiative: data mining.

Using a statistical software solution such as S+, Statistica or Interactive Data Language (IDL) alongside its database computing power, AAD’s data center has been able to research and release several recent papers dealing with data not previously linked together or compared.

To cite an example, Belbin said catch data recorded on whaling expeditions in the previous century has recently been used to gauge the amount of change occurring across ice sheets off the coast of Antarctica. By comparing where whales were caught 100 years ago with whale migration patterns today, as well as current geographical information on the area, the researcher could study the amount of climatic change in Antarctica.

“Those fishermen recording their catchment information would have had no idea that they could contribute to studying climatic change,” he said.

Belbin said the next step for the data center would be to publish some of its analysis on its Web site.

In addition, the team is currently working on both a new Geographic Information System (GIS) and a field trip database system. According to Mower, the field trip database will allow expeditioners to post information about the condition of the various base stations while on site. This will include data on repairs or clean-up requirements, and even an inventory of “toilet paper” stock.

“Then we can track the human footprint in Antarctica,” he said.