Collections of data

The explosion of e-business and the massive amount of data it created has made data management and organization more important than ever. We often hear the terms database, data warehouse and data mart, but the differences among them aren’t always clear. Some experts say that the difference between, say, a data mart and a data warehouse is more conceptual than real. Nonetheless, here are some general rules of thumb to sort out these terms.

In the Beginning . . .

A datum is a raw piece of information that’s capable of being moved and stored. In the broadest sense, a database is a collection or aggregation of such data, along with information on how pieces of data relate to one another.

A database is typically organized into records one record per item, such as an order that are themselves divided into several fields, with each field containing information about a specific aspect or attribute of the item. For an order, these could include customer data, part numbers, prices and discounts.

In theory, a database doesn’t even require a computer, but it certainly makes its use a lot more scalable and efficient, says Mike Schiff, an analyst at Current Analysis Inc. in Sterling, Va. A pocket address book is certainly a database, but searching contact entries by city or industry requires flipping through each page.

Database management systems, such as those from Microsoft Corp., Oracle Corp. or IBM Corp., act as the underlying vault and retrieval technology.

In addition to storing data, a database management system handles security and access control, says Schiff. Business intelligence tools then access this data for analysis. However, databases rarely exist just to run analytical operations; in general, they’re vital to running a business.

Database management systems can be organized in different ways. A relational database stores information in tables and then joins or combines those tables across common fields. A hierarchical database stores data in a tree structure; an order record might have every line item underneath it. An object-oriented database encapsulates both data and business logic.

Wholesale, Retail, Slice and Dice

Data warehouses and data marts are very similar technologies, say experts, but they usually service different types of clients. For instance, a warehouse typically contains a massive amount of data from across an enterprise, says John Kopcke, chief technology officer at Hyperion Solutions Corp., a maker of analytical software in Sunnyvale, Calif.

Data marts tend to be smaller and dedicated to a single division or line of business. Data warehouses are “similar to a real food warehouse, storing massive amounts of food and then distributing subsets of food to grocery stores [the marts] for people to access [or] purchase,” says Kopcke.

A data mart can run in size from megabytes to gigabytes, says Tho Nguyen, director of data warehousing strategy at SAS Institute Inc. in Cary, N.C., whereas data warehouses usually run from gigabytes to terabytes.

Consider a data mart that supports a firm’s cellophane-tape division. It might contain relevant facts about making cellophane tape suppliers, deliveries, rates, quality control information says Schiff.

However, the uncontrolled proliferation of such data marts can become an IT nightmare unless each data mart uses standard naming and cataloguing schemes and compatible data types. The last thing you want are data marts that can’t talk to one another.

Users tend to assemble a warehouse from different pieces of technology, then customize it to meet their needs, rather than just put it together out of the box. Schiff notes that warehouses are often built using relational databases, because the relational model can more efficiently store and organize the huge amounts of information that make up a high-volume, multipurpose data warehouse. However, getting data from many large relational tables can require massive amounts of processing and storage.

For that kind of slice-and-dice analysis, data marts use multidimensional databases geared for quick responses with multiple elements. Often-selected data from a data mart is fed into a smaller database called a data cube for intensive processing.

Related Terms

Data mining is a process that finds relationships and patterns in data.

Data visualization is the graphical representation of a data collection, often in an interactive form. Modern data visualization tools present data as charts, graphs or maps and let users sort, divide and combine groups of data to help discover patterns and illustrate what they’ve discovered.

Metadata is information that describes the contents of a database. Metadata tells users when a piece of data was last updated, its format and intended uses. For example, that information can help users understand the meaning and context of financial data, customer records and business transactions.

Online analytical processing (OLAP) describes a class of tools that can extract multidimensional data and present it from many different points of view. Designed for managers looking to make sense of their information, OLAP structures data hierarchically. Common OLAP functions include trend analysis, drilling down to more complex levels of detail, summarization of data and data rotation for comparative viewing in other words, slicing and dicing.

Repository can be another name for a database, but it can also refer to the aggregation of data into some accessible storage location, without necessarily having the ability to manipulate or extract data.