IBM creates multimedia data mining solutions

Imagine going through a warehouse stacked with rows and rows of videotapes and having to index each one.

This is something that not too many organizations have the time or personnel to handle. But as broadband becomes less of an issue, more corporations will be turning to multimedia for a variety of applications – such as distance learning over the Web. And that means companies will need to store, index and easily retrieve video and audio information.

“Eventually, it’s going to be of increased importance as more content is stored in those kinds of things,” said Kathleen Hall, an associate analyst at Giga Information Group in Cambridge, Mass. “You can just imagine the kind of power there would be in being able to search the full text of a CNN video,” Hall said.

But computers are still a long way from accurately transcribing video and audio recordings. Speaker-independent speech recognition technology is not that accurate yet, said Dr. Dragutin Petkovic, the manager of visual media management at IBM’s Almaden Research Centre in San Jose, Calif. Computer vision also has a long way to go before computers are able to recognize objects on screens, he said.

So Petkovic’s group decided instead to focus on colours when they created QBIC (Query by Image Content), which Russia’s State Hermitage Museum uses on its Web site to let users look for images similar to ones they like.

Although computers can’t recognize objects, they can recognize a picture’s colour makeup. IBM also uses colour to create storyboards for videos using Cue Video. Any time the computer detects a change in scene, it creates a key frame. End users can click on a frame and watch that portion of the video. This will make it easier for companies to use multimedia tools for distance learning.

“We are making video much more interactive and browsable,” Petkovic said. “In other words, we are creating something like a knowledge database or a training course by digesting and indexing videos and related materials automatically. And that would make video [manipulation] much more feasible…on the Web.”

Cue Video divides each video frame into a three-by-three grid of squares and takes a colour signature of each square by counting which colours appear in it. It then compares one frame's signatures with another's.
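The grid comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not IBM's code: the colour quantization (`LEVELS`), the distance measure, the `threshold` value and every function name here are assumptions, and a frame is modelled simply as a list of rows of (r, g, b) tuples.

```python
from collections import Counter

GRID = 3      # three-by-three grid, as described in the article
LEVELS = 4    # assumed: quantize each RGB channel into 4 coarse levels


def quantize(pixel):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse colour bin."""
    step = 256 // LEVELS
    r, g, b = pixel
    return (r // step, g // step, b // step)


def grid_signature(frame):
    """Return one colour histogram per grid square, in row-major order."""
    height, width = len(frame), len(frame[0])
    cells = [Counter() for _ in range(GRID * GRID)]
    for y, row in enumerate(frame):
        cell_row = y * GRID // height
        for x, pixel in enumerate(row):
            cell_col = x * GRID // width
            cells[cell_row * GRID + cell_col][quantize(pixel)] += 1
    return cells


def signature_distance(sig_a, sig_b):
    """Average histogram difference over all squares, between 0 and 1."""
    total = 0.0
    for hist_a, hist_b in zip(sig_a, sig_b):
        count_a = sum(hist_a.values()) or 1   # guard against empty squares
        count_b = sum(hist_b.values()) or 1
        bins = set(hist_a) | set(hist_b)
        diff = sum(abs(hist_a[c] / count_a - hist_b[c] / count_b) for c in bins)
        total += diff / 2
    return total / len(sig_a)


def find_key_frames(frames, threshold=0.5):
    """Return indices of frames that start a new scene (frame 0 included)."""
    key_frames = [0]
    previous = grid_signature(frames[0])
    for index in range(1, len(frames)):
        current = grid_signature(frames[index])
        if signature_distance(previous, current) > threshold:
            key_frames.append(index)
        previous = current
    return key_frames
```

Comparing per-square histograms rather than one histogram for the whole frame keeps some spatial information, so two frames with the same overall colours in different places still register as different.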

“The problem with that is you will miss some gradual changes to the video,” Petkovic said. To counter this, IBM created algorithms that can detect fades and dissolves. Using IBM’s speech recognition technology, ViaVoice, Cue Video also creates an index of keywords.
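The article does not say how IBM's fade and dissolve detection works. One well-known general approach, sometimes called twin comparison, is sketched below as an assumption rather than as IBM's method: small frame-to-frame differences are accumulated, and a gradual transition is reported once their sum is large. The function name and both thresholds are hypothetical.

```python
def find_gradual_transitions(distances, low=0.05, high=0.5):
    """Return (start, end) frame-index pairs for suspected gradual transitions.

    distances[i] is the difference between frame i and frame i + 1.
    A run of differences that are each at least `low` but below `high`
    is reported if the run adds up to `high` or more.
    """
    transitions = []
    start = None
    accumulated = 0.0
    for i, d in enumerate(distances):
        if low <= d < high:
            if start is None:          # a new run of small changes begins
                start = i
                accumulated = 0.0
            accumulated += d
        else:
            if start is not None and accumulated >= high:
                transitions.append((start, i))
            start = None
    if start is not None and accumulated >= high:
        transitions.append((start, len(distances)))
    return transitions
```

A single difference at or above `high` is treated as an ordinary abrupt cut and is left to a plain threshold test.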

“To get a full transcript is still very hard because of the slight errors that speech recognition makes. But you can extract a lot of keywords,” Petkovic said.

Viewers can listen to everything in a training video on a given topic by doing a keyword search. They can also click on a PowerPoint slide and view the part of the video in which the slide is used.

Dragon Systems Inc. in Newton, Mass., is trying to tackle a similar type of problem with Audiomining, which is designed for recording and browsing conversations in call centres.

“The problem that happens with this kind of data is that it becomes impossible to find your way around it. When you have a large database of phone calls that you’ve recorded, you get a huge database that becomes quickly useless,” said David Wald, a senior technology advisor at Dragon.

Phone calls can be recorded, indexed and then mined for information, Wald said. “If you have the indexes for all the calls, then you have a large database which you can start gathering statistics on. You can see which percentage mentioned a certain problem, what sorts of words come together a lot,” he said. “That’s where you get more into conventional data mining but on a kind of data that you couldn’t do data mining before.”
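The two statistics Wald mentions, the share of calls that mention a problem and the words that tend to appear together, are straightforward once each call has been reduced to a set of keywords. The sketch below assumes that reduction has already happened. The data layout and function names are hypothetical and say nothing about how Dragon's product works internally.

```python
from collections import Counter
from itertools import combinations


def percent_mentioning(calls, keyword):
    """Percentage of calls whose keyword set contains `keyword`."""
    if not calls:
        return 0.0
    hits = sum(1 for words in calls.values() if keyword in words)
    return 100.0 * hits / len(calls)


def co_occurring_pairs(calls, top=3):
    """Most frequent keyword pairs that appear in the same call."""
    pairs = Counter()
    for words in calls.values():
        # Sort so each pair is always counted under the same ordering.
        pairs.update(combinations(sorted(words), 2))
    return pairs.most_common(top)


# Example input: call identifier -> keywords found in that call.
calls = {
    "call-1": {"billing", "refund"},
    "call-2": {"billing", "refund", "delay"},
    "call-3": {"delay"},
    "call-4": {"billing"},
}
print(percent_mentioning(calls, "billing"))
print(co_occurring_pairs(calls, top=1))
```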

Disparate information

But as the number of databases grows and companies start gathering different types of data, information can become very fragmented. IBM’s Garlic project is designed to let users make new connections by bringing together data from disparate sources.

Garlic is middleware that can talk to several different data sources. Users connect to Garlic using a SQL interface. Under the covers, Garlic connects to any data source using wrapper technology.

“The focus really is on making it much easier for people to make connections than it is possible today. They’re doing that by letting them say everything they are looking for in a single query, and then having the software figure out where the data is located, how to best get to that data, and all those sort of detailed, logistical questions,” said Laura Haas, the heterogeneous information systems group manager at IBM’s Almaden Research Centre in San Jose, Calif.
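The wrapper idea can be illustrated with a toy mediator. The sketch below is not Garlic's interface: users of Garlic write SQL, whereas this example uses a plain method call to keep it short, and every class and method name is hypothetical. It shows only the general pattern, in which each source sits behind a wrapper with a common `scan` method and the mediator decides which sources to ask.

```python
class ListWrapper:
    """Hypothetical wrapper around an in-memory list of dict rows."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, **equals):
        """Return rows whose fields equal all the given values."""
        return [row for row in self.rows
                if all(row.get(k) == v for k, v in equals.items())]


class CsvTextWrapper(ListWrapper):
    """Hypothetical wrapper for comma-separated text with a header line."""

    def __init__(self, text):
        lines = text.strip().splitlines()
        header = lines[0].split(",")
        super().__init__([dict(zip(header, line.split(",")))
                          for line in lines[1:]])


class Mediator:
    """Routes one request to several wrapped sources and merges the rows."""

    def __init__(self):
        self.sources = {}

    def register(self, name, wrapper):
        self.sources[name] = wrapper

    def join(self, left, right, on, **equals):
        """Filter `left`, then attach matching `right` rows sharing `on`."""
        results = []
        for left_row in self.sources[left].scan(**equals):
            for right_row in self.sources[right].scan(**{on: left_row[on]}):
                results.append({**left_row, **right_row})
        return results


mediator = Mediator()
mediator.register("customers", ListWrapper([
    {"customer_id": "7", "name": "Ada"},
    {"customer_id": "8", "name": "Bo"},
]))
mediator.register("orders", CsvTextWrapper(
    "customer_id,item\n7,modem\n7,router\n8,cable"))
print(mediator.join("customers", "orders", on="customer_id", name="Ada"))
```

The caller names what it wants once. Which source holds which rows, and how each one is read, stays inside the wrappers.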

Scientists are using Garlic to bring together information from test results, data about chemical compounds and information on different illnesses. But Garlic will have business applications as well, Haas said.

Garlic can be used in call centres, for example, to give salespeople information on a caller. That data is often stored in different systems throughout the company, Haas said. News organizations can use it to pull information on any topic from a number of sources.

“It’s the puppeteer in some sense because Garlic pulls the strings and makes the marionette dance. And the different pieces of it – the arms and the legs, so on – are the different data sources, and Garlic is the control point. It’s the thing that says which of them need to move now to get the information that it needs back to the end user,” she said.