Imagine going through a warehouse stacked with rows and rows of videotapes and having to index each one.
This is something that not too many organizations have the time or personnel to handle. But as broadband becomes less of an issue, more corporations will be turning to multimedia for a variety of applications – such as distance learning over the Web. And that means companies will need to store, index and easily retrieve video and audio information.
“Eventually, it’s going to be of increased importance as more content is stored in those kinds of things,” said Kathleen Hall, an associate analyst at Giga Information Group in Cambridge, Mass. “You can just imagine the kind of power there would be in being able to search the full text of a CNN video,” Hall said.
But computers are still a long way from accurately transcribing video and audio recordings. Speaker-independent speech recognition technology is not that accurate yet, said Dr. Dragutin Petkovic, the manager of visual media management at IBM’s Almaden Research Center in San Jose, Calif. Computer vision also has a long way to go before computers are able to recognize objects on screens, he said.
So Petkovic’s group decided instead to focus on colours when they created QBIC (Query by Image Content), which Russia’s State Hermitage Museum uses on its Web site to let users look for images similar to ones they like.
Although computers can’t recognize objects, they can recognize a picture’s colour makeup. IBM also uses colour to create storyboards for videos using CueVideo. Any time the computer detects a change in scene, it creates a key frame. End users can click on a frame and watch that portion of the video. This will make it easier for companies to use multimedia tools for distance learning.
“We are making video much more interactive and browsable,” Petkovic said. “In other words, we are creating something like a knowledge database or a training course by digesting and indexing videos and related materials automatically. And that would make video [manipulation] much more feasible…on the Web.”
CueVideo divides each video frame into a three-by-three grid of squares and takes a colour signature of each square by counting which colours appear in it. It then compares one frame’s signatures to the next.
“The problem with that is you will miss some gradual changes to the video,” Petkovic said. To counter this, IBM created algorithms that can detect fades and dissolves. Using IBM’s speech recognition technology, ViaVoice, CueVideo also creates an index of keywords.
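The block-and-signature comparison described above can be sketched in a few lines. This is a minimal illustration, not IBM’s actual implementation: the histogram bin count, the distance measure and the cut threshold are all assumptions made for the example.

```python
def block_signature(frame, bins=4):
    """Split a frame (a 2-D grid of (r, g, b) pixel tuples) into a
    three-by-three grid of squares and return a coarse colour
    histogram for each square."""
    h, w = len(frame), len(frame[0])
    signatures = []
    for by in range(3):
        for bx in range(3):
            hist = {}
            for y in range(by * h // 3, (by + 1) * h // 3):
                for x in range(bx * w // 3, (bx + 1) * w // 3):
                    r, g, b = frame[y][x]
                    # Quantise each channel into `bins` levels so the
                    # signature counts colour families, not exact values.
                    key = (r * bins // 256, g * bins // 256, b * bins // 256)
                    hist[key] = hist.get(key, 0) + 1
            signatures.append(hist)
    return signatures


def frame_distance(sig_a, sig_b):
    """Sum of absolute histogram differences across the nine squares."""
    total = 0
    for ha, hb in zip(sig_a, sig_b):
        for key in set(ha) | set(hb):
            total += abs(ha.get(key, 0) - hb.get(key, 0))
    return total


def detect_cuts(frames, threshold):
    """Return the indices where the colour signature jumps — the
    abrupt scene changes that would each trigger a key frame."""
    sigs = [block_signature(f) for f in frames]
    return [i for i in range(1, len(sigs))
            if frame_distance(sigs[i - 1], sigs[i]) > threshold]
```

A fixed threshold like this is exactly what misses gradual fades and dissolves, where no single frame-to-frame jump is large — hence the extra algorithms Petkovic mentions.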
“To get a full transcript is still very hard because of the slight errors that speech recognition makes. But you can extract a lot of keywords,” Petkovic said.
Viewers can listen to everything in a training video on a given topic by doing a keyword search. They can also click on a PowerPoint slide and view the part of the video in which the slide is used.
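A keyword index of that sort amounts to an inverted index from spoken words to timecodes. The sketch below assumes the recognizer emits (seconds, word) pairs; the function names are illustrative, not part of any IBM product.

```python
from collections import defaultdict


def build_keyword_index(transcript):
    """transcript: list of (seconds, word) pairs, e.g. output from a
    speech recognizer. Returns a map from keyword to the timecodes
    at which it was spoken."""
    index = defaultdict(list)
    for seconds, word in transcript:
        index[word.lower()].append(seconds)
    return index


def find_segments(index, keyword):
    """Timecodes a viewer could jump to for a given keyword search."""
    return index.get(keyword.lower(), [])
```

A viewer searching for a topic would be shown each matching timecode and could click through to that portion of the video.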
Dragon Systems Inc. in Newton, Mass., is trying to tackle a similar type of problem with AudioMining, which is designed for recording and browsing conversations in call centres.
“The problem that happens with this kind of data is that it becomes impossible to find your way around it. When you have a large database of phone calls that you’ve recorded, you get a huge database that becomes quickly useless,” said David Wald, a senior technology advisor at Dragon.
Phone calls can be recorded, indexed and then mined for information, Wald said. “If you have the indexes for all the calls, then you have a large database which you can start gathering statistics on. You can see which percentage mentioned a certain problem, what sorts of words come together a lot,” he said. “That’s where you get more into conventional data mining but on a kind of data that you couldn’t do data mining before.”