Search tools look for context

Attempting to improve the accuracy of corporate information retrieval technology, several vendors are incorporating techniques designed to identify people, proper names, and places buried within text and analyze relationships between those assets.

To that end, unstructured data management vendor Recommind Inc. on Friday rolled out MindServer 2.1, which adds entity extraction capabilities to its concept-based search and classification system.

Referred to as both entity- and fact-based extraction, the technology complements search engines, content management systems, and portals by helping improve relevancy of results, according to analyst Laura Ramos, director of research at Giga Information Group, in Cambridge, Mass.

“The key thing here is this ability to pull specific words and phrases out of documents and automate figuring out what they mean,” Ramos said.

“With keyword search you are looking for individual words, but individual words can mean different things in a context. For example, [a search for] ‘President Bush lives at the White House’ means something different than ‘white house paint.’”

The ability to pull out names, places, and dates via entity and fact extraction is becoming critical because it helps disambiguate information retrieval and make it more relevant, Ramos added.
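Ramos's example can be illustrated with a minimal Python sketch. It is not how MindServer or any vendor product works; the gazetteer, the function names, and the case-sensitive phrase lookup are all assumptions made for illustration. A naive keyword match treats both phrases the same, while even a crude entity lookup tells them apart.

```python
import re

# Hypothetical gazetteer mapping known phrases to entity types.
# Real extraction systems use far richer linguistic and statistical models.
GAZETTEER = {
    "President Bush": "PERSON",
    "White House": "PLACE",
}

def keyword_match(query_terms, text):
    """Naive keyword search: True if every term occurs in the text, ignoring case."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(term.lower() in words for term in query_terms)

def extract_entities(text):
    """Return (phrase, type) pairs found by case-sensitive whole-phrase lookup."""
    found = []
    for phrase, label in GAZETTEER.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((phrase, label))
    return found

docs = [
    "President Bush lives at the White House",
    "white house paint",
]

for doc in docs:
    print(doc)
    print("  keyword hit:", keyword_match(["white", "house"], doc))
    print("  entities:   ", extract_entities(doc))
```

Both documents satisfy the keyword query for "white" and "house," but only the first yields a PLACE entity, which is the kind of signal a search system could use to rank or filter results.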

MindServer 2.1 delivers accuracy and analysis capabilities by combining the ability to identify people, product names, and places with retrieval and categorization, according to Bob Tennant, CEO of Recommind, in Berkeley, Calif.

“What it allows is analysis of the textual data; cutting [the data] not just by subject matter but by other elements that can be identified, [such as] who individuals are, what the products are,” Tennant said. “This lets organizations take cuts of this data and make it more usable.”

Another unstructured data management player, Inxight Software, is ramping up fact extraction capabilities in its SmartDiscovery information retrieval product, which combines natural-language processing, linguistics, and classification technologies.

Inxight’s technology has gained strong traction within government agencies, where it is often used to further counter-terrorism and intelligence efforts. The Sunnyvale, Calif.-based company this month signed 10 contracts worth US$3 million with the U.S. Department of Defense.

Later this year, Inxight plans to roll out an updated offering that adds to its natural-language processing platform technology from WhizBang Labs, whose technical assets Inxight acquired last August. WhizBang developed technology designed to extract facts and relationships from unstructured data.

The new technology “goes beyond identifying things to establish the relationships between things,” according to David Spenhoff, vice-president of marketing at Inxight. “Extracting facts and relationships provides a higher level of understanding beyond what search engines can provide.”