Easy Content Searching with EasyAsk.
We all speak in natural language, and Internet users have come to expect natural language capabilities when searching for information. The search engines that purport to search the entire Web (although studies show that most can't possibly reach the "entire" Web) have some difficulties placing search words in context. There is no frame of reference, no specified area of interest.
The human brain supplies context to spoken words. Mention "pants" in a discussion of clothing and the human brain automatically supplies synonyms such as trousers, jeans, or bell-bottoms, depending upon verbal context and, if this is a face-to-face conversation, what the person is actually wearing. However, suppose the word "pants" is uttered in the context of dog behavior. Suddenly "trousers" is no longer an appropriate substitution. Instead, an image of a dog breathing heavily with its tongue hanging out comes to mind. (The dog is not wearing trousers.)
SIMULATING NATURAL LANGUAGE
Information scientists have been experimenting with retrieval programs that simulate natural language queries for decades. Originally, the goal was to replace, or at least enhance, the command languages then in vogue to access flat ASCII files (and still in existence on many subscription-based search services). One of the first commercial varieties was designed for the legal literature. Introduced over a decade ago, West Group's Westlaw Is Natural, abbreviated to WIN, was quickly followed by other attempts to "naturalize" command-based search languages, such as Dialog's TARGET. Within corporations, relational databases presented similar search problems. Although more flexible than flat files, they still did not respond well when queried by employees lacking an extensive computer background. With the advent of the Internet as both a shopping and information medium, it became imperative to design search software that reflected how people naturally ask for information.
One person who has been intimately involved with natural language research is Larry R. Harris, founder and chairman of EasyAsk Inc., formerly Linguistic Technology Corporation, which began in 1994 and changed names in 1999. He founded Artificial Intelligence Corporation in 1975, taking it public as AICorp in 1991. The current company is funded by venture capital, with some 6,500 companies using its software, including government agencies, major corporations, and software developers. As a product, the EasyAsk search language made its debut inside corporations, providing employees with access to internal information. Only in the past year has the company expanded its purview to the public Internet, primarily in the catalog-shopping arena.
DATA DICTIONARIES DEFINE CONTENT
According to David Harris, vice president of marketing, the trick to making natural language work is providing a dictionary that is constantly monitored and expanded. Domain definitions tell the software whether an inquiry for pants refers to clothing or animals. Logging searches and examining those for which no answer was found leads to growth in the dictionary. Perhaps a new term is being used that the original dictionary didn't include. It needs to be added to the word or phrase list.
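The dictionary idea can be illustrated with a minimal Python sketch. It is not EasyAsk's implementation; the dictionary layout, the function names expand_term and terms_to_review, and the sample entries are all assumptions made for illustration. It shows a term being expanded differently depending on the domain, and unmatched terms being logged so that someone can decide whether to add them.

```python
from collections import Counter

# Hypothetical domain dictionary: domain -> term -> synonyms.
# The entries echo the article's "pants" example and are illustrative only.
DOMAIN_DICTIONARY = {
    "clothing": {"pants": ["trousers", "jeans", "bell-bottoms"]},
    "dog behavior": {"pants": ["breathes heavily"]},
}

# Log of (domain, term) pairs that matched no dictionary entry.
unanswered = Counter()


def expand_term(term, domain):
    """Return the term plus its synonyms in the given domain, logging misses."""
    key = term.lower()
    entries = DOMAIN_DICTIONARY.get(domain, {})
    if key not in entries:
        unanswered[(domain, key)] += 1
        return [key]
    return [key] + entries[key]


def terms_to_review(min_count=1):
    """List logged misses, most frequent first, as candidates for the dictionary."""
    return [item for item, count in unanswered.most_common() if count >= min_count]
```

Calling expand_term("pants", "clothing") yields the clothing synonyms, while the same word in the "dog behavior" domain yields a different expansion. A term such as "capris" that is missing from the dictionary comes back unexpanded and shows up in terms_to_review() for a human editor to consider.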
When an organization wants to put EasyAsk to work on its internal data, creating the requisite domain dictionary is the first critical task. Jump Start brings EasyAsk staff on-site for one to three days to work with the client in constructing the dictionary as well as training staff in basic functionality, information architecture, and deployment options. Alternatively, the dictionary creation task can be completely outsourced to EasyAsk. Maintaining the dictionary tends to be a joint process. Some types of basic corporate information inquiries like "What's the stock price?" or "What health benefits do we offer for pregnancy?" are likely to be similar across corporations. For others, the participation of subject experts might be required to adequately construct highly specific data dictionaries, with technical equivalencies and common usage synonyms. One EasyAsk customer, the Board of Governors of the Federal Reserve System, deploys the software to its bank examiners so that they can monitor stock transactions and other financial metrics. As an analytical tool, the dictionary must have precise definitions.
Although the end product may look simple, there is a significant amount of technological complexity behind the scenes. The natural language query generates an SQL query that looks at the categorizations assigned in the data dictionary and does advanced text string searching. The categorizations essentially allow EasyAsk to access a domain structure without having to actually know that structure. Security is a consideration. There are built-in restrictions as to who can access which data. Presentation also differs. When searching data as straightforward as ROI, for example, the marketing and finance departments would need the information shown differently to accommodate their working styles and philosophies. "We apply the organization's business rules within the technology," David Harris states.
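A rough sketch of how a plain-language question might be turned into SQL follows. It is an illustration under stated assumptions, not a description of EasyAsk's internals: the CATEGORIES mapping, the STOP_WORDS list, the products table, and the functions build_query and run_demo are all hypothetical. Words found in the categorization table become exact column matches, and any remaining words fall back to text string searching with LIKE.

```python
import sqlite3

# Hypothetical categorizations: query word -> (column, stored value).
CATEGORIES = {
    "pants": ("category", "pants"),
    "jeans": ("category", "pants"),
    "blue": ("color", "blue"),
}
# Hypothetical list of words that carry no search meaning.
STOP_WORDS = {"show", "me", "the", "a", "an", "in", "with", "for", "all"}


def build_query(question):
    """Translate a question into a parameterized SQL string and its parameters."""
    clauses, params = [], []
    for word in question.lower().replace("?", "").split():
        if word in STOP_WORDS:
            continue
        if word in CATEGORIES:
            # Column names come from the fixed table above, never from user input.
            column, value = CATEGORIES[word]
            clauses.append(f"{column} = ?")
            params.append(value)
        else:
            # Uncategorized words fall back to a substring search.
            clauses.append("description LIKE ?")
            params.append(f"%{word}%")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT name FROM products WHERE {where} ORDER BY name", params


def run_demo():
    """Run one sample question against a small in-memory catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (name TEXT, category TEXT, color TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            ("Classic Jeans", "pants", "blue", "straight leg denim"),
            ("Corduroy Trousers", "pants", "brown", "wide wale corduroy"),
            ("Denim Jacket", "jackets", "blue", "lined denim jacket"),
        ],
    )
    sql, params = build_query("Show me blue pants in denim")
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

In this sketch the person asking never needs to know that the catalog has category and color columns; the categorization table carries that knowledge. Access restrictions and department-specific presentation, which the article also mentions, are left out to keep the example short.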
Resellers of EasyAsk create specific applications unique to their customer base. For example, epixtech uses EasyAsk as part of its Remote Patron Authorization system to enable the library's system administrator to allow qualified users to search the library's materials, and to track detailed usage. The EIS data warehouse, which is extracted from epixtech's Dynix system, is a circulation management system that lets users see what books are circulating, on order, or checked out.
WORKING LIKE AN END-USER
As anyone who has worked with unsophisticated end-users will testify, telling them to use Boolean search operators is doomed to failure. Although the concept of Boolean is gradually making its way into common parlance, most people are more comfortable asking a question as they would pose it to another human being. The second characteristic of end-users: they have no patience. If the search doesn't return what they want, they are unlikely to question whether they asked for information correctly or whether the content was available. Instead, they'll probably blame the search engine and quickly move on to another option for finding the information they want.
Therefore, any organization that is going to implement search functionality, either on an intranet or a public Web site, should think carefully about natural language searching and precision of retrievability. It won't be long before even the plus and minus signs used by newer Internet search engines to denote Boolean AND and NOT commands will appear archaic. However, the natural language products such as EasyAsk, at least so far, work better in a controlled subject matter arena rather than the wide-open spaces of the Internet. The proven power of EasyAsk presents many opportunities for producers of specialized content.
MARYDEE OJALA is EContent's editor-at-large.
CEO: Larry R. Harris
No. employees: 35
Founded: 1994 as Linguistic Technology Corporation, changed name to EasyAsk Inc. in 1999.