Semiotics for enterprise search.

Readers of KMWorld are comfortable with the semantic processes that knowledge management systems employ to make sense of business information. There are knowledgebases, taxonomies, controlled vocabularies and access control tags. Knowledge management is more than key word search and retrieval, although finding information is part of the discipline.

Many of the flagship companies offering knowledge management systems have wrapped a basic string matching system with various enhancements. Some systems process e-mail and use data from those analyses to pinpoint an individual who is a "hub" for information exchange in an organization. Other vendors' systems generate relationship maps, sometimes described as social graphs. With those maps, a manager can learn who is an expert in a particular topic based on the content flowing through the system. There are many variations, and organizations have experienced major successes with systems from different vendors. At the same time, other licensees of the same vendors' systems report less helpful results.

I had an opportunity to learn about a new approach to accessing information within an organization. Unlike most of the systems we have tested, Sophia Search, headquartered in Belfast, Northern Ireland, has enlisted the art and science of semiotics to make information access and management more useful. Collaborating with colleagues at the State University of St. Petersburg, Russia, Dr. David Patterson of the University of Ulster in Northern Ireland developed a patented approach to information retrieval. He and his co-founder, Dr. Vladimir Dobrynin, established Sophia Search in 2007. The company is now competing in the enterprise software sector with the likes of Autonomy, Endeca, Exalead, Google and Microsoft, among others.

Semiotics focuses on signs and symbols as indicators of meaning. As implemented, the approach enables "Sophia to understand and interpret the meaning and context of information within documents," Patterson said when I interviewed him in February. Sophia has tuned its approach so that the "meaning and relevancy of a document depends on both the user's query and, importantly, all the other documents within the organization," he added.

Contextual discovery

Patterson explained, "Sophia is based on a model of linguistics, called semiotics, which is the science behind how we as humans understand the meaning of information in context. This is the power behind the technology that drives our discovery engine and ability to improve the findability of information."

On the surface, Sophia works like one of the big enterprise search solutions from Autonomy or Exalead. Sophia describes its approach as a "contextual discovery engine." The system processes the words in the source document and then automatically disambiguates the different meanings of words based on their context in the source document.
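The idea of separating a word's meanings by context can be illustrated with a toy example. The sketch below is not Sophia's patented method, which the article does not detail; it is a generic, simplified illustration that groups documents containing an ambiguous word by how much their surrounding vocabulary overlaps, without any dictionary or thesaurus. The function names, the stopword list, the sample sentences and the 0.2 threshold are all assumptions made for illustration.

```python
import re

# Hypothetical, minimal stopword list chosen for this illustration only.
STOPWORDS = {"the", "a", "an", "of", "on", "to", "and", "in", "at",
             "is", "its", "for", "by", "was"}

def context_words(text, target):
    """Return the non-stopword terms that co-occur with `target` in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t != target and t not in STOPWORDS}

def jaccard(a, b):
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_by_context(docs, target, threshold=0.2):
    """Greedily group documents whose contexts for `target` overlap."""
    groups = []  # each group holds a merged context set and document indices
    for i, doc in enumerate(docs):
        ctx = context_words(doc, target)
        best = max(groups, key=lambda g: jaccard(ctx, g["context"]), default=None)
        if best is not None and jaccard(ctx, best["context"]) >= threshold:
            best["docs"].append(i)
            best["context"] |= ctx  # the group's context grows as documents join
        else:
            groups.append({"context": set(ctx), "docs": [i]})
    return [g["docs"] for g in groups]

docs = [
    "The bank approved the loan and set the interest rate.",
    "We walked along the river bank and watched the water.",
    "The bank raised its interest rate on every loan.",
    "Flood water covered the river bank overnight.",
]
print(group_by_context(docs, "bank"))  # [[0, 2], [1, 3]]
```

Here the financial and riverside uses of "bank" fall into separate groups purely because of the words around them. A production system would need far more robust statistics than this greedy overlap test.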

Autonomy's "meaning-based computing" and Recommind's approach appear to be similar to the Sophia system and method. However, Patterson pointed out that semiotics is the key to the firm's technology. Sophia searches by the meaning of what users are looking for as opposed to just the key words they use in their query. Sophia is designed to enable users to discover contextually relevant information of which they were previously unaware, and it increases users' understanding of their content.

Most information processing systems require the system administrator or a subject matter expert to tune the system. Patterson said, "One of the benefits of our technical approach is that Sophia operates without human guidance or training, and it does not require taxonomies, ontologies or thesauri."

According to Patterson, Sophia's approach is novel, possibly unique, in the world of information retrieval and content processing. "What motivated me was solving the problems that faced the world of search from a research perspective. I was aware of the limitations of some commercial systems through my own research and experience," he said. "The more I thought about the problems of locating the information I needed, I began to question the basic assumptions that conventional search vendors make."

The knowledge management application of Sophia is that the system can work with other enterprise software systems. If an organization has licensed a major vendor's search system, Sophia can enhance that system. It also works with the Google Search Appliance. Patterson said, "Sophia was developed with an open architecture to enable ease of integration through our Java APIs and RESTful Web services. In this way, we have made it easy to augment other search tools with Sophia's contextual capabilities and to build additional applications based on third-party products."

Crafting query, not so easy

The core of Sophia's approach is that the developers worked to avoid the pitfalls that have plagued other information retrieval systems. Those included the assumption that the user knows what he or she is looking for before running a query. The Sophia team recognized that many users cannot express a specific information need. As a result, Sophia's developers wanted to provide a system that did not force the user to craft a query. Patterson said, "Experience and research revealed that users often find creating a quality query a challenge."


The Sophia system features a search box, but the system also displays links to other relevant content. It depends on its patented semiotics approach. According to Patterson, "[Because] the system method organizes and presents information contextually, users spend less time sifting through irrelevant information and can focus on information that they know is of value."

The benefits of the approach pivot on reducing the time a user needs to locate desired information. Patterson gave me this example: "Imagine that you are interested in queries like 'Where is Joe's office?' or 'What is his phone number?' In those instances, deploying Sophia is overkill. We don't add any further value over other tools such as Google Search Appliance or some of the other basic key word systems in this instance. But if you are interested in discovering what topics exist in your data, or unearthing new information related to your query that you didn't know existed, or deciphering how documents are semantically linked to one another within a particular context, or understanding the meaning of your information at a glance, then Sophia is a tool that is worth spending time evaluating."

The Sophia system can create customized search reports that can be exported as PDF or XML to enable ease of integration with other analytical tools within an organization. Patterson said, "Sophia enables and encourages results sharing among employees to reduce the amount of time people spend re-executing queries already carried out by others, and it can automatically watch the corpus for new information indexed after a result set has been returned to the user."

The challenge facing Sophia Search is difficult. The knowledge management and content processing sector is characterized by fierce competition. The emergence of open source options for search such as Lucene/Solr, for business intelligence such as Pentaho, and for data management such as Cassandra puts pressure on commercial, proprietary enterprise software.

The opportunity Sophia has is to demonstrate a better way to tap into the unstructured content that an organization possesses. If Sophia can significantly reduce the time required for a user to locate the item of information needed to close a deal or resolve another business issue, Sophia could gain traction in both the enterprise knowledge management and the search-and-retrieval markets.

The present business climate remains unforgiving. A number of enterprise content processing firms have undergone some organizational shifts. Executives have rotated at Lucid Imagination, MarkLogic and Sinequa since the first of the year. Other firms have been repositioning themselves into a wide range of vertical markets, including financial services, customer support, healthcare and competitive intelligence. The payoff from those changes is not yet known.


Sophia's reliance on semiotics may be the breakthrough required in content processing and information retrieval. Vendors with more traditional methods face longer sales cycles and the shadow of the free, open source software option.

Patterson sees today's troubled marketplace as benefiting the buyer. Sophia's sales approach focuses on the customer's need, according to Patterson. "At Sophia, we discuss with the customer right at the start where the strengths of Sophia lie, because we don't want to waste their time if we don't believe it is the best tool for their needs," he said.

Sophia wants to make partnering a key component of the customer's business strategy. "We believe our technology is not just a standalone product," Patterson said, "but also it is complementary to many existing solutions and can be used to enhance their capabilities."

Reliance on key word indexing causes some knowledge management systems to crash at takeoff. To get over the hurdles to enterprise knowledge management and information retrieval license deals, Sophia will rely on the lift from the power of semiotics to take flight.

Stephen E. Arnold is a consultant who also maintains a Web log.
COPYRIGHT 2011 Information Today, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2011 Gale, Cengage Learning. All rights reserved.

Author: Arnold, Stephen E.
Date: Jun 1, 2011