Search engine algorithms.


As a searcher or search engine optimization specialist, do you really need to understand the algorithms and technologies that power search engines? Absolutely, said a panel of experts at a recent Search Engine Strategies conference, including Rahul Lahiri, Mike Grehan, and Dr Edel Garcia, whose opinions are contained below.

What's the fuss all about?

'Do we really need to know all this scientific stuff about search engines?' asked Mike Grehan. 'Yes!' he answered unequivocally, and proceeded to explain the practical competitive edge you gain when you understand how search algorithms function.

'If you know what ranks one document higher than another, you can strategically optimize and better serve your clients. Also, if your client asks, "Why is my competitor always in the top 20 and I'm not? How do search engines work?" and you say, "I don't know, they just do," how long do you think you're going to keep this account?'

Grehan illustrated his point by quoting Brian Pinkerton, who developed the first full text retrieval search engine back in 1994. 'Picture this,' he explained. 'A customer walks into a huge travel outfitters store, with every type of item, for vacations anywhere in the world, looks at the guy who works there, and blurts out, "Travel." Now where's that sales clerk supposed to begin?'

Search engine users want to achieve their goals with minimum cognitive load. They don't think carefully when they are entering queries; they use inaccurate three-word queries and haven't learned proper query formulation. This makes the search engine's job more difficult.

Heuristics, abundance problems, & the evolution of algorithms

Grehan went on to explain the important role that heuristics play in ranking documents. "A fascinating combination of things come together to produce a rank. We need to understand as much as we possibly can, so at least when we're talking about what ranks one document higher than another, we have some indication about what is actually happening."

Grehan described the progression of search algorithms over time. In early search engines, text was extremely important. But then search researcher Jon Kleinberg discovered what he termed 'the abundance problem.' The abundance problem occurs when a search for a query returns millions of pages all containing the appropriate text. For example, a search on the term 'digital cameras' will return millions of pages. How do you know which are the most important or authoritative pages? How does a search engine decide which one is going to be the listing that comes to the top? Search engine algorithms had to evolve in complexity to handle the problem of over-abundance.

Insights from Ask Jeeves

Ask Jeeves is the seventh ranked property on the web and the number 4 search engine, according to Rahul Lahiri from Ask Jeeves. Lahiri described a number of components that are key to Ask Jeeves search algorithms, including index size, freshness of content and data structure. Ask Jeeves' focus on the structure of data is unique and differentiates its approach from other engines, he said. There are two key drivers in web search: content analysis and linkage analysis. Lahiri confirmed that Ask Jeeves looks at the web as a graph and looks at the link relationships between pages, attempting to map clusters of related information.

By breaking down the web into different communities of information, Ask Jeeves can rely on the 'knowledge' from authorities in each community to better understand a query and present more on-topic results to the searcher. If you have a smaller site, but one that is very relevant within your community, your site may rank higher than some larger sites that provide relevant information but are not part of the community.
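To make the idea of link-based authority within a community concrete, here is a minimal sketch of a hubs-and-authorities iteration in the style of Kleinberg's hypertext induced topic search. It is an illustration only, not Ask Jeeves' actual algorithm; the function name `hits`, the page names and the tiny link graph are all hypothetical.

```python
from math import sqrt


def hits(links, iterations=50):
    """Score pages as hubs and authorities.

    links maps each page name to the list of pages it links to.
    Returns two dicts: (hub_scores, authority_scores).
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {page: sum(auth[dst] for dst in links.get(page, [])) for page in pages}

        # Normalize both score sets so the values stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(value * value for value in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm

    return hub, auth


# Hypothetical community: a small specialist site that the community links to,
# and a larger store that only one community page mentions.
community = {
    "blog_a": ["small_expert", "big_store"],
    "blog_b": ["small_expert"],
    "forum": ["small_expert", "blog_a"],
    "big_store": [],
}
hub_scores, auth_scores = hits(community)
top_authority = max(auth_scores, key=auth_scores.get)
```

In this toy graph the small specialist site ends up as the top authority because more pages in the community point to it, which mirrors the point about smaller but highly relevant sites.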

Why co-occurrence is important: Dr Garcia

Dr. Garcia is a scientist with a special interest in Artificial Intelligence and Information Retrieval. He explained that terms that co-occur more frequently tend to be related or 'connected.' Furthermore, semantic associations affect the way we think of a term. When we see the term 'aloha' we think of 'Hawaii' because of the semantic associations between the terms. Co-occurrence theory, according to Garcia, can be used to understand semantic associations between terms, brands, products, services, etc.
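A rough way to see co-occurrence at work is to count how often two terms appear in the same document and compare that with how often either appears at all. The sketch below uses the Jaccard coefficient as the association measure; that choice, the function names and the sample documents are assumptions for illustration and are not taken from Dr Garcia's talk.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for each term and each pair of terms, how many documents contain them."""
    term_docs = Counter()
    pair_docs = Counter()
    for text in docs:
        terms = set(text.lower().split())
        term_docs.update(terms)
        pair_docs.update(frozenset(pair) for pair in combinations(sorted(terms), 2))
    return term_docs, pair_docs


def jaccard(a, b, term_docs, pair_docs):
    """Documents containing both terms divided by documents containing either term."""
    both = pair_docs[frozenset((a, b))]
    either = term_docs[a] + term_docs[b] - both
    return both / either if either else 0.0


# Hypothetical mini-corpus.
docs = [
    "aloha from hawaii",
    "hawaii beaches aloha spirit",
    "aloha hawaii surf",
    "alaska winter cruise",
    "hawaii volcano tour",
]
term_docs, pair_docs = cooccurrence(docs)
aloha_hawaii = jaccard("aloha", "hawaii", term_docs, pair_docs)  # 3 shared / 4 total = 0.75
aloha_alaska = jaccard("aloha", "alaska", term_docs, pair_docs)  # never co-occur = 0.0
```

The same counting could be applied to a keyword and a brand name to get a first impression of how strongly the two are associated in a set of pages.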

Dr. Garcia then posed a question: why should we care about term associations in a search engine? His answer: think about keyword-brand associations. This has powerful implications for search marketing.

Where is the evolution of the search algorithm going? Grehan had a ready answer. He expects the introduction of probabilistic latent semantic indexing and probabilistic hypertext induced topic search.

Web Services Transactions and Heuristics

Mark Little

The current forerunners for the title of Web Services transactions standard are the WS-AtomicTransaction and WS-Transaction Management specifications. Both of these provide protocols intended for interoperability of existing transaction processing systems.

In an earlier article, we saw how the concepts of ACID transactions have played a cornerstone role in creating today's enterprise application environments. They provide a guaranteed consistent outcome in complex multiparty business operations using a two-phase commit protocol. The current forerunners for the title of Web Services transactions standard are the WS-AtomicTransaction and WS-Transaction Management specifications. Both of these provide protocols intended for interoperability of existing transaction processing systems. However, as we'll see in this paper, only one of those specifications addresses an important aspect of current transaction processing systems: heuristic errors and how they can be dealt with in an interoperable manner.

The problem with failures

Imagine you walk into a bank and want to perform a transaction (banks are very useful things in transaction examples). That transaction involves you transferring money from one account (current) to another (savings). You obviously want this to happen with some kind of guarantee, so for the sake of this example let's assume we use an ACID transaction.

To ensure atomicity between multiple participants, a two-phase commit mechanism is required: during the first (preparation) phase, an individual participant must make durable any state changes that occurred during the scope of the atomic transaction, such that these changes can either be rolled back (undone) or committed later once consensus to the transaction outcome has been determined amongst all participants, i.e., any original state must not be lost at this point as the atomic transaction could still roll back.

Assuming no failures occurred during the first phase (if any did, all participants would be forced to undo their changes), in the second (commitment) phase participants may 'overwrite' the original state with the state made durable during the first phase.
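The two phases described above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: the `Participant` class, its `can_prepare` flag and the `two_phase_commit` function are hypothetical names, nothing is actually written to disk, and there is no failure recovery.

```python
class Participant:
    """Hypothetical in-memory participant, for illustration only."""

    def __init__(self, name, balance, can_prepare=True):
        self.name = name
        self.balance = balance          # original (committed) state
        self.pending = None             # state recorded during the prepare phase
        self.can_prepare = can_prepare  # set to False to simulate a phase-one failure

    def prepare(self, delta):
        # Phase one: record the new state without losing the original state.
        if not self.can_prepare or self.balance + delta < 0:
            return False
        self.pending = self.balance + delta
        return True

    def commit(self):
        # Phase two: overwrite the original state with the prepared state.
        self.balance = self.pending
        self.pending = None

    def rollback(self):
        # Discard the prepared state; the original state is untouched.
        self.pending = None


def two_phase_commit(work):
    """Run both phases. work is a list of (participant, delta) pairs."""
    prepared = []
    for participant, delta in work:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # Any failure in phase one forces every prepared participant to undo.
            for earlier in prepared:
                earlier.rollback()
            return "rolled back"
    for participant, _ in work:
        participant.commit()
    return "committed"


current = Participant("current", 100)
savings = Participant("savings", 50)
outcome = two_phase_commit([(current, -30), (savings, 30)])
```

Note that this sketch assumes phase two always succeeds once everyone has prepared, which is exactly the assumption the rest of the article goes on to question.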

In order to guarantee atomicity, the two-phase protocol is necessarily blocking. If the coordinator fails, for example, any prepared participants must remain in their prepared state until they hear the transaction outcome from the coordinator. All commercial transaction systems incorporate a failure recovery component that ensures transactions that were active when a failure occurred are eventually completed. However, in order for recovery to occur, machines and processes obviously need to recover! In addition, even if recovery does happen, the time it takes can be arbitrarily long. So, in our bank example, despite the fact that we're using transactions and assuming that the transaction system is reliable, certain failures will always occur, given enough time and probabilities.

The kinds of failure we're interested in for this example are those that occur after the participants in the two-phase commit transaction have said they will do the work requested of them (transfer the money), i.e., during the second (commit) phase. So, the money has been moved out of the current account (it's really gone) and is being added to the savings account, when the disk hosting the savings account dies, as shown in the diagram. Usually what this means is that we have a non-atomic outcome, or a heuristic outcome [1]: the transaction coordinator has said commit, one participant (current account) has said 'done', but the second one (savings account) has said 'oops!'. There's no going back with the work the current account participant has done, so this transaction isn't going to be atomic (all or nothing).
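The failure scenario, where the disk hosting the savings account dies during the commit phase, can be reproduced with a small sketch. The `Account` class, the `disk_ok` flag, the `CommitFailed` exception and the `commit_phase` function are hypothetical and exist only to show how a non-atomic outcome arises after every participant has already agreed to commit.

```python
class CommitFailed(Exception):
    """Raised by a hypothetical participant that cannot complete phase two."""


class Account:
    def __init__(self, name, balance, disk_ok=True):
        self.name = name
        self.balance = balance
        self.pending = None
        self.disk_ok = disk_ok  # False simulates the disk dying before commit

    def prepare(self, delta):
        # Phase one: promise to do the work.
        self.pending = self.balance + delta
        return True

    def commit(self):
        # Phase two: apply the prepared state, unless the disk has failed.
        if not self.disk_ok:
            raise CommitFailed(self.name)
        self.balance = self.pending
        self.pending = None


def commit_phase(participants):
    """Phase two only: every participant has already voted to commit."""
    done, failed = [], []
    for participant in participants:
        try:
            participant.commit()
            done.append(participant.name)
        except CommitFailed:
            failed.append(participant.name)
    # Any failure after the decision to commit breaks atomicity.
    outcome = "committed" if not failed else "heuristic"
    return outcome, done, failed


current = Account("current", 100)
savings = Account("savings", 50, disk_ok=False)
current.prepare(-30)
savings.prepare(30)
outcome, done, failed = commit_phase([current, savings])
```

After this run the money has left the current account but has not arrived in the savings account, and the only component that knows this is the coordinator.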

Why is this a Problem?

Most enterprise transaction specifications, such as the OMG's Object Transaction Service and the X/Open XA specification [2], and implementations (like CICS, Tuxedo and the Arjuna Transaction Service) allow for heuristics to occur. This basically means that the transaction system can be informed (and hence can inform) that such an error has happened. There's not a lot that can be done automatically to fix these types of error. They often require semantic information about the application in order to restore consistency, so have to be handled by a system administrator. However, the important thing is that someone knows there's been a problem.

Imagine that this error happens and you don't know about it! Or at least don't know about it until the next time you check your account. Not good. Personally I'd like to know if there's been an error as soon as possible. In our bank scenario, I can go and talk to someone in the branch. If I was doing this via the internet there's usually a number I can call to talk to someone.

Now why is this important? Well, there are a few Web Services transactions specifications around that can be used in this scenario: WS-AtomicTransaction and WS-ACID Transaction (part of WS-Transaction Management). Only WS-ACID Transaction provides support for heuristic errors to be sent from participant to coordinator and from coordinator to end-user. This seems like a strange omission from the other specification, because these kinds of errors do happen.

One obvious question is: why don't heuristics appear in the WS-AtomicTransaction specification? Without a definitive answer from the authors we can only speculate. Maybe they don't think heuristics can happen in real life? However, given that heuristics occur in other specifications and implementations, another possibility is that the authors believe that these kinds of faults can and should be dealt with behind the service boundary and not exposed to applications. Unfortunately that's not the case; in most cases heuristic errors can only be resolved with semantic information about the application that they occurred within. Automatic resolution rarely happens and manual resolutions can take arbitrarily long periods, so knowing that an error has happened (and an error that potentially blows all ACID guarantees away) as quickly as possible is important to users and applications.

In reality this apparent omission from that specification is not as bad as it might first seem. The WS-AtomicTransaction specification does have an error, InconsistentInternalState, that can be returned by a participant to indicate that it cannot fulfil its obligations. However, this isn't sufficient, because there isn't enough information given for the recipient (the coordinator) to know what heuristic decision the participant took: did it commit, did it roll back, or did it do something else entirely different? Of course you can use WS-AtomicTransaction to communicate these errors.

Unfortunately you just can't do it within the specification. You would have to overload SOAP faults (for example), or maybe use some proprietary extension (repeat after me: vendor lock-in is not good). Not a good idea for interoperability and/or portability. WS-AtomicTransaction and WS-ACID Transaction are really meant for interoperability of existing transaction service implementations, where heuristics originated. This makes the omission of heuristics in WS-AtomicTransaction even more striking and hopefully something that will be addressed through a standards body. With any luck, the experiences of both sets of specifications can be leveraged into a single industry standard.
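To illustrate why an opaque error is less useful than an explicit heuristic report, the sketch below shows how a coordinator might summarize participant reports after it has decided to commit. The outcome labels borrow the rollback, mixed and hazard vocabulary used by existing transaction services such as the Object Transaction Service; the `Report` enum and `coordinator_outcome` function are hypothetical and are not message names from either Web Services specification.

```python
from enum import Enum


class Report(Enum):
    COMMITTED = "committed"      # participant did what the coordinator asked
    ROLLED_BACK = "rolled back"  # heuristic decision: participant undid its work
    UNKNOWN = "unknown"          # opaque error: participant does not say what it did


def coordinator_outcome(reports):
    """Summarize reports received after the coordinator has said 'commit'.

    reports maps participant name -> Report.
    """
    if not reports:
        raise ValueError("no participant reports to summarize")
    values = set(reports.values())
    if values == {Report.COMMITTED}:
        return "committed"
    if Report.UNKNOWN in values:
        # Without knowing what a participant did, nobody can say what to repair.
        return "heuristic hazard"
    if values == {Report.ROLLED_BACK}:
        return "heuristic rollback"
    return "heuristic mixed"


explicit = coordinator_outcome(
    {"current": Report.COMMITTED, "savings": Report.ROLLED_BACK}
)
opaque = coordinator_outcome(
    {"current": Report.COMMITTED, "savings": Report.UNKNOWN}
)
```

With an explicit report the coordinator can tell the end-user that the outcome is mixed and which participant rolled back; with an opaque error all it can say is that something may have gone wrong.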

References

[1] Gray, Jim and Andreas Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.

[2] Distributed Transaction Processing: The XA Specification, The Open Group, February 1992.
COPYRIGHT 2005 A.P. Publications Ltd.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:analysis
Author:Churchill, Christine
Publication:Software World
Geographic Code:1USA
Date:May 1, 2005
Words:1923