The big squeeze: closing down the junk e-mail pipe.

Anti-spam measures are big on the legislative horizon, ranging from strict legislation at the state level to not-so-strict federal bills like CAN-SPAM (which doesn't do much to can spam, according to state attorneys general). The House/Senate/state wrangling is intense as lawmakers try to balance anti-spam measures against legitimate commercial interests. No matter who wins, spammers aren't going to shut down and go home, so companies will have to keep running anti-spam filters on their corporate e-mail, just as they do now. Catching junk e-mail is a technical challenge because of its volume and sophistication. (Individual junk e-mailers may be bottom feeders, but the spam companies that enable them have deep pockets and smart programmers.)

Organizations deploy anti-spam technology in a variety of locations and forms. Spam filters come in both client-based and server-based flavors: client-based filters run on end-user machines, while server-based filters can run anywhere from an outside Internet service, to the firewall, to the e-mail server. Spam filter approaches fall into four major categories, and many filters combine them: whitelisting/blacklisting, pattern matching, signature filtering, and natural language processing.

Whitelist/Blacklist

This basic filtering level works from lists of good e-mail addresses (a whitelist) or of spammer e-mail addresses or domains (a blacklist). Blacklist filters reject any message originating from or routed through a blacklisted address or domain, while whitelist filters accept only messages from an address or domain on a user-approved list. Some filtering applications use one or the other, but many combine them.

Whitelisting

Whitelisting, or positive filtering, checks incoming e-mail against a list of approved addresses. If the sender is not on the list, the filter can delete the message, move it to a quarantine folder, or send a challenge e-mail back to the sender. If the sender personally replies to the challenge, the filter assumes there is a real person at the other end and adds the address to the approved list. This option is extremely selective about incoming e-mail, but challenge messages can seriously annoy legitimate senders. It is also susceptible to sophisticated address forging.

E-mail users should be able to add addresses to the whitelist. Most whitelist filters start by building the list from e-mail addresses found in the user's existing mailbox and address book. Whitelists won't catch spammers who have hijacked known good addresses, but will catch spammers who haven't. They will also catch e-mail from your mother if she isn't on your whitelist, so users should review the list periodically.
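
The whitelist-and-challenge flow described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class name `Whitelist`, its method names, and the "deliver"/"challenge" return values are all hypothetical, matching is by exact address only (no domains), and no real mail is sent.

```python
from dataclasses import dataclass, field


@dataclass
class Whitelist:
    """Hypothetical whitelist filter; all names here are illustrative."""
    approved: set = field(default_factory=set)
    pending: set = field(default_factory=set)  # senders awaiting a challenge reply

    def seed(self, addresses):
        # Build the initial list from the user's mailbox and address book.
        self.approved.update(a.lower() for a in addresses)

    def check(self, sender):
        # Approved senders are delivered; everyone else gets a challenge.
        sender = sender.lower()
        if sender in self.approved:
            return "deliver"
        self.pending.add(sender)
        return "challenge"

    def confirm(self, sender):
        # Called when a sender personally answers the challenge.
        sender = sender.lower()
        if sender in self.pending:
            self.pending.discard(sender)
            self.approved.add(sender)
            return True
        return False
```

A sender seeded from the address book is delivered right away. An unknown sender is challenged once and delivered on later attempts only after `confirm` has been called for that address. Because the check trusts the address alone, a forged known-good address passes, which is the weakness noted above.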

Blacklisting

Blacklisting, or negative filtering, compares incoming addresses, subject lines and message text to a blacklist. It intercepts any offending message and deletes it or moves it into a quarantine folder. Common filters include rules for blocking mail with "free" or "cash" in the subject line, as well as shady words we won't mention. Filters can also block certain ISPs or specific addresses. Blacklisting used to be simpler, but it must now cope with the ridiculous punctuation spammers put in subject lines. Blacklisting also requires a large number of filter rules and considerable CPU time, and it often returns false positives--identifying an innocent message as spam. In fact, blacklists are better at blocking known viruses than spam: e-mail administrators can use them to deny attachments with common virus extensions such as .exe, .bat and .vbs.

Blacklisting needs carefully maintained lists to work, since spam programmers are flexible and creative and can turn on a dime. Companies using blacklists can keep a database in-house, though this is labor intensive. Many sign up with a third-party service provider that constantly updates its blacklists for client companies.
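
A minimal Python sketch of a blacklist check follows, assuming three small in-house lists. The domain entry and the function name `blacklist_reason` are hypothetical; the subject words and file extensions are the examples given above.

```python
import os

BLOCKED_DOMAINS = {"spam.example"}           # hypothetical entry
BLOCKED_SUBJECT_WORDS = {"free", "cash"}     # the article's example words
BLOCKED_EXTENSIONS = {".exe", ".bat", ".vbs"}


def blacklist_reason(sender, subject, attachments=()):
    """Return why a message is blocked, or None if it passes every list."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in BLOCKED_DOMAINS:
        return "domain"
    # Plain whole-word comparison: punctuation tricks slip straight through.
    if set(subject.lower().split()) & BLOCKED_SUBJECT_WORDS:
        return "subject"
    for name in attachments:
        if os.path.splitext(name)[1].lower() in BLOCKED_EXTENSIONS:
            return "attachment"
    return None
```

The sketch shows both weaknesses mentioned above. A subject such as "F.R.E.E offer" passes untouched, while an innocent "Are you free for lunch" is blocked as a false positive.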

Pattern Matching

Pattern matching defines a set of criteria that classify messages as spam. Characteristics include all-capitalized subject lines, frequent spam phrases, and suspicious header lines. Administrators and users can assign point values to individual characteristics (for example, a high value for porn and a lower one for business offers). The filter adds up the points for each message and marks any message scoring at or above a set threshold as spam. Some systems allow the user to train the software to recognize spam or to exempt messages from spam blocking.
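
The point-scoring idea can be sketched as follows. The three rules, their point values and the threshold are invented for illustration, and a message is assumed to be a plain dictionary with "subject", "body" and "headers" keys.

```python
# Hypothetical rules: (name, test function, points). Values are illustrative.
RULES = [
    ("all_caps_subject", lambda m: m["subject"].isupper(), 3.0),
    ("spam_phrase", lambda m: "act now" in m["body"].lower(), 2.0),
    ("missing_date_header", lambda m: "Date" not in m["headers"], 1.5),
]
THRESHOLD = 5.0  # hypothetical cut-off chosen by the administrator


def spam_score(message):
    # Add up the points of every rule the message triggers.
    return sum(points for _, test, points in RULES if test(message))


def is_spam(message):
    # A message at or above the threshold is marked as spam.
    return spam_score(message) >= THRESHOLD
```

With these values no single rule is enough to condemn a message. An all-capitals subject alone scores 3.0 and is delivered, while a message that also contains a spam phrase and lacks a Date header reaches 6.5 and is marked as spam.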

Pattern matching filters often use whitelist/blacklist techniques as well, but depend on more sophisticated technologies like content pattern recognition and flexible content filtering. Typical approaches include:

* Identifying invalid HTML tags: Spammers try to disguise HTML-enabled spam by inserting meaningless content within specific HTML tags.

* Making case-sensitive checks: Another common spammer technique is displaying subject lines exclusively in upper case.

* Practicing intelligent word recognition: To avoid blacklists, spammers will deliberately alter the subject line by adding or removing punctuation, adding nonsense phrases, misspelling words or compressing spaces.

* Blocking MIME content types: Suspicious types include HTML, a perennial spam favorite, and the specific MIME types that some viruses present as.
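
As a rough sketch of the word-recognition approach in the list above, the Python below strips common disguises from a subject line before comparing it to a word list. The look-alike table and the function names are hypothetical, and a production filter would need a far larger table.

```python
import re

# Hypothetical look-alike substitutions; a real table would be much larger.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "$": "s", "@": "a"})


def normalize(subject):
    """Undo common disguises: look-alike characters, stray punctuation, odd spacing."""
    text = subject.lower().translate(LOOKALIKES)
    text = re.sub(r"[^a-z\s]", "", text)      # drop punctuation inserted between letters
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def contains_blocked_word(subject, blocked=("free", "cash")):
    # Compare whole words of the cleaned-up subject against the word list.
    return any(word in blocked for word in normalize(subject).split())
```

This catches subjects such as "F.R.E.E C@$H!!!". It does not handle letters separated by spaces or words run together with the spaces removed, which would need a different approach.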

Bayesian Filtering

A type of pattern-matching filter, Bayesian filters don't require whitelists or blacklists. Instead, they learn from the user's own classification: the user runs a new Bayesian filter against two folders, one containing wanted mail and the other containing mail the user considers spam. The more messages there are, the better the filter will work. This is just the beginning, since Bayesian spam filters are trainable (auto-adaptive) and will adjust their matches according to subsequent user actions. Bayesian filters weigh characteristics such as words in the body of the message, headers, HTML code, word pairs, phrases and meta information. For example, if you are a business owner you may get a good amount of legitimate mail with the word "client." The filter will identify this word as overwhelmingly belonging to your good e-mail. But if you also receive a good deal of spam with "mortgage" in it, the filter will treat that word as a probable sign of spam while still counting mitigating factors elsewhere in the message. This way, if you really are buying a house and your lender sends you an e-mail, the Bayesian filter won't automatically relegate the message to the spam folder.
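
The two-folder training described above can be illustrated with a small naive Bayes classifier. This sketch makes several assumptions: it looks only at whole words in the text (not headers, HTML or word pairs), uses add-one smoothing, and the class and method names are hypothetical rather than any product's API.

```python
import math
from collections import Counter


class BayesFilter:
    """Toy naive Bayes spam filter trained on 'good' and 'spam' messages."""

    def __init__(self):
        self.words = {"good": Counter(), "spam": Counter()}
        self.messages = {"good": 0, "spam": 0}

    def train(self, text, label):
        # label is "good" or "spam"; call again whenever the user reclassifies mail.
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["good"]) | set(self.words["spam"]))
        total_messages = sum(self.messages.values())
        scores = {}
        for label in ("good", "spam"):
            total_words = sum(self.words[label].values())
            # Log prior for the folder, with add-one smoothing.
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            for word in text.lower().split():
                # Smoothed log likelihood of each word given the folder.
                score += math.log(
                    (self.words[label][word] + 1) / (total_words + vocab + 1)
                )
            scores[label] = score
        # Convert the two log scores into P(spam); clamp to avoid overflow.
        diff = max(min(scores["good"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Trained on a few legitimate messages about clients and a lender, plus a few mortgage spams, the filter rates "mortgage rates" as probably spam. It still rates a message that mentions "mortgage" alongside "lender closing documents" as probably legitimate, because the other words count as mitigating evidence.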

Signature Filtering

Similar to blacklisting but more flexible, signature filtering depends on algorithms. An e-mail's signature is derived from several characteristics such as address, content, subject and domain. Signature filters use an algorithm to reduce these to a short character string that uniquely identifies the message. The filter computes this string for each incoming message, compares it to a database of suspect signatures, and blocks any message that matches. Users can submit new spam addresses directly to the database, and third-party lists include regular updates on changing spam addresses. Database administrators use several validity-checking techniques to guard against false positives, including a requirement that multiple users submit a possible spam message before the database adds its signature.
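
A signature database of this kind might be sketched as below. The choice of SHA-256, the 16-character truncation, the fields that feed the hash, and the two-report rule are all assumptions made for illustration, not a description of any particular product.

```python
import hashlib
from collections import defaultdict

REPORTS_REQUIRED = 2  # hypothetical: distinct users needed before a signature counts


def signature(sender_domain, subject, body):
    """Reduce a message to a short string; case and spacing are ignored."""
    canonical = "\n".join(
        " ".join(part.lower().split()) for part in (sender_domain, subject, body)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class SignatureDatabase:
    def __init__(self):
        self.reports = defaultdict(set)  # signature -> users who reported it

    def report(self, sig, user):
        self.reports[sig].add(user)

    def is_spam(self, sig):
        # Guard against false positives: one user's report is not enough.
        return len(self.reports.get(sig, ())) >= REPORTS_REQUIRED
```

An exact hash like this one changes completely if the spammer alters a single character, so the sketch only shows the lookup and the multiple-report safeguard. It does not show how a real filter tolerates small variations between copies of the same spam.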

NLP

Natural Language Processing (NLP) tries to replicate intuitive human understanding of written information. NLP-based spam filters work by recognizing all probable forms of single words, which means that if a spammer substitutes "mor@gage" for "mortgage," the filter will recognize it anyway. NLP filters also identify phrase and sentence structures and relationships, assign dictionary definitions from context, and can process common-sense information. Spam-filtering NLP is trainable, which gives it a measure of artificial intelligence.
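
Real NLP goes far beyond anything that fits in a few lines, but the first step mentioned above, recognizing a disguised form of a known word, can be approximated with fuzzy string matching from Python's standard `difflib` module. The watch list, the function name and the 0.8 similarity cutoff are assumptions for illustration; this is a stand-in for the idea, not an NLP engine.

```python
import difflib

WATCHED_WORDS = ["mortgage", "refinance", "lottery"]  # hypothetical watch list


def canonical_word(token, cutoff=0.8):
    """Map a possibly disguised token to the watched word it most resembles."""
    matches = difflib.get_close_matches(
        token.lower(), WATCHED_WORDS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

Here `canonical_word("mor@gage")` maps back to "mortgage", while an ordinary word such as "meeting" matches nothing on the list. Understanding phrases, sentence structure and intent requires much more than this kind of character-level comparison.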

For example, Corvigo's intent-based filtering technology uses NLP to analyze an e-mail's intent: either (a) the sender wants to sell you something (commercial e-mail), or (b) the sender doesn't want to sell you something (non-commercial e-mail such as a message from a boss, a client, or Uncle David). It further classifies commercial e-mail as either unwanted junk e-mail or a bulk mailing from a legitimate advertiser. It then passes the desired e-mail on to user inboxes and files the two types of commercial e-mail into separate folders. Users can train the system by rejecting messages from legitimate advertisers or non-commercial senders, and can correct the spam category when a message is something they want to receive.

Most spam filtering technologies combine several of these techniques. The challenge in building anti-spam features is that there's a lot of money in spamming, and heavyweight spammers employ good programmers to constantly beat the system. Filtering technologies that auto-adapt to new spam techniques have the best chance of staying two steps ahead of the spammers.

Technology                   What It Does

Whitelisting/blacklisting    Checks incoming e-mail against lists of
                             approved users and/or lists of suspected
                             spammers and suspicious domains. Users and
                             IT administrators can keep lists in-house
                             or rely on third-party database services.

Pattern matching             Catches spam by using content pattern
                             recognition and content filtering. Bayesian
                             filters use algorithms to assign spam
                             probabilities to incoming content.

Signature filtering          Compares known spam elements against
                             subjects, messages, sending ISP, headers
                             and restricted sender names. Often works
                             in conjunction with blacklists.

Natural language             NLP reproduces human interpretation of
processing (NLP)             language. NLP-based spam filters analyze
                             words and phrases to determine message
                             intent, much as a human reader would.

Table 1


Jeff Ready is CEO at Corvigo, Inc. (Mountain View, CA)

www.corvigo.com
COPYRIGHT 2003 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Internet
Author:Ready, Jeff
Publication:Computer Technology Review
Geographic Code:1USA
Date:Dec 1, 2003
Words:1452