
The big squeeze: closing down the junk e-mail pipe.


Anti-spam measures are big on the legislative horizon, ranging from strict legislation at the state level to not-so-strict federal bills like CAN-SPAM (which doesn't do much to can spam, according to states' attorneys general). The House/Senate/state wrangling is intense as lawmakers try to curb spam while protecting legitimate commercial interests. No matter who wins, spammers aren't going to shut down and go home. Just as they do now, companies will have to keep running anti-spam filters on their corporate e-mail. Catching junk e-mail is a technical challenge because of its volume and sophistication. (Individual junk e-mailers may be bottom feeders, but the spam companies who enable them have deep pockets and smart programmers.)

Organizations deploy anti-spam technology in a variety of locations. Spam filters come in both client-based and server-based flavors: client-based filters run on end-user machines, while server-based filters can run anywhere from an outside Internet service to the firewall to the e-mail server. Spam filter approaches fall into four major categories, and many filters combine them: whitelisting/blacklisting, pattern matching, signature filtering, and natural language processing.

Whitelist/Blacklist

This basic filtering level works on lists of good e-mail addresses (whitelist) or of spammer e-mail addresses or domains (blacklist). Blacklist filters reject any message originating from or routed through a blacklisted address or domain, while whitelist filters accept only messages from an address or domain on a user-approved list. Some filtering applications use one or the other, but many combine them.
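A minimal Python sketch of the combined check described above. It assumes list entries may be either full addresses or bare domains, that the domains a message was routed through have already been extracted (the `relays` argument is a hypothetical input, not something the article specifies), and that the blacklist takes precedence when both lists match.

```python
from email.utils import parseaddr


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an e-mail address ('' if none)."""
    _, addr = parseaddr(address)
    return addr.rpartition("@")[2].lower()


def list_verdict(sender: str, relays: list[str],
                 whitelist: set[str], blacklist: set[str]) -> str:
    """Classify a message as 'accept', 'reject' or 'unknown'.

    Entries in whitelist/blacklist may be full addresses or bare domains.
    relays: domains the message passed through (hypothetical pre-parsed input).
    """
    _, addr = parseaddr(sender)
    addr = addr.lower()
    domain = domain_of(sender)
    # Blacklist first: reject if the sender or any relay is listed.
    if addr in blacklist or domain in blacklist:
        return "reject"
    if any(r.lower() in blacklist for r in relays):
        return "reject"
    # Whitelist: accept senders on the user-approved list.
    if addr in whitelist or domain in whitelist:
        return "accept"
    # Neither list decides; leave the message to other filters.
    return "unknown"
```

Returning `"unknown"` rather than rejecting outright lets a combined filter pass undecided mail on to the other techniques described below.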

Whitelisting

Whitelisting, or positive filtering, checks incoming e-mail against a list of approved addresses. If the sender is not on the list, the filter can delete the message, send it to a quarantine folder, or send a challenge e-mail back to the sender. If the sender personally replies to the challenge, the filter assumes there is a real person at the other end and adds the address to the approved list. This option is extremely selective about incoming e-mail, but challenge responses can seriously annoy legitimate senders. It is also susceptible to sophisticated address forging.
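The challenge-response flow can be sketched as a small state machine. `ChallengeWhitelist` and its method names are hypothetical; actually sending the challenge e-mail and deciding that a reply came from a person are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeWhitelist:
    """Hypothetical challenge-response whitelist (illustrative only)."""
    approved: set[str] = field(default_factory=set)
    # Unknown sender -> messages held in quarantine until the challenge is answered.
    pending: dict[str, list[str]] = field(default_factory=dict)

    def receive(self, sender: str, message: str) -> str:
        sender = sender.lower()
        if sender in self.approved:
            return "deliver"
        # Unknown sender: quarantine the message; challenge only on first contact.
        first_contact = sender not in self.pending
        self.pending.setdefault(sender, []).append(message)
        return "challenge" if first_contact else "quarantine"

    def challenge_answered(self, sender: str) -> list[str]:
        """A person replied: approve the address and release quarantined mail."""
        sender = sender.lower()
        self.approved.add(sender)
        return self.pending.pop(sender, [])
```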

E-mail users should be able to add addresses to the whitelist. Most whitelist filters start by populating themselves from e-mail addresses found in the user's existing mailbox and address book. Whitelists won't catch spammers who have hijacked known good addresses, but they will catch spammers who haven't. They will also catch e-mail from your mother if she isn't on your whitelist, so users should check the list periodically.
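One way to seed a whitelist from an existing mailbox, sketched with Python's standard `email` package. The function name and the choice to harvest only `From:` headers are illustrative assumptions, and the mailbox is represented simply as a list of raw message strings.

```python
from email import message_from_string
from email.utils import getaddresses


def seed_whitelist(raw_messages: list[str], address_book: list[str]) -> set[str]:
    """Build a starting whitelist from saved messages and address-book entries."""
    whitelist = {a.strip().lower() for a in address_book if a.strip()}
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Harvest every sender address found in the existing mailbox.
        for _, addr in getaddresses(msg.get_all("From", [])):
            if addr:
                whitelist.add(addr.lower())
    return whitelist
```

Users would still review and edit the resulting set by hand, as the text recommends.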


Blacklisting

Blacklisting, or negative filtering, compares incoming addresses, subject lines and messages to a blacklist. It intercepts any offending messages and deletes them or moves them into a quarantine folder. For example, common filters include rules for blocking mail with "free" or "cash" in the subject line, as well as shady words we won't mention. Filters can also block certain ISPs or specific addresses. Blacklisting used to be simpler, but it must now cope with the ridiculous punctuation spammers insert into subject lines. Blacklisting also requires a large number of filters and considerable CPU processing time, and it often returns false positives, identifying an innocent message as spam. In fact, blacklists are better at blocking known viruses than spam: e-mail administrators can use them to deny attachments with common virus extensions such as .exe, .bat and .vbs. Blacklisting needs carefully maintained lists to work, since spam programmers are flexible, creative and can turn on a dime. Companies using blacklists can keep a database in-house, though this is labor intensive. Many sign up with a third-party service provider that constantly updates its blacklists for client companies.

Pattern Matching

Pattern matching defines a set of criteria that classify messages as spam. Characteristics include such items as all-capitals subject lines, frequent spam phrases, and suspicious header lines. Administrators and users can assign point values to individual characteristics (for example, a high value for porn and a lower one for business offers). The filter then marks any message whose total score meets or exceeds a threshold as spam. Some systems allow the user to train the software to recognize spam or to exempt messages from spam blocking.
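Point-based scoring can be sketched as a list of rules, each with a test and a weight. The rules, the weights and the default threshold of 3.0 are invented for illustration, and messages are represented as plain dictionaries with hypothetical `subject`, `body` and `headers` keys (`headers` being a set of lower-case header names).

```python
from typing import Callable

# (description, test, points); rules and weights are illustrative only.
RULES: list[tuple[str, Callable[[dict], bool], float]] = [
    ("all-caps subject", lambda m: m["subject"].isupper(), 2.0),
    ("porn phrase", lambda m: "xxx" in m["body"].lower(), 5.0),
    ("business offer", lambda m: "business opportunity" in m["body"].lower(), 1.0),
    ("missing Date header", lambda m: "date" not in m["headers"], 1.5),
]


def score(message: dict, threshold: float = 3.0) -> tuple[float, bool]:
    """Sum the points of every rule the message trips; spam if total >= threshold."""
    total = sum(points for _, test, points in RULES if test(message))
    return total, total >= threshold
```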

Pattern matching filters often use whitelist/blacklist techniques as well, but depend on more sophisticated technologies such as content pattern recognition and flexible content filtering. Typical approaches include:

* Identifying invalid HTML tags: Spammers try to disguise HTML-enabled spam by inserting meaningless content within specific HTML tags

* Making case-sensitive checks: Another common spammer technique is displaying subject lines exclusively in upper case

* Practicing intelligent word recognition: To avoid blacklists, spammers will deliberately alter the subject line by adding or removing punctuation, adding nonsense phrases, misspelling words or compressing spaces.

* Blocking MIME content types: Suspicious types include the perennial spam favorite, HTML, as well as some viruses that present as specific MIME types.
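The four approaches in the list can be combined into one sketch using Python's standard `email` and `html.parser` modules. The tag list, spam-word list and MIME-type list are abbreviated, hypothetical examples; squashing text down to bare letters is one simple way to undo punctuation and spacing tricks, and it will also produce some false positives.

```python
import re
from email import message_from_string
from html.parser import HTMLParser

# Abbreviated, hypothetical lists; a real filter would maintain far larger ones.
KNOWN_TAGS = {"html", "head", "body", "p", "br", "a", "b", "i", "u", "font",
              "div", "span", "table", "tr", "td", "img", "center"}
SPAM_WORDS = {"mortgage", "viagra", "freecash"}
SUSPICIOUS_TYPES = {"text/html", "application/x-msdownload"}


class TagScanner(HTMLParser):
    """Collects tag names and the visible text of an HTML body."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def squash(text: str) -> str:
    """Lower-case and keep only letters, undoing punctuation and spacing tricks."""
    return re.sub(r"[^a-z]", "", text.lower())


def pattern_flags(raw: str) -> list[str]:
    """Return the names of the pattern checks a raw message trips."""
    msg = message_from_string(raw)
    flags = []
    subject = msg.get("Subject", "")
    if subject.isupper():                                  # case-sensitive check
        flags.append("all-caps subject")
    if any(w in squash(subject) for w in SPAM_WORDS):      # word recognition
        flags.append("disguised spam word in subject")
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype in SUSPICIOUS_TYPES:                      # MIME type blocking
            flags.append("suspicious MIME type " + ctype)
        if ctype == "text/html":                           # invalid HTML tags
            scanner = TagScanner()
            scanner.feed(part.get_payload())
            scanner.close()
            if any(t not in KNOWN_TAGS for t in scanner.tags):
                flags.append("invalid HTML tags")
            # Junk tags often split a word; rejoin the visible text and recheck.
            if any(w in squash("".join(scanner.text)) for w in SPAM_WORDS):
                flags.append("spam word hidden by tags")
    return flags
```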

Bayesian Filtering

A type of pattern-matching filter, Bayesian filters don't require whitelists or blacklists. Instead, they learn from the user's own classifications: users run a new Bayesian filter against two folders, one containing wanted mail and the other containing mail the user considers spam. The more messages there are, the better the filter works. This is just the beginning, since Bayesian spam filters are trainable (auto-adaptive) and adjust their matches according to subsequent user actions. Bayesian filters examine characteristics such as words in the body of the message, headers, HTML code, word pairs, phrases and meta information. For example, if you are a business owner you may get a good amount of legitimate mail containing the word "client." The filter will identify this word as overwhelmingly belonging to your good e-mail. But if you also receive a good deal of spam containing "mortgage," the filter will treat that word as evidence of spam while still weighing mitigating factors. That way, if you really are buying a house and your lender sends you an e-mail, the Bayesian filter won't automatically relegate the message to the spam folder.
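A minimal multinomial naive Bayes classifier illustrates the idea. This is a textbook formulation with add-one smoothing, not any particular product's algorithm, and it looks only at body words, whereas the article notes that real filters also use headers, HTML code, word pairs and meta information.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam filter with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Learn from one message; label is 'spam' or 'ham' (wanted mail).

        Call once per message in the two starter folders, and again whenever
        the user reclassifies a message, so the filter keeps adapting.
        """
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text)."""
        words = tokens(text)
        vocab = set(self.counts["spam"]) | set(self.counts["ham"]) | set(words)
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior P(label).
            logp = math.log((self.docs[label] + 1) / (total_docs + 2))
            denom = sum(self.counts[label].values()) + len(vocab)
            for w in words:
                # Smoothed likelihood P(word | label): a word seen mostly in wanted
                # mail (e.g. "client") pulls the score toward ham.
                logp += math.log((self.counts[label][w] + 1) / denom)
            log_score[label] = logp
        # Logistic of the log-odds, clamped so math.exp cannot overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

Training on the two folders is just a loop over their messages, calling `train(text, "ham")` or `train(text, "spam")`; because every word contributes evidence, a lender's message that mentions "mortgage" alongside many words typical of wanted mail can still score below 0.5.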

Signature Filtering

Similar to blacklisting but more flexible, signature filtering relies on algorithms to fingerprint messages. An e-mail's signature is derived from several characteristics, such as address, content, subject and domain. Signature filters use algorithms to reduce these characteristics to a short character string that uniquely identifies the message. The filter captures incoming messages, compares their resulting strings to a database of suspect signatures, and blocks messages whose signatures match. Users can submit new spam addresses directly to the database, and third-party lists provide regular updates on changing spam addresses. Database administrators use several validity-checking techniques to guard against false positives, including requiring that multiple users submit a possible spam message before the database adds its signature.
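A sketch of signature generation and a shared signature database. SHA-256 over normalized fields is an assumed choice of algorithm, and the hypothetical `min_reporters` parameter models the multiple-submission requirement described above. An exact hash changes whenever any character of the content changes, so production systems typically use fuzzier fingerprints.

```python
import hashlib
import re


def signature(sender_domain: str, subject: str, body: str) -> str:
    """Reduce several message characteristics to one short fingerprint string."""
    # Normalize case and whitespace so trivial edits don't change the signature.
    parts = [sender_domain.lower(),
             re.sub(r"\s+", " ", subject.lower()).strip(),
             re.sub(r"\s+", " ", body.lower()).strip()]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()[:16]


class SignatureDatabase:
    """Hypothetical shared database that requires several independent reports."""

    def __init__(self, min_reporters: int = 3):
        self.min_reporters = min_reporters
        self.reports: dict[str, set[str]] = {}
        self.blocked: set[str] = set()

    def report(self, sig: str, reporter: str) -> None:
        self.reports.setdefault(sig, set()).add(reporter)
        # Validity check against false positives: block only after enough
        # distinct users have submitted the same message.
        if len(self.reports[sig]) >= self.min_reporters:
            self.blocked.add(sig)

    def is_spam(self, sig: str) -> bool:
        return sig in self.blocked
```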

NLP

Natural Language Processing (NLP) tries to replicate intuitive human understanding of written information. NLP-based spam filters work by recognizing all probable forms of single words, which means that if a spammer substitutes "mor@gage" for "mortgage," NLP will recognize it anyway. NLP filters also identify phrase and sentence structures and relationships, assign dictionary definitions from context, and can process common sense information. Spam filtering NLP is trainable, which gives it a measure of artificial intelligence.
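Full NLP is well beyond a short example, but the "mor@gage" case can be approximated with character substitution plus fuzzy matching from Python's standard `difflib`. This is a deliberately simple stand-in for NLP word recognition, not NLP itself; the look-alike table, vocabulary and 0.8 cutoff are assumptions.

```python
from difflib import get_close_matches
from typing import Optional

# Common look-alike substitutions (illustrative, not exhaustive).
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "@": "a", "$": "s"})
VOCABULARY = ["mortgage", "refinance", "viagra", "client", "meeting"]


def recognize(token: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly disguised token to the dictionary word it most likely stands for."""
    cleaned = token.lower().translate(LOOKALIKES)
    # difflib's similarity ratio tolerates a substituted or missing character.
    matches = get_close_matches(cleaned, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```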

For example, Corvigo's intent-based filtering technology uses NLP to analyze an e-mail's intent: either (a) the sender wants to sell you something, i.e., commercial e-mail; or (b) the sender doesn't, i.e., non-commercial e-mail such as a message from your boss, a client, or Uncle David. It further divides commercial e-mail into unwanted junk e-mail and bulk mailings from legitimate advertisers. It then delivers the wanted e-mail to user inboxes and files the two types of commercial e-mail into separate folders. Users can train the system by rejecting messages from legitimate advertisers or non-commercial senders, and can recategorize a message flagged as spam if it's something they want to receive.
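The three-way routing described above can be sketched as follows. Corvigo's actual intent analysis is not described in detail here, so a hypothetical keyword check stands in for it; `IntentRouter`, its folder names and its cue list are all illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical sales cues standing in for real intent analysis.
SALES_CUES = ("buy", "order now", "discount", "offer", "unsubscribe")


@dataclass
class IntentRouter:
    """Illustrative router: inbox, bulk (legitimate advertisers) or junk."""
    known_advertisers: set[str] = field(default_factory=set)   # legitimate bulk-mail domains
    overrides: dict[str, str] = field(default_factory=dict)    # sender -> folder chosen by user

    def route(self, sender: str, body: str) -> str:
        sender = sender.lower()
        if sender in self.overrides:            # user training takes priority
            return self.overrides[sender]
        wants_to_sell = any(cue in body.lower() for cue in SALES_CUES)
        if not wants_to_sell:
            return "inbox"                      # non-commercial mail
        if sender.rpartition("@")[2] in self.known_advertisers:
            return "bulk"                       # commercial, from a legitimate advertiser
        return "junk"                           # unwanted commercial e-mail

    def correct(self, sender: str, folder: str) -> None:
        """Record a user correction, e.g. moving a wanted newsletter to the inbox."""
        self.overrides[sender.lower()] = folder
```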

Most spam filtering technologies combine several of these techniques. The challenge in building anti-spam features is that there's a lot of money in spamming, and heavyweight spammers employ good programmers to constantly beat the system. Filtering technologies that auto-adapt to spam challenges have the best chance of staying two steps ahead of the spammer's threatening game.

Technology                 What it Does

Whitelisting/Blacklisting  Checks incoming e-mail against lists of
                           approved users and/or lists of suspected
                           spammers and suspicious domains. Users and
                           IT administrators can keep lists in-house
                           or rely on third-party database services.
Pattern matching           Catches spam by using content pattern
                           recognition and content filtering. Bayesian
                           filters use algorithms to assign spam
                           probabilities to incoming content.
Signature filtering        Compares known spam elements against
                           subjects, messages, sending ISP, headers
                           and restricted sender names. Often works
                           in conjunction with blacklists.
Natural language           NLP reproduces human interpretation of
processing (NLP)           language. NLP-based spam filters analyze
                           words and phrases to determine message
                           intent, much as a human reader would.

Table 1


Jeff Ready is CEO at Corvigo, Inc. (Mountain View, CA)

www.corvigo.com
COPYRIGHT 2003 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Internet
Author:Ready, Jeff
Publication:Computer Technology Review
Geographic Code:1USA
Date:Dec 1, 2003
Words:1452