Fighting spam.

IDC

During the past two years, IDC has tracked huge increases in the volume and complexity of spam. Despite the availability of many approaches to detecting spam, it continues to plague email users. Antispam solutions that were considered effective a year ago because they blocked as many as 90 out of every 100 spam messages may no longer be considered effective when the number of spam messages actually reaching email user inboxes rises rather than falls because spammers are sending higher volumes. To reduce spam in real terms, antispam solutions need to detect more spam by identifying and blocking new forms of spam in real time, during the first few minutes of an outbreak. Recurrent Pattern Detection, a technology used by several server software, server appliance, and client software vendors and their customers, provides real-time proactive detection of spam regardless of content, language, or format.

What Is Spam?

Spam is "unsolicited bulk email" sent by both legitimate direct marketers offering commercial products and services and less reputable firms and individuals offering illicit, offensive, and even nonexistent products and services or using email to deliver viruses. "Bulk email" refers to the automated broadcast of high volumes of spam.

Spam Trends

Spam has been a major problem for service providers for several years. However, for most organizations, it has only recently emerged as a high-priority problem requiring high-priority attention and resources.
Prior to 2002, companies and other organizations considered spam more of a nuisance than anything else. They considered the volumes of spam received to be manageable through the use of the Delete key. What a difference two years make. The volume of spam sent worldwide every day will jump from 7 billion messages in 2002 to 17 billion in 2004, according to IDC estimates. Spam has grown into too difficult and costly a problem for most IT departments to ignore or leave to email users. Fighting spam can be very time consuming and is best handled by experts who spend all of their time and resources focused on developing even more effective ways to block it. Moreover, spam will contain more non-English language content as the majority of spam will be sent to email users outside North America by 2005 (see Figure 1).

[FIGURE 1 OMITTED]

Spammers

Spammers are motivated by the money to be made in selling legitimate or illegitimate products and services through spam and in delivering spam on behalf of their customers. Before most organizations deployed antispam solutions over the past two years, spammers had little difficulty in sending spam. Life has become more difficult for spammers, but the financial incentive has pushed spammers to be very aggressive and creative in developing new forms of spam that can evade detection. These include spoofing email addresses, constantly changing spam content, incorporating random and innocent text, and replacing text with images to fool content analysis and other spam detection technologies. Spammers take advantage of their ability to deliver a sizable portion of their spam broadcasts to email user inboxes before most antispam solutions are able to recognize and begin detecting new forms of spam.

Service Providers

The first line of defense against spam consists of service providers that are asked to relay huge volumes of email, including an abundance of spam, to other service providers, organizations, and email users. Because service providers are close to the source of spam, they are in a unique position to stop, or at least slow down, the rate at which spam reaches the Internet by detecting spam as it attempts to enter their networks. Because most hosted email services directly generate only a limited amount of revenue, the challenge for service providers is to block spam effectively with minimal cost and subscriber involvement.

Corporate IT Departments

The second line of defense against spam consists of corporate IT departments that are tasked with delivering email without viruses or spam. Spam, including spam carrying viruses, is best stopped at Internet gateways as it attempts to enter corporate networks. The challenge is to stop spam, and only spam, with minimal cost.
To achieve this goal, corporate IT departments, with input from business management, need to determine what constitutes spam in the context of their organizations. While corporate IT evaluates, deploys, and configures antispam solutions, direct email user involvement in choosing between rejecting and accepting suspected spam can ensure that spam and only spam is blocked.

Criteria for Antispam Solutions

Effectiveness and Accuracy

The most important element of blocking spam for an antispam solution is to detect all or nearly all spam. While blocking 100% of spam is the ideal, removing 94-95% of spam goes far in alleviating the costs imposed by spam on email users and IT staff. In addition, organizations generally prefer to let some suspected spam through in the interests of accuracy. With the huge rise in spam volumes, antispam solutions need to block a rising percentage of spam to reduce the actual number of spam messages reaching email users. As a result, solutions that blocked 90 out of 100 spam messages (90%) and that were considered effective in the past must now block 94-95% of spam to be considered effective. Accuracy refers to the degree to which an antispam solution can block spam and not block legitimate email messages (i.e., avoid false positives).

Frequency of Updates

The key to effective and accurate detection of spam is for antispam solutions to evolve along with spam. With hundreds of thousands of spam outbreaks disseminated every day, filters can quickly become ineffective against the latest spam. Antispam solutions must be able to catch today's spam, not yesterday's spam. In the same way that out-of-date antivirus products are ineffective at identifying the latest viruses, out-of-date antispam solutions that know nothing about the latest spam become increasingly ineffective over time at stopping spam.
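
As a rough check on the figures discussed under Effectiveness and Accuracy, the following sketch works through the arithmetic of block rate versus volume. It assumes, purely for illustration, that the worldwide daily volumes cited in this paper (7 billion in 2002, 17 billion in 2004) can stand in for the growth seen by a single organization; the function names are hypothetical.

```python
# Illustrative arithmetic only: worldwide volumes are used as a proxy
# for the growth in spam aimed at any one organization (an assumption).

def spam_delivered(sent_per_day: float, block_rate: float) -> float:
    """Number of spam messages that get past a filter with this block rate."""
    return sent_per_day * (1.0 - block_rate)


def block_rate_to_hold(baseline_delivered: float, sent_per_day: float) -> float:
    """Block rate needed so that delivered spam does not exceed the baseline."""
    return 1.0 - baseline_delivered / sent_per_day


SENT_2002 = 7e9    # daily volume, 2002 (IDC estimate cited in the text)
SENT_2004 = 17e9   # daily volume, 2004 (IDC estimate cited in the text)

delivered_2002 = spam_delivered(SENT_2002, 0.90)
delivered_2004_at_90 = spam_delivered(SENT_2004, 0.90)
delivered_2004_at_95 = spam_delivered(SENT_2004, 0.95)
break_even = block_rate_to_hold(delivered_2002, SENT_2004)

print(f"2002 at 90% blocked: {delivered_2002 / 1e9:.2f} billion delivered")
print(f"2004 at 90% blocked: {delivered_2004_at_90 / 1e9:.2f} billion delivered")
print(f"2004 at 95% blocked: {delivered_2004_at_95 / 1e9:.2f} billion delivered")
print(f"Block rate needed in 2004 to match 2002: {break_even:.1%}")
```

Under these simplifying assumptions, a filter that still blocks 90% lets through well over twice as much spam in 2004 as in 2002, and the break-even block rate comes out at roughly 96%, in the same range as the 94-95% cited above. The exact figure depends on how much of the worldwide volume a given organization actually receives.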
To minimize administration time and effort, organizations need product and spam pattern updates from commercial antispam solution providers that spend all of their time keeping up with spammers and supporting customers. Ideally, these updates should be automatic and frequent. IT departments know better than to rely on homegrown solutions to fight viruses. The same appreciation for outside expertise applies to fighting spam that is constantly changing its content and format to avoid detection.

Global Coverage of Multilingual Spam

Due to the global nature of email and the preference of many spammers to operate in countries where their spamming activities will not be easily detected or prosecuted, spam is often sent from countries where a language other than English dominates. Antispam solutions must be able to block spam regardless of the language or dialects used. Otherwise, spam sent from foreign countries will have a good chance of evading detection.

Antispam Methods

The key approaches used by enterprises to detect spam are as follows:

Honey Pot Signatures

"Honey pots," or decoy email mailboxes created on service provider and enterprise networks to act as spam catchers or traps, provide the basis for generating signatures or patterns of the spam received, against which emails sent to real mailboxes are tested. This approach focuses on the "unsolicited" aspect of spam by relying on the fact that any email sent to a mailbox that does not belong to a real person could not have been solicited and is spam by definition. An advantage of this approach is that only actual spam is blocked, helping to avoid false positives. A disadvantage is that only spam exactly matching known spam already caught can be blocked. Because spammers are constantly changing spam, signature-based solutions are always playing catch-up. Inherent in the design is a delay: patterns for new forms of spam must be created and distributed before that spam can be blocked. In addition, not all spam reaches honey pots because spammers use validated email addresses.

Content Analysis

One or more content analysis techniques are used to analyze everything about the content of inbound email by service providers, gateway or email servers, or even client antispam solutions. This approach focuses on the "commercial" aspect of spam by relying on the suspicious characteristics of legitimate and illegitimate offerings or information requests that spammers try to hide from spam filters and email users, at least until they open the emails. Many content analysis techniques are in use, including:

* Keyword analysis: This approach involves analyzing the text section of an email for specific keywords and phrases (e.g., sex, profanities, Viagra) that are unlikely to appear in legitimate business correspondence. Keyword analysis, when used as a standalone spam solution, is a very primitive technique and carries a high risk of false positives.

* Lexical analysis: Unlike keyword analysis, lexical analysis works by analyzing the context of all of the words and phrases in a particular message. The presence of a particular suspicious word or phrase by itself does not necessarily mean that the message is spam. Instead, each word or phrase is assigned a weight depending primarily on the context in which it is found.

* Bayesian analysis: Bayesian logic uses the knowledge of prior events to predict future events. When used to detect spam, a Bayesian filter examines emails that are known to be spam and emails that are known to be legitimate and compares the content of both sets in order to build a database of words that will, according to probability, identify or predict future emails as spam or legitimate email. Although Bayesian analysis is a new technique used to fight spam, the Bayesian logic theory was actually first published in 1763.

* Heuristics: Heuristics is a technique that looks for spam-like characteristics in an email message. Each characteristic is assigned a spam probability, and the message is given a cumulative probability score based on the overall test results. If a certain probability threshold is reached, the email is determined to be spam and is blocked.

* Header analysis: Header analysis examines headers, looking for such items as the validity of the sender's address, whether the same information is found in the "sender" and "from" fields of an email, and whether a specific message contains information not common to normal email.

* URL analysis: Spammers are increasingly embedding URL links inside emails to direct users to specific Web sites. URL analysis looks at the embedded links in email messages and compares them to a list of URL rules or known spam URLs to determine if the messages are spam.

Multiple techniques can be combined to generate aggregated scoring for each inbound email representing the probability of it being spam. An advantage common to any content analysis technique is that it can block the very first spam sent by a spammer.
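
To make the Bayesian analysis technique in the list above more concrete, here is a minimal sketch of a word-based Bayesian filter. It is a simplified naive Bayes model, not any vendor's implementation: the class name, the tokenizer, the add-one smoothing, and the four training messages are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class BayesianFilter:
    """Toy naive Bayes filter built from known spam and known legitimate mail."""

    def __init__(self) -> None:
        # For each label, how many training messages contained each word.
        self.word_counts = {"spam": Counter(), "legit": Counter()}
        self.message_counts = {"spam": 0, "legit": 0}

    def train(self, text: str, label: str) -> None:
        """Record the distinct words of one message known to be 'spam' or 'legit'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(set(tokenize(text)))

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that a message is spam from its words."""
        n_spam = self.message_counts["spam"]
        n_legit = self.message_counts["legit"]
        # Start from the prior odds, then add evidence from each word present.
        log_odds = math.log((n_spam + 1) / (n_legit + 1))
        for word in set(tokenize(text)):
            # Add-one smoothing keeps unseen words from producing zero odds.
            p_word_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_legit = (self.word_counts["legit"][word] + 1) / (n_legit + 2)
            log_odds += math.log(p_word_spam / p_word_legit)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical training data, far smaller than any real filter would use.
bayes = BayesianFilter()
bayes.train("Buy cheap Viagra now", "spam")
bayes.train("Cheap pills, buy now, limited offer", "spam")
bayes.train("Meeting agenda for Monday", "legit")
bayes.train("Please review the quarterly budget report", "legit")

for sample in ("buy cheap pills", "budget meeting on Monday"):
    print(f"{sample!r}: spam probability {bayes.spam_probability(sample):.2f}")
```

A production filter would train on thousands of messages and compare the resulting probability with a tunable threshold, which is also how the aggregated scoring described above is typically turned into a block or deliver decision.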
Two disadvantages are that spammers can change the content of spam sufficiently to evade content filters and that false positives occur when legitimate emails contain sufficient content that is also common to spam.

Blacklisting/Whitelisting

Blacklisting and whitelisting rely on the identification of email senders to determine whether messages are spam. Most blacklisting relies on real-time blackhole lists (RBLs) containing domain names or email addresses maintained by antispam Web sites, service providers, IT departments, and even individual email users that block all email from known spammers. Whitelisting relies on similarly maintained lists that allow all email from known legitimate or "good" senders. An advantage of this approach is that all content from known spammers is blocked and all content from known "good" senders is passed through. A disadvantage is that managing RBLs requires much time and effort. RBLs are often ineffective on their own in blocking spam because they often include legitimate sources misclassified as spammers and miss spammer IP addresses and domains, which change rapidly. In addition, spammers hijack legitimate domain names and individual email addresses to make spam appear to come from good senders.

Reverse DNS Lookup

Reverse DNS lookup runs DNS queries on the IP addresses of incoming email to determine if the identified host name matches an actual host name for the sender's IP addresses.
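
The lookup just described can be sketched as follows. This is an illustrative check, not a description of any particular product: the function names are hypothetical, the forward lookup uses the standard library's IPv4-only gethostbyname_ex, and the demonstration substitutes fake resolvers (with documentation-range addresses) so that it runs without network access.

```python
import socket
from typing import Callable, List, Optional


def ptr_lookup(ip: str) -> Optional[str]:
    """Return the host name registered for an IP address, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses registered for a host name (empty on failure)."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def passes_reverse_dns(
    ip: str,
    claimed_host: str,
    ptr: Callable[[str], Optional[str]] = ptr_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Check that the sender's IP maps to the host name it claims, and back."""
    resolved = ptr(ip)
    if resolved is None:
        return False  # no reverse record at all
    if resolved.rstrip(".").lower() != claimed_host.rstrip(".").lower():
        return False  # reverse record names a different host
    return ip in forward(resolved)  # the name must map back to the same IP


# Fake resolvers standing in for real DNS (hypothetical data).
def fake_ptr(ip: str) -> Optional[str]:
    return {"192.0.2.10": "mail.example.com"}.get(ip)


def fake_forward(host: str) -> List[str]:
    return {"mail.example.com": ["192.0.2.10"]}.get(host, [])


print(passes_reverse_dns("192.0.2.10", "mail.example.com", fake_ptr, fake_forward))
print(passes_reverse_dns("192.0.2.10", "spoofed.example.net", fake_ptr, fake_forward))
```

Calling passes_reverse_dns without the last two arguments would use live DNS instead of the fake resolvers.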
Because many spammers use spoofed hosts to disguise the source of the spam, a query that doesn't recover a matching host and IP address is a good indication that the message is spam. An advantage of this approach is that it quickly and easily blocks spam with spoofed addresses. A disadvantage is that it does nothing about spam without a disguised source.

Sender Authentication

Sender authentication is a new approach that promises to identify spam by checking the identification of named email senders based on either sender email or IP addresses. Emails with sender information that cannot be authenticated with the sending domains can be blocked or identified as suspicious for further scanning. An advantage of this approach is the prevention of email fraud, the most harmful form of spam. A disadvantage is the time and will needed to incorporate compatible sender authentication tools into all of the message transfer agents (MTAs) routing email at Internet gateways.

Challenges for Current Antispam Technologies

Effectiveness and accuracy of antispam solutions vary depending on the specific approach, product, and customer settings as well as their ability to respond to spammers' evasion efforts. For example, faster distribution of spam with different subject lines poses challenges for honey pot signature-based solutions, which take time to identify spam and create and distribute exact signatures of the spam received before they can begin blocking spam. Spoofing of email addresses to make spam appear to be sent from someone else poses challenges for blacklisting/whitelisting techniques. The use of randomized text and images instead of straight text in spam poses challenges for content analysis-based solutions.

Another challenge for antispam solutions is the time required for administration and usage. According to IDC's study of the cost of spam and the value of antispam solutions, companies spend a considerable amount of time dealing with spam, even after antispam solutions are in place. Survey respondents indicated that email users spend 5 minutes and IT staffs spend 19 minutes on average every day dealing with spam. They perform a variety of tasks such as maintaining and updating the solution, reviewing suspected spam, tracking down false positives, and asking or answering questions about spam. To further reduce spam's impact on worker productivity and other resources, organizations will be looking for antispam solutions that provide appropriate levels of automated operation, end-user involvement, effectiveness, accuracy, and up-to-date detection of new forms of spam that have yet to be created.

Recurrent Pattern Detection

Given the nature of spam, it is important to consider an innovative antispam technology: Recurrent Pattern Detection.
This approach to detecting spam relies on the fact that spammers send spam in bulk over a relatively short period of time to satisfy their own or their customers' business needs. Using a set of sophisticated algorithms that analyze Internet traffic at key points around the world for repetitive components in multiple emails, RPD is able to trace a spam outbreak as soon as it begins. The ability to trace spam in the first few minutes of an outbreak is critical in preventing spam from reaching email user inboxes during the early portion of an outbreak of tens or hundreds of millions of spam messages, before detection begins. Once an outbreak is detected, RPD creates and stores a spam "DNA," or hash pattern, in its Spam Pattern Repository. Spam detection engines at customer sites submit the DNA of inbound emails as queries to this repository, which can start classifying email as spam in the first few minutes of an outbreak (see Figure 2). Local copies of the Spam Pattern Repository or caches of completed queries located at customer sites minimize response times.

[FIGURE 2 OMITTED]

Several advantages of the RPD approach enable it to detect and block spam in the first few minutes of an outbreak, unlike other antispam approaches.
First, its proactive, real-time analysis of Internet email traffic and its responses to queries from spam detection engines minimize delays. Second, its reliance on detecting spam outbreaks, rather than individual spam characteristics, means that spam and nothing but spam associated with bulk mailings is blocked.

www.idc.com
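
The general idea behind the Recurrent Pattern Detection section, that the same pattern recurring in bulk within a short window marks an outbreak, can be illustrated with a toy sketch. The paper does not describe RPD's actual algorithms, so everything here is an assumption made for illustration: the class and method names, the use of a SHA-256 hash of whitespace-normalized text as the "DNA," the five-minute window, and the threshold of three sightings.

```python
import hashlib
from collections import defaultdict, deque


class PatternRepository:
    """Toy stand-in for a spam pattern repository (all details hypothetical)."""

    def __init__(self, window_seconds: float = 300.0, min_count: int = 3) -> None:
        self.window_seconds = window_seconds
        self.min_count = min_count
        self._sightings = defaultdict(deque)  # pattern -> recent timestamps
        self._outbreaks = set()               # patterns flagged as outbreaks

    @staticmethod
    def pattern_of(message: bytes) -> str:
        """Hash a case- and whitespace-normalized copy of the message."""
        normalized = b" ".join(message.lower().split())
        return hashlib.sha256(normalized).hexdigest()

    def observe(self, message: bytes, timestamp: float) -> None:
        """Record one sighting; timestamps are assumed to be non-decreasing."""
        pattern = self.pattern_of(message)
        sightings = self._sightings[pattern]
        sightings.append(timestamp)
        # Drop sightings that have fallen out of the time window.
        while timestamp - sightings[0] > self.window_seconds:
            sightings.popleft()
        if len(sightings) >= self.min_count:
            self._outbreaks.add(pattern)

    def is_spam(self, message: bytes) -> bool:
        """Answer a query from a detection engine for one inbound message."""
        return self.pattern_of(message) in self._outbreaks


repo = PatternRepository(window_seconds=300, min_count=3)
bulk = b"Cheap  meds\nclick here"
for t in (0, 10, 20):              # three sightings within 20 seconds
    repo.observe(bulk, t)
newsletter = b"Weekly project update"
for t in (0, 1000, 2000):          # same text, but spread far apart in time
    repo.observe(newsletter, t)

print(repo.is_spam(b"CHEAP meds click   here"))  # same pattern after normalization
print(repo.is_spam(newsletter))
```

An exact hash like this would be defeated by the randomized content described earlier; it is used only to show the observe-then-query flow between traffic analysis, the repository, and the detection engines.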