Apache SpamAssassin is open-source anti-spam software, widely used on Linux-based email systems to identify and filter unsolicited email. It is a complex system, now maintained by the Apache Software Foundation. It is modular and supports custom tests and score adjustments.
How does Apache SpamAssassin arrive at the spam score?
Each email sent to Apache SpamAssassin for checking can go through up to 600 individual tests. Apache SpamAssassin has over 1000 tests available in total, but not all are used. Checks run in priority order, and some checks are skipped depending on whether certain other checks succeed or fail. Each check may add to or subtract from an email’s score. The per-check score is usually very small, typically between ±0.01 and ±0.5. Some of the more obvious tests add or subtract 1.0 to 2.5, and you can also assign a custom score of up to +100 (all matching email is spam) or -100 (no matching email is spam).
As an email is tested, each test score is added to a running total. If the final total reaches the spam threshold set by the administrator or the user, the email is marked as spam. The default threshold is 5.0. The higher you set it, the more “spammy” email is allowed through; the lower you set it, the greater the chance that valid email is flagged as spam.
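The scoring described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Apache SpamAssassin is implemented; the function name `classify` and the test names and scores are made up for the example.

```python
def classify(test_scores, threshold=5.0):
    """Sum the per-test scores and compare the total with the spam threshold."""
    total = sum(score for _name, score in test_scores)
    return round(total, 2), total >= threshold


# Hypothetical test hits: (test name, score). Names and values are illustrative only.
hits = [
    ("SUBJECT_ALL_CAPS_EXAMPLE", 1.5),
    ("MONEY_WORDS_EXAMPLE", 2.5),
    ("HTML_ONLY_EXAMPLE", 0.5),
    ("KNOWN_SENDER_EXAMPLE", -0.5),
]

print(classify(hits))                 # (4.0, False): below the default of 5.0
print(classify(hits, threshold=3.0))  # (4.0, True): a stricter threshold flags it
```

The same email is treated differently depending only on the threshold, which is why that one setting matters so much.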
Apache SpamAssassin uses downloadable rulesets that are updated regularly as the nature of junk mail changes. Rules can be very simple or very complex. You can also combine rules: the individual rules may not score much, but when they match together, the email gets a score boost. Done well, this greatly increases accuracy. Here is an example:
- If you find the word “money” or “dollars” in an email, it may not mean it is spam. Score it 0.01.
- If you find the word “win” or “won” in an email, it also does not mean it is spam. Score it 0.01.
- If you find the word “millions” in an email, it does not mean it is spam. Score it 0.01.
- If you find a combination of all these words and their variants in an email, it has a large chance of being spam. Score it 2.5.
A rule for that last case would use a “regular expression” telling Apache SpamAssassin to check the body of the email for “money” or “dollars”, “win” or “won”, and “millions”, in any order.
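As a rough illustration of the example above, here is a Python sketch that checks a message body for the three word groups in any order. It is not Apache SpamAssassin rule syntax; the names `WORD_GROUPS` and `score_body` are invented for this sketch, it matches only the exact words (no variants), and it assumes the combination score is added on top of the individual scores.

```python
import re

# One pattern per word group; \b keeps "win" from matching inside "window".
WORD_GROUPS = {
    "money": re.compile(r"\b(?:money|dollars)\b", re.IGNORECASE),
    "win": re.compile(r"\b(?:win|won)\b", re.IGNORECASE),
    "millions": re.compile(r"\bmillions\b", re.IGNORECASE),
}
SINGLE_SCORE = 0.01  # each word group on its own
COMBO_SCORE = 2.5    # boost when all three groups appear (assumed to be additive)


def score_body(body):
    """Score a message body using the three word groups and their combination."""
    matched = [name for name, pattern in WORD_GROUPS.items() if pattern.search(body)]
    score = SINGLE_SCORE * len(matched)
    if len(matched) == len(WORD_GROUPS):
        score += COMBO_SCORE
    return round(score, 2)


print(score_body("Please send the money today"))   # 0.01
print(score_body("You won millions of dollars!"))  # 2.53
```

Because each group is searched independently, the order in which the words appear in the email does not matter.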
Why are there negative scores in tests?
Negative scores “improve the credit rating” of an email. Say you are sending a “Special Discount” email to all your customers. Normally, this sort of “Special Offer” email gets an increased spam score. Luckily for you, your clients have added your email address to their “whitelist”. Apache SpamAssassin tests whether your email address is in the whitelist; the answer is “YES”, so the test “adds” -20 to your email’s score, which was sitting at 7.2. Your email now scores 7.2 - 20 = -12.8, which is below the 5.0 threshold, so it is allowed to be delivered.
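The whitelist arithmetic from this example can be written out as a short Python sketch. The function `apply_whitelist` and the address `offers@example.com` are hypothetical and exist only to show the calculation.

```python
def apply_whitelist(score, sender, whitelist, adjustment=-20.0):
    """Add the (negative) whitelist adjustment if the sender is whitelisted."""
    if sender.lower() in whitelist:
        score += adjustment
    return round(score, 1)


whitelist = {"offers@example.com"}  # hypothetical whitelisted address
final = apply_whitelist(7.2, "offers@example.com", whitelist)
print(final, final < 5.0)  # -12.8 True
```

A sender who is not on the whitelist keeps the original 7.2 and is marked as spam at the default threshold.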
Where does the email get blocked?
Apache SpamAssassin itself never blocks email. It only gives a score and makes a recommendation. Your server’s Mail Transfer Agent is then left to decide what to do. Often, spam is marked with “***SPAM***” in the subject. This allows you to filter these emails in your email client, such as Outlook, without the risk of losing an email that should not have been marked as spam (a false positive). Another approach is to push all spam into a Spambox, a special mailbox just for spam. This mailbox usually sits among the end-user’s email folders, which gives the end-user full control. The last approach, more common with outbound spam checking, is to deny or drop mail marked as spam. When mail is denied rather than silently dropped, the sender receives a message informing them that their email has been blocked.
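The three approaches can be summarised as a small decision function. This is a sketch of the logic a mail server might apply after receiving the verdict, not the configuration of any particular Mail Transfer Agent; the function `handle_message`, the policy names, and the folder names are assumptions made for the example.

```python
def handle_message(subject, is_spam, policy="tag"):
    """Return (action, subject, folder) for a message given the spam verdict."""
    if not is_spam:
        return ("deliver", subject, "INBOX")
    if policy == "tag":
        # Mark the subject so the end-user can filter in their mail client.
        return ("deliver", "***SPAM*** " + subject, "INBOX")
    if policy == "spambox":
        # Deliver into a dedicated spam folder the end-user controls.
        return ("deliver", subject, "Spam")
    if policy == "reject":
        # Refuse the message; the sender is told it was blocked.
        return ("reject", subject, None)
    raise ValueError("unknown policy: " + policy)


print(handle_message("Special Offer", True, "tag"))
# ('deliver', '***SPAM*** Special Offer', 'INBOX')
```

Only the "reject" policy can lose a legitimate email outright, which is why tagging or a Spambox is the safer choice for inbound mail.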
Even though the insides of Apache SpamAssassin are far from simple, using it boils down to the threshold score you choose.
Coming soon: “How to Set up Apache SpamAssassin™ in cPanel”