Avoiding the Spam-Bots

Joseph Pelrine
MetaProg

Version 1.2
February 4, 2003

A while back, a client's web site was hit by a spam-bot looking to cull email addresses. Soon after, all the email addresses posted on the site started receiving mail offering free credit cards, free porn, etc. The client asked me if I knew some way to protect the addresses while still letting people use the mail links. They didn't want a CGI script, just some obfuscation in the HTML.

While grumbling under my breath about them being really stupid for posting plain-text email addresses on an indexed web site, I smiled at them, said I'd look at what I could do, and went home to write the following code.

I found a number of solutions on the web, all of which involved putting a large chunk of JavaScript code in the body of the web page - one chunk for each address. Not only were these ugly and not as easily reusable as a function call, they all used document.write, which behaves strangely in some browsers.

Technique 1 - JavaScript

My solution consists of two parts. First, there's a JavaScript function which you should include in the head of your HTML document, or in an external JS file. Second, the link's HREF needs to call this function. Here's the code:

   <script language="JavaScript">
   <!--
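   // Assemble the address from its parts so that neither "mailto:" nor
   // the complete address appears as a literal in the page source.
   function sendMailTo(name, company, domain) {
      var locationstring = 'mai' + 'lto:' + name + '@' + company + '.' + domain;
      window.location.replace(locationstring);
   }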
   //-->
   </script>

You can simply cut this out of this page and paste it into your document. Once it's there, set up your mail link to call this function:

   <a href="javascript:sendMailTo('fred.flintstone','bedrock','com')">fred.flintstone@bedrock.com</a>

If you need to pack some more info into your mail, you can use these extended forms of the sendMailTo() function:

   function sendAnnotatedMailTo(name, company, domain, subject, body) {
      var locationstring = 'mai' + 'lto:' + name + '@' + company + '.' + domain
         + "?subject=" + escape(subject) + "&body=" + escape(body);
      window.location.replace(locationstring);
   }

   function sendFullMailTo(name, company, domain, subject, cc, bcc, body) {
      var locationstring = 'mai' + 'lto:' + name + '@' + company + '.' + domain
         + "?cc=" + cc + "&bcc=" + bcc
         + "&subject=" + escape(subject) + "&body=" + escape(body);
      window.location.replace(locationstring);
   }

Then, copy one of these examples to call the appropriate function:

   <a href="javascript:sendAnnotatedMailTo('fred.flintstone','bedrock','com','the subject','body text')">fred.flintstone@bedrock.com</a>
   <a href="javascript:sendFullMailTo('fred.flintstone','bedrock','com','the subject','cc@mail.com','bcc@mail.com','body text')">fred.flintstone@bedrock.com</a>

This technique eliminates the mailto: URL that most spambots look for. Unfortunately, the latest generation of spambots seems to use regular expression matching to look for text combining @ with .com, .org, etc. These spambots will find and eat up the plain-text address listed above, so we need a second technique to obfuscate it.
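To make the problem concrete, here is a rough sketch of the kind of pattern matching such a spambot might do. The pattern and the function name harvestAddresses are hypothetical, made up for illustration; real harvesters vary and may well be smarter than this.

```javascript
// A naive harvesting pattern (an assumption, not taken from any real bot):
// local part, "@", domain name, ".", top-level domain.
var ADDRESS_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return every substring of the page source that looks like an address.
function harvestAddresses(html) {
   return html.match(ADDRESS_PATTERN) || [];
}

// The mail link from Technique 1, with the address still in plain text.
var plainLink = "<a href=\"javascript:sendMailTo('fred.flintstone','bedrock','com')\">"
   + "fred.flintstone@bedrock.com</a>";

var found = harvestAddresses(plainLink);
// found is ['fred.flintstone@bedrock.com']
```

The HREF gives nothing away, since the address only exists there in pieces, but the visible link text is matched in full.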

Technique 2 - Character Encoding

For this, we take advantage of the fact that every character can be represented in HTML in two ways - an "HTML-encoded" format, and a "URL (or escape)-encoded" format. Let's look at the HTML-encoded format first.

HTML Encoding

This format lets us display characters which would otherwise be interpreted as HTML control characters. The format consists of the two characters &#, the decimal ASCII value of the character, and the character ;. The letter f would be HTML-encoded as &#102;, and the email address fred.flintstone@bedrock.com would look like this (I put in a line break here; it's better to keep it all on one line):

   &#102;&#114;&#101;&#100;&#46;&#102;&#108;&#105;&#110;&#116;&#115;&#116;&#111;&#110;&#101;&#64;
   &#98;&#101;&#100;&#114;&#111;&#99;&#107;&#46;&#99;&#111;&#109;

Of course, this is a pain to calculate, so I wrote a little conversion routine which you can use. Type the email address you want to encode into the text box below, press Convert, and - voila! - there's your HTML-encoded form. Copy this out, and paste it into your HTML page. When you're done, your mail link should look something like this (the line breaks are my fault):

   <a href="javascript:sendMailTo('fred.flintstone','bedrock','com')">&#102;&#114;&#101;&#100;&#46;
	&#102;&#108;&#105;&#110;&#116;&#115;&#116;&#111;&#110;&#101;&#64;&#98;&#101;&#100;&#114;
	&#111;&#99;&#107;&#46;&#99;&#111;&#109;</a>
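If you'd rather do the conversion yourself, the rule described above is easy to sketch in a few lines. This is not the converter from this page, just a minimal illustration; the function name htmlEncode is hypothetical, and the sketch assumes plain ASCII input such as an email address.

```javascript
// Replace every character with "&#", its decimal character code, and ";".
// Assumes plain ASCII input; htmlEncode is a made-up name for illustration.
function htmlEncode(text) {
   var encoded = '';
   for (var i = 0; i < text.length; i++) {
      encoded += '&#' + text.charCodeAt(i) + ';';
   }
   return encoded;
}

var encodedF = htmlEncode('f');    // '&#102;'
var encodedAt = htmlEncode('@');   // '&#64;'
var encodedAddress = htmlEncode('fred.flintstone@bedrock.com');
```

The value of encodedAddress is the same run of codes shown in the examples above, without the line breaks.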

Of course, you also have the option of not using the JavaScript function, but keeping the mailto: URL in the HREF, and HTML-encoding it:

   <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#102;&#114;&#101;&#100;&#46;&#102;&#108;
	&#105;&#110;&#116;&#115;&#116;&#111;&#110;&#101;&#64;&#98;&#101;&#100;&#114;&#111;&#99;
	&#107;&#46;&#99;&#111;&#109;">&#102;&#114;&#101;&#100;&#46;&#102;&#108;&#105;&#110;&#116;
	&#115;&#116;&#111;&#110;&#101;&#64;&#98;&#101;&#100;&#114;
	&#111;&#99;&#107;&#46;&#99;&#111;&#109;</a>

URL Encoding

URL (or escape) encoding is used to transmit URLs, queries, etc., over HTTP. The two terms are used interchangeably - the only difference is that URL encoding translates a space as "+" and escape encoding translates it as "%20". Escape encoding gets its name from the built-in JavaScript function pair escape() and unescape(), which implement it. This type of encoding converts each character to its hexadecimal character code, and precedes it with a %.
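The difference in how a space is handled is easy to see in a quick sketch. Note that URLSearchParams is a much newer API than the rest of the code on this page; it is used here only as a convenient way to produce the form-style "+" encoding, and the variable names are just for illustration.

```javascript
var subject = 'the subject';

// escape() writes the space as %20. This is the form the mail functions
// in Technique 1 put into the mailto: URL.
var escaped = escape(subject);
// escaped is 'the%20subject'

// Form-style URL encoding writes the space as "+" instead.
var formEncoded = new URLSearchParams({ subject: subject }).toString();
// formEncoded is 'subject=the+subject'

// unescape() reverses escape().
var roundTrip = unescape(escaped);
// roundTrip is 'the subject'
```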

Applying escape encoding to our example email address results in the following:

   %66%72%65%64%2E%66%6C%69%6E%74%73%74%6F%6E%65%40%62%65%64%72%6F%63%6B%2E%63%6F%6D
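One catch if you want to generate this form yourself: escape() leaves letters, digits and a few punctuation characters (including @ and .) untouched, so escape('fred') simply returns 'fred'. To encode every character, you have to build the string by hand. Here is a minimal sketch, assuming plain ASCII input; the function name escapeEncodeAll is hypothetical.

```javascript
// Replace every character with "%" and its two-digit hexadecimal code.
// Assumes plain ASCII input; escapeEncodeAll is a made-up name.
function escapeEncodeAll(text) {
   var encoded = '';
   for (var i = 0; i < text.length; i++) {
      var hex = text.charCodeAt(i).toString(16).toUpperCase();
      if (hex.length < 2) {
         hex = '0' + hex;   // pad codes below 0x10 to two digits
      }
      encoded += '%' + hex;
   }
   return encoded;
}

var encodedAddress = escapeEncodeAll('fred.flintstone@bedrock.com');
// encodedAddress is the string shown above, starting with %66%72%65%64
```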

Inside an HREF you can use either encoding, so feel free to mix and match there as you see fit! Escape encoding only works in URLs, though, so the visible link text still needs HTML encoding. If you have any comments, you can reach me at jpelrine@metaprog.com. And - of course - the previous address is encoded <grin>!

Cheers
Joseph Pelrine


Enter your email address (or other text) here
Press the magic button ->
Copy the HTML-encoded text into your HTML page
or copy the URL-encoded text into your HTML page