Fighting spam and bots

Hey, I’m actually writing about a programming-related concept!

Pretty much anyone who runs a public blog or site (like Mark Langenfeld, Nick Davis, MySpace, Dream.in.Code, etc.) knows that spam can be a problem. Fortunately, there are ways to combat it. With WordPress, there is Akismet, which I’ve found very helpful in keeping my blog clean. My blog isn’t very big and doesn’t get many hits a month, so I can send comments off to a third party for review. But for larger sites with thousands of hits, the programmer in me wonders what could be done to minimize traffic and decrease waiting time, all while filtering spam better. Enter my idea, which basically scrambles the form for the client so that the various fields, options and values aren’t easily parsed by bots.

When a client requests a page, the server assigns the client a session ID, generates a mapping of form fields to random names, and stores that mapping against the session. It then sends the page with the randomized form fields to the client. The client fills out the form and sends it back. Since humans read the rendered form (and don’t know or care about the underlying field names), their data should land consistently in the right fields, whereas a bot should only be able to guess (poorly) or fill in the same thing for every field. After that, mapping each field back to its real name and running a battery of tests on what it should contain would be fairly easy, and should give a much better idea of what is legit and what is spam.
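To make the flow described above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the field names in `REAL_FIELDS`, the in-memory `SESSIONS` dict standing in for a real session store, and the helper names `issue_form` and `decode_submission` are all hypothetical.

```python
import secrets

# Hypothetical real field names for a comment form.
REAL_FIELDS = ["name", "email", "comment"]

# In-memory stand-in for a session store:
# session ID -> {random field name: real field name}.
SESSIONS = {}


def issue_form():
    """Create a session and a random alias for each real field."""
    session_id = secrets.token_hex(16)
    mapping = {"f_" + secrets.token_hex(8): real for real in REAL_FIELDS}
    SESSIONS[session_id] = mapping
    return session_id, mapping


def decode_submission(session_id, posted):
    """Translate posted random names back to real names.

    Returns None if the session is unknown (or already used), or if the
    posted field names do not match the aliases issued for that session.
    """
    mapping = SESSIONS.pop(session_id, None)
    if mapping is None or set(posted) != set(mapping):
        return None
    return {mapping[alias]: value for alias, value in posted.items()}
```

The server would render the form using the keys of `mapping` as the field names, and call `decode_submission` on the POST data before running any content checks. A submission that uses the real names, or reuses an old session, comes back as `None` and can be treated as suspect.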

This is not without problems, though. For one, the session might expire, in which case the server couldn’t map the randomized form fields back to their real names and the user couldn’t finish the process. For another, bots could eventually be made smarter and learn to figure out a well-known interface. For instance, a bot could rely on the order in which the form fields occur, their position relative to other document elements, and the labels and other cues that have to be there so that humans can understand the form.
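The session-expiry problem is easy to see in a small sketch. This is just an illustration with made-up names: `SESSION_TTL_SECONDS`, the in-memory `STORE`, and the helpers `save_mapping` and `load_mapping` are assumptions, not part of any particular framework.

```python
import secrets
import time

SESSION_TTL_SECONDS = 1800  # hypothetical session lifetime (30 minutes)

# In-memory stand-in for a session store:
# session ID -> (expiry timestamp, {random field name: real field name}).
STORE = {}


def save_mapping(mapping, now=None):
    """Store a field mapping and return the new session ID."""
    now = time.time() if now is None else now
    session_id = secrets.token_hex(16)
    STORE[session_id] = (now + SESSION_TTL_SECONDS, mapping)
    return session_id


def load_mapping(session_id, now=None):
    """Return the mapping, or None if the session is unknown or expired."""
    now = time.time() if now is None else now
    entry = STORE.get(session_id)
    if entry is None:
        return None
    expires_at, mapping = entry
    if now >= expires_at:
        del STORE[session_id]  # the mapping is gone for good
        return None
    return mapping
```

Once `load_mapping` returns `None`, the submitted values can no longer be matched to real fields, so about all the server can do is send a fresh form and ask the user to fill it in again.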

This being a rather new idea of mine, I’m still trying to figure out other strengths, weaknesses, and ways to address the issues. I also don’t know if anyone has tried this before. I think that this could be a neat and effective way of filtering out spam if implemented well.
