By orcadk
via improve.dk
Published: Dec 27 2006 / 11:58
Is your name William? Do you normally write mails with the subject "Re: hi"? Are your mails usually 11,304 characters in length? Guess what: you're a spammer! I present to you an article containing a textual analysis of about 15,000 spam mails.
Comments
coboldinosaur replied ago:
An interesting read. Good research, but the author has too much time on his hands if he is reading through that many spams. ;^)
bjupton replied ago:
Learn Perl, my friend. Knocking together the scripts to do such an analysis would be very fast.
orcadk replied ago:
I'm the author... And yes, I do have a lot of time on my hands. Luckily, I've got a swarm of midgets that I outsource the task of reading through the spam to ;)
@bjupton
Why in the world should I learn Perl? This was done by first exporting all the mails from Outlook to a SQL Server database using an Outlook add-in written in .NET (taking care of various issues such as NULL bodies, subjects and so forth). After this, the data was calculated using a combination of SQL queries and custom traversal of all the records using .NET code. It didn't take that long to write the code itself; what takes the time is looping through 15k mails while performing all these tests. Perl wouldn't change anything here. I might post the code for doing the analysis as a next step :)
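For illustration only, the kind of per-mail tally described above could look roughly like this. It is a minimal Python sketch, not the .NET and SQL code that was actually used: the record layout (`sender`, `subject`, `body`), the `summarize` helper and the three sample mails are all made up for the example, and NULL fields are simply treated as empty strings.

```python
from collections import Counter

# Hypothetical sample records; the real mails were stored in SQL Server.
# None stands in for a NULL subject or body.
mails = [
    {"sender": "William Smith", "subject": "Re: hi", "body": "buy now"},
    {"sender": "William Jones", "subject": "Re: hi", "body": None},
    {"sender": "Anna Lee", "subject": None, "body": "cheap watches"},
]


def summarize(mails):
    """Tally sender first names, subjects and body lengths for all mails."""
    first_names = Counter()
    subjects = Counter()
    lengths = []
    for mail in mails:
        # Treat missing (NULL) fields as empty strings.
        sender = (mail.get("sender") or "").strip()
        subject = (mail.get("subject") or "").strip()
        body = mail.get("body") or ""
        if sender:
            # Assumes the first whitespace-separated token is the first name.
            first_names[sender.split()[0]] += 1
        if subject:
            subjects[subject] += 1
        lengths.append(len(body))
    return first_names, subjects, lengths


first_names, subjects, lengths = summarize(mails)
print(first_names.most_common(1))   # most frequent sender first name
print(subjects.most_common(1))      # most frequent subject line
print(sum(lengths) / len(lengths))  # average body length in characters
```

On a real mailbox the slow part would still be the single pass over every record, whatever language does the looping.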
bjupton replied ago:
@orcadk
Yeah, you know, the little indentation of my post compared to coboldinosaur's indicates that it was a response to his post. Obviously you could use a variety of tools to do this, but the first one that I would think of is Perl, since it is the supreme master of text processing and would do this very easily. I don't think he was being serious by saying that you actually read through the mail.
So lighten up, ok? This was a pretty neat analysis.
orcadk replied ago:
Thanks, and sorry if I sounded grumpy; it wasn't my intention :)
bjupton replied ago:
As penance, learn Perl! ;-)
orcadk replied ago:
Sorry, I already know about System.Text.RegularExpressions... Perl ;)