By orcadk
via improve.dk
Published: Dec 27 2006 / 11:58

Is your name William? Do you normally write mails with the subject "Re: hi"? Are your mails usually 11,304 characters in length? Guess what, you're a spammer! I present to you an article containing a textual analysis of about 15,000 spam mails.

Comments

coboldinosaur replied:

An interesting read. Good research, but the author has too much time on his hands if he is reading through that many spams. ;^)

bjupton replied:

Learn Perl, my friend. Knocking together the scripts to do such an analysis would be very fast.

orcadk replied:

I'm the author... And yes, I do have a lot of time on my hands. Luckily I've got a swarm of midgets that I outsource the task of reading through the spam to ;)

@bjupton
Why in the world should I learn Perl? This was done by first exporting all the mails from Outlook to a SQL Server database using an Outlook add-in written in .NET (taking care of various issues such as NULL bodies, subjects and so forth). After this, the data was calculated using a combination of SQL queries and custom traversal of all the records using .NET code. It didn't take that long to write the code itself; what takes the time is looping through 15k mails while performing all these tests. Perl wouldn't change anything here. I might post the code for doing the analysis in a next step :)
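
For anyone who wants to try something similar, here is a minimal sketch of that kind of tally. It is not the code used for the article (that was .NET against SQL Server): it uses Python with an in-memory SQLite database as a stand-in, and the `mails` table, its column names, the sample rows and the `analyze` function are all hypothetical.

```python
import sqlite3
from collections import Counter


def analyze(rows):
    """Tally first names, subjects and body lengths.

    rows: iterable of (sender_name, subject, body); any field may be None.
    """
    names, subjects, lengths = Counter(), Counter(), []
    for sender_name, subject, body in rows:
        # Treat NULL fields as empty strings instead of failing on them.
        parts = (sender_name or "").split()
        if parts:
            names[parts[0]] += 1
        subjects[(subject or "").strip()] += 1
        lengths.append(len(body or ""))
    return {
        "top_name": names.most_common(1)[0][0] if names else None,
        "top_subject": subjects.most_common(1)[0][0] if subjects else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


# Hypothetical stand-in for the exported mail table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mails (sender_name TEXT, subject TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO mails VALUES (?, ?, ?)",
    [
        ("William Smith", "Re: hi", "a" * 10),
        ("William Jones", "Re: hi", None),  # NULL body
        (None, None, "b" * 20),             # NULL sender and subject
    ],
)

stats = analyze(conn.execute("SELECT sender_name, subject, body FROM mails"))
print(stats)
```

The single pass over the rows mirrors the "traverse every record" step; the simple counts could equally be pushed into SQL with `GROUP BY`.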

bjupton replied:

@orcadk
Yeah, you know, the little indentation of my post compared to coboldinosaur's indicates that it was a response to his post. Obviously you could use a variety of tools to do this, but the first one that I would think of is Perl, since it is the supreme master of text processing and would do this very easily. I don't think he was being serious by saying that you actually read through the mail.

So lighten up, ok? This was a pretty neat analysis.

orcadk replied:

Thanks, and sorry if I sounded grumpy; it wasn't my intention :)

bjupton replied:

As penance, learn Perl! ;-)

orcadk replied:

Sorry, I already know about System.Text.RegularExpressions... Perl ;)
