So I went poking around the Comcast webmail interface and found an option to turn on spam warning messages… So now every time I mark a message as spam I get this:
You are currently reporting this message as Spam. Please note, the spam filters that Comcast have in place will not immediately block any given sender, rather they look for patterns in submissions. Please click the Report button to report this message as Spam.
I still think they are just deleting the messages and not doing anything with them.
I have a Comcast email account, but I never gave the address out (nor did I ever give out my adelphia.net address), so I never check it. Well, tonight I logged in to it for the first time in months and had 374 spam messages in my inbox. Their spam catcher sucks. What is even worse is that their “Report as Spam” button points to the same URL as the “Delete” button does. No wonder there is so much spam in there, if there is no way to build a Bayesian filter. Nice job, Comcast!
After looking into it a little further, it looks like they use Symantec Brightmail for the filtering…
I rewrote the spam statistics stuff the other night and now it is reporting more “accurate” statistics. Just for yesterday alone:
|Mail Statistics for 2006-12-03
|Total Number of Messages:
||262,685 (8.53 %)
||24,511 (0.80 %)
||511,046 (16.59 %)
||1,609,575 (52.26 %)
||0 (0.00 %)
||672,299 (21.83 %)
|Total Bad Messages:
||2,817,431 (91.47 %)
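As a quick sanity check on those numbers, here is a minimal Python sketch that recomputes the percentages from the raw counts. The row labels did not make it into the table, so the code assumes the first row is the clean (non-spam) mail and the other five rows are the "bad" categories. That assumption is mine, but it is consistent with the figures: the last five counts add up to exactly the reported bad total.

```python
# Counts copied from the table above, in order. Row labels are not shown there.
counts = [262_685, 24_511, 511_046, 1_609_575, 0, 672_299]
reported_bad = 2_817_431  # "Total Bad Messages" from the table

total = sum(counts)
# Assumption: the first row is the non-spam mail, the rest are "bad" categories.
bad = sum(counts[1:])


def pct(n, whole):
    """Share of `whole` as a percentage, rounded to two decimals."""
    return round(100 * n / whole, 2)


percentages = [pct(c, total) for c in counts]

print("total messages:", total)
print("bad messages:  ", bad, f"({pct(bad, total)} %)")
print("per-row share: ", percentages)
```

Under that assumption the six rows sum to the day's total, and each recomputed percentage matches the one in the table.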
I was looking at one of our incoming servers tonight for yesterday’s mail stats. Here is what I found:
||Percent of Total
Now if you add up the Aborted/Discarded/Quarantined/Rejected counts you get 462,908 messages, which means about 92% of the mail coming in was flagged as spam. And this is just for 1 of the 7 incoming servers we have. If I were to multiply this by 7 (assuming all were equal, which they are not), there would be about 3.5 million messages coming in, about 3.2 million of them spam.
And based on this article on CNN, it looks like we are right on the money. If only we could block all GIF images in email, I think we could get the catch rate a little higher.
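The back-of-the-envelope math, spelled out as a small Python sketch. The only inputs are the 462,908 count and the rounded "about 92%" figure from the paragraph above. The variable names are mine, and the equal-load assumption is the same one I already admitted is not true.

```python
# Figures from the post; the 92% is a rounded value, so results are approximate.
spam_one_server = 462_908  # Aborted + Discarded + Quarantined + Rejected
spam_share = 0.92          # "about 92%" of incoming mail
servers = 7                # assumes equal load per server, which is not the case

# Back out the total for one server from the spam count and the spam share.
total_one_server = spam_one_server / spam_share

est_total = total_one_server * servers
est_spam = spam_one_server * servers

print(f"estimated total: {est_total / 1e6:.1f} million")
print(f"estimated spam:  {est_spam / 1e6:.1f} million")
```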
Earlier I had started a job to clean up the PostgreSQL database on our main spam machine. Here are some stats:
To reindex 6.4 million messages: 7+ hours
To reimport the header information of those messages back in to PostgreSQL: 38 hours, 28 minutes, 29 seconds.
What is the outcome of this? The spam DB went from 80+ gig down to 7.5 gig and the searches are MUCH faster now. Hopefully this will fix some of the problems we have been having over the last week with people trying to release their spam messages.
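To put those run times in perspective, here is a rough Python sketch that turns them into messages per second. It treats "6.4 million" as an exact count, which it is not. Since the reindex took "7+ hours", its rate is only an upper bound.

```python
from datetime import timedelta

messages = 6_400_000  # "6.4 million", a rounded figure

# Reimport of header information into PostgreSQL.
reimport = timedelta(hours=38, minutes=28, seconds=29)
reimport_seconds = reimport.total_seconds()
reimport_rate = messages / reimport_seconds

# Reindex took at least 7 hours, so this is the fastest it could have been.
reindex_min = timedelta(hours=7)
reindex_max_rate = messages / reindex_min.total_seconds()

print(f"reimport: about {reimport_rate:.0f} messages/second")
print(f"reindex:  at most about {reindex_max_rate:.0f} messages/second")
```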