Open Source Applications Foundation

[Design] filters: filter on raw/stripped text switch

Kaitlin Duck Sherwood Thu, 07 Nov 2002 18:09:15 -0800


I would like to request the ability to filter on either the raw text 
of the message *or* on the stripped (plain text) version.  If 
Chandler uses an adaptive filter of some sort, that filter should 
consider both the raw and the stripped text as well.

Why?

Because some spammers are now starting to insert HTML comment tags 
inside their ads, e.g.:
	Ma<!-- cow -->ke Mone<!-- pig -->y Fas<!-- chicken -->t!
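
To illustrate the point, here is a minimal Python sketch (not anything 
Chandler actually does; the name strip_comments is hypothetical) showing 
that a phrase match fails on the raw text but succeeds once HTML 
comments are removed.  It handles only comments, not tags in general:

```python
import re

# Matches an HTML comment, including one that spans several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(raw):
    """Return the message text with HTML comments removed (hypothetical helper)."""
    return COMMENT_RE.sub("", raw)

raw = "Ma<!-- cow -->ke Mone<!-- pig -->y Fas<!-- chicken -->t!"
stripped = strip_comments(raw)

print("Make Money Fast" in raw)       # the raw text hides the phrase
print("Make Money Fast" in stripped)  # the stripped text exposes it
```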

Okay, so why not look only at the stripped text?

Because there is very useful information in the tags.  For example, 
messages that contain embedded images -- which use the IMG tag -- are 
highly likely to be spam.  (I haven't checked all languages, but the 
only English word I know of that contains IMG is the place name 
Primghar, Iowa.)

Similarly, almost all messages that contain
	<iframe src=cid:{something}>
(or differently-spaced permutations thereof) are viruses (trying to 
exploit a weakness in Outlook to execute {something} without the user 
having to click on it).
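
The two tag-based signals above could be checked against the raw text 
roughly as follows.  This is only a sketch under assumptions: the 
function name raw_tag_signals and the exact patterns are hypothetical, 
and the regular expressions are a loose approximation rather than a 
real HTML parser:

```python
import re

# An opening IMG tag, in any letter case, with optional space after "<".
IMG_TAG_RE = re.compile(r"<\s*img\b", re.IGNORECASE)

# An IFRAME tag whose src attribute points at a cid: URL, allowing for
# different spacing and optional quotes around the attribute value.
IFRAME_CID_RE = re.compile(
    r"""<\s*iframe\b[^>]*\bsrc\s*=\s*["']?\s*cid:""", re.IGNORECASE
)

def raw_tag_signals(raw):
    """Report which tag-based signals appear in the raw message text."""
    return {
        "has_img": bool(IMG_TAG_RE.search(raw)),
        "has_iframe_cid": bool(IFRAME_CID_RE.search(raw)),
    }

print(raw_tag_signals("<iframe src=cid:something>"))
print(raw_tag_signals("Greetings from Primghar, Iowa"))
```

Because the IMG pattern requires the opening "<", the word Primghar 
in ordinary text does not trigger it.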

So it is very useful to be able to look at both the raw and stripped text.

Possible alternative: it might be possible to find comment-munged 
spam merely by calculating the ratio of {the number of comments} to 
{the length of the message}.  Spammers could get around that, however, 
by appending word salads to the end of their messages.
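
A rough sketch of that ratio, and of how padding dilutes it, might look 
like this.  The name comment_ratio is hypothetical, message length is 
assumed to mean the character count of the raw text, and no particular 
threshold is implied:

```python
import re

# Matches an HTML comment, including one that spans several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def comment_ratio(raw):
    """Number of HTML comments divided by message length in characters."""
    if not raw:
        return 0.0
    return len(COMMENT_RE.findall(raw)) / len(raw)

munged = "Ma<!-- cow -->ke Mone<!-- pig -->y Fas<!-- chicken -->t!"
# Hypothetical word salad appended by a spammer to dilute the ratio.
padded = munged + " apple river window" * 200

print(comment_ratio(munged))
print(comment_ratio(padded))  # much smaller, although the munging is unchanged
```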



P.S.  I think I'm done dumping my brain now.  *pant pant pant*  You 
should see far fewer postings from me from now on.

-- 
Kaitlin Duck Sherwood
Author of the _Overcome Email Overload_ series, http://www.EmailOverload.com