[Design] filters: filter on raw/stripped text switch
Kaitlin Duck Sherwood
Thu, 07 Nov 2002 18:09:15 -0800
I would like to request the ability to filter on either the raw text
of the message *or* on the stripped (plain text) version. If
Chandler uses an adaptive filter of some sort, it needs to consider
both the raw and the stripped text as well.
Why?
Because some spammers are now starting to insert HTML comment tags
inside their ads, e.g.:
Ma<!-- cow -->ke Mone<!-- pig -->y Fas<!-- chicken -->t!
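To make the problem concrete, here is a minimal Python sketch of why a phrase match fails on the raw text but succeeds on the stripped text. The function name `strip_markup` and the regular expressions are assumptions for illustration only; a real mail client would more likely use a proper HTML parser.

```python
import re

# Hypothetical helpers: naive patterns that are good enough for this example.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # HTML comments
TAG_RE = re.compile(r"<[^>]+>")                    # any remaining tag

def strip_markup(raw):
    """Return a rough plain-text version of an HTML message body."""
    without_comments = COMMENT_RE.sub("", raw)
    return TAG_RE.sub("", without_comments)

raw = "Ma<!-- cow -->ke Mone<!-- pig -->y Fas<!-- chicken -->t!"
stripped = strip_markup(raw)

# The phrase is hidden in the raw text but visible once comments are removed.
print("make money fast" in raw.lower())       # False
print("make money fast" in stripped.lower())  # True
```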
Okay, so why not only look for items in the stripped text?
Because there is very useful information in the tags. For example,
messages that contain embedded images -- which use the IMG tag -- are
highly likely to be spam. (I haven't checked all languages, but the
only English word that contains IMG is Primghar, Iowa.)
Similarly, almost all messages that contain
<iframe src=cid:{something}>
(or differently-spaced permutations thereof) are viruses (trying to
exploit a weakness in Outlook to execute {something} without the user
having to click on it).
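As a rough illustration, these two raw-text signals could be checked with something like the Python sketch below. The name `raw_signals` and both patterns are assumptions of mine, not part of any existing filter; the iframe pattern tries to tolerate different spacing and optional quotes.

```python
import re

# Hypothetical patterns; both only make sense against the raw (unstripped) text.
IMG_RE = re.compile(r"<\s*img\b", re.IGNORECASE)
IFRAME_CID_RE = re.compile(
    r"<\s*iframe\b[^>]*\bsrc\s*=\s*[\"']?\s*cid:", re.IGNORECASE
)

def raw_signals(raw):
    """Report which tag-based signals appear in the raw message text."""
    return {
        "has_img": bool(IMG_RE.search(raw)),
        "has_iframe_cid": bool(IFRAME_CID_RE.search(raw)),
    }

print(raw_signals('<iframe src=cid:attachment1>'))
print(raw_signals('<IMG SRC="http://example.com/ad.gif">'))
print(raw_signals("Greetings from Primghar, Iowa"))
```

Because the patterns require the opening angle bracket, a stripped version of the message would never match them, which is the reason the filter needs access to the raw text.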
So it is very useful to be able to look at both the raw and stripped text.
Possible alternative: it might be possible to find comment-munged
spam merely by calculating the ratio of {the number of comments} to
{the length of the message}. Spammers could get around that by
appending word salads to the end of the messages, however.
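Here is a small Python sketch of that ratio, and of how padding dilutes it. The function name `comment_ratio` and the choice of characters as the unit of length are assumptions; I have not picked any threshold, since that would need real data.

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def comment_ratio(raw):
    """Number of HTML comments divided by message length in characters."""
    if not raw:
        return 0.0
    return len(COMMENT_RE.findall(raw)) / len(raw)

munged = "Ma<!-- cow -->ke Mone<!-- pig -->y Fas<!-- chicken -->t!"
# Hypothetical "word salad" appended by the spammer to dilute the ratio.
padded = munged + " lorem ipsum dolor" * 200

print(comment_ratio(munged))  # relatively high
print(comment_ratio(padded))  # much lower, though the ad is unchanged
```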
P.S. I think I'm done dumping my brain now. *pant pant pant* You
should see far fewer postings from me from now on.
--
Kaitlin Duck Sherwood
Author of the _Overcome Email Overload_ series, http://www.EmailOverload.com