Learning with rspamd

anatoli · March 29, 2022, 9:42pm

What should I feed rspamd to make it learn ham? Does it accept a .eml file, or does it work with copy-pasted text?

anatoli · March 29, 2022, 9:49pm

Raw source, so basically a .eml file downloaded through Roundcube works fine. Copy and paste the contents of the raw .eml file into the learn box on rspamd; “scan message” and then “upload HAM” does the trick.

Instructions are clear on the Scan/Learn page.

msaladna · March 29, 2022, 9:52pm

You can drag and drop email into your Spam folder or use rspamc learn_ham to feed directly from a Maildir (~/Mail/.MAILBOX-NAME/cur).
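As a rough sketch of the rspamc route, the wrapper below checks that the Maildir exists before handing it to rspamc learn_ham. The function name learn_ham_from_maildir and the example folder name are made up for illustration; the ~/Mail/.MAILBOX-NAME/cur layout is the one mentioned above, and it assumes rspamc is on the PATH and can reach the rspamd controller.

```shell
# Hypothetical helper: learn every message in one Maildir folder as ham.
# $1 is the folder name without the leading dot, e.g. "Archive".
learn_ham_from_maildir() {
    dir="$HOME/Mail/.$1/cur"
    if [ ! -d "$dir" ]; then
        echo "no such Maildir: $dir" >&2
        return 1
    fi
    # rspamc accepts a directory and processes the files inside it.
    rspamc learn_ham "$dir"
}

# Usage (folder name is an example only):
#   learn_ham_from_maildir Archive
```

Swapping learn_ham for learn_spam in the last line of the function gives the spam equivalent.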

On my mail server, since I’m the only consumer of mail, I’ve changed the behavior so that any mail moved to Trash is automatically learned as spam.

cpcmd scope:set cp.bootstrapper dovecot_learn_spam_folder "{{ dovecot_imap_root }}Trash"
upcp -sb mail/configure-dovecot

anatoli · March 29, 2022, 9:57pm

I’m looking to learn HAM, as there’s some dude who has a privacy note in his mail signature, which increases the SPAM score for some reason. Removing that text works flawlessly instead…

The trash mail tip is pretty interesting though, love it! I see that the blog post has pretty interesting info too, will definitely give rspamc learn_ham a try.

msaladna · March 29, 2022, 10:24pm

Keep a pristine inbox of mail; for me it’s the primary inbox. Feed this back to rspamd through rspamc learn_ham (or learn_spam). Remember that spam and ham need to be balanced for Bayes to be most effective. If 500 hams are fed, then 500 spams should be fed as well to reduce errors.
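To make the balance point concrete, here is a minimal sketch that counts the messages in a ham folder and a spam folder before feeding them and reports how many more of either kind would even things out. The function names count_messages and balance_report are invented for this example, and it simply assumes one message per file, as in a Maildir cur directory.

```shell
# Count regular files (one message per file) under a directory.
count_messages() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Compare a ham directory ($1) with a spam directory ($2).
balance_report() {
    ham=$(count_messages "$1")
    spam=$(count_messages "$2")
    echo "ham=$ham spam=$spam"
    if [ "$ham" -gt "$spam" ]; then
        echo "feed $((ham - spam)) more spam"
    elif [ "$spam" -gt "$ham" ]; then
        echo "feed $((spam - ham)) more ham"
    else
        echo "balanced"
    fi
}

# Usage (paths are examples only):
#   balance_report ~/Mail/cur ~/Mail/.Spam/cur
```

This only looks at what is about to be fed; rspamc stat should show what the classifier has already learned.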

The Enron Corpus is still the best source of human-generated mail, but it only works well if your email is predominantly English. Unzip, then run rspamc learn_ham path/to/emails.