.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "SA-LEARN-CYRUS 8"
.TH SA-LEARN-CYRUS 8 "2011-11-10" "perl v5.14.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
sa\-learn\-cyrus \- Train SpamAssassin with spam/ham from users' IMAP mailboxes
.SH "USAGE"
.IX Header "USAGE"
sa-learn-cyrus [ options ] user\-name(s)
.PP
.Vb 1
\&  user\-name(s)            One or more user/mailbox name(s).
\&
\&  options:
\&  \-\-help                  Prints a brief help message and exits.
\&  \-h
\&
\&  \-\-man                   Prints the manual page and exits.
\&
\&  \-\-verbose level         Be verbose if level > 0.
\&  \-v level
\&
\&  \-\-config file           Use a configuration file other than the default
\&  \-c file                 one.
\&
\&  \-\-sa\-debug              Run sa\-learn in debug mode.
\&  \-d
\&
\&  \-\-simulate              Run in simulation mode (show commands only).
\&  \-s
\&
\&  \-\-imap\-domains domains  Search mailboxes in list of domains.
\&  \-D domains
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\fBsa-learn-cyrus\fR feeds spam and non-spam (ham) messages to SpamAssassin's
database. Its main purpose is to train \s-1SA\s0's Bayes database with spam/ham
messages that the mailbox owners have sorted into special subfolders.
.PP
It is intended for small mail systems (e.g. a home office) with a single
server-wide \s-1SA\s0 configuration.
.PP
Running \fBsa-learn-cyrus\fR at regular intervals (e.g. as a cron job) may
improve \s-1SA\s0's hit rate considerably, provided that the users are well
instructed about what to move to their ham/spam folders and what not to.
.SH "FUNCTION"
.IX Header "FUNCTION"
\&\fBsa-learn-cyrus\fR scans the local mail spool used by Cyrus IMAPd for special
subfolders. These subfolders are supposed to contain mails which the mailbox
owners have classified as spam or ham.
.PP
Example: The users move spam mails which SpamAssassin has not tagged as spam
(false negatives) to a subfolder \fI\s-1INBOX\s0.Learn.Spam\fR. Other mails,
which \s-1SA\s0 might classify as spam in the future because of certain
characteristics, are copied to a subfolder \fI\s-1INBOX\s0.Learn.Ham\fR.
.PP
\&\fBsa-learn-cyrus\fR feeds the content of these spam/ham folders to \s-1SA\s0's
Bayes database using the \fBsa-learn\fR tool, which is shipped with the
SpamAssassin package.
.PP
Afterwards, these mails can optionally be deleted by means of \fBipurge\fR, a
helper tool that comes with the Cyrus IMAPd package.
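.PP
For a single folder, the two steps of locating a spam/ham subfolder and
feeding its messages to \fBsa-learn\fR can be sketched in Python. This is a
minimal illustration, not the code of \fBsa-learn-cyrus\fR. The function names
and the example paths are hypothetical. Domain support and
\&\f(CW\*(C`unixhierarchysep\*(C'\fR are not handled. The sketch assumes that
Cyrus stores each message as a file named after its \s-1UID\s0 followed by a
dot (e.g. \f(CW\*(C`42.\*(C'\fR). The directory layout follows the
\&\f(CW\*(C`base_dir\*(C'\fR and \f(CW\*(C`initial_letter\*(C'\fR parameters of
the [imap] section. Like simulation mode, the sketch only prints the
\&\fBsa-learn\fR command instead of running it.
.PP
.Vb 44
\&import os
\&import re
\&from typing import List
\&
\&# Sketch only: the names below are hypothetical and not taken from
\&# sa\-learn\-cyrus. Cyrus stores each message as a file named "<uid>."
\&# (e.g. "42."); metadata files such as cyrus.index do not match.
\&MESSAGE_FILE = re.compile(r"^[0\-9]+[.]$")
\&
\&
\&def folder_dir(base_dir: str, user: str, folder: str,
\&               initial_letter: bool = True) \-> str:
\&    """Spool directory of one mailbox subfolder (no domain support).
\&
\&    initial_letter = yes:  <base_dir>/j/user/joe/<folder path>
\&    initial_letter = no:   <base_dir>/user/joe/<folder path>
\&    The folder name uses the Cyrus separator ".", e.g. "Learn.Spam".
\&    Assumes that the user name starts with a letter.
\&    """
\&    parts = [base_dir]
\&    if initial_letter:
\&        parts.append(user[0].lower())
\&    parts += ["user", user] + folder.split(".")
\&    return os.path.join(*parts)
\&
\&
\&def learn_command(learn_cmd: str, kind: str, directory: str,
\&                  names: List[str]) \-> List[str]:
\&    """Build an sa\-learn call for the message files among names."""
\&    if kind not in ("spam", "ham"):
\&        raise ValueError("kind must be spam or ham")
\&    # Keep only message files, ordered numerically by UID.
\&    files = sorted((n for n in names if MESSAGE_FILE.match(n)),
\&                   key=lambda n: int(n[:\-1]))
\&    paths = [os.path.join(directory, n) for n in files]
\&    return [learn_cmd, "\-\-" + kind] + paths
\&
\&
\&if __name__ == "__main__":
\&    d = folder_dir("/var/spool/imap", "joe", "Learn.Spam")
\&    # A real run would read the file names with os.listdir(d).
\&    names = ["cyrus.header", "cyrus.index", "10.", "2."]
\&    # Like simulation mode: print the command instead of running it.
\&    print(" ".join(learn_command("/usr/bin/sa\-learn", "spam", d, names)))
.Ve
.PP
With the fixed file list above, the sketch prints one \fBsa-learn\fR command
for the message files \f(CW\*(C`2.\*(C'\fR and \f(CW\*(C`10.\*(C'\fR, ordered by
\&\s-1UID\s0. The \f(CW\*(C`cyrus.*\*(C'\fR metadata files are skipped.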
.SH "ARGUMENTS"
.IX Header "ARGUMENTS"
\&\fBsa-learn-cyrus\fR optionally takes a list of mailbox/user names as arguments:
.PP
.Vb 1
\&  sa\-learn\-cyrus fred wilma fritz hjb
.Ve
.PP
If no names are supplied, all mailboxes found are processed.
.SH "OPTIONS"
.IX Header "OPTIONS"
All options supplied on the command line override the corresponding
parameters given in the configuration file.
.PP
Please note that the basic parameters of sa-learn-cyrus have to be defined in
a configuration file; sa-learn-cyrus cannot be controlled by command-line
options alone.
.IP "\fB\-\-config file, \-c file\fR" 4
.IX Item "--config file, -c file"
Use a configuration file other than the default one. Always adapt the
configuration file to your needs before using sa-learn-cyrus on a live system.
Otherwise you may lose data or corrupt your \s-1SA\s0 database!
.IP "\fB\-\-verbose level, \-v level\fR" 4
.IX Item "--verbose level, -v level"
Specify the level of verbosity (default: 0).
.IP "\fB\-\-sa\-debug, \-d\fR" 4
.IX Item "--sa-debug, -d"
Run sa-learn in debug mode. This may be useful to examine problems with
sa-learn.
.IP "\fB\-\-simulate, \-s\fR" 4
.IX Item "--simulate, -s"
Run \fBsa-learn-cyrus\fR in simulation mode. This is useful for first tests
after the initial configuration or if problems are encountered. In simulation
mode \fBsa-learn-cyrus\fR neither executes any system commands nor touches any
data. It just displays what it would do.
.IP "\fB\-\-imap\-domains list-of-domains, \-D list-of-domains\fR" 4
.IX Item "--imap-domains list-of-domains, -D list-of-domains"
If your Cyrus installation uses \*(L"domain support\*(R", use this option to
specify the domains to be searched.
.Sp
.Vb 1
\&  \-\-imap\-domains example.com,another.org
.Ve
.Sp
is equivalent to
.Sp
.Vb 4
\&  [imap]
\&  ...
\&  domains = example.com another.org
\&  ...
.Ve
.Sp
in the configuration file.
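.PP
A minimal Python sketch can illustrate the precedence rule and the two ways
of writing a domain list. The function names and the representation of
settings as dictionaries are assumptions made for this example, not the
internals of \fBsa-learn-cyrus\fR. An option that was not given on the command
line is represented as \f(CW\*(C`None\*(C'\fR.
.PP
.Vb 31
\&from typing import Dict, List, Optional
\&
\&# Hypothetical helpers for illustration, not the code of sa\-learn\-cyrus.
\&
\&
\&def domain_list(value: Optional[str]) \-> List[str]:
\&    """Split a domain list written either way the manual shows it:
\&    comma\-separated ("example.com,another.org", command line) or
\&    space\-separated ("example.com another.org", [imap] section).
\&    """
\&    if not value:
\&        return []
\&    return value.replace(",", " ").split()
\&
\&
\&def effective_settings(config: Dict[str, str],
\&                       cli: Dict[str, Optional[str]]) \-> Dict[str, str]:
\&    """Command\-line values override the configuration file; an option
\&    that was not given (None) keeps the value from the file."""
\&    merged = dict(config)
\&    for key, value in cli.items():
\&        if value is not None:
\&            merged[key] = value
\&    return merged
\&
\&
\&if __name__ == "__main__":
\&    config = {"domains": "example.com another.org", "verbose": "1"}
\&    cli = {"domains": "foo.com,bar.org", "verbose": None}
\&    settings = effective_settings(config, cli)
\&    print(domain_list(settings["domains"]), settings["verbose"])
.Ve
.PP
The example prints the domains foo.com and bar.org, taken from the command
line, together with the verbosity level 1, taken from the configuration file.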
.SH "CONFIGURATION"
.IX Header "CONFIGURATION"
By default \fBsa-learn-cyrus\fR expects its configuration file at
\&\fI/etc/spamassassin/sa\-learn\-cyrus.conf\fR.
.PP
If another default file is wanted, this setting has to be changed in the code.
A file other than the default can always be chosen with the
\&\f(CW\*(C`\-\-config\*(C'\fR option.
.PP
A sample configuration file is shipped with sa-learn-cyrus.
.SS "Format"
.IX Subsection "Format"
The configuration file has a format known from rsync or Samba, which is very
similar to the format of Windows ini files. The file consists of a sequence of
sections. Each section begins with a section name, a word in square brackets,
e.g. \f(CW\*(C`[global]\*(C'\fR. The section entries are parameters, i.e.
key/value pairs, each on a single line. Key and value are separated by an
equal sign, like
.PP
.Vb 1
\&  key = value
.Ve
.PP
The value is a single word or a list of words, each of them representing a
number or a string. Words may be surrounded by any number of spaces for better
readability. Empty lines and lines with a leading hash character
\&\f(CW\*(C`#\*(C'\fR are ignored.
.SS "Section [global]"
.IX Subsection "Section [global]"
The [global] section contains all global control parameters.
.IP "\fBtmp_dir = temporary-directory\fR" 4
.IX Item "tmp_dir = temporary-directory"
\&\fBsa-learn-cyrus\fR creates some temporary files during each run. This is the
directory where these files are created.
.IP "\fBlock_file = full-path-to-lock-file\fR" 4
.IX Item "lock_file = full-path-to-lock-file"
To avoid race conditions, \fBsa-learn-cyrus\fR uses a simple file locking
mechanism. Each new sa-learn-cyrus process looks for this file before it
actually does anything. If the file exists, the process exits with a warning,
assuming that another sa-learn-cyrus process is running.
.IP "\fBverbose = level\fR" 4
.IX Item "verbose = level"
The level of verbosity. Values range from 0 (low) to 3 (high).
A reasonable level to start with is 1.
.IP "\fBsimulate = yes|no\fR" 4
.IX Item "simulate = yes|no"
After the first customization of the configuration, \fBsa-learn-cyrus\fR should
be run in simulation mode (\f(CW\*(C`simulate = yes\*(C'\fR). This avoids loss of
data or corruption of \s-1SA\s0's database in case of wrongly configured
parameters.
.IP "\fBlog_with_tag = yes|no\fR" 4
.IX Item "log_with_tag = yes|no"
Prepend the output (log) with a tag (date, time, pid). Set to
\&\f(CW\*(C`no\*(C'\fR to avoid additional tagging when the output is piped to
syslog. Default is \f(CW\*(C`yes\*(C'\fR.
.SS "Section [mailbox]"
.IX Subsection "Section [mailbox]"
Section [mailbox] contains all parameters to select the mailboxes, to specify
the special subfolders, and to define the actions to apply.
.IP "\fBinclude_list = list-of-mailboxes\fR" 4
.IX Item "include_list = list-of-mailboxes"
Only spam/ham mails of these mailboxes are fed to SpamAssassin's database. If
this list is empty, all mailboxes are used. \f(CW\*(C`include_list\*(C'\fR may be
used instead of the list on the command line.
.Sp
Example:
.Sp
.Vb 1
\&  include_list = fred wilma fritz hjb
.Ve
.IP "\fBinclude_regexp = regular-expression\fR" 4
.IX Item "include_regexp = regular-expression"
If include_list is empty, the regular expression given here is applied to all
mailbox names to select mailboxes. This parameter is ignored if include_list
is not empty.
.Sp
Example: Include all mailboxes beginning with 'knf\-'.
.Sp
.Vb 1
\&  include_regexp = ^knf\-
.Ve
.IP "\fBexclude_list = list-of-mailboxes\fR" 4
.IX Item "exclude_list = list-of-mailboxes"
A list of mailboxes which are excluded. This parameter is ignored if
include_list is not empty.
.IP "\fBexclude_regexp = regular-expression\fR" 4
.IX Item "exclude_regexp = regular-expression"
Mailbox names which match this regular expression are excluded from
processing.
.Sp
Example: Ignore all mailboxes ending with '.beie'.
.Sp
.Vb 1
\&  exclude_regexp = \e.beie$
.Ve
.IP "\fBspam_folder = folder-name\fR" 4
.IX Item "spam_folder = folder-name"
The name of the special subfolder in each mailbox which contains spam. The
name should be a complete folder path relative to the root folder \s-1INBOX\s0.
The Cyrus nomenclature is applied (same as with cyradm).
.Sp
Example:
.Sp
.Vb 1
\&  spam_folder = Learn.Spam
.Ve
.Sp
This is a subfolder in a folder tree like this:
.Sp
.Vb 8
\&  INBOX
\&   +\-\-Drafts
\&   +\-\-Templates
\&   +\-\-Sent
\&   +\-\-Learn
\&   |  +\-\-Ham
\&   |  +\-\-Spam    <\-\- spam subfolder
\&   |
.Ve
.IP "\fBham_folder = folder-name\fR" 4
.IX Item "ham_folder = folder-name"
The name of the special subfolder in each mailbox which contains ham. (Same
naming scheme as with \f(CW\*(C`spam_folder\*(C'\fR, see above.)
.IP "\fBremove_spam = yes|no\fR" 4
.IX Item "remove_spam = yes|no"
Whether the spam messages in the \f(CW\*(C`spam_folder\*(C'\fR are removed after
they have been fed to the \s-1SA\s0 database.
.IP "\fBremove_ham = yes|no\fR" 4
.IX Item "remove_ham = yes|no"
Whether the ham messages in the \f(CW\*(C`ham_folder\*(C'\fR are removed after
they have been fed to the \s-1SA\s0 database.
.SS "Section [sa]"
.IX Subsection "Section [sa]"
SpamAssassin (\s-1SA\s0) configuration items.
.IP "\fBsite_config_path = path\fR" 4
.IX Item "site_config_path = path"
Path to the system-wide \s-1SA\s0 preferences.
.Sp
Example:
.Sp
.Vb 1
\&  site_config_path = /etc/spamassassin
.Ve
.IP "\fBbayes_storage = berkely|sql\fR" 4
.IX Item "bayes_storage = berkely|sql"
Bayes storage mechanism (berkely|sql):
.Sp
berkely: Berkeley \s-1DB\s0 (default)
.Sp
sql: \s-1SQL\s0 database
.IP "\fBprefs_file = file\fR" 4
.IX Item "prefs_file = file"
Path of the system-wide \s-1SA\s0 configuration file.
.Sp
Example:
.Sp
.Vb 1
\&  prefs_file = /etc/spamassassin/local.cf
.Ve
.IP "\fBlearn_cmd = path\fR" 4
.IX Item "learn_cmd = path"
Path to the sa-learn utility.
.Sp
Example:
.Sp
.Vb 1
\&  learn_cmd = /usr/bin/sa\-learn
.Ve
.IP "\fBfix_db_permissions = yes|no\fR" 4
.IX Item "fix_db_permissions = yes|no"
Should the permissions of the \s-1DB\s0 files be fixed? Ignored unless
\&\f(CW\*(C`bayes_storage = berkely\*(C'\fR.
.IP "\fBuser = user-id\fR" 4
.IX Item "user = user-id"
The user id \s-1SA\s0 runs as. Required if \f(CW\*(C`fix_db_permissions = yes\*(C'\fR.
.Sp
Example:
.Sp
.Vb 1
\&  user = mail
.Ve
.IP "\fBgroup = group-id\fR" 4
.IX Item "group = group-id"
The group id \s-1SA\s0 runs as. Required if \f(CW\*(C`fix_db_permissions = yes\*(C'\fR.
.Sp
Example:
.Sp
.Vb 1
\&  group = mail
.Ve
.IP "\fBsync_once = yes|no\fR" 4
.IX Item "sync_once = yes|no"
Skip synchronization after every change of the database and sync only once
after all messages have been learned. This may speed up learning from many
folders. Default is \f(CW\*(C`yes\*(C'\fR.
.IP "\fBvirtual_config_dir = pattern\fR" 4
.IX Item "virtual_config_dir = pattern"
Set this if you use the \f(CW\*(C`\-\-virtual\-config\-dir\*(C'\fR option of
\&\f(CW\*(C`spamd\*(C'\fR (the value needs to match exactly). See the
\&\f(CW\*(C`spamd\*(C'\fR man page for more information.
.IP "\fBdebug = yes|no\fR" 4
.IX Item "debug = yes|no"
Run sa-learn in debug mode or not. \f(CW\*(C`debug = yes\*(C'\fR may be useful to
examine problems.
.SS "Section [imap]"
.IX Subsection "Section [imap]"
The section [imap] contains the parameters needed to locate and manage the
(Cyrus) IMAPd spool files.
.IP "\fBbase_dir = dir\fR" 4
.IX Item "base_dir = dir"
The base directory of the \s-1IMAP\s0 spool, below which the mailboxes are
located.
.IP "\fBinitial_letter = yes|no\fR" 4
.IX Item "initial_letter = yes|no"
If base_dir is divided into subdirectories named after the initial letters of
the mailbox names, set \f(CW\*(C`initial_letter = yes\*(C'\fR (default); otherwise
set it to no.
.Sp
Examples for joe's mailbox:
.Sp
.Vb 2
\&  /j/user/joe/ : initial_letter = yes
\&  /user/joe/   : initial_letter = no
.Ve
.IP "\fBdomains = list-of-domains\fR" 4
.IX Item "domains = list-of-domains"
If your Cyrus spool uses a domain hierarchy, supply a list of domains. If
domain support is not used, leave this entry empty. The
\&\f(CW\*(C`initial_letter\*(C'\fR option (see above) is applied to domains, too.
.Sp
Example for the mailboxes fritz@bar.org and joe@foo.com:
.Sp
The mail files within the Cyrus spool are located at
.Sp
.Vb 2
\&  /domain/b/bar.org/f/fritz
\&  /domain/f/foo.com/j/joe
.Ve
.Sp
List the domains as
.Sp
.Vb 1
\&  domains = foo.com bar.org
.Ve
.IP "\fBunixhierarchysep = yes|no\fR" 4
.IX Item "unixhierarchysep = yes|no"
Choose \f(CW\*(C`unixhierarchysep = yes\*(C'\fR if Cyrus is configured to accept
usernames like 'hans.mueller.somedomain.tld'. Otherwise set
\&\f(CW\*(C`unixhierarchysep = no\*(C'\fR.
.IP "\fBpurge_cmd = path-to-command\fR" 4
.IX Item "purge_cmd = path-to-command"
The path to the Cyrus \fBipurge\fR utility for purging mail messages.
.Sp
Example:
.Sp
.Vb 1
\&  purge_cmd = /usr/sbin/ipurge
.Ve
.IP "\fBuser = user\fR" 4
.IX Item "user = user"
The user Cyrus IMAPd runs as.
.Sp
Example:
.Sp
.Vb 1
\&  user = cyrus
.Ve
.SH "FILES"
.IX Header "FILES"
\&\fI/etc/spamassassin/sa\-learn\-cyrus.conf\fR
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\f(CW\*(C`sa\-learn(1)\*(C'\fR, \f(CWspamassassin(1)\fR, \f(CWMail::SpamAssassin(3)\fR,
\&\f(CWMail::SpamAssassin::Conf(3)\fR, \f(CWimapd(8)\fR, \f(CWspamd(8)\fR
.PP
The current version of this script is available at
http://www.pollux.franken.de/mail\-server\-tools/sa\-learn\-cyrus/
.SH "PREREQUISITES"
.IX Header "PREREQUISITES"
\&\fBsa-learn\fR (part of the SpamAssassin package), \fBipurge\fR (part of Cyrus
IMAPd)
.SH "AUTHOR"
.IX Header "AUTHOR"
Hans-Juergen Beie
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 2004\-2011 by Hans-Juergen Beie.
.PP
This program is free software; you can redistribute it and/or modify it under
the terms of the Artistic License 2.0
(http://foundation.perl.org/legal/licenses/artistic\-2_0\-plain.html ) or the
\&\s-1GNU\s0 General Public License as published by the Free Software Foundation;
either version 2 of the license
(http://www.gnu.org/licenses/old\-licenses/gpl\-2.0.html ), or (at your option)
any later version.
.SH "DISCLAIMER"
.IX Header "DISCLAIMER"
This program is distributed in the hope that it will be useful, but
\&\s-1WITHOUT\s0 \s-1ANY\s0 \s-1WARRANTY\s0; without even the implied warranty of
\&\s-1MERCHANTABILITY\s0 or \s-1FITNESS\s0 \s-1FOR\s0 A \s-1PARTICULAR\s0 \s-1PURPOSE\s0.
.SH "ACKNOWLEDGMENTS"
.IX Header "ACKNOWLEDGMENTS"
Thanks to Robert Carnecky and Jan Hauke Rahm for testing and for suggestions
on the implementation of the domain support. David Caldwell contributed the
virtual_config_dir feature. Some other contributors are listed in the
\&\s-1CHANGELOG\s0. Many thanks to them for their help and suggestions.