This chapter is a tutorial introduction that guides you through various mailfromd configurations, starting from the simplest ones and proceeding to more advanced forms. It omits the most complicated details and concentrates on common practical tasks.
If you are familiar with mailfromd, you can skip this chapter and go directly to the next one (see MFL), which contains a detailed discussion of the mail filtering language and of the interaction between mailfromd and the Mail Transfer Agent.
The mailfromd utility runs as a standalone daemon and listens on a predefined communication channel for requests from the Mail Transfer Agent (MTA, for short). When processing each message, the MTA establishes a connection with mailfromd and goes through several states, collecting the necessary data from the sender. At each state it sends the relevant information to mailfromd and waits for its reply. The mailfromd filter receives the message data through Sendmail macros and runs the handler program defined for the given state. The result of this run is a response code, which it returns to the MTA. The following response codes are defined:
continue
Continue message processing from the next milter state.
accept
Accept this message for delivery. After receiving this code the MTA continues processing this message without further consulting the mailfromd filter.
reject
Reject this message. The message processing stops at this stage, and the sender receives the reject reply (‘5xx’ reply code). No further mailfromd handlers are called for this message.
discard
Silently discard the message. This means that the MTA will continue processing this message as if it were going to deliver it, but will discard it after receiving. No further interaction with mailfromd occurs.
tempfail
Temporarily reject the message. The message processing stops at this stage, and the sender receives the ‘temporary failure’ reply (‘4xx’ reply code). No further mailfromd handlers are called for this message.
The instructions on how to process the message are supplied to mailfromd in its filter script file. It is normally called /usr/local/etc/mailfromd.mfl (but can be located elsewhere, see Invocation) and contains a set of milter state handlers, or subroutines to be executed in various SMTP states. Each interaction state can be supplied with its own handling procedure. A missing procedure implies the continue response code.
The filter script can define up to nine milter state handlers, named after the milter states: ‘connect’, ‘helo’, ‘envfrom’, ‘envrcpt’, ‘data’, ‘header’, ‘eoh’, ‘body’, and ‘eom’. The ‘data’ handler is invoked only if the MTA uses Milter protocol version 3 or later. Two special handlers are available for initialization and clean-up purposes: ‘begin’ is called before the processing starts, and ‘end’ is called after it is finished. The diagram below shows the control flow when processing an SMTP transaction. Lines marked with C: show SMTP commands issued by the remote machine (the client); those marked with ‘⇒’ show the called handlers with their arguments. An ‘[R]’ at the start of a line indicates that this part of the transaction can be repeated any number of times:
    ⇒ begin()
    ⇒ connect(hostname, family, port, ‘IP address’)
    C: HELO domain
    ⇒ helo(domain)
    for each message transaction
    do
        C: MAIL FROM sender
        ⇒ envfrom(sender)
    [R] C: RCPT TO recipient
        ⇒ envrcpt(recipient)
        C: DATA
        ⇒ data()
    [R] C: header: value
        ⇒ header(header, value)
        C:
        ⇒ eoh()
    [R] C: body-line
        ⇒ /* Collect lines into blocks blk of
        ⇒  * at most len bytes and for each
        ⇒  * such block call:
        ⇒  */
        ⇒ body(blk, len)
        C: .
        ⇒ eom()
    done
    ⇒ end()
This control flow is maintained for as long as each called handler returns continue (see Actions). If any handler returns accept or discard, message processing continues, but no other handler is called. In the case of accept, the MTA will accept the message for delivery; in the case of discard, it will silently discard it.
If any of the handlers returns reject or tempfail, the result depends on the handler. If this code is returned by the envrcpt handler, it causes this particular recipient address to be rejected. When returned by any other handler, it causes the whole message to be rejected.
The reject and tempfail actions executed by the helo handler do not take effect immediately. Instead, their action is deferred until the next SMTP command from the client, which is usually MAIL FROM.
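The rules above can be condensed into a small Python model. This is only a sketch of the behaviour described in this section, not mailfromd code: the function name run_handlers and the (handler, code) pair representation are illustrative assumptions, and the deferred effect of reject and tempfail in the helo handler is deliberately not modelled.

```python
from typing import Iterable, List, Tuple

# Response codes that end the handler sequence (everything except "continue").
STOP_CODES = {"accept", "discard", "reject", "tempfail"}


def run_handlers(results: Iterable[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Walk (handler, response_code) pairs in call order.

    Returns the final disposition of the message and the list of
    handlers that were actually called.
    """
    called: List[str] = []
    for handler, code in results:
        called.append(handler)
        if code == "continue":
            continue                      # proceed to the next milter state
        if code not in STOP_CODES:
            raise ValueError(f"unknown response code: {code!r}")
        if handler == "envrcpt" and code in ("reject", "tempfail"):
            continue                      # only this recipient is refused
        return code, called               # no further handlers are called
    return "continue", called
```

For instance, a transaction in which the first recipient is rejected and the data handler returns accept ends with the disposition accept, and the eom handler is never reached.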
The mailfromd script file contains a series of declarations of the handler procedures. Each declaration has the form:
    prog name
    do
      …
    done
where prog, do and done are the keywords, and name is the state name for this handler. The dots in the above example represent the actual code, or a set of commands, instructing mailfromd how to process the message.
For example, the declaration:
    prog envfrom
    do
      accept
    done
installs a handler for the ‘envfrom’ state, which always approves the message for delivery, without any further interaction with mailfromd.
The word accept in the above example is an action. An action is a special language statement that instructs the run-time engine to stop execution of the program and to return a response code to Sendmail. There are five actions, one for each response code: continue, accept, reject, discard, and tempfail. Among these, reject and tempfail can optionally take one to three arguments. There are two ways of supplying the arguments.
In the first form, called literal or traditional notation, the arguments are supplied as additional words after the action name, separated by whitespace. The first argument is a three-digit RFC 2821 reply code. It must begin with ‘5’ for reject and with ‘4’ for tempfail. If two arguments are supplied, the second argument must be either an extended reply code (RFC 1893/2034) or a textual string to be returned along with the SMTP reply. Finally, if all three arguments are supplied, then the second one must be an extended reply code and the third one must supply the textual string. The following examples illustrate all possible ways of using the reject statement in literal notation:
    reject
    reject 503
    reject 503 5.0.0
    reject 503 "Need HELO command"
    reject 503 5.0.0 "Need HELO command"
Please note the quotes around the textual string.
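The argument rules of the literal notation can be restated as a short Python check. This is a hedged sketch, not the MFL parser: parse_literal is a hypothetical name, arguments are assumed to arrive already split and unquoted, and the extended code pattern (class 2, 4 or 5, then two numeric fields of one to three digits) follows the general shape defined in RFC 1893.

```python
import re
from typing import List, Optional, Tuple

# class "." subject "." detail, as in RFC 1893 (e.g. 5.0.0 or 4.7.1)
EXTENDED_CODE = re.compile(r"[245]\.\d{1,3}\.\d{1,3}")


def parse_literal(action: str, args: List[str]) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (reply_code, extended_code, text); None marks an omitted argument."""
    if action not in ("reject", "tempfail"):
        raise ValueError("only reject and tempfail take arguments")
    if len(args) > 3:
        raise ValueError("at most three arguments are allowed")

    code = extended = text = None
    if args:
        code = args[0]
        lead = "5" if action == "reject" else "4"
        if not re.fullmatch(r"\d{3}", code) or not code.startswith(lead):
            raise ValueError(f"{action} needs a three-digit code starting with {lead}")
    if len(args) == 2:
        # The second word is either an extended code or the reply text.
        if EXTENDED_CODE.fullmatch(args[1]):
            extended = args[1]
        else:
            text = args[1]
    elif len(args) == 3:
        if not EXTENDED_CODE.fullmatch(args[1]):
            raise ValueError("second of three arguments must be an extended code")
        extended, text = args[1], args[2]
    return code, extended, text
```

Calling parse_literal("reject", ["503", "Need HELO command"]) yields the code and the text with no extended code, whereas parse_literal("tempfail", ["503"]) raises ValueError because a tempfail code must begin with ‘4’.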
The other form of these actions is called functional notation, because it resembles the function call syntax. When used in this form, the action word is followed by a parenthesized group of exactly three arguments, separated by commas. The meaning and ordering of the arguments are the same as in the literal form. Any of the three arguments may be absent, in which case it is replaced by the default value. To illustrate this, here are the statements from the previous example, written in functional notation:
    reject(,,)
    reject(503,,)
    reject(503, 5.0.0)
    reject(503,, "Need HELO command")
    reject(503, 5.0.0, "Need HELO command")
Programs consisting of a single action are rarely useful. In most cases you will want to do some checking and decide whether to process the message depending on its result. For example, if you do not want to accept messages from the address ‘<badguy@some.net>’, you could write the following program:
    prog envfrom
    do
      if $f = "badguy@some.net"
        reject
      else
        accept
      fi
    done
This example illustrates several important concepts. First of all, $f in the third line is a Sendmail macro reference. Sendmail macros are referenced the same way as in sendmail.cf, with the only difference that curly braces around macro names are optional, even if the name consists of several letters. The value of a macro reference is always a string.
The equality operator (‘=’) compares its left and right arguments and evaluates to true if the two strings are exactly the same, or to false otherwise. Apart from equality, you can use the regular relational operators: ‘!=’, ‘>’, ‘>=’, ‘<’ and ‘<=’. Notice that string comparison in mailfromd is always case-sensitive. To do a case-insensitive comparison, translate both operands to upper or lower case (see tolower and toupper).
The if statement decides what actions to execute depending on the value its condition evaluates to. Its usual form is:
    if expression then-body [else else-body] fi
The then-body is executed if the expression evaluates to true (i.e. to any non-zero value). The optional else-body is executed if the expression yields false (i.e. zero). Both then-body and else-body can contain other if statements; their nesting depth is not limited. To facilitate writing complex conditional statements, the elif keyword can be used to introduce alternative conditions, for example:
    prog envfrom
    do
      if $f = "badguy@some.net"
        reject
      elif $f = "other@domain.com"
        tempfail 470 "Please try again later"
      else
        accept
      fi
    done
See switch, for more elaborate forms of conditional branching.
Like any programming language, MFL supports the concept of a function, i.e. a body of code that is assigned a unique name and can be invoked elsewhere as many times as needed.
All functions have a definition that introduces the types and names of the formal parameters and the result type, if the function is to return a meaningful value (function definitions in MFL are discussed in detail in User-Defined Functions).
A function is invoked using a special construct, a function call:
name (arg-list)
where name is the function name, and arg-list is a comma-separated list of expressions. Each expression in arg-list is evaluated, and its type is compared with that of the corresponding formal argument. If the types differ, the expression is converted to the formal argument type. Finally, a copy of its value is passed to the function as the corresponding argument. The order in which the expressions are evaluated is not defined. The compiler checks that the number of elements in arg-list matches the number of mandatory arguments for function name.
If the function does not deliver a result, it should only be called as a statement.
Functions may be recursive, even mutually recursive.
Mailfromd comes with a rich set of predefined functions for various purposes. There are two basic function classes: built-in functions, which are implemented by the MFL runtime environment in mailfromd, and library functions, which are implemented in MFL. The built-in functions are always available and no preparatory work is needed before calling them. In contrast, the library functions are defined in modules, special MFL source files that contain functions designed for a particular task. In order to access a library function, you must first require the module it is defined in. This is done using the require statement. For example, the function hostname looks up in the DNS the name corresponding to the IP address specified as its argument. This function is defined in the module dns.mfl, so before calling it you must require this module:
    require dns
The require statement takes a single argument: the name of the requested module (without the ‘.mfl’ suffix). It looks up the module on disk and loads it if it is available.
For more information about the module system, see Modules.
Site administrators often do not wish to accept mail from hosts that do not have a proper reverse delegation in the Domain Name System. In the previous section we introduced the library function hostname, which looks up in the DNS the name corresponding to the IP address specified as its argument. If there is no corresponding name, the function returns its argument unchanged. This can be used to test whether the IP address was resolved, as illustrated in the example below:
    require 'dns'
    prog envfrom
    do
      if hostname($client_addr) = $client_addr
        reject
      fi
    done
The require 'dns' statement loads the module dns.mfl, after which the definition of hostname becomes available.
A similar function, resolve, which resolves a symbolic name to the corresponding IP address, is provided in the same dns.mfl module.
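The "return the argument unchanged" convention is what makes the comparison in the filter work. The following Python sketch mirrors that convention only; it performs no DNS queries. The ptr_lookup callable is a hypothetical stand-in for a reverse lookup, and has_reverse_dns is an illustrative helper, not a mailfromd function.

```python
from typing import Callable, Optional


def hostname(ip: str, ptr_lookup: Callable[[str], Optional[str]]) -> str:
    """Return the name found for ip, or ip itself when nothing is found."""
    name = ptr_lookup(ip)
    return name if name else ip


def has_reverse_dns(ip: str, ptr_lookup: Callable[[str], Optional[str]]) -> bool:
    """True when the lookup produced a name, i.e. the result differs from ip."""
    return hostname(ip, ptr_lookup) != ip
```

With a lookup table such as {"192.0.2.10": "mail.example.org"}.get passed as ptr_lookup, the address 192.0.2.10 is reported as resolved, while any address missing from the table is returned unchanged and therefore treated as unresolved.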
A special language construct is provided for verification of sender addresses (callout):
    on poll $f do
    when success:
      accept
    when not_found or failure:
      reject 550 5.1.0 "Sender validity not confirmed"
    when temp_failure:
      tempfail 450 4.1.0 "Try again later"
    done
The on poll construct runs standard verification (see standard verification) for the email address specified as its argument (in the example above it is the value of the Sendmail macro ‘$f’). The check can result in the following conditions:
success
The address exists.
not_found
The address does not exist.
failure
Some error of permanent nature occurred during the check. The existence of the address cannot be verified.
temp_failure
Some temporary failure occurred during the check. The existence of the address cannot be verified at the moment.
The when branches of the on poll statement introduce statements that are executed depending on the actual return condition. If any condition occurs that is not handled within the on block, the run-time evaluator will signal an exception and return a temporary failure; it is therefore advisable to always handle all four conditions. In fact, the condition handling shown in the above example is preferable for most normal configurations: the mail is accepted if the sender address is proved to exist and rejected otherwise. If a temporary failure occurs, the remote party is urged to retry the transaction some time later.
The poll statement itself has a number of options that control the type of the verification. These are discussed in detail in poll.
It is worth noting that one special email address is always available on any host: the null address ‘<>’ used in error reporting. There is no point in verifying its existence:
    prog envfrom
    do
      if $f == ""
        accept
      else
        on poll $f do
        when success:
          accept
        when not_found or failure:
          reject 550 5.1.0 "Sender validity not confirmed"
        when temp_failure:
          tempfail 450 4.1.0 "Try again later"
        done
      fi
    done
When using polling functions, it is important to take into account possible delays, which can occur in SMTP transactions. Such delays may be due to low network bandwidth or high load on the remote server. Some sites impose them willingly, as a spam-fighting measure.
Ideally, callout verification should use the timeout values defined in RFC 2821, but this is impossible in practice, because it would cause a timeout escalation: delays encountered in a callout SMTP session would propagate back to the remote client whose session initiated the callout.
Consider, for example, the following scenario. An MFL script performs a callout at the ‘envfrom’ stage. The remote server is overloaded and responds slowly, so that the initial response arrives 3 minutes after the connection is established, and processing the ‘EHLO’ command takes another 3 minutes. These delays are acceptable according to the RFC, which imposes a 5 minute limit for each stage. However, while waiting for the remote reply our SMTP server remains in the ‘envfrom’ state, and the client waits for a response to its ‘MAIL’ command for more than 6 minutes, which exceeds the same 5 minute limit. Thus, the client will almost certainly break the session.
To avoid this, mailfromd uses a special instance, called the callout server, which is responsible for running callout SMTP sessions asynchronously. The usual sender verification is performed using so-called soft timeout values, which are short enough not to disturb the incoming session (e.g. the timeout for a ‘HELO’ response is 3 seconds, instead of 5 minutes). If this verification yields a definite answer, that answer is stored in the cache database and returned to the calling procedure immediately. If, however, the verification is aborted due to a timeout, the calling procedure receives an ‘e_temp_failure’ exception, and the callout is scheduled for processing by a callout server. This exception normally causes the milter session to return a temporary error to the sender, urging it to retry the connection later.
In the meantime, the callout server runs the sender verification again using another set of timeouts, called hard timeouts, which are normally much longer than the soft ones (they default to the values required by RFC 2821). If it gets a definitive result (e.g. ‘email found’ or ‘email not found’), the server stores it in the cache database. If the callout ends due to a timeout, a ‘not_found’ result is stored in the database.
Some time later, the remote server retries the delivery, and the mailfromd script is run again. This time, the callout function will immediately obtain the already cached result from the database and proceed accordingly. If the callout server has not finished the request by the time the sender retries the connection, the latter is again returned a temporary error, and the process continues until the callout is finished.
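The interplay of soft timeouts, the cache and the callout server can be traced with a small Python model. It is a sketch under stated assumptions, not the mailfromd implementation: the class and method names are invented for illustration, the probe callable stands in for a real SMTP callout and returns None when it times out, and the queue is drained synchronously instead of by a separate process.

```python
from typing import Callable, Dict, List, Optional


class TempFailure(Exception):
    """Stands in for the e_temp_failure exception."""


class CalloutModel:
    def __init__(self, probe: Callable[[str, float], Optional[str]],
                 soft: float, hard: float) -> None:
        self.probe = probe            # probe(address, timeout) -> result or None
        self.soft = soft              # short timeout used inside the milter session
        self.hard = hard              # long timeout used by the callout server
        self.cache: Dict[str, str] = {}
        self.queue: List[str] = []

    def poll(self, address: str) -> str:
        """What the filter sees: a cached or fresh answer, or TempFailure."""
        if address in self.cache:
            return self.cache[address]
        result = self.probe(address, self.soft)
        if result is None:                       # soft timeout expired
            if address not in self.queue:
                self.queue.append(address)       # schedule for the callout server
            raise TempFailure(address)
        self.cache[address] = result
        return result

    def run_callout_server(self) -> None:
        """Retry queued callouts with hard timeouts and cache the outcome."""
        while self.queue:
            address = self.queue.pop(0)
            result = self.probe(address, self.hard)
            # A timeout at this stage is recorded as not_found.
            self.cache[address] = result if result is not None else "not_found"
```

In this model the first poll of a slow address raises TempFailure, and once run_callout_server has completed, a repeated poll returns the cached result without contacting the remote server again.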
Usually, the callout server is just another instance of mailfromd itself, which is started automatically to perform scheduled SMTP callouts. It is also possible to set up a separate callout server on another machine. This is discussed in calloutd.
For detailed information about callout timeouts and their configuration, see conf-timeout.
For a description of how to configure mailfromd to use callout servers, see conf-server.
An envfrom program consisting only of the on poll statement will work smoothly for incoming mail, but will create infinite loops for outgoing mail. This is because upon sending an outgoing message mailfromd will start the verification procedure, which will initiate an SMTP transaction with the same mail server that runs it. This transaction will in turn trigger execution of the on poll statement, and so on ad infinitum. To avoid this, a properly written filter script should not run the verification procedure on email addresses in the domains that are relayed by the server it runs on. This can be achieved using the relayed function. The function returns true if its argument is contained in one of the predefined domain list files. These files correspond to the Sendmail plain text files used in F class definition forms (see Sendmail Installation and Operation Guide, chapter 5.3), i.e. they contain one domain name per line, with empty lines and lines starting with ‘#’ being ignored.
The domain files consulted by the relayed function are defined in the relayed-domain-file configuration file statement (see relayed-domain-file):
    relayed-domain-file (/etc/mail/local-host-names, /etc/mail/relay-domains);
or:
    relayed-domain-file /etc/mail/local-host-names;
    relayed-domain-file /etc/mail/relay-domains;
The above example declares two domain list files, most commonly used in Sendmail installations to keep the hostnames of the server and the names of the domains relayed by this server.
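The file format is simple enough to model in Python. The sketch below follows only what is stated here (one domain per line, empty lines and ‘#’ lines ignored, membership test); the function names are illustrative, and lowercasing both sides is an added assumption based on domain names being case-insensitive. How mailfromd itself matches names against the list may differ.

```python
from typing import Iterable, Set


def load_domain_list(lines: Iterable[str]) -> Set[str]:
    """Collect domain names, skipping empty lines and '#' comment lines."""
    domains: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        domains.add(line.lower())    # assumption: compare case-insensitively
    return domains


def relayed(name: str, domains: Set[str]) -> bool:
    """True if name is listed in the loaded domain list."""
    return name.lower() in domains
```

To use it with a real file, pass an open file object to load_domain_list, since iterating over a text file yields its lines.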
Given all this, we can improve our filter program:
    require 'dns'
    prog envfrom
    do
      if $f == ""
        accept
      elif relayed(hostname(${client_addr}))
        accept
      else
        on poll $f do
        when success:
          accept
        when not_found or failure:
          reject 550 5.1.0 "Sender validity not confirmed"
        when temp_failure:
          tempfail 450 4.1.0 "Try again later"
        done
      fi
    done
If you feel that your Sendmail’s relayed domains are not restrictive enough for mailfromd filters (for example, you are relaying mail from some third-party servers), you can use a database of trusted mail server addresses. If the number of such servers is small enough, a single ‘or’ expression can be used, e.g.:
    elif ${client_addr} = "10.10.10.1" or ${client_addr} = "192.168.11.7"
      accept
    …
Otherwise, if the servers’ IP addresses fall within one or several CIDRs, you can use the match_cidr function (see Internet address manipulation functions), e.g.:
    elif match_cidr (${client_addr}, "199.232.0.0/16")
      accept
    …
or combine both methods. Finally, you can keep a DBM database of relayed addresses and use the dbmap or dbget function for checking (see Database functions):
    elif dbmap("%__statedir__/relay.db", ${client_addr})
      accept
    …
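For readers who want to check which addresses a CIDR block covers before putting it into a filter, the same tests can be reproduced with Python's standard ipaddress module. This is an illustration of the matching logic only; the helper names reuse match_cidr for readability but are Python functions, and the address sets are the example values from this section.

```python
import ipaddress

# Example values taken from the filter fragments in this section.
TRUSTED_HOSTS = {"10.10.10.1", "192.168.11.7"}
TRUSTED_NETS = ["199.232.0.0/16"]


def match_cidr(ip: str, cidr: str) -> bool:
    """True if ip lies within the network written in CIDR notation."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def is_trusted(ip: str) -> bool:
    """Combine the exact-address test with the CIDR test."""
    return ip in TRUSTED_HOSTS or any(match_cidr(ip, net) for net in TRUSTED_NETS)
```

Here is_trusted("199.232.41.10") is true because the address falls inside 199.232.0.0/16, while an address outside both lists, such as 203.0.113.5, is not trusted.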
Some of the mail filtering conditions may depend on the value of the helo domain name, i.e. the argument to the SMTP EHLO (or HELO) command. If you ever need such conditions, take into account the following caveats. Firstly, although Sendmail passes the helo domain in the $s macro, it does not do this consistently. In fact, the $s macro is available only to the helo handler; all other handlers won’t see it, no matter what the value of the corresponding Milter.macros.handler statement. So, if you wish to access its value from any handler other than helo, you will have to store it in a variable in the helo handler and then use this variable in the other handler. This approach is also recommended for other MTAs. This brings us to the concept of variables in mailfromd scripts.
A variable is declared using the following syntax:
    type name
where name is the variable name and type is ‘string’ if the variable is to hold a string value, or ‘number’ if it is to hold a numeric value.
A variable is assigned a value using the set statement:
    set name expr
where expr is any valid MFL expression.
The set statement can occur within handler or function declarations as well as outside of them.
There are two kinds of Mailfromd variables: global variables, which are visible to all handlers and functions, and automatic variables, which are available only within the handler or function where they are declared. For our purpose we need a global variable (see Variable classes for detailed descriptions of both kinds of variables).
The following example illustrates an approach that makes it possible to use the HELO domain name in any handler:
    # Declare the helohost variable
    string helohost
    prog helo
    do
      # Save the host name for further use
      set helohost $s
    done
    prog envfrom
    do
      # Reject hosts claiming to be localhost
      if helohost = "localhost"
        reject 570 "Please specify real host name"
      fi
    done
Notice that for this approach to work, your MTA must export the ‘s’ macro (e.g., in the case of Sendmail, the Milter.macros.helo statement in the sendmail.cf file must contain ‘s’; see Sendmail). This requirement can be removed by using the handler argument of helo. Each mailfromd handler is given one or several arguments. The exact number of arguments and their meaning are handler-specific and are described in Handlers and in Figure 3.1. The arguments are referenced by their ordinal number, using the notation $n. The helo handler takes one argument, whose value is the helo domain. Using this information, the helo handler from the example above can be rewritten as follows:
    prog helo
    do
      # Save the host name for further use
      set helohost $1
    done
In the previous section we used a global variable to hold certain information and share it between handlers. In the majority of cases, such information is session-specific and becomes invalid if the remote party issues the SMTP RSET command. Therefore, mailfromd clears all global variables when it receives a Milter ‘abort’ request, which is normally generated by this command.
However, you may need some variables that retain their values even across SMTP session resets. In mailfromd terminology such variables are called precious. Precious variables are declared by prefixing their declaration with the keyword precious. Consider, for example, this snippet of code:
    precious number rcpt_counter
    prog envrcpt
    do
      set rcpt_counter rcpt_counter + 1
    done
Here, the variable ‘rcpt_counter’ is declared as precious and its value is incremented each time the ‘envrcpt’ handler is called. This way, ‘rcpt_counter’ will keep the total number of SMTP RCPT commands issued during the session, no matter how many times it was restarted using the RSET command.
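The difference between ordinary global variables and precious ones can be pictured with a tiny Python model. It is a sketch only: the class name and methods are hypothetical, only numeric variables are modelled, and "clearing" is assumed to mean resetting the value to 0.

```python
from typing import Dict, Set


class SessionVariables:
    """Numeric global variables, some of which are marked precious."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.precious: Set[str] = set()

    def declare(self, name: str, precious: bool = False) -> None:
        self.values[name] = 0
        if precious:
            self.precious.add(name)

    def abort(self) -> None:
        """Model a Milter 'abort' request: clear everything except precious variables."""
        for name in self.values:
            if name not in self.precious:
                self.values[name] = 0
```

If one precious and one ordinary counter are both incremented on every RCPT command, then after two recipients, an RSET, and one more recipient the precious counter holds 3 while the ordinary one holds 1.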
Every MTA provides a way to limit the number of recipients per message. For example, in Sendmail you may use the MaxRecipientsPerMessage option. However, such methods are not flexible, so you are often better off using mailfromd for this purpose.
Mailfromd keeps the number of recipients collected so far in the variable rcpt_count, which can be checked in the envrcpt handler as shown in the example below:
    prog envrcpt
    do
      if rcpt_count > 10
        reject 550 5.7.1 "Too many recipients"
      fi
    done
This filter will accept no more than 10 recipients per message. You may achieve finer granularity by using additional conditions. For example, the following code will allow any number of recipients if the mail is coming from a domain relayed by the server, while limiting it to 10 for incoming mail from other domains:
    prog envrcpt
    do
      if not relayed(hostname($client_addr)) and rcpt_count > 10
        reject 550 5.7.1 "Too many recipients"
      fi
    done
There are three important features to notice in the above code. First of all, it introduces two boolean operators: and, which evaluates to true only if both the left-side and right-side expressions are true, and not, which reverses the value of its argument.
Secondly, the scope of an operation is determined by its precedence, or binding strength. The not operator binds more tightly than and, so its scope is limited to the expression between it and the and operator. Using parentheses to make the operator scoping explicit, the above if condition can be rewritten as follows:
    if (not (relayed(hostname($client_addr)))) and (rcpt_count > 10)
Finally, it is important to notice that all boolean expressions are computed using shortcut evaluation. To understand what it is, let’s consider the following expression: x and y. Its value is true only if both x and y are true. Now suppose that we evaluate the expression from left to right and find that x is false. This means that no matter what the value of y is, the resulting expression will be false, therefore there is no need to compute y at all. So, the boolean shortcut evaluation works as follows:
x and y
If x ⇒ false, do not evaluate y and return false.
x or y
If x ⇒ true, do not evaluate y and return true.
Thus, in the expression not relayed(hostname($client_addr)) and rcpt_count > 10, the value of the rcpt_count variable will be compared with ‘10’ only if the relayed function yielded false.
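Python's and operator uses the same shortcut rule, which makes it easy to observe which operands are actually evaluated. The sketch below is an analogy, not MFL: relayed_stub and over_limit are hypothetical stand-ins that record each call in a trace list.

```python
from typing import List


def relayed_stub(answer: bool, trace: List[str]) -> bool:
    """Pretend relayed() call that records its own evaluation."""
    trace.append("relayed")
    return answer


def over_limit(rcpt_count: int, limit: int, trace: List[str]) -> bool:
    """Pretend rcpt_count comparison that records its own evaluation."""
    trace.append("rcpt_count")
    return rcpt_count > limit


def should_reject(is_relayed: bool, rcpt_count: int, trace: List[str]) -> bool:
    # Same shape as: not relayed(...) and rcpt_count > 10
    return (not relayed_stub(is_relayed, trace)) and over_limit(rcpt_count, 10, trace)
```

For a relayed client the trace contains only the relayed entry, because the right-hand operand is skipped; for a non-relayed client both entries appear.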
To further enhance our sample filter, you may wish to make the reject output more informative, to let the sender know what the recipient limit is. To do so, you can use the concatenation operator ‘.’ (a dot):
    set max_rcpt 10
    prog envrcpt
    do
      if not relayed(hostname($client_addr)) and rcpt_count > 10
        reject 550 5.7.1 "Too many recipients, max=" . max_rcpt
      fi
    done
When evaluating the third argument to reject, mailfromd will first convert max_rcpt to a string and then concatenate both strings together, producing the string ‘Too many recipients, max=10’.
We have introduced the notion of mail sending rate in Rate Limit. Mailfromd keeps the computed rates in the special rate database (see Databases). Each record in this database consists of a key, for which the rate is computed, and the rate value, in the form of a double precision floating point number representing the average number of messages per second sent by this key within the last sampling interval. In the simplest case, the sender email address can be used as a key; however, we recommend using the combination email-sender_ip instead, so that the actual owner of the address won’t be blocked by the actions of some spammer abusing that address.
Two functions are provided to control and update sending rates. The rateok function takes three mandatory arguments:
bool rateok(string key, number interval, number threshold)
The meaning of key is described above. The interval is the sampling interval, or the number of seconds to which the actual sending rate value is converted. Remember that the rate is stored internally as a floating point number, and thus cannot be used directly in mailfromd filters, which operate only on integer numbers. To use the rate value, it is first converted to messages per given interval, which is an integer number. For example, the rate 0.138888 brought to a 1-hour interval gives 500 (messages per hour).
When the rateok function is called, it recomputes the rate record for the given key. If the new rate value converted to messages per given interval is less than threshold, the function updates the database and returns True. Otherwise it returns False and does not update the database.
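The conversion and the threshold test amount to simple arithmetic, shown here as a Python sketch. The function names are illustrative, and rounding to the nearest integer is an assumption chosen because it reproduces the figure of 500 quoted above; the way the rate itself is recomputed from the database is not modelled.

```python
def per_interval(rate_per_second: float, interval: int) -> int:
    """Convert a messages-per-second rate to messages per interval seconds."""
    return round(rate_per_second * interval)


def rate_ok(rate_per_second: float, interval: int, threshold: int) -> bool:
    """True while the converted rate stays below the threshold."""
    return per_interval(rate_per_second, interval) < threshold
```

With these definitions per_interval(0.138888, 3600) gives 500, so the same rate fails a threshold of 180 messages per hour.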
This function must be required prior to use, by placing the following statement somewhere at the beginning of your script:
require rateok
For example, the following code limits the mail sending rate for each ‘email address’-‘IP’ combination to 180 per hour. If the actual rate value exceeds this limit, the sender is returned a temporary failure response:
    require rateok
    prog envfrom
    do
      if not rateok($f . "-" . ${client_addr}, 3600, 180)
        tempfail 450 4.7.0 "Mail sending rate exceeded. Try again later"
      fi
    done
Notice argument concatenation, used to produce the key.
It is often inconvenient to specify intervals in seconds, therefore a special interval function is provided. It converts its argument, which is a textual string representing a time interval in English, to the corresponding number of seconds. Using this function, the function invocation would be:
rateok($f . "-" . ${client_addr}, interval("1 hour"), 180)
The interval function is described in interval, and time intervals are discussed in time interval specification.
The rateok function begins computing the rate as soon as it has collected enough data. By default, it needs at least four mails. Since this may lead to a large number of false positives (i.e. overestimated rates) at the beginning of the sampling interval, there is a way to specify the minimum number of samples rateok must collect before it starts to actually compute rates. This number of samples is given as the optional fourth argument to the function. For example, the following call will always return True for the first 10 mails, no matter what the actual rate is:
rateok($f . "-" . ${client_addr}, interval("1 hour"), 180, 10)
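The tbf_rate function gives more control over the mail rates. This function implements a token bucket filter (TBF) algorithm.
The token bucket controls when the data can be transmitted based on the presence of abstract entities called tokens in a container called bucket. Each token represents some amount of data. The algorithm works as follows:
1. A token is added to the bucket at a constant rate of 1 token per t microseconds.
2. A bucket can hold at most m tokens. If a token arrives when the bucket is full, that token is discarded.
3. When n items of data arrive (e.g. n mails), n tokens are removed from the bucket and the data are accepted.
4. If fewer than n tokens are available, no tokens are removed from the bucket and the data are not accepted.
This algorithm keeps the data traffic at a constant rate of one item per t microseconds, with bursts of up to m data items. Such bursts occur when no data has arrived for m*t or more microseconds.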
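A token bucket is easy to model in a few lines of Python. The sketch below assumes the standard formulation of the algorithm: one token is added every t microseconds, the bucket holds at most m tokens, and a request for n items is accepted only if n tokens are available. The class name, the explicit now argument (a timestamp in microseconds) and the choice to start with a full bucket are assumptions made for illustration; the tbf_rate implementation and its database storage are not reproduced here.

```python
class TokenBucket:
    def __init__(self, t: int, m: int, now: int = 0) -> None:
        self.t = t              # microseconds needed to earn one token
        self.m = m              # bucket capacity, i.e. the burst size
        self.tokens = m         # assumption: a new bucket starts full
        self.stamp = now        # time up to which tokens have been credited

    def offer(self, n: int, now: int) -> bool:
        """Try to pass n data items at time now; return True if accepted."""
        fresh = (now - self.stamp) // self.t
        if fresh > 0:
            # Credit the tokens earned since the last update, up to capacity.
            self.tokens = min(self.m, self.tokens + fresh)
            self.stamp += fresh * self.t
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False            # not enough tokens: nothing is removed
```

With t set to 10000000 and m to 20, the model accepts a burst of 20 single items at once, refuses the 21st, and accepts one more item only after a further 10 seconds have passed.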
Mailfromd
keeps buckets in a database ‘tbf’. Each
bucket is identified by a unique key. The tbf_rate
function is defined as follows:
bool tbf_rate(string key, number n, number t, number m)
The key identifies the bucket to operate upon. The rest of
arguments is described above. The tbf_rate
function returns
‘True’ if the algorithm allows to accept the data and
‘False’ otherwise.
Depending on how the actual arguments are selected the tbf_rate
function can be used to control various types of flow rates. For
example, to control mail sending rate, assign the arguments as
follows: n to the number of mails and t to the control
interval in microseconds:
prog envfrom do if not tbf_rate($f . "-" . $client_addr, 1, 10000000, 20) tempfail 450 4.7.0 "Mail sending rate exceeded. Try again later" fi done
The example above permits sending at most one mail every 10 seconds. The burst size is set to 20.
Another use for the tbf_rate function is to limit the total delivered mail size per given interval of time. To do so, the function must be used in the prog eom handler, because it is the only handler where the entire size of the message is known. The n argument must contain the number of bytes in the email (or email bytes * number of recipients), and t must be set to the number of bytes per microsecond a given user is allowed to send. The m argument must be large enough to accommodate a couple of large emails. E.g.:
prog eom
do
  if not tbf_rate("$f-$client_addr",
                  message_size(current_message()),
                  10240*1000000, # At most 10 kb/sec
                  10*1024*1024)
    tempfail 450 4.7.0 "Data sending rate exceeded. Try again later"
  fi
done
See Rate limiting functions, for more information about the rateok and tbf_rate functions.
Greylisting is a simple method of defending against spam, proposed by Evan Harris. In short, it consists of recording the ‘sender IP’-‘sender email’-‘recipient email’ triplet of mail transactions. Each time an unknown triplet is seen, the corresponding message is rejected with the tempfail code. If the mail is legitimate, this will make the originating server retry the delivery later, until the destination eventually accepts it. If, however, the mail is spam, it will probably never be retried, so the users will not be bothered by it. Even if the spammer retries the delivery, the greylisting period will give spam-detection systems, such as DNSBLs, enough time to detect and blacklist the sender, so by the time the destination host starts accepting emails from this triplet, it will already be blocked by other means.
You will find the detailed description of the method in The Next Step in the Spam Control War: Greylisting, the original whitepaper by Evan Harris.
The mailfromd implementation of greylisting is based on the greylist function. The function takes two arguments: the key, identifying the greylisting triplet, and the interval. The function looks up the key in the greylisting database. If such a key is not found, a new entry is created for it and the function returns true. If the key is found, greylist returns false if it was inserted into the database more than interval seconds ago, and true otherwise.
In other words, from the point of view of the greylisting algorithm, the function returns true when the message delivery should be blocked. Thus, the simplest implementation of the algorithm would be:
prog envrcpt
do
  if greylist("${client_addr}-$f-${rcpt_addr}", interval("1 hour"))
    tempfail 451 4.7.1 "You are greylisted"
  fi
done
However, the message returned by this example is not informative enough. In particular, it does not tell when the message will be accepted. To help you produce more informative messages, the greylist function stores the number of seconds left to the end of the greylisting period in the global variable greylist_seconds_left, so the above example could be enhanced as follows:
prog envrcpt
do
  set gltime interval("1 hour")
  if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
    if greylist_seconds_left = gltime
      tempfail 451 4.7.1 "You are greylisted for %gltime seconds"
    else
      tempfail 451 4.7.1 "Still greylisted for %greylist_seconds_left seconds"
    fi
  fi
done
In real life you will have to avoid greylisting some messages, in particular those coming from the ‘<>’ address and from the IP addresses in your relayed domain. It can easily be done using the techniques described in previous sections and is left as an exercise to the reader.
Mailfromd provides two implementations of greylisting primitives, which differ in the information stored in the database. The one described above is called traditional. It keeps in the database the time when the greylisting was activated for the given key, so the greylist function uses its second argument (interval) and the current timestamp to decide whether the key is still greylisted.
The second implementation is named after its inventor, Con Tassios. This implementation stores in the database the time when the greylisting period is set to expire, computed by greylist when it is first called for the given key, using the formula ‘current_timestamp + interval’. Subsequent calls to greylist compare the current timestamp with the one stored in the database and ignore their second argument. This implementation is enabled by one of the following pragmas:
#pragma greylist con-tassios
or
#pragma greylist ct
When the Con Tassios implementation is used, yet another function becomes available. The function is_greylisted (see is_greylisted) returns ‘True’ if its argument is greylisted and ‘False’ otherwise. It can be used to check for the greylisting status without actually updating the database:
if is_greylisted("${client_addr}-$f-${rcpt_addr}") … fi
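The difference between the two implementations lies only in what is stored and in which argument decides the expiry. The Python sketch below models both with a plain dictionary standing in for the database. The function names, the returned (blocked, seconds_left) pair and the boundary handling are assumptions made for illustration; the real functions are MFL built-ins and may differ in detail.

```python
def greylist_traditional(db, key, interval, now):
    """Store the time of first sight; the caller's interval decides expiry."""
    if key not in db:
        db[key] = now
        return True, interval  # newly greylisted
    left = db[key] + interval - now
    return left >= 0, max(left, 0)


def greylist_con_tassios(db, key, interval, now):
    """Store the expiry time on first sight; later intervals are ignored."""
    if key not in db:
        db[key] = now + interval
        return True, interval
    left = db[key] - now
    return left >= 0, max(left, 0)


def is_greylisted(db, key, now):
    """Read-only check, possible because the expiry time itself is stored."""
    return key in db and db[key] >= now
```

Calling the traditional variant a second time with a shorter interval can release the key early, whereas the Con Tassios variant keeps the expiry time fixed at the first call.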
One special case is whitelisting, which is often used together with greylisting. To implement it, mailfromd provides the function dbmap, which takes two mandatory arguments: dbmap(file, key) (it also allows an optional third argument; see dbmap for more information on it). The first argument is the name of the DBM file in which to search for the key, the second one is the key to be searched for. Assuming you keep your whitelist database in the file /var/run/whitelist.db, a more practical example will be:
prog envrcpt
do
  set gltime interval("1 hour")
  if not ($f = "" or relayed(hostname(${client_addr}))
          or dbmap("/var/run/whitelist.db", ${client_addr}))
    if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
      if greylist_seconds_left = gltime
        tempfail 451 4.7.1 "You are greylisted for %gltime seconds"
      else
        tempfail 451 4.7.1 "Still greylisted for %greylist_seconds_left seconds"
      fi
    fi
  fi
done
In your filter script you may need to verify if the given user name is served by your mail server, in other words, to verify if it represents a local account. Notice that in this context, the word local does not necessarily mean that the account is local for the server running mailfromd, it simply means any account whose mailbox is served by the mail servers using mailfromd.
The validuser function may be used for this purpose. It takes one argument, the user name, and returns true if this name corresponds to a local account. To verify this, the function relies on libmuauth, a powerful authentication library shipped with GNU mailutils. More precisely, it invokes a list of authorization functions. Each function is responsible for looking up the user name in a particular source of information, such as the system passwd database, an SQL database, etc. The search is terminated when one of the functions finds the name in question or the list is exhausted. In the former case, the account is local, in the latter it is not. This concept is discussed in detail in the GNU Mailutils Manual (see Authorization and Authentication Principles). Here we will give only some practical advice for implementing it in mailfromd filters.
The actual list of available authorization modules depends on your mailutils installation. Usually it includes, apart from the traditional UNIX passwd database, the functions for verifying PAM, RADIUS and SQL database accounts.
Each of the authorization methods is configured using special configuration file statements. For the description of the Mailutils configuration files, see Mailutils Configuration File in the GNU Mailutils Manual. You can obtain the template for mailfromd configuration by running mailfromd --config-help.
For example, the following mailfromd.conf file:
auth {
  authorization pam:system;
}
pam {
  service mailfromd;
}
sets up the authorization using PAM and system passwd database. The name of PAM service to use is ‘mailfromd’.
The function validuser is often used together with dbmap, as in the example below:
#pragma dbprop /etc/mail/aliases.db null
if dbmap("/etc/mail/aliases.db", localpart($rcpt_addr))
   and validuser(localpart($rcpt_addr))
  …
fi
For more information about the dbmap function, see dbmap. For a description of the dbprop pragma, see Database functions.
Some mailfromd functions use DBM databases to save their persistent state data. Each database has a unique identifier, and is assigned several pieces of information for its maintenance: the database file name and the expiration period, i.e. the time after which a record is considered expired.
To obtain the list of available databases along with their preconfigured settings, run mailfromd --show-defaults (see Examining Defaults). You will see an output similar to this:
version:                   9.0
script file:               /etc/mailfromd.mfl
preprocessor:              /usr/bin/m4 -s
user:                      mail
statedir:                  /var/run/mailfromd
socket:                    unix:/var/run/mailfromd/mailfrom
pidfile:                   /var/run/mailfromd/mailfromd.pid
default syslog:            blocking
supported databases:       gdbm, bdb
default database type:     bdb
optional features:         DKIM GeoIP2 STARTTLS
greylist database:         /var/run/mailfromd/greylist.db
greylist expiration:       86400
tbf database:              /var/run/mailfromd/tbf.db
tbf expiration:            86400
rate database:             /var/run/mailfromd/rates.db
rate expiration:           86400
cache database:            /var/run/mailfromd/mailfromd.db
cache positive expiration: 86400
cache negative expiration: 43200
The text below ‘optional features’ line describes the available built-in databases. Notice that the ‘cache’ database, in contrast to the rest of databases, has two expiration periods associated with it. This is explained in the next subsection.
• Database Formats
• Basic Database Operations
• Database Maintenance
Version 9.0 uses the following database types (or formats):
The ‘cache’ database keeps the information about external emails, obtained using sender verification functions (see Checking Sender Address). The key entry to this database is an email address or, for addresses checked using strict verification, an email:sender-ip string. The data it stores for each key are:
The verification status: either success or not_found, meaning the address is confirmed to exist or it is not.
The ‘cache’ database has two expiration periods: a positive expiration period, that is applied to entries with the first field set to success, and a negative expiration period, applied to entries marked as not_found.
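The choice between the two periods can be written down as a small Python function. This is a sketch under the assumption that a record carries its status and the time it was stored; the function name and its parameters are hypothetical and do not correspond to any mailfromd interface.

```python
def cache_entry_expired(status, stored_at, now, positive_ttl, negative_ttl):
    """Return True if a cache record should be treated as expired."""
    if status == "success":
        ttl = positive_ttl   # positive expiration period, in seconds
    elif status == "not_found":
        ttl = negative_ttl   # negative expiration period, in seconds
    else:
        raise ValueError(f"unknown status: {status!r}")
    return now - stored_at > ttl
```

With a positive period of 86400 and a negative period of 43200 seconds, a record stored 50000 seconds ago is still valid if it is marked success, but expired if it is marked not_found.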
The ‘rate’ database holds the mail sending rate data, maintained by the rate function (see Rate limiting functions). A record consists of the following fields:
The time when the entry was entered into the database.
Interval during which the rate was measured (seconds).
Number of mails sent during this interval.
The ‘tbf’ database is maintained by the tbf_rate function (see TBF). Each record represents a single bucket and consists of the following fields:
Timestamp of most recent token, as a 64-bit unsigned integer (microseconds resolution).
Estimated time when this bucket expires (seconds since epoch).
Number of tokens in the bucket (size_t).
The ‘greylist’ database is maintained by the greylist function (see Greylisting). Each record holds only the timestamp. Its semantics depend on the greylisting implementation in use (see greylisting types). In the traditional implementation, it is the time when the entry was entered into the database. In the Con Tassios implementation, it is the time when the greylisting period expires.
The mfdbtool utility is provided for performing various operations on the mailfromd databases.
To list the contents of a database, use --list option. When used without any arguments it will list the ‘cache’ database:
$ mfdbtool --list
abrakat@mail.com success Thu Aug 24 15:28:58 2006
baccl@EDnet.NS.CA not_found Fri Aug 25 10:04:18 2006
bhzxhnyl@chello.pl not_found Fri Aug 25 10:11:57 2006
brqp@aaanet.ru:24.1.173.165 not_found Fri Aug 25 14:16:06 2006
You can also list data for any particular key or keys. To do so, give the keys as arguments to mfdbtool:
$ mfdbtool --list abrakat@mail.com brqp@aaanet.ru:24.1.173.165
abrakat@mail.com success Thu Aug 24 15:28:58 2006
brqp@aaanet.ru:24.1.173.165 not_found Fri Aug 25 14:16:06 2006
To list another database, give its format identifier with the --format (-H) option. For example, to list the ‘rate’ database:
$ mfdbtool --list --format=rate
sam@mail.net-62.12.4.3 Wed Sep 6 19:41:42 2006 139 3 0.0216 6.82e-06
axw@rame.com-59.39.165.172 Wed Sep 6 20:26:24 2006 0 1 N/A N/A
The --format option can be used with any database management option, described below.
Another useful operation you can do while listing ‘rate’ database is the prediction of estimated time of sending, i.e. the time when the user will be able to send mail if currently his mail sending rate has exceeded the limit. This is done using --predict option. The option takes an argument, specifying the mail sending rate limit, e.g. (the second line is split for readability):
$ mfdbtool --predict="180 per 1 minute"
ed@fae.net-21.10.1.2 Wed Sep 13 03:53:40 2006 0 1 N/A N/A; free to send
service@19.netlay.com-69.44.129.19 Wed Sep 13 15:46:07 2006 7 2 0.286 0.0224;
   in 46 sec. on Wed Sep 13 15:49:00 2006
Notice, that there is no need to use --list --format=rate along with this option, although doing so is not an error.
To delete an entry from the database, use --delete option, for example: mfdbtool --delete abrakat@mail.com. You can give any number of keys to delete in the command line.
There are two principal operations of database management: expiration and compaction. Expiration consists of removing expired entries from the database. In fact, it is rarely needed, since the expired entries are removed in the process of normal mailfromd work. Nevertheless, a special option is provided in case an explicit expiration is needed (for example, before dumping the database to another format, to avoid transferring useless information).
The command line option --expire instructs mfdbtool to delete expired entries from the specified database. As usual, the database is specified using the --format option. If it is not given explicitly, ‘cache’ is assumed.
While removing expired entries the space they occupied is marked as free, so it can be used by subsequent inserts. The database does not shrink after expiration is finished. To actually return the unused space to the file system you should compact your database.
This is done by running mfdbtool --compact (and, optionally, specifying the database to operate upon with the --format option). Notice that compacting a database needs roughly as much free disk space on the partition where the database resides as is currently used by the database. Database compaction runs in three phases. First, the database is scanned and all non-expired records are stored in memory. Second, a temporary database is created in the state directory and all the cached entries are flushed into it. This database is named after the PID of the running mfdbtool process. Finally, the temporary database is renamed to the source database.
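The three phases can be imitated in a few lines of Python. The sketch below is a model only: it uses a JSON file mapping keys to timestamps in place of a DBM database, and the function name compact, its parameters and the ‘.tmp’ suffix are hypothetical. It is meant to show why the operation needs extra disk space and why the final step is a rename.

```python
import json
import os


def compact(path, ttl, now):
    """Rewrite the JSON 'database' at `path`, keeping only live records."""
    # Phase 1: scan the database and keep non-expired records in memory.
    with open(path) as src:
        live = {k: ts for k, ts in json.load(src).items() if now - ts <= ttl}
    # Phase 2: flush them into a temporary file in the same directory,
    # named after the PID of the current process.
    directory = os.path.dirname(os.path.abspath(path))
    tmp = os.path.join(directory, f"{os.getpid()}.tmp")
    with open(tmp, "w") as dst:
        json.dump(live, dst)
    # Phase 3: rename the temporary file over the original.
    os.replace(tmp, path)
    return len(live)
```

Because the temporary file lives next to the original until the rename, both copies exist on the same partition for a short time.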
Both --compact and --expire can be applied to all databases by combining them with --all. It is useful, for example, in crontab files. For example, I have the following monthly job in my crontab:
0 1 1 * * /usr/bin/mfdbtool --compact --all
It is important to check your filter script before actually starting to use it. There are several ways to do so.
To test the syntax of your filter script, use the --lint option. It will cause mailfromd to exit immediately after attempting to compile the script file. If the compilation succeeds, the program will exit with code 0. Otherwise, it will exit with error code 78 (‘configuration error’). In the latter case, mailfromd will also print a diagnostic message, describing the error along with the exact location where the error was diagnosed, for example:
mailfromd: /etc/mailfromd.mfl:39: syntax error, unexpected reject
The error location is indicated by the name of the file and the number of the line where the error occurred. By using the --location-column option you instruct mailfromd to also print the column number. E.g. with this option the above error message may look like:
mailfromd: /etc/mailfromd.mfl:39.12 syntax error, unexpected reject
Here, ‘39’ is the line and ‘12’ is the column number.
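If the diagnostics are to be processed by a tool, for instance to jump to the offending line in an editor, the location can be extracted with a regular expression. The following Python sketch is based solely on the two sample messages shown above; the function name parse_diagnostic and the returned dictionary layout are assumptions, and other message shapes are simply not matched.

```python
import re

# program: file:line[.column][:] message
_DIAG = re.compile(
    r"^(?P<prog>[^:]+): (?P<file>.+?):(?P<line>\d+)"
    r"(?:\.(?P<col>\d+))?:? (?P<msg>.*)$"
)


def parse_diagnostic(text):
    """Split a diagnostic line into file, line, column and message."""
    m = _DIAG.match(text)
    if m is None:
        return None
    col = m.group("col")
    return {
        "file": m.group("file"),
        "line": int(m.group("line")),
        "column": int(col) if col is not None else None,
        "message": m.group("msg"),
    }
```

The column is reported as None when the message was produced without --location-column.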
For complex scripts you may wish to obtain a listing of variables used in the script. This can be achieved using the --xref command line option.
The output it produces consists of four columns:
The variable name.
The variable type: either number or string.
The offset of the variable in the data segment, measured in words.
A comma-separated list of locations where the variable was referenced. Each location is represented as file:line. If several locations pertain to the same file, the file name is listed only once.
Here is an example of the cross-reference output:
$ mailfromd --xref
Cross-references:
-----------------
cache_used              number  5  /etc/mailfromd.mfl:48
clamav_virus_name       string  9  /etc/mailfromd.mfl:240,240
db                      string 15  /etc/mailfromd.mfl:135,194,215
dns_record_ttl          number 16  /etc/mailfromd.mfl:136,172,173
ehlo_domain             string 11
gltime                  number 13  /etc/mailfromd.mfl:37,219,220,222,223
greylist_seconds_left   number  1  /etc/mailfromd.mfl:220,226,227
last_poll_host          string  2
If the script passes the syntax check, the next step is often to test if it works as you expect it to. This is done with the --test (-t) command line option. This option runs the envfrom handler (or another one, see below) and prints the result of its execution.
When running your script in test mode, you will need to supply the values of Sendmail macros it needs. You do this by placing the necessary assignments in the command line. For example, this is how to supply initial values for the f and client_addr macros:
$ mailfromd --test f=gray@gnu.org client_addr=127.0.0.1
You may also need to alter initial values of some global variables your script uses. To do so, use the -v (--variable) command line option. This option takes a single argument consisting of the variable name and its initial value, separated by an equals sign. For example, here is how to change the value of the ehlo_domain global variable:
$ mailfromd -v ehlo_domain=mydomain.org
The --test option is often useful in conjunction with the options --debug, --trace and --transcript (see Logging and Debugging). The following example shows what the author got while debugging the filter script described in Filter Script Example:
$ mailfromd --test --debug=50 f=gray@gnu.org client_addr=127.0.0.1
MX 20 mx20.gnu.org
MX 10 mx10.gnu.org
MX 10 mx10.gnu.org
MX 20 mx20.gnu.org
getting cache info for gray@gnu.org
found status: success (0), time: Thu Sep 14 14:54:41 2006
getting rate info for gray@gnu.org-127.0.0.1
found time: 1158245710, interval: 29, count: 5, rate: 0.172414
rate for gray@gnu.org-127.0.0.1 is 0.162162
updating gray@gnu.org-127.0.0.1 rates
SET REPLY 450 4.7.0 Mail sending rate exceeded. Try again later
State envfrom: tempfail
If your script uses echo statements (see Echo), they will print their output on standard error. To direct them to the standard output, use the --echo option. You can also redirect the echo output to an arbitrary file, by supplying its name as argument, as in: --echo=file. See echo option.
To test any handler, other than ‘envfrom’, give its name as the argument to --test option. Since this argument is optional, it is important that it be given immediately after the option, without any intervening white space, for example mailfromd --test=helo, or mailfromd -thelo.
This method allows you to test one handler at a time. To test the script as a whole, use the mtasim utility. When started it enters interactive mode, similar to that of sendmail -bs, where it expects SMTP commands on its standard input and sends answers to the standard output. The --port=auto command line option instructs it to start mailfromd and to create a unique socket for communication with it. For the detailed description of the program and the ways to use it, see mtasim.
Mailfromd provides a special option that allows you to run arbitrary MFL scripts. When given the --run command line option, mailfromd loads the script given in its command line, looks for the function called ‘main’, and runs it.
This function must be declared as:
func main(...) returns number
Mailfromd passes all command line arguments that follow the script name as arguments to that function. When the function returns, its return value is used by mailfromd as exit code.
As an example, suppose the file script.mfl contains the following:
func main (...)
  returns number
do
  loop for number i 1,
       while i <= $#,
       set i i + 1
  do
    echo "arg %i=" . $(i)
  done
done
This function prints all its arguments (see variadic functions, for a detailed description of functions with variable number of arguments). Now running:
$ mailfromd --run script.mfl 1 file dest
displays the following:
arg 1=1
arg 2=file
arg 3=dest
You can direct the script output to the standard output by using the --echo option, as described above, e.g.:
$ mailfromd --echo --run script.mfl 1 file dest
Note that MFL does not have a direct equivalent of the shell’s $0 argument. If your function needs to know the name of the script that is being executed, use the __file__ built-in constant instead (see __file__).
The name main is not hard-coded. You can use the --run option to run any function, provided that its definition is as discussed above. Just give the name of this function as the argument to the option. This argument is optional, therefore it must be separated from the option by an equals sign (with no whitespace on either side). For example, given the command line below, mailfromd will load the file script.mfl and execute the function ‘start’:
$ mailfromd --run=start script.mfl
If you need to define sendmail macros (see Sendmail Macros) for use in the run mode, place the macro=value assignments before the script name, e.g.:
$ mailfromd --run=start i=feedbeef client_addr=::1 script.mfl
To summarize, the command line when using the run mode is:
mailfromd [options] --run [macro=value] file args...
The ‘macro=value’ assignments define Sendmail macros, args... are passed as arguments to the main function defined in file, and options stands for any other mailfromd options that might be needed.
Finally, notice that file together with args... can be omitted. In this case the default script file will be used (see default script file).
• top-block: The Top of a Script File.
• getopt: Parsing Command Line Arguments.
The --run option makes it possible to use mailfromd scripts as standalone programs. The traditional way to do so was to set the executable bit on the script file and to begin the script with the interpreter selector, i.e. the characters ‘#!’ followed by the name of the mailfromd executable, e.g.:
#! /usr/sbin/mailfromd --run
This would cause the shell to invoke mailfromd with the command line constructed from the --run option, the name of the invoked script file itself, and any actual arguments from the invocation. Once invoked, mailfromd would treat the initial ‘#!’ line as a usual single-line comment (see Comments).
However, the interpretation of the ‘#!’ by shells has various deficiencies, which depend on the actual shell being used. For example, some shells pass any characters following the whitespace after the interpreter name as a single argument, some others silently truncate the command line after some number of characters, etc. This often makes it impossible to pass additional arguments to mailfromd. For example, a script which begins with the following line would most probably fail to be executed properly:
#! /usr/sbin/mailfromd --echo --run
To compensate for these deficiencies and to allow for more complex invocation sequences, mailfromd handles initial ‘#’ in a special way. If the first line of a source file begins with ‘#!/’ or ‘#! /’ (with a single space between ‘!’ and ‘/’), it is treated as a start of a multi-line comment, which is closed by the two characters ‘!#’ on a line by themselves. Thus, the correct way to begin a mailfromd script is:
#! /usr/sbin/mailfromd --run
!#
Using this feature, you can start a mailfromd script with arbitrary shell code, provided it ends with an exec statement invoking the interpreter itself. For example:
#!/bin/sh
exec /usr/sbin/mailfromd --echo --run $0 $@
!#

func main(...)
  returns number
do
  /* actual mfl code goes here */
done
Note the use of ‘$0’ and ‘$@’ to pass the actual script file name and command line arguments to mailfromd.
A special function is provided to break (parse) the command line into options, and to check them for validity. It uses the GNU getopt routines (see getopt in The GNU C Library Reference Manual).
The getopt function parses the command line arguments, as supplied by argc and argv. The argc argument is the argument count, and argv is an opaque data structure representing the array of arguments. The operator vaptr (see vaptr) is provided to initialize this argument.
An argument that starts with ‘-’ (and is not exactly ‘-’ or ‘--’), is an option element. An argument that starts with a ‘-’ is called short or traditional option. The characters of this element, except for the initial ‘-’ are option characters. Each option character represents a separate option. An argument that starts with ‘--’ is called long or GNU option. The characters of this element, except for the initial ‘--’ form the option name.
Options may have arguments. The argument to a short option is supplied immediately after the option character, or as the next word in command line. E.g., if option -f takes a mandatory argument, then it may be given either as -farg or as -f arg. The argument to a long option is either given immediately after it and separated from the option name by an equals sign (as --file=arg), or is given as the next word in the command line (e.g. --file arg).
If the option argument is optional, i.e. it may not necessarily be given, then only the first form is allowed (i.e. either -farg or --file=arg).
The ‘--’ command line argument ends the option list. Any arguments following it are not considered options, even if they begin with a dash.
If getopt is called repeatedly, it returns successively each of the option characters from each of the option elements (for short options) and each option name (for long options). In this case, the actual arguments are supplied only to the first invocation. Subsequent calls must be given two nulls as arguments. Such invocation instructs getopt to use the values saved on the previous invocation.
When the function finds another option, it returns its character or name, updating the external variable optind (see below) so that the next call to getopt can resume the scan with the following option.
When there are no more options left, or a ‘--’ argument is encountered, getopt returns an empty string. Then optind gives the index in argv of the first element that is not an option.
The legitimate options and their characteristics are supplied in additional arguments to getopt. Each such argument is a string consisting of two parts, separated by a vertical bar (‘|’). Any one of these parts is optional, but at least one of them must be present. The first part specifies the short option character. If it is followed by a colon, this character takes a mandatory argument. If it is followed by two colons, this character takes an optional argument. If only the first part is present, the ‘|’ separator may be omitted. Examples:
"c"
Short option -c.
"f:"
Short option -f, taking a mandatory argument.
"f::"
Short option -f, taking an optional argument.
If the vertical bar is present and is followed by any characters, these characters specify the name of a long option, synonymous to the short one, specified by the first part. Any mandatory or optional arguments to the short option remain mandatory or optional for the corresponding long option. Examples:
"f:|file"
Short option -f, or long option --file, requiring an argument.
"f::|file"
Short option -f, or long option --file, taking an optional argument.
In any of the above cases, if this option appears in the command line, getopt returns its short option character.
To define a long option without a short equivalent, begin it with a bar, e.g.:
"|help"
If this option is to take an argument, this is specified using the mechanism described above, except that the short option character is replaced with a minus sign. For example:
"-:|output"
Long option --output, which takes a mandatory argument.
"-::|output"
Long option --output, which takes an optional argument.
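The specification syntax is compact, so it may help to see it taken apart. The Python sketch below follows the rules given in the text: an optional short part, an optional ‘|’ followed by the long name, one colon for a mandatory argument, two colons for an optional one, and a minus sign standing in for a missing short option. The function name parse_option_spec and the returned dictionary are hypothetical; getopt itself performs this analysis internally.

```python
def parse_option_spec(spec):
    """Split a specification such as 'f:|file' into its components."""
    short, _, long_name = spec.partition("|")
    if not short and not long_name:
        raise ValueError("empty option specification")
    if short.endswith("::"):
        argument, short = "optional", short[:-2]
    elif short.endswith(":"):
        argument, short = "required", short[:-1]
    else:
        argument = "none"
    if short == "-":
        short = ""  # a minus sign means "no short option"
    return {
        "short": short or None,
        "long": long_name or None,
        "argument": argument,
    }
```

For instance, parse_option_spec("-::|output") yields a long option ‘output’ with an optional argument and no short form.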
If an option is returned that has an argument in the command line, getopt stores this argument in the variable optarg.
After each invocation, getopt sets the variable optind to the index of the next argv element to be parsed. Thus, when the list of options is exhausted and the function has returned an empty string, optind contains the index of the first element that is not an option.
When getopt encounters an option that is not described in its arguments or if it detects a missing option argument, it prints an error message using mailfromd logging facilities, stores the offending option in the variable optopt, and returns ‘?’.
If printing an error message is not desired (e.g. the application is going to take care of error messaging), it can be disabled by setting the variable opterr to ‘0’.
The third argument to getopt, called controlling argument, may be used to control the behavior of the function. If it is a colon, it disables printing the error message for unrecognized options and missing option arguments (as setting opterr to ‘0’ does). In this case getopt returns ‘:’, instead of ‘?’, to indicate a missing option argument.
If the controlling argument is a plus sign, or the environment variable POSIXLY_CORRECT is set, then option processing stops as soon as a non-option argument is encountered. By default, if options and non-option arguments are intermixed in argv, getopt permutes them so that the options go first, followed by non-option arguments.
If the controlling argument is ‘-’, then each non-option element in argv is handled as if it were the argument of an option with character code 1 (‘"\001"’, in MFL notation). This can be used by programs that are written to expect options and other argv-elements in any order and that care about the ordering of the two.
Any other value of the controlling argument is handled as an option definition.
A special language construct is provided to supply the second argument (argv) to getopt and similar functions:
vaptr(param)
where param is a positional parameter, from which to start the array of argv. For example:
func main(...)
  returns number
do
  set rc getopt($#, vaptr($1), "|help")
  ...
Here, vaptr($1) constructs the argv array from all the arguments supplied to the function main.
To illustrate the use of the getopt function, let’s suppose you write a script that takes the following options:
-f file, --file=file
--output[=dir]
-h, --help
Then, the corresponding getopt invocation will be:
func main(...)
  returns number
do
  loop for string rc getopt($#, vaptr($1),
                            "f:|file", "-::|output", "h|help"),
       while rc != "",
       set rc getopt(0, 0)
  do
    switch rc
    do
    case "f":
      set file optarg
    case "output":
      set output 1
      set output_dir optarg
    case "h":
      help()
    default:
      return 1
    done
  ...
Sometimes you may need to check what the default settings of the mailfromd binary are and what values it actually uses. Both tasks are accomplished using the --show-defaults option. When used alone, it shows the settings actually in use (default values, possibly modified by your configuration settings). When used together with --no-config, it displays the compiled defaults.
The output of mailfromd --show-defaults looks like this:
version:                   9.0
script file:               /etc/mailfromd.mfl
preprocessor:              /usr/bin/m4 -s -DWITH_DKIM -DWITH_MFMOD /var/mailfromd/9.0/include/pp-setup
user:                      mail
statedir:                  /var/lib/mailfromd
socket:                    mailfrom
pidfile:                   mailfromd.pid
default syslog:            blocking
include path:              /etc/mailfromd:/usr/share/mailfromd/include:
                           /usr/share/mailfromd/8.14.94/include
module path:               /usr/share/mailfromd:
                           /usr/share/mailfromd/9.0
mfmod path:                /usr/lib/mailfromd
optional features:         DKIM, mfmod, STARTTLS
supported database types:  gdbm, bdb
default database type:     bdb
greylist database:         /var/lib/mailfromd/greylist.db
greylist expiration:       86400
tbf database:              /var/lib/mailfromd/tbf.db
tbf expiration:            86400
rate database:             /var/lib/mailfromd/rates.db
rate expiration:           86400
cache database:            /var/lib/mailfromd/mailfromd.db
cache positive expiration: 604800
cache negative expiration: 86400
The above format, called human-readable, with two-column output
and long lines split across several physical lines, is used if
mailfromd is linked with GNU libmailutils library version 3.16
or later and its standard output is connected to a terminal.
Otherwise, machine-readable output format is used, in which
additional whitespace is elided, and long lines are retained
verbatim. This makes it possible to easily extract default values
using familiar text processing tools, e.g.:
$ mailfromd --show-defaults --no-config | grep '^script file:'
script file:/etc/mailfromd.mfl
$ mailfromd --show-defaults --no-config | sed -ne '/^script file:/s///p'
/etc/mailfromd.mfl
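The same extraction can be scripted. The following Python sketch assumes
that every machine-readable line has the form key:value, as the grep
output above suggests. The function name parse_defaults and the sample
text are illustrative only and are not part of mailfromd.

```python
def parse_defaults(text):
    """Parse machine-readable --show-defaults output into a dictionary.

    Assumption: one `key:value` pair per line, split at the first colon,
    so that values containing colons (e.g. search paths) stay intact.
    """
    settings = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unexpected lines
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


# Sample input modelled on the output shown above (not a captured run).
sample = (
    "script file:/etc/mailfromd.mfl\n"
    "user:mail\n"
    "include path:/etc/mailfromd:/usr/share/mailfromd/include\n"
)
defaults = parse_defaults(sample)
print(defaults["script file"])
```

In a real script the text would come from running
mailfromd --show-defaults through a pipe. Since the standard output is
then not a terminal, the machine-readable format is used.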
The following table describes each line of the output in detail:
version
Program version.
script file
The script file used by the program. It is empty if the script file is not found.
preprocessor
Preprocessor command line. See Preprocessor. This value can be changed in configuration: see conf-preprocessor.
user
System user mailfromd runs as. See conf-priv.
statedir
mailfromd local state directory. See statedir.
socket
The socket mailfromd listens on. For a UNIX socket, the
file name is shown. Unless it begins with ‘/’, it is relative to
the local state directory. TCP sockets are shown in milter port
specification format. See listen.
pidfile
PID file name (relative to local state directory, unless absolute).
See pidfile.
default syslog
Syslog implementation used: either ‘blocking’ or ‘non-blocking’.
See Using non-blocking syslog. See also Logging and Debugging.
include path
Include search path. See include search path.
It can be changed from the command line, using the -I option
(see General Settings), and in the configuration file, using the
include-path statement (see include-path).
module path
Search path for MFL modules. See module search path.
It can be changed from the command line, using the -P
(--module-path) option (see General Settings), and in the
configuration file, using the module-path statement
(see module-path).
mfmod path
Search path for dynamically loaded modules. See mfmod-path.
optional features
Comma-delimited list of optional features compiled into
mailfromd. It can contain the following feature names:
Feature | Reference |
---|---|
DKIM | See DKIM. |
GeoIP2 | See Geolocation functions. |
mfmod | See Dynamically Loaded Modules. |
STARTTLS | See STARTTLS in call-out. |
supported database types
Comma-delimited list of supported database types. See Databases. These types can be used as scheme prefixes in database names (see DBM scheme).
default database type
Type of the DBM used by default. See Databases.
greylist database
greylist expiration
File name and record expiration time of the greylisting database. See greylist database.
tbf database
tbf expiration
File name and record expiration time of the token-bucket filter rate-limiting database. See tbf database.
rate database
rate expiration
File name and record expiration time of the legacy rate-limiting database. See rate database. See also Rate limiting functions.
cache database
cache positive expiration
cache negative expiration
File name and record expiration times of the call-out cache database. See cache database.
The database settings can be changed using conf-database.
Depending on its operation mode, mailfromd tries to guess
whether it is appropriate to print its diagnostics and informational
messages on standard error or to send them to syslog. Standard error
is assumed if the program is run with one of the following command
line options:
If none of these are used, mailfromd switches to syslog as
soon as it finishes its startup. There are two ways to communicate
with the syslogd daemon: using the syslog function from the system
libc library, which is a blocking implementation in most cases, or
via an internal asynchronous syslog implementation. Whether the
latter is compiled in and which implementation is used by default
is determined when compiling the package, as described in
Using non-blocking syslog.
The --logger command line option allows you to manually select the diagnostic channel:
Log everything to the standard error.
Log to syslog.
Log to syslog using the asynchronous syslog implementation.
Another way to select the diagnostic channel is by using the
logger statement in the configuration file. The statement
takes the same argument as its command line counterpart.
The remaining details of diagnostic output are controlled by
the logging configuration statement.
The default syslog facility is ‘mail’; it can be changed
using the --log-facility command line option or the
facility statement. The argument in both cases is a valid
facility name, i.e. one of: ‘user’, ‘daemon’,
‘auth’, ‘authpriv’, ‘mail’, and ‘local0’
through ‘local7’. The argument can be given in upper, lower or
mixed case, and it can be prefixed with ‘log_’.
Another syslog-related parameter that can be configured is the
tag, which identifies mailfromd messages. The default tag
is the program name. It is changed by the --log-tag
(-L) command line option and the tag logging statement.
The following example configures both the syslog facility and tag:
logging {
  facility local7;
  tag "mfd";
}
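The facility naming rules given above (any letter case, optional
‘log_’ prefix) can be summarized by the following Python sketch. It
only illustrates the rules as stated in this section and is not the code
mailfromd itself uses. The names VALID_FACILITIES and
normalize_facility are hypothetical.

```python
# Facility names listed in this section: five fixed names plus local0..local7.
VALID_FACILITIES = {"user", "daemon", "auth", "authpriv", "mail"} | {
    "local%d" % n for n in range(8)
}


def normalize_facility(name):
    """Return the canonical lower-case facility name.

    Accepts any letter case and an optional 'log_' prefix, as described
    in the text. Raises ValueError for names outside the documented set.
    """
    lowered = name.lower()
    if lowered.startswith("log_"):
        lowered = lowered[len("log_"):]
    if lowered not in VALID_FACILITIES:
        raise ValueError("unknown syslog facility: %r" % name)
    return lowered


print(normalize_facility("LOG_Local7"))
```

With this normalization, ‘local7’, ‘LOCAL7’ and ‘Log_Local7’ all denote
the same facility.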
Like any other UNIX utility, mailfromd is very quiet unless it
has something important to communicate, such as an error condition.
A set of command line options is provided for
controlling the verbosity of its output.
The --trace option enables tracing of Sendmail actions
executed during message verification. When this option is given,
any accept, discard, continue, etc. triggered
during execution of your filter program will leave a trace in
the log file. Here is an example of what it looks like (syslog time
stamp, tag and PID removed for readability):
k8DHxvO9030656: /etc/mailfromd.mfl:45: reject 550 5.1.1 Sender validity not confirmed
This shows that while verifying the message with ID
‘k8DHxvO9030656’ the reject action was executed by filter
script /etc/mailfromd.mfl at line 45.
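When many such lines have to be examined, it may help to split them
into fields. The Python sketch below assumes the layout of the example
above, i.e. a line from which the syslog time stamp, tag and PID have
already been removed. The pattern and the name parse_trace_line are
illustrative assumptions, not a documented format specification.

```python
import re

# message-id: file:line: action [arguments]
TRACE_RE = re.compile(
    r"^(?P<msgid>[^:\s]+): (?P<file>.+?):(?P<line>\d+): "
    r"(?P<action>\w+)(?: (?P<args>.*))?$"
)


def parse_trace_line(line):
    """Split a --trace log line into its fields, or return None."""
    match = TRACE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


sample = ("k8DHxvO9030656: /etc/mailfromd.mfl:45: "
          "reject 550 5.1.1 Sender validity not confirmed")
trace = parse_trace_line(sample)
print(trace["action"], trace["file"], trace["line"])
```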
The use of the message ID in the log deserves special
notice. The program will always identify its log messages with
the ‘Message-Id’, when it is available. Your responsibility as an
administrator is to make sure it is available by configuring
your MTA to export the macro ‘i’ to mailfromd.
The rule of thumb is: make ‘i’ available to the very first
handler mailfromd executes. It is not necessary to export
it to the rest of the handlers, since mailfromd will cache
it. For example, if your filter script contains ‘envfrom’ and
‘envrcpt’ handlers, export ‘i’ for ‘envfrom’.
The exact instructions on how to ensure this depend on the
MTA you use. For ‘Sendmail’, refer to Sendmail.
For MeTA1, see MeTA1, and pmult-macros. For
‘Postfix’, see Postfix.
To push log verbosity further, use the debug
configuration statement (see conf-debug) or its command line
equivalent, --debug (-d, see --debug). Its
argument is a debugging level, whose syntax is described
in http://mailutils.org/wiki/Debug_level.
The debugging output is controlled by a set of levels, each of which can be set independently of others. Each debug level consists of a category name, which identifies the part of the package for which additional debugging is desired, and a level number, which indicates how verbose its output should be.
Valid debug levels are:
error
Displays error conditions which are normally not reported, but passed to the caller layers for handling.
trace0 through trace9
Ten levels of verbosity, trace0 producing the least output and
trace9 producing the maximum amount of output.
prot
Displays network protocol interaction, where applicable.
The overall debugging level is specified as a list of individual levels, delimited with semicolons. Each individual level can be specified as one of:
Disables all levels for the specified category.
Enables all levels for the specified category.
For this category, enables all levels from ‘error’ to level, inclusive.
Enables only the given level in this category.
Disables all levels from ‘error’ to level, inclusive, in this category.
Disables only the given level in this category.
Enables all levels in the range from levelA to levelB, inclusive.
Disables all levels in the range from levelA to levelB, inclusive.
Additionally, a comma-separated list of level specifications is allowed after the dot. For example, the following specification:
acl.prot,!=trace9,!trace2
enables in category acl all levels, except trace9, trace0, trace1, and trace2.
Implementation and applicability of each level of debugging differs between various categories. Categories built-in to mailutils are described in http://mailutils.org/wiki/Debug_level. Mailfromd introduces the following additional categories:
Detailed debugging info about expiration and compaction.
List records being removed.
Verbose information about attempted DNS queries and their results.
Enables ‘libadns’ internal debugging.
Additional information about normal conditions, such as subprocess exiting successfully or a remote party being allowed access by ACL.
Detailed transcript of server manager actions: startup, shutdown, subprocess cleanups, etc.
Additional info about fd sets.
Individual subserver status information.
Subprocess registration.
Verbosely list incoming connections, functions being executed and erroneous conditions: missing headers in SMFIR_CHGHEADER, undefined macros, etc.
List milter requests being processed.
List SMTP body content in SMFIR_REPLBODY requests.
Verbosely list mild errors encountered: bad recipient addresses, etc.
Verification session transcript.
MX servers checks.
List emails being checked.
Additional info.
Info about hostnames in relayed domain list
Debugging of the virtual engine.
Message modification lists.
Debug message modification operations and Sendmail macros registered.
List SMTP stages (‘xxfi_*’ calls).
Cleanup calls.
Preprocessor.
Show command line of the preprocessor being run.
Stack operations
Debug exception state save/restore operations.
Mild errors.
List calls to ‘spf_eval_record’, ‘spf_test_record’, ‘spf_check_host_internal’, etc.
General debug info.
Explicitly list A records obtained when processing the ‘a’ SPF mechanism.
Categories starting with ‘bi_’ debug built-in modules:
Database functions.
List database look-ups.
Trace operations on the greylisting database.
SpamAssassin and ClamAV API.
Report the findings of the ‘clamav’ function.
Trace payload in interactions with ‘spamd’.
I/O functions.
Debug the following functions: open, spawn, write.
Report stderr redirection.
Report external commands being run.
Mailbox functions.
Report opened mailboxes.
Other built-ins.
Report results of checks for existence of usernames.
For example, the following invocation enables levels up to ‘trace2’ in category ‘engine’, all levels in category ‘savsrv’ and levels up to ‘trace0’ in category ‘srvman’:
$ mailfromd --debug='engine.trace2;savsrv;srvman.trace0'
You need to have sufficient knowledge about mailfromd
internal structure to use this form of the --debug option.
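To make the structure of such an argument explicit, the following
Python sketch splits a debugging specification into categories and
their level lists: semicolons separate categories, a dot separates the
category name from its levels, and commas separate individual level
specifications. It deliberately does not interpret the levels
themselves. The name split_debug_spec is hypothetical.

```python
def split_debug_spec(spec):
    """Split a --debug argument into {category: [level specs]}.

    A category given without a dot maps to an empty list, which in the
    text above means 'all levels in this category'.
    """
    result = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        category, dot, levels = item.partition(".")
        result[category] = levels.split(",") if dot else []
    return result


parsed = split_debug_spec("engine.trace2;savsrv;srvman.trace0")
print(parsed)
print(split_debug_spec("acl.prot,!=trace9,!trace2"))
```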
To control the execution of the sender verification functions (see SMTP Callout functions), you may use the --transcript (-X) command line option, which enables transcripts of SMTP sessions in the logs. Here is an example of the output produced by running mailfromd --transcript:
k8DHxlCa001774: RECV: 220 spf-jail1.us4.outblaze.com ESMTP Postfix
k8DHxlCa001774: SEND: HELO mail.gnu.org.ua
k8DHxlCa001774: RECV: 250 spf-jail1.us4.outblaze.com
k8DHxlCa001774: SEND: MAIL FROM: <>
k8DHxlCa001774: RECV: 250 Ok
k8DHxlCa001774: SEND: RCPT TO: <t1Kmx17Q@malaysia.net>
k8DHxlCa001774: RECV: 550 <>: No thank you rejected: Account Unavailable: Possible Forgery
k8DHxlCa001774: poll exited with status: not_found; sent "RCPT TO: <t1Kmx17Q@malaysia.net>", got "550 <>: No thank you rejected: Account Unavailable: Possible Forgery"
k8DHxlCa001774: SEND: QUIT
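A transcript like this can be reduced to the bare SMTP dialog with a few
lines of code. The Python sketch below assumes that each relevant line
has the form ‘ID: SEND: text’ or ‘ID: RECV: text’, as in the example
above, and ignores everything else. The names smtp_dialog and
first_rejection are illustrative only.

```python
import re

LINE_RE = re.compile(r"^(?P<id>\S+): (?P<dir>SEND|RECV): (?P<text>.*)$")


def smtp_dialog(log_text):
    """Return the list of (direction, text) pairs found in a transcript."""
    dialog = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            dialog.append((match.group("dir"), match.group("text")))
    return dialog


def first_rejection(dialog):
    """Return the first 5xx reply received from the remote server, if any."""
    for direction, text in dialog:
        if direction == "RECV" and text.startswith("5"):
            return text
    return None


sample = """\
k8DHxlCa001774: RECV: 220 spf-jail1.us4.outblaze.com ESMTP Postfix
k8DHxlCa001774: SEND: HELO mail.gnu.org.ua
k8DHxlCa001774: RECV: 250 spf-jail1.us4.outblaze.com
k8DHxlCa001774: SEND: MAIL FROM: <>
k8DHxlCa001774: RECV: 250 Ok
k8DHxlCa001774: SEND: RCPT TO: <t1Kmx17Q@malaysia.net>
k8DHxlCa001774: RECV: 550 <>: No thank you rejected: Account Unavailable: Possible Forgery
k8DHxlCa001774: SEND: QUIT
"""
dialog = smtp_dialog(sample)
print(first_rejection(dialog))
```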
A runtime error is a special condition encountered during
execution of the filter program that makes further execution of
the program impossible. There are two kinds of runtime errors: fatal
errors and uncaught exceptions. Whenever a runtime error occurs,
mailfromd writes the following message into the log file:
RUNTIME ERROR near file:line: text
where file:line indicates approximate source file location where the error occurred and text gives the textual description of the error.
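When scanning logs for such messages, the three parts can be extracted
with a regular expression. The following Python sketch relies only on
the message layout given above. The name parse_runtime_error is
hypothetical, and the sample line is a typical message of this kind.

```python
import re

ERROR_RE = re.compile(
    r"RUNTIME ERROR near (?P<file>.+?):(?P<line>\d+): (?P<text>.*)$"
)


def parse_runtime_error(log_line):
    """Return (file, line, text) for a runtime error message, or None."""
    match = ERROR_RE.search(log_line)
    if match is None:
        return None
    return match.group("file"), int(match.group("line")), match.group("text")


sample = "RUNTIME ERROR near foo.c:34: Macro not defined: {client_adr}"
error = parse_runtime_error(sample)
print(error)
```

The pattern uses search rather than match, so it also works on lines
that still carry the syslog time stamp and tag in front of the message.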
Fatal runtime errors are caused by a condition that is impossible to fix at run time. For version 9.0 these are:
There is not enough memory for the execution of the program. Try to
make more memory available for mailfromd or to reduce
its memory requirements by rewriting your filter script.
These errors are reported when there is not enough space left on the
stack to perform the requested operation, and the attempt to resize
the stack has failed. Usually mailfromd expands the stack when the
need arises (see automatic stack resizing). This runtime error
indicates that there was no more memory available for stack
expansion. Try to make more memory available for mailfromd
or to reduce its memory requirements by rewriting your filter script.
The program attempted to pop a value off the stack but the stack was
already empty. This indicates an internal error in the
MFL compiler or the mailfromd runtime engine. If you
ever encounter this error,
please report it to bug-mailfromd@gnu.org.ua. Include
the log fragment (about 10-15 lines before and after this log message)
and your filter script. See Reporting Bugs, for more
information about bug reporting.
The program counter is out of the allowed range. This is a severe
error, indicating an internal inconsistency in the mailfromd
runtime engine. If you encounter it, please report it to
bug-mailfromd@gnu.org.ua. Include the log fragment (about
10-15 lines before and after this log message) and your filter script.
See Reporting Bugs, for more information about how to report a
bug.
These indicate a programmatic error in your filter script, which the MFL compiler was unable to discover at compilation stage:
The throw statement used a nonexistent exception number n.
Fix the statement and restart mailfromd. See throw, for
information about the throw statement and see Exceptions,
for the list of available exception codes.
You have used a back-reference (see Back references), where there is no previous regular expression to refer to. Fix this line in your code and restart the program.
You have used a back-reference (see Back references), with a number greater than the number of available groups in the previous regular expression. For example:
if $f matches "(.*)@gnu.org"
# Wrong: there is only one group in the regexp above!
set x \2
…
Fix your code and restart the daemon.
Another kind of runtime error is the uncaught exception, i.e. an exceptional condition for which no handler was installed (see Exceptions, for information on exceptions and on how to handle them). These errors mean that the programmer (i.e. you) made no provision for some specific condition. For example, consider the following code:
prog envfrom
do
  if $f mx matches "yahoo.com"
    foo()
  fi
done
It is syntactically correct, but it overlooks the fact that mx matches
may generate an e_temp_failure exception if the underlying
DNS query has timed out (see Special comparisons). If
this happens, mailfromd has no instructions on what to do
next and reports an error. This can easily be fixed using a
try/catch statement (see Catch and Throw), e.g.:
prog envfrom
do
  try
  do
    if $f mx matches "yahoo.com"
      foo()
    fi
  done
  # Catch DNS errors
  catch e_temp_failure or e_failure
  do
    tempfail 451 4.1.1 "MX verification failed"
  done
done
Another common case is an undefined Sendmail macro. In this case the
e_macroundef exception is generated:
RUNTIME ERROR near foo.c:34: Macro not defined: {client_adr}
These can be caused either by misspelling the macro name (as in the example message above) or by failing to export the required name in Sendmail milter configuration (see exporting macros). This error should be fixed either in your source code or in sendmail.cf file, but if you wish to provide a special handling for it, you can use the following catch statement:
catch e_macroundef
do
  …
done
Sometimes the location indicated in the runtime error message is
not enough to trace the origin of the error. For example, an error
can be generated explicitly with the throw statement (see throw):
RUNTIME ERROR near match_cidr.mfl:30: invalid CIDR (text)
If you look in module match_cidr.mfl, you will see the following code (line numbers added for reference):
23 func match_cidr(string ipstr, string cidr)
24     returns number
25 do
26   number netmask
27
28   if cidr matches '^(([0-9]{1,3}\.){3}[0-9]{1,3})/([0-9][0-9]?)'
29     return inet_aton(ipstr) & len_to_netmask(\3) = inet_aton(\1)
30   else
31     throw invcidr "invalid CIDR (%cidr)"
32   fi
33   return 0
34 done
Now, it is obvious that the value of the cidr argument to
match_cidr was wrong, but how do you find the caller that passed
the wrong value to it? The special command line option
--stack-trace is provided for this. This option enables
dumping stack traces when a fatal error occurs. Traces
contain information about function calls. Continuing our example,
using the --stack-trace option you will see the following diagnostics:
RUNTIME ERROR near match_cidr.mfl:30: invalid CIDR (127%)
mailfromd: Stack trace:
mailfromd: 0077: match_cidr.mfl:31: match_cidr
mailfromd: 0096: test.mfl:13: bar
mailfromd: 0110: mailfromd.mfl:18: foo
mailfromd: Stack trace finishes
mailfromd: Execution of the configuration program was not finished
Each trace line describes one stack frame. The lines appear in the order of most recently called to least recently called. Each frame consists of:
Thus, the example above can be read as: “the function
match_cidr was called by the function bar, in file
test.mfl at line 13. In its turn, bar was called by the
function foo, in file mailfromd.mfl at line 18”.
Examining caller functions will help you localize the source of the error and fix it.
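For long traces, the chain of calls can be extracted mechanically. The
Python sketch below assumes the frame layout shown in the example above
(a program prefix, a numeric field, a source location and a function
name) and prints who called whom. The names parse_frames and call_chain
are illustrative only.

```python
import re

# prefix: number: file:line: function
FRAME_RE = re.compile(
    r"^\S+: (?P<number>\d+): (?P<file>.+?):(?P<line>\d+): (?P<func>\S+)$"
)


def parse_frames(log_text):
    """Return the stack frames of a trace, most recent call first."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append({
                "file": match.group("file"),
                "line": int(match.group("line")),
                "func": match.group("func"),
            })
    return frames


def call_chain(frames):
    """Describe each call as 'callee called from caller at file:line'."""
    return [
        "%s called from %s at %s:%d"
        % (callee["func"], caller["func"], caller["file"], caller["line"])
        for callee, caller in zip(frames, frames[1:])
    ]


sample = """\
RUNTIME ERROR near match_cidr.mfl:30: invalid CIDR (127%)
mailfromd: Stack trace:
mailfromd: 0077: match_cidr.mfl:31: match_cidr
mailfromd: 0096: test.mfl:13: bar
mailfromd: 0110: mailfromd.mfl:18: foo
mailfromd: Stack trace finishes
"""
frames = parse_frames(sample)
chain = call_chain(frames)
for entry in chain:
    print(entry)
```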
You can also request a stack trace at any place in your code by
calling the stack_trace function. This can be useful for
debugging.
This section discusses some potential pitfalls in MFL.
It is important to exercise special caution when writing format
strings for the sprintf (see String formatting) and strftime
(see strftime) functions. They use ‘%’ as the character
introducing conversion specifiers, while the same character is used to
expand an MFL variable within a string. To prevent this
misinterpretation, always enclose format specifications in single
quotes (see singe-vs-double). To illustrate this, let’s consider
the following example:
echo sprintf ("Mail from %s", $f)
If a variable s is not declared, this line will produce the
‘Variable s is not defined’ error message, which will allow you
to identify and fix the bug. The situation is considerably worse if
s is declared. In that case you will see no warning message,
as the statement is perfectly valid, but at run time the variable
s will be interpreted within the format string, and its value
will replace %s. To prevent this from happening, single quotes
must be used:
echo sprintf ('Mail from %s', $f)
This does not limit the functionality, since there is no need to fall back to variable interpretation in format strings.
Yet another dangerous feature of the language is the way to refer to variable and constant names within literal strings. To expand a variable or a constant the same notation is used (see Variables, and see Constants). Now, let’s consider the following code:
const x 2
string x "X"

prog envfrom
do
  echo "X is %x"
done
Does %x in echo refer to the variable or to the
constant? The correct answer is ‘to the variable’. When executed, this
code will print ‘X is X’.
As of version 9.0, mailfromd will always print a
diagnostic message whenever it stumbles upon a variable having the
same name as a previously defined constant or vice versa. The
resolution of such name clashes is described in detail in
variable--constant shadowing.
Future versions of the program may provide a non-ambiguous way of referring to variables and constants from literal strings.
For more information about exceptions and their handling, please refer to Exceptions.
class ‘w’, see Sendmail Installation and Operation Guide, chapter 5.2.
class ‘R’
Sendmail (tm) Installation and Operation Guide, chapter 5.6, ‘O -- Set Option’.
When MFL has array data type, the second argument will change to array of strings.