Tutorial (Mailfromd Manual)

3 Tutorial

This chapter contains a tutorial introduction, guiding you through various mailfromd configurations, starting from the simplest ones and proceeding up to more advanced forms. It omits most complicated details, concentrating mainly on the common practical tasks.

If you are familiar with mailfromd, you can skip this chapter and go directly to the next one (see MFL), which contains a detailed discussion of the mail filtering language and of mailfromd interaction with the Mail Transfer Agent.

3.1 Start Up

The mailfromd utility runs as a standalone daemon program and listens on a predefined communication channel for requests from the Mail Transfer Agent (MTA, for short). When processing each message, the MTA establishes communication with mailfromd and goes through several states, collecting the necessary data from the sender. At each state it sends the relevant information to mailfromd and waits for it to reply. The mailfromd filter receives the message data through Sendmail macros and runs the handler program defined for the given state. The result of this run is a response code, which mailfromd returns to the MTA. The following response codes are defined:

continue: Continue message processing from the next milter state.
accept: Accept this message for delivery. After receiving this code the MTA continues processing this message without consulting the mailfromd filter any further.
reject: Reject this message. The message processing stops at this stage, and the sender receives the reject reply (‘5xx’ reply code). No further mailfromd handlers are called for this message.
discard: Silently discard the message. This means that the MTA will continue processing this message as if it were going to deliver it, but will discard it after receiving it. No further interaction with mailfromd occurs.
tempfail: Temporarily reject the message. The message processing stops at this stage, and the sender receives the ‘temporary failure’ reply (‘4xx’ reply code). No further mailfromd handlers are called for this message.

The instructions on how to process the message are supplied to mailfromd in its filter script file. It is normally called /usr/local/etc/mailfromd.mfl (but can be located elsewhere, see Invocation) and contains a set of milter state handlers, i.e. subroutines to be executed in various SMTP states. Each interaction state can have its own handler. A missing handler implies the continue response code.

The filter script can define up to nine milter state handlers, named after the milter states: ‘connect’, ‘helo’, ‘envfrom’, ‘envrcpt’, ‘data’, ‘header’, ‘eoh’, ‘body’, and ‘eom’. The ‘data’ handler is invoked only if the MTA uses Milter protocol version 3 or later. Two special handlers are available for initialization and clean-up purposes: ‘begin’ is called before the processing starts, and ‘end’ is called after it is finished. The diagram below shows the control flow when processing an SMTP transaction. Lines marked with ‘C:’ show SMTP commands issued by the remote machine (the client); those marked with ‘⇒’ show called handlers with their arguments. An ‘[R]’ at the start of a line indicates that this part of the transaction can be repeated any number of times:

⇒ begin()
⇒ connect(hostname, family, port, ‘IP address’)
C: HELO domain
⇒ helo(domain)
for each message transaction
do
        C: MAIL FROM sender
        ⇒ envfrom(sender)

[R]     C: RCPT TO recipient
        ⇒ envrcpt(recipient)

        C: DATA
        ⇒ data()
[R]     C: header: value
        ⇒ header(header, value)

        C:
        ⇒ eoh()

[R]     C: body-line
        ⇒ /* Collect lines into blocks blk of
        ⇒  * at most len bytes and for each
        ⇒  * such block call:
        ⇒  */
        ⇒ body(blk, len)

        C: .
        ⇒ eom()
done
⇒ end()

Figure 3.1: Mailfromd Control Flow

This control flow is maintained for as long as each called handler returns continue (see Actions). Otherwise, if any handler returns accept or discard, the message processing continues, but no other handler is called. In the case of accept, the MTA will accept the message for delivery, in the case of discard it will silently discard it.

If any of the handlers returns reject or tempfail, the result depends on the handler. If this code is returned by the envrcpt handler, only this particular recipient address is rejected. When returned by any other handler, it causes the whole message to be rejected.
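For instance, the following envrcpt handler is a minimal sketch of a per-recipient rejection: only the matching RCPT TO command is refused, while the other recipients of the same transaction are processed as usual. The address nobody@example.com is a hypothetical placeholder, and the example assumes that the MTA exports the rcpt_addr macro to the envrcpt handler (Sendmail's default macro list for this stage includes it).

prog envrcpt
do
  # Reject just this recipient; the transaction itself goes on
  if ${rcpt_addr} = "nobody@example.com"
    reject 550 5.1.1 "Recipient address rejected"
  fi
done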

The reject and tempfail actions executed by the helo handler do not take effect immediately. Instead, their action is deferred until the next SMTP command from the client, which is usually MAIL FROM.

3.2 Simplest Configurations

The mailfromd script file contains a series of declarations of the handler procedures. Each declaration has the form:

prog name
do
  …
done

where prog, do and done are keywords, and name is the state name for this handler. The dots in the above example represent the actual code, i.e. a set of commands instructing mailfromd how to process the message.

For example, the declaration:

prog envfrom
do
  accept
done

installs a handler for ‘envfrom’ state, which always approves the message for delivery, without any further interaction with mailfromd.

The word accept in the above example is an action. An action is a special language statement that instructs the run-time engine to stop execution of the program and to return a response code to Sendmail. There are five actions, one for each response code: continue, accept, reject, discard, and tempfail. Among these, reject and tempfail can optionally take one to three arguments. There are two ways of supplying the arguments.

In the first form, called literal or traditional notation, the arguments are supplied as additional words after the action name, separated by whitespace. The first argument is a three-digit RFC 2821 reply code. It must begin with ‘5’ for reject and with ‘4’ for tempfail. If two arguments are supplied, the second argument must be either an extended reply code (RFC 1893/2034) or a textual string to be returned along with the SMTP reply. Finally, if all three arguments are supplied, then the second one must be an extended reply code and the third one must supply the textual string. The following examples illustrate all possible ways of using the reject statement in literal notation:

reject
reject 503
reject 503 5.0.0
reject 503 "Need HELO command"
reject 503 5.0.0 "Need HELO command"

Please note the quotes around the textual string.

Another form for these actions is called functional notation, because it resembles the function call syntax. When used in this form, the action word is followed by a parenthesized group of exactly three arguments, separated by commas. The meaning and ordering of the arguments are the same as in literal form. Any of the three arguments may be absent, in which case it is replaced by its default value. To illustrate this, here are the statements from the previous example, written in functional notation:

reject(,,)
reject(503,,)
reject(503, 5.0.0,)
reject(503,, "Need HELO command")
reject(503, 5.0.0, "Need HELO command")
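The same two notations apply to tempfail, whose reply code must begin with ‘4’. In the sketch below, each literal statement is followed by its functional equivalent; the codes 451 and 4.3.0 are only an illustration:

tempfail 451 "Try again later"
tempfail(451,, "Try again later")
tempfail 451 4.3.0 "Try again later"
tempfail(451, 4.3.0, "Try again later")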

3.3 Conditional Execution

Programs consisting of a single action are rarely useful. In most cases you will want to do some checking and decide whether to process the message depending on its result. For example, if you do not want to accept messages from the address ‘<badguy@some.net>’, you could write the following program:

prog envfrom
do
  if $f = "badguy@some.net"
    reject
  else
    accept
  fi
done

This example illustrates several important concepts. First of all, $f in the third line is a Sendmail macro reference. Sendmail macros are referenced the same way as in sendmail.cf, the only difference being that curly braces around macro names are optional, even if the name consists of several letters. The value of a macro reference is always a string.

The equality operator (‘=’) compares its left and right arguments and evaluates to true if the two strings are exactly the same, or to false otherwise. Apart from equality, you can use the regular relational operators: ‘!=’, ‘>’, ‘>=’, ‘<’ and ‘<=’. Notice that string comparison in mailfromd is always case sensitive. To do a case-insensitive comparison, translate both operands to upper or lower case (see tolower and toupper).
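As a minimal sketch, here is a case-insensitive version of the envfrom check above. Since the literal is already in lower case, only $f needs to be converted with tolower:

prog envfrom
do
  # "BadGuy@Some.Net" and "badguy@some.net" now compare equal
  if tolower($f) = "badguy@some.net"
    reject
  else
    accept
  fi
done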

The if statement decides what actions to execute depending on the value its condition evaluates to. Its usual form is:

if expression then-body [else else-body] fi

The then-body is executed if the expression evaluates to true (i.e. to any non-zero value). The optional else-body is executed if the expression yields false (i.e. zero). Both then-body and else-body can contain other if statements; their nesting depth is not limited. To facilitate writing complex conditional statements, the elif keyword can be used to introduce alternative conditions, for example:

prog envfrom
do
  if $f = "badguy@some.net"
    reject
  elif $f = "other@domain.com"
    tempfail 470 "Please try again later"
  else
    accept
  fi
done

See switch, for more elaborate forms of conditional branching.
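As a rough sketch, assuming the switch syntax described in switch, the elif chain above could also be written as a single switch statement on $f:

prog envfrom
do
  switch $f
  do
  case "badguy@some.net":
    reject
  case "other@domain.com":
    tempfail 470 "Please try again later"
  default:
    accept
  done
done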

3.4 Functions and Modules

Like any programming language, MFL supports the concept of a function, i.e. a body of code that is assigned a unique name and can be invoked elsewhere as many times as needed.

All functions have a definition that introduces the types and names of the formal parameters and the result type, if the function is to return a meaningful value (function definitions in MFL are discussed in detail in User-Defined Functions).

A function is invoked using a special construct, a function call:

 name (arg-list)

where name is the function name, and arg-list is a comma-separated list of expressions. Each expression in arg-list is evaluated, and its type is compared with that of the corresponding formal argument. If the types differ, the expression is converted to the formal argument type. Finally, a copy of its value is passed to the function as the corresponding argument. The order in which the expressions are evaluated is not defined. The compiler checks that the number of elements in arg-list matches the number of mandatory arguments of function name.

If the function does not return a value, it should only be called as a statement.

Functions may be recursive, even mutually recursive.
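As an illustration, here is a sketch of a small user-defined function; its name, is_blocked_sender, and the addresses it tests are hypothetical. It assumes the func … returns … do … done definition syntax described in User-Defined Functions, and it returns a number because MFL represents boolean values as numbers (zero is false, any other value is true).

# Hypothetical helper: non-zero if addr is on a hard-coded block list
func is_blocked_sender(string addr) returns number
do
  return tolower(addr) = "badguy@some.net"
         or tolower(addr) = "other@domain.com"
done

prog envfrom
do
  if is_blocked_sender($f)
    reject
  fi
done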

Mailfromd comes with a rich set of predefined functions for various purposes. There are two basic function classes: built-in functions, which are implemented by the MFL runtime environment in mailfromd, and library functions, which are implemented in MFL. The built-in functions are always available and no preparatory work is needed before calling them. In contrast, the library functions are defined in modules, special MFL source files that contain functions designed for a particular task. In order to access a library function, you must first require the module it is defined in. This is done using the require statement. For example, the function hostname looks up in the DNS the host name corresponding to the IP address given as its argument. This function is defined in the module dns.mfl, so before calling it you must require this module:

require dns

The require statement takes a single argument: the name of the requested module (without the ‘.mfl’ suffix). It looks up the module on disk and loads it if it is available.

For more information about the module system, see Modules.

3.5 Domain Name System

Site administrators often do not wish to accept mail from hosts that do not have a proper reverse delegation in the Domain Name System. In the previous section we introduced the library function hostname, which looks up in the DNS the name corresponding to the IP address given as its argument. If there is no corresponding name, the function returns its argument unchanged. This can be used to test whether the IP address was resolved, as illustrated in the example below:

require 'dns'

prog envfrom
do
  if hostname($client_addr) = $client_addr
    reject
  fi
done

The require 'dns' statement loads the module dns.mfl, after which the definition of hostname becomes available.

A similar function, resolve, which resolves the symbolic name to the corresponding IP address is provided in the same dns.mfl module.

3.6 Checking Sender Address

A special language construct is provided for verification of sender addresses (callout):

on poll $f do
when success:
  accept
when not_found or failure:
  reject 550 5.1.0 "Sender validity not confirmed"
when temp_failure:
  tempfail 450 4.1.0 "Try again later"
done

The on poll construct runs standard verification (see standard verification) for the email address specified as its argument (in the example above it is the value of the Sendmail macro ‘$f’). The check can result in the following conditions:

success: The address exists.
not_found: The address does not exist.
failure: Some error of permanent nature occurred during the check. The existence of the address cannot be verified.
temp_failure: Some temporary failure occurred during the check. The existence of the address cannot be verified at the moment.

The when branches of the on poll statement introduce statements that are executed depending on the actual return condition. If a condition occurs that is not handled within the on block, the run-time evaluator will signal an exception⁵ and return a temporary failure, so it is advisable to always handle all four conditions. In fact, the condition handling shown in the above example is suitable for most normal configurations: the mail is accepted if the sender address is proved to exist and rejected otherwise. If a temporary failure occurs, the remote party is urged to retry the transaction later.

The poll statement itself has a number of options that control the type of the verification. These are discussed in detail in poll.

It is worth noting that there is one special email address which is always available on any host: the null address ‘<>’ used in error reporting. There is no use in verifying its existence:

prog envfrom
do
  if $f == ""
    accept
  else
    on poll $f do
    when success:
      accept
    when not_found or failure:
      reject 550 5.1.0 "Sender validity not confirmed"
    when temp_failure:
      tempfail 450 4.1.0 "Try again later"
    done
  fi
done

3.7 SMTP Timeouts

When using polling functions, it is important to take into account possible delays, which can occur in SMTP transactions. Such delays may be due to low network bandwidth or high load on the remote server. Some sites impose them deliberately, as a spam-fighting measure.

Ideally, callout verification should use the timeout values defined in RFC 2821, but this is impossible in practice, because it would cause timeout escalation, i.e. delays encountered in a callout SMTP session would propagate back to the remote client whose session initiated the callout.

Consider, for example, the following scenario. An MFL script performs a callout at the ‘envfrom’ stage. The remote server is overloaded and is slow to respond, so that the initial response arrives 3 minutes after establishing the connection, and processing the ‘EHLO’ command takes another 3 minutes. These delays are acceptable according to the RFC, which imposes a 5 minute limit on each stage, but while waiting for the remote reply our SMTP server remains in the ‘envfrom’ state, with the client waiting more than 6 minutes for a response to its ‘MAIL’ command. This exceeds the same 5 minute limit, so the client will almost certainly drop the session.

To avoid this, mailfromd uses a special instance, called the callout server, which is responsible for running callout SMTP sessions asynchronously. The usual sender verification is performed using so-called soft timeout values, which are short enough not to disturb the incoming session (e.g. the timeout for the ‘HELO’ response is 3 seconds, instead of 5 minutes). If this verification yields a definite answer, that answer is stored in the cache database and returned to the calling procedure immediately. If, however, the verification is aborted due to a timeout, the calling procedure receives an ‘e_temp_failure’ exception, and the callout is scheduled for processing by the callout server. This exception normally causes the milter session to return a temporary error to the sender, urging it to retry the connection later.

In the meantime, the callout server runs the sender verification again using another set of timeouts, called hard timeouts, which are normally much longer than the ‘soft’ ones (they default to the values required by RFC 2821). If it gets a definitive result (e.g. ‘email found’ or ‘email not found’), the server stores it in the cache database. If the callout ends due to a timeout, a ‘not_found’ result is stored in the database.

Some time later, the remote server retries the delivery, and the mailfromd script is run again. This time, the callout function immediately obtains the already cached result from the database and proceeds accordingly. If the callout server has not finished the request by the time the sender retries the connection, the sender again receives a temporary error, and the process repeats until the callout is finished.

Usually, the callout server is just another instance of mailfromd itself, started automatically to perform scheduled SMTP callouts. It is also possible to set up a separate callout server on another machine. This is discussed in calloutd.

For detailed information about callout timeouts and their configuration, see conf-timeout.

For a description of how to configure mailfromd to use callout servers, see conf-server.

3.8 Avoiding Verification Loops

An envfrom program consisting only of the on poll statement will work smoothly for incoming mail, but will create infinite loops for outgoing mail. This is because upon sending an outgoing message mailfromd will start the verification procedure, which will initiate an SMTP transaction with the same mail server that runs it. This transaction will in turn trigger execution of the on poll statement, and so on ad infinitum. To avoid this, any properly written filter script should not run the verification procedure on email addresses in the domains that are relayed by the server it runs on. This can be achieved using the relayed function, which returns true if its argument is contained in one of the predefined domain list files. These files correspond to the Sendmail plain text files used in F class definition forms (see Sendmail Installation and Operation Guide, chapter 5.3), i.e. they contain one domain name per line, with empty lines and lines starting with ‘#’ being ignored. The domain files consulted by the relayed function are defined in the relayed-domain-file configuration file statement (see relayed-domain-file):

relayed-domain-file (/etc/mail/local-host-names,
                     /etc/mail/relay-domains);

or:

relayed-domain-file /etc/mail/local-host-names;
relayed-domain-file /etc/mail/relay-domains;

The above example declares two domain list files, most commonly used in Sendmail installations to keep the host names of the server⁶ and the names of the domains relayed by this server⁷.

Given all this, we can improve our filter program:

require 'dns'

prog envfrom
do
  if $f == ""
    accept
  elif relayed(hostname(${client_addr}))
    accept
  else
    on poll $f do
    when success:
      accept
    when not_found or failure:
      reject 550 5.1.0 "Sender validity not confirmed"
    when temp_failure:
      tempfail 450 4.1.0 "Try again later"
    done
  fi
done

If you feel that your Sendmail’s relayed domains are not restrictive enough for mailfromd filters (for example, you are relaying mail from some third-party servers), you can check the client address against your own list of trusted mail servers. If the number of such servers is small enough, a single condition joined with ‘or’ can be used, e.g.:

  elif ${client_addr} = "10.10.10.1"
       or ${client_addr} = "192.168.11.7"
    accept
  …

Otherwise, if the servers’ IP addresses fall within one or several CIDRs, you can use the match_cidr function (see Internet address manipulation functions), e.g.:

  elif match_cidr (${client_addr}, "199.232.0.0/16")
    accept
  …

or combine both methods. Finally, you can keep a DBM database of relayed addresses and use the dbmap or dbget function for checking (see Database functions):

  elif dbmap("%__statedir__/relay.db", ${client_addr})
    accept
  …
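The three checks can also be gathered into one helper. The sketch below is hypothetical (the function name trusted_client is not part of mailfromd) and reuses only the addresses, CIDR and database path from the examples above. Because ‘or’ uses shortcut evaluation, the database is consulted only if none of the cheaper tests succeeds. Depending on your mailfromd version, match_cidr may have to be loaded with a require statement first; see Internet address manipulation functions.

# Hypothetical helper combining the trusted-server checks
func trusted_client(string addr) returns number
do
  return addr = "10.10.10.1"
         or addr = "192.168.11.7"
         or match_cidr(addr, "199.232.0.0/16")
         or dbmap("%__statedir__/relay.db", addr)
done

It can then be used in the envfrom handler as follows:

  elif trusted_client(${client_addr})
    accept
  …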

3.9 HELO Domain

Some of the mail filtering conditions may depend on the value of the helo domain name, i.e. the argument to the SMTP EHLO (or HELO) command. If you ever need such conditions, take into account the following caveats. Firstly, although Sendmail passes the helo domain in the $s macro, it does not do so consistently. In fact, the $s macro is available only to the helo handler; all other handlers won’t see it, no matter what the value of the corresponding Milter.macros.handler statement is. So, if you wish to access its value from any handler other than helo, you will have to store it in a variable in the helo handler and then use this variable in the other handler. This approach is also recommended for other MTAs. This brings us to the concept of variables in mailfromd scripts.

A variable is declared using the following syntax:

type name

where name is the variable name and type is ‘string’, if the variable is to hold a string value, or ‘number’, if it is supposed to have a numeric value.

A variable is assigned a value using the set statement:

set name expr

where expr is any valid MFL expression.

The set statement can occur within handler or function declarations as well as outside of them.

There are two kinds of Mailfromd variables: global variables, that are visible to all handlers and functions, and automatic variables, that are available only within the handler or function where they are declared. For our purpose we need a global variable (See Variable classes, for detailed descriptions of both kinds of variables).
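For comparison, here is a sketch of an automatic variable: sender (a hypothetical name) is declared inside the envfrom handler, so it exists only while that handler runs and is invisible to the other handlers.

prog envfrom
do
  # Automatic variable: local to this handler
  string sender
  set sender tolower($f)
  if sender = "badguy@some.net"
    reject
  fi
done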

The following example illustrates an approach that allows you to use the HELO domain name in any handler:

# Declare the helohost variable
string helohost

prog helo
do
  # Save the host name for further use
  set helohost $s
done

prog envfrom
do
  # Reject hosts claiming to be localhost
  if helohost = "localhost"
    reject 570 "Please specify real host name"
  fi
done

Notice that for this approach to work, your MTA must export the ‘s’ macro (e.g., in the case of Sendmail, the Milter.macros.helo statement in the sendmail.cf file must contain ‘s’; see Sendmail). This requirement can be removed by using the handler argument of helo. Most mailfromd handlers are given one or more arguments. The exact number of arguments and their meaning are handler-specific and are described in Handlers and in Figure 3.1. The arguments are referenced by their ordinal number, using the notation $n. The helo handler takes one argument, whose value is the helo domain. Using this information, the helo handler from the example above can be rewritten as follows:

prog helo
do
  # Save the host name for further use
  set helohost $1
done
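Combining the handler argument with tolower gives a variant in which the check in envfrom no longer depends on the letter case the client used in its HELO command. This sketch differs from the example above only in the helo handler:

string helohost

prog helo
do
  # $1 is the HELO/EHLO argument; store it in lower case
  set helohost tolower($1)
done

prog envfrom
do
  # Matches "localhost", "LocalHost", "LOCALHOST", etc.
  if helohost = "localhost"
    reject 570 "Please specify real host name"
  fi
done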

3.10 SMTP RSET and Milter Abort Handling

In the previous section we used a global variable to hold certain information and share it between handlers. In the majority of cases, such information is session specific and becomes invalid if the remote party issues the SMTP RSET command. Therefore, mailfromd clears all global variables when it receives a Milter ‘abort’ request, which is normally generated by this command.

However, you may need some variables that retain their values even across SMTP session resets. In mailfromd terminology such variables are called precious. Precious variables are declared by prefixing their declaration with the keyword precious. Consider, for example, this snippet of code:

precious number rcpt_counter

prog envrcpt
do
  set rcpt_counter rcpt_counter + 1
done

Here, the variable ‘rcpt_counter’ is declared as precious and its value is incremented each time the ‘envrcpt’ handler is called. This way, ‘rcpt_counter’ will keep the total number of SMTP RCPT commands issued during the session, no matter how many times it was restarted using the RSET command.
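A precious counter can, for instance, cap the number of recipients over the whole session, whether or not RSET was issued. The sketch below assumes an arbitrary limit of 50 and uses reply code 452 with extended code 4.5.3 (‘too many recipients’); both choices are illustrative.

precious number rcpt_counter

prog envrcpt
do
  # Counts every RCPT TO in the session, surviving RSET
  set rcpt_counter rcpt_counter + 1
  if rcpt_counter > 50
    tempfail 452 4.5.3 "Too many recipients in this session"
  fi
done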

3.11 Controlling Number of Recipients

Most MTAs provide a way to limit the number of recipients per message. For example, in Sendmail you may use the MaxRecipientsPerMessage option⁸. However, such methods are not very flexible, so you are often better off using mailfromd for this purpose.

Mailfromd keeps the number of recipients collected so far in variable rcpt_count, which can be controlled in envrcpt handler as shown in the example below:

prog envrcpt
do
  if rcpt_count > 10
    reject 550 5.7.1 "Too many recipients"
  fi
done

This filter will accept no more than 10 recipients per message. You may achieve finer granularity by using additional conditions. For example, the following code will allow any number of recipients if the mail is coming from a domain relayed by the server, while limiting it to 10 for incoming mail from other domains:

prog envrcpt
do
  if not relayed(hostname($client_addr)) and rcpt_count > 10
    reject 550 5.7.1 "Too many recipients"
  fi
done

There are three important features to notice in the above code. First of all, it introduces two boolean operators: and, which evaluates to true only if both left-side and right-side expressions are true, and not, which reverses the value of its argument.

Secondly, the scope of an operator is determined by its precedence, or binding strength. Not binds more tightly than and, so its scope is limited to the expression between it and and. Using parentheses to make the operator scoping explicit, the above if condition can be rewritten as follows:

    if (not (relayed(hostname($client_addr)))) and (rcpt_count > 10)

Finally, it is important to notice that all boolean expressions are computed using shortcut evaluation. To understand what it is, let’s consider the following expression: x and y. Its value is true only if both x and y are true. Now suppose that we evaluate the expression from left to right and we find that x is false. This means that no matter what the value of y is, the resulting expression will be false, therefore there is no need to compute y at all. So, the boolean shortcut evaluation works as follows:

x and y: If x ⇒ false, do not evaluate y and return false.
x or y: If x ⇒ true, do not evaluate y and return true.

Thus, in the expression not relayed(hostname($client_addr)) and rcpt_count > 10, the value of the rcpt_count variable will be compared with ‘10’ only if the relayed function yielded false.
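Shortcut evaluation also means that the order of the operands matters for performance. In the sketch below the cheap numeric test comes first, so the DNS lookup performed by hostname (and the relayed check) happens only once the recipient count exceeds 10; the filtering result is the same as before.

require 'dns'

prog envrcpt
do
  # hostname() is called only when rcpt_count > 10
  if rcpt_count > 10 and not relayed(hostname($client_addr))
    reject 550 5.7.1 "Too many recipients"
  fi
done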

To further enhance our sample filter, you may wish to make the reject output more informative, to let the sender know what the recipient limit is. To do so, you can use the concatenation operator ‘.’ (a dot):

number max_rcpt
set max_rcpt 10
prog envrcpt
do
  if not relayed(hostname($client_addr)) and rcpt_count > max_rcpt
    reject 550 5.7.1 "Too many recipients, max=" . max_rcpt
  fi
done

When evaluating the third argument to reject, mailfromd will first convert max_rcpt to a string and then concatenate both strings together, producing the string ‘Too many recipients, max=10’.

3.12 Sending Rate

We introduced the notion of mail sending rate in Rate Limit. Mailfromd keeps the computed rates in a special rate database (see Databases). Each record in this database consists of a key, for which the rate is computed, and the rate value, in the form of a double precision floating point number representing the average number of messages per second sent by this key within the last sampling interval. In the simplest case, the sender email address can be used as the key; however, we recommend using the combination email-sender_ip instead, so the actual email owner won’t be blocked by the actions of some spammer abusing his/her address.

Two functions are provided to control and update sending rates. The rateok function takes three mandatory arguments:

  bool rateok(string key, number interval, number threshold)

The meaning of key is described above. The interval is the sampling interval, i.e. the number of seconds to which the actual sending rate value is converted. Remember that the rate is stored internally as a floating point number, and thus cannot be used directly in mailfromd filters, which operate only on integer numbers. To use the rate value, it is first converted to messages per given interval, which is an integer number. For example, the rate 0.138888 messages per second brought to a 1-hour interval gives 500 messages per hour (0.138888 * 3600 ≈ 500).

When the rateok function is called, it recomputes rate record for the given key. If the new rate value converted to messages per given interval is less than threshold, the function updates the database and returns True. Otherwise it returns False and does not update the database.

This function must be required prior to use, by placing the following statement somewhere at the beginning of your script:

require rateok

For example, the following code limits the mail sending rate for each ‘email address’-‘IP’ combination to 180 per hour. If the actual rate value exceeds this limit, the sender is returned a temporary failure response:

require rateok

prog envfrom
do
  if not rateok($f . "-" . ${client_addr}, 3600, 180)
    tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
  fi
done

Notice the use of concatenation to produce the key.

It is often inconvenient to specify intervals in seconds, therefore a special interval function is provided. It converts its argument, a textual string representing a time interval in English, to the corresponding number of seconds. Using this function, the rateok call above can be written as:

     rateok($f . "-" . ${client_addr}, interval("1 hour"), 180)

The interval function is described in interval, and time intervals are discussed in time interval specification.

The rateok function begins computing the rate as soon as it has collected enough data. By default, it needs at least four mails. Since this may lead to a large number of false positives (i.e. overestimated rates) at the beginning of the sampling interval, there is a way to specify the minimum number of samples rateok must collect before it starts to actually compute rates. This number of samples is given as the optional fourth argument to the function. For example, the following call will always return True for the first 10 mails, no matter what the actual rate:

     rateok($f . "-" . ${client_addr}, interval("1 hour"), 180, 10)
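Putting these pieces together, a complete envfrom handler might look like the sketch below. It uses the same key, limit and minimum sample count as the calls above; the values are examples, not recommendations.

require rateok

prog envfrom
do
  # Key: sender address and client IP, as recommended above.
  # Limit: 180 mails per hour; rateok returns True for the
  # first 10 mails of each key, whatever the rate.
  if not rateok($f . "-" . ${client_addr}, interval("1 hour"), 180, 10)
    tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
  fi
done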

The tbf_rate function gives you more control over mail rates. It implements the token bucket filter (TBF) algorithm.

The token bucket controls when the data can be transmitted based on the presence of abstract entities called tokens in a container called bucket. Each token represents some amount of data. The algorithm works as follows:

A token is added to the bucket at a constant rate of 1 token per t microseconds.
A bucket can hold at most m tokens. If a token arrives when the bucket is full, that token is discarded.
When n items of data arrive (e.g. n mails), n tokens are removed from the bucket and the data are accepted.
If fewer than n tokens are available, no tokens are removed from the bucket and the data are not accepted.

This algorithm keeps the data traffic at a constant rate of one item per t microseconds, while allowing bursts of up to m data items. Such bursts occur when no data has arrived for m*t microseconds or more.

Mailfromd keeps buckets in a database ‘tbf’. Each bucket is identified by a unique key. The tbf_rate function is defined as follows:

 bool tbf_rate(string key, number n, number t, number m)

The key identifies the bucket to operate upon. The remaining arguments are described above. The tbf_rate function returns ‘True’ if the algorithm allows the data to be accepted, and ‘False’ otherwise.

Depending on how the actual arguments are selected, the tbf_rate function can be used to control various types of flow rates. For example, to control the mail sending rate, set n to the number of mails, t to the interval between mails in microseconds, and m to the allowed burst size:

prog envfrom
do
  if not tbf_rate($f . "-" . $client_addr, 1, 10000000, 20)
    tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
  fi
done

The example above allows at most one mail every 10 seconds, with a burst size of 20.
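The following Python arithmetic checks what these numbers mean. It converts the microsecond interval into a sustained hourly rate and into the idle time needed to refill an empty bucket (m*t). The variable names are illustrative only.

```python
USEC_PER_SEC = 1_000_000
t = 10_000_000  # microseconds per token: one mail every 10 seconds
m = 20          # bucket capacity, i.e. the largest burst

# Sustained rate: one token per t microseconds.
mails_per_hour = 3600 * USEC_PER_SEC // t
# An empty bucket is full again after m*t microseconds without mail.
seconds_to_refill = m * t // USEC_PER_SEC

print(mails_per_hour, seconds_to_refill)  # 360 200
```

A sender can therefore sustain 360 mails per hour. After 200 seconds of silence, the sender can send a burst of 20 mails at once.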

Another use for the tbf_rate function is to limit the total size of mail delivered per given interval of time. To do so, the function must be used in the prog eom handler, because it is the only handler where the entire size of the message is known. The n argument must contain the number of bytes in the email (or email bytes * number of recipients), so that each token stands for one byte. Accordingly, t must be set to the number of microseconds per byte, i.e. one million divided by the number of bytes per second a given user is allowed to send. The m argument must be large enough to accommodate a couple of large emails. E.g.:

  prog eom
  do
    if not tbf_rate("$f-$client_addr",
                    message_size(current_message()),
                    1000000/10240,  # ~97 usec per byte, i.e. about 10 kB/sec
                    10*1024*1024)
      tempfail 450 4.7.0 "Data sending rate exceeded.  Try again later"
    fi
  done

See Rate limiting functions, for more information about rateok and tbf_rate functions.

3.13 Greylisting

Greylisting is a simple method of defending against spam, proposed by Evan Harris. In a nutshell, it consists of recording the ‘sender IP’-‘sender email’-‘recipient email’ triplet of mail transactions. Each time an unknown triplet is seen, the corresponding message is rejected with the tempfail code. If the mail is legitimate, this makes the originating server retry the delivery later, until the destination eventually accepts it. If, however, the mail is spam, it will probably never be retried, so the users will not be bothered by it. Even if the spammer does retry the delivery, the greylisting period gives spam-detection systems, such as DNSBLs, enough time to detect and blacklist it. By the time the destination host starts accepting emails from this triplet, it will already be blocked by other means.

You will find the detailed description of the method in The Next Step in the Spam Control War: Greylisting, the original whitepaper by Evan Harris.

The mailfromd implementation of greylisting is based on the greylist function. The function takes two arguments: the key identifying the greylisting triplet, and the interval. The function looks up the key in the greylisting database. If the key is not found, a new entry is created for it and the function returns true. If the key is found, greylist returns false if it was inserted into the database more than interval seconds ago, and true otherwise. In other words, from the point of view of the greylisting algorithm, the function returns true when the message delivery should be blocked. Thus, the simplest implementation of the algorithm would be:

prog envrcpt
do
 if greylist("${client_addr}-$f-${rcpt_addr}", interval("1 hour"))
   tempfail 451 4.7.1 "You are greylisted"
 fi
done
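The traditional greylist semantics just described can be modeled in a few lines of Python. This sketch mirrors the rules above: the first sighting blocks, and later calls block until more than interval seconds have passed. It also tracks a greylist_seconds_left value like the one used in the next example. The class is hypothetical. Database expiration is not modeled. Resetting the counter to 0 once the period is over is our assumption.

```python
class TraditionalGreylist:
    """Toy model of the traditional greylist() semantics described above."""

    def __init__(self):
        self.db = {}                    # key -> time the key was first seen
        self.greylist_seconds_left = 0  # mirrors the MFL global variable

    def greylist(self, key, interval, now):
        """Return True while delivery for `key` should be blocked."""
        first_seen = self.db.get(key)
        if first_seen is None:
            # Unknown key: record it and block for the whole interval.
            self.db[key] = now
            self.greylist_seconds_left = interval
            return True
        elapsed = now - first_seen
        if elapsed > interval:
            # Assumption: the counter is reset once the period is over.
            self.greylist_seconds_left = 0
            return False
        self.greylist_seconds_left = interval - elapsed
        return True
```

Note that each call compares the stored timestamp against the interval passed to that call. Changing the interval therefore affects keys that are already in the database.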

However, the message returned by this example is not informative enough. In particular, it does not tell when the message will be accepted. To help you produce more informative messages, the greylist function stores the number of seconds left until the end of the greylisting period in the global variable greylist_seconds_left, so the above example could be enhanced as follows:

prog envrcpt
do
  set gltime interval("1 hour")
  if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
    if greylist_seconds_left = gltime
      tempfail 451 4.7.1
         "You are greylisted for %gltime seconds"
    else
      tempfail 451 4.7.1
         "Still greylisted for %greylist_seconds_left seconds"
    fi
  fi
done

In real life you will have to avoid greylisting some messages, in particular those coming from the ‘<>’ address and from the IP addresses in your relayed domain. It can easily be done using the techniques described in previous sections and is left as an exercise to the reader.

Mailfromd provides two implementations of greylisting primitives, which differ in the information stored in the database. The one described above is called traditional. It keeps in the database the time when greylisting was activated for the given key, so the greylist function uses its second argument (interval) and the current timestamp to decide whether the key is still greylisted.

The second implementation is named after its inventor, Con Tassios. This implementation stores in the database the time when the greylisting period expires. The greylist function computes this time when it is first called for the given key, using the formula ‘current_timestamp + interval’. Subsequent calls to greylist compare the current timestamp with the one stored in the database and ignore their second argument. This implementation is enabled by either of the following pragmas:

#pragma greylist con-tassios

#pragma greylist ct

When the Con Tassios implementation is used, yet another function becomes available. The function is_greylisted (see is_greylisted) returns ‘True’ if its argument is greylisted and ‘False’ otherwise. It can be used to check the greylisting status without actually updating the database:

  if is_greylisted("${client_addr}-$f-${rcpt_addr}")
    …
  fi
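To contrast the Con Tassios scheme with the traditional one, here is a similar Python model. It stores the expiry time and ignores the interval on later calls. It also has an is_greylisted-style check that never writes to the store. The class is hypothetical, and two details are our assumptions: keys are blocked while the current time is strictly before the stored expiry, and unknown keys are reported as not greylisted.

```python
class ConTassiosGreylist:
    """Toy model of the Con Tassios greylisting semantics."""

    def __init__(self):
        self.db = {}                    # key -> time greylisting expires
        self.greylist_seconds_left = 0  # mirrors the MFL global variable

    def greylist(self, key, interval, now):
        """Return True while delivery for `key` should be blocked."""
        expires = self.db.get(key)
        if expires is None:
            # First call: store current_timestamp + interval.
            expires = now + interval
            self.db[key] = expires
        # Later calls ignore `interval` and use the stored expiry time.
        if now < expires:
            self.greylist_seconds_left = expires - now
            return True
        self.greylist_seconds_left = 0
        return False

    def is_greylisted(self, key, now):
        """Check the status without creating or updating an entry."""
        expires = self.db.get(key)
        return expires is not None and now < expires
```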

One special case is whitelisting, which is often used together with greylisting. To implement it, mailfromd provides the function dbmap, which takes two mandatory arguments: dbmap(file, key). It also accepts an optional third argument; see dbmap for more information on it. The first argument is the name of the DBM file in which to search for the key, and the second one is the key to search for. Assuming you keep your whitelist database in the file /var/run/whitelist.db, a more practical example would be:

prog envrcpt
do
  set gltime interval("1 hour")

  if not ($f = "" or relayed(hostname(${client_addr}))
         or dbmap("/var/run/whitelist.db", ${client_addr}))
    if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
      if greylist_seconds_left = gltime
        tempfail 451 4.7.1
           "You are greylisted for %gltime seconds"
      else
        tempfail 451 4.7.1
           "Still greylisted for %greylist_seconds_left seconds"
      fi
    fi
  fi
done

3.14 Local Account Verification

In your filter script you may need to verify whether the given user name is served by your mail server, in other words, whether it represents a local account. Notice that in this context, the word local does not necessarily mean that the account is local to the server running mailfromd; it simply means any account whose mailbox is served by the mail servers using mailfromd.

The validuser function may be used for this purpose. It takes one argument, the user name, and returns true if this name corresponds to a local account. To verify this, the function relies on libmuauth, a powerful authentication library shipped with GNU mailutils. More precisely, it invokes a list of authorization functions. Each function is responsible for looking up the user name in a particular source of information, such as the system passwd database, an SQL database, etc. The search is terminated when one of the functions finds the name in question or the list is exhausted. In the former case, the account is local; in the latter, it is not. This concept is discussed in detail elsewhere (see Authorization and Authentication Principles in GNU Mailutils Manual). Here we will give only some practical advice for implementing it in mailfromd filters.

The actual list of available authorization modules depends on your mailutils installation. Apart from the traditional UNIX passwd database, it usually includes functions for verifying PAM, RADIUS and SQL database accounts. Each authorization method is configured using special configuration file statements. For the description of the Mailutils configuration files, see Mailutils Configuration File in GNU Mailutils Manual. You can obtain a template for the mailfromd configuration by running mailfromd --config-help.

For example, the following mailfromd.conf file:

auth {
  authorization pam:system;
}

pam {
  service mailfromd;
}

sets up authorization using PAM and the system passwd database. The name of the PAM service to use is ‘mailfromd’.

The function validuser is often used together with dbmap, as in the example below:

#pragma dbprop /etc/mail/aliases.db null

if dbmap("/etc/mail/aliases.db", localpart($rcpt_addr))
   and validuser(localpart($rcpt_addr))
  …
fi

For more information about dbmap function, see dbmap. For a description of dbprop pragma, see Database functions.

3.15 Databases

Some mailfromd functions use DBM databases to save their persistent state data. Each database has a unique identifier, and is assigned several pieces of information for its maintenance: the database file name and the expiration period, i.e. the time after which a record is considered expired.

To obtain the list of available databases along with their preconfigured settings, run mailfromd --show-defaults (see Examining Defaults). You will see an output similar to this:

version:                   9.0
script file:               /etc/mailfromd.mfl
preprocessor:              /usr/bin/m4 -s
user:                      mail
statedir:                  /var/run/mailfromd
socket:                    unix:/var/run/mailfromd/mailfrom
pidfile:                   /var/run/mailfromd/mailfromd.pid
default syslog:            blocking
supported databases:       gdbm, bdb
default database type:     bdb
optional features:         DKIM GeoIP2 STARTTLS
greylist database:         /var/run/mailfromd/greylist.db
greylist expiration:       86400
tbf database:              /var/run/mailfromd/tbf.db
tbf expiration:            86400
rate database:             /var/run/mailfromd/rates.db
rate expiration:           86400
cache database:            /var/run/mailfromd/mailfromd.db
cache positive expiration: 86400
cache negative expiration: 43200

The lines below the ‘optional features’ line describe the available built-in databases. Notice that the ‘cache’ database, in contrast to the other databases, has two expiration periods associated with it. This is explained in the next subsection.

3.15.1 Database Formats

Version 9.0 uses the following database types (or formats):

‘cache’

The cache database keeps information about external emails, obtained using sender verification functions (see Checking Sender Address). The key to this database is an email address or, for addresses checked using strict verification, an email:sender-ip string. The data it stores for each key are:

Address validity. This field is either success or not_found, meaning that the address was confirmed to exist or that it was not found.
The time when the entry was entered into the database. It is used to check for expired entries.

The ‘cache’ database has two expiration periods: a positive expiration period, that is applied to entries with the first field set to success, and a negative expiration period, applied to entries marked as not_found.
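The two-period rule can be expressed as a small predicate. In this Python sketch the defaults are the 86400 and 43200 second values from the --show-defaults output above. Treating an entry as expired only once its age strictly exceeds the period is our guess at the boundary, and the function name is hypothetical.

```python
POSITIVE_EXPIRATION = 86400  # "cache positive expiration" (seconds)
NEGATIVE_EXPIRATION = 43200  # "cache negative expiration" (seconds)


def cache_entry_expired(status, entered, now,
                        positive=POSITIVE_EXPIRATION,
                        negative=NEGATIVE_EXPIRATION):
    """Return True if a 'cache' entry of the given status has expired.

    `entered` and `now` are seconds since the epoch.
    """
    if status == "success":
        period = positive
    elif status == "not_found":
        period = negative
    else:
        raise ValueError(f"unknown status: {status!r}")
    return now - entered > period
```

Negative results therefore get rechecked twice as often as positive ones under these defaults.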

‘rate’

The mail sending rate data, maintained by the rate function (see Rate limiting functions). A record consists of the following fields:

timestamp: The time when the entry was entered into the database.
interval: Interval during which the rate was measured (seconds).
count: Number of mails sent during this interval.
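The rate figures printed by mfdbtool and by the --debug trace appear to be simply count divided by interval. In the mfdbtool listing, 3/139 ≈ 0.0216. In the debug trace under Testing Filter Scripts, 5/29 = 0.172414. This Python sketch assumes exactly that. The formula is our inference from the sample output, not a documented one. Following the listing, the sketch returns None where mfdbtool prints N/A for a zero interval.

```python
def sending_rate(interval, count):
    """Mails per second for a 'rate' record, or None if interval is 0."""
    if interval == 0:
        return None  # mfdbtool shows N/A in this case
    return count / interval


print(round(sending_rate(29, 5), 6))   # 0.172414
print(round(sending_rate(139, 3), 4))  # 0.0216
```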

‘tbf’

This database is maintained by the tbf_rate function (see TBF). Each record represents a single bucket and consists of the following fields:

timestamp: Timestamp of most recent token, as a 64-bit unsigned integer (microseconds resolution).
expirytime: Estimated time when this bucket expires (seconds since epoch).
tokens: Number of tokens in the bucket (size_t).

‘greylist’

This database is maintained by the greylist function (see Greylisting). Each record holds only a timestamp. Its semantics depend on the greylisting implementation in use (see greylisting types). In the traditional implementation, it is the time when the entry was entered into the database. In the Con Tassios implementation, it is the time when the greylisting period expires.

3.15.2 Basic Database Operations

The mfdbtool utility is provided for performing various operations on the mailfromd database.

To list the contents of a database, use the --list option. When used without arguments, it lists the ‘cache’ database:

$ mfdbtool --list
abrakat@mail.com           success Thu Aug 24 15:28:58 2006
baccl@EDnet.NS.CA          not_found Fri Aug 25 10:04:18 2006
bhzxhnyl@chello.pl       not_found Fri Aug 25 10:11:57 2006
brqp@aaanet.ru:24.1.173.165  not_found Fri Aug 25 14:16:06 2006

You can also list data for any particular key or keys. To do so, give the keys as arguments to mfdbtool:

$ mfdbtool --list abrakat@mail.com brqp@aaanet.ru:24.1.173.165
abrakat@mail.com           success Thu Aug 24 15:28:58 2006
brqp@aaanet.ru:24.1.173.165  not_found Fri Aug 25 14:16:06 2006

To list another database, give its format identifier with the --format (-H) option. For example, to list the ‘rate’ database:

$ mfdbtool --list --format=rate
sam@mail.net-62.12.4.3 Wed Sep  6 19:41:42 2006  139   3 0.0216 6.82e-06
axw@rame.com-59.39.165.172 Wed Sep  6 20:26:24 2006  0  1  N/A  N/A

The --format option can be used with any database management option, described below.

Another useful operation you can do while listing the ‘rate’ database is predicting the estimated time of sending. This is the time when a user whose mail sending rate currently exceeds the limit will be able to send mail again. It is done using the --predict option. The option takes an argument specifying the mail sending rate limit, e.g. (the second line of output is split for readability):

$ mfdbtool --predict="180 per 1 minute"
ed@fae.net-21.10.1.2 Wed Sep 13 03:53:40 2006  0 1 N/A N/A; free to send
service@19.netlay.com-69.44.129.19 Wed Sep 13 15:46:07 2006 7 2
   0.286   0.0224; in 46 sec. on Wed Sep 13 15:49:00 2006

Notice that there is no need to use --list --format=rate along with this option, although doing so is not an error.

To delete an entry from the database, use the --delete option, for example: mfdbtool --delete abrakat@mail.com. You can give any number of keys to delete on the command line.

3.15.3 Database Maintenance

There are two principal database management operations: expiration and compaction. Expiration consists of removing expired entries from the database. In fact, it is rarely needed, since expired entries are removed in the course of normal mailfromd operation. Nevertheless, a special option is provided in case an explicit expiration is needed. For example, you may want to expire entries before dumping the database to another format, to avoid transferring useless information.

The command line option --expire instructs mfdbtool to delete expired entries from the specified database. As usual, the database is specified using --format option. If it is not given explicitly, ‘cache’ is assumed.

When expired entries are removed, the space they occupied is marked as free, so it can be reused by subsequent inserts. The database does not shrink after expiration is finished. To actually return the unused space to the file system, you should compact your database.

This is done by running mfdbtool --compact (optionally specifying the database to operate upon with the --format option). Notice that compacting a database requires roughly as much free disk space on the partition where the database resides as the database currently occupies. Database compaction runs in three phases. First, the database is scanned and all non-expired records are stored in memory. Second, a temporary database is created in the state directory and all the cached entries are flushed into it. This database is named after the PID of the running mfdbtool process. Finally, the temporary database is renamed to the source database.
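The three phases can be sketched in Python as follows. To keep the sketch self-contained, a JSON file stands in for the DBM database, and the caller supplies the expiration test. The compact() function and the ‘<pid>.tmp’ naming are hypothetical; this is not mfdbtool's code.

```python
import json
import os


def compact(db_path, state_dir, is_expired):
    """Rewrite db_path keeping only live records; return their count.

    A JSON object file stands in for the DBM database here.
    """
    # Phase 1: scan the database, keeping non-expired records in memory.
    with open(db_path) as f:
        records = {k: v for k, v in json.load(f).items()
                   if not is_expired(v)}
    # Phase 2: flush them into a temporary database in the state
    # directory, named after the PID of the running process.
    tmp_path = os.path.join(state_dir, f"{os.getpid()}.tmp")
    with open(tmp_path, "w") as f:
        json.dump(records, f)
    # Phase 3: rename the temporary database over the source database.
    os.replace(tmp_path, db_path)
    return len(records)
```

Because the live records are written out in full before the rename, the partition briefly holds both copies. This is why compaction needs about as much free space as the database occupies. Creating the temporary file in the same directory keeps the final rename on one file system.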

Both --compact and --expire can be applied to all databases by combining them with --all. This is useful, for example, in crontab files. For instance, I have the following monthly job in my crontab:

0 1 1 * * /usr/bin/mfdbtool --compact --all

3.16 Testing Filter Scripts

It is important to check your filter script before actually starting to use it. There are several ways to do so.

To test the syntax of your filter script, use the --lint option. It will cause mailfromd to exit immediately after attempting to compile the script file. If the compilation succeeds, the program will exit with code 0. Otherwise, it will exit with error code 78 (‘configuration error’). In the latter case, mailfromd will also print a diagnostic message, describing the error along with the exact location where the error was diagnosed, for example:

mailfromd: /etc/mailfromd.mfl:39: syntax error, unexpected reject

The error location is indicated by the name of the file and the number of the line where the error occurred. The --location-column option instructs mailfromd to print the column number as well. E.g. with this option the above error message may look like:

mailfromd: /etc/mailfromd.mfl:39.12: syntax error, unexpected reject

Here, ‘39’ is the line and ‘12’ is the column number.

For complex scripts you may wish to obtain a listing of the variables used in the script. This can be achieved using the --xref command line option.

The output it produces consists of four columns:

Variable name
Data type: Either number or string.
Offset in data segment: Measured in words.
References: A comma-separated list of locations where the variable was referenced. Each location is represented as file:line. If several locations pertain to the same file, the file name is listed only once.

Here is an example of the cross-reference output:

$ mailfromd --xref
Cross-references:
-----------------
cache_used               number 5   /etc/mailfromd.mfl:48
clamav_virus_name        string 9   /etc/mailfromd.mfl:240,240
db                       string 15  /etc/mailfromd.mfl:135,194,215
dns_record_ttl           number 16  /etc/mailfromd.mfl:136,172,173
ehlo_domain              string 11
gltime                   number 13  /etc/mailfromd.mfl:37,219,220,222,223
greylist_seconds_left    number 1   /etc/mailfromd.mfl:220,226,227
last_poll_host           string 2

If the script passes the syntax check, the next step is often to test whether it works as you expect. This is done with the --test (-t) command line option. This option runs the envfrom handler (or another one, see below) and prints the result of its execution.

When running your script in test mode, you will need to supply the values of Sendmail macros it needs. You do this by placing the necessary assignments in the command line. For example, this is how to supply initial values for f and client_addr macros:

$ mailfromd --test f=gray@gnu.org client_addr=127.0.0.1

You may also need to alter initial values of some global variables your script uses. To do so, use -v (--variable) command line option. This option takes a single argument consisting of the variable name and its initial value, separated by an equals sign. For example, here is how to change the value of ehlo_domain global variable:

$ mailfromd -v ehlo_domain=mydomain.org

The --test option is often useful in conjunction with the options --debug, --trace and --transcript (see Logging and Debugging). The following example shows what the author got while debugging the filter script described in Filter Script Example:

$ mailfromd --test --debug=50 f=gray@gnu.org client_addr=127.0.0.1
MX 20 mx20.gnu.org
MX 10 mx10.gnu.org
MX 10 mx10.gnu.org
MX 20 mx20.gnu.org
getting cache info for gray@gnu.org
found status: success (0), time: Thu Sep 14 14:54:41 2006
getting rate info for gray@gnu.org-127.0.0.1
found time: 1158245710, interval: 29, count: 5, rate: 0.172414
rate for gray@gnu.org-127.0.0.1 is 0.162162
updating gray@gnu.org-127.0.0.1 rates
SET REPLY 450 4.7.0 Mail sending rate exceeded.  Try again later
State envfrom: tempfail

If your script uses echo statements (see Echo), they print their output on standard error. To direct them to standard output, use the --echo option. You can also redirect the echo output to an arbitrary file by supplying its name as an argument, as in --echo=file. See echo option.

To test any handler, other than ‘envfrom’, give its name as the argument to --test option. Since this argument is optional, it is important that it be given immediately after the option, without any intervening white space, for example mailfromd --test=helo, or mailfromd -thelo.

This method allows you to test one handler at a time. To test the script as a whole, use the mtasim utility. When started, it enters interactive mode, similar to that of sendmail -bs, where it expects SMTP commands on its standard input and sends answers to the standard output. The --port=auto command line option instructs it to start mailfromd and to create a unique socket for communicating with it. For the detailed description of the program and the ways to use it, see mtasim.

3.17 Run Mode

Mailfromd provides a special option that allows you to run arbitrary MFL scripts.

When given the --run command line option, mailfromd loads the script given in its command line, looks for the function called ‘main’, and runs it.

This function must be declared as:

func main(...) returns number

Mailfromd passes all command line arguments that follow the script name as arguments to that function. When the function returns, its return value is used by mailfromd as exit code.

As an example, suppose the file script.mfl contains the following:

func main (...)
  returns number
do
  loop for number i 1,
       while i <= $#,
       set i i + 1
  do
    echo "arg %i=" . $(i)
  done
done

This function prints all its arguments (See variadic functions, for a detailed description of functions with variable number of arguments). Now running:

$ mailfromd --run script.mfl 1 file dest

displays the following:

arg 1=1
arg 2=file
arg 3=dest

You can direct the script output to the standard output by using the --echo option, as described above, e.g.:

$ mailfromd --echo --run script.mfl 1 file dest

Note that MFL does not have a direct equivalent of the shell’s $0 parameter. If your function needs to know the name of the script being executed, use the __file__ built-in constant instead (see __file__).

The name main is not hard-coded. You can use the --run option to run any function, provided that its definition is as discussed above. Just give the name of this function as the argument to the option. This argument is optional, therefore it must be separated from the option by an equals sign, with no whitespace on either side. For example, given the command line below, mailfromd will load the file script.mfl and execute the function ‘start’:

$ mailfromd --run=start script.mfl

If you need to define sendmail macros (see Sendmail Macros) for use in the run mode, place the macro=value assignments before the script name, e.g.:

$ mailfromd --run=start i=feedbeef client_addr=::1 script.mfl

To summarize, the command line when using the run mode is:

mailfromd [options] --run [macro=value] file args...

The ‘macro=value’ assignments define Sendmail macros, args... are passed as arguments to the main function defined in file, and options stands for any other mailfromd options that might be needed.

Finally, notice that file together with args... can be omitted. In this case the default script file will be used (see default script file).

3.17.1 The Top of a Script File

The --run option makes it possible to use mailfromd scripts as standalone programs. The traditional way to do so was to set the executable bit on the script file and to begin the script with the interpreter selector, i.e. the characters ‘#!’ followed by the name of the mailfromd executable, e.g.:

#! /usr/sbin/mailfromd --run

This would cause the shell to invoke mailfromd with the command line constructed from the --run option, the name of the invoked script file itself, and any actual arguments from the invocation. Once invoked, mailfromd would treat the initial ‘#!’ line as a usual single-line comment (see Comments).

However, the interpretation of the ‘#!’ line by shells has various deficiencies, which depend on the actual shell being used. For example, some shells pass all characters following the whitespace after the interpreter name as a single argument, while others silently truncate the command line after some number of characters. This often makes it impossible to pass additional arguments to mailfromd. For example, a script that begins with the following line would most probably fail to execute properly:

#! /usr/sbin/mailfromd --echo --run

To compensate for these deficiencies and to allow for more complex invocation sequences, mailfromd handles the initial ‘#!’ in a special way. If the first line of a source file begins with ‘#!/’ or ‘#! /’ (with a single space between ‘!’ and ‘/’), it is treated as the start of a multi-line comment, which is closed by the two characters ‘!#’ on a line by themselves.

Thus, the correct way to begin a mailfromd script is:

#! /usr/sbin/mailfromd --run
!#

Using this feature, you can begin a mailfromd script with arbitrary shell code, provided it ends with an exec statement invoking the interpreter itself. For example:

#!/bin/sh
exec /usr/sbin/mailfromd --echo --run "$0" "$@"
!#

func main(...)
  returns number
do
  /* actual mfl code goes here */
done

Note the use of ‘$0’ and ‘$@’ to pass the actual script file name and command line arguments to mailfromd.

3.17.2 Parsing Command Line Arguments

A special function is provided to break (parse) the command line into options, and to check them for validity. It uses the GNU getopt routines (see getopt in The GNU C Library Reference Manual).

Built-in Function: string getopt (number argc, pointer argv, ...)

The getopt function parses the command line arguments, as supplied by argc and argv. The argc argument is the argument count, and argv is an opaque data structure representing the array of arguments. The operator vaptr (see vaptr) is provided to initialize this argument.

An argument that starts with ‘-’ (and is not exactly ‘-’ or ‘--’) is an option element. An argument that starts with a single ‘-’ is called a short or traditional option. The characters of this element, except for the initial ‘-’, are option characters. Each option character represents a separate option. An argument that starts with ‘--’ is called a long or GNU option. The characters of this element, except for the initial ‘--’, form the option name.

Options may have arguments. The argument to a short option is supplied immediately after the option character, or as the next word in the command line. E.g., if option -f takes a mandatory argument, then it may be given either as -farg or as -f arg. The argument to a long option is either given immediately after it, separated from the option name by an equals sign (as in --file=arg), or as the next word in the command line (e.g. --file arg).

If the option argument is optional, i.e. it need not be given, then only the first form is allowed (i.e. either -farg or --file=arg).

The ‘--’ command line argument ends the option list. Any arguments following it are not considered options, even if they begin with a dash.

If getopt is called repeatedly, it returns successively each of the option characters from each of the option elements (for short options) and each option name (for long options). The actual arguments are supplied only to the first invocation. Subsequent calls must be given two nulls as arguments (as in getopt(0, 0)), which instructs getopt to use the values saved on the previous invocation.

When the function finds another option, it returns its character or name updating the external variable optind (see below) so that the next call to getopt can resume the scan with the following option.

When there are no more options left, or a ‘--’ argument is encountered, getopt returns an empty string. Then optind gives the index in argv of the first element that is not an option.

The legitimate options and their characteristics are supplied in additional arguments to getopt. Each such argument is a string consisting of two parts, separated by a vertical bar (‘|’). Either part is optional, but at least one of them must be present. The first part specifies the short option character. If it is followed by a colon, this option takes a mandatory argument. If it is followed by two colons, it takes an optional argument. If only the first part is present, the ‘|’ separator may be omitted. Examples:

"c", "c|": Short option -c.
"f:", "f:|": Short option -f, taking a mandatory argument.
"f::", "f::|": Short option -f, taking an optional argument.

If the vertical bar is present and is followed by any characters, these characters specify the name of a long option, synonymous to the short one, specified by the first part. Any mandatory or optional arguments to the short option remain mandatory or optional for the corresponding long option. Examples:

"f:|file": Short option -f, or long option --file, requiring an argument.
"f::|file": Short option -f, or long option --file, taking an optional argument.

In any of the above cases, if this option appears in the command line, getopt returns its short option character.

To define a long option without a short equivalent, begin it with a bar, e.g.:

"|help"

If this option is to take an argument, this is specified using the mechanism described above, except that the short option character is replaced with a minus sign. For example:

"-:|output": Long option --output, which takes a mandatory argument.
"-::|output": Long option --output, which takes an optional argument.

If an option is returned that has an argument in the command line, getopt stores this argument in the variable optarg.
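The option-definition strings described above follow a small grammar. It consists of an optional short part (a character, or ‘-’ for none, followed by zero, one or two colons), an optional ‘|’, and an optional long name. The Python sketch below decodes such strings, and also reports what getopt returns for each option. It is only a model to make the rules concrete. The function names are hypothetical, and the sketch does not handle the controlling arguments (‘:’, ‘+’, ‘-’) discussed below.

```python
NONE, REQUIRED, OPTIONAL = "none", "required", "optional"


def parse_option_spec(spec):
    """Split a getopt option definition into (short, long, arg).

    short is the option character or None, long is the long name or
    None, and arg is one of NONE, REQUIRED or OPTIONAL.
    """
    short_part, _, long_part = spec.partition("|")
    arg = NONE
    if short_part.endswith("::"):
        arg, short_part = OPTIONAL, short_part[:-2]
    elif short_part.endswith(":"):
        arg, short_part = REQUIRED, short_part[:-1]
    if len(short_part) > 1:
        raise ValueError(f"bad short option part: {spec!r}")
    # An empty first part or a '-' means there is no short equivalent.
    short = short_part if short_part not in ("", "-") else None
    long_name = long_part or None
    if short is None and long_name is None:
        raise ValueError(f"empty option definition: {spec!r}")
    return short, long_name, arg


def getopt_result(spec):
    """What getopt returns for this option: short char, else long name."""
    short, long_name, _ = parse_option_spec(spec)
    return short if short is not None else long_name
```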

After each invocation, getopt sets the variable optind to the index of the next argv element to be parsed. Thus, when the list of options is exhausted and the function has returned an empty string, optind contains the index of the first element that is not an option.

When getopt encounters an option that is not described in its arguments or if it detects a missing option argument it prints an error message using mailfromd logging facilities, stores the offending option in the variable optopt, and returns ‘?’.

If printing error messages is not desired (e.g. the application is going to take care of error messaging), it can be disabled by setting the variable opterr to ‘0’.

The third argument to getopt, called the controlling argument, may be used to control the behavior of the function. If it is a colon, it disables printing the error message for unrecognized options and missing option arguments (as setting opterr to ‘0’ does). In this case getopt returns ‘:’ instead of ‘?’ to indicate a missing option argument.

If the controlling argument is a plus sign, or the environment variable POSIXLY_CORRECT is set, then option processing stops as soon as a non-option argument is encountered. By default, if options and non-option arguments are intermixed in argv, getopt permutes them so that the options go first, followed by the non-option arguments.

If the controlling argument is ‘-’, then each non-option element in argv is handled as if it were the argument of an option with character code 1 (‘"\001"’ in MFL notation). This can be used by programs that are written to expect options and other argv elements in any order and that care about the ordering of the two.

Any other value of the controlling argument is handled as an option definition.
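Python's standard getopt module is a different library from MFL's getopt. However, its gnu_getopt() function follows the same GNU conventions described above. By default it permutes options ahead of non-option arguments. A leading ‘+’ in the short-option string, or a set POSIXLY_CORRECT environment variable, stops option processing at the first non-option. A ‘--’ argument ends the option list. The following Python snippet shows the difference, assuming POSIXLY_CORRECT is not set in your environment.

```python
import getopt

argv = ["-f", "in.txt", "extra", "--help"]

# Default GNU behaviour: "--help" is still recognized after "extra".
opts, rest = getopt.gnu_getopt(argv, "f:h", ["file=", "help"])
print(opts, rest)  # [('-f', 'in.txt'), ('--help', '')] ['extra']

# A leading '+' stops option processing at the first non-option.
opts_plus, rest_plus = getopt.gnu_getopt(argv, "+f:h", ["file=", "help"])
print(opts_plus, rest_plus)  # [('-f', 'in.txt')] ['extra', '--help']

# '--' ends the option list; "-f" after it is an ordinary argument.
opts_dd, rest_dd = getopt.gnu_getopt(["-h", "--", "-f"], "f:h",
                                     ["file=", "help"])
print(opts_dd, rest_dd)  # [('-h', '')] ['-f']
```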

A special language construct is provided to supply the second argument (argv) to getopt and similar functions:

vaptr(param)

where param is a positional parameter, from which to start the array of argv. For example:

func main(...)
  returns number
do
  set rc getopt($#, vaptr($1), "|help")
  ...

Here, vaptr($1) constructs the argv array from all the arguments, supplied to the function main.

To illustrate the use of getopt function, let’s suppose you write a script that takes the following options:

-f file
--file=file
--output[=dir]
--help

Then, the corresponding getopt invocation will be:

func main(...)
  returns number
do
  # Fetch options one at a time until getopt returns an empty string.
  loop for string rc getopt($#, vaptr($1),
                            "f:|file", "-::|output", "h|help"),
       while rc != "",
       set rc getopt(0, 0)
  do
    switch rc
    do
      case "f":
        # -f file or --file=file
        set file optarg
      case "output":
        # --output[=dir]
        set output 1
        set output_dir optarg
      case "h":
        help()
      default:
        return 1
    done
    ...

3.18 Examining Default Values

Sometimes you may need to check the default settings of the mailfromd binary and the values it actually uses. Both tasks are accomplished using the --show-defaults option. When used alone, it shows the settings actually in use (default values, possibly modified by your configuration settings). When used together with --no-config, it displays the compiled defaults.

The output of mailfromd --show-defaults looks like this:

version:                   9.0
script file:               /etc/mailfromd.mfl
preprocessor:              /usr/bin/m4 -s -DWITH_DKIM -DWITH_MFMOD
                           /var/mailfromd/9.0/include/pp-setup
user:                      mail
statedir:                  /var/lib/mailfromd
socket:                    mailfrom
pidfile:                   mailfromd.pid
default syslog:            blocking
include path:              /etc/mailfromd:/usr/share/mailfromd/include:
                           /usr/share/mailfromd/8.14.94/include
module path:               /usr/share/mailfromd:
                           /usr/share/mailfromd/9.0
mfmod path:                /usr/lib/mailfromd
optional features:         DKIM, mfmod, STARTTLS
supported database types:  gdbm, bdb
default database type:     bdb
greylist database:         /var/lib/mailfromd/greylist.db
greylist expiration:       86400
tbf database:              /var/lib/mailfromd/tbf.db
tbf expiration:            86400
rate database:             /var/lib/mailfromd/rates.db
rate expiration:           86400
cache database:            /var/lib/mailfromd/mailfromd.db
cache positive expiration: 604800
cache negative expiration: 86400

The above format, called human-readable, with two-column output and long lines split across several physical lines, is used if mailfromd is linked with GNU libmailutils library version 3.16 or later and its standard output is connected to a terminal. Otherwise, machine-readable output format is used, in which additional whitespace is elided, and long lines are retained verbatim. This makes it possible to easily extract default values using familiar text processing tools, e.g.:

$ mailfromd --show-defaults --no-config | grep '^script file:'
script file:/etc/mailfromd.mfl
$ mailfromd --show-defaults --no-config | sed -ne '/^script file:/s///p'
/etc/mailfromd.mfl

The following table describes each line of the output in detail:

version

Program version.

script file

The script file used by the program. It is empty if the script file is not found.

preprocessor

Preprocessor command line. See Preprocessor. This value can be changed in the configuration file: see conf-preprocessor.

user

System user mailfromd runs as. See conf-priv.

statedir

mailfromd local state directory. See statedir.

socket

The socket mailfromd listens on. For a UNIX socket, its file name is shown; unless it begins with ‘/’, it is relative to the local state directory. TCP sockets are shown as milter port specifications.

See listen.

pidfile

PID file name (relative to local state directory, unless absolute).

See pidfile.

default syslog

Syslog implementation used: either ‘blocking’, or ‘non-blocking’.

See Using non-blocking syslog. See also Logging and Debugging.

include path

Include search path. See include search path.

It can be changed from the command line, using the -I option (see General Settings), and in configuration file, using the include-path statement (see include-path).

module path

Search path for MFL modules. See module search path.

It can be changed from the command line, using the -P (--module-path) option (see General Settings), and in configuration file, using the module-path statement (see module-path).

mfmod path

Search path for dynamically loaded modules. See mfmod-path.

optional features

Comma-delimited list of optional features compiled into mailfromd. It can contain the following feature names:

Feature	Reference
DKIM	See DKIM.
GeoIP2	See Geolocation functions.
mfmod	See Dynamically Loaded Modules.
STARTTLS	See STARTTLS in call-out.

supported database types

Comma-delimited list of supported database types. See Databases. These types can be used as scheme prefixes in database names (see DBM scheme).

default database type

Type of the DBM used by default. See Databases.

greylist database

greylist expiration

File name and record expiration time of the greylisting database. See greylist database.

tbf database

tbf expiration

File name and record expiration time of the token-bucket filter rate-limiting database. See tbf database.

rate database

rate expiration

File name and record expiration time of the legacy rate-limiting database. See rate database. See also Rate limiting functions.

cache database

cache positive expiration

cache negative expiration

File name and record expiration times of the call-out cache database. See cache database.

The database settings can be changed in the configuration file; see conf-database.

3.19 Logging and Debugging

Depending on its operation mode, mailfromd tries to guess whether it is appropriate to print its diagnostics and informational messages on standard error or to send them to syslog. Standard error is assumed if the program is run with one of the following command line options:

--test (see Testing Filter Scripts)
--run (see Run Mode)
--lint (see Testing Filter Scripts)
--dump-code (see Logging and Debugging Options)
--dump-grammar-trace (see Logging and Debugging Options)
--dump-lex-trace (see Logging and Debugging Options)
--dump-macros (see Logging and Debugging Options)
--dump-tree (see Logging and Debugging Options)
--xref (or --dump-xref) (see Testing Filter Scripts)

If none of these is used, mailfromd switches to syslog as soon as it finishes its startup. There are two ways to communicate with the syslogd daemon: using the syslog function from the system libc library, which is a blocking implementation in most cases, or via an internal, asynchronous syslog implementation. Whether the latter is compiled in and which implementation is used by default is determined when compiling the package, as described in Using non-blocking syslog.

The --logger command line option allows you to manually select the diagnostic channel:

--logger=stderr: Log everything to the standard error.
--logger=syslog: Log to syslog.
--logger=syslog:async: Log to syslog using the asynchronous syslog implementation.

Another way to select the diagnostic channel is by using the logger statement in the configuration file. The statement takes the same argument as its command line counterpart.

The rest of details regarding diagnostic output are controlled by the logging configuration statement.

The default syslog facility is ‘mail’; it can be changed using the --log-facility command line option or the facility statement. In both cases the argument is a valid facility name, i.e. one of: ‘user’, ‘daemon’, ‘auth’, ‘authpriv’, ‘mail’, and ‘local0’ through ‘local7’. The argument can be given in upper, lower or mixed case, and it can be prefixed with ‘log_’.

Another syslog-related parameter that can be configured is the tag, which identifies mailfromd messages. The default tag is the program name. It can be changed using the --log-tag (-L) command line option or the tag logging statement.

The following example configures both the syslog facility and tag:

logging {
  facility local7;
  tag "mfd";
}

Like any other UNIX utility, mailfromd is quiet unless it has something important to communicate, such as an error condition. A set of command line options is provided for controlling the verbosity of its output.

The --trace option enables tracing Sendmail actions executed during message verification. When this option is given, any accept, discard, continue, etc. triggered during execution of your filter program will leave its trace in the log file. Here is an example of what it looks like (syslog time stamp, tag and PID removed for readability):

k8DHxvO9030656: /etc/mailfromd.mfl:45: reject 550 5.1.1 Sender validity not confirmed

This shows that while verifying the message with ID ‘k8DHxvO9030656’ the reject action was executed by filter script /etc/mailfromd.mfl at line 45.

The use of the message ID in the log deserves special mention. The program always identifies its log messages with the ‘Message-Id’ when it is available. Your responsibility as an administrator is to make sure it is available by configuring your MTA to export the macro ‘i’ to mailfromd. The rule of thumb is: make ‘i’ available to the very first handler mailfromd executes. It is not necessary to export it to the rest of the handlers, since mailfromd caches it. For example, if your filter script contains ‘envfrom’ and ‘envrcpt’ handlers, export ‘i’ for ‘envfrom’. The exact instructions depend on the MTA you use. For ‘Sendmail’, refer to Sendmail. For MeTA1, see MeTA1 and pmult-macros. For ‘Postfix’, see Postfix.

To increase log verbosity further, use the debug configuration statement (see conf-debug) or its command line equivalent, --debug (-d, see --debug). Its argument is a debugging level, whose syntax is described in http://mailutils.org/wiki/Debug_level.

The debugging output is controlled by a set of levels, each of which can be set independently of the others. Each debug level consists of a category name, which identifies the part of the package for which additional debugging is desired, and a level number, which indicates how verbose its output should be.

Valid debug levels are:

error: Displays error conditions which are normally not reported, but passed to the caller layers for handling.
trace0 through trace9: Ten levels of verbosity, trace0 producing the least output and trace9 the most.
prot: Displays network protocol interaction, where applicable.

The overall debugging level is specified as a list of individual levels, delimited with semicolons. Each individual level can be specified as one of:

!category: Disables all levels for the specified category.
category: Enables all levels for the specified category.
category.level: For this category, enables all levels from ‘error’ to level, inclusive.
category.=level: Enables only the given level in this category.
category.!level: Disables all levels from ‘error’ to level, inclusive, in this category.
category.!=level: Disables only the given level in this category.
category.levelA-levelB: Enables all levels in the range from levelA to levelB, inclusive.
category.!levelA-levelB: Disables all levels in the range from levelA to levelB, inclusive.

Additionally, a comma-separated list of level specifications is allowed after the dot. For example, the following specification:

acl.prot,!=trace9,!trace2

enables all levels in category acl except error, trace0, trace1, trace2 and trace9.

Implementation and applicability of each level of debugging differ between categories. Categories built into mailutils are described in http://mailutils.org/wiki/Debug_level. Mailfromd introduces the following additional categories:

db

trace0: Detailed debugging info about expiration and compaction.
trace5: List records being removed.

dns

trace8: Verbose information about attempted DNS queries and their results.
trace9: Enables ‘libadns’ internal debugging.

srvman

trace0: Additional information about normal conditions, such as a subprocess exiting successfully or a remote party being allowed access by ACL.
trace1: Detailed transcript of server manager actions: startup, shutdown, subprocess cleanups, etc.
trace3: Additional info about fd sets.
trace4: Individual subserver status information.
trace5: Subprocess registration.

pmult

trace1: Verbosely list incoming connections, functions being executed and erroneous conditions: missing headers in SMFIR_CHGHEADER, undefined macros, etc.
trace2: List milter requests being processed.
trace7: List SMTP body content in SMFIR_REPLBODY requests.
error: Verbosely list mild errors encountered: bad recipient addresses, etc.

callout

trace0: Verification session transcript.
trace1: MX servers checks.
trace5: List emails being checked.
trace9: Additional info.

main

trace5: Info about host names in the relayed domain list.

engine

Debugging of the virtual engine.

trace5: Message modification lists.
trace6: Debug message modification operations and Sendmail macros registered.
trace7: List SMTP stages (‘xxfi_*’ calls).
trace9: Cleanup calls.

pp

Preprocessor.

trace1: Show command line of the preprocessor being run.

prog

trace8: Stack operations.
trace9: Debug exception state save/restore operations.

spf

error: Mild errors.
trace0: List calls to ‘spf_eval_record’, ‘spf_test_record’, ‘spf_check_host_internal’, etc.
trace1: General debug info.
trace6: Explicitly list A records obtained when processing the ‘a’ SPF mechanism.

Categories starting with ‘bi_’ debug built-in modules:

bi_db

Database functions.

trace5: List database look-ups.
trace6: Trace operations on the greylisting database.

bi_sa

SpamAssassin and ClamAV API.

trace1: Report the findings of the ‘clamav’ function.
trace9: Trace payload in interactions with ‘spamd’.

bi_io

I/O functions.

trace1: Debug the following functions: open, spawn, write.
trace2: Report stderr redirection.
trace3: Report external commands being run.

bi_mbox

Mailbox functions.

trace1: Report opened mailboxes.

bi_other

Other built-ins.

trace1: Report results of checks for existence of usernames.

For example, the following invocation enables levels up to ‘trace2’ in category ‘engine’, all levels in category ‘savsrv’ and levels up to ‘trace0’ in category ‘srvman’:

$ mailfromd --debug='engine.trace2;savsrv;srvman.trace0'

Using this form of the --debug option requires sufficient knowledge of the internal structure of mailfromd.

To monitor the execution of the sender verification functions (see SMTP Callout functions), you may use the --transcript (-X) command line option, which enables transcripts of SMTP sessions in the logs. Here is an example of the output produced by running mailfromd --transcript:

k8DHxlCa001774: RECV: 220 spf-jail1.us4.outblaze.com ESMTP Postfix
k8DHxlCa001774: SEND: HELO mail.gnu.org.ua
k8DHxlCa001774: RECV: 250 spf-jail1.us4.outblaze.com
k8DHxlCa001774: SEND: MAIL FROM: <>
k8DHxlCa001774: RECV: 250 Ok
k8DHxlCa001774: SEND: RCPT TO: <t1Kmx17Q@malaysia.net>
k8DHxlCa001774: RECV: 550 <>: No thank you rejected: Account
 Unavailable: Possible Forgery
k8DHxlCa001774: poll exited with status: not_found; sent
 "RCPT TO: <t1Kmx17Q@malaysia.net>", got "550 <>: No thank you
 rejected: Account Unavailable: Possible Forgery"
k8DHxlCa001774: SEND: QUIT

3.20 Runtime Errors

A runtime error is a special condition, encountered during execution of the filter program, that makes further execution of the program impossible. There are two kinds of runtime errors: fatal errors and uncaught exceptions. Whenever a runtime error occurs, mailfromd writes the following message to the log file:

RUNTIME ERROR near file:line: text

where file:line indicates the approximate source location where the error occurred and text gives the textual description of the error.

Fatal runtime errors

Fatal runtime errors are caused by a condition that is impossible to fix at run time. For version 9.0 these are:

Not enough memory: There is not enough memory for the execution of the program. Try to make more memory available for mailfromd or to reduce its memory requirements by rewriting your filter script.
Out of stack space; increase #pragma stacksize
Heap overrun; increase #pragma stacksize
memory chunk too big to fit into heap: These three errors are reported when there is not enough space left on the stack to perform the requested operation, and the attempt to resize the stack has failed. Usually mailfromd expands the stack when the need arises (see automatic stack resizing). These runtime errors indicate that there was no more memory available for stack expansion. Try to make more memory available for mailfromd or to reduce its memory requirements by rewriting your filter script.
Stack underflow: Program attempted to pop a value off the stack but the stack was already empty. This indicates an internal error in the MFL compiler or mailfromd runtime engine. If you ever encounter this error, please report it to bug-mailfromd@gnu.org.ua. Include the log fragment (about 10-15 lines before and after this log message) and your filter script. See Reporting Bugs, for more information about bug reporting.
pc out of range: The program counter is out of allowed range. This is a severe error, indicating an internal inconsistency in mailfromd runtime engine. If you encounter it, please report it to bug-mailfromd@gnu.org.ua. Include the log fragment (about 10-15 lines before and after this log message) and your filter script. See Reporting Bugs, for more information about how to report a bug.

Programmatic runtime errors

These indicate a programmatic error in your filter script, which the MFL compiler was unable to discover at compilation stage:

Invalid exception number: n

The throw statement used a nonexistent exception number n. Fix the statement and restart mailfromd. See throw, for information about the throw statement, and see Exceptions, for the list of available exception codes.
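One way to avoid this error is to throw only exception codes that are known to exist, for example one declared in the script itself. The sketch below assumes the dclex statement for declaring a new exception; the exception name my_error and the function check_arg are hypothetical.

```
# Declare a new exception code (hypothetical name).
dclex my_error

func check_arg(string s)
do
  if s = ""
    # my_error was declared above, so the exception number is valid.
    throw my_error "empty argument"
  fi
done
```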

No previous regular expression

You have used a back-reference (see Back references) where there is no previous regular expression to refer to. Fix this line in your code and restart the program.

Invalid back-reference number

You have used a back-reference (see Back references) with a number greater than the number of groups in the previous regular expression. For example:

  if $f matches "(.*)@gnu.org"
    # Wrong: there is only one group in the regexp above!
    set x \2
  …

Fix your code and restart the daemon.
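A corrected version of the fragment above might look like the following sketch. It refers only to group 1, and it uses a single-quoted pattern so that the backslash before the dot reaches the regular expression unchanged; the envfrom handler and the echo message are illustrative.

```
prog envfrom
do
  if $f matches '(.*)@gnu\.org'
    # Correct: the regexp has exactly one group, referenced as \1.
    echo "local part: " . \1
  fi
done
```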

Uncaught exceptions

Another kind of runtime error is an uncaught exception, i.e. an exceptional condition for which no handler was installed (see Exceptions, for information on exceptions and on how to handle them). Such errors mean that the programmer (i.e. you) made no provision for some specific condition. For example, consider the following code:

prog envfrom
do
  if $f mx matches "yahoo.com"
    foo()
  fi
done

It is syntactically correct, but it overlooks the fact that mx matches may generate an e_temp_failure exception if the underlying DNS query has timed out (see Special comparisons). If this happens, mailfromd has no instructions on what to do next and reports an error. This can easily be fixed using a try/catch statement (see Catch and Throw), e.g.:

prog envfrom
do
  try
  do
    if $f mx matches "yahoo.com"
      foo()
    fi
  done
  # Catch DNS errors
  catch e_temp_failure or e_failure
  do
    tempfail 451 4.1.1 "MX verification failed"
  done
done

Another common case is an undefined Sendmail macro, which causes the e_macroundef exception:

RUNTIME ERROR near foo.c:34: Macro not defined: {client_adr}

This can be caused either by misspelling the macro name (as in the example message above) or by failing to export the required name in Sendmail milter configuration (see exporting macros). The error should be fixed either in your source code or in the sendmail.cf file, but if you wish to provide special handling for it, you can use the following catch statement:

catch e_macroundef
do
  …
done
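As a fuller illustration, the sketch below wraps a macro reference in try/catch, following the layout of the DNS example earlier in this section. It assumes the macro is referenced as ${client_addr}; the log messages are hypothetical.

```
prog envfrom
do
  try
  do
    # ${client_addr} must be exported by the MTA for this handler.
    echo "connection from " . ${client_addr}
  done
  catch e_macroundef
  do
    # Reached if the macro is not defined for this message.
    echo "client_addr macro is not available"
  done
done
```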

Sometimes the location indicated with the runtime error message is not enough to trace the origin of the error. For example, an error can be generated explicitly with throw statement (see throw):

RUNTIME ERROR near match_cidr.mfl:30: invalid CIDR (text)

If you look in module match_cidr.mfl, you will see the following code (line numbers added for reference):

23 func match_cidr(string ipstr, string cidr)
24   returns number
25 do
26   number netmask
27
28   if cidr matches '^(([0-9]{1,3}\.){3}[0-9]{1,3})/([0-9][0-9]?)'
29     return inet_aton(ipstr) & len_to_netmask(\3) = inet_aton(\1)
30   else
31     throw invcidr "invalid CIDR (%cidr)"
32   fi
33   return 0
34 done

Now it is obvious that the value of the cidr argument to match_cidr was wrong, but how do you find the caller that passed the wrong value to it? The special command line option --stack-trace is provided for this. This option enables dumping stack traces when a fatal error occurs. Traces contain information about function calls. Continuing our example, with the --stack-trace option you will see the following diagnostics:

RUNTIME ERROR near match_cidr.mfl:30: invalid CIDR (127%)
mailfromd: Stack trace:
mailfromd: 0077: match_cidr.mfl:31: match_cidr
mailfromd: 0096: test.mfl:13: bar
mailfromd: 0110: mailfromd.mfl:18: foo
mailfromd: Stack trace finishes
mailfromd: Execution of the configuration program was not finished

Each trace line describes one stack frame. The lines appear in order from the most recently called function to the least recently called one. Each frame consists of:

Value of the program counter at the time of its execution;
Source code location, if available;
Name of the function called.

Thus, the example above can be read as: “the function match_cidr was called by the function bar in file test.mfl at line 13. In its turn, bar was called by the function foo in file mailfromd.mfl at line 18”.

Examining caller functions will help you localize the source of the error and fix it.

You can also request a stack trace any place in your code, by calling the stack_trace function. This can be useful for debugging.
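For instance, the following sketch reports the chain of callers whenever a function receives an empty argument, without aborting execution. The function name check_cidr is hypothetical, and stack_trace is assumed to take no arguments.

```
func check_cidr(string cidr)
do
  if cidr = ""
    # Report the call chain that led to this invalid call.
    stack_trace()
  fi
done
```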

3.21 Notes and Cautions

This section discusses some potential pitfalls in MFL.

It is important to exercise special caution when writing format strings for the sprintf (see String formatting) and strftime (see strftime) functions. They use ‘%’ as the character introducing conversion specifiers, while the same character is used to expand an MFL variable within a string. To prevent this misinterpretation, always enclose format specifications in single quotes (see single-vs-double). To illustrate this, let’s consider the following example:

echo sprintf ("Mail from %s", $f)

If a variable s is not declared, this line will produce the ‘Variable s is not defined’ error message, which will allow you to identify and fix the bug. The situation is considerably worse if s is declared. In that case you will see no warning message, as the statement is perfectly valid, but at the run-time the variable s will be interpreted within the format string, and its value will replace %s. To prevent this from happening, single quotes must be used:

echo sprintf ('Mail from %s', $f)

This does not limit the functionality, since there is no need to fall back to variable interpretation in format strings.
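The same rule applies to strftime. In the sketch below, the single quotes keep %Y, %m and the other conversion specifiers away from variable expansion; it assumes the built-in time function returning the current timestamp, and the envfrom handler is illustrative.

```
prog envfrom
do
  # Single quotes: %Y, %m, %d, %H, %M and %S reach strftime unchanged.
  # In double quotes they would be taken as MFL variable references.
  echo strftime('%Y-%m-%d %H:%M:%S', time())
done
```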

Yet another dangerous feature of the language is the way variable and constant names are referred to within literal strings. The same notation is used to expand a variable or a constant (see Variables, and see Constants). Now, let’s consider the following code:

const x 2
string x "X"

prog envfrom
do
  echo "X is %x"
done

Does %x in echo refer to the variable or to the constant? The correct answer is ‘to the variable’. When executed, this code will print ‘X is X’.

As of version 9.0, mailfromd always prints a diagnostic message whenever it stumbles upon a variable having the same name as a previously defined constant, or vice versa. See variable--constant shadowing, for a detailed description of how such name clashes are resolved.

Future versions of the program may provide a non-ambiguous way of referring to variables and constants from literal strings.
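Until then, the simplest way to avoid the ambiguity is to give constants and variables distinct names. A sketch, with the hypothetical constant name X_MAX chosen only for illustration:

```
const X_MAX 2
string x "X"

prog envfrom
do
  # Distinct names: %x is the variable, %X_MAX the constant.
  echo "X is %x, limit is %X_MAX"
done
```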
