This section describes functions that transform data using
Mailutils filter pipes. A filter pipe is a string defining the data
flow between several filters. Each filter takes input,
transforms it according to certain rules and produces the transformed
data on its output. As in the shell, multiple filters are connected
using pipe characters (‘|’). For example, the crlf filter
inserts a carriage return character before each newline character. A
filter doing that kind of transformation is defined as:
"crlf"
Another filter, base64, converts its input to a BASE64 encoded
string. To transform each newline into a carriage return + newline pair
and encode the resulting stream in BASE64, one would write:
"crlf | base64"
Some filters take one or more arguments. These are specified as
a comma-delimited list in parentheses after the filter name. For
example, the linelen filter limits the length of each output
line to the given number of octets. The following filter pipe will
limit the length of base64 lines in the filter above to 62 octets:
"crlf | base64 | linelen(62)"
Many filters operate in two modes: encode and decode. By
default, all MFL functions apply filters in encode mode. The desired
mode can be stated explicitly in the filter string by using the
encode() and decode() functions. They take a filter
pipe as their argument. For example, the following will decode
the stream produced by the example filter above:
"decode(base64 | crlf)"
See Filters, for a discussion of available filters and their arguments.
The filter_string function transforms the string input using the filters in filter_pipe and returns the result. Example:
set input "test\ninput\n"
filter_string(input, "crlf|base64") ⇒ "dGVzdA0KaW5wdXQNCg=="
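The result shown in the example can be checked outside of MFL. The following Python sketch is not Mailutils code: it only mimics, with the standard library, what the pipes "crlf|base64" and "decode(base64 | crlf)" are described as doing. The helper names crlf_base64 and decode_base64_crlf are hypothetical, and the sketch assumes that the decode pipe undoes base64 first and the CRLF translation second.

```python
import base64

def crlf_base64(data: bytes) -> bytes:
    # Analogue of "crlf|base64" in encode mode: LF -> CRLF, then base64.
    return base64.b64encode(data.replace(b"\n", b"\r\n"))

def decode_base64_crlf(data: bytes) -> bytes:
    # Analogue of "decode(base64 | crlf)": base64-decode, then CRLF -> LF.
    return base64.b64decode(data).replace(b"\r\n", b"\n")

encoded = crlf_base64(b"test\ninput\n")
print(encoded.decode("ascii"))      # dGVzdA0KaW5wdXQNCg==
print(decode_base64_crlf(encoded))  # b'test\ninput\n'
```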
Given two I/O descriptors, reads data from src_fd, transforms it using filter_pipe and writes the result to descriptor dst_fd.
Both descriptors must be obtained using functions described in I/O functions.
Filters and Filter Pipes
A filter pipe is a string consisting of filter invocations
delimited by pipe characters (‘|’). Each invocation
is a filter name optionally followed by a parenthesized, comma-separated list of
parameters. Most filters can operate in two modes: encode and
decode. Unless specified otherwise, filters are invoked in
encode mode. To change the mode, the encode and decode
meta-filters are provided. Arguments to these meta-filters are filter pipes
that will be executed in the corresponding mode.
The following Mailutils filters are available:
In encode mode, converts its input into 7-bit ASCII, by clearing the 8th bit on each processed byte.
In decode mode, it operates exactly as the 8bit filter, i.e. copies its input to the output verbatim.
The filter takes no arguments.
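Clearing the 8th bit amounts to masking every byte with 0x7F. A minimal Python sketch of that operation follows; the function name seven_bit is hypothetical and the code only illustrates the described transformation, not the Mailutils implementation.

```python
def seven_bit(data: bytes) -> bytes:
    # Clear the most significant (8th) bit of every byte.
    return bytes(b & 0x7F for b in data)

print(seven_bit(b"caf\xe9"))  # b'cafi'
```

Note that the operation is lossy: 0xE9 and 0x69 both map to 0x69, which is consistent with the decode mode being a plain copy.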
The 8bit filter copies its input to the output verbatim.
Encodes or decodes the input using the base64 encoding.
The only difference between BASE64 and B is that, in
encode mode, the former limits each output line length to 76 octets,
whereas the latter produces a contiguous stream of base64 data.
In decode mode, both filters operate exactly the same way.
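The same distinction exists in Python's standard base64 module, which makes for a convenient illustration. In the sketch below, encodebytes stands in for the line-wrapped form and b64encode for the contiguous one; this is an analogy only, and details such as the trailing newline may differ from what the Mailutils filters emit.

```python
import base64

data = bytes(range(256)) * 2  # 512 arbitrary bytes

wrapped = base64.encodebytes(data)  # wrapped lines, comparable to BASE64
stream = base64.b64encode(data)     # one contiguous line, comparable to B

# No wrapped line is longer than 76 octets; the stream has no newlines.
assert max(len(line) for line in wrapped.splitlines()) <= 76
assert b"\n" not in stream

# Both forms decode to the same bytes.
assert base64.b64decode(wrapped) == base64.b64decode(stream) == data
```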
The charset filter is a convenience interface to the iconv filter, available for use
only in the message_body_to_stream function. It decodes the
part of a MIME message from its original character set, which is
determined from the value of the Content-Type header, to the
destination character set cset. The optional fallback
parameter specifies the representation fallback to be used for octets
that cannot be converted between the character sets. Its use is
described under iconv.
This filter normally takes its input from the mimedecode
filter, as in:
message_body_to_stream(fd, msg, 'mimedecode|charset(utf-8)')
See mimedecode, for a detailed discussion.
The crlf filter converts line separators from LF (ASCII 10) to CRLF (ASCII 13 10) and vice versa.
In decode mode, translates each CRLF to LF. Takes no arguments.
In encode mode, translates each LF to CRLF. If the optional argument ‘-n’ is given, produces normalized output by leaving each input CRLF sequence untouched (otherwise such sequences are translated to CR CR LF).
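The effect of ‘-n’ is easiest to see on input that mixes both line endings. The following Python sketch models the described behaviour; the function names and the normalize parameter are hypothetical stand-ins, not part of any Mailutils interface.

```python
import re

def crlf_encode(data: bytes, normalize: bool = False) -> bytes:
    if normalize:
        # "-n" behaviour: add a CR only before an LF that lacks one.
        return re.sub(rb"(?<!\r)\n", b"\r\n", data)
    # Default behaviour: every LF gets a CR, so CRLF becomes CR CR LF.
    return data.replace(b"\n", b"\r\n")

def crlf_decode(data: bytes) -> bytes:
    # Decode mode: each CRLF becomes a single LF.
    return data.replace(b"\r\n", b"\n")

mixed = b"one\r\ntwo\n"
print(crlf_encode(mixed))                  # b'one\r\r\ntwo\r\n'
print(crlf_encode(mixed, normalize=True))  # b'one\r\ntwo\r\n'
```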
In encode mode, replaces each LF (‘\n’ or ASCII 10) character with CRLF (‘\r\n’, ASCII 13 10), and byte-stuffs the output by producing an additional ‘.’ in front of any ‘.’ appearing at the beginning of a line in input. Upon end of input, it outputs additional ‘.\r\n’, if the last output character was ‘\n’, or ‘\r\n.\r\n’ otherwise.
If supplied the ‘-n’ argument, it preserves each CRLF input
sequence untranslated (see the crlf filter above).
In decode mode, the reverse is performed: each CRLF is replaced with a single LF byte, and additional dots are removed from beginning of lines. A single dot on a line by itself marks the end of the stream and causes the filter to return EOF.
In encode mode, byte-stuffs the input by outputting an additional dot (‘.’) in front of any dot appearing at the beginning of a line. Upon encountering end of input, it outputs additional ‘.\n’.
In decode mode, the reverse is performed: additional dots are removed from beginning of lines. A single dot on a line by itself (i.e. the sequence ‘\n.\n’) marks the end of the stream and causes the filter to return EOF.
This filter doesn’t take arguments.
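Byte-stuffing and its reversal can be sketched in a few lines of Python. The functions below are hypothetical illustrations of the LF-only variant just described; they assume that the input to the encoder ends with a newline and that the input to the decoder contains the terminating dot line.

```python
import re

def dot_encode(text: str) -> str:
    # Put an extra dot in front of every line that starts with one,
    # then append the end-of-stream marker.
    return re.sub(r"^\.", "..", text, flags=re.M) + ".\n"

def dot_decode(text: str) -> str:
    out = []
    for line in text.split("\n"):
        if line == ".":
            break  # a lone dot ends the stream
        # Remove the stuffed dot, if any.
        out.append(line[1:] if line.startswith(".") else line)
    return "".join(line + "\n" for line in out)

body = "hello\n.hidden\n"
stuffed = dot_encode(body)
print(repr(stuffed))              # 'hello\n..hidden\n.\n'
print(dot_decode(stuffed) == body)  # True
```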
Performs a traditional UNIX processing of lines starting with a ‘From’ followed by a space character.
In encode mode, each ‘From ’ at the beginning of a line is replaced by ‘>From ’.
In decode mode, the reverse operation is performed: the initial greater-than sign (‘>’) is removed from any line starting with ‘>From ’.
The filter takes no arguments.
MBOXRD-compatible processing of envelope lines.
In encode mode, each ‘From ’ optionally preceded by any number of contiguous ‘>’ characters and appearing at the beginning of a line is prefixed by another ‘>’ character on output.
In decode mode, the reverse operation is performed: the initial greater-than sign (‘>’) is removed from any line starting with one or more ‘>’ characters followed by ‘From ’.
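The practical difference between the traditional and the MBOXRD-compatible escaping shows up when a message body already contains a ‘>From ’ line. The following Python sketch models both schemes with regular expressions; all four function names are hypothetical and the code is an illustration of the rules above rather than the Mailutils implementation.

```python
import re

def from_encode(text: str) -> str:
    # Traditional scheme: only a bare "From " is escaped.
    return re.sub(r"^From ", ">From ", text, flags=re.M)

def from_decode(text: str) -> str:
    return re.sub(r"^>From ", "From ", text, flags=re.M)

def fromrd_encode(text: str) -> str:
    # MBOXRD scheme: "From " with any number of leading ">" gains one more.
    return re.sub(r"^(>*From )", r">\1", text, flags=re.M)

def fromrd_decode(text: str) -> str:
    return re.sub(r"^>(>*From )", r"\1", text, flags=re.M)

body = "From here\n>From there\n"
print(from_decode(from_encode(body)) == body)      # False
print(fromrd_decode(fromrd_encode(body)) == body)  # True
```

The traditional scheme does not round-trip here because the pre-existing ‘>From ’ line is indistinguishable from an escaped one; the MBOXRD scheme avoids that ambiguity.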
This filter treats its input as an RFC 2822 email message. It extracts its header part (i.e. everything up to the first empty line) and copies it to the output. The body of the message is ignored.
The filter operates only in decode mode and takes no arguments.
The iconv filter converts input from character set src to dst. The filter works the same way in both decode and encode modes.
It takes two mandatory arguments: the names of the input (src) and output (dst) character sets. An optional third argument specifies what to do when an illegal character sequence is encountered in the input stream. Its possible values are:
Raise an e_ilseq exception.
Copy the offending octet to the output verbatim and continue conversion from the next octet.
Print the offending octet to the output using the C octal conversion and continue conversion from the next octet.
The default is copy-octal.
The following example creates an iconv filter for converting from
iso-8859-2 to utf-8, raising the e_ilseq exception on the first conversion error:
iconv(iso-8859-2, utf-8, none)
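To make the fallback behaviour concrete, here is a Python sketch that models two of the fallback values with the standard codecs machinery. It is an analogy, not the filter's implementation: the function convert, the handler name copy_octal_sketch and the exact three-digit backslash format are assumptions, Python raises UnicodeDecodeError where MFL would raise e_ilseq, and only errors on the input side are handled.

```python
import codecs

def _copy_octal(exc):
    # Emit each undecodable byte as a backslash-octal escape and resume
    # conversion right after the offending bytes.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\\%03o" % b for b in bad), exc.end

codecs.register_error("copy_octal_sketch", _copy_octal)

def convert(data: bytes, src: str, dst: str, fallback: str = "copy-octal") -> bytes:
    if fallback == "none":
        errors = "strict"             # fail on the first bad sequence
    elif fallback == "copy-octal":
        errors = "copy_octal_sketch"  # print bad octets in octal
    else:
        raise ValueError("fallback not modelled in this sketch")
    return data.decode(src, errors).encode(dst)

print(convert(b"\xb1", "iso-8859-2", "utf-8", "none"))  # b'\xc4\x85'
print(convert(b"a\xffb", "utf-8", "ascii"))             # b'a\\377b'
```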
In decode mode, the filter removes from the input all lines beginning with a given inline comment sequence str. The default comment sequence is ‘;’ (a semicolon).
The following options modify the default behavior:
Emit line number information after each contiguous sequence of removed lines. The argument str supplies an information starter – a sequence of characters which is output before the actual line number.
Remove empty lines, i.e. the lines that contain only whitespace characters.
Squeeze whitespace. Each sequence of two or more whitespace characters encountered on input is replaced by a single space character on output.
A whitespace-must-follow mode. A comment sequence is recognized only if followed by a whitespace character. The character itself is retained on output.
In encode mode, the inline-comment filter adds a comment-starter
sequence at the beginning of each line. The default comment starter
is ‘;’ and can be changed by specifying the desired comment
starter as the first argument.
The only option supported in this mode is -S, which enables the whitespace-must-follow mode, in which a single space character (ASCII 32) is output after each comment sequence.
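The basic behaviour of both modes can be sketched as follows. This Python illustration covers only the default comment removal, the removal of empty lines and the plain encode mode; the function and parameter names are hypothetical, and the line-number, whitespace-squeezing and whitespace-must-follow options are deliberately left out.

```python
def strip_comments(text: str, starter: str = ";", remove_empty: bool = False) -> str:
    kept = []
    for line in text.splitlines(keepends=True):
        if line.startswith(starter):
            continue  # drop lines beginning with the comment sequence
        if remove_empty and not line.strip():
            continue  # drop lines containing only whitespace
        kept.append(line)
    return "".join(kept)

def add_comments(text: str, starter: str = ";") -> str:
    # Encode mode: prefix every line with the comment starter.
    return "".join(starter + line for line in text.splitlines(keepends=True))

source = "; note\nkey = 1\n\n;x\nend\n"
print(repr(strip_comments(source)))                     # 'key = 1\n\nend\n'
print(repr(strip_comments(source, remove_empty=True)))  # 'key = 1\nend\n'
print(repr(add_comments("a\nb\n")))                     # ';a\n;b\n'
```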
Implements the familiar UNIX line-continuation facility. The filter removes from its input stream any newline character immediately preceded by a backslash. This filter operates only in decode mode.
If given the arguments (‘-i’, str), enables the line number information facility. This facility emits the current input line number (prefixed with str) after each contiguous sequence of one or more removed newline characters. It is useful for implementing parsers that need to report erroneous lines by their input line numbers.
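In its simplest form, line continuation is a single substitution. The Python sketch below assumes that the backslash is dropped together with the newline, as in the shell; the function name is hypothetical and the line number facility is not modelled.

```python
def join_continuations(text: str) -> str:
    # Remove every backslash-newline pair, joining the two physical lines.
    return text.replace("\\\n", "")

print(repr(join_continuations("a = 1 + \\\n    2\n")))  # 'a = 1 +     2\n'
```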
The linelen filter limits the length of each output line to a certain number of octets. It operates in encode mode only and requires a single parameter: the desired output length in octets. This filter makes no attempt to analyze the lexical structure of the input: newline characters are inserted when the length of the output line reaches the predefined maximum. Any newline characters present in the input are taken into account when computing the input line length.
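The counting rule can be sketched in Python as follows. The function name is hypothetical, and the exact moment at which the break is emitted is an assumption: this version inserts the newline just before the octet that would exceed the limit, which may differ from the real filter when a line is exactly as long as the limit.

```python
def limit_line_length(data: bytes, max_len: int) -> bytes:
    out = bytearray()
    current = 0                 # octets on the current output line
    for byte in data:
        if byte == 0x0A:        # a newline in the input resets the counter
            out.append(byte)
            current = 0
            continue
        if current == max_len:  # line is full: break before this octet
            out.append(0x0A)
            current = 0
        out.append(byte)
        current += 1
    return bytes(out)

print(limit_line_length(b"abcdefgh\nij", 3))  # b'abc\ndef\ngh\nij'
```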
This is a domain-specific filter available for use only with the
message_body_to_stream function. It decodes the part of
a MIME message from whatever encoding was used to store it
in the message to a stream of bytes. See mimedecode.
Encodes or decodes the input using the quoted-printable encoding.
In encode mode, the xml filter converts the input stream (which must
contain valid UTF-8 characters) into a form suitable for inclusion in
an XML or HTML document, i.e. it replaces ‘<’, ‘>’, and
‘&’ with ‘&lt;’, ‘&gt;’, and ‘&amp;’,
respectively, and replaces invalid characters with their numeric
character reference representation.
In decode mode, the reverse operation is performed.
The filter does not take arguments.
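The three replacements can be tried out with Python's html module, which performs a comparable escaping. This is an analogy only: the wrapper names are hypothetical, the handling of invalid characters is not modelled, and html.unescape recognizes many more named references than the three listed above.

```python
import html

def xml_encode(text: str) -> str:
    # Escape only '&', '<' and '>'; quotation marks are left alone.
    return html.escape(text, quote=False)

def xml_decode(text: str) -> str:
    return html.unescape(text)

print(xml_encode("a < b && c > d"))  # a &lt; b &amp;&amp; c &gt; d
print(xml_decode("a &lt; b"))        # a < b
```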