Tutorial (Cfpeek User Manual)

In the examples, ‘$’ represents a typical shell prompt. It precedes lines you should type. Both command lines and program output are shown in ‘this font’.

In examples, the ⇒ symbol indicates the value of a variable or the result of a function invocation.

3.1 Basic Notions

A structured configuration file contains entities of two basic types. The first is the simple statement. A simple statement conceptually consists of an identifier (or keyword) and a value. Depending on the syntactic requirements, a special token may be required between them (an equals sign, for example) or at the end of the statement. Although we use the term in the singular, the value is not necessarily a single scalar: it may also be a list of values (the exact form of that list depends on the particular syntax of the configuration file).

The other basic entity is the compound statement, also known as a block statement or section. A compound statement is used for logical grouping of other statements. It consists of an identifier, an optional tag and a list of statements. The tag, if present, is similar to the value in simple statements, and the notes made above about values apply to tags as well. Tags serve to distinguish between statements that have the same identifier. The list of statements may include statements of both kinds, simple as well as compound. Thus, compound statements form a tree-like structure of arbitrary depth, with simple statements as leaf nodes.

Each compound statement can have any number of subordinate statements, which are called its child statements. Each statement, whether simple or compound, has exactly one parent statement, i.e. the compound statement of which it is a child.

A special implicit statement, called the root statement, serves as the parent for the statements at the topmost level of the hierarchy.

3.2 Pathnames

Given this hierarchical structure, each statement can be identified by the list of keywords and values (when present) of all compound statements that must be traversed in order to reach that statement. Such a list, written according to a set of conventions, is called a full pathname of the statement. The conventions are:

A pathname which begins with a component separator (‘.’) is called an absolute pathname and identifies the statement in relation to the topmost level of the hierarchy.

A pathname beginning with an identifier is called relative and identifies the statement in relation to the statement represented by that identifier.
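The distinction between the two kinds of pathname can be checked mechanically by looking at the first character. The following is a minimal sketch in shell, assuming the default ‘.’ component separator; the function name path_kind is hypothetical and is not part of cfpeek:

```shell
# path_kind is a hypothetical helper, not a cfpeek command.
# It classifies a pathname by its first character, assuming
# the default component separator '.'.
path_kind() {
    case $1 in
        .*) echo absolute ;;
        *)  echo relative ;;
    esac
}

path_kind .logging.facility
path_kind logging.facility
```

The first call reports an absolute pathname, the second a relative one.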

3.3 Example Configuration

The following configuration file will assist us in further discussion. Its syntax is fairly straightforward:

A simple statement is written as an identifier followed by a value. The two parts are separated by any amount of whitespace. Simple statements are terminated by a semicolon.

A compound statement is written as identifier followed by a list of subordinate statements in curly braces. A tag (if present) is put between the identifier and the opening curly brace.

These syntax conventions roughly correspond to the Grecs configuration format, which cfpeek assumes by default (see grecs).

user smith;
group mail;
pidfile "/var/run/example";

logging {
    facility daemon;
    tag example;
}

program a {
    command "a.out";
    logging {
        facility local0;
        tag a;
    }
}

program b {
    command "b.out";
    wait yes;
    pidfile /var/run/b.pid;
}

Example 3.1: Sample configuration file

3.4 Listing the Entire File

The only argument cfpeek requires is the name of the file to parse. If no other arguments are given, it prints a listing of that file on the standard output in pathname-value form. Each simple statement in the input file is represented by a single line in the output listing. The line consists of two main parts: the full pathname of that statement and its value. The two parts are separated by a colon and a space character. For example:

$ cfpeek sample.conf
.user: smith
.group: mail
.pidfile: /var/run/example
.logging.facility: daemon
.logging.tag: example
.program="a".command: a.out
.program="a".logging.facility: local0
.program="a".logging.tag: a
.program="b".command: b.out
.program="b".wait: yes
.program="b".pidfile: /var/run/b.pid
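Because every line has the same shape (pathname, colon, space, value), the listing is easy to post-process in a shell script. The following is a minimal sketch that works on a few lines copied from the listing above instead of calling cfpeek; the function name split_listing is hypothetical, and the sketch assumes that no pathname contains the sequence colon-space:

```shell
# split_listing is a hypothetical helper, not part of cfpeek.
# It reads "pathname: value" lines on stdin and prints the two
# parts separated by a tab.  It assumes that the first ": " on
# a line is the separator between pathname and value.
split_listing() {
    while IFS= read -r line; do
        path=${line%%: *}    # everything before the first ": "
        value=${line#*: }    # everything after the first ": "
        printf '%s\t%s\n' "$path" "$value"
    done
}

# A few lines taken from the listing of sample.conf
listing='.user: smith
.logging.facility: daemon
.program="a".command: a.out'

printf '%s\n' "$listing" | split_listing
```

Parameter expansion is used here only to keep the sketch free of external tools; any line-oriented tool would do the same job.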

This output can be customized via the --format (-H) command line option. This option takes a list of output flags, each of which modifies some aspect of the output. Most output flags are boolean, i.e. they enable or disable the given feature. To disable the feature, the flag must be prefixed with ‘no’.

The ‘path’ and ‘value’ flags tell cfpeek to print the pathname of the statement and its value, respectively. The ‘descend’ flag affects the output of compound nodes. If this flag is set and a node matching the key is a compound node, cfpeek will output this node and all nodes below it (i.e. its descendant nodes). The ‘descend’ flag is meaningful only if at least one lookup key is supplied.

You can also use --format to change the default component delimiter. For example, to use slash to delimit components:

$ cfpeek --format=delim=/ sample.conf
/user: smith
/group: mail
/pidfile: /var/run/example
/logging/facility: daemon
/logging/tag: example
/program="a"/command: a.out
/program="a"/logging/facility: local0
/program="a"/logging/tag: a
/program="b"/command: b.out
/program="b"/wait: yes
/program="b"/pidfile: /var/run/b.pid

3.5 Statement Lookups

When given more than one argument, cfpeek treats the remaining arguments as search keys. It then searches for statements with pathnames matching each of the keys and outputs them. A key can be either a pathname or a pattern.

The following command looks for the ‘pidfile’ statement at the topmost level of the hierarchy and prints it:

$ cfpeek sample.conf .pidfile
.pidfile: /var/run/example

As you see, it uses the same output format as with full listings. If you wish to change it, use the --format option, introduced in the previous section. For example, to retrieve only the value:

$ cfpeek --format=value sample.conf .pidfile
/var/run/example

This approach is quite common when cfpeek is used in shell scripts. It will be illustrated in more detail below.

If a key is not found, cfpeek prints a message on the standard error and starts searching for the next key (if any). When all keys are exhausted, the program exits with status 1 to indicate that some of them have not been found. To suppress the diagnostics output, use the --quiet (-q) option.

To illustrate all this, the following example shows how to use cfpeek in a start-up script to check whether a program has already been started and to bring it down, if requested:

#! /bin/sh
pidfile=`cfpeek -q --format=value sample.conf .pidfile`

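# Quote the variable: if the lookup failed, $pidfile is empty and
# an unquoted "test -f" with no operand would evaluate to true.
if test -f "$pidfile"; then
  pid=`head -n 1 "$pidfile"`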
else
  pid=
fi

case $1 in
start)  if test -n "$pid"; then
          echo >&2 "the program is already running"
        else
          # start the program
          sample-start
        fi
        ;;
status) if test -n "$pid"; then
          echo "program is running at pid $pid"
        else
          echo "program is not running"
        fi
        ;;
stop)   test -n "$pid" && kill -TERM $pid
        ;;
esac
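The script above relies on the lookup having succeeded. When the key is missing and diagnostics are suppressed, the variable holding the pathname can be expected to be empty, so it may be worth isolating the pidfile handling in a small function that treats this case explicitly. The following is a sketch under that assumption; the function name read_pid is hypothetical, and a temporary file stands in for the real pidfile:

```shell
# read_pid is a hypothetical helper, not part of cfpeek.
# It prints the first line of the file named by $1, or nothing
# if the name is empty or the file does not exist.
read_pid() {
    if test -n "$1" && test -f "$1"; then
        head -n 1 "$1"
    fi
}

# Demonstration: a temporary file stands in for the pidfile
tmp=$(mktemp)
echo 1234 > "$tmp"
read_pid "$tmp"     # the file exists: its first line is printed
read_pid ""         # empty name (failed lookup): nothing is printed
rm -f "$tmp"
```

In the start-up script, the result of the cfpeek lookup would be passed to the function in place of the temporary file name.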

3.6 Pattern Lookups

Apart from a literal pathname, a pathname pattern is allowed as a key. A pattern can contain wildcards in place of path components. Two wildcards are defined: ‘*’ and ‘%’. A ‘%’ matches any single keyword.

Pattern lookups can be disabled using the --literal (-L) command line option. There may be two reasons for doing so. First, literal lookups are somewhat faster, so if you don’t need pattern matching, using --literal can save you a couple of CPU cycles. Secondly, if any of your identifiers contain ‘*’ or ‘%’ characters, you will have to use --literal to prevent them from being treated as wildcards.

3.7 Using Various Parsers

Cfpeek can handle input files in various formats. The default one is ‘Grecs’ format, introduced in previous sections. To process input files of another format, specify the parser to use via the --parser (-p) command line option. The argument to this option is one of: ‘grecs’, ‘bind’, ‘path’, ‘meta1’ or ‘git’. See Formats, for a detailed description of each of these formats.

3.8 Specifying Nodes to Output

Sometimes you may need to see not the node which matched the search key, but its parent or other ancestor node. Consider, for example, the following task: select from the /etc/named.conf file the names of all zones for which this nameserver is a master. To do so, you will need to find all ‘zone.type’ statements with the value ‘master’, ascend to the parent node and print its value.

Cfpeek provides several special formatting flags to that effect: up, down, parent, child and sibling. They are called relative movement flags, because they select another node in the tree, relative to the position of the current node.

The up flag takes an integer number as its argument. It instructs cfpeek to ascend that many parent nodes before actually printing the node. For example, --format=up=1 means “ascend to the parent of the matched node and print it”. This is exactly what we need to solve the above task, since the ‘type’ statement is a child of a ‘zone’ statement. Thus, the solution is:

The value flag indicates that we want only values on output, without the corresponding pathnames. The nodescend flag tells cfpeek not to descend into compound statements when outputting them. It is necessary since we want only the values of the relevant ‘zone’ statements, not their subordinate statements.
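For comparison, a similar selection can be approximated without relative movement flags by filtering a full pathname-value listing with standard tools. The sketch below does not call cfpeek. It works on a made-up listing whose shape is an assumption modelled on the ‘.program="a"’ lines shown earlier (the actual output for a named.conf file may differ), and the function name master_zones is hypothetical:

```shell
# ASSUMPTION: a made-up listing in pathname-value form.  The zone
# names and the exact shape of the lines are illustrative only.
listing='.zone="example.org".type: master
.zone="example.org".file: db.example.org
.zone="example.net".type: slave'

# master_zones is a hypothetical helper, not part of cfpeek.
# It reads a listing on stdin and prints the tag of every zone
# whose type statement has the value "master".
master_zones() {
    sed -n 's/^\.zone="\(.*\)"\.type: master$/\1/p'
}

printf '%s\n' "$listing" | master_zones
```

The filter has to encode the pathname layout in a regular expression, which is exactly the kind of detail that a relative movement flag lets you leave to cfpeek.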

The counterpart of the up flag is the down=n flag, which descends n levels of the hierarchy.

The parent flag acts in a similar manner, but it identifies the ancestor by its keyword instead of by the relative nesting level. For example, --format=parent=zone tells cfpeek, after finding a matching node, to ascend until a node with the identifier ‘zone’ is found, and then print this node.

The child=id flag does the opposite of parent: it locates a child of the current node which has the identifier id.

Similarly, the sibling flag instructs cfpeek to find the first sibling of the current node which has the given identifier. For example, to find the names of the zone files for all master zones in the named.conf file:

A ‘file’ statement is located on the same nesting level as ‘type’, for example:

Thus, the above command first locates the ‘type’ statement, then searches on the same nesting level for a ‘file’ statement, and finally prints its value.

3.9 Using Scripts

Cfpeek offers a scripting facility, which can be used to extend its functionality beyond the basic operations described in the previous sections. Scripts must be written in Scheme, using ‘Guile’, GNU’s Ubiquitous Intelligent Language for Extensions. For information about the language, refer to Revised(5) Report on the Algorithmic Language Scheme. For a detailed description of Guile and its features, see Overview in The Guile Reference Manual.

This section assumes that the reader has sufficient knowledge about this programming language.

The scripting facility is enabled by the use of the --expression (-e) or --file (-f) command line options. The --expression (-e) option takes as its argument a Scheme expression, which will be executed for each statement matching the supplied keys (or for each statement in the tree, if no keys were supplied). The expression can obtain information about the statement from the global variable node, which represents a node in the parse tree describing this statement. The node contains complete information about the statement, including its location in the source file, its type and its neighbor nodes. A number of functions are provided to retrieve that information from the node. These functions are discussed in detail in Scripting.

Let’s start from the simplest example. The following command prints all nodes in the file:

$ cfpeek --expression='(display node)(newline)' sample.conf
#<node .user: "smith">
#<node .group: "mail">
#<node .pidfile: "/var/run/example">
#<node .logging.facility: "daemon">
#<node .logging.tag: "example">
#<node .program="a".command: "a.out">
#<node .program="a".logging.facility: "local0">
#<node .program="a".logging.tag: "a">
#<node .program="b".command: "b.out">
#<node .program="b".wait: "yes">
#<node .program="b".pidfile: "/var/run/b.pid">

The format shown in this example is the default Scheme representation for nodes. You can use accessor functions to format the output to your liking. For instance, the function ‘grecs-node-locus’ returns the location of the node in the input file. The returned value is a cons, with the file name as its car and the line number as its cdr. Thus, you can print statement locations with the following command:

$ cfpeek --expression='(let ((loc (grecs-node-locus node))) (format #t "~A:~A~%" (car loc) (cdr loc)))' sample.conf

Complex expressions are cumbersome to type on the command line, so the --file (-f) option is provided. This option takes the name of the script file as its argument. This file must define a function named cfpeek which takes a node as its argument. The script file is then loaded and the cfpeek function is called for each matching node.

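Now, let’s put the expression that prints statement locations in a script file (e.g. locus.scm):

(define (cfpeek node)
  ;; grecs-node-locus must be applied to the node; the result is
  ;; a cons of the file name and the line number.
  (let ((loc (grecs-node-locus node)))
    (format #t "~A:~A~%" (car loc) (cdr loc))))

The script is then run with the --file option:

$ cfpeek -f locus.scm sample.conf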

When both --file and --expression options are used in the same invocation, the cfpeek function is not invoked by default. In fact, it does not even need to be defined. When used this way, cfpeek first loads the requested script file, and then applies the expression to each matching node, the same way it always does when --expression is supplied. It is the responsibility of the expression itself to call any function or functions defined in the file. This way of invoking ‘cfpeek’ is useful for supplying additional parameters to the script. For example:

It is supposed that the function process-node is defined somewhere in script.scm and takes two arguments: a node and a boolean.

The --init=expr (-i expr) option provides an initialization expression expr. This expression is evaluated once, after loading the script file, if one is specified, and before starting the main loop.

Similarly, the option --done=expr (-d expr) introduces a Scheme expression to be evaluated at the end of the run, after all nodes have been processed.

3.9.1 Example: Converter to GIT Configuration Format

Here is a more practical example of Scheme scripting. This script converts the entire parse tree into the Git configuration file format. The format itself is described in git.

The script traverses the entire tree itself, so it must be called only once, for the root node of the parse tree. The root node is denoted by a single dot, so the invocation syntax is:

Traversal is performed by the main function, cfpeek, using the grecs-node-next and grecs-node-down functions. The grecs-node-next function returns a node which follows its argument at the same nesting level. For example, if n is the very first node in our sample parse tree, then:

n ⇒ #<node .user: "smith">
(grecs-node-next n) ⇒ #<node .group: "mail">

Similarly, the grecs-node-down function returns the first subordinate node of its argument. For example:

n ⇒ #<node .logging>
(grecs-node-down n) ⇒ #<node .logging.facility: "daemon">

Both functions return ‘#f’ if there is no next or subordinate node, respectively.

The grecs-node-type function is used to determine how to handle a particular node. It returns the type of the node given to it as argument. The type is an integer constant, with the following possible values:

Type	The node is
grecs-node-root	the root (topmost) node
grecs-node-stmt	a simple statement
grecs-node-block	a compound (block) statement

The print-section function prints a GIT section header corresponding to its node. It ascends the parent node chain to find the topmost node and prints the traversed nodes in the correct order.

(define (print-section node delim)
  "Print a Git section header for the given node.
End it with delim.

The function recursively calls itself until the topmost
node is reached.
"
  (cond
   ((grecs-node-up? node)
    ;; Ascend to the parent node
    (print-section (grecs-node-up node) #\space)
    ;; Print its identifier, ...
    (display (grecs-node-ident node))
    (if (grecs-node-has-value? node)
        ;; ... value,
        (begin
          (display " ")
          (display (grecs-node-value node))))
    ;; ... and delimiter
    (display delim))
   (else              ;; mark the root node
    (display "["))))  ;;  with a [


(define (cfpeek node)
  "Main entry point.  Calls itself recursively to descend
into subordinate nodes and to iterate over nodes on the
same nesting level (tail recursion)."
  (let loop ((node node))
    (if node
        (let ((type (grecs-node-type node)))
          (cond
           ((= type grecs-node-root)
            (let ((dn (grecs-node-down node)))
              ;; Each statement in a Git config file must
              ;; belong to a section.  If the first node
              ;; is not a block statement, provide the
              ;; default [core] section:
              (if (not (= (grecs-node-type dn)
                          grecs-node-block))
                  (display "[core]\n"))
              ;; Continue from the first node
              (loop dn)))
           ((= type grecs-node-block)
            ;; print the section header
            (print-section node #\])
            (newline)
            ;; descend into subnodes
            (loop (grecs-node-down node))
            ;; continue from the next node
            (loop (grecs-node-next node)))
           ((= type grecs-node-stmt)
            ;; print the simple statement
            (display #\tab)
            (display (grecs-node-ident node))
            (display " = ")
            (display (grecs-node-value node))
            (newline)
            ;; continue from the next node
            (loop (grecs-node-next node))))))))

$ cfpeek -f togit.scm sample.conf .
[core]
        user = smith
        group = mail
        pidfile = /var/run/example
[logging]
        facility = daemon
        tag = example
[program a]
        command = a.out
[program a logging]
        facility = local0
        tag = a
[program b]
        command = b.out
        wait = yes
        pidfile = /var/run/b.pid
