Chapter 25: Concrete Examples

In this chapter concrete examples of C++ programs, classes, and templates are presented. Topics covered by the C++ Annotations, such as virtual functions and static members, are illustrated in this chapter. The examples roughly follow the organization of earlier chapters.

As an additional topic, going beyond plain examples of C++, the subjects of scanner and parser generators are covered. We show how these tools may be used in C++ programs. These additional examples assume a certain familiarity with the concepts underlying these tools, like grammars, parse trees, and parse-tree decoration. Once the input for a program exceeds a certain level of complexity, it becomes attractive to use scanner and parser generators to create the code that does the actual input processing. One of the examples in this chapter describes the usage of these tools in a C++ environment.

25.1: Using file descriptors with `streambuf' classes

25.1.1: Classes for output operations

Reading from and writing to file descriptors is not part of the C++ standard. But on most operating systems file descriptors are available and can be considered a device. It therefore seems natural to use the class std::streambuf as the starting point for constructing classes interfacing such file descriptor devices.

Below we'll construct classes that can be used to write to a device given its file descriptor. The devices may be files, but they could also be pipes or sockets. Section 25.1.2 covers reading from such devices; section 25.2.3 reconsiders redirection, discussed earlier in section 6.6.2.

Using the streambuf class as a base class, it is relatively easy to design classes for output operations. The only member function that must be overridden is the (virtual) member int streambuf::overflow(int c). This member is responsible for writing characters to the device. If d_fd is an output file descriptor and if output should not be buffered, then the member overflow() can simply be implemented as:

    class UnbufferedFD: public std::streambuf
    {
        int d_fd;                   // the device's file descriptor

        public:
            UnbufferedFD(int fd): d_fd(fd) {}
            virtual int overflow(int c);
            ...
    };

    int UnbufferedFD::overflow(int c)
    {
        if (c != EOF)
        {
            if (write(d_fd, &c, 1) != 1)
                return EOF;
        }
        return c;
    }

The argument received by overflow is either written to the file descriptor (and returned from overflow), or EOF is returned.

This simple function does not use output buffering. For various reasons, using a buffer is usually a good idea (see also the next section).

When output buffering is used, the overflow member is a bit more complex as it is only called when the buffer is full. Once the buffer is full, we first have to flush the buffer. Flushing the buffer is the responsibility of the (virtual) function streambuf::sync. Since sync is a virtual function, classes derived from streambuf may redefine sync to flush a buffer streambuf itself doesn't know about.

Overriding sync and using it in overflow is not all that has to be done. When the object of the class defining the buffer reaches the end of its lifetime the buffer may be only partially full. In that situation the buffer must also be flushed. This is easily done by simply calling sync from the class's destructor.
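In outline, a class following this scheme could look as shown below. It is only a sketch, with error checking omitted; the helper function fdOutDemo is purely illustrative, writing through a deliberately small buffer into a pipe so that both overflow and the flushing destructor are exercised:

```cpp
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>
#include <unistd.h>

class OFdnStreambuf: public std::streambuf
{
    int d_fd;
    size_t d_bufsize;
    char *d_buffer;

    public:
        OFdnStreambuf(int fd, size_t bufsize = 500)
        :
            d_fd(fd),
            d_bufsize(bufsize),
            d_buffer(new char[bufsize])
        {
            setp(d_buffer, d_buffer + d_bufsize);
        }
        ~OFdnStreambuf() override
        {
            sync();                     // flush a partially filled buffer
            delete[] d_buffer;
        }

    private:
        int sync() override             // flush the buffer to d_fd
        {
            if (pptr() > pbase())
            {
                write(d_fd, d_buffer, pptr() - pbase());
                setp(d_buffer, d_buffer + d_bufsize);
            }
            return 0;
        }
        int overflow(int c) override    // called when the buffer is full
        {
            sync();
            if (c != EOF)
            {
                *pptr() = static_cast<char>(c);
                pbump(1);
            }
            return c;
        }
};

// illustrative helper: insert through a small buffer into a pipe,
// then read the pipe's contents back
inline std::string fdOutDemo()
{
    int fd[2];
    pipe(fd);
    {
        OFdnStreambuf ob(fd[1], 10);
        std::ostream os(&ob);
        os << "hello world";            // 11 chars: forces one overflow()
    }                                   // the destructor sync()s the rest
    close(fd[1]);

    std::string result;
    char buf[64];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof(buf))) > 0)
        result.append(buf, n);
    close(fd[0]);
    return result;
}
```

Note how overflow merely flushes the full buffer through sync and then stores the character that did not fit; the destructor's sync call takes care of whatever remains when the object goes out of scope.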

Now that we've considered the consequences of using an output buffer, we're almost ready to design our derived class, to which several more features are added as well.

To save some space in the C++ Annotations, the successful completion of the functions designed here is not checked in the example code. In `real life' implementations these checks should of course not be omitted. Our class OFdnStreambuf has the following characteristics:

The next program uses the OFdnStreambuf class to copy its standard input to file descriptor STDOUT_FILENO, the symbolic name of the file descriptor used for the standard output:
    #include <unistd.h>     // for STDOUT_FILENO
    #include <string>
    #include <iostream>
    #include <istream>
    #include "fdout.h"
    using namespace std;

    int main(int argc, char **argv)
    {
        OFdnStreambuf   fds(STDOUT_FILENO, 500);
        ostream         os(&fds);

        switch (argc)
        {
            case 1:
                for (string  s; getline(cin, s); )
                    os << s << '\n';
                os << "COPIED cin LINE BY LINE\n";
            break;

            case 2:
                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
                os << "COPIED cin BY EXTRACTING TO os.rdbuf()\n";
            break;

            case 3:
                os << cin.rdbuf();
                os << "COPIED cin BY INSERTING cin.rdbuf() into os\n";
            break;
        }
    }

25.1.2: Classes for input operations

When classes for input operations are derived from std::streambuf, they should be provided with an input buffer of at least one character. The one-character input buffer allows for the use of the member functions istream::putback or istream::unget. Strictly speaking, it is not necessary to implement a buffer in classes derived from streambuf, but using buffers in these classes is strongly advised: their implementation is simple and straightforward, and they greatly improve the applicability of such classes. Therefore, all our classes derived from the class streambuf define a buffer of at least one character.

25.1.2.1: Using a one-character buffer

When deriving a class (e.g., IFdStreambuf) from streambuf using a buffer of one character, at least its member streambuf::underflow should be overridden, as this member eventually receives all requests for input. The member streambuf::setg is used to inform the streambuf base class of the size and location of the input buffer, so that it is able to set up its input buffer pointers accordingly. This ensures that streambuf::eback, streambuf::gptr, and streambuf::egptr return correct values.

The class IFdStreambuf is designed like this:
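A minimal sketch of such a one-character-buffer class could look as follows; error checks are omitted, and the helper function ifdDemo is added purely for illustration:

```cpp
#include <cstdio>
#include <istream>
#include <streambuf>
#include <string>
#include <unistd.h>

class IFdStreambuf: public std::streambuf
{
    int d_fd;
    char d_buffer[1];               // one-character input buffer

    public:
        IFdStreambuf(int fd)
        :
            d_fd(fd)
        {                           // empty buffer: forces underflow()
            setg(d_buffer, d_buffer + 1, d_buffer + 1);
        }

    private:
        int underflow() override
        {
            if (gptr() < egptr())   // a character is still available
                return static_cast<unsigned char>(*gptr());

            if (read(d_fd, d_buffer, 1) <= 0)
                return EOF;

            setg(d_buffer, d_buffer, d_buffer + 1);
            return static_cast<unsigned char>(*gptr());
        }
};

// illustrative helper: read two characters previously written into a pipe
inline std::string ifdDemo()
{
    int fd[2];
    pipe(fd);
    write(fd[1], "hi", 2);
    close(fd[1]);

    IFdStreambuf fds(fd[0]);
    std::istream in(&fds);

    std::string result;
    char c;
    while (in.get(c))
        result += c;
    close(fd[0]);
    return result;
}
```

The setg call in the constructor deliberately defines an empty buffer, so that the first extraction immediately triggers underflow; after a character has been read, gptr exceeds eback, making a single putback or unget call possible.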

The following main function shows how IFdStreambuf can be used:
    int main()
    {
        IFdStreambuf fds(STDIN_FILENO);
        istream      is(&fds);

        cout << is.rdbuf();
    }

25.1.2.2: Using an n-character buffer

How complex would things get if we decided to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, but apart from that it is basically the same class as IFdStreambuf developed in the previous section. To make things a bit more interesting, the class IFdNStreambuf developed here also overrides the member streambuf::xsgetn, optimizing the reading of series of characters. A default constructor is provided as well, which can be used in combination with the open member to construct an istream object before the file descriptor becomes available. Once the descriptor becomes available, the open member can then be used to initialize the object's buffer. Later, in section 25.2, we'll encounter such a situation.

To save some space, the success of various calls was not checked. In `real life' implementations, these checks should of course not be omitted. The class IFdNStreambuf has the following characteristics:
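Such a class might, in outline, look as follows. This is a sketch with error checking omitted; the helper function ifdnDemo is illustrative only:

```cpp
#include <cstdio>
#include <cstring>
#include <streambuf>
#include <string>
#include <unistd.h>

class IFdNStreambuf: public std::streambuf
{
    int d_fd = -1;
    size_t d_bufsize = 0;
    char *d_buffer = nullptr;

    public:
        IFdNStreambuf() = default;      // combine with open() later on
        IFdNStreambuf(int fd, size_t bufsize = 1)
        {
            open(fd, bufsize);
        }
        ~IFdNStreambuf() override
        {
            delete[] d_buffer;
        }
        void open(int fd, size_t bufsize = 1)
        {
            d_fd = fd;
            d_bufsize = bufsize;
            delete[] d_buffer;
            d_buffer = new char[bufsize];
                                        // empty buffer: forces underflow()
            setg(d_buffer, d_buffer + bufsize, d_buffer + bufsize);
        }

    private:
        int underflow() override
        {
            if (gptr() < egptr())
                return static_cast<unsigned char>(*gptr());

            ssize_t nread = read(d_fd, d_buffer, d_bufsize);
            if (nread <= 0)
                return EOF;

            setg(d_buffer, d_buffer, d_buffer + nread);
            return static_cast<unsigned char>(*gptr());
        }
        std::streamsize xsgetn(char *dest, std::streamsize n) override
        {
            std::streamsize avail = egptr() - gptr();
            if (avail == 0)             // refill the buffer first
            {
                if (underflow() == EOF)
                    return 0;
                avail = egptr() - gptr();
            }
            if (avail > n)
                avail = n;
            std::memcpy(dest, gptr(), avail);   // copy from the buffer
            gbump(static_cast<int>(avail));
            return avail;
        }
};

// illustrative helper: slurp a pipe's contents in chunks of at most 3 chars
inline std::string ifdnDemo()
{
    int fd[2];
    pipe(fd);
    write(fd[1], "abcdef", 6);
    close(fd[1]);

    IFdNStreambuf fds(fd[0], 4);
    std::string result;
    char buf[3];
    while (size_t n = fds.sgetn(buf, 3))
        result.append(buf, n);
    close(fd[0]);
    return result;
}
```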

The member function xsgetn is called by streambuf::sgetn. Here is an example illustrating the use of this member function with an IFdNStreambuf object:
    #include <unistd.h>
    #include <iostream>
    #include <istream>
    #include "ifdnbuf.h"
    using namespace std;

    int main()
    {
                                    // internally: 30 char buffer
        IFdNStreambuf fds(STDIN_FILENO, 30);

        char buf[80];               // main() reads blocks of 80
                                    // chars
        while (true)
        {
            size_t n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }

25.1.2.3: Seeking positions in `streambuf' objects

When devices support seek operations, classes derived from std::streambuf should override the members streambuf::seekoff and streambuf::seekpos. The class IFdSeek, developed in this section, can be used to read information from devices supporting seek operations. The class IFdSeek was derived from IFdStreambuf, so it uses an input buffer of just one character. The facilities to perform seek operations, which are added to our new class IFdSeek, ensure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class IFdNStreambuf; in that case the arguments used to reset the input buffer must be adapted so that its second and third parameters point beyond the available input buffer. Let's have a look at the characteristics of IFdSeek:

Here is an example of a program using the class IFdSeek. If this program is given its own source file using input redirection, then seeking is supported (and, with the exception of the first line, every line is shown twice):
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>
    using namespace std;

    int main()
    {
        IFdSeek fds(0);
        istream is(&fds);
        string  s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();

            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();

            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }
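The seek-and-reset behavior described above might be implemented along the following lines. This is a self-contained sketch rather than a class actually derived from IFdStreambuf, with error checks mostly omitted; the helper function seekDemo is illustrative:

```cpp
#include <cstdio>
#include <istream>
#include <streambuf>
#include <unistd.h>

class IFdSeek: public std::streambuf
{
    int d_fd;
    char d_buffer[1];               // one-character input buffer

    public:
        IFdSeek(int fd)
        :
            d_fd(fd)
        {
            setg(d_buffer, d_buffer + 1, d_buffer + 1);
        }

    private:
        int underflow() override
        {
            if (gptr() < egptr())
                return static_cast<unsigned char>(*gptr());
            if (read(d_fd, d_buffer, 1) <= 0)
                return EOF;
            setg(d_buffer, d_buffer, d_buffer + 1);
            return static_cast<unsigned char>(*gptr());
        }
        pos_type seekoff(off_type off, std::ios::seekdir way,
                         std::ios::openmode = std::ios::in) override
        {
            off_t pos = lseek(d_fd, off,
                            way == std::ios::beg ? SEEK_SET :
                            way == std::ios::cur ? SEEK_CUR :
                                                   SEEK_END);
            if (pos < 0)
                return pos_type(off_type(-1));
                                        // reset (empty) the input buffer
            setg(d_buffer, d_buffer + 1, d_buffer + 1);
            return pos;
        }
        pos_type seekpos(pos_type pos,
                         std::ios::openmode mode = std::ios::in) override
        {
            return seekoff(pos, std::ios::beg, mode);
        }
};

// illustrative helper: seek in a temporary file wrapped by an IFdSeek object
inline bool seekDemo()
{
    FILE *fp = std::tmpfile();          // provides a seekable descriptor
    std::fwrite("abcdef", 1, 6, fp);
    std::fflush(fp);
    lseek(fileno(fp), 0, SEEK_SET);

    IFdSeek fds(fileno(fp));
    std::istream in(&fds);

    char first, fourth;
    in.get(first);                      // reads 'a'
    in.seekg(3);                        // jump to offset 3
    in.get(fourth);                     // reads 'd'
    std::fclose(fp);
    return first == 'a' && fourth == 'd';
}
```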

25.1.2.4: Multiple `unget' calls in `streambuf' objects

Streambuf classes and classes derived from streambuf should support at least ungetting the last read character. Special care must be taken when series of unget calls must be supported. In this section the construction of a class supporting a configurable number of istream::unget or istream::putback calls is discussed.

Support for multiple (say `n') unget calls is implemented by reserving an initial section of the input buffer, which is gradually filled up to contain the last n characters read. The class is implemented as follows:
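In outline, the reserved-section scheme might be implemented as follows. This is a sketch with error checks omitted; the helper function fdUngetDemo is illustrative. At each refill, underflow first moves the most recently read characters into the reserved area, so they remain available for unget and putback calls:

```cpp
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <istream>
#include <streambuf>
#include <string>
#include <unistd.h>

class FdUnget: public std::streambuf
{
    int d_fd;
    size_t d_reserved;              // guaranteed unget area
    size_t d_bufsize;               // reserved area + read area
    char *d_buffer;

    public:
        FdUnget(int fd, size_t bufsize, size_t unget)
        :
            d_fd(fd),
            d_reserved(unget),
            d_bufsize(std::max(bufsize, unget + 1)),
            d_buffer(new char[d_bufsize])
        {
            char *p = d_buffer + d_reserved;
            setg(p, p, p);          // empty buffer: forces underflow()
        }
        ~FdUnget() override
        {
            delete[] d_buffer;
        }

    private:
        int underflow() override
        {
            // move the last characters read into the reserved area,
            // keeping them available for unget/putback calls
            size_t keep = std::min<size_t>(gptr() - eback(), d_reserved);
            std::memmove(d_buffer + d_reserved - keep, gptr() - keep, keep);

            ssize_t nread = read(d_fd, d_buffer + d_reserved,
                                 d_bufsize - d_reserved);
            if (nread <= 0)
                return EOF;

            setg(d_buffer + d_reserved - keep, d_buffer + d_reserved,
                 d_buffer + d_reserved + nread);
            return static_cast<unsigned char>(*gptr());
        }
};

// illustrative helper: unget one character and read it again
inline std::string fdUngetDemo()
{
    int fd[2];
    pipe(fd);
    write(fd[1], "abc", 3);
    close(fd[1]);

    FdUnget fds(fd[0], 3, 2);       // 3-char buffer, 2-char unget area
    std::istream in(&fds);

    std::string result;
    char c;
    in.get(c);                      // read 'a'
    result += c;
    in.unget();                     // push it back
    while (in.get(c))               // re-read 'a', then 'b' and 'c'
        result += c;
    close(fd[0]);
    return result;
}
```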

An example using FdUnget

The next example program illustrates the use of the class FdUnget. It reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just before reading a character, the program tries to unget at most 6 characters. This is, of course, not possible; but the program nicely ungets as many characters as possible, considering the actual number of characters read:

    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    using namespace std;

    int main()
    {
        FdUnget fds(0, 3, 2);
        istream is(&fds);
        char    c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                    << "\tunget failed at attempt " << (ug + 1) << "\n"
                    << "\trereading: '";

                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }

            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }
            cout << "Next character: " << c << '\n';
        }
    }
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character:

        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
    */

25.1.3: Fixed-sized field extraction from istream objects

Usually, when extracting information from istream objects, operator>>, the standard extraction operator, is perfectly suited for the task, as in most cases the extracted fields are white-space (or otherwise clearly) separated from each other. But this does not hold true in all situations. For example, when a web form is posted to some processing script or program, the receiving program may receive the form fields' values as url-encoded characters: letters and digits are sent unaltered, blanks are sent as + characters, and all other characters are sent as % followed by the character's ascii value represented by its two-digit hexadecimal value.

When decoding url-encoded information, simple hexadecimal extraction won't work, as that extracts as many hexadecimal characters as available, instead of just two. Since the letters a-f and the digits 0-9 are legal hexadecimal characters, a text like My name is `Ed', url-encoded as

    My+name+is+%60Ed%27

results in the extraction of the hexadecimal values 60ed and 27, instead of 60 and 27. The name Ed disappears from view, which is clearly not what we want.

In this case, having seen the %, we could extract 2 characters, put them in an istringstream object, and extract the hexadecimal value from the istringstream object. A bit cumbersome, but doable. Other approaches are possible as well.

The class Fistream (`fixed-sized field istream') defines an istream class supporting both fixed-sized field extractions and blank-delimited extractions (as well as unformatted read calls). The class may be initialized as a wrapper around an existing istream, or it can be initialized using the name of an existing file. The class is derived from istream, allowing all extractions and operations supported by istreams in general. Fistream defines the following data members:

Here is the initial section of Fistream's class interface:
    class Fistream: public std::istream
    {
        std::unique_ptr<std::filebuf> d_filebuf;
        std::streambuf *d_streambuf;
        std::istringstream d_iss;
        size_t d_width;
As stated, Fistream objects can be constructed from either a filename or an existing istream object. The class interface therefore declares two constructors:
            Fistream(std::istream &stream);
            Fistream(char const *name,
                std::ios::openmode mode = std::ios::in);
When an Fistream object is constructed using an existing istream object, the Fistream's istream part simply uses the stream's streambuf object:
Fistream::Fistream(istream &stream)
:
    istream(stream.rdbuf()),
    d_streambuf(rdbuf()),
    d_width(0)
{}
When an Fistream object is constructed using a filename, the istream base initializer receives a new filebuf object to be used as its streambuf. Since the class's data members are not initialized before the class's base class has been constructed, d_filebuf can only be initialized thereafter. By then, the filebuf is only available as rdbuf, returning a streambuf *. However, as it actually points to a filebuf, a static_cast is used to cast the streambuf pointer returned by rdbuf to a filebuf *, so d_filebuf can be initialized:
Fistream::Fistream(char const *name, ios::openmode mode)
:
    istream(new filebuf()),
    d_filebuf(static_cast<filebuf *>(rdbuf())),
    d_streambuf(d_filebuf.get()),
    d_width(0)
{
    d_filebuf->open(name, mode);
}

25.1.3.1: Member functions and example

There is only one additional public member: setField(field const &). This member defines the size of the next field to extract. Its parameter is a reference to a field object: a manipulator class defining the width of the next field.

Since a field & is mentioned in Fistream's interface, field must be declared before Fistream's interface starts. The class field itself is simple and declares Fistream as its friend. It has two data members: d_width, specifying the width of the next field, and d_newWidth, which is set to true if d_width's value should actually be used. If d_newWidth is false, Fistream returns to its standard extraction mode. The class field has two constructors: a default constructor, setting d_newWidth to false, and a second constructor expecting the width of the next field to extract as its argument. Here is the class field:

    class field
    {
        friend class Fistream;
        size_t d_width;
        bool     d_newWidth;

        public:
            field(size_t width);
            field();
    };

    inline field::field(size_t width)
    :
        d_width(width),
        d_newWidth(true)
    {}

    inline field::field()
    :
        d_newWidth(false)
    {}
Since field declares Fistream as its friend, setField may inspect field's members directly.

Time to return to setField. This function expects a reference to a field object, initialized in one of three different ways: field(n) (with n non-zero) switches to fixed-sized extraction of n-character fields; field(0) returns to the standard extraction mode; and field() (the default constructor) keeps the most recently defined field width in effect.

Here is setField's implementation:
std::istream &Fistream::setField(field const &params)
{
    if (params.d_newWidth)                  // new field size requested
        d_width = params.d_width;           // set new width

    if (!d_width)                           // no width?
        rdbuf(d_streambuf);                 // return to the old buffer
    else
        setBuffer();                        // define the extraction buffer

    return *this;
}

The private member setBuffer defines a buffer of d_width + 1 characters and uses read to fill it with d_width characters. The buffer is terminated by a 0-byte, making it an NTBS. This buffer is used to initialize the d_iss member; Fistream's rdbuf member is then used to extract d_iss's data via the Fistream object itself:

void Fistream::setBuffer()
{
    char *buffer = new char[d_width + 1];

    rdbuf(d_streambuf);                         // use istream's buffer to
    buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,
                                                // terminated by a 0-byte
    d_iss.str(buffer);
    delete[] buffer;

    rdbuf(d_iss.rdbuf());                       // switch buffers
}

Although setField could be used to configure Fistream to use or not to use fixed-sized field extraction, using manipulators is probably preferable. To allow field objects to be used as manipulators, an overloaded extraction operator is defined, accepting an istream & and a field const &. Using this extraction operator, statements like

fis >> field(2) >> x >> field(0);

are possible (assuming fis is an Fistream object). Here is the overloaded operator>>, as well as its declaration:

istream &std::operator>>(istream &str, field const &params)
{
    return static_cast<Fistream *>(&str)->setField(params);
}
Declaration:
namespace std
{
    istream &operator>>(istream &str, FBB::field const &params);
}

Finally, an example. The following program uses a Fistream object to url-decode url-encoded information appearing at its standard input:

    #include <iostream>
    #include "fistream.h"   // header declaring Fistream (name assumed)
    using namespace std;

    int main()
    {
        Fistream fis(cin);

        fis >> hex;
        while (true)
        {
            size_t x;
            switch (x = fis.get())
            {
                case '\n':
                    cout << '\n';
                break;
                case '+':
                    cout << ' ';
                break;
                case '%':
                    fis >> field(2) >> x >> field(0);
                // FALLING THROUGH
                default:
                    cout << static_cast<char>(x);
                break;
                case EOF:
                return 0;
            }
        }
    }
    /*
        Generated output after:
            echo My+name+is+%60Ed%27 | a.out

        My name is `Ed'
    */

25.2: The `fork' system call

The fork system call is well known from the C programming language. When a program merely needs to start another program, system can be used, but system requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to use fork.

In this section we investigate how C++ can be used to wrap classes around a complex system call like fork. Much of what follows in this section directly applies to the Unix operating system, and the discussion therefore focuses on that operating system. Other systems usually provide comparable facilities. What follows is closely related to the Template Method design pattern (cf. Gamma et al. (1995), Design Patterns, Addison-Wesley).

When fork is called, the current program is duplicated in memory, thus creating a new process. Following this duplication both processes continue their execution just past the fork system call. The two processes may inspect fork's return value, which differs between the two: in the original process (called the parent process) fork returns the process-id of the newly created process (called the child process), while in the child process fork returns 0.

25.2.1: A basic Fork class

A basic Fork class should hide all bookkeeping details of a system call like fork from its users. The class Fork developed here does just that. The class itself only ensures the proper execution of the fork system call. Normally, fork is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process will do. Fork objects should be able to start their child processes.

Fork's constructor cannot know what actions its child process should perform. Similarly, it cannot know what actions the parent process should perform. For this kind of situation, the template method design pattern was developed. According to Gamma et al., the template method design pattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. [The] Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define an abstract base class already providing the essential steps related to the fork system call, deferring the implementation of other parts of the fork system call to subclasses.

The Fork abstract base class has the following characteristics:
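An interface matching this description might look as follows. This is a sketch: treating both parentProcess and childProcess as pure virtual members is an assumption here, the member fork's implementation corresponds to the one discussed in the next section, and the derived class Exiter is merely an illustration of how waitForChild can be used:

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

class Fork
{
    pid_t d_pid = -1;

    public:
        virtual ~Fork() = default;
        void fork();                // spawn the child process

    protected:
        pid_t pid() const
        {
            return d_pid;           // the child's pid in the parent, 0
        }                           // in the child
        int waitForChild();         // returns the child's exit value

    private:
        virtual void childRedirections()    // optional redirection hooks
        {}
        virtual void parentRedirections()
        {}
        virtual void childProcess() = 0;    // must exit, not return
        virtual void parentProcess() = 0;
};

inline void Fork::fork()            // cf. the next section
{
    if ((d_pid = ::fork()) < 0)
        throw "Fork::fork() failed";

    if (d_pid == 0)                 // the child process
    {
        childRedirections();
        childProcess();
        std::exit(1);               // childProcess() should have exited
    }
    parentRedirections();           // the parent process
    parentProcess();
}

inline int Fork::waitForChild()
{
    int status;
    waitpid(d_pid, &status, 0);
    return WEXITSTATUS(status);
}

// illustrative derived class: the child exits with value 7,
// the parent collects that value
class Exiter: public Fork
{
    int d_status = -1;

    void childProcess() override
    {
        std::exit(7);
    }
    void parentProcess() override
    {
        d_status = waitForChild();
    }

    public:
        int status() const
        {
            return d_status;
        }
};
```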

25.2.2: Parents and Children

The member function fork calls the system function fork. (Caution: since the system function fork is called by a member function having the same name, the :: scope resolution operator must be used to prevent the member function from calling itself recursively.) The function ::fork's return value determines whether parentProcess or childProcess is called. Redirection may also be necessary: Fork::fork's implementation calls childRedirections just before calling childProcess, and parentRedirections just before calling parentProcess:
    #include "fork.ih"

    void Fork::fork()
    {
        if ((d_pid = ::fork()) < 0)
            throw "Fork::fork() failed";

        if (d_pid == 0)                 // childprocess has pid == 0
        {
            childRedirections();
            childProcess();
            exit(1);                    // we shouldn't come here:
        }                               // childProcess() should exit

        parentRedirections();
        parentProcess();
    }
In fork.cc the class's internal header file fork.ih is included. This header file takes care of the inclusion of the necessary system header files, as well as the inclusion of fork.h itself. Its contents are:
    #include "fork.h"
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

Child processes should not return: once they have completed their tasks, they should terminate. This happens automatically when the child process performs a call to a member of the exec... family, but if the child itself remains active, then it must make sure that it terminates properly. A child process normally uses exit to terminate itself, but note that exit prevents the activation of destructors of objects defined at the same or more superficial nesting levels than the level at which exit is called. Destructors of globally defined objects are activated when exit is used. When using exit to terminate childProcess, it should either itself call a support member function defining all nested objects it needs, or it should define all its objects in a compound statement (e.g., a try block), calling exit beyond that compound statement.

Parent processes should normally wait for their children to complete. Terminating child processes inform their parents that they are about to terminate by sending a signal that should be caught by their parents. If child processes terminate and their parent processes do not catch those signals then such child processes remain visible as so-called zombie processes.

If parent processes must wait for their children to complete, they may call the member waitForChild. This member returns the exit status of a child process to its parent.

There exists a situation where the child process continues to live while the parent dies. This is a fairly natural event: parents tend to die before their children do. A program designed this way is called a daemon: the parent process dies and the child continues to run as a child of the basic init process. Again, when the child eventually dies a signal is sent to its `step-parent' init. This does not create a zombie, as init catches the termination signals of all its (step-)children. The construction of a daemon process is very simple, given the availability of the class Fork (cf. section 25.2.4).

25.2.3: Redirection revisited

Earlier, in section 6.6.2 streams were redirected using the ios::rdbuf member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus implementing redirection at the level of the programming language itself.

This may be fine within the context of a C++ program, but once we leave that context the redirection terminates. The operating system does not know about streambuf objects. This situation is encountered, e.g., when a program uses a system call to start a subprogram. The example program at the end of this section uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")

to echo a well-known line of text. Since echo writes its information to the standard output, its text would have appeared in the program's redirected file if the operating system recognized C++'s redirection.

But redirection doesn't happen. Instead, hello world still appears at the program's standard output and the redirected file is left untouched. To write hello world to the redirected file redirection must be realized at the operating system level. Some operating systems (e.g., Unix and friends) provide system calls like dup and dup2 to accomplish this. Examples of the use of these system calls are given in section 25.2.5.

Here is the example of the failing redirection at the system level following C++ redirection using streambuf redirection:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    using namespace std;

    int main()
    {
        ofstream of("outfile");

        streambuf *buf = cout.rdbuf(of.rdbuf());
        cout << "To the of stream\n";
        system("echo hello world");
        cout << "To the of stream\n";
        cout.rdbuf(buf);
    }
    /*
        Generated output: on the file `outfile'

        To the of stream
        To the of stream

        On standard output:

        hello world
    */
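Using dup and dup2 the redirection can be realized at the operating system level, so that it also affects subprograms. The following sketch (the helper function redirectDemo and its filename parameter are illustrative) redirects the standard output at the descriptor level, runs echo through system, restores the standard output, and reads back what echo wrote:

```cpp
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <unistd.h>

// illustrative helper: OS-level redirection, which subprograms do honor
inline std::string redirectDemo(char const *fname)
{
    int fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int saved = dup(STDOUT_FILENO);     // remember the real stdout
    std::fflush(stdout);
    dup2(fd, STDOUT_FILENO);            // redirect at the descriptor level
    close(fd);

    std::system("echo hello world");    // now writes to `fname'

    dup2(saved, STDOUT_FILENO);         // restore the original stdout
    close(saved);

    std::ifstream in(fname);            // read back what echo wrote
    std::string line;
    std::getline(in, line);
    return line;
}
```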

25.2.4: The `Daemon' program

Applications exist in which the only purpose of fork is to start a child process. The parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.

Although the next example can easily be constructed as a plain C program, it was included in the C++ Annotations because it is so closely related to the current discussion of the Fork class. I thought about adding a daemon member to that class, but eventually decided against it because the construction of a daemon program is very simple and requires no features other than those currently offered by the class Fork. Here is an example illustrating the construction of such a daemon program. Its child process doesn't call exit but throws 0, which is caught by the catch clause of the child's main function. Doing this ensures that any objects defined by the child process are properly destroyed:

    #include <iostream>
    #include <unistd.h>
    #include "fork.h"

    class Daemon: public Fork
    {
        virtual void parentProcess()        // the parent does nothing.
        {}
        virtual void childProcess()         // actions by the child
        {
            sleep(3);
                                            // just a message...
            std::cout << "Hello from the child process\n";
            throw 0;                        // The child process ends
        }
    };

    int main()
    try
    {
        Daemon{}.fork();
    }
    catch(...)
    {}

    /*
        Generated output:
    The next command prompt, then after 3 seconds:
    Hello from the child process
    */

25.2.5: The class `Pipe'

Redirection at the system level requires the use of file descriptors, created by the pipe system call. When two processes want to communicate using such file descriptors, each process uses one of the two descriptors created by pipe. Though basically simple, errors may easily creep in: the roles of the file descriptors available to the two processes (child and parent) may easily get mixed up. To prevent bookkeeping errors, the bookkeeping may be properly set up once, to be hidden thereafter inside a class like the Pipe class developed here. Let's have a look at its characteristics (before using functions like pipe and dup the compiler must have read the <unistd.h> header file). Now that redirection can be configured easily using one or more Pipe objects, we'll use Fork and Pipe in various example programs.
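In outline, such a Pipe class might look as follows. The member names mirror the intended use: a process either only reads or only writes a pipe, or redirects one of its standard streams to it. This is a sketch with error checks mostly omitted, and the helper function pipeDemo is illustrative:

```cpp
#include <string>
#include <sys/wait.h>
#include <unistd.h>

class Pipe
{
    enum RW { READ, WRITE };
    int d_fd[2];

    public:
        Pipe()
        {
            if (pipe(d_fd) != 0)
                throw "Pipe: pipe() failed";
        }
        int readOnly()              // this process only reads the pipe
        {
            close(d_fd[WRITE]);
            return d_fd[READ];
        }
        int writeOnly()             // this process only writes the pipe
        {
            close(d_fd[READ]);
            return d_fd[WRITE];
        }
        void readFrom(int fd)       // redirect fd to the pipe's read end
        {
            dup2(d_fd[READ], fd);
            close(d_fd[READ]);
            close(d_fd[WRITE]);
        }
        void writtenBy(int fd)      // redirect fd to the pipe's write end
        {
            dup2(d_fd[WRITE], fd);
            close(d_fd[READ]);
            close(d_fd[WRITE]);
        }
};

// illustrative helper: a child writes `hi' into the pipe, the parent reads
inline std::string pipeDemo()
{
    Pipe p;
    if (fork() == 0)                // the child process
    {
        int out = p.writeOnly();
        write(out, "hi", 2);
        _exit(0);
    }
    int in = p.readOnly();          // the parent process
    char buf[3] = {};
    read(in, buf, 2);
    close(in);
    wait(nullptr);
    return buf;
}
```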

25.2.6: The class `ParentSlurp'

The class ParentSlurp, derived from Fork, starts a child process executing a stand-alone program (like /bin/ls). The (standard) output of the executed program is not shown on the screen but is read by the parent process.

For demonstration purposes the parent process writes the lines it receives to its standard output stream, prepending line numbers to them. It is attractive to redirect the parent's standard input stream, allowing the parent to read the child's output from its std::cin input stream. Therefore, the only pipe in the program is used as an input pipe for the parent and as an output pipe for the child.

The class ParentSlurp has the following characteristics:
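Stripped of the Fork and Pipe classes, the slurping scheme itself boils down to the following sketch, here using /bin/echo instead of /bin/ls to keep the output predictable (the helper function slurp is illustrative):

```cpp
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// illustrative helper: run /bin/echo in a child process, slurping its
// output through a pipe serving as the child's standard output
inline std::string slurp(char const *word)
{
    int fd[2];
    pipe(fd);

    if (fork() == 0)                    // child: stdout becomes the
    {                                   // pipe's write end
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execl("/bin/echo", "echo", word, static_cast<char *>(nullptr));
        _exit(1);                       // only reached if exec fails
    }

    close(fd[1]);                       // parent: read the pipe until EOF
    std::string result;
    char buf[128];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof(buf))) > 0)
        result.append(buf, n);
    close(fd[0]);
    wait(nullptr);
    return result;
}
```

In ParentSlurp the same descriptor juggling is performed by the Pipe members writtenBy and readFrom, and the parent reads via std::cin rather than via a raw read loop.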

The following program simply constructs a ParentSlurp object, and calls its fork() member. Its output consists of a numbered list of files in the directory where the program is started. Note that the program also needs the fork.o, pipe.o and waitforchild.o object files (see earlier sources):
    int main()
    {
        ParentSlurp{}.fork();
    }
    /*
        Generated Output (example only, actually obtained output may differ):

        1: a.out
        2: bitand.h
        3: bitfunctional
        4: bitnot.h
        5: daemon.cc
        6: fdinseek.cc
        7: fdinseek.h
        ...
    */

25.2.7: Communicating with multiple children

The next step up the ladder is the construction of a child-process monitor. Here, the parent process is responsible for all its child processes, but it must also read their standard output. The user enters information at the standard input of the parent process, using a simple command language. If a child process hasn't received text for some time it complains by sending a message to the parent process. Those messages are simply passed on to the user by copying them to the standard output stream.

A problem with programs like our monitor is that they allow asynchronous input from multiple sources. Input may appear at the standard input as well as at the input-sides of pipes. Also, multiple output channels are used. To handle situations like these, the select system call was developed.

25.2.7.1: The class `Selector': interface

The select system call was developed to handle asynchronous I/O multiplexing; it is used to handle, e.g., input appearing simultaneously at a set of file descriptors.

The select function is rather complex, and its full discussion is beyond the C++ Annotations' scope. By encapsulating select in a class Selector, hiding its details and offering an intuitively attractive interface, its use is simplified. The Selector class has these features:

25.2.7.2: The class `Selector': implementation

Selector's member functions perform the following tasks. The class's remaining two members are support members, and should not be used by non-member functions; therefore they are declared in the class's private section:

25.2.7.3: The class `Monitor': interface

The monitor program uses a Monitor object doing most of the work. The class Monitor's public interface only offers a default constructor and one member, run, to perform its tasks. All other member functions are located in the class's private section.

Monitor defines the private enum Commands, symbolically listing the various commands its input language supports, as well as several data members. Among the data members are a Selector object and a map using child order numbers as its keys and pointers to Child objects (see section 25.2.7.7) as its values. Furthermore, Monitor has a static array member s_handler[], storing pointers to member functions handling the user commands.

A destructor should be implemented as well, but its implementation is left as an exercise to the reader. Here is Monitor's interface, including the interface of the nested class Find that is used to create a function object:

    class Monitor
    {
        enum Commands
        {
            UNKNOWN,
            START,
            EXIT,
            STOP,
            TEXT,
            sizeofCommands
        };

        using MapIntChild = std::map<int, std::shared_ptr<Child>>;

        friend class Find;
        class Find
        {
            int     d_nr;
            public:
                Find(int nr);
                bool operator()(MapIntChild::value_type &vt) const;
        };

        Selector    d_selector;
        int         d_nr;
        MapIntChild d_child;

        static void (Monitor::*s_handler[])(int, std::string const &);
        static int s_initialize;

        public:
            enum Done
            {};

            Monitor();
            void run();

        private:
            static void killChild(MapIntChild::value_type it);
            static int initialize();

            Commands    next(int *value, std::string *line);
            void    processInput();
            void    processChild(int fd);

            void    createNewChild(int, std::string const &);
            void    exiting(int = 0, std::string const &msg = std::string{});
            void    sendChild(int value, std::string const &line);
            void    stopChild(int value, std::string const &);
            void    unknown(int, std::string const &);
    };

Since there's only one non-class type data member, the class's constructor is a very simple function which could be implemented inline:

    inline Monitor::Monitor()
    :
        d_nr(0)
    {}

25.2.7.4: The class `Monitor': s_handler

The array s_handler[], storing pointers to member functions, needs to be initialized as well. This can be accomplished in several ways:

25.2.7.5: The class `Monitor': the member `run'

Monitor's core activities are performed by run, which performs the following tasks. Here are run's implementation and s_initialize's definition:
    #include "monitor.ih"

    int Monitor::s_initialize = Monitor::initialize();

    void Monitor::run()
    {
        d_selector.addReadFd(STDIN_FILENO);

        while (true)
        {
            cout << "? " << flush;
            try
            {
                d_selector.wait();

                int fd;
                while ((fd = d_selector.readFd()) != -1)
                {
                    if (fd == STDIN_FILENO)
                        processInput();
                    else
                        processChild(fd);
                }
                cout << "NEXT ...\n";
            }
            catch (char const *msg)
            {
                exiting(1, msg);
            }
        }
    }

The member function processInput reads the commands entered by the user using the program's standard input stream. The member itself is rather simple. It calls next to obtain the next command entered by the user, and then calls the corresponding function using the matching element of the s_handler[] array. Here are the members processInput and next:

    void Monitor::processInput()
    {
        string line;
        int    value;
        Commands cmd = next(&value, &line);
        (this->*s_handler[cmd])(value, line);
    }
    Monitor::Commands Monitor::next(int *value, string *line)
    {
        if (!getline(cin, *line))
            exiting(1, "Monitor::next(): reading cin failed");

        if (*line == "start")
            return START;

        if (*line == "exit" || *line == "quit")
        {
            *value = 0;
            return EXIT;
        }

        if (line->find("stop") == 0)
        {
            istringstream istr(line->substr(4));
            istr >> *value;
            return !istr ? UNKNOWN : STOP;
        }

        istringstream istr(line->c_str());
        istr >> *value;
        if (istr)
        {
            getline(istr, *line);
            return TEXT;
        }

        return UNKNOWN;
    }

All other input sensed by d_selector originates from the child processes. Because d_selector's readFd member returns the corresponding input file descriptor, this descriptor can be passed to processChild. Using an IFdStreambuf (see section 25.1.2.1), its information is read from an input stream. The communication protocol used here is rather basic: for every line of input sent to a child, the child replies by sending back exactly one line of text. This line is then read by processChild:

    void Monitor::processChild(int fd)
    {
        IFdStreambuf ifdbuf(fd);
        istream istr(&ifdbuf);
        string line;

        getline(istr, line);
        cout << d_child[fd]->pid() << ": " << line << '\n';
    }
The construction d_child[fd]->pid() used in the above source deserves some special attention. Monitor defines the data member map<int, shared_ptr<Child>> d_child. This map contains the child's order number as its key, and a (shared) pointer to the Child object as its value. A shared pointer is used here, rather than a Child object, since we want to use the facilities offered by the map, but don't want to copy a Child object time and again.

25.2.7.6: The class `Monitor': example

Now that run's implementation has been covered, we'll concentrate on the various commands users may enter. The program's main function is simple and needs no further comment:
    int main()
    try
    {
        Monitor{}.run();
    }
    catch (int exitValue)
    {
        return exitValue;
    }

25.2.7.7: The class `Child'

When the Monitor object starts a child process, it creates an object of the class Child. The Child class is derived from the class Fork, allowing it to operate as a daemon (as discussed in the previous section). Since Child operates as a daemon, its parentProcess member can be implemented as an empty function; its childProcess member has a non-empty implementation. Here are the characteristics of the class Child:

25.3: Adding binary operators to classes

As we've seen in section 11.6, binary operators expecting const & arguments can be implemented using a member that performs the operation while merely offering the basic exception guarantee. That function can in turn be implemented using the binary compound assignment member. The following examples illustrate this approach for a fictitious class Binary:
    class Binary
    {
        public:
            Binary();
            Binary(int value);
                // copy and move constructors are available by default, or
                // they can be explicitly declared and implemented.

            Binary &operator+=(Binary const &other) &;    // see the text
            Binary &&operator+=(Binary const &other) &&;

        private:
            void add(Binary const &rhs);

        friend Binary operator+(Binary const &lhs, Binary const &rhs);
        friend Binary operator+(Binary &&lhs, Binary const &rhs);
    };

Eventually, the implementation of binary operators depends on the availability of the member implementing the basic binary operation, modifying the object calling that member (i.e., void Binary::add(Binary const &) in the example).

Since template functions are not instantiated before they are actually used we can call non-existing functions from template functions that are never instantiated. If such a template function is never instantiated, nothing happens; if it is (accidentally) instantiated, then the compiler generates an error message, complaining about the missing function.

This allows us to implement all binary operators, movable and non-movable, as templates. In the following subsections we develop the class template Binops, providing binary operators. A complete implementation of a class Derived illustrating how addition and insertion operators can be added to a class is provided in the file annotations/yo/concrete/examples/binopclasses.cc in the C++ Annotations' source archive.

25.3.1: Merely using operators

In section 11.6 addition operators are implemented in terms of a support member add. This is less attractive when developing function templates, as add is a private member, requiring us to provide friend declarations for all function templates so they may access the private add member.

At the end of section 11.6 we saw that add's implementation can be provided by operator+=(Class const &rhs) &&. This operator may thereupon be used when implementing the remaining addition operators:

    inline Binary &Binary::operator+=(Binary const &rhs) &
    {
        return *this = Binary{*this} += rhs;
    }

    Binary operator+(Binary &&lhs, Binary const &rhs)
    {
        return std::move(lhs) += rhs;
    }

    Binary operator+(Binary const &lhs, Binary const &rhs)
    {
        return Binary{lhs} += rhs;
    }

In this implementation add is no longer required. The plain binary operators are free functions, which supposedly can easily be converted to function templates. E.g.,

    template <typename Binary>
    Binary operator+(Binary const &lhs, Binary const &rhs)
    {
        return Binary{lhs} += rhs;
    }

25.3.1.1: To namespace or not to namespace?

When using the function template Binary operator+(Binary const &lhs, Binary const &rhs), however, we may encounter a subtle and unexpected complication. Consider the following program. When run, it displays the value 12, rather than 1:
    enum Values
    {
        ZERO,
        ONE
    };
    
    template <typename Tp>
    Tp operator+(Tp const &lhs, Tp const &rhs)
    {
        return static_cast<Tp>(12);
    }
    
    int main()
    {
        cout << (ZERO + ONE);       // shows 12
    }

This complication can be avoided by defining the operators in their own namespace, but then all classes using the binary operator also have to be defined in that namespace, which is not a very attractive restriction. Fortunately, there is a better alternative: using the CRTP (cf. section 22.12).

25.3.2: The CRTP and defining operator function templates

When deriving classes from the class template Binops using the CRTP, the operators are defined for arguments of type Binops<Derived>: a base class receiving the derived class as its template argument.

Thus the class Binops as well as the additional operators are defined, expecting Binops<Derived> type of arguments:

    template <class Derived>
    struct Binops
    {
        Derived &operator+=(Derived const &rhs) &;
    };

    template <typename Derived>
    Derived operator+(Binops<Derived> const &lhs, Derived const &rhs)
    {
        return Derived{static_cast<Derived const &>(lhs) } += rhs;
    }
    // analogous implementation for Binops<Derived> &&lhs

This way, a class that derives from Binops, and that provides an operator+= member which is bound to an rvalue reference object, suddenly also provides all other binary addition operators:

    class Derived: public Binops<Derived>
    {
        ...
        public:
            ...
            Derived &&operator+=(Derived const &rhs) &&;
    };

All but one....

The operator that's not available is the compound addition operator bound to an lvalue reference. As its name is identical to that of a member declared by the class Derived, the base class operator is hidden, and hence not automatically visible at the user level.

Although this problem can simply be solved by providing the class Derived with a using Binops<Derived>::operator+= declaration, it is not a very attractive solution, as separate using declarations have to be provided for each binary operator that is implemented in the class Derived.

But a much more attractive solution exists. A beautiful out-of-the-box solution, completely avoiding the hidden base class operator, was proposed by Wiebe-Marten Wijnja. Wiebe-Marten conjectured that operator+=, bound to an lvalue reference, could equally well be defined as a free function. In that case no inheritance is involved and therefore no function hiding occurs. Consequently, the using declaration can be avoided.

The implementation of this free operator+= function looks like this:

    template <class Derived>
    Derived &operator+=(Binops<Derived> &lhs, Derived const &rhs) 
    {
        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };
        tmp.swap(static_cast<Derived &>(lhs));
        return static_cast<Derived &>(lhs);
    }

The flexibility of this design can be further augmented once we realize that the right-hand side operand doesn't have to be a Derived class object. Consider operator<<: oftentimes shifts are bit-shifts, using a size_t to specify the number of bits to shift. In fact, the type of the right-hand side operand can completely be generalized by defining a second template type parameter, which is used to specify the right-hand side's operand type. It's up to the Derived class to specify the argument type of its operator+= (or any other binary compound operator), whereafter the compiler will deduce the types of the right-hand side operands for the remaining binary operators. Here is the final implementation of the free operator+= function:

    template <class Derived, typename Rhs>
    Derived &operator+=(Binops<Derived> &lhs, Rhs const &rhs) 
    {
        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };
        tmp.swap(static_cast<Derived &>(lhs));
        return static_cast<Derived &>(lhs);
    }

25.3.3: Insertion and extraction

Classes also frequently define overloaded insertion and extraction operators. Since there are no `compound insertion operators' the design shown so far cannot be used when overloading these operators. Instead, using standardized member function signatures is advocated: void insert(std::ostream &out) const to insert an object into an ostream, and void extract(std::istream &in) to extract an object from an istream. As these functions are only used by, respectively, the insertion and extraction operators, they can be declared in the Derived class's private interface.

Instead of declaring the insertion and extraction operators friends of the class Derived, a single friend Binops<Derived> declaration suffices. This allows Binops<Derived> to define private, inline iWrap and eWrap members, merely calling, respectively, Derived's insert and extract members:
    template <typename Derived>
    inline void Binops<Derived>::iWrap(std::ostream &out) const
    {
        static_cast<Derived const &>(*this).insert(out);
    }

Binops<Derived> then declares the insertion and extraction operators as its friends, allowing these operators to call, respectively, iWrap and eWrap. Note that the software engineer designing the class Derived only has to provide a friend Binops<Derived> declaration. Here is the implementation of the overloaded insertion operator:

    template <typename Derived>
    std::ostream &operator<<(std::ostream &out, Binops<Derived> const &obj)
    {
        obj.iWrap(out);
        return out;
    }

This completes the coverage of the essentials of a class template Binops potentially offering binary operators and insertion/extraction operators for any class derived from Binops. Finally, as noted at the beginning of this section, a complete implementation of a class offering addition and insertion operators is provided in the file annotations/yo/concrete/examples/binopclasses.cc in the C++ Annotations' source archive.

25.4: Distinguishing lvalues from rvalues with operator[]()

A problem with operator[] is that it can't distinguish between its use as an lvalue and as an rvalue. It is a familiar misconception to think that
    Type const &operator[](size_t index) const

is used as rvalue (as the object isn't modified), and that

    Type &operator[](size_t index)

is used as lvalue (as the returned value can be modified).

The compiler, however, distinguishes between the two operators only by the const-status of the object for which operator[] is called: with const objects the former operator is used, with non-const objects the latter is always used, irrespective of whether it is used as lvalue or rvalue.

Being able to distinguish between lvalues and rvalues can be very useful. Consider the situation where a class supporting operator[] stores data of a type that is very hard to copy. With data like that reference counting (e.g., using shared_ptrs) is probably used to prevent needless copying.

As long as operator[] is used as rvalue there's no need to copy the data, but the information must be copied if it is used as lvalue.

The Proxy Design Pattern (cf. Gamma et al. (1995)) can be used to distinguish between lvalues and rvalues. With the Proxy Design Pattern an object of another class (the Proxy class) is used to act as a stand in for the `real thing'. The proxy class offers functionality that cannot be offered by the data themselves, like distinguishing between its use as lvalue or rvalue. A proxy class can be used in many situations where access to the real data cannot or should not be directly provided. In this regard iterator types are examples of proxy classes as they create a layer between the real data and the software using the data. Proxy classes could also dereference pointers in a class storing its data by pointers.

In this section we concentrate on the distinction between using operator[] as lvalue and rvalue. Let's assume we have a class Lines storing lines from a file. Its constructor expects a reference to the stream from which the lines are read, and it offers a non-const operator[] that can be used as lvalue or rvalue (the const version of operator[] is omitted, as it causes no confusion: it is always used as rvalue):

    class Lines
    {
        std::vector<std::string> d_line;

        public:
            Lines(std::istream &in);
            std::string &operator[](size_t idx);
    };

To distinguish between lvalues and rvalues we must find distinguishing characteristics of lvalues and rvalues that we can exploit. Such distinguishing characteristics are operator= (which is always used as lvalue) and the conversion operator (which is always used as rvalue). Rather than having operator[] return a string & we can let it return a Proxy object that is able to distinguish between its use as lvalue and rvalue.

The class Proxy thus needs operator=(string const &other) (acting as lvalue) and operator std::string const &() const (acting as rvalue). Do we need more operators? The std::string class also offers operator+=, so we should probably implement that operator as well. Plain characters can also be assigned to string objects (even using their numeric values). As string objects cannot be constructed from plain characters, promotion cannot be used with operator=(string const &other) if the right-hand side argument is a character. Implementing operator=(char value) could therefore also be considered. These additional operators are left out of the current implementation, but `real life' proxy classes should consider implementing them as well.

Another subtlety is that Proxy's operator std::string const &() const is not used by ostream's insertion operator or istream's extraction operator, as these operators are implemented as templates that do not recognize our Proxy class type. So when stream insertion and extraction are required (they probably are), Proxy must be given its own overloaded insertion and extraction operators. Here is an implementation of the overloaded insertion operator, inserting the object for which Proxy is a stand-in:

    inline std::ostream &operator<<(std::ostream &out, Lines::Proxy const &proxy)
    {
        return out << static_cast<std::string const &>(proxy);
    }

There's no need for any code (except Lines) to create or copy Proxy objects. Proxy's constructor should therefore be made private, and Proxy can declare Lines to be its friend. In fact, Proxy is intimately related to Lines and can be defined as a nested class. In the revised Lines class operator[] no longer returns a string but instead a Proxy is returned. Here is the revised Lines class, including its nested Proxy class:

    class Lines
    {
        std::vector<std::string> d_line;

        public:
            class Proxy;
            Proxy operator[](size_t idx);
            class Proxy
            {
                friend Proxy Lines::operator[](size_t idx);
                std::string &d_str;
                Proxy(std::string &str);
                public:
                    std::string &operator=(std::string const &rhs);
                    operator std::string const &() const;
            };
            Lines(std::istream &in);
    };

Proxy's members are very lightweight and can usually be implemented inline:

    inline Lines::Proxy::Proxy(std::string &str)
    :
        d_str(str)
    {}
    inline std::string &Lines::Proxy::operator=(std::string const &rhs)
    {
        return d_str = rhs;
    }
    inline Lines::Proxy::operator std::string const &() const
    {
        return d_str;
    }

The member Lines::operator[] can also be implemented inline: it merely returns a Proxy object initialized with the string associated with index idx.

Now that the class Proxy has been developed it can be used in a program. Here is an example using the Proxy object as lvalue or rvalue. On the surface Lines objects won't behave differently from Lines objects using the original implementation, but by adding an identifying cout statement to Proxy's members it can be shown that operator[] behaves differently when used as lvalue or as rvalue:

    int main()
    {
        ifstream in("lines.cc");
        Lines lines(in);

        string s = lines[0];        // rvalue use
        lines[0] = s;               // lvalue use
        cout << lines[0] << '\n';   // rvalue use
        lines[0] = "hello world";   // lvalue use
        cout << lines[0] << '\n';   // rvalue use
    }

25.5: Implementing a `reverse_iterator'

In section 22.14.1 the construction of iterators and reverse iterators was discussed. In that section the iterator was constructed as an inner class in a class derived from a vector of pointers to strings.

An object of this nested iterator class handles the dereferencing of the pointers stored in the vector. This allowed us to sort the strings pointed to by the vector's elements rather than the pointers.

A drawback of this is that the class implementing the iterator is closely tied to the derived class as the iterator class was implemented as a nested class. What if we would like to provide any class derived from a container class storing pointers with an iterator handling pointer-dereferencing?

In this section a variant of the earlier (nested class) approach is discussed. Here the iterator class is defined as a class template, not only parameterizing the data type to which the container's elements point but also the container's iterator type itself. Once again, we concentrate on developing a RandomIterator as it is the most complex iterator type.

Our class is named RandomPtrIterator, indicating that it is a random access iterator operating on pointer values. The class template defines three template type parameters: Class, the class whose begin and end members return RandomPtrIterator objects; BaseIterator, the iterator type of the underlying container; and Type, the type to which the container's elements point.

RandomPtrIterator has one private data member, a BaseIterator. Here is the class interface and the constructor's implementation:

    #include <iterator>
    #include <compare>

    template <typename Class, typename BaseIterator, typename Type>
    struct RandomPtrIterator;

    #define PtrIterator RandomPtrIterator<Class, BaseIterator, Type>
    #define PtrIteratorValue RandomPtrIterator<Class, BaseIterator, value_type>

    template <typename Class, typename BaseIterator, typename Type>
    bool operator==(PtrIterator const &lhs, PtrIterator const &rhs);

    template <typename Class, typename BaseIterator, typename Type>
    auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs);

    template <typename Class, typename BaseIterator, typename Type>
    int operator-(PtrIterator const &lhs, PtrIterator const &rhs);

    template <typename Class, typename BaseIterator, typename Type>
    struct RandomPtrIterator
    {
        using iterator_category = std::random_access_iterator_tag;
        using difference_type   = std::ptrdiff_t;
        using value_type        = Type;
        using pointer           = value_type *;
        using reference         = value_type &;


        friend PtrIterator Class::begin();
        friend PtrIterator Class::end();

        friend bool operator==<>(RandomPtrIterator const &lhs,
                                 RandomPtrIterator const &rhs);
        friend auto operator<=><>(RandomPtrIterator const &lhs,
                                  RandomPtrIterator const &rhs);
        friend int operator-<>(RandomPtrIterator const &lhs,
                               RandomPtrIterator const &rhs);
        private:
            BaseIterator d_current;

        public:
            int operator-(RandomPtrIterator const &rhs) const;
            RandomPtrIterator operator+(int step) const;
            value_type &operator*() const;
            RandomPtrIterator &operator--();
            RandomPtrIterator operator--(int);
            RandomPtrIterator &operator++();
            RandomPtrIterator operator++(int);
            RandomPtrIterator operator-(int step) const;
            RandomPtrIterator &operator-=(int step);
            RandomPtrIterator &operator+=(int step);
            value_type *operator->() const;

        private:
            RandomPtrIterator(BaseIterator const &current);
    };

    template <typename Class, typename BaseIterator, typename value_type>
    PtrIteratorValue::RandomPtrIterator(BaseIterator const &current)
    :
        d_current(current)
    {}
Looking at its friend declarations, we see that the members begin and end of a class Class, returning a RandomPtrIterator object for the types Class, BaseIterator and Type, are granted access to RandomPtrIterator's private constructor. That is exactly what we want: Class's begin and end members are declared as bound friends.

All RandomPtrIterator's remaining members are public. Since RandomPtrIterator is just a generalization of the nested class iterator developed in section 22.14.1, re-implementing the required member functions is easy, and only requires us to change iterator into RandomPtrIterator and std::string into Type. For example, the comparison operator, defined as operator< in the class iterator, is now implemented as operator<=>:

    template <typename Class, typename BaseIterator, typename Type>
    inline auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs)
    {
        return **lhs.d_current <=> **rhs.d_current;
    }

Some additional examples: operator*, defined in the class iterator as

    inline std::string &StringPtr::iterator::operator*() const
    {
        return **d_current;
    }

is now implemented as:
    template <typename Class, typename BaseIterator, typename value_type>
    value_type &PtrIteratorValue::operator*() const
    {
        return **d_current;
    }
The pre- and postfix increment operators are now implemented as:
    template <typename Class, typename BaseIterator, typename value_type>
    PtrIteratorValue &PtrIteratorValue::operator++()
    {
        ++d_current;
        return *this;
    }
    template <typename Class, typename BaseIterator, typename value_type>
    PtrIteratorValue PtrIteratorValue::operator++(int)
    {
        return RandomPtrIterator(d_current++);
    }
Remaining members can be implemented accordingly; their actual implementations are left as exercises to the reader (or can be obtained from the cplusplus.yo.zip archive, of course).

Re-implementing the class StringPtr developed in section 22.14.1 is not difficult either. Apart from including the header file defining the class template RandomPtrIterator, it only requires a single modification. Its iterator using-declaration must now be associated with a RandomPtrIterator. Here is the full class interface and the class's inline member definitions:

    #ifndef INCLUDED_STRINGPTR_H_
    #define INCLUDED_STRINGPTR_H_

    #include <vector>
    #include <string>
    #include "iterator.h"

    class StringPtr: public std::vector<std::string *>
    {
        public:
            using iterator =
                    RandomPtrIterator
                    <
                        StringPtr,
                        std::vector<std::string *>::iterator,
                        std::string
                    >;

            using reverse_iterator = std::reverse_iterator<iterator>;

            iterator begin();
            iterator end();
            reverse_iterator rbegin();
            reverse_iterator rend();
    };

    inline StringPtr::iterator StringPtr::begin()
    {
        return iterator(this->std::vector<std::string *>::begin());
    }
    inline StringPtr::iterator StringPtr::end()
    {
        return iterator(this->std::vector<std::string *>::end());
    }
    inline StringPtr::reverse_iterator StringPtr::rbegin()
    {
        return reverse_iterator(end());
    }
    inline StringPtr::reverse_iterator StringPtr::rend()
    {
        return reverse_iterator(begin());
    }
    #endif

Including StringPtr's modified header file into the program given in section 22.14.2 results in a program behaving identically to its earlier version. In this case StringPtr::begin and StringPtr::end return iterator objects constructed from a template definition.

25.6: Using `bisonc++' and `flexc++'

The example discussed below digs into the peculiarities of parser- and scanner generators that produce C++ sources. Once the input for a program exceeds a certain level of complexity, it becomes attractive to use scanner- and parser-generators to create the code that performs the actual input recognition.

The examples in this and subsequent sections assume that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex, are described in several books, e.g., in O'Reilly's book `lex & yacc'.

Scanner- and parser generators are also available as free software. Both bison and flex are usually part of software distributions or they can be obtained from ftp://prep.ai.mit.edu/pub/non-gnu. Flex creates a C++ class when %option c++ is specified.

For parser generators the program bison is available. In the early 90's Alain Coetmeur (coetmeur@icdc.fr) created a C++ variant (bison++) creating a parser class. Although the bison++ program produces code that can be used in C++ programs it also shows many characteristics that are more suggestive of a C context than a C++ context. In January 2005 I rewrote parts of Alain's bison++ program, resulting in the original version of the program bisonc++. Then, in May 2005 a complete rewrite of the bisonc++ parser generator was completed (version number 0.98). Current versions of bisonc++ can be downloaded from https://fbb-git.gitlab.io/bisoncpp/. Binary versions for various architectures are available as, e.g., Debian package (including bisonc++'s documentation).

Bisonc++ creates a cleaner parser class than bison++. In particular, it derives the parser class from a base-class, containing the parser's token- and type-definitions as well as all member functions which should not be (re)defined by the programmer. As a result of this approach, the generated parser class is very small, declaring only members that are actually defined by the programmer (as well as some other members, generated by bisonc++ itself, implementing the parser's parse() member). One member that is not implemented by default is lex, producing the next lexical token. When the directive %scanner (see section 25.6.2.1) is used, bisonc++ produces a standard implementation for this member; otherwise it must be implemented by the programmer.

In early 2012 the program flexc++ http://flexcpp.org/ reached its initial release. Like bisonc++ it is part of the Debian linux distribution.

Jean-Paul van Oosten (j.p.van.oosten@rug.nl) and Richard Berendsen (richardberendsen@xs4all.nl) started the flexc++ project in 2008 and the final program was completed by Jean-Paul and me between 2010 and 2012.

These sections of the C++ Annotations focus on bisonc++ as the parser generator and flexc++ as the lexical scanner generator. Previous releases of the C++ Annotations used flex as the scanner generator.

Using flexc++ and bisonc++, class-based scanners and parsers are generated. The advantage of this approach is that the interfaces to the scanner and the parser tend to become cleaner than when no class interfaces are used. Furthermore, classes allow us to get rid of most, if not all, global variables, making it easy to use multiple parsers in one program.

Below two example programs are developed. The first example only uses flexc++. The generated scanner monitors the production of a file from several parts. That example focuses on the lexical scanner and on switching files while churning through the information. The second example uses both flexc++ and bisonc++ to generate a scanner and a parser transforming standard arithmetic expressions to their postfix notations, commonly used in code generated by compilers and in HP-calculators. In the second example the emphasis is mainly on bisonc++ and on composing a scanner object inside a generated parser.

25.6.1: Using `flexc++' to create a scanner

The lexical scanner developed in this section is used to monitor the production of a file from several subfiles. The setup is as follows: the input-language defines #include directives, followed by a text string specifying the file (path) which should be included at the location of the #include.

In order to avoid complexities irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the angle brackets should be available at the location indicated by filepath. If the file is not available, the program terminates after issuing an error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program defines a maximum nesting depth. Once this maximum is exceeded, the program terminates after issuing an error message. In that case, the filename stack indicating where which file was included is printed.

An additional feature of the program is that (standard C++) comment-lines are ignored. Include-directives in comment-lines are also ignored.

The program is created in five major steps:

25.6.1.1: The derived class `Scanner'

The function matching the regular expression rules (lex) is a member of the class Scanner. Since Scanner is derived from ScannerBase, it has access to all of ScannerBase's protected members that execute the lexical scanner's regular expression matching algorithm.

Looking at the regular expressions themselves, notice that we need rules to recognize comment, #include directives, and all remaining characters. This all is fairly standard practice. When an #include directive is sensed, the directive is parsed by the scanner. This too is common practice. Our lexical scanner performs the following tasks:

25.6.1.2: The lexical scanner specification file

The lexical scanner specification file is organized comparably to the one used for flex in C contexts. However, in C++ contexts, flexc++ creates a class Scanner, rather than just a scanner function.

Flexc++'s specification file consists of two sections:

25.6.1.3: Implementing `Scanner'

The class Scanner is generated once by flexc++. This class has access to several members defined by its base class ScannerBase. Some of these members have public access rights and can be used by code external to the class Scanner. These members are extensively documented in the flexc++(1) man-page, and the reader is referred to this man-page for further information.

Our scanner performs the following tasks:

The #include statements in the input allow the scanner to distill the name of the file where the scanning process must continue. This filename is stored in the data member d_nextSource, and the member switchSource handles the switch to the next source. Nothing else is required: pushing and popping input files is handled by the scanner's members pushStream and popStream, provided by flexc++. Scanner's interface therefore only needs one additional function declaration: switchSource.

Switching streams is handled as follows: once the scanner has extracted a filename from an #include directive, a switch to another file is realized by switchSource. This member calls pushStream, defined by flexc++, to stack the current input stream and to switch to the stream whose name is stored in d_nextSource. This also ends the include mini-scanner, so to return the scanner to its default scanning mode begin(StartCondition__::INITIAL) is called. Here is its source:

#include "scanner.ih"

void Scanner::switchSource()
{
    pushStream(d_nextSource);
    begin(StartCondition__::INITIAL);
}
The member pushStream, defined by flexc++, handles all necessary checks, throwing an exception if the file could not be opened or if too many files are stacked.

The member performing the lexical scan is defined by flexc++ in Scanner::lex, and this member can be called by code to process the tokens returned by the scanner.

25.6.1.4: Using a `Scanner' object

The program using our Scanner is very simple. It expects a filename indicating where to start the scanning process.

The program first checks the number of arguments. If at least one argument was given, then that argument is passed to Scanner's constructor, together with a second argument "-", indicating that the output should go to the standard output stream.

If the program receives more than one argument, debug output, extensively documenting the lexical scanner's actions, is written to the standard output stream as well.

Next the Scanner's lex member is called. If anything fails, a std::exception is thrown, which is caught by main's try-block's catch clause. Here is the program's source:

#include "lexer.ih"

int main(int argc, char **argv)
try
{
    if (argc == 1)
    {
        cerr << "Filename argument required\n";
        return 1;
    }

    Scanner scanner(argv[1], "-");

    scanner.setDebug(argc > 2);

    return scanner.lex();
}
catch (exception const &exc)
{
    cerr << exc.what() << '\n';
    return 1;
}

25.6.1.5: Building the program

The final program is constructed in two steps. These steps are given for a Unix system on which flexc++ and the GNU C++ compiler g++ have been installed. Flexc++ can be downloaded from https://fbb-git.gitlab.io/flexcpp/; it requires the bobcat library, which can be downloaded from http://fbb-git.gitlab.io/bobcat/.

25.6.2: Using `bisonc++' and `flexc++'

Once an input language exceeds a certain level of complexity, a parser is often used to control the complexity of the language. In this case, a parser generator can be used to generate the code verifying the input's grammatical correctness. The lexical scanner (preferably composed into the parser) provides chunks of the input, called tokens. The parser then processes the series of tokens generated by the lexical scanner.

The starting point when developing programs that use both parsers and scanners is the grammar. The grammar defines a set of tokens that can be returned by the lexical scanner (called the scanner below).

Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and by the scanner are normally not specified literally in the grammar rules or lexical regular expressions, but are implemented in member functions, which are called from the parser's rules or are associated with the scanner's regular expressions.

In the previous section we've seen an example of a C++ class generated by flexc++. In the current section we concentrate on the parser. The parser can be generated from a grammar specification file, processed by the program bisonc++. The grammar specification file required by bisonc++ is similar to the file processed by bison (or bison++, bisonc++'s predecessor, written in the early nineties by Alain Coetmeur).

In this section a program is developed converting infix expressions, where binary operators are written between their operands, to postfix expressions, where operators are written behind their operands. Also, the unary operator - is converted from its prefix notation to a postfix form. The unary + operator is ignored as it requires no further actions. In essence our little calculator is a micro compiler, transforming numeric expressions into assembly-like instructions.

Our calculator recognizes a rather basic set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in bison-like grammar specifications. That's all. The purpose of this section is, after all, to illustrate the construction of a C++ program that uses both a parser and a lexical scanner, rather than to construct a full-fledged calculator.

In the coming sections we'll develop the grammar specification for bisonc++. Then, the regular expressions for the scanner are specified. Following that, the final program is constructed.

25.6.2.1: The `bisonc++' specification file

The grammar specification file required by bisonc++ is comparable to the specification file required by bison. Differences are related to the class nature of the resulting parser. Our calculator distinguishes real numbers from integers, and supports a basic set of arithmetic operators.

Bisonc++ should be used as follows:

The bisonc++ specification file has two sections:

Readers familiar with bison may note that there is no header section anymore. Header sections are used by bison to provide the declarations that allow the compiler to compile the C function generated by bison. In C++, declarations are part of, or already used by, class definitions. Therefore, a parser generator generating a C++ class and some of its member functions no longer requires a header section.

The declaration section

The declaration section contains several sets of declarations, among which definitions of all the tokens used in the grammar and the priorities and associativities of the mathematical operators. Moreover, several new and important specifications can be used here as well. Those relevant to our current example and only available in bisonc++ are discussed here. The reader is referred to bisonc++'s man-page for a full description. An example of a %union declaration is:
    %union
    {
        int     i;
        double  d;
    };

In pre-C++11 code a union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a string cannot be a member of the union. A string *, however, is a possible union member. It might also be possible to use unrestricted unions (cf. section 9.9), having class type objects as fields.

As an aside: the scanner does not have to know about such a union. It can simply pass its scanned text to the parser through its matched member function. For example, using a statement like

    $$.i = A2x(d_scanner.matched());

matched text is converted to a value of an appropriate type.

Tokens and non-terminals can be associated with union fields. This is strongly advised, as it allows the compiler to check for type correctness, preventing type mismatches. At the same time, the bison-specific variables $$, $1, $2, etc. may then be used, rather than full field specifications (like $$.i). A non-terminal or a token may be associated with a union field using the <fieldname> specification. E.g.,

    %token <i> INT          // token association (deprecated, see below)
           <d> DOUBLE
    %type  <i> intExpr      // non-terminal association

In the example developed here, both the tokens and the non-terminals can be associated with a union field. However, as noted before, the scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser, knowing what the input is all about, may then convert strings like "123" to an integer value. Consequently, the association of a union field and a token is discouraged. Below, while describing the grammar's rules, this is further illustrated.

In the %union discussion the %token and %type specifications should be noted. They are used to specify the tokens (terminal symbols) that can be returned by the scanner, and to specify the return types of non-terminals. Apart from %token, the token declarators %left, %right, and %nonassoc can be used to specify the associativity of operators: the tokens mentioned with these declarators are interpreted as operator tokens, associating in the indicated direction. The precedence of the operators is defined by their order of appearance: the first specification has the lowest priority. To overrule a precedence in a particular context, %prec can be used. As all this is standard bisonc++ practice, it isn't further elaborated here; the documentation provided with bisonc++'s distribution should be consulted for further reference.

Here is the specification of the calculator's declaration section:

%filenames parser
%scanner ../scanner/scanner.h

%union {
    int i;
    double d;
};

%token  INT DOUBLE

%type   <i> intExpr
%type   <d> doubleExpr

%left   '+'
%left   '*'
%right  UnaryMinus
In the declaration section %type specifiers are used, associating the intExpr rule's value (see the next section) to the i-field of the semantic-value union, and associating doubleExpr's value to the d-field. This approach, admittedly, is rather complex, as expression rules must be included for each of the supported union types. Alternatives are definitely possible, and involve the use of polymorphic semantic values, covered in detail in the Bisonc++ user guide.

The grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. There are quite a few rules, but they illustrate various features offered by bisonc++. In particular, note that no action block requires more than a single line of code. This keeps the grammar simple, enhancing its readability and understandability. Even the rule defining the parser's proper termination (the empty line in the line rule) uses a single member function called done. The implementation of that function is simple, but it is worthwhile noting that it calls Parser::ACCEPT, showing that ACCEPT can be called indirectly from a production rule's action block. Here are the grammar's production rules:
    lines:
        lines
        line
    |
        line
    ;

    line:
        intExpr
        '\n'
        {
            display($1);
        }
    |
        doubleExpr
        '\n'
        {
            display($1);
        }
    |
        '\n'
        {
            done();
        }
    |
        error
        '\n'
        {
            reset();
        }
    ;

    intExpr:
        intExpr '*' intExpr
        {
            $$ = exec('*', $1, $3);
        }
    |
        intExpr '+' intExpr
        {
            $$ = exec('+', $1, $3);
        }
    |
        '(' intExpr ')'
        {
            $$ = $2;
        }
    |
        '-' intExpr         %prec UnaryMinus
        {
            $$ = neg($2);
        }
    |
        INT
        {
            $$ = convert<int>();
        }
    ;

    doubleExpr:
        doubleExpr '*' doubleExpr
        {
            $$ = exec('*', $1, $3);
        }
    |
        doubleExpr '*' intExpr
        {
            $$ = exec('*', $1, d($3));
        }
    |
        intExpr '*' doubleExpr
        {
            $$ = exec('*', d($1), $3);
        }
    |
        doubleExpr '+' doubleExpr
        {
            $$ = exec('+', $1, $3);
        }
    |
        doubleExpr '+' intExpr
        {
            $$ = exec('+', $1, d($3));
        }
    |
        intExpr '+' doubleExpr
        {
            $$ = exec('+', d($1), $3);
        }
    |
        '(' doubleExpr ')'
        {
            $$ = $2;
        }
    |
        '-' doubleExpr         %prec UnaryMinus
        {
            $$ = neg($2);
        }
    |
        DOUBLE
        {
            $$ = convert<double>();
        }
    ;
This grammar is used to implement a simple calculator in which integer and real values can be negated, added, and multiplied, and in which the standard priority rules can be overruled by parentheses. The grammar shows the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr to integer values. Precedence and type associations are defined in the parser's declaration section.

The Parser's header file

Several class members called from the grammar are defined as member templates. Bisonc++ generates multiple files, among which the file defining the parser's class. Functions called from the production rule's action blocks are usually member functions of the parser. These member functions must be declared and defined. Once bisonc++ has generated the header file defining the parser's class, that header file isn't automatically rewritten, allowing the programmer to add new members to the parser class whenever required. Here is `parser.h' as used in our little calculator:
#ifndef Parser_h_included
#define Parser_h_included

#include <iostream>
#include <sstream>
#include <bobcat/a2x>

#include "parserbase.h"
#include "../scanner/scanner.h"

#undef Parser
class Parser: public ParserBase
{
    std::ostringstream d_rpn;
    // $insert scannerobject
    Scanner d_scanner;

    public:
        int parse();

    private:
        template <typename Type>
        Type exec(char c, Type left, Type right);

        template <typename Type>
        Type neg(Type op);

        template <typename Type>
        Type convert();

        void display(int x);
        void display(double x);
        void done() const;
        void reset();
        void error(char const *msg);
        int lex();
        void print();

        static double d(int i);

    // support functions for parse():

        void executeAction(int d_ruleNr);
        void errorRecovery();
        int lookup(bool recovery);
        void nextToken();
        void print__();
};

inline double Parser::d(int i)
{
    return i;
}

template <typename Type>
Type Parser::exec(char c, Type left, Type right)
{
    d_rpn << " " << c << " ";
    return c == '*' ? left * right : left + right;
}

template <typename Type>
Type Parser::neg(Type op)
{
    d_rpn << " n ";
    return -op;
}

template <typename Type>
Type Parser::convert()
{
    Type ret = FBB::A2x(d_scanner.matched());
    d_rpn << " " << ret << " ";
    return ret;
}

inline void Parser::error(char const *msg)
{
    std::cerr << msg << '\n';
}

inline int Parser::lex()
{
    return d_scanner.lex();
}

inline void Parser::print()
{}

#endif

25.6.2.2: The `flexc++' specification file

The flexc++ specification file used by the calculator is simple: blanks are ignored, single characters are returned, and numeric values are returned as either Parser::INT or Parser::DOUBLE tokens.

The flexc++ directive %interactive is provided since the calculator is a program actively interacting with its human user.

Here is the complete flexc++ specification file:

%interactive
%filenames scanner

%%

[ \t]                       // ignored

[0-9]+                      return Parser::INT;

"."[0-9]*                   |
[0-9]+("."[0-9]*)?          return Parser::DOUBLE;

.|\n                        return matched()[0];

25.6.2.3: Building the program

The calculator is built using bisonc++ and flexc++. Here is the implementation of the calculator's main function:
#include "parser/parser.h"

using namespace std;

int main()
{
    Parser parser;

    cout << "Enter (nested) expressions containing ints, doubles, *, + and "
            "unary -\n"
            "operators. Enter an empty line to stop.\n";

    return parser.parse();
}

The parser's files parse.cc and parserbase.h are generated by the command:

    bisonc++ grammar

The file parser.h is created only once, allowing the developer to add members to the Parser class once the need for them arises.

The program flexc++ is used to create a lexical scanner:

    flexc++ lexer

On Unix systems a command like

    g++ -Wall -o calc *.cc -lbobcat -s

can be used to compile and link the source of the main program and the sources produced by the scanner and parser generators. The example uses the A2x class, which is part of the bobcat library (cf. section 25.6.1.5); the bobcat library is available on systems offering either bisonc++ or flexc++. Bisonc++ can be downloaded from

http://fbb-git.gitlab.io/bisoncpp/.