Chapter 7: Classes

The C programming language offers two methods for structuring data of different types. The C struct holds data members of various types, and the C union also defines data members of various types. However, a union's data members all occupy the same location in memory, and the programmer decides which one to use.

In this chapter classes are introduced. A class is a kind of struct, but its content is by default inaccessible to the outside world, whereas the content of a C++ struct is by default accessible to the outside world. In C++ structs find little use: they are mainly used to aggregate data within the context of classes or to define elaborate return values. Often a C++ struct merely contains plain old data (POD, cf. section 9.10). In C++ the class is the main data structuring device, by default enforcing two core concepts of current-day software engineering: data hiding and encapsulation (cf. sections 3.2.1 and 7.1.1).

The union is another data structuring device the language offers. The traditional C union is still available, but C++ also offers unrestricted unions: unions whose data fields may be of class types. The C++ Annotations covers these unrestricted unions in section 9.9, after having introduced several other new concepts of C++.

C++ extends the C struct and union concepts by allowing the definition of member functions (introduced in this chapter) within these data types. Member functions are functions that can only be used with objects of these data types or within the scope of these data types. Some of these member functions are special in that they are always, usually automatically, called when an object starts its life (the so-called constructor) or ends its life (the so-called destructor). These and other types of member functions, as well as the design and construction of, and philosophy behind, classes are introduced in this chapter.

Step by step we construct a class Person, which could be used in a database application to store a person's name, address, phone number and mass.

Let's start by creating a class Person right away. From the outset, it is important to distinguish between the class interface and its implementation. A class may loosely be defined as `a set of data and all the functions operating on those data'. This definition is refined later, but for now it is sufficient to get us started.

A class interface is a definition, defining the organization of objects of that class. Normally a definition results in memory reservation. E.g., when defining int variable the compiler ensures that some memory is reserved in the final program to store variable's values. A class definition, however, is a definition for which the compiler sets no memory aside. Yet it does follow the one definition rule: in C++ entities may be defined only once. As a class definition does not imply that memory is being reserved, the term class interface is preferred.

Class interfaces are normally contained in a class header file, e.g., person.h. We'll start our class Person interface here (cf. section 7.7 for an explanation of the const keywords following some of the class's member functions):

    #include <string>

    class Person
    {
        std::string d_name;         // name of person
        std::string d_address;      // address field
        std::string d_phone;        // telephone number
        size_t      d_mass;         // the mass in kg.

        public:                     // member functions
            void setName(std::string const &name);
            void setAddress(std::string const &address);
            void setPhone(std::string const &phone);
            void setMass(size_t mass);

            std::string const &name()    const;
            std::string const &address() const;
            std::string const &phone()   const;
            size_t mass()                const;
    };
The member functions that are declared in the interface must still be implemented. The implementation of these members is properly called their definition.

In addition to member functions classes also commonly define the data that are manipulated by those member functions. These data are called the data members. In Person they are d_name, d_address, d_phone and d_mass. Data members should be given private access rights. Since the class uses private access rights by default they are usually simply listed at the top of the class interface.

All communication between the outer world and the class data is routed through the class's member functions. Data members may receive new values (e.g., using setName) or they may be retrieved for inspection (e.g., using name). Functions merely returning values stored inside the object, not allowing the caller to modify these internally stored values, are called accessors.

Syntactically there is only a marginal difference between a class and a struct. Classes by default define private members, structs define public members. Conceptually, though, there are differences. In C++ structs are used in the way they are used in C: to aggregate data, which are all freely accessible. Classes, on the other hand, hide their data from access by the outside world (which is aptly called data hiding) and offer member functions to define the communication between the outer world and the class's data members.

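Following Lakos (Lakos, J., 2001) Large-Scale C++ Software Design (Addison-Wesley), I suggest the following setup of class interfaces: all data members have private access rights and are listed at the top of the class interface; the names of data members start with d_, followed by a name suggesting their meaning; member functions modifying the object's data (manipulators) start with set (e.g., setName); accessors are simply named after the information they return (e.g., name rather than getName).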

Style conventions usually take a long time to develop. There is nothing obligatory about them, though. I suggest that readers who have compelling reasons not to follow the above style conventions use their own. All others are strongly advised to adopt the above style conventions.

Finally, recall from section 3.1.2 that the directive

    using namespace std;

is required in most (if not all) examples of source code. As explained in sections 7.11 and 7.11.1, the using directive should follow the preprocessor directive(s) including the header files, using a setup like the following:

    #include <iostream>
    #include "person.h"

    using namespace std;

    int main()
    {
        // ... statements using Person objects
    }

7.1: The constructor

C++ classes usually contain two special categories of member functions which are essential to the proper working of classes. These categories are the constructors and the destructor. The destructor's primary task is to return memory allocated by an object to the common pool when an object goes `out of scope'. Allocation of memory is discussed in chapter 9, and an in-depth coverage of destructors is therefore postponed until we reach that chapter. In the current chapter the emphasis is on the class's internal organization and on its constructors.

Constructors are recognized by their names which are equal to their class names. Constructors do not specify return values, not even void. E.g., the class Person may define a constructor Person::Person(). The C++ run-time system ensures that the constructor of a class is called when a variable of the class is defined. It is possible to define a class lacking any constructor. In that case the compiler defines a default constructor that is called when an object of that class is defined. What actually happens in that case depends on the data members that are defined by that class (cf. section 7.3.1).

Objects may be defined locally or globally, but in C++ most objects are defined locally. Globally defined objects are hardly ever required and are generally discouraged.

When a function defines a local object, that object's constructor is called every time the function is called. The object's constructor is activated at the point where the object is defined (a subtlety is that an object may be defined implicitly as, e.g., a temporary variable in an expression).

When an object is defined as a global object (an object with static storage duration, defined outside of any function) it is constructed when the program starts. In this case its constructor is called even before the function main starts. Example:

    #include <iostream>
    using namespace std;

    class Demo
    {
        public:
            Demo();
    };

    Demo::Demo()
    {
        cout << "Demo constructor called\n";
    }

    Demo d;

    int main()
    {}

    /*
        Generated output:
    Demo constructor called
    */
The program contains one global object of the class Demo with main having an empty body. Nonetheless, the program produces some output generated by the constructor of the globally defined Demo object.

Constructors have a very important and well-defined role: they must ensure that all the class's data members have sensible or at least well-defined values once the object has been constructed. We'll get back to this important task shortly. The default constructor can be called without arguments (usually it has no parameters). It is defined by the compiler unless another constructor is defined or its definition is suppressed (cf. section 7.6). If a default constructor is required in addition to another constructor, then the default constructor must explicitly be defined as well. C++ provides special syntax to realize that without much effort, which is also covered by section 7.6.

7.1.1: A first application

Our example class Person has three string data members and a size_t d_mass data member. Access to these data members is controlled by interface functions.

Whenever an object is defined the class's constructor(s) ensure that its data members are given `sensible' values. Thus, objects never suffer from uninitialized values. Data members may be given new values, but that should never be directly allowed. It is a core principle (called data hiding) of good class design that its data members are private. The modification of data members is therefore fully controlled by member functions and thus, indirectly, by the class-designer. The class encapsulates all actions performed on its data members and due to this encapsulation the class object may assume the `responsibility' for its own data-integrity. Here is a minimal definition of Person's manipulating members:

    #include "person.h"                 // given earlier
    using namespace std;

    void Person::setName(string const &name)
    {
        d_name = name;
    }
    void Person::setAddress(string const &address)
    {
        d_address = address;
    }
    void Person::setPhone(string const &phone)
    {
        d_phone = phone;
    }
    void Person::setMass(size_t mass)
    {
        d_mass = mass;
    }
It's a minimal definition in that no checks are performed. But it should be clear that checks are easy to implement. E.g., to ensure that a phone number only contains digits one could define:
    void Person::setPhone(string const &phone)
    {
        if (phone.empty())
            d_phone = " - not available -";
        else if (phone.find_first_not_of("0123456789") == string::npos)
            d_phone = phone;
        else
            cout << "A phone number may only contain digits\n";
    }

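Note the double negation in this implementation: the phone number is accepted if no character that is not a digit is found. Double negations are hard to read. A member function bool hasOnly, encapsulating the test, improves setPhone's readability (being a member, hasOnly must also be declared in Person's class interface, preferably in its private section):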

    bool Person::hasOnly(char const *characters, string const &object)
    {
                        // object only contains 'characters'
        return object.find_first_not_of(characters) == string::npos;
    }

and setPhone becomes:

    void Person::setPhone(string const &phone)
    {
        if (phone.empty())
            d_phone = " - not available -";
        else if (hasOnly("0123456789", phone))
            d_phone = phone;
        else
            cout << "A phone number may only contain digits\n";
    }

Since hasOnly is an encapsulated member function we can ensure that it's only used with non-empty string objects, so hasOnly itself doesn't have to check for that.

Access to the data members is controlled by accessor members. Accessors ensure that data members cannot suffer from uncontrolled modifications. Since accessors conceptually do not modify the object's data (but only retrieve the data) these member functions are given the predicate const. They are called const member functions, which, as they are guaranteed not to modify their object's data, are available to both modifiable and constant objects (cf. section 7.7).

To prevent backdoors we must also make sure that data members cannot be modified through an accessor's return value. For values of built-in primitive types that's easy, as they are usually returned by value, i.e., as copies of the values stored in the object. But since objects may be fairly large, copying them is usually avoided by returning them by reference. Returning a data member by modifiable reference, however, creates a backdoor, as shown in the following example, with the abuse it allows shown below the function definition. Note that such an accessor cannot even be a const member function: inside a const member function d_name itself is const, and cannot be returned as a modifiable reference.

    string &Person::name()
    {
        return d_name;
    }

    Person somebody;
    somebody.setName("Nemo");

    somebody.name() = "Eve";    // Oops, backdoor changing the name

To prevent the backdoor objects are returned as const references from accessors. Here are the implementations of Person's accessors:

    #include "person.h"                 // given earlier
    using namespace std;

    string const &Person::name() const
    {
        return d_name;
    }
    string const &Person::address() const
    {
        return d_address;
    }
    string const &Person::phone() const
    {
        return d_phone;
    }
    size_t Person::mass() const
    {
        return d_mass;
    }

The Person class interface remains the starting point for the class design: its member functions define what can be asked of a Person object. In the end the implementation of its members merely is a technicality allowing Person objects to do their jobs.

The next example shows how the class Person may be used. An object is initialized and passed to a function printperson(), printing the person's data. Note the reference (&) in printperson's parameter list: only a reference to an existing Person object is passed to the function, rather than a copy of the complete object. That printperson does not modify its argument is evident from its parameter being declared const.

    #include <iostream>
    #include "person.h"                 // given earlier
    using namespace std;

    void printperson(Person const &p)
    {
        cout << "Name    : " << p.name()     << "\n"
                "Address : " << p.address()  << "\n"
                "Phone   : " << p.phone()    << "\n"
                "Mass    : " << p.mass()     << '\n';
    }

    int main()
    {
        Person p;

        p.setName("Linus Torvalds");
        p.setAddress("E-mail: Torvalds@cs.helsinki.fi");
        p.setPhone("");
        p.setMass(75);           // kg.

        printperson(p);
    }
    /*
        Produced output:

    Name    : Linus Torvalds
    Address : E-mail: Torvalds@cs.helsinki.fi
    Phone   :  - not available -
    Mass    : 75
    */

7.1.2: Constructors: with and without arguments

The class Person's constructor so far has not received any parameters. C++ allows constructors to be defined with or without parameter lists. The arguments are supplied when an object is defined.

For the class Person a constructor expecting three strings and a size_t might be useful, representing, respectively, the person's name, address, phone number and mass. This constructor can be implemented like this (but see also section 7.3.1):

    Person::Person(string const &name, string const &address,
                   string const &phone, size_t mass)
    {
        d_name = name;
        d_address = address;
        setPhone(phone);
        d_mass = mass;
    }

It must of course also be declared in the class interface:

    class Person
    {
        // data members (not altered)

        public:
            Person(std::string const &name, std::string const &address,
                   std::string const &phone, size_t mass);

            // rest of the class interface (not altered)
    };

Now that this constructor has been declared, the default constructor must explicitly be declared as well if we still want to be able to construct a plain Person object without any specific initial values for its data members. The class Person would thus support two constructors, and the part declaring the constructors now becomes:

    class Person
    {
        // data members
        public:
            Person();
            Person(std::string const &name, std::string const &address,
                   std::string const &phone, size_t mass);

            // additional members
    };

In this case, the default constructor doesn't have to do very much, as it doesn't have to initialize the string data members of the Person object: these data members are objects themselves, and are initialized to empty strings by their own default constructors. However, there is also a size_t data member. That member is a variable of a built-in type; such variables do not have constructors and so are not initialized automatically. Therefore, unless the d_mass data member is explicitly initialized, its value is random (formally: indeterminate) for local Person objects, and 0 for global and static Person objects.

The 0-value might not be too bad, but normally we don't want a random value for our data members. So, even the default constructor has a job to do: initializing the data members which are not initialized to sensible values automatically. Its implementation can be:
    Person::Person()
    {
        d_mass = 0;
    }

Using constructors with and without arguments is illustrated next. The object karel is initialized by the constructor defining a non-empty parameter list while the default constructor is used for the anon object. When constructing objects using constructors requiring arguments you are advised to surround the arguments by curly braces. Parentheses can often also be used, and sometimes even have to be used (cf. section 12.4.2), but mindlessly using parentheses instead of curly braces may easily result in unexpected problems (cf. section 7.2). Hence the advice to prefer curly braces rather than parentheses. Here's the example showing two constructor-calls:

    int main()
    {
        Person karel{ "Karel", "Rietveldlaan 37", "542 6044", 70 };
        Person anon;
    }

The two Person objects are constructed when main reaches their definitions. As they are local objects, they only live for as long as main is active.

If Person objects must be definable using other arguments, corresponding constructors must be added to Person's interface. Apart from overloading class constructors it is also possible to provide constructors with default argument values. These default arguments must be specified with the constructor declarations in the class interface, like so:

    class Person
    {
        public:
            Person(std::string const &name,
                   std::string const &address = "--unknown--",
                   std::string const &phone   = "--unknown--",
                   size_t mass = 0);

    };

Often, constructors have highly similar implementations. This results from the fact that the constructor's parameters are often defined for convenience: a constructor not requiring a phone number but requiring a mass cannot be defined using default arguments, since phone is not the constructor's last parameter. Consequently a separate constructor is required whose parameter list lacks phone. However, this doesn't necessarily mean that constructors must duplicate their code, as constructors may call each other (called constructor delegation). Constructor delegation is illustrated in section 7.4.1 below.

7.1.2.1: The order of construction

The possibility to pass arguments to constructors allows us to monitor the construction order of objects during program execution. This is illustrated by the next program using a class Test. The program defines a global Test object and three local Test objects: two in main and one in func. The order of construction is as expected: first global, then main's first local object, then func's local object, and then, finally, main's second local object:
    #include <iostream>
    #include <string>
    using namespace std;

    class Test
    {
        public:
            Test(string const &name);   // constructor with an argument
    };

    Test::Test(string const &name)
    {
        cout << "Test object " << name << " created" << '\n';
    }

    Test globaltest("global");

    void func()
    {
        Test functest("func");
    }

    int main()
    {
        Test first{ "main first" };
        func();
        Test second{ "main second" };
    }
/*
    Generated output:
Test object global created
Test object main first created
Test object func created
Test object main second created
*/

7.2: Ambiguity resolution

Calling constructors using parentheses may lead to surprises. Assume the following class interface is available:
    class Data
    {
        public:
            Data();
            Data(int one);
            Data(int one, int two);

            void display();
    };

The intention is to define two objects of the class Data, using, respectively, the first and second constructors, while using parentheses in the object definitions. Your code looks like this (and compiles correctly):

    #include "data.h"
    int main(int argc, char **argv)
    {
        Data d1();
        Data d2(argc);
    }

Now it's time to make some good use of the Data objects. Let's add two statements to main:

        d1.display();
        d2.display();

But, surprise, the compiler complains about the first of these two:

error: request for member 'display' in 'd1', which is of non-class type 'Data()'

What's going on here? First of all, notice the data type the compiler refers to: Data(), rather than Data. What are those () doing there?

Before answering that question, let's broaden our story somewhat. We know that somewhere in a library a factory function dataFactory exists. A factory function creates and returns an object of a certain type. This dataFactory function returns a Data object, constructed using Data's default constructor. Hence, dataFactory needs no arguments. We want to use dataFactory in our program, but must declare the function. So we add the declaration to main, as that's the only location where dataFactory will be used. It's a function, not requiring arguments, returning a Data object:

        Data dataFactory();

This, however, looks remarkably similar to our d1 object definition:

        Data d1();

We found the source of our problem: Data d1() apparently is not the definition of a d1 object, but the declaration of a function, returning a Data object. So, what's happening here and how should we define a Data object using Data's default constructor?

First: what's happening here is that the compiler, when confronted with Data d1(), actually had a choice. It could either define a Data object, or declare a function. It declares a function.

In fact, we're encountering an ambiguity in C++'s grammar here, which is solved, according to the language's standard, by always letting a declaration prevail over a definition. We'll encounter more situations where this ambiguity occurs later on in this section.

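Second: there are several ways we can solve this ambiguity the way we want it to be solved. To define an object using its default constructor, either merely mention the object (as in `Data d1;', just like `int x;'), use curly braces (`Data d1{};'), or use the assignment operator and an anonymous default-constructed Data object (`Data d1 = Data();').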

7.2.1: Types `Data' vs. `Data()'

Data() in the above context defines a default constructed anonymous Data object. This takes us back to the compiler error. According to the compiler, our original d1 apparently was not of type Data, but of type Data(). So what's that?

Let's first have a look at our second constructor. It expects an int. We would like to define another Data object, using the second constructor, but want to pass the default int value to the constructor, using int(). We know this defines a default int value, as cout << int() << '\n' nicely displays 0, and int x = int() initializes x to 0. So we define `Data di(int())' in main.

Not good: again the compiler complains when we try to use di. After `di.display()' the compiler tells us:

error: request for member 'display' in 'di', which is of non-class type 'Data(int (*)())'

Oops, again not as expected.... Didn't we pass 0? Why the sudden pointer? It's that same `use a declaration when possible' strategy again. The notation Type() not only represents the default value of type Type: in a parameter list it also declares an anonymous parameter of type `function not expecting arguments and returning a Type value'. As parameters of function types are converted to pointers to functions, di's parameter becomes an anonymous pointer to such a function. You can verify this by defining `int (*ip)() = nullptr', and passing ip as argument to di: di(ip) compiles fine.

So why doesn't the error occur when inserting int() or assigning int() to int x? In these latter cases nothing is declared. Rather, `cout' and `int x =' require expressions determining values, which is provided by int()'s `natural' interpretation. But with `Data di(int())' the compiler again has a choice, and (by design) it chooses a declaration because the declaration takes priority. Now int()'s interpretation as an anonymous pointer is available and therefore used.

Likewise, if int x has been defined, `Data b1(int(x))' declares b1 as a function expecting an int (the parentheses around x are superfluous, so int(x) declares a parameter x of type int), while `Data b2((int)x)' defines b2 as a Data object, using the constructor expecting a single int value.

Again, to use default entities, values or objects, prefer {} over (): Data di{ int{} } defines di of type Data, calling the Data(int) constructor with int's default value 0.

7.2.2: Superfluous parentheses

Let's play some more. At some point in our program we defined int b. Then, in a compound statement we need to construct an anonymous Data object, initialized using b, followed by displaying b:
    int b = 18;
    {
        Data(b);
        cout << b;
    }

About that cout statement the compiler tells us (I modified the error message to reveal its meaning):

error: cannot bind `std::ostream & << Data const &'

Here we didn't insert int b but Data b. Had we omitted the compound statement, the compiler would have complained about a doubly defined b entity, as Data(b) simply means Data b, a Data object constructed by default. The compiler may omit superfluous parentheses when parsing a definition or declaration.

Of course, the question now becomes how a temporary Data object, initialized with int b, can be defined. Remember that the compiler may remove superfluous parentheses. So, what we need to do is to pass an int to the anonymous Data object, without using the int's name.

Values and types make big differences. Consider the following definitions:

    Data (*d4)(int);    // 1
    Data (*d5)(3);      // 2

Definition 1 should cause no problems: it's a pointer to a function, expecting an int, returning a Data object. Hence, d4 is a pointer variable.

Definition 2 is slightly more complex. Yes, it's a pointer. But it has nothing to do with a function. So what's that argument list containing 3 doing there? Well, it's not an argument list. It's an initialization that looks like an argument list. Remember that variables can be initialized using the assignment operator, using parentheses, or using curly braces. So instead of `(3)' we could have written `= 3' or `{3}'. Let's pick the first alternative, resulting in:

    Data (*d5) = 3;

Now we get to `play compiler' again. Removing some superfluous parentheses we get:

    Data *d5 = 3;

It's a pointer to a Data object, initialized to 3. This is semantically incorrect, but that's only clear after the syntactical analysis. If I had initially written

     Data (*d5)(&d1);      // 2

the fun resulting from contrasting int and 3 would most likely have been spoiled.

7.2.3: Existing types

Once a type name has been defined it also prevails over identifiers representing variables, if the compiler is given a choice. This, too, can result in interesting constructions.

Assume a function process expecting an int exists in a library. We want to use this function to process some int data values. So in main process is declared and called:

    int process(int Data);
    process(argc);

No problems here. But unfortunately we once decided to `beautify' our code, by throwing in some superfluous parentheses, like so:

    int process(int (Data));
    process(argc);

Now we're in trouble: process(argc) no longer compiles. This is caused by the compiler's rule to interpret a parenthesized type name in a parameter list as a type rather than as a parameter name. Data is the name of the class Data, and analogous to int() being interpreted as int (*)(), the parameter int (Data) is parsed as int (*)(Data): a pointer to a function, expecting a Data object, returning an int. An int argument cannot be converted to such a pointer.

Here is another example. When, instead of declaring

    int process(int Data[10]);

we declare, e.g., to emphasize the fact that an array is passed to process:

    int process(int (Data[10]));

the process function does not expect a pointer to int values, but a pointer to a function expecting a pointer to Data elements, returning an int.

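To summarize the findings in the `Ambiguity Resolution' section: the compiler tries to remove superfluous parentheses; but if a parenthesized construction represents a type, the compiler uses that type; more in general, when possible the compiler interprets a syntactic construction as a declaration rather than as a definition (of an object or variable). Most problems resulting from the compiler interpreting constructions as declarations are caused by our using parentheses. Hence the rule of thumb: use curly braces rather than parentheses when constructing objects (or values).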

7.3: Objects inside objects: composition

In the class Person objects are used as data members. This construction technique is called composition.

Composition is neither extraordinary nor C++ specific: in C, structs and unions are commonly used as fields of other compound types. In C++ composition requires some special thought, as the initialization of composed objects is sometimes subject to restrictions, as discussed in the next few sections.

7.3.1: Composition and (const) objects: (const) member initializers

Unless specified otherwise, object data members of classes are initialized by their default constructors. Using the default constructor might not always be the optimal way to initialize an object, and it might not even be possible: a class might simply not define a default constructor.

Earlier we've encountered the following Person constructor:

    Person::Person(string const &name, string const &address,
                   string const &phone, size_t mass)
    {
        d_name = name;
        d_address = address;
        d_phone = phone;
        d_mass = mass;
    }

Think briefly about what is going on in this constructor. In the constructor's body we encounter assignments to string objects. Since assignments are used in the constructor's body their left-hand side objects must exist. But when objects are coming into existence constructors must have been called. The initialization of those objects is thereupon immediately undone by the body of Person's constructor. That is not only inefficient but sometimes downright impossible. Assume that the class interface mentions a string const data member: a data member whose value is not supposed to change at all (like a birthday, which usually doesn't change very much and is therefore a good candidate for a string const data member). Constructing a birthday object and providing it with an initial value is OK, but changing the initial value isn't.

The body of a constructor allows assignments to data members. The initialization of data members happens before that. C++ defines the member initializer syntax allowing us to specify the way data members are initialized at construction time. Member initializers are specified as a list of constructor specifications between a colon following a constructor's parameter list and the opening curly brace of a constructor's body, as follows:

    Person::Person(string const &name, string const &address,
                   string const &phone, size_t mass)
    :
        d_name(name),
        d_address(address),
        d_phone(phone),
        d_mass(mass)
    {}

In this example the member initializers use parentheses surrounding the initialization expressions. Instead of parentheses, curly braces may also be used. E.g., d_name could also be initialized this way:

        d_name{ name },

Member initialization always occurs when objects are composed in classes: if no constructors are mentioned in the member initializer list the default constructors of the objects are called. Note that this only holds true for objects. Data members of primitive data types are not initialized automatically.

Member initialization can, however, also be used for primitive data members, like int and double. The above example shows the initialization of the data member d_mass from the parameter mass. When member initializers are used the data member could even have the same name as the constructor's parameter (although this is discouraged), as there is no ambiguity: the first (left) identifier used in a member initializer always names the data member being initialized, whereas the identifier between the parentheses is interpreted as the parameter.

The order in which data members are initialized is defined by the order in which those members are declared in the class interface, not by the order of the member initializers. If the order of the member initializers in the constructor differs from the order in the class interface, the compiler may warn about it (g++, e.g., does so when -Wall is specified), but the members are still initialized in the order of the class interface.
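The following minimal sketch illustrates this. It assumes a hypothetical helper class Tracer, whose constructor appends its identifier to a (hypothetical) global string g_trace, and a hypothetical class Pair listing its member initializers in reverse order:

```cpp
#include <string>

std::string g_trace;        // hypothetical: records the order of construction

struct Tracer               // hypothetical helper class
{
    explicit Tracer(char id)
    {
        g_trace += id;      // append this member's identifier
    }
};

class Pair
{
    Tracer d_first;         // declared first: initialized first
    Tracer d_second;        // declared second: initialized second

    public:
        Pair()
        :
            d_second('2'),  // listed first, yet initialized second
            d_first('1')    // (g++ -Wall warns about this order: -Wreorder)
        {}
};
```

Constructing a Pair leaves "12" in g_trace: d_first is initialized before d_second, as dictated by their declaration order, even though d_second's member initializer is listed first.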

Member initializers should be used as often as possible. As shown it may be required to use them (e.g., to initialize const data members, or to initialize objects of classes lacking default constructors) but not using member initializers also results in inefficient code as the default constructor of a data member is always automatically called unless an explicit member initializer is specified. Reassignment in the constructor's body following default construction is then clearly inefficient. Of course, sometimes it is fine to use the default constructor, but in those cases the explicit member initializer can be omitted.

As a rule of thumb: if a value is assigned to a data member in the constructor's body then try to avoid that assignment in favor of using a member initializer.
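To illustrate, here is a sketch of a reduced Person class (a hypothetical variant, not the full class developed in this chapter) having a const data member d_birthday. Its value can only be provided by a member initializer, since an assignment in the constructor's body would not compile:

```cpp
#include <cstddef>
#include <string>

class Person
{
    std::string d_name;
    std::string const d_birthday;   // hypothetical const data member
    std::size_t d_mass;

    public:
        Person(std::string const &name, std::string const &birthday,
               std::size_t mass)
        :
            d_name(name),
            d_birthday(birthday),   // the only way to give d_birthday a value
            d_mass(mass)
        {
            // d_birthday = birthday;   // would not compile: d_birthday is const
        }

        std::string const &name() const
        {
            return d_name;
        }
        std::string const &birthday() const
        {
            return d_birthday;
        }
        std::size_t mass() const
        {
            return d_mass;
        }
};
```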

7.3.2: Composition and reference objects: reference member initializers

Apart from using member initializers to initialize composed objects (be they const objects or not), there is another situation where member initializers must be used. Consider the following situation.

A program uses an object of the class Configfile, defined in main, to access the information in a configuration file. The configuration file contains parameters of the program which may be set by changing the values in the configuration file, rather than by supplying command-line arguments.

Assume another object used in main is an object of the class Process, doing `all the work'. What possibilities do we have to tell the object of the class Process that an object of the class Configfile exists?

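One possibility is to pass the Configfile object to Process's constructor, and to let the Process object keep a reference to it in a data member Configfile &d_conf. But a reference variable cannot be initialized using an assignment, and so the following is incorrect:
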
    Process::Process(Configfile &conf)
    {
        d_conf = conf;        // wrong: no assignment
    }

The statement d_conf = conf fails, because it is not an initialization, but an assignment of one Configfile object (i.e., conf), to another (d_conf). An assignment to a reference variable is actually an assignment to the variable the reference variable refers to. But which variable does d_conf refer to? To no variable at all, since we haven't initialized d_conf. After all, the whole purpose of the statement d_conf = conf was to initialize d_conf....

How to initialize d_conf? We once again use the member initializer syntax. Here is the correct way to initialize d_conf:

    Process::Process(Configfile &conf)
    :
        d_conf(conf)      // initializing reference member
    {}

The above syntax must be used whenever reference data members are used. E.g., if d_ir were an int reference data member, a construction like

    Process::Process(int &ir)
    :
        d_ir(ir)
    {}

would have been required.
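A compilable sketch of this situation is shown next. The Configfile class used here is a hypothetical minimal version that merely stores the name of its configuration file. Process accesses it through its reference data member d_conf:

```cpp
#include <string>

// Hypothetical minimal Configfile: it only stores the name of its file
class Configfile
{
    std::string d_name;

    public:
        explicit Configfile(std::string const &name)
        :
            d_name(name)
        {}
        std::string const &name() const
        {
            return d_name;
        }
        void setName(std::string const &name)
        {
            d_name = name;
        }
};

class Process
{
    Configfile &d_conf;             // reference data member

    public:
        explicit Process(Configfile &conf)
        :
            d_conf(conf)            // a reference member MUST be initialized here
        {}
        std::string const &configName() const
        {
            return d_conf.name();   // uses the object d_conf refers to
        }
};
```

Since d_conf refers to the Configfile object defined in main, modifications of that object (e.g., calling setName) are immediately visible through Process::configName. The Configfile object must therefore outlive the Process object referring to it.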

7.4: Data member initializers

Non-static data members of classes are usually initialized by the class's constructors. Frequently (but not always) the same initializations are used by different constructors, resulting in multiple points where the initializations are performed, which in turn complicates class maintenance.

Consider a class defining several data members: a pointer to data, a data member storing the number of data elements the pointer points at, and a data member storing the sequence number of the object. The class also offers a basic set of constructors, as shown in the following class interface:

    class Container
    {
        Data *d_data;
        size_t d_size;
        size_t d_nr;

        static size_t s_nObjects;

        public:
            Container();
            Container(Container const &other);
            Container(Data *data, size_t size);
            Container(Container &&tmp);
    };

The initial values of the data members are easy to describe, but somewhat hard to implement. Consider the initial situation and assume the default constructor is used: all data members should be set to 0, except for d_nr which must be given the value ++s_nObjects. Since these are non-default actions, we can't declare the default constructor using = default, but we must provide an actual implementation:

    Container()
    :
        d_data(0),
        d_size(0),
        d_nr(++s_nObjects)
    {}

In fact, all constructors require us to state the d_nr(++s_nObjects) initialization. So even if d_data's type were a (move-aware) class type, we would still have to provide implementations for all of the above constructors.

C++, however, also supports data member initializers, simplifying the initialization of non-static data members. Data member initializers allow us to assign initial values to data members. The compiler must be able to compute these initial values from initialization expressions, but the initial values do not have to be constant expressions. So ++s_nObjects can be an initial value.

Using data member initializers for the class Container we get:

    class Container
    {
        Data *d_data = 0;
        size_t d_size = 0;
        size_t d_nr = ++s_nObjects;

        static size_t s_nObjects;

        public:
            Container() = default;
            Container(Container const &other);
            Container(Data *data, size_t size);
            Container(Container &&tmp);
    };

Note that the data member initializers are recognized by the compiler, and are applied by its implementation of the default constructor. In fact, all constructors apply the data member initializers to the members they do not explicitly initialize themselves. E.g., the move constructor may now be implemented like this:

    Container(Container &&tmp)
    :
        d_data(tmp.d_data),
        d_size(tmp.d_size)
    {
        tmp.d_data = 0;
    }

Although d_nr's initialization is left out of the implementation, d_nr is initialized thanks to the data member initializer provided in the class's interface.
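The next sketch puts these pieces together. It assumes Data is int (the section doesn't define Data) and adds a hypothetical accessor nr to inspect the sequence numbers. Only the default and move constructors are shown:

```cpp
#include <cstddef>
#include <utility>                  // std::move, for moving Containers

using Data = int;                   // hypothetical stand-in for Data

class Container
{
    Data *d_data = nullptr;
    std::size_t d_size = 0;
    std::size_t d_nr = ++s_nObjects;    // used by every constructor that
                                        // doesn't initialize d_nr itself
    static std::size_t s_nObjects;

    public:
        Container() = default;          // applies all data member initializers

        Container(Container &&tmp)
        :
            d_data(tmp.d_data),         // d_nr: data member initializer used
            d_size(tmp.d_size)
        {
            tmp.d_data = nullptr;
            tmp.d_size = 0;
        }

        std::size_t nr() const          // hypothetical accessor
        {
            return d_nr;
        }
        std::size_t size() const
        {
            return d_size;
        }
};

std::size_t Container::s_nObjects = 0;
```

Defining two Containers and then move-constructing a third one from the first gives them the sequence numbers 1, 2 and 3. The move constructor doesn't mention d_nr, so d_nr's data member initializer is applied.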

An aggregate is an array or a class (usually a struct with no user-defined constructors, no private or protected non-static data members, no base classes (cf. chapter 13), and no virtual functions (cf. chapter 14)). E.g.,

    struct POD      // defining aggregate POD
    {
        int first = 5; 
        double second = 1.28; 
        std::string hello{ "hello" };
    };

To initialize such aggregates braced initializer lists can be used. In fact, their use is preferred over using the older form (using parentheses), as using braces avoids confusion with function declarations. E.g.,

    POD pod{ 4, 13.5, "hi there" };

When using braced-initializer lists not all data members need to be initialized. Specification may stop at any data member, in which case the default (or explicitly defined initialization values) of the remaining data members are used. E.g.,

    POD pod{ 4 };   // uses second: 1.28, hello: "hello"

7.4.1: Delegating constructors

Often constructors are specializations of each other, allowing objects to be constructed by specifying arguments for only a subset of their data members, using default values for the remaining data members.

Before the C++11 standard common practice was to define a member like init performing all initializations common to constructors. Such an init function, however, cannot be used to initialize const or reference data members, nor can it be used to perform so-called base class initializations (cf. chapter 13).

Here is an example where such an init function might have been used. A class Stat is designed as a wrapper class around C's stat(2) function. The class might define three constructors: one expecting no arguments and initializing all data members to appropriate values; a second one doing the same, but also calling stat for the filename provided to the constructor; and a third one expecting a filename and a search path for the provided file name. Instead of repeating the initialization code in each constructor, the common code can be factored out into a member init which is called by the constructors.

C++ offers an alternative by allowing constructors to call each other. This is called constructor delegation, and is illustrated by the next example:

    class Stat
    {
        std::string d_filename;
        std::string d_searchPath;

        public:
            Stat()
            :
                Stat("", "")        // no filename/searchpath
            {}
            Stat(std::string const &fileName)
            :
                Stat(fileName, "")  // only a filename
            {}
            Stat(std::string const &fileName, std::string const &searchPath)
            :
                d_filename(fileName),
                d_searchPath(searchPath)
            {
                // remaining actions to be performed by the constructor
            }
    };
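Delegation also determines the order in which constructor bodies are executed. The target constructor, including its body, completes before the body of the delegating constructor runs. The following sketch shows this. It uses a hypothetical data member d_trace that merely records which bodies were executed, and it omits Stat's actual stat(2) handling:

```cpp
#include <string>

class Stat
{
    std::string d_filename;
    std::string d_searchPath;
    std::string d_trace;            // hypothetical: records executed bodies

    public:
        Stat()
        :
            Stat("", "")            // delegates to the two-argument constructor
        {
            d_trace += " default";  // runs after the target's body
        }
        Stat(std::string const &fileName, std::string const &searchPath)
        :
            d_filename(fileName),
            d_searchPath(searchPath)
        {
            d_trace += "target";
        }
        std::string const &trace() const
        {
            return d_trace;
        }
};
```

A delegating constructor's member initializer list may contain nothing but the delegation itself. All data members are initialized by the target constructor.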

C++ allows static const integral data members to be initialized within the class interfaces (cf. chapter 8). The C++11 standard adds to this the facility to define default initializations for plain (non-static) data members in class interfaces. These data members need not be const, nor of integral types. Even reference data members may be initialized this way, as long as the reference isn't bound to a temporary object.

These default initializations may be overruled by constructors. E.g., if the class Stat uses a data member bool d_hasPath which is false by default but the third constructor (see above) should initialize it to true then the following approach is possible:

    class Stat
    {
        bool d_hasPath = false;

        public:
            Stat(std::string const &fileName, std::string const &searchPath)
            :
                d_hasPath(true)     // overrule the interface-specified
            {}                      // value
    };

Here d_hasPath receives its value only once: it's always initialized to false except when the shown constructor is used in which case it is initialized to true.
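A sketch of this approach follows. It assumes a hypothetical accessor hasPath and a one-argument constructor relying on the default, and it leaves out the actual stat(2) handling:

```cpp
#include <string>

class Stat
{
    std::string d_filename;
    std::string d_searchPath;
    bool d_hasPath = false;             // used unless a constructor overrules it

    public:
        explicit Stat(std::string const &fileName)
        :
            d_filename(fileName)        // d_hasPath keeps its default: false
        {}
        Stat(std::string const &fileName, std::string const &searchPath)
        :
            d_filename(fileName),
            d_searchPath(searchPath),
            d_hasPath(true)             // overrules the interface's value
        {}
        bool hasPath() const
        {
            return d_hasPath;
        }
};
```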

7.5: Uniform initialization

When defining variables and objects they may immediately be given initial values. Class type objects are always initialized using one of their available constructors. C already supports the array and struct initializer list consisting of a list of constant expressions surrounded by a pair of curly braces.

C++ supports a comparable initialization, called uniform initialization. It uses the following syntax:

    Type object{ value list };

When an object is initialized from a list of objects, each of those objects may itself be initialized using uniform initialization (i.e., braced lists may be nested).

The advantage of uniform initialization over passing constructor arguments in parentheses is that the latter may sometimes result in an ambiguity. Constructing an object may be confused with calling an object's overloaded function call operator (cf. section 11.10) or with declaring a function. A definition like Type object{ value list } can never be interpreted as a function call or as a function declaration, so that ambiguity does not arise when uniform initialization is used. Uniform initialization can be used with plain old data (POD) types (cf. section 9.10), with classes that are `initializer list aware' (like std::vector), and with classes offering constructors matching the listed values.

Uniform initialization can be used to initialize an object or variable, but also to initialize data members in a constructor or implicitly in the return statement of functions. Examples:

    class Person
    {
        // data members
        public:
            Person(std::string const &name, size_t mass)
            :
                d_name {name},
                d_mass {mass}
            {}

            Person copy() const
            {
                return {d_name, d_mass};
            }
    };
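A self-contained version of this example is shown next. The data members d_name and d_mass and the accessors name and mass are assumptions, as the class's data members aren't listed above:

```cpp
#include <cstddef>
#include <string>

class Person
{
    std::string d_name;
    std::size_t d_mass;

    public:
        Person(std::string const &name, std::size_t mass)
        :
            d_name{ name },                 // uniform initialization of members
            d_mass{ mass }
        {}

        Person copy() const
        {
            return { d_name, d_mass };      // constructs the returned Person
        }                                   // using the above constructor

        std::string const &name() const
        {
            return d_name;
        }
        std::size_t mass() const
        {
            return d_mass;
        }
};
```

The return statement in copy uses copy-list-initialization. This only works because Person's two-parameter constructor isn't declared explicit.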

Object definitions may be encountered in unexpected places, easily resulting in (human) confusion. Consider a function `func' and a very simple class Fun (struct is used, as data hiding is not an issue here; in-class implementations are used for brevity):

    void func();

    struct Fun
    {
        Fun(void (*f)())
        {
            std::cout << "Constructor\n";
        }

        void process()
        {
            std::cout << "process\n";
        }
    };

Assume that in main a Fun object is defined as follows:

    Fun fun(func);

Running this program displays Constructor, confirming that the object fun is constructed.

Next we change this line of code, intending to call process from an anonymous Fun object:

    Fun(func).process();

As expected, Constructor appears, followed by the text process.

What about just defining an anonymous Fun object? We do:

    Fun(func);

Now we're in for a surprise. The compiler complains that Fun's default constructor is missing. Why's that? Insert some blanks immediately after Fun and you get Fun (func). Parentheses around an identifier are OK, and are stripped off once the parenthesized expression has been parsed. In this case: (func) equals func, and so we have Fun func: the definition of a Fun func object, using Fun's default constructor (which isn't provided).

So why does Fun(func).process() compile? In this case we have a member selector operator, whose left-hand operand must be a class-type object. That object must exist, and Fun(func) represents it. It's not the name of an existing object, but a constructor expecting a function like func exists. So the compiler creates an anonymous Fun object, passing func to its constructor.

Clearly, in this example, parentheses cannot be used to create an anonymous Fun object. However, the uniform initialization can be used. To define the anonymous Fun object use this syntax:

    Fun{ func };

(which can also be used to immediately call one of its members. E.g., Fun{func}.process()).
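Here is a compilable sketch of the Fun example. It assumes a hypothetical static counter s_constructed, counting the constructed Fun objects, and a hypothetical function demo performing the statements discussed above:

```cpp
#include <iostream>

void func()
{}

struct Fun
{
    static int s_constructed;       // hypothetical counter, for illustration

    Fun(void (*)())                 // the function pointer isn't used here
    {
        ++s_constructed;
        std::cout << "Constructor\n";
    }

    void process()
    {
        std::cout << "process\n";
    }
};

int Fun::s_constructed = 0;

void demo()                         // hypothetical driver
{
    Fun{ func };                    // anonymous Fun object
    Fun{ func }.process();          // anonymous Fun, calling a member
    // Fun(func);                   // error: defines a Fun named func
}
```

Calling demo constructs two anonymous Fun objects, so s_constructed ends up as 2. Uncommenting the last line would not compile, as the compiler would again see the definition of a Fun object named func.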

Although the uniform initialization syntax is slightly different from the syntax of an initializer list (the latter using the assignment operator), the compiler nevertheless uses the initializer list if a constructor supporting an initializer list is available. As an example consider:

    class Vector
    {
        public:
            Vector(size_t size);
            Vector(std::initializer_list<int> const &values);
    };

    Vector vi = {4};

When defining vi the constructor expecting the initializer list is called rather than the constructor expecting a size_t argument. If the latter constructor is required the definition using the standard constructor syntax must be used. I.e., Vector vi(4).
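The standard std::vector<int> offers the same two constructors as the Vector class above: a size-based constructor and an initializer_list constructor. It can therefore be used to observe this behavior:

```cpp
#include <vector>

std::vector<int> braced{ 4 };       // initializer_list: one element, 4
std::vector<int> assigned = { 4 };  // initializer_list as well
std::vector<int> parenthesized(4);  // size constructor: four elements
```

braced and assigned each contain a single element (4), while parenthesized contains four value-initialized elements (zeroes).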

Initializer lists are themselves objects that may be constructed using another initializer list. However, the values stored in an initializer list are immutable: once the initializer list has been defined, its values remain as-is.

Before using initializer lists the initializer_list header file must be included.

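Initializer lists support a basic set of member functions and constructors. These are a default constructor (creating an empty list) and a copy constructor (the copy refers to the same, immutable, values). The member size returns the number of values in the list. The members begin and end return pointers to, respectively, the first value and the location just beyond the last value.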
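The following sketch uses the std::initializer_list members size, begin and end. The functions sumValues and countValues are hypothetical helpers, defined only for this illustration:

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: sums the values using begin() and end()
int sumValues(std::initializer_list<int> values)
{
    int total = 0;
    for (int const *ptr = values.begin(); ptr != values.end(); ++ptr)
        total += *ptr;              // the values are read-only
    return total;
}

// Hypothetical helper: returns the number of values using size()
std::size_t countValues(std::initializer_list<int> values)
{
    return values.size();
}
```

E.g., sumValues({ 1, 2, 3 }) returns 6 and countValues({}) returns 0. Since begin returns a pointer to const values, the values in the list cannot be modified through it.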

7.6: Defaulted and deleted class members

In everyday class design two situations are frequently encountered. The first one is this: once a class defines at least one constructor, its default constructor is not automatically defined by the compiler. C++ relaxes that restriction somewhat by offering the `= default' syntax. A class specifying `= default' with its default constructor declaration indicates that the trivial default constructor should be provided by the compiler. A trivial default constructor performs the following actions:

- data members of built-in (primitive) types are not initialized, unless they have data member initializers;
- class-type data members are initialized by their default constructors;
- base classes (cf. chapter 13) are initialized by their default constructors.

Trivial implementations can also be provided for the copy constructor, the overloaded assignment operator, and the destructor. Those members are introduced in chapter 9.

Conversely (the second situation), some (otherwise automatically provided) members should not be made available. This is realized by specifying `= delete'. Using = default and = delete is illustrated by the following example: the default constructor receives its trivial implementation, and copy-construction is prevented:

    class Strings
    {
        public:
            Strings() = default;
            Strings(std::string const *sp, size_t size);

            Strings(Strings const &other) = delete;
    };
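A compilable variant of Strings is shown next. Its data member (a vector of strings) and its size member are assumptions made for this sketch. The static_asserts use standard type traits from <type_traits> to verify, at compile time, that a default constructor is available and that copy construction is not:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

class Strings
{
    std::vector<std::string> d_str;         // hypothetical data member

    public:
        Strings() = default;                // compiler-provided: d_str is
                                            // default-constructed (empty)
        Strings(std::string const *sp, std::size_t count)
        :
            d_str(sp, sp + count)
        {}

        Strings(Strings const &other) = delete;     // no copy construction

        std::size_t size() const
        {
            return d_str.size();
        }
};

static_assert(std::is_default_constructible<Strings>::value,
              "= default provides the default constructor");
static_assert(!std::is_copy_constructible<Strings>::value,
              "= delete removes the copy constructor");
```

A definition like Strings copy{ other } (with other being a Strings object) now fails to compile, as Strings's copy constructor is deleted.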

7.7: Const member functions and const objects

The keyword const is often used behind the parameter list of member functions. This keyword indicates that a member function does not alter the data members of its object. Such member functions are called const member functions. In the class Person, we see that the accessor functions were declared const:

    class Person
    {
        public:
            std::string const &name()    const;
            std::string const &address() const;
            std::string const &phone()   const;
            size_t mass()              const;
    };

The rule of thumb given in section 3.1.1 applies here too: whichever appears to the left of the keyword const is not altered. With member functions this should be interpreted as `doesn't alter its own data'.

When implementing a const member function the const attribute must be repeated:

    string const &Person::name() const
    {
        return d_name;
    }

The compiler prevents the data members of a class from being modified by one of its const member functions. Therefore a statement like

    d_name[0] = toupper(static_cast<unsigned char>(d_name[0]));

results in a compiler error when added to the above function's definition.

Const member functions are used to prevent inadvertent data modification. Except for constructors and the destructor (cf. chapter 9) only const member functions can be used with (plain, references or pointers to) const objects.

Const objects are frequently encountered as const & parameters of functions. Inside such functions only the object's const members may be used. Here is an example:

    void displayMass(ostream &out, Person const &person)
    {
        out << person.name() << " weighs " << person.mass() << " kg.\n";
    }

Since person is defined as a Person const & the function displayMass cannot call, e.g., person.setMass(75).

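The const member function attribute can be used to overload member functions. When functions are overloaded by their const attribute, the compiler uses the member function most closely matching the const-qualification of the object. For const objects only the const member function can be used, while for non-const objects the non-const member function is used.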

The next example illustrates how (non) const member functions are selected:

    #include <iostream>
    using namespace std;

    class Members
    {
        public:
            Members();
            void member();
            void member() const;
    };

    Members::Members()
    {}
    void Members::member()
    {
        cout << "non const member\n";
    }
    void Members::member() const
    {
        cout << "const member\n";
    }

    int main()
    {
        Members const constObject;
        Members       nonConstObject;

        constObject.member();
        nonConstObject.member();
    }
    /*
            Generated output:

        const member
        non const member
    */

As a general principle of design: member functions should always be given the const attribute, unless they actually modify the object's data.

7.7.1: Anonymous objects

Sometimes objects are used because they offer a certain functionality. The objects only exist because of their functionality, and nothing in the objects themselves is ever changed. The following class Print offers a facility to print a string, using a configurable prefix and suffix. A partial class interface could be:

    class Print
    {
        public:
            Print(ostream &out);
            void print(std::string const &prefix, std::string const &text,
                     std::string const &suffix) const;
    };

An interface like this would allow us to do things like:

    Print print{ cout };
    for (int idx = 0; idx != argc; ++idx)
        print.print("arg: ", argv[idx], "\n");

This works fine, but it could greatly be improved if we could pass print's invariant arguments to Print's constructor. This would simplify print's prototype (only one argument would need to be passed rather than three) and we could wrap the above code in a function expecting a Print object:

    void allArgs(Print const &print, int argc, char **argv)
    {
        for (int idx = 0; idx != argc; ++idx)
            print.print(argv[idx]);
    }

The above is a fairly generic piece of code, at least it is with respect to Print. Since prefix and suffix don't change they can be passed to the constructor which could be given the prototype:

    Print(ostream &out, string const &prefix = "", string const &suffix = "");

Now allArgs may be used as follows:

    Print p1{ cout, "arg: ", "\n" };    // prints to cout
    Print p2{ cerr, "err: --", "--\n" };// prints to cerr

    allArgs(p1, argc, argv);            // prints to cout
    allArgs(p2, argc, argv);            // prints to cerr

But now we note that p1 and p2 are only used inside the allArgs function. Furthermore, as we can see from print's prototype, print doesn't modify the internal data of the Print object it is using.

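In such situations it is actually not necessary to define objects before they are used. Instead, anonymous objects may be used. Anonymous objects can be used to initialize function parameters that are const references to objects, when those objects are only needed during the function call.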

When anonymous objects are passed as arguments to const & parameters of functions they are considered constant, as they merely exist for passing the information of (class type) objects to those functions. This way they cannot be modified, nor may their non-const member functions be used. Of course, a const_cast could be used to cast away the const reference's constness, but that's considered bad practice on the part of the function receiving the anonymous objects. Also, any modification to the anonymous object is lost once the function returns, as the anonymous object ceases to exist after calling the function. These anonymous objects used to initialize const references should not be confused with anonymous objects passed to parameters defined as rvalue references (section 3.3.2), which have a completely different purpose in life. Rvalue references primarily exist to be `swallowed' by functions receiving them. Thus, the information made available by rvalue references outlives the rvalue reference objects, which are also anonymous.

Anonymous objects are defined when a constructor is used without providing a name for the constructed object. Here is the corresponding example:

    allArgs(Print{ cout, "arg: ", "\n" }, argc, argv);    // prints to cout
    allArgs(Print{ cerr, "err: --", "--\n" }, argc, argv);// prints to cerr

In this situation the Print objects are constructed and immediately passed as first arguments to the allArgs functions, where they are accessible as the function's print parameter. While the allArgs function is executing they can be used, but once the function has completed, the anonymous Print objects are no longer accessible.
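To see the complete picture, here is a sketch of a Print class matching the constructor prototype given above. Its data members and the hypothetical function demo are assumptions. demo uses an std::ostringstream instead of cout, so its output can be inspected:

```cpp
#include <ostream>
#include <sstream>
#include <string>

class Print
{
    std::ostream &d_out;
    std::string d_prefix;
    std::string d_suffix;

    public:
        Print(std::ostream &out, std::string const &prefix = "",
              std::string const &suffix = "")
        :
            d_out(out),
            d_prefix(prefix),
            d_suffix(suffix)
        {}
        void print(std::string const &text) const
        {
            d_out << d_prefix << text << d_suffix;
        }
};

void allArgs(Print const &print, int argc, char **argv)
{
    for (int idx = 0; idx != argc; ++idx)
        print.print(argv[idx]);
}

std::string demo()                  // hypothetical driver
{
    char arg0[] = "prog";
    char arg1[] = "input.txt";
    char *argv[] = { arg0, arg1 };

    std::ostringstream out;
    allArgs(Print{ out, "arg: ", "\n" }, 2, argv);  // anonymous Print
    return out.str();
}
```

demo returns "arg: prog\narg: input.txt\n". The anonymous Print object exists until allArgs has returned, and is destroyed thereafter.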

7.7.1.1: Subtleties with anonymous objects

Anonymous objects can be used to initialize function parameters that are const references to objects. These objects are created just before such a function is called, and are destroyed once the function has terminated. C++'s grammar allows us to use anonymous objects in other situations as well. Consider the following snippet of code:

    int main()
    {
        // initial statements
        Print{ "hello", "world" };      // assume a matching constructor
                                        // is available
        // later statements
    }

In this example an anonymous Print object is constructed, and it is immediately destroyed thereafter. So, following the `initial statements' our Print object is constructed. Then it is destroyed again followed by the execution of the `later statements'.

The example illustrates that the standard lifetime rules do not apply to anonymous objects. Their lifetimes end at the end of the statement (more precisely, the full expression) in which they are created, rather than at the end of the block in which they are defined.

Plain anonymous objects are useful in at least one situation. Assume we want to put markers in our code producing some output when the program's execution reaches a certain point. An object's constructor could be implemented so as to provide that marker functionality, allowing us to put markers in our code by defining anonymous, rather than named, objects.
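A sketch of such a marker follows. It assumes a hypothetical class Marker that records, in a hypothetical global vector g_log, when it is constructed and when it is destroyed. Destructors are covered in chapter 9:

```cpp
#include <string>
#include <vector>

std::vector<std::string> g_log;     // hypothetical log, for illustration

class Marker
{
    std::string d_label;

    public:
        explicit Marker(std::string const &label)
        :
            d_label(label)
        {
            g_log.push_back("reached " + d_label);
        }
        ~Marker()
        {
            g_log.push_back("leaving " + d_label);
        }
};

void demo()                         // hypothetical driver
{
    g_log.push_back("initial statements");
    Marker{ "point A" };            // constructed and destroyed right here
    g_log.push_back("later statements");
}
```

After calling demo, g_log contains, in this order, "initial statements", "reached point A", "leaving point A" and "later statements". This confirms that the anonymous Marker is destroyed before the later statements are executed.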

C++'s grammar contains another remarkable characteristic illustrated by the next example:

    int main(int argc, char **argv)
    {
              // assume a matching constructor is available:
        Print p{ cout, "", "" };            // 1
        allArgs(Print{ p }, argc, argv);    // 2
    }

In this example a non-anonymous object p is constructed in statement 1, which is then used in statement 2 to initialize an anonymous object. The anonymous object, in turn, is then used to initialize allArgs's const reference parameter. This use of an existing object to initialize another object is common practice, and is based on the existence of a so-called copy constructor. A copy constructor creates an object (as it is a constructor) using an existing object's characteristics to initialize the data of the object that's created. Copy constructors are discussed in depth in chapter 9, but presently only the concept of a copy constructor is used.

In the above example a copy constructor is used to initialize an anonymous object. The anonymous object was then used to initialize a parameter of a function. However, when we try to apply the same trick (i.e., using an existing object to initialize an anonymous object) to a plain statement, the compiler generates an error: the object p can't be redefined (in statement 3, below):

    int main(int argc, char *argv[])
    {
        Print p{ "", "" };                  // 1
        allArgs(Print(p), argc, argv);      // 2
        Print(p);                           // 3 error!
    }

Does this mean that using an existing object to initialize an anonymous object that is used as function argument is OK, while an existing object can't be used to initialize an anonymous object in a plain statement?

The compiler actually provides us with the answer to this apparent contradiction. About statement 3 the compiler reports something like:

    error: redeclaration of 'Print p'

This message solves the puzzle once we realize that objects and variables may be defined within compound statements. Inside a compound statement, a type name followed by a variable name is the grammatical form of a variable definition. Parentheses can be used to break priorities, but if there are no priorities to break they have no effect, and are simply ignored by the compiler. In statement 3 the parentheses allowed us to get rid of the blank that's required between a type name and the variable name, but to the compiler we wrote

        Print (p);

which is, since the parentheses are superfluous, equal to

        Print p;

thus producing p's redeclaration.

As a further example: when we define a variable using a built-in type (e.g., double) using superfluous parentheses the compiler quietly removes these parentheses for us:

    double ((((a))));       // weird, but OK.

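To summarize our findings about anonymous objects: they are great for initializing const reference parameters of functions. The same syntax, however, can also be used in stand-alone statements. There it is interpreted as a variable definition, even when our intention actually was to initialize an anonymous object using an existing object. Braces do not suffer from this ambiguity (cf. section 7.5).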

7.8: The keyword `inline'

Let us take another look at the implementation of the function Person::name():

    std::string const &Person::name() const
    {
        return d_name;
    }

This function is used to retrieve the name field of an object of the class Person. Example:

    void showName(Person const &person)
    {
        cout << person.name();
    }

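To insert person's name the following actions are performed:

- the function Person::name() is called;
- it returns a reference to person's d_name;
- the referenced name is inserted into cout.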

Especially the first of these actions causes some time loss, since an extra function call is necessary to retrieve the value of the name field. Sometimes a faster procedure is preferred, one that makes the d_name data member immediately available without actually calling the function name. This can be realized using inline functions. An inline function is a request to the compiler to insert the function's code at the location of the function's call. This may speed up execution by avoiding a function call, which typically comes with some (stack handling and parameter passing) overhead. Note that inline is a request to the compiler: the compiler may decide to ignore it, and will probably ignore it when the function's body contains much code. Good programming discipline suggests being aware of this, and avoiding inline unless the function's body is fairly small. More on this in section 7.8.2.

7.8.1: Defining members inline

Inline functions may be implemented in the class interface itself. For the class Person this results in the following implementation of name:

    class Person
    {
        public:
            std::string const &name() const
            {
                return d_name;
            }
    };

Note that the inline code of the function name now literally occurs inline in the interface of the class Person. The keyword const is again added to the function's header.

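Although members can be defined in-class (i.e., inside the class interface itself), it is considered bad practice for the following reasons:

- it mixes the class's implementation with its interface, making it harder for readers to see what the class offers;
- should an in-class member ever have to become a non-inline member, its definition requires considerable editing before it can be moved to a source file.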

Because of the above considerations inline members should not be defined in-class. Rather, they should be defined following the class interface. The Person::name member is therefore preferably defined as follows:

    class Person
    {
        public:
            std::string const &name() const;
    };

    inline std::string const &Person::name() const
    {
        return d_name;
    }

If it is ever necessary to cancel Person::name's inline implementation, then this becomes its non-inline implementation:

    #include "person.ih"

    std::string const &Person::name() const
    {
        return d_name;
    }

Only the inline keyword needs to be removed to obtain the correct non-inline implementation.

Defining members inline has the following effect: whenever an inline-defined function is called, the compiler may insert the function's body at the location of the function call. It may be that the function itself is never actually called.

This construction, where the function code itself is inserted rather than a call to the function, is called an inline function. Note that using inline functions may result in multiple occurrences of the code of those functions in a program: one copy for each invocation of the inline function. This is probably OK if the function is a small one, and needs to be executed fast. It's not so desirable if the code of the function is extensive. The compiler knows this too, and handles the use of inline functions as a request rather than a command. If the compiler considers the function too long, it will not grant the request. Instead it will treat the function as a normal function.

7.8.2: When to use inline functions

When should inline functions be used, and when not? As a rule of thumb, inline functions should be used sparingly: only for small functions that need to be executed fast.

All inline functions have one disadvantage: the actual code is inserted by the compiler and must therefore be known at compile time. Therefore, as mentioned earlier, an inline function can never be located in a run-time library. In practice this means that an inline function is found near the interface of a class, usually in the same header file. The result is a header file which not only shows the declaration of a class, but also part of its implementation, thus blurring the distinction between interface and implementation.

7.8.2.1: A prelude: when NOT to use inline functions

As a prelude to chapter 14 (Polymorphism), there is one situation in which inline functions should definitely be avoided. At this point in the C++ Annotations it's a bit too early to expose the full details, but since the keyword inline is the topic of this section this is considered the appropriate location for the advice.

There are situations where the compiler is confronted with so-called vague linkage (cf. http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/Vague-Linkage.html). These situations occur when the compiler has no clear indication of which object file should receive its compiled code. This happens, e.g., with inline functions, which are usually encountered in multiple source files. Since the compiler may insert the code of ordinary inline functions at the locations where these functions are called, vague linkage is usually no problem for these ordinary functions.

However, as explained in chapter 14, when polymorphism is used the compiler must ignore the inline keyword and define so-called virtual members as true (out-of-line) functions. In this situation vague linkage may cause problems, as the compiler must decide which object files receive their code. Usually that's not a big problem as long as the function is called at least once. But virtual functions are special in the sense that they may very well never be explicitly called. On some architectures (e.g., armel) the compiler may fail to compile such inline virtual functions, which may result in missing symbols in programs using them. To make matters slightly more complex, the problem may emerge when shared libraries are used, but not when static libraries are used.

To avoid all of these problems, virtual functions should never be defined inline. They should always be defined out-of-line, i.e., in source files.
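The sketch below illustrates this advice with hypothetical classes Base and Derived. Their interfaces only declare the virtual members, and the definitions are placed in source files. For brevity everything is shown in one block, and the comments indicate which file each part would normally go in.

```cpp
// base.h (hypothetical): only declarations of the virtual members
class Base
{
    public:
        virtual ~Base();
        virtual int id() const;
};

// derived.h (hypothetical)
class Derived: public Base
{
    public:
        int id() const override;
};

// base.cc and derived.cc (hypothetical): the out-of-line definitions,
// each compiled into exactly one object file
Base::~Base()
{}

int Base::id() const
{
    return 1;
}

int Derived::id() const
{
    return 2;
}
```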

7.8.3: Inline variables

Since C++17, inline variables can be defined, in addition to inline functions. Like inline functions, they may be defined (and must be identically initialized) in multiple translation units. E.g., a header file could contain:

    inline int value = 15;                      // OK

    class Demo
    {
        // static int s_value = 15;             // ERROR
        static int constexpr s_value = 15;      // OK

        static int s_inline;                    // OK: see below: the inline 
                                                //   definition follows the 
                                                //   class declaration
    };
    inline int Demo::s_inline = 20;             // OK
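Instead of defining a static data member like Demo::s_inline below the class, C++17 also lets such a member be defined inline inside the class interface. The sketch below uses the hypothetical names g_created and Counter. It combines an inline namespace-scope variable with an inline static data member. All source files that include such a header share the same two variables.

```cpp
// counter.h (hypothetical): may be included by many source files
#include <cstddef>

inline std::size_t g_created = 0;       // one object, shared by all sources

class Counter
{
    public:
        static inline std::size_t s_alive = 0;  // in-class inline definition

        Counter()
        {
            ++g_created;
            ++s_alive;
        }
        Counter(Counter const &) = delete;      // keeps the counts simple
        ~Counter()
        {
            --s_alive;
        }
};
```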

7.9: Local classes: classes inside functions

Classes are usually defined at the global or namespace level. However, it is entirely possible to define a local class, i.e., inside a function. Such classes are called local classes.

Local classes can be very useful in advanced applications involving inheritance or templates (cf. section 13.8). At this point in the C++ Annotations they have limited use, although their main features can be described. At the end of this section an example is provided.

    #include <iostream>
    #include <string>

    using namespace std;

    int main(int argc, char **argv)
    {
        static size_t staticValue = 0;

        class Local
        {
            int d_argc;             // non-static data members OK

            public:
                enum                // enums OK
                {
                    VALUE = 5
                };
                Local(int argc)     // constructors and member functions OK
                :                   // in-class implementation required
                    d_argc(argc)
                {
                                    // global data: accessible
                    cout << "Local constructor\n";
                                    // static function variables: accessible
                    staticValue += 5;
                }
                static void hello() // static member functions: OK
                {
                    cout << "hello world\n";
                }
        };
        Local::hello();             // call Local static member
        Local loc{ argc };          // define object of a local class.
    }
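Since C++11, a local class may also be used as a template argument. This is one reason local classes become useful once templates are involved. The sketch below uses a hypothetical function sortByLength. It defines a local function-object class and passes it to std::stable_sort. The class is known only inside the function that needs it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// hypothetical function: sorts words by increasing length
std::vector<std::string> sortByLength(std::vector<std::string> words)
{
    struct ByLength                     // local class, only known here
    {
        bool operator()(std::string const &lhs,
                        std::string const &rhs) const
        {
            return lhs.size() < rhs.size();
        }
    };

    // since C++11 a local class may be used as a template argument
    std::stable_sort(words.begin(), words.end(), ByLength{});
    return words;
}
```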

7.10: The keyword `mutable'

Earlier, in section 7.7, the concepts of const member functions and const objects were introduced.

C++ also allows the declaration of data members that may be modified, even by const member functions. Declarations of such data members start with the keyword mutable.

Mutable should be used for those data members that may be modified without logically changing the object, which might therefore still be considered a constant object.

An example of a situation where mutable is appropriately used is found in the implementation of a string class. Consider std::string's c_str and data members. The actual data returned by the two members are identical, but c_str must ensure that the returned string is terminated by a 0-byte. As a string object has both a length and a capacity, an easy way to implement c_str is to ensure that the string's capacity exceeds its length by at least one character. This invariant allows c_str to be implemented as follows:

    char const *string::c_str() const
    {
        d_data[d_length] = 0;
        return d_data;
    }

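This implementation logically does not modify the object's data, as the bytes beyond the object's initial (length) characters have undefined values. Strictly speaking, because d_data is a plain pointer, the compiler accepts this implementation even without mutable. In a const member function the pointer itself is constant, but the characters it points to are not. Declaring d_data mutable makes explicit that const members may modify it. It also lets them change the pointer itself (e.g., to make it point to a larger buffer):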

    mutable char *d_data;
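mutable is genuinely required when the characters are stored in a member whose constness extends to its elements. The sketch below assumes such a design. It uses a std::vector<char> in a hypothetical class TinyString, which is not how std::string is actually implemented. Because the vector is mutable, c_str may call its non-const operator[]. Without mutable, operator[] would return a char const &, and the 0-byte could not be written.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class TinyString                        // hypothetical, for illustration only
{
    mutable std::vector<char> d_data;   // holds length + 1 elements
    std::size_t d_length;

    public:
        explicit TinyString(std::string const &text)
        :
            d_data(text.begin(), text.end()),
            d_length(text.size())
        {
            d_data.push_back('x');      // spare element, value irrelevant
        }

        char const *c_str() const
        {
            d_data[d_length] = 0;       // only compiles because d_data is
            return d_data.data();       // mutable
        }
};
```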

The keyword mutable is also useful in classes implementing, e.g., reference counting. Consider a class implementing reference counting for strings. The object doing the reference counting might be a const object, but the class may define a copy constructor. Since const objects can't be modified, how would the copy constructor be able to increment the reference count? Here the mutable keyword may profitably be used: a mutable reference count can be incremented and decremented, even though its object is a const object.
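The minimal sketch below illustrates this idea. It assumes the string's characters are kept in a hypothetical Shared structure, which is accessed through a pointer to a const Shared. Copying a RefString (also hypothetical) only increments the shared count. That increment is possible only because count is declared mutable. Copy assignment is deleted to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <utility>

struct Shared                           // hypothetical shared representation
{
    std::string const text;
    mutable std::size_t count = 1;      // may change in a const Shared
};

class RefString                         // hypothetical, copy assignment omitted
{
    Shared const *d_shared;             // the shared data are const

    public:
        explicit RefString(std::string text)
        :
            d_shared(new Shared{ std::move(text) })
        {}

        RefString(RefString const &other)
        :
            d_shared(other.d_shared)
        {
            ++d_shared->count;          // OK only because count is mutable
        }

        RefString &operator=(RefString const &) = delete;

        ~RefString()
        {
            if (--d_shared->count == 0)
                delete d_shared;
        }

        std::size_t refCount() const
        {
            return d_shared->count;
        }

        std::string const &str() const
        {
            return d_shared->text;
        }
};
```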

The keyword mutable should be used sparingly. Data modified by const member functions should never logically modify the object, and it should be easy to demonstrate this. As a rule of thumb: do not use mutable unless there is a very clear reason (the object is logically not altered) for modifying data in const member functions.

7.11: Header file organization

In section 2.5.10 the requirements for header files when a C++ program also uses C functions were discussed. Header files containing class interfaces have additional requirements.

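First, source files. With the exception of the occasional classless function, source files contain the code of member functions of classes. Basically, there are two approaches:

- Each source file includes all the header files required by the member function(s) it defines;
- The class's header file includes all the header files required by all of the class's member functions, and each source file of the class only includes that class header file.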

The first alternative has the advantage of economy for the compiler: it only needs to read the header files that are necessary for a particular source file. Its disadvantage is that the program developer must include multiple header files again and again in source files: it takes time both to type the include-directives and to think about the header files needed by a particular source file.

The second alternative has the advantage of economy for the program developer: the header file of the class accumulates header files, so it tends to become more and more generally useful. It has the disadvantage that the compiler frequently has to process many header files which aren't actually used by the function to compile.

With computers running faster and faster (and compilers getting smarter and smarter) I think the second alternative is to be preferred over the first. So, as a starting point, the source files of a particular class MyClass could be organized according to the following example:

    #include <myclass.h>

    int MyClass::aMemberFunction()
    {
        return 0;                   // the actual implementation goes here
    }

There is only one include-directive. Note that the directive refers to a header file in one of the directories on the compiler's include path (e.g., a directory specified using an INCLUDE environment variable or the compiler's -I option). Local header files (using #include "myclass.h") could be used too, but that tends to complicate the organization of the class header file itself somewhat.

The organization of the header file itself requires some attention. Consider the following example, in which two classes File and String are used.

Assume the File class has a member gets(String &destination), while the class String has a member function getLine(File &file). The (partial) header file for the class String is then:

    #ifndef STRING_H_
    #define STRING_H_

    #include <project/file.h>   // to know about a File

    class String
    {
        public:
            void getLine(File &file);
    };
    #endif

Unfortunately a similar setup is required for the class File:

    #ifndef FILE_H_
    #define FILE_H_

    #include <project/string.h>   // to know about a String

    class File
    {
        public:
            void gets(String &string);
    };
    #endif

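Now we have created a problem. The compiler, trying to compile the source file of the function File::gets, proceeds as follows:

- The header file project/file.h is opened to be read;
- FILE_H_ is defined;
- The header file project/string.h is opened to be read;
- STRING_H_ is defined;
- The header file project/file.h is (again) opened to be read;
- As FILE_H_ has already been defined, the remainder of project/file.h is skipped;
- The interface of the class String is now parsed;
- In that class interface a reference to a File object is encountered;
- As the class File hasn't been parsed yet, File is still an undefined type, and the compiler issues an error.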

The solution to this problem is to put a forward class reference before the class interface, and to include the corresponding class header file after the class interface. So we get:

    #ifndef STRING_H_
    #define STRING_H_

    class File;                 // forward reference

    class String
    {
        public:
            void getLine(File &file);
    };

    #include <project/file.h>   // to know about a File

    #endif

A similar setup is required for the class File:

    #ifndef FILE_H_
    #define FILE_H_

    class String;               // forward reference

    class File
    {
        public:
            void gets(String &string);
    };

    #include <project/string.h>   // to know about a String

    #endif

This works well in all situations where either references or pointers to other classes are involved and with (non-inline) member functions having class-type return values or parameters.
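The sketch below collapses this mechanism into a single file. The data members and the accessors text and line are hypothetical additions. String's interface only needs the forward reference to File. The member functions that use the full details of the other class are compiled after both class interfaces are known, as they would be in the classes' source files.

```cpp
#include <string>
#include <utility>

class File;                             // forward reference

class String
{
    std::string d_text;

    public:
        void getLine(File &file);       // File is still incomplete: OK
        std::string const &text() const
        {
            return d_text;
        }
};

class File                              // hypothetical in-memory file
{
    std::string d_line;

    public:
        explicit File(std::string line)
        :
            d_line(std::move(line))
        {}
        void gets(String &string);
        std::string const &line() const
        {
            return d_line;
        }
};

// these definitions (normally in the classes' source files) use both
// classes, so they are compiled after both interfaces have been read
void String::getLine(File &file)
{
    d_text = file.line();
}

void File::gets(String &string)
{
    string.getLine(*this);
}
```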

This setup doesn't work with composition, nor with in-class inline member functions. Assume the class File has a composed data member of the class String. In that case, the class interface of the class File must include the header file of the class String before the class interface itself, because otherwise the compiler can't tell how big a File object is. A File object contains a String member, but the compiler can't determine the size of that String data member and thus, by implication, it can't determine the size of a File object.

In cases where classes contain composed objects (or are derived from other classes, see chapter 13) the header files of the classes of the composed objects must have been read before the class interface itself. In such a case the class File might be defined as follows:

    #ifndef FILE_H_
    #define FILE_H_

    #include <project/string.h>     // to know about a String

    class File
    {
        String d_line;              // composition !

        public:
            void gets(String &string);
    };
    #endif

The class String can't declare a File object as a composed member: such a situation would again result in an undefined class while compiling the sources of these classes.

All remaining header files (appearing below the class interface itself) are required only because they are used by the class's source files.

This approach allows us to introduce yet another refinement:

7.11.1: Using namespaces in header files

When entities from namespaces are used in header files, no using directive should be specified in those header files if they are to be used as general header files declaring classes or other entities from a library. When the using directive is used in a header file then users of such a header file are forced to accept and use the declarations in all code that includes the particular header file.

For example, if in a namespace special an object Inserter cout is declared, then special::cout is of course a different object than std::cout. Now, if a class Flaw is constructed, in which the constructor expects a reference to a special::Inserter, then the class should be constructed as follows:

    namespace special           // a class in a namespace can't be declared
    {                           // using its qualified name
        class Inserter;
    }

    class Flaw
    {
        public:
            Flaw(special::Inserter &ins);
    };

Now the person designing the class Flaw may be in a lazy mood, and might get bored by continuously having to prefix special:: before every entity from that namespace. So, the following construction is used:

    namespace special
    {
        class Inserter;
    }
    using namespace special;

    class Flaw
    {
        public:
            Flaw(Inserter &ins);
    };

This works fine, up to the point where somebody wants to include flaw.h in other source files: because of the using directive, this latter person is now by implication also using namespace special, which could produce unwanted or unexpected effects:

    #include <flaw.h>
    #include <iostream>

    using std::cout;

    int main()
    {
        cout << "starting\n";       // won't compile
    }

The compiler is confronted with two interpretations of cout. Because of the using directive in the flaw.h header file, it considers cout a special::Inserter. Because of the using declaration in the user program, it also considers cout a std::ostream. Consequently, the compiler reports an ambiguity error.

As a rule of thumb, header files intended for general use should not contain using directives (or using declarations). This rule does not hold true for header files that are only included by the sources of a class. In those header files the programmer is free to apply as many using directives and declarations as desired, as they never reach other sources.
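The sketch below shows a header-safe Flaw. It assumes a hypothetical implementation of special::Inserter that merely collects inserted text. All names from special are fully qualified. Including this code therefore does not inject special's names into its users' sources, so a source that specifies using std::cout does not encounter any ambiguity.

```cpp
#include <sstream>
#include <string>

namespace special                       // hypothetical namespace from the text
{
    struct Inserter
    {
        std::ostringstream out;         // collects everything inserted
    };

    inline Inserter cout;               // special::cout, not std::cout

    inline Inserter &operator<<(Inserter &ins, std::string const &text)
    {
        ins.out << text;
        return ins;
    }
}

class Flaw                              // no using directive: names qualified
{
    special::Inserter &d_ins;

    public:
        explicit Flaw(special::Inserter &ins)
        :
            d_ins(ins)
        {}
        void report()
        {
            d_ins << "flaw\n";          // special::operator<< found via ADL
        }
};
```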

7.11.2: Modules

Since the introduction of header files in the C language, header files have been the main tool for declaring elements that are not defined but are used in source files. E.g., when using printf in main the preprocessor directive #include <stdio.h> had to be specified.

This method still works in C++, but it gradually proved to be inefficient. One reason is that a header file must be processed anew for every source file that includes it. The drawback of this approach quickly becomes apparent once classes are used, as the compiler repeatedly has to process a class's header file for each source file using that class. Usually it's not just that one header file. Header files tend to include other header files, resulting in an avalanche of header files that the compiler must process again and again for every single source file it compiles. If a typical source file includes h header files and s source files must be compiled, the compiler must process s * h header files, which is a significant compilation load.

Precompiled headers offered an initial attempt to reduce this excessive workload. But precompiled headers have problems of their own. They're enormously big: a header file of less than 100 bytes can easily result in a precompiled header of 25 MB or more. They're also rather fragile: simply recompiling a header whenever it's younger than its precompiled form may quickly result in much overhead, e.g., when merely a comment is added to the header.

Another common defense mechanism encountered in traditional headers is the use of include guards. They ensure that a header file is processed only once, even if it is included by multiple other header files. Such include guards are macros, and were extensively discussed in section 7.11. Include guards work, but they completely depend on the uniqueness of the guard identifier. That identifier is usually a long name written in capitals, using several underscores to increase the probability of its uniqueness.

By offering modules the C++20 standard provides solutions to the problems mentioned above. At the time of this writing the Gnu g++ compiler (still) experiences problems with modules. Once these problems have been solved a separate chapter about modules will definitely be added to the C++ Annotations.

7.12: Sizeof applied to class data members

In C++ the well-known sizeof operator can be applied to data members of classes without the need to specify an object as well. Consider:
    class Data
    {
        std::string d_name;
        ...
    };

To obtain the size of Data's d_name member the following expression can be used:

    sizeof(Data::d_name);

However, note that the compiler observes data protection here as well: sizeof(Data::d_name) can only be used where d_name is accessible, i.e., by Data's member functions and friends.
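The small sketch below uses a class Pod and a member nameSize, both of which are hypothetical additions. Data's private d_name is measured inside the static member nameSize without any Data object. The public members of Pod can be measured anywhere, here in a static_assert at namespace scope. Using sizeof(Data::d_name) outside of Data's members and friends would be rejected by the compiler.

```cpp
#include <cstddef>
#include <string>

class Data
{
    std::string d_name;                 // private data member

    public:
        static std::size_t nameSize()
        {
            return sizeof(Data::d_name);    // OK: access granted, no object
        }
};

struct Pod                              // hypothetical: public data members
{
    int d_count;
    double d_value;
};

static_assert(sizeof(Pod::d_value) == sizeof(double), "no Pod object needed");
```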