Object-Oriented C Programming - Part II

February 08, 2011

Methods Revisited
Encapsulation and Private Implementation
Next Time

In the last article of the series, I explored how simple C structs can be used to build objects, including inheritance. In this article I will illustrate how class methods can be implemented in C.

Methods Revisited

Methods come in two forms: virtual and non-virtual. Both are implementable in C, although virtual methods take a bit more code and boilerplate to set up.

The difference between the two is very simple. Virtual methods allow run-time polymorphism, while non-virtual methods do not. Non-virtual methods can be used to implement compile-time polymorphism in C++, although implementing similar features in C is a bit more troublesome due to the lack of function overloading. I will not be covering the use of compile-time polymorphism in C, as it’s simply easier to explicitly invoke the non-virtual methods of different classes than to try to emulate function overloading.

Non-virtual Methods

Here’s an example with a non-virtual method, foo(). This method operates on an instance of our user-defined class, takes a double, and returns an integer.

    // C++
    class MyClass {
    public:
        int foo(double);
    };

    /* C */
    typedef struct _MyStruct { } MyStruct;

    extern int mystruct_foo(MyStruct*, double);

Virtual Methods

Here’s an example using virtual methods:

    // C++
    class MyClass {
    public:
        virtual int bar(int, int);
    };

    /* C */
    typedef struct _MyStruct {
        struct MyStructVTable* vtable;
    } MyStruct;

    struct MyStructVTable {
        int (*bar)(MyStruct*, int, int);
    };

    extern int mystruct_bar(MyStruct*, int, int);

Looking at both examples, it’s easy to see that the C version has relatively few changes. In both cases, we simply need to move the method declaration outside of the class/struct declaration and make it a global function. For best clarity, the methods-turned-functions should be prefixed with the name of the type they operate on. They also need to take a pointer to the structure as their first argument, as there is no automatic this pointer with our C-style methods. Calling the methods looks like so:

    // C++
    result = myclass.foo(1.23);

    /* C */
    result = mystruct_foo(&mystruct, 1.23);
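To make the non-virtual case concrete, here is a minimal sketch of what a complete definition might look like. The scale member and the body of mystruct_foo are hypothetical, invented only so the function has some state to work with; only the signature comes from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data member, added only so the method has state to use. */
typedef struct _MyStruct {
    double scale;
} MyStruct;

/* Non-virtual "method": a plain function whose first argument plays the
 * role of C++'s implicit this pointer. */
int mystruct_foo(MyStruct* self, double value)
{
    assert(self != NULL);
    /* The conversion to int truncates toward zero. */
    return (int)(self->scale * value);
}
```

With an instance whose scale is 2.0, calling mystruct_foo(&mystruct, 1.5) returns 3. The call is resolved at compile time, exactly like a non-virtual C++ method.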

For the virtual method case, we have to construct our own virtual function table. This is a simple matter of creating a struct to contain our function pointers, initializing the table, and then storing a pointer to the table in each instance of our MyStruct type. We would abstract this all behind a constructor function for best clarity. More on that below.
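The table setup described above could be sketched roughly as follows. The two bar implementations, the mystruct_init and mystruct_init_derived helpers, and the const qualifier on the table pointer are all assumptions made for illustration; a real design would fold the wiring into whatever constructor function the type provides.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _MyStruct MyStruct;

/* One table per class, shared by every instance of that class. */
struct MyStructVTable {
    int (*bar)(MyStruct*, int, int);
};

struct _MyStruct {
    const struct MyStructVTable* vtable;
};

/* Two hypothetical implementations of bar(), standing in for a base class
 * and a derived class that overrides the method. */
static int base_bar(MyStruct* self, int a, int b)    { (void)self; return a + b; }
static int derived_bar(MyStruct* self, int a, int b) { (void)self; return a * b; }

static const struct MyStructVTable base_vtable    = { base_bar };
static const struct MyStructVTable derived_vtable = { derived_bar };

/* Constructor-like helpers that hide the vtable wiring from callers. */
void mystruct_init(MyStruct* self)         { self->vtable = &base_vtable; }
void mystruct_init_derived(MyStruct* self) { self->vtable = &derived_vtable; }

/* Public entry point: dispatches through the instance's table at run time. */
int mystruct_bar(MyStruct* self, int a, int b)
{
    assert(self != NULL && self->vtable != NULL);
    return self->vtable->bar(self, a, b);
}
```

Callers only ever see mystruct_bar. Which implementation runs depends on the table the instance was initialized with, which is the run-time polymorphism that virtual methods provide in C++.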

Again we see that the C version is a bit more verbose than the C++ version, but it is not lacking in features. Quite the contrary, the C version gives us a bit more flexibility than the standard C++ class mechanism. For instance, say that we have only one or two virtual methods for MyStruct, or perhaps a large number of virtual methods but only a relatively small number of MyStruct objects will be alive at any given time; it may be that the overhead of having a separate virtual function table lookup for every virtual method call outweighs the overhead of simply storing the function pointers directly inside the MyStruct structure. While C++ allows this as well (it is, after all, mostly a superset of C), many programmers never even think of doing this because the C++ class mechanism is ingrained into their thought pattern. Thinking like a C programmer opens up new options that the traditional pure-C++ programmer denies himself.
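As a sketch of the alternative mentioned above, the function pointer can live directly inside each instance instead of in a shared table. The add_bar and sub_bar functions and the two init helpers are hypothetical names used only to show the idea.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _MyStruct MyStruct;

struct _MyStruct {
    /* Stored per instance: one less indirection than a vtable lookup, at
     * the cost of one pointer per virtual method in every object. */
    int (*bar)(MyStruct*, int, int);
};

static int add_bar(MyStruct* self, int a, int b) { (void)self; return a + b; }
static int sub_bar(MyStruct* self, int a, int b) { (void)self; return a - b; }

void mystruct_init_add(MyStruct* self) { self->bar = add_bar; }
void mystruct_init_sub(MyStruct* self) { self->bar = sub_bar; }

/* Dispatch reads the pointer straight from the object. */
int mystruct_bar(MyStruct* self, int a, int b)
{
    assert(self != NULL && self->bar != NULL);
    return self->bar(self, a, b);
}
```

The trade-off is memory for speed: each object grows by one pointer per method, so this layout tends to make sense only when there are few virtual methods or few live objects.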

Encapsulation and Private Implementation

Encapsulation is one of the hallmarks of good object-oriented design, and is often touted as the most important part of object-oriented programming. Unfortunately, C++ actually makes good encapsulation even uglier and more complicated than C does. Let’s review.

    // in the public header
    class MyClass {
    public:
        int getInteger();
        double getReal();

    private:
        int myInteger;
        double myReal;
    };

    // in the implementation .cpp file
    int MyClass::getInteger() { return myInteger; }

    double MyClass::getReal() { return myReal; }

In this example, we are doing three very different things:

We are declaring the existence of a new type, MyClass, which represents a distinct type of object.
We are declaring the operations available to users of the MyClass type.
We are defining the internal representation of MyClass.

The alarming part is that third item. In a public header, we are exposing some of the internal implementation details of our type. While users are in practice barred from accessing these internal member values, we are exposing information the user does not need. We are also forced to expose dependencies on any other headers necessary for declaring those internal details. For example, if MyClass had a member variable of type OtherClass, we would be forced to include the header for OtherClass in the header for MyClass. In turn, any user of MyClass gets all of the declarations, definitions, macros, and so on from the OtherClass header even though the user will never be directly using that class, as it’s an internal implementation detail of MyClass.

Interfaces for Encapsulation in C++

Fixing all of the above issues in C++ forces us to split our type into two distinct types: an interface type and an implementation type.

    // in the public header
    class MyClass {
    public:
        virtual ~MyClass() {}   // objects are deleted through the interface
        virtual int getInteger() = 0;
        virtual double getReal() = 0;

        static MyClass* create();
    };

    // in the private .cpp implementation file
    class MyClassImpl : public MyClass {
    public:
        virtual int getInteger() { return myInteger; }

        virtual double getReal() { return myReal; }

    private:
        int myInteger;
        double myReal;
    };

    MyClass* MyClass::create() { return new MyClassImpl(); }

There are a few key differences that this cleanup requires of us. First, we are forced to make all of our methods virtual. If our type is not designed to be polymorphic, this may actually be a bit of a problem. While most code will not suffer any noticeable performance degradation from the use of virtual methods, a type which is intended to have many active instances or whose methods are intended to be called many times in inner loops will be far worse off having these virtual methods. The extra pointer in every instance and the indirect method calls can add up for performance-sensitive types.

An artifact of the interface/implementation split is that objects of type MyClass must now always be created on the heap, and that must be done using a factory method constructor (the MyClass::create method). They cannot be created as local variables or as member variables. This is because the exact layout and size of MyClassImpl is not known to the compiler in client code. This is unavoidable in C or C++, and in many cases is not a problem, but certainly can be a deal-breaker for certain types.

Private Implementations in C

Now, the above example written in C could look one of two ways. First, here’s what the C version would look like if we expose the type’s internal data to the user in the public header.

    // in the public header
    typedef struct _MyStruct {
        int my_integer;
        double my_real;
    } MyStruct;

    extern int mystruct_get_integer(MyStruct*);
    extern double mystruct_get_real(MyStruct*);

    // in the implementation .c file
    int mystruct_get_integer(MyStruct* self) { return self->my_integer; }

    double mystruct_get_real(MyStruct* self) { return self->my_real; }

There’s very little difference at all. The methods need the type’s name prefixed for namespacing purposes, the member variables are public, and there’s no automatic this pointer in the methods. Otherwise, it’s entirely equivalent code. In fact, for just about any C++ compiler, the MyClass and MyStruct types will have identical layout, and the methods will have identical machine code.

Now, let’s look at the C version with a private implementation:

    // in the public header
    typedef struct _MyStruct MyStruct;

    extern MyStruct* mystruct_create(void);
    extern int mystruct_get_integer(MyStruct*);
    extern double mystruct_get_real(MyStruct*);

    // in the implementation .c file
    #include <stdlib.h>  /* malloc */

    struct _MyStruct {
        int my_integer;
        double my_real;
    };

    MyStruct* mystruct_create(void) { return (MyStruct*)malloc(sizeof(MyStruct)); }

    int mystruct_get_integer(MyStruct* self) { return self->my_integer; }

    double mystruct_get_real(MyStruct* self) { return self->my_real; }

The majority of the code doesn’t change at all. We’re not forced to awkwardly use two different types and we’re not forced to make our methods virtual. We’ve had to add the mystruct_create method, just as the C++ version required the MyClass::create method, in order to allow client code to create objects of our custom type.
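For a slightly fuller picture, the sketch below puts both halves in one listing and adds the pieces a client would need in practice. The mystruct_destroy function and the initial values 42 and 0.5 are assumptions made for illustration and are not part of the example above; the comments mark which parts would live in the public header and which in the private .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what the public header would contain ---- */
typedef struct _MyStruct MyStruct;   /* incomplete type: size unknown to clients */

MyStruct* mystruct_create(void);
void      mystruct_destroy(MyStruct* self);   /* hypothetical, for illustration */
int       mystruct_get_integer(MyStruct* self);
double    mystruct_get_real(MyStruct* self);

/* ---- what the private .c file would contain ---- */
struct _MyStruct {
    int    my_integer;
    double my_real;
};

MyStruct* mystruct_create(void)
{
    MyStruct* self = malloc(sizeof *self);
    if (self != NULL) {
        /* Arbitrary initial values chosen for illustration. */
        self->my_integer = 42;
        self->my_real = 0.5;
    }
    return self;
}

void mystruct_destroy(MyStruct* self)
{
    free(self);   /* free(NULL) is a no-op */
}

int mystruct_get_integer(MyStruct* self)
{
    assert(self != NULL);
    return self->my_integer;
}

double mystruct_get_real(MyStruct* self)
{
    assert(self != NULL);
    return self->my_real;
}
```

Client code that sees only the header portion can hold a MyStruct* and call these functions, but it cannot declare a MyStruct local variable, take its size, or touch my_integer directly, because the type is incomplete outside the implementation file.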

On Inlining

Many readers may have noted that my example methods in both cases are trivial, and in fact are good candidates for being made inline methods. C++ has native support for inline methods, while the most widely used version of C does not. However, note that C99, which is supported by nearly every C compiler in common use today except for MSVC, does support inline functions, typically defined in headers using the static inline keywords. Even the compilers that do not support C99, including MSVC, almost universally support a vendor extension allowing the use of the inline or __inline keywords in C code.
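As a minimal sketch of the C99 form, the accessors from the earlier example could be written in a header like this. The const qualifiers and the assert calls are additions for illustration, not part of the original accessors.

```c
#include <assert.h>
#include <stddef.h>

/* For inlining from a header, the layout must be visible to clients. */
typedef struct _MyStruct {
    int    my_integer;
    double my_real;
} MyStruct;

/* C99: static inline gives each translation unit its own copy, which the
 * compiler is free (but not required) to inline at call sites. */
static inline int mystruct_get_integer(const MyStruct* self)
{
    assert(self != NULL);
    return self->my_integer;
}

static inline double mystruct_get_real(const MyStruct* self)
{
    assert(self != NULL);
    return self->my_real;
}
```

The inline keyword is only a hint. Whether a given call is actually inlined depends on the compiler and its optimization settings.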

That said, inline methods require that the type’s internal layout be exposed to the client, and hence the use of inline methods removes the ability to encapsulate the type’s internal details in a private implementation file. For simple accessor methods, the performance benefit of inlining can be substantial. However, it is not necessary to actually use the inline keyword or put functions in headers to get inlining with modern compilers. Every compiler available today that’s worth using at all, including both GCC and MSVC, supports link-time optimization. This feature defers much of the optimization and code-generation work to link time, allowing functions to be inlined across translation unit boundaries. As always, though, double check the code generated by your compiler to be sure that its optimizations are working properly.

There are also cases when inlining is actively harmful. Take, for example, a library that is meant to be delivered as a shared object (such as a .DLL or .so file). Every inlined method is a bit of logic that is inserted into the client application at build time. In turn, this means that those methods cannot be changed in any meaningful way without breaking the ABI of the library. Likewise, changing any public structure or class will break the ABI. On some compilers, such as MSVC, simply changing the compilation options when using C++ can alter the ABI of a library! This in turn means that it is often best to develop your library with a pure C interface, fully encapsulating all types and methods, and keeping all of the library’s logic internal. For some types of libraries this may not be feasible, and a pure-header version may even be the best approach. But for libraries where the performance of the API calls is not critical (such as libraries that are not meant to be used in performance-critical areas, or libraries where the internals need to be fast but the entry points are called infrequently), it’s better to play it safe and design your library to be resilient to changes and hence safe to update without recompiling applications.

Next Time

In the next article, we’ll cover constructors, destructors, allocators, and iterators.

Game Development by Sean