Game Development by Sean

Dangers of std::shared_ptr

I've had a lot of conversations with people lately based on their experience and my own regarding shared_ptr (either the C++11 standard version, the TR version, the Boost version, or home-grown versions). The almost universal conclusion has been that shared_ptr is best avoided in almost every circumstance when writing software of substantial size.

Problems

The standard shared_ptr type (which is based on Boost's version) is a flawed smart pointer. At first glance the type seems well designed. There are indeed many places that a reference-counted handle is useful. The support for weak_ptr makes shared_ptr appear even more valuable.

These are arguably the worst parts of shared_ptr: it's devilishly attractive and equally sinful. Let's go into more details about those sins.

Ownership Semantics and Lifetimes

shared_ptr represents a model of ownership called shared ownership, hence the name of the type. Ownership is used to determine the lifetime of the resource, or the time that the resource must be kept alive because it is in use.

Ownership of resources is a central part of software engineering. Whether you're using C with nothing but purely manual management or using C# with its completely automatic garbage collection, part of the job of a software engineer is to determine what resource (object) owns which other resources. Even without the need to actually manage the resources, it's critical to understand the network of references in a software architecture in order to properly maintain and develop the project.

The network of references can and typically does become a nasty web once shared ownership comes into play. Understanding which objects can own which resources is incredibly difficult when a resource could be owned by many different objects. Objects can refer to each other, either strongly as with shared_ptr or safe-weakly as with weak_ptr or unsafe-weakly as with raw pointers and references. Keeping a working mental model of a codebase that uses shared ownership semantics is simply more difficult than one that does not once the codebase grows beyond a trivial size.

Certainly C's manual management of resources takes its toll in terms of understanding a codebase. A function could return a pointer to a resource that may be owned by either the caller or callee and it's not easy to tell which it is without reading documentation. C++ does not have these problems once uniquely owning handle types like unique_ptr are used.
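As a small illustration of that last point, here is a sketch in which the function signatures themselves say who owns what. The Texture and TextureCache types and the function names are hypothetical, invented only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical resource type, used only for illustration.
struct Texture {
    std::string name;
};

// Returning unique_ptr states in the signature that the caller now owns the result.
std::unique_ptr<Texture> load_texture(const std::string& name) {
    return std::unique_ptr<Texture>(new Texture{name});
}

// Taking a reference states that the function only uses the object during the call.
std::size_t name_length(const Texture& texture) {
    return texture.name.size();
}

// Taking unique_ptr by value states that ownership passes from caller to callee.
struct TextureCache {
    std::unique_ptr<Texture> slot;

    void adopt(std::unique_ptr<Texture> texture) {
        assert(texture != nullptr);
        slot = std::move(texture);
    }
};
```

A caller would write cache.adopt(std::move(tex)), after which tex is null; the transfer is visible at the call site rather than hidden in documentation.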

Debugging References

Shared ownership can be such a burden because it can be impossible to track what object(s) actually have ownership over some resource. When a resource is mysteriously left dangling longer than expected or even destroyed it is quite handy to be able to inspect all the objects sharing ownership of the resource at some particular point in time. This actually is something that is possible to do (more later), but not with the shared_ptr interface.

These aren't hypothetical problems. Dangling owned references to shared objects have been the cause of a variety of bugs in real game engines I and my colleagues have worked on (stretching back before C++11 thanks to Boost). In almost every one of these cases complex architectural fixes have been necessary. Usually removing the use of shared_ptr has been simply too big of an architectural change so other compromises and code clutter have been required.

Hidden Performance Issues

There are several easy to miss performance problems with shared_ptr.

External Reference Counts

[Struck out by the author; see the update that follows.] There is an easy-to-miss essential optimization to the use of shared_ptr, which is making sure that any shared object's type derives from std::enable_shared_from_this. If this is not done then every time a new shared object is created two memory allocations are required: one for the object itself and then one for the reference count. enable_shared_from_this makes it so that the reference count is embedded into the object and no additional memory allocations are required. The type is so named because it provides shared_from_this(), a safe replacement for things like shared_ptr(this). Without this, creating two shared_ptrs that reference the same object will cause a crash or worse when one of the two shared reference counts reaches zero but the other is still alive.

Update: While this implementation detail was true of some of the shared_ptr implementations I had used, it is not true of Boost nor of Microsoft's implementation, and given the feedback I've received on this post, it's also not true of GNU's libstdc++ nor Clang's libc++. That'll teach me for not double-checking the most popular implementations. The only way to get the fused allocation on these modern implementations is to use make_shared, and using enable_shared_from_this unnecessarily will actually just.
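A minimal sketch of the two facilities mentioned here, assuming a hypothetical Node type: make_shared for creation, and enable_shared_from_this for safely obtaining another shared_ptr from inside the object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type, used only for illustration.
struct Node : std::enable_shared_from_this<Node> {
    // Safe: joins the existing reference count instead of starting a new one.
    // Writing std::shared_ptr<Node>(this) here would create a second,
    // independent count and eventually delete the object twice.
    std::shared_ptr<Node> self() { return shared_from_this(); }
};

// make_shared is the route to a single fused allocation on the
// implementations named in the update.
std::shared_ptr<Node> make_node() {
    return std::make_shared<Node>();
}

// Returns the owner count observed after taking a second owner via self().
long owners_after_self() {
    std::shared_ptr<Node> first = make_node();
    std::shared_ptr<Node> second = first->self();
    assert(first.get() == second.get());
    return first.use_count();
}
```

Here owners_after_self() returns 2, since both pointers share one count.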

This at least showcases one important point: if shared ownership is needed with shared_ptr, that use must be encoded into the type of the object. This is true of most other shared ownership semantics possible in C++. shared_ptr simply hides this fact and lets the wrong thing be done far too easily.

Reference Bouncing

The copy constructor is a problem. This is a general theme of many C++ types, especially in light of C++11 and its move semantics. Because copy construction is implicit, copies can be made in all kinds of scenarios where it's not ideal. The specific problem with shared_ptr is that every copy must dereference the pointer to the reference count to increment the value. Memory accesses aren't free, especially in cases where the copies are part of ownership transfer and the object has no other need to be dereferenced.

Related to the copy constructor is the destructor. Many naive pieces of code that involve copy constructors will also necessarily invoke a destructor. Where the copy constructor has to increment the reference count, the destructor must decrement it.
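The bouncing can be observed directly through use_count(). The following sketch uses hypothetical Mesh and Renderer types; only the std::shared_ptr calls are real library API.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical types, used only for illustration.
struct Mesh {
    int vertex_count = 0;
};

// By value: the copy increments the count on entry and the destructor
// decrements it again on exit.
long count_seen_by_value(std::shared_ptr<Mesh> mesh) {
    return mesh.use_count();
}

// By const reference: no copy is made and the count is never touched.
long count_seen_by_reference(const std::shared_ptr<Mesh>& mesh) {
    return mesh.use_count();
}

// Ownership transfer: moving hands over the pointer without touching the count.
struct Renderer {
    std::shared_ptr<Mesh> mesh;

    void take(std::shared_ptr<Mesh> incoming) {
        assert(incoming != nullptr);
        mesh = std::move(incoming);
    }
};
```

With a single owner, count_seen_by_value reports 2 while count_seen_by_reference reports 1. Calling renderer.take(std::move(mesh)) leaves the count at 1 throughout, whereas renderer.take(mesh) would bump it to 2 and leave two owners behind.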

A safer handle type could only allow implicit moves (in the few cases moves are implicit) and require explicit copies via a .copy() method or the like. Even before C++11, the copy constructor can be made private and unimplemented to ensure it cannot be misused.
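One way such a handle might look, as a rough sketch rather than a finished design: ExplicitShared is a hypothetical name, and for brevity it wraps std::shared_ptr instead of implementing its own count.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical handle: shared ownership where every copy must be spelled out.
template <typename T>
class ExplicitShared {
public:
    explicit ExplicitShared(std::shared_ptr<T> ptr) : ptr_(std::move(ptr)) {}

    // No implicit copies.
    ExplicitShared(const ExplicitShared&) = delete;
    ExplicitShared& operator=(const ExplicitShared&) = delete;

    // Moves are allowed; they never touch the reference count.
    ExplicitShared(ExplicitShared&&) noexcept = default;
    ExplicitShared& operator=(ExplicitShared&&) noexcept = default;

    // The only way to add an owner, and easy to search for in a codebase.
    ExplicitShared copy() const { return ExplicitShared(ptr_); }

    T& operator*() const { assert(ptr_ != nullptr); return *ptr_; }
    T* operator->() const { assert(ptr_ != nullptr); return ptr_.get(); }
    long use_count() const { return ptr_.use_count(); }

private:
    std::shared_ptr<T> ptr_;
};

static_assert(!std::is_copy_constructible<ExplicitShared<int>>::value,
              "copies must go through copy()");
```

With this in place, ExplicitShared<int> b = a; fails to compile, while ExplicitShared<int> b = a.copy(); states the intent at the call site.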

Atomics Overhead

The shared_ptr semantics require that the type be safe to use with multiple threads. This in turn requires that the reference count itself be managed using atomic integers such that increments and decrements will be synchronized properly. If the object being shared does not need to be shared between threads, the overhead of an atomic is unnecessary.

Even when a shared object is passed between threads, the ideal case is one in which the object is created on one thread and consumed on another. The transfer of the ownership can be expressed with a move rather than a copy. There is again no need for a shared object in this case. Not only can a custom shared_ptr replacement avoid unnecessary reference modifications, it could also detect any cases where the object is bouncing between threads in debug builds and raise errors.

Safe Weak Pointers

One of the perceived advantages of shared_ptr is that it enables use of weak_ptr. There are certainly a great many potential use cases for a safe weak reference. A Spaceships-style game for instance may want to allow a homing missile to hold a reference to the target it has locked on to but in a way that allows the target to be destroyed safely without causing the homing missile code to crash.
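The homing missile case might be sketched with weak_ptr as follows. Ship and Missile are hypothetical types; weak_ptr::lock() is the standard call that yields an empty shared_ptr once the target is gone.

```cpp
#include <cassert>
#include <memory>

// Hypothetical game types, used only for illustration.
struct Ship {
    int hull = 100;
};

struct Missile {
    std::weak_ptr<Ship> target;

    // Returns true if damage was applied, false if the target no longer exists.
    bool strike(int damage) {
        assert(damage >= 0);
        if (std::shared_ptr<Ship> locked = target.lock()) {
            locked->hull -= damage;
            return true;
        }
        return false;
    }
};
```

The missile never keeps the ship alive: once the last shared_ptr to the ship is reset, strike() simply returns false.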

There are other ways of achieving this goal than weak_ptr. Games have been using weak object handles long before even Boost's version of shared_ptr and weak_ptr came around.

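weak_ptr can have its own disadvantages, too. If the object and its reference count share one allocation (which, per the update above, comes from make_shared rather than from the enable_shared_from_this base class), then the object is still destroyed when the last shared_ptr goes away, but its memory will not be freed until the last weak_ptr is reset. Of course, as we went over before, giving up the fused allocation means paying for a second memory allocation per object.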

If shared_ptr is being used, weak_ptr is an essential tool. It's a fallacy, however, to think that weak_ptr is essential in general, or that shared_ptr is essential as a result. Alternatives to weak_ptr will be covered later when I go over alternatives to shared_ptr.

More on Ownership

The problems with shared_ptr go beyond the implementation in C++. Shared ownership semantics are problematic even in languages like C# where resources are automatically reclaimed once unused. It's useful to examine automatic GC'd languages as a case study on shared ownership problems.

Automatic GC Woes

Automatic GC systems are often billed as solving issues with memory leaks. This is not quite true, however. Sure, a large class of potential leaks is out of the picture once an automatic GC is used (especially one that doesn't use reference counting and has no issues with cyclic references). Logical errors can still result in an excess of objects, ones with live references, being kept around despite never being used.

A simple thought exercise that clearly illustrates the issue is to think of a list of recently used files in a GUI application. These lists are often capped to 10 entries or so; if the data structure has more than the UI will display, those entries are inaccessible. An application could use a linked list or dynamic array and simply append an element every time a file is opened. This data structure will grow without limit as the user opens files despite only a small subset of it ever actually being accessed by the program's logic. This thought exercise is very similar to a class of bugs I've actually seen in real software; the point is simply that having a GC does not remove the need for a software engineer to actually think about what they're doing. It doesn't solve all ownership problems. The problems outlined previously with the conceptual pitfalls of shared ownership apply as much to a language like C# as they do to C++.

The debugging issue mentioned above, tracking down the owner of a shared reference that is keeping an object alive, is a very real problem in C#, especially for games. Good debugging tools for C# let the developer see the memory usage characteristics of their application. They might show call stacks of when an object was allocated, but I've yet to ever see a C# memory debugger that can show all objects holding references to the target object. If there is a memory spike/exhaustion/leak of some kind, the developer's only recourse is to follow the code and find every place that could refer to the target object and painstakingly track the behavior of each.

It's important to remember that all these issues with GC and shared_ptr are not limited to just memory. File handles, sockets, application state, and so on are all resources that must be cleaned up. The automatic GC systems are all about memory. Some support "finalization" so that objects holding a file handle or the like will properly clean up those handles. In many cases, however, it is important to have deterministic ordering to the release of these resources, or at the very least to ensure that they are cleaned up as soon as possible after the object becomes unreferenced. Leaving a socket handle alive for too long can exhaust the file/socket handle namespace (which is a much more limited resource than memory) or even cause some unintentional network behavior if the socket is open. Internal application state can also be an issue and non-deterministic release of certain state-managing objects could lead to a variety of bugs.

RAII

RAII (Resource Acquisition Is Initialization) is a paradigm common to C++ that deals with many of the same issues as an automatic GC. unique_ptr and shared_ptr are both applications of RAII.

A variety of the problems noted in the previous section are solved by using RAII. It offers deterministic release of resources at well-defined times. Languages like C# or Python have added an incomplete form of RAII by way of using or with statements. These statements have the shortcoming that they only operate at a scope level and hence only through a single execution tree in the code. A C# IDisposable object's Dispose method will not automatically invoke the Dispose methods of any sub-objects.
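To make the contrast concrete, here is a small C++ sketch in which an owner's sub-objects are released automatically and in a fixed order when the owner leaves scope. Resource, Connection, and ReleaseLog are hypothetical names; Resource stands in for something like a socket or file handle.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records release order so the behavior can be inspected.
using ReleaseLog = std::vector<std::string>;

// Hypothetical stand-in for a file handle or socket.
class Resource {
public:
    Resource(std::string name, ReleaseLog& log)
        : name_(std::move(name)), log_(&log) {}
    ~Resource() { log_->push_back("release " + name_); }

    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

private:
    std::string name_;
    ReleaseLog* log_;
};

// An owner with sub-objects; it needs no hand-written cleanup code at all.
class Connection {
public:
    explicit Connection(ReleaseLog& log)
        : socket_("socket", log), logfile_("logfile", log) {}

private:
    Resource socket_;   // declared first, so destroyed last
    Resource logfile_;  // declared last, so destroyed first
};

ReleaseLog run_scope() {
    ReleaseLog log;
    {
        Connection connection(log);
        log.push_back("in scope");
    }  // both sub-objects are released here, before the next statement runs
    assert(log.size() == 3);
    log.push_back("after scope");
    return log;
}
```

run_scope() yields "in scope", "release logfile", "release socket", "after scope": members are destroyed in reverse declaration order at the closing brace, with no call from Connection required.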

RAII is not a panacea. shared_ptr is an application of RAII for example. It's simply important to note that RAII - if used correctly - allows problems to be solved that automatic GC does not.

Alternatives to shared_ptr

Having gone over all the problems with shared_ptr it would be useful to understand some alternatives.

Borrowed References vs Ownership

The first thing to realize is that shared ownership is unnecessary, even where a resource may be used by multiple objects which determine a minimum lifetime of that resource. The term I've found useful to use is "borrowed reference."

A simple example would be textures in a graphics engine. A texture might be used by multiple game objects. The lower bound on the lifetime of the texture object is determined by those game objects; it wouldn't be good to free the texture while an object is still using it. The texture can and should be owned uniquely, however. A texture manager object would be an ideal candidate for ownership of all textures.

Each game object then can simply "borrow" a reference to the texture. This would be implemented with some kind of soft-owning pointer. In optimized release builds of the game it might even be a reference count almost identical to what shared_ptr uses. The difference is that a borrowed reference is more of a request and the texture manager is still in full control of the lifetime of the texture. If the texture manager is asked to release all textures while the game is still running, the textures are released. The borrowed references in the game objects will either be invalidated (which the objects must be ready to deal with) or the handles will automatically start referring to a "missing" or "incomplete" texture object (handy for visual debugging of the game).
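A minimal sketch of the "request" flavor of borrowing, under the assumption that borrowers hold a plain index and resolve it on each use. TextureManager, TextureId, and the "<missing>" placeholder are all hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical types, used only for illustration.
struct Texture {
    std::string name;
};

using TextureId = std::size_t;

class TextureManager {
public:
    TextureManager() : missing_(new Texture{"<missing>"}) {}

    // The manager is the unique owner; callers only ever get an ID back.
    TextureId load(const std::string& name) {
        textures_.push_back(std::unique_ptr<Texture>(new Texture{name}));
        return textures_.size() - 1;
    }

    // The manager stays in full control: outstanding borrowers do not block this.
    void release_all() {
        for (auto& texture : textures_) texture.reset();
    }

    // Borrowers resolve their ID each time they need the texture. A released
    // or unknown ID falls back to the placeholder instead of dangling.
    const Texture& resolve(TextureId id) const {
        assert(missing_ != nullptr);
        if (id < textures_.size() && textures_[id]) return *textures_[id];
        return *missing_;
    }

private:
    std::vector<std::unique_ptr<Texture>> textures_;
    std::unique_ptr<Texture> missing_;
};
```

After release_all(), a game object that still holds an old ID keeps working and simply draws the placeholder texture.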

Game engines have long supported this kind of borrowed reference using things like unique ID handles. Instead of storing a pointer (smart or otherwise) to an object, store a numeric ID instead (possibly wrapped in a templated type to ensure type safety). Accessing the actual object is as simple as asking the owning manager object to look up the object referenced by that ID. Lifetime can be requested by notifying the manager that the resource is in use (again, possibly with a simple reference count). A smart handle type can make management of all this much easier.
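The ID handle idea could look roughly like this. It is a sketch under assumptions: the generation counter used to detect stale IDs is one common technique rather than something the text above prescribes, and Handle, Sound, and SoundManager are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Typed ID: the Tag parameter keeps, say, sound IDs and mesh IDs from mixing.
template <typename Tag>
struct Handle {
    std::uint32_t index;
    std::uint32_t generation;
};

struct Sound { int sample_rate; };
using SoundHandle = Handle<Sound>;

class SoundManager {
public:
    SoundHandle create(int sample_rate) {
        // Reuse a dead slot if there is one; its bumped generation keeps
        // handles to the previous occupant from matching.
        for (std::uint32_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].alive) {
                slots_[i].sound = Sound{sample_rate};
                slots_[i].users = 0;
                slots_[i].alive = true;
                return SoundHandle{i, slots_[i].generation};
            }
        }
        slots_.push_back(Slot{Sound{sample_rate}, 0, 0, true});
        return SoundHandle{static_cast<std::uint32_t>(slots_.size() - 1), 0};
    }

    // Returns nullptr for a stale or unknown handle. The pointer is only
    // valid until the next create(), so callers should not store it.
    Sound* find(SoundHandle handle) {
        Slot* slot = live_slot(handle);
        return slot ? &slot->sound : nullptr;
    }

    // Lifetime requests: a plain count of current users.
    bool acquire(SoundHandle handle) {
        Slot* slot = live_slot(handle);
        if (!slot) return false;
        ++slot->users;
        return true;
    }
    void release(SoundHandle handle) {
        if (Slot* slot = live_slot(handle)) {
            assert(slot->users > 0);
            --slot->users;
        }
    }
    int users(SoundHandle handle) {
        Slot* slot = live_slot(handle);
        return slot ? slot->users : 0;
    }

    // The owner decides when the object dies; old handles then fail lookup.
    void destroy(SoundHandle handle) {
        if (Slot* slot = live_slot(handle)) {
            slot->alive = false;
            ++slot->generation;
        }
    }

private:
    struct Slot {
        Sound sound;
        std::uint32_t generation;
        int users;
        bool alive;
    };

    Slot* live_slot(SoundHandle handle) {
        if (handle.index >= slots_.size()) return nullptr;
        Slot& slot = slots_[handle.index];
        if (!slot.alive || slot.generation != handle.generation) return nullptr;
        return &slot;
    }

    std::vector<Slot> slots_;
};
```

A holder that keeps a SoundHandle past destroy() gets nullptr from find(), even if the slot has since been reused for a different sound.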

In place of a request model, a borrowed reference can instead be a strong guarantee made by the engine. With this approach the objects holding a borrowed reference are guaranteed that the resource will remain available. The owner is not allowed to release the resource while a borrowed reference exists. By itself, this is little different than shared ownership or shared_ptr; the difference is explained in the following section.

Debugging Support

An additional feature that is relatively easy to add to the lifetime management of a borrowed reference is a back reference. This is a reverse registration of the object holding a borrowed reference. Each resource has a list of the objects that have a reference to it. The list may be an actual set of pointers to the objects holding the references or it may be a simple string or just a pointer back to the handle (which in C++ could be used to find the real object with a slight bit of manual work in the debugger). I personally prefer the string approach.

The smart handle should disallow copies or implicit moves and only allow explicit moving. This move would require setting the debug name. This can be used to register the name of the object holding the reference.

The owner of the resource can dump out the list of these borrowed references on request. If the choice is made for borrowed references being strong references, an error can be raised which prints out the list of reference holders when the owner of the resource tries to release it. The ownership isn't shared, it's checked.
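Putting the last three paragraphs together, a sketch of the string-based approach might look like the following. Resource, BorrowedRef, and Owner are hypothetical names, and the error is returned as a string here only to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource that keeps a back-reference list of holder names.
class Resource {
public:
    const std::vector<std::string>& holders() const { return holders_; }

private:
    friend class BorrowedRef;
    void add(const std::string& holder) { holders_.push_back(holder); }
    void remove(const std::string& holder) {
        auto it = std::find(holders_.begin(), holders_.end(), holder);
        assert(it != holders_.end());
        holders_.erase(it);
    }
    std::vector<std::string> holders_;
};

// No copies and no implicit moves; every holder must name itself.
class BorrowedRef {
public:
    BorrowedRef(Resource& resource, std::string holder)
        : resource_(&resource), holder_(std::move(holder)) {
        resource_->add(holder_);
    }
    // Explicit move: takes over from another handle under a new debug name.
    BorrowedRef(BorrowedRef& from, std::string new_holder)
        : resource_(from.resource_), holder_(std::move(new_holder)) {
        assert(resource_ != nullptr);
        resource_->remove(from.holder_);
        from.resource_ = nullptr;
        resource_->add(holder_);
    }
    ~BorrowedRef() {
        if (resource_) resource_->remove(holder_);
    }
    BorrowedRef(const BorrowedRef&) = delete;
    BorrowedRef& operator=(const BorrowedRef&) = delete;

private:
    Resource* resource_;
    std::string holder_;
};

// The unique owner. Release is checked against the back-reference list.
class Owner {
public:
    Owner() : resource_(new Resource()) {}
    Resource* get() { return resource_.get(); }

    // Returns "" on success, otherwise an error naming every current holder.
    std::string try_release() {
        if (!resource_) return "";
        if (!resource_->holders().empty()) {
            std::string error = "still borrowed by:";
            for (const std::string& holder : resource_->holders()) {
                error += " " + holder;
            }
            return error;
        }
        resource_.reset();
        return "";
    }

private:
    std::unique_ptr<Resource> resource_;
};
```

If "player" and "enemy" both hold a BorrowedRef, try_release() refuses and reports "still borrowed by: player enemy", which is exactly the information shared_ptr cannot provide.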

A simplification of a list of back-references is a hard-coded selection of possible references. A threaded job system for example needs to keep track of jobs in its queue. These jobs may belong to job groups that are used to signal the completion of a batch of jobs. In such a design, the job group can be in use by both the job system itself (each job belonging to the group holds a reference to it) and by the user code (which sets up the job group). A reference count could solve the problem. The job group can have at most two kinds of references, however. A simple alternative is to hard-code this knowledge and use two flags instead of a reference count. This has the advantage that debug code can display which of the two flags is set, indicating why the group is kept alive. In this particular example, the count of non-complete jobs could be used in place of one flag, essentially forming a pair of a reference count from the jobs themselves and a liveness flag from the user code. If the list of owners that are worth keeping track of is large, a plain reference count is certainly simpler, but the argument can be made that any objects with that kind of lifetime needs should be redesigned.
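The job group variant described above, with a pending-job count plus a user flag, can be sketched in a few lines. JobGroup and its member names are hypothetical, and the sketch ignores threading entirely; a real job system would need to synchronize these fields.

```cpp
#include <cassert>
#include <string>

// Hypothetical job group: two named reasons to stay alive instead of one
// anonymous reference count. Not thread-safe; illustration only.
struct JobGroup {
    bool held_by_user = false;  // set while user code still wants the group
    int pending_jobs = 0;       // queued jobs that belong to this group

    void job_added() { ++pending_jobs; }
    void job_finished() {
        assert(pending_jobs > 0);
        --pending_jobs;
    }

    bool alive() const { return held_by_user || pending_jobs > 0; }

    // Debug aid: reports why the group is being kept alive.
    std::string why_alive() const {
        if (held_by_user && pending_jobs > 0) return "user code and pending jobs";
        if (held_by_user) return "user code";
        if (pending_jobs > 0) return "pending jobs";
        return "not alive";
    }
};
```

A plain reference count of 3 says nothing about who holds the group; why_alive() answers that question directly.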

Efficiency

For an ID-based alternative to shared_ptr, certain optimization opportunities become available.

In the example of a graphics system, it's often critical to be able to sort a list of visible objects by the graphics resources they use in order to perform batching and minimize state changes. One of the most efficient ways to do this sorting is to use an integer key for each object that encodes all resources it uses (material, shader package, mesh data, animation state, etc.). Since shared_ptr is essentially just a wrapper around a pointer, it's difficult to impossible to squeeze more than one of them into a single machine integer. With an ID system it becomes feasible to ensure that each ID remains within some limit, making it much easier to pack all the IDs into a single machine integer (at least on a 64-bit machine).
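As a sketch of the packing, assume each kind of resource ID is capped at 16 bits and that shader changes are the most expensive state change. Both are assumptions for this example, as are the DrawItem layout and function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed budget: every ID stays below 2^16, so four fit in 64 bits.
const std::uint32_t kMaxId = 1u << 16;

// Hypothetical per-object resource IDs.
struct DrawItem {
    std::uint32_t material;
    std::uint32_t shader;
    std::uint32_t mesh;
    std::uint32_t animation;
};

// Higher bits sort first, so the assumed costliest change goes on top.
std::uint64_t make_sort_key(const DrawItem& item) {
    assert(item.material < kMaxId && item.shader < kMaxId);
    assert(item.mesh < kMaxId && item.animation < kMaxId);
    return (static_cast<std::uint64_t>(item.shader) << 48) |
           (static_cast<std::uint64_t>(item.material) << 32) |
           (static_cast<std::uint64_t>(item.mesh) << 16) |
           static_cast<std::uint64_t>(item.animation);
}

// Sorting plain integers groups objects that share shader, then material,
// then mesh, then animation state.
std::vector<std::uint64_t> sorted_keys(const std::vector<DrawItem>& items) {
    std::vector<std::uint64_t> keys;
    keys.reserve(items.size());
    for (const DrawItem& item : items) keys.push_back(make_sort_key(item));
    std::sort(keys.begin(), keys.end());
    return keys;
}
```

An item with material 1, shader 2, mesh 3, and animation 4 produces the key 0x0002000100030004. Four 64-bit pointers could not be packed this way.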

Another potential efficiency gain with an ID system that is stable is that it could be used for lookups of resources. Files can refer to the ID instead of a string name or the like. This is a really minor gain that's unlikely to be important in many scenarios at all, but it could be useful here and there. It's particularly handy for pre-packed data sources that are intended to be copied into memory and used directly with no post-processing or decoding.