Game Development by Sean


Data structures are an integral part of all of computer science, and the study of data structures makes up a large portion of most students’ school work. While an essentially limitless number of data structures exist, each with their own ups and downs, there are a relatively small number which find repeated, frequent use in the field. Games in particular have their own set of highly useful data structures applicable to many scenarios.

Today I’m going to describe one of the simpler ones: slot maps.

Suppose you have a collection of objects. Each of these objects must be referenced by other objects somewhere in your game/application. However, these objects can be destroyed and created at almost any time, so simply holding a pointer to one of these objects would be bad. Additionally, you may need to refer to these objects in places that cannot directly store a pointer or a smart pointer of any kind, such as across a network link. The logical option then is to assign each object a unique identifier of some kind and then use a data structure to look up an object by id. If the object has been destroyed, the id will not map to any object (because ids are never reused).

#Associative Container Object Map

This can be very easily implemented with any associative data container. A very common pattern seen in a great many apps and games is the following:

    // very slow id-object map

    struct object {
      object() {}
      object(int id) : id(id) {}

      int id;
      // other fields
    };

    int next_id = 0;
    std::map<int, object> object_table;

    int create_object() {
      object_table[next_id] = object(next_id);
      return next_id++;
    }

    object* get_object(int id) {
      auto iter = object_table.find(id);
      return iter == object_table.end() ? nullptr : &iter->second;
    }

    void destroy_object(int id) {
      auto iter = object_table.find(id);
      if (iter != object_table.end())
        object_table.erase(iter);
    }

That little bit of code will allow you to create objects, automatically assigning them a new unique id. Objects can then be looked up by id and removed by id. Nice and simple.

But dead slow.

The problem is the use of std::map. Now, std::map and its hash-table-based cousin std::unordered_map could each fill an article all on their own. The problem is not necessarily that those data structures are slow (though, well, they are; the C++ standard requires implementations which are sub-optimal for many purposes), but that they simply can’t be as fast as a direct lookup. The absolute fastest lookup we can get is a direct pointer dereference, which we’ve already taken off the table. The second fastest lookup we can achieve is a direct array lookup. Ideally, an object id should be an index into an array, and looking up the object is then as quick as dereferencing an array element.

#Simple Array Object Table

Let’s then restate the object lookup code using an array with a simple free list to allocate new nodes in the array. For convenience, our “array” will actually be a std::vector.

    // incomplete id-object map

    struct object {
      object(int id) : id(id) {}

      int id;
      // other fields
    };

    std::vector<object> object_table;
    std::vector<int> free_list;

    int create_object() {
      if (!free_list.empty()) {
        int free = free_list.back();
        free_list.pop_back();
        object_table[free].id = free;
        return free;
      } else {
        int id = object_table.size();
        object_table.push_back(object(id));
        return id;
      }
    }

    object* get_object(int id) {
      return object_table[id].id == -1 ? nullptr : &object_table[id];
    }

    void destroy_object(int id) {
      object_table[id].id = -1;
      free_list.push_back(id);
    }

Not much to it at all. The tricky part is that the id field is cleared out when calling destroy_object(). This ensures that requests for that id will properly return nullptr. Instead of resetting the id field of the object, a separate flag field or flag bit could be used. Reusing the id keeps the code a bit more compact for the blog post.

This implementation has (almost) the same interface as the previous version, but now object lookup via get_object() is almost as fast as it can be. Remove the overhead of debug-build versions of std::vector and this will perform the same as if object_table were just an array. The difference then between this object table and referencing the objects by pointer directly is negligible in all but the tightest of inner loops.

Note that a big difference with this implementation is that the object instances are not constructed or destroyed during the calls to create_object() or destroy_object(), but rather are reused (the exception being when create_object() is called and there are no free slots left). If some explicit initialization or cleanup logic is required, it can be put in special methods designed for that use case. More sophisticated implementations can also use the array simply as a memory store, using placement new and explicit destructor calls to fully construct and destruct objects in the object_table.

This new version has a serious difference from the original version which changes the interface’s semantics in a very important way. This new version recycles object identifiers. When an object is freed, its “unique” identifier (its index into the array) is put on the free list and eventually reused. This can cause bugs in many use cases. Say, for instance, that a delayed event is put in a queue to send a message to object #27. That object is subsequently destroyed. However, before that delayed event fires, a new object is created, and it is given the id #27. When the event fires, it is now delivered to the wrong object!

A further problem with this version, which does not exist in the previous version, is that the pointers returned by get_object() can be invalidated by calls to create_object(). It’s not at all uncommon in games to have code like the following (which assumes that our create_object() also has some extra parameters for initializing the new objects):

    object* bomb = get_object(bomb_id);
    // create some mini-explosion particles around the bomb
    for (int i = 0; i < 8; ++i)
      create_object(bomb->pos + rand_vector());

Note that the bomb’s position is accessed after creating the first explosion particle. If any of the create_object() calls causes the object_table to be resized, that code is going to crash at best and cause a very difficult-to-find memory corruption at worst. This could be fixed by giving object_table a fixed size so that it never grows, and simply refusing to create new objects on overflow. This is very likely the correct solution for games on consoles and mobile devices. In the general case, though, we can do something much more flexible.

The required fixes are hence twofold. We need to ensure we have actually unique, unrecycled object identifiers while maintaining our array-access speeds, and we need to ensure that creating an object doesn’t invalidate any pointers to other objects; object pointers should only be invalidated when the object is explicitly destroyed (or even better, during a delayed cleanup pass in the main game loop).

#Chunked Array Object Table

Let’s take care of the pointer invalidation issue first. The easiest and most direct solution here is to replace the std::vector with a list of arrays. On many STL implementations, this could just be a std::deque for convenience, as they are quite often implemented as an array of fixed-size chunks. However, some std::deque implementations use a ring buffer instead, or they use horrendously small chunk sizes, which makes them undesirable for this use case. Writing a custom data structure here is pretty easy.

    // fast object table with id recycling bug unfixed

    const size_t chunk_size = 256;
    std::vector<object*> object_table;
    std::vector<int> free_list;

    int create_object() {
      if (free_list.empty()) {
        object* chunk = new object[chunk_size];
        for (int i = chunk_size - 1; i >= 0; --i) {
          chunk[i].id = -1; // mark each fresh slot as empty so lookups fail cleanly
          free_list.push_back(object_table.size() * chunk_size + i);
        }
        object_table.push_back(chunk);
      }

      int free = free_list.back();
      free_list.pop_back();
      object_table[free / chunk_size][free % chunk_size].id = free;
      return free;
    }

    object* get_object(int id) {
      object* obj = object_table[id / chunk_size] + (id % chunk_size);
      return obj->id == -1 ? nullptr : obj;
    }

    void destroy_object(int id) {
      get_object(id)->id = -1;
      free_list.push_back(id);
    }

There are a few things to explain here. First, note that I’m not ever deallocating the memory allocated in create_object(). This can be fixed up in a real implementation quite easily, of course. Using a std::vector instead of an object*, or using some other smart pointer, or using arrays of fixed size would all work just fine. Note also that object now needs a default constructor for new object[chunk_size] to compile.

Second, there are those scary division and modulo operations going on. Quite a few folks are instantly turned off by division because it’s “slow.” Of course, division is much faster than doing a lot of memory dereferences in a complex data structure. A hash table, the usual go-to choice of O(1) data structures, generally also includes at minimum a single modulo operation, in addition to all the operations necessary to actually calculate a hash value. Note that on many CPUs, the division and modulo are calculated with a single instruction, so doing both is not particularly expensive. More importantly, the choice of chunk_size as a power of two means that the division can be done with a simple shift, and the modulo can be done with a bit-wise and operation. Check the assembly output in release builds and you should be happy with the results. If your compiler lets you down, it’s quite trivial to change the operations yourself.

The really tricky part here is the free list update in create_object(). The idea there is that we are creating chunk_size objects at a time when we grow object_table, but we’re only immediately using one of them. We decide that the first element in the new chunk will be the object we return (index 0 in the chunk), and so we add all the remaining indices to the free list. We do this in reverse order so that subsequent objects created are inserted in order in the array; this helps to improve data locality for objects created in batches, which tends to be at least a small performance win.

The object table now efficiently allows the lookup of objects with unique (but recycled) ids, and does so in a way that will never invalidate a pointer. The lookup of an object requires two array dereferences rather than one, but this is still a marked improvement over using a tree like std::map or even a hash table (especially one implemented like std::unordered_map).

#The Slot Map

The only problem left to solve is that of the recycled object identifiers. The identifier must continue to be an index that can be used for extremely efficient lookup in the object table, but it must also somehow be unique and never recycled. The solution is to just split the identifier into two parts: an index and a version.

Each id is now two fields. This can be accomplished with simple bit manipulation, of course. If an id is a 64-bit number, the lower 32 bits could be the object table index and the upper 32 bits could be a version tag. It’s possible to use a 32-bit number for the id as well, but this takes a bit more care to ensure that version tags don’t roll over too frequently. It’s easiest and safest to use a 64-bit identifier.

The implementation is simple. Unlike the previous iterations of the code, destroy_object() will no longer clear out the id of the object. Instead, it’ll increment the version field of the id by 1. Additionally, only the index portion of the id is added to the free list.

The get_object() function makes sure to use only the lower 32 bits of the id as the index, and compares the full 64-bit id of the object at that index with the requested value. If they don’t match, nullptr is returned. The end effect is that when an object is destroyed, its id changes, and the old id is never used again, even though the object at the same index in the table will be reused.

This is the slot map.

    // complete simplified slot map

    typedef unsigned long long object_id;

    struct object {
      object_id id; // now 64 bits: low 32 bits are the index, high 32 the version
      // other fields
    };

    const size_t chunk_size = 256;
    std::vector<object*> object_table;
    std::vector<int> free_list;

    object_id create_object() {
      if (free_list.empty()) {
        object* chunk = new object[chunk_size];
        for (int i = chunk_size - 1; i >= 0; --i) {
          chunk[i].id = object_table.size() * chunk_size + i;
          free_list.push_back(object_table.size() * chunk_size + i);
        }
        object_table.push_back(chunk);
      }

      int free = free_list.back();
      free_list.pop_back();
      return object_table[free / chunk_size][free % chunk_size].id;
    }

    object* get_object(object_id id) {
      object* obj = object_table[(id & 0xFFFFFFFF) / chunk_size] + ((id & 0xFFFFFFFF) % chunk_size);
      return obj->id != id ? nullptr : obj;
    }

    void destroy_object(object_id id) {
      object* obj = get_object(id);
      if (!obj)
        return; // the id is stale; the object is already destroyed
      obj->id = (obj->id & 0xFFFFFFFF) | (((obj->id >> 32) + 1) << 32);
      free_list.push_back(id & 0xFFFFFFFF);
    }

And that’s that. The code is certainly the least clean and simple of the implementations presented so far. Industrious coders can clean that up a fair bit. A lot of the simple math can be refactored and split out. The object_id type could be restated as a struct or union. The code could be made more robust as well, such as checking for out-of-bounds accesses with the object ids.

#Further Steps

There are a few different things that could be done with the simple slot map presented here. A lot of tuning can go into its behavior, for one.

The size of the chunks chosen makes the calculation of indices efficient, but better sizes could be chosen depending on the size of the object type stored in the actual slot map. The best size is going to vary by application, access pattern, and so on. Picking the right value requires profiling and understanding the specific application.

The creation of chunks currently needs to touch memory all across the chunk with the loop in create_object(). It would be possible to remove this loop by instead setting the index portion of the id when returning an object. The version tag portion can be left completely uninitialized, since it doesn’t really matter what initial value it starts with. Not iterating over the chunk will help improve performance slightly in many cases.

One major change that can be made is to split the object storage from the slot map itself. The slot map can be a simple array or chunked array of version tags and indices/pointers to another data structure. The object id index is the index into the slot map, but the actual object is then looked up through a second layer of indirection. While this indirection does add overhead, it allows the backing object store to rearrange objects. In particular, this allows for compaction of the object store, which can be useful for some applications.

#Performance Notes

The original goal was to be able to access objects with a simple integer lookup into an array. This goal is not quite achieved with a slot map. In particular, the final implementation has two indirections instead of one, and the id is a 64-bit value that may be larger than is ideal for some environments.

In practice, this isn’t bad. The extra layer of indirection in the slot map comes through a very small chunk array. If this slot map is used for game objects or frequently accessed components, it is likely to be in CPU cache. The extra bit manipulations on the index are practically free. The larger object id size puts it on par with a pointer in a 64-bit environment, which we’re moving towards anyways.

That said, if you can put up with the limitations, it can be quite advantageous to use the fixed-size slot map variant that does not require the extra layer of indirection. Another alternative is to simply accept that creating a new object can invalidate existing pointers, though actually working around this limitation can be difficult for a lot of game logic code.

Likewise, if you know that you will not have many objects, or that objects are only destroyed and reallocated relatively slowly, you can try using a 32-bit object id. But be warned: if the version tag is only 8 bits, then a single slot in the slot map will recycle an object id after only 256 free/create cycles. Because of the way the free list works (and the way you want it to work with CPU cache), any code that creates and destroys a single object repeatedly is likely to trigger this case. Assuming objects are only fully destroyed at the end of a frame, that means it’ll take only 256 frames to recycle the id, which could be well under a second for many simpler games that aren’t stressing the hardware. With any delayed events, network messages, or so on, the odds of seeing a recycled object id in a dangerous context are fairly high under those behavior patterns. Using a 16-bit version tag leaves only 16 bits for the index, which limits you to 65,536 objects total. Finding a safe balance between index and version tag in 32 bits is not easy, and is going to require placing some assumptions on how quickly objects can be destroyed and created. Using 64 bits is much safer.

Obviously, other data structures can be faster than a slot map. A hash table with a perfect hash function and accessed with precomputed hashes gives the ideal access pattern that slot maps strive for: all object lookups are a simple index (modulo array size) into an array. Of course, perfect hash functions don’t exist in the general case. One way to generate a perfect hash function for a predefined set of data is to hash every key in the data and, if there are any collisions, increment/modify the hash function’s seed (most hash functions have one) and repeat the process. This obviously is not something that can be done at runtime in most apps, but it is applicable in a surprising number of circumstances. The final seed needs to be saved with the data set, and the resulting hash is then unique for each key. If you need to create new keys at runtime, however, either a slot map or a full hash table is required.

The fastest general-purpose data structure that has the same general properties of a slot map is a hash table. Note that there is a huge difference in quality between different hash table implementations. The implementation of std::unordered_map, for instance, is required to make several trade-offs that don’t make a lot of sense in most contexts, and especially in the games context. A well-written custom hash table can significantly improve on the performance of std::unordered_map and come very close to the slot map. Depending on specific needs, either the hash table or the slot map may be the superior option.

As always, profile and test your code and see which data structure, algorithm, and implementation works best. Never assume.

#Code Link

All of the code in this article, as well as a really hokey and poorly executed test suite, is available on GitHub.