Resource Pipelines (Part 4 - Dependencies)

November 17, 2018

Run-time Dependencies for Complex Resources
Deploying Run-time Dependencies
Dependency Databases
Dependency “Roots” and Packages
Package Dependencies
Source Dependencies and Baking
Hidden Dependencies
Multiple Outputs and Dependencies
Preloading Dependencies
Dependencies and Variants
Summary

In previous installments of this series, we spoke briefly about the complexities of dependencies in a game’s resource pipeline. Today we’ll delve into all the myriad forms of dependencies and their particulars. More importantly, we’ll go over why the resource pipeline tools themselves need to be concerned about dependencies.

Run-time Dependencies for Complex Resources

The first and probably most obvious use of dependencies would be for complex resources loaded by a game. By “complex” here I mean that they have multiple pieces. A basic example would be a character in the game: it likely is comprised of game data resources, meshes, animations, materials, textures, visual effects, sounds, and so on. In order to spawn the character into the game, all of those individual resources must be loaded and created.

For many game engines, these dependencies can be loaded automatically without any special logic. When the game needs a particular texture, for example, it will make a call to a function like texture_ptr load_texture(string name), which returns the loaded texture. Engines may offer asynchronous analogs to avoid blocking the main loop, but the gist is the same: the game code requests a resource when it knows that resource is needed and the engine goes and loads the resource.

When one resource references another, we have a dependency. The character.model file might reference character.mesh, character_diffuse.png, character.anim, etc.

At run time, these dependencies could be (and in most engines will be) loaded explicitly when requested by the dependent resource’s load routine. That is, model_ptr load_model(string name) might in its implementation make calls to mesh_ptr load_mesh(string name) to load the sub-meshes, and the same for other dependencies like materials and animations.

These dependencies exist, but for many engines there’s no reason for the engine to have any special support for dependencies. They Just Work(tm) as a consequence of one resource referencing another.

Deploying Run-time Dependencies

Simply loading dependencies may be implicit to many engines, but that’s not a complete story from the perspective of the pipeline itself. Namely, we have to worry about making sure we actually distribute the resources to our players, else the engine will fail to load them.

This problem is more complicated than it may seem. Certainly there are naive solutions, like simply copying all resources into the game’s installer. Plenty of games have shipped doing exactly that, after all. Engines that support more sophisticated means still often have mechanisms for blindly copying resources; Unity for instance has an Assets/Resources/ folder, and anything placed in that folder will be packaged and distributed when the game is built (there are other folders with the same or similar function in Unity as well).

This naive approach has a severe gotcha to it though: unused resources would still be distributed to users. This may not sound like much of a problem for many small games, where unused resources may be small or simply rare. However, larger productions with many artists and other content creators can easily accumulate many hundreds or even thousands of resources that ultimately aren’t actively used.

Packaging these unused resources bloats the game’s install size. There are absolutely games that have shipped with megabytes of content to download and install that are never used.

Worse, sometimes this content can be even more problematic. A rather infamous case in games is the Hot Coffee debacle (not linked because the topic is rather NSFW) that Rockstar had. The game included in its installation some unused files for a rather adult-oriented scene that was not actually in the game itself; however, modders were able to find and re-enable this scene, which opened a rather understandable controversy about the game shipping a ratings-inappropriate scene to consumers. Were a more accurate dependency database used, this scene would never have made it onto the installation media in the first place, even though the files were all present in the game’s content sources. Not that I expect most of you would fall afoul of that particular gaffe, but similar situations do come up; a racing game for example may have some content authored for vehicles for which the game couldn’t obtain licensing and so the vehicle would be cut from the final release, and actually shipping those vehicles (even disabled) on the installation media/download could open the company to a lawsuit.

More specifically, these “unused” resources may even still be very valid things the devs want to keep around. The resources may be part of experimental new features, new DLC, or expansions. That vehicle I mentioned in the previous example that’s unlicensed might still be in negotiations and might be unlocked as a DLC down the road, so deleting the vehicle from source control would be premature and undesirable.

There’s even the case of breaking up an installer. Many games for example have separate high-definition resource downloads, meant to lessen the download size and install space burden for most users while allowing players with beefy PCs to get an enhanced experience.

Whatever the reason, when building an installer it’s often very useful to be able to place different files in different installers and to be able to exclude some files entirely from installers. Moreover, to lessen the development and testing burden, it’s incredibly handy to automate this process to the fullest extent possible.

The common solution to this problem is to track the run-time dependency chains of resources explicitly.

Dependency Databases

In our resource pipeline, as we process our resources, we can track their dependencies and write them to a “database” of some kind. I don’t necessarily mean something like SQL, though that may indeed work just fine. By “database” here I just mean that we can and should write any knowledge we have about dependencies to some storage mechanism outside the resource files themselves.

Think of this like the .d files that some C++ build toolchains will produce. These files are essentially just lists of the dependencies of the source code, noting that foo.cpp includes foo.h for example. That information is already in the source code, but the source code is not easy for tools like the build system to efficiently consume, which they need in order to detect when files might need to be recompiled.

Likewise for our run-time resource dependencies, it can be incredibly handy to have our dependencies spelled out in a purpose-built dependency database or files. That way, when we need to ask questions like, “what files do I need to load Level 1?” we have an easy way to accurately and efficiently answer the question. The database allows us to start with level1.map and see which models it depends on, and which meshes and materials those depend on, and which textures those materials depend on, and so on.

Whether a SQLite database, a JSON file, a bunch of JSON files, or some other format, the key is that we be able to easily answer dependency questions. What other resources does this resource require to load, and which resources require this resource in order to load?
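To make those two questions concrete, here is a small in-memory sketch in Python. The class and method names (DependencyDatabase, dependencies_of, dependents_of) are my own inventions for illustration; a real pipeline would persist this to whatever storage it prefers.

```python
from collections import defaultdict


class DependencyDatabase:
    """Stores direct dependencies and answers queries in both directions."""

    def __init__(self):
        self._depends_on = defaultdict(set)
        self._required_by = defaultdict(set)

    def add(self, resource, dependency):
        # Record the edge both ways so either question is a cheap lookup.
        self._depends_on[resource].add(dependency)
        self._required_by[dependency].add(resource)

    def dependencies_of(self, resource):
        """What does this resource require in order to load?"""
        return set(self._depends_on.get(resource, ()))

    def dependents_of(self, resource):
        """Which resources require this one in order to load?"""
        return set(self._required_by.get(resource, ()))


db = DependencyDatabase()
db.add("level1.map", "hero.model")
db.add("hero.model", "hero.mesh")
db.add("hero.model", "hero.material")
db.add("hero.material", "hero_diffuse.png")
```

Both queries here return only direct neighbors; answering "everything Level 1 needs" means following those edges transitively.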

The database by itself isn’t quite enough for our ultimate needs. That is, we don’t know what resources to put into an installer based purely on this list of dependencies. This database will include all dependencies, even dependencies between files that aren’t meant to be installed. For example, if we had an experimental_hero.model file which depends on an experimental.tga texture, that dependency would be in the database. What the database would not necessarily tell us by itself is that finished_hero.model should be put in the installer while experimental_hero.model should not.

That said, we might still want to make a package for both of those heroes, even though only one will ultimately be distributed to our players. We’ve got two separate concerns here: what resources belong in a package vs which packages are releasable. The dependency database answers neither question directly but is required to answer the former.

Dependency “Roots” and Packages

Dependency roots are the resources that we want to ask questions about. For instance, if we’re trying to make an installer for each of our levels, the question we need to ask is, “which resources does each level require?” Even if we’re just trying to make a singular installer that includes all our ready-to-release levels, we’re asking essentially the same question, “for each of our releasable levels, which resources does each level require?”

I chose the term “root” in the sense of a tree data structure, which is perhaps a misnomer since dependencies really form a directed acyclic graph, but it suffices.

Given a dependency database, we can easily answer our dependency questions. If we want to know which resources Level One requires, and we know that Level One is represented by the resource level1.map, we need only recursively walk our dependency database starting with that resource. Look at each entry, and for every dependency not already in the result set, add that dependency and recursively query its dependencies. Done!

No, really, that’s it. See how handy that dependency database was? We didn’t have to load the actual resources (which is slow and memory intensive) because we already saved those dependencies separately.
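A sketch of that walk in Python, assuming the database has already been loaded into a plain dict mapping each resource to its direct dependencies (the file names are invented for the example). I've written it with an explicit work list rather than literal recursion so that very deep dependency chains can't hit Python's recursion limit, but the result is the same.

```python
def collect_required(database, root):
    """Return the root plus every resource reachable from it."""
    required = {root}
    pending = [root]
    while pending:
        current = pending.pop()
        for dependency in database.get(current, ()):
            # Skip anything already in the result set; this also keeps
            # shared dependencies from being walked more than once.
            if dependency not in required:
                required.add(dependency)
                pending.append(dependency)
    return required


# Hypothetical database contents: resource -> direct dependencies.
database = {
    "level1.map": ["hero.model", "crate.model"],
    "hero.model": ["hero.mesh", "shared.material"],
    "crate.model": ["crate.mesh", "shared.material"],
    "shared.material": ["shared_diffuse.png"],
}

level1_files = collect_required(database, "level1.map")
```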

The real question, though, is knowing what our “roots” are. This will be highly dependent on the game and engine. Simpler games can perhaps even have just one singular root, while more complex games (particularly “evergreen” games that are constantly releasing new content) may have specialized tools just to maintain roots.

In the simple case with a single root, this implies that every single resource the game uses is referenced by another file. A game with many levels might have a World Map for example; that World Map has icons and locations for each level and naturally needs to know which level to load when a user clicks one of those icons, so the World Map actually references the individual levels. That makes things easy; given a root of world_map.screen, dependencies on all the levels will be found, and their resources found. That World Map is likely referenced by some UI which also needs to be packaged, like MainMenu.ui. Querying the main menu in the dependency database will thus find the World Map, and hence find all the levels. The game needs to know which UI file to load when it starts up though, and it also needs to know about various other startup information like splash screen images and such, so there’s probably a configuration file like game.config that spells out which files to use. And there we go: just query the dependency database for game.config and you can find every single resource the game uses. One single root finds all resources.

That approach doesn’t work for all games, though. Let’s take the same example, but suppose that DLC is meant to unlock new areas in the world. During development, we want to see all these areas, but the shipped game should not show them unless the player has the DLC installed. It would be troublesome for a single resource like a world_map.screen to reference every level in this case. Rather, we would expect the list of levels on the World Map to be populated based on the available content. In this case, the world_map.screen resource might not directly reference all the levels, and so they cannot be found in the dependency database from that root. In this case, we’d want to add our ready-to-release levels explicitly as roots for our main installer, and then for each DLC level add them explicitly as a root to their own installer.

Whether we’re working in the former simple case or the latter complex case, the same tooling can accomplish our goals. A viable strategy is a simple file that lists the packages and their roots. A JSON file might look like:

[
    {
        "name":"game.installer",
        "roots":["game.config","level1.map","level2.map"]
    },
    {
        "name":"dlc.installer",
        "roots":["level3.map"]
    }
]

A simple Python script can read this file to find the list of installers to generate and the associated roots, then query the dependency database for each root to find the full set of required files for the installer, and call the appropriate command to build that package.
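Something along these lines would do, as a minimal sketch. The package list is inlined as a string to keep the example self-contained, DATABASE is a made-up stand-in for the real dependency database, and the final "call the packaging command" step is left out because that part is entirely engine-specific.

```python
import json

# Same shape as the package list above; normally read from a file.
PACKAGES_JSON = """
[
    {"name": "game.installer", "roots": ["game.config", "level1.map", "level2.map"]},
    {"name": "dlc.installer", "roots": ["level3.map"]}
]
"""

# Hypothetical dependency database: resource -> direct dependencies.
DATABASE = {
    "game.config": ["main_menu.ui"],
    "level1.map": ["grass.png"],
    "level2.map": ["grass.png", "snow.png"],
    "level3.map": ["lava.png"],
}


def closure(database, roots):
    """Every resource reachable from any of the given roots."""
    found = set(roots)
    pending = list(roots)
    while pending:
        for dependency in database.get(pending.pop(), ()):
            if dependency not in found:
                found.add(dependency)
                pending.append(dependency)
    return found


def plan_packages(packages_text, database):
    """Map each package name to the sorted list of files it must contain."""
    plans = {}
    for package in json.loads(packages_text):
        plans[package["name"]] = sorted(closure(database, package["roots"]))
    return plans


plans = plan_packages(PACKAGES_JSON, DATABASE)
```

From here the script would hand each file list to whatever tool actually builds the installer or archive.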

Let’s look at Unity again as an example of dependency tracking. I mentioned previously that all Unity assets under a Resources/ folder are packaged, used or unused. What about assets that aren’t under such a folder? Well, Unity only packs those if they’re required by its roots. Unity’s “roots” are configured in the Build Settings as a list of scene files. There might be a hundred .unity scene files in the asset folder for a project, but only those scenes explicitly included in the build settings are packaged. Unity then examines their dependencies and packs all of those assets, recursively. In this way, only the used assets are packaged in the game installer. Other assets can be packaged into Asset Bundles as well, based on build scripting, which is Unity’s equivalent to the JSON file snippet I presented previously.

Package Dependencies

As an aside, be careful with how you build these packages. Using the dependency database naively with multiple packages can lead to the same resource being packaged multiple times, which is wasteful. For example, the DLC for a game only works if the main game itself is installed, naturally, so there’s no reason to re-package resources used by the DLC that are shipped with the main game.

This can be accomplished with some fairly basic set theory operations. For a given package, identify which other packages it depends upon. Exclude resources included in that dependency package and its dependencies, recursively. Your first expansion requires the main game, and your second expansion might require the first.
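In Python, those set operations might look like the following sketch. I'm assuming two inputs that the text doesn't spell out: required maps each package to the full set of resources its roots need (the result of the dependency walk), and depends_on maps each package to the packages it requires. All of the names are hypothetical.

```python
def package_contents(name, required, depends_on):
    """Resources a package must ship, minus anything a prerequisite ships."""
    inherited = set()
    seen = set()
    pending = list(depends_on.get(name, ()))
    while pending:
        prerequisite = pending.pop()
        if prerequisite in seen:
            continue
        seen.add(prerequisite)
        # Everything the prerequisite needs is already on the player's
        # machine, whether it ships it itself or inherits it in turn.
        inherited |= required[prerequisite]
        pending.extend(depends_on.get(prerequisite, ()))
    return required[name] - inherited


required = {
    "game": {"hero.model", "grass.png"},
    "expansion1": {"hero.model", "grass.png", "lava.png"},
    "expansion2": {"hero.model", "lava.png", "ice.png"},
}
depends_on = {"expansion1": ["game"], "expansion2": ["expansion1"]}

expansion2_files = package_contents("expansion2", required, depends_on)
```

With this data, the second expansion ends up shipping only ice.png, since hero.model comes from the main game and lava.png from the first expansion.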

You can also be clever about splitting up packages. If you have two DLC that can be independently installed, but both depend on resources not present in the main game, you can put those “shared” DLC resources into a separate package. That allows you to avoid downloading/installing those shared resources multiple times.
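The shared package falls out of the same kind of set arithmetic. A tiny sketch, where split_shared and the resource names are hypothetical and each argument is the full set of resources that package's roots require:

```python
def split_shared(main, dlc_a, dlc_b):
    """Return (shared, a_only, b_only) resource sets for two independent DLC."""
    # Shared resources are needed by both DLC but not shipped by the main game.
    shared = (dlc_a & dlc_b) - main
    return shared, dlc_a - main - shared, dlc_b - main - shared


main = {"hero.model", "grass.png"}
dlc_a = {"hero.model", "pirate.model", "ocean.png"}
dlc_b = {"grass.png", "shark.model", "ocean.png"}

shared, a_only, b_only = split_shared(main, dlc_a, dlc_b)
```

Each DLC package would then declare a dependency on the shared package as well as on the main game.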

Source Dependencies and Baking

Thus far, we’ve only actually talked about run-time dependencies. However, that’s not even half the story for dependencies. This is an article series about asset pipelines, so it’s perhaps time - four articles in - that we start talking about the resource creation pipeline!

Before we even have resources that can be loaded by a game or distributed on install media, we first have to generate our run-time resources from our content sources. True, not all games have this distinction; some engines are perfectly capable of loading source content formats at run-time, and some games do in fact ship these kinds of resources. Others require content creators to manually export ready-to-run formats. For the most part, though, the rest of this series is going to be concerned about the case more common to “big games” which may have many gigabytes of source content which needs to be baked into run-time formats.

The biggest and hardest job of the resource pipeline in such games is actually converting all that data from source content into run-time ready formats, and doing so accurately and efficiently. Much like building source code, one essential element to efficiently baking resources is to only rebake when content has actually changed; and like source code, it’s essential that this always rebake when content has in fact changed.

Like the .d files I mentioned earlier, this is a use for dependency metadata. Via such metadata, the pipeline can track the source dependencies for a resource, and from that derive whether the baked resources are up-to-date.

Note that this source dependency metadata is different than the dependency database we talked about previously. The first difference is that the source dependency metadata needs to additionally track information like conversion time or content hashes in order to detect when files change. More notably, though, the dependencies might just be completely different!
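Here is a minimal sketch of the content-hash flavor of that check. The function names are mine, and to keep the example self-contained the read argument is any callable that returns a file's bytes; here it reads from an in-memory dict, where a real pipeline would read from disk.

```python
import hashlib


def content_hash(data):
    return hashlib.sha256(data).hexdigest()


def record_metadata(sources, read):
    """Snapshot the hash of every source dependency at bake time."""
    return {name: content_hash(read(name)) for name in sources}


def is_up_to_date(metadata, read):
    """True only if every recorded source still exists and is unchanged."""
    for name, recorded in metadata.items():
        try:
            current = content_hash(read(name))
        except (KeyError, OSError):
            return False  # a source dependency has gone missing
        if current != recorded:
            return False
    return True


# Hypothetical source files for a tileset and the loose images it references.
files = {
    "tileset.json": b'{"tiles": ["grass.png", "dirt.png"]}',
    "grass.png": b"grass v1",
    "dirt.png": b"dirt v1",
}
metadata = record_metadata(list(files), files.__getitem__)
fresh = is_up_to_date(metadata, files.__getitem__)

files["grass.png"] = b"grass v2"  # an artist edits one of the images
stale = not is_up_to_date(metadata, files.__getitem__)
```

Timestamps are cheaper to check than hashes but less reliable; which of the two a pipeline records is a trade-off I'm not trying to settle here.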

For an example, let’s look at our friend Unity, and specifically at scenes and prefabs. When Unity bakes a .unity file for release, it “flattens” any prefabs used in the scene (not those referenced in MonoBehaviour component GameObject-typed properties, but prefabs used directly in the scene hierarchy). This is an example of a source dependency; the prefabs are not needed at run-time by the final game and never need to be distributed with the game. However, changes to these prefabs would invalidate any cached baked version of the scene and require the scene to be rebaked into the Unity Library.

Another potential example would be a tileset file that describes the tile properties and references loose .png files. The pipeline might bake these into a single tilesheet which is far more efficient to both load and use at run-time. The loose .png files don’t need to be distributed to users, but changes to the .png files should be detected by the pipeline and cause a rebake of the tilesheet.

Note that source dependencies are not just a subset of run-time dependencies, either. For that Unity scene example, remember that prefabs in GameObject properties on a MonoBehaviour are not baked. This means that they are needed at run-time, but notably that the baked scene does not need to be reprocessed when those dynamically-spawned prefabs are changed.

Run-time and source dependencies are wholly independent, and may not be the same files at all even for the same source content file.

Hidden Dependencies

In a previous article I gave an example of a script or gameplay code that formulates resource identifiers on the fly, at run-time. The real problem with that code pattern may not have been entirely clear at the time, but hopefully it’s clearer now in the context of dependencies.

For a refresher, we’re talking about game code that does something like spawn_object('item_' + rand(0, 10)), i.e. randomly forming a resource identifier and loading it.

That approach is problematic because our resource pipeline can’t “see” those dependencies. The tooling can’t easily inspect the code or script and see that it references item_0 through item_9 and add them to the dependency database. And if those files aren’t in the dependency database, we can’t automatically ensure they’re put into the appropriate installer.

It is essential that the resource pipeline be able to find all dependencies, or at least all of them that are ever required for a real player to use (dev-only or debug-only resources are another story). This in turn implies that all resource references are exposed either in introspectable code or file formats, and not in Turing-complete languages with run-time only execution models.
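One way to get there, sketched below under my own assumptions, is to move the list of spawnable items out of the script and into a data file that both the game and the pipeline read. The spawner file format and the function names are invented for this example.

```python
import json
import random

# Hypothetical data file: the spawnable items are spelled out as data
# instead of being assembled from strings at run-time.
SPAWNER_JSON = """
{
    "name": "item_spawner",
    "spawnable": ["item_0", "item_1", "item_2"]
}
"""


def declared_dependencies(spawner_text):
    """What the pipeline sees: every resource the spawner may ever load."""
    return list(json.loads(spawner_text)["spawnable"])


def pick_spawn(spawner_text, rng):
    """What the game does: choose randomly, but only from declared items."""
    return rng.choice(declared_dependencies(spawner_text))


dependencies = declared_dependencies(SPAWNER_JSON)
spawned = pick_spawn(SPAWNER_JSON, random.Random(0))
```

The game still gets its random item, but the pipeline can now list every candidate without executing any gameplay code.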

Multiple Outputs and Dependencies

In the simplest of cases, when the resource pipeline is processing a resource, it is producing a single output for each input. A source .png file might be processed into an output .ddx file, for example. However, not all cases are so simple.

A single .fbx file can contain multiple meshes, animations, and material definitions. For a given engine, these might be stored in individual file resources, meaning that an input of one .fbx can create multiple output files. Each of these output files might have different run-time dependencies, too. The animations extracted from the .fbx file probably don’t depend on much, but the materials will depend on their textures. The meshes will depend on those materials, and the model as a whole probably depends on the meshes and the animations.

Those are the run-time dependencies, though. For source dependencies, there’s probably not much reason to give each output a separate set of source dependencies. If any output needs to be reprocessed, the .fbx as a whole is likely going to be reprocessed and all of its outputs recreated. This isn’t necessarily the case, but it’s a large simplification to the pipeline that will typically have little overhead in the end (though we’ll talk about some optimizations for such in a later article in this series).

The distinction to make here is that our dependency database (run-time) is only needed for the output files of the resource pipeline. It will in fact probably be generated by the resource pipeline, and if this database is truly a single file it may even be generated in a single post-process step at the end of the pipeline. The dependency metadata on the other hand is only needed for source files for the resource pipeline.

That said, the dependency metadata also has some metadata regarding outputs. The reason is that the dependency metadata needs to know what output files are produced for out-of-date checks just as it needs to know this for source dependencies; this ensures that if a user deletes or modifies an output file directly, the pipeline can detect this inconsistency and correct it.
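As a sketch of what one such metadata record could hold, assuming hashes are used for change detection: BakeRecord and needs_rebake are hypothetical names, and the short strings stand in for real content hashes.

```python
from dataclasses import dataclass


@dataclass
class BakeRecord:
    sources: dict  # source file -> hash when last baked
    outputs: dict  # output file -> hash when last baked


def needs_rebake(record, current):
    """`current` maps every file that exists right now to its hash."""
    for recorded in (record.sources, record.outputs):
        for name, old_hash in recorded.items():
            # A missing file yields None, which never matches a recorded
            # hash, so deleted outputs are caught the same way as edits.
            if current.get(name) != old_hash:
                return True
    return False


# One .fbx input producing several outputs; the hash values are placeholders.
record = BakeRecord(
    sources={"hero.fbx": "a1"},
    outputs={"hero.mesh": "m1", "hero.anim": "n1", "hero.material": "t1"},
)
on_disk = {"hero.fbx": "a1", "hero.mesh": "m1", "hero.anim": "n1", "hero.material": "t1"}
clean = not needs_rebake(record, on_disk)
```

Note that there is one record per source file, not per output: if anything is off, the whole .fbx gets reprocessed.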

Preloading Dependencies

We’ve covered both run-time and source dependencies, and thus far it’s only taken me ~3,300 words to do so. We still haven’t covered all the complexities of dependencies, however!

In a previous entry of this series about engine reference types in source code, I mentioned the concept of dependency contexts and multiple types of dependencies used at run-time, specifically around supporting pre-loading. I used the terms hard and soft to denote the differences.

The problem comes about from pre-loading. A number of engines have reason to try to know in advance what resources will be loaded. This used to be essential for spinning disc-based games to ensure the IO was optimized for the abysmally slow random read speeds of CD/DVD drives, and seemed to be getting less relevant with today’s systems all using SSDs and storage mediums with fantastic random read performance. However, with more and more games downloading content on demand over the Web, we’re back into the days where pre-loading is quite relevant, since we want to stream files down while we’re doing other loading and processing work. Even without the storage speed concerns, some engines and games benefit from pre-loading into memory any resources that they can.

This all in turn means we need to know what to pre-load. The natural choice might seem to be our dependency database, and that certainly makes sense. There’s a small flaw with this plan given the capabilities we’ve outlined so far; namely, a naive dependency database would not be able to distinguish between dependencies needed just to load a resource versus those that are eventually needed by a resource.

For example, say we have a loot_table.json file. This resource outlines a ton of different items a player might gain from treasure chests in a game. There might be a thousand entries, each a dependency on another resource which defines a specific item. What do we do with these dependencies?

For the packaging and distribution case I outlined earlier, we clearly need to make sure we package all thousand of those items. Otherwise we’d have loot table entries that won’t work outside of developers’ own machines, which would be no good. Even if we’re not using a traditional installer and are individually addressing files over a CDN, we probably want to know to download all those resources early on so we don’t have huge hitches from downloading resources mid-game where we can help it. At the very least we want to download all the resources for the current level!

For the run-time loading case, we do not want to pre-load those thousand items. Depending on the device and on how heavy an item is in the game, we may have trouble keeping all thousand items in memory. It’d certainly be slow to load all those in any case. We only want to load an item when it’s actually spawned at the time a player opens a treasure chest.

The treasure_chest.json object however is a bit more traditional. It has a dependency on the loot table that it uses for spawning items, but it also has dependencies on typical resources like its mesh, material, sound effects, and so on. Not only do we want to make sure we package and distribute the treasure chest with all of its dependencies, including the loot table and its dependencies, we also want to make sure we can pre-load those meshes and textures when loading a level that includes a treasure chest.

And there we have the difference between soft dependencies, which are only used for packaging and distribution, and hard dependencies, which are also used for pre-loading resources.

Engine and tools code will need to represent those with custom types or flags. The dependency database generated by the resource pipeline will also need to make these distinctions if that database is used by the pre-loading support in the engine. A number of games use the concept of a manifest file to denote these kinds of dependency sets, and using a dependency database to generate such a manifest is a very easy and obvious approach.
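A sketch of how that flag might be used: each edge in the database carries a kind, the packaging query follows every edge, and the pre-load query follows only hard ones. The names are hypothetical, and I'm assuming here that the chest's reference to the loot table itself is hard while the loot table's references to items are soft.

```python
from enum import Enum


class Kind(Enum):
    HARD = "hard"  # needed to load the resource; also used for pre-loading
    SOFT = "soft"  # only needed eventually; packaging and distribution only


# Hypothetical database: resource -> list of (dependency, kind) pairs.
DATABASE = {
    "level1.map": [("treasure_chest.json", Kind.HARD)],
    "treasure_chest.json": [("chest.mesh", Kind.HARD), ("loot_table.json", Kind.HARD)],
    "loot_table.json": [("sword.item", Kind.SOFT), ("shield.item", Kind.SOFT)],
    "sword.item": [("sword.mesh", Kind.HARD)],
}


def walk(database, root, follow_soft):
    found = {root}
    pending = [root]
    while pending:
        for dependency, kind in database.get(pending.pop(), ()):
            if kind is Kind.SOFT and not follow_soft:
                continue  # pre-loading stops at soft edges
            if dependency not in found:
                found.add(dependency)
                pending.append(dependency)
    return found


package_set = walk(DATABASE, "level1.map", follow_soft=True)
preload_set = walk(DATABASE, "level1.map", follow_soft=False)
```

Notice that sword.mesh is a hard dependency of sword.item yet is still not pre-loaded for the level, because the only path to it passes through a soft edge.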

Dependencies and Variants

There is one particular large monkey wrench that we have to toss into the dependency machine. Namely, build variants. These are things like Xbox vs PC builds, high-def vs standard-def content packs, and language locales. For some platforms it may even be things like 32-bit or 64-bit support or other multi-ABI support concerns.

Roughly, dependencies might change based on the active variant. A user in the US speaking English might need to load menu.en_US.ui while a user in Germany might need to load menu.de_DE.ui, as a simple example.

The dependency database could certainly encode these kinds of variants and make it possible to generate different packages or manifests for each. Even if the underlying engine solves this problem by using the same resource identifier for different variants of the same file (e.g., there’s just menu.ui which is mapped at run-time to the appropriate real file), the dependency database still needs to be cognizant that there are multiple distinct files.

The correct approach here can vary a bit by context. For entirely separate platforms, or in publishing terms, entirely separate SKUs, it may make the most sense to just have separate output files and hence different databases. This is unlikely a huge burden on most developers as they tend not to make local builds for multiple devices at a time, and CI/CD systems can cope with the extra duplication reasonably well (and we’ll go over some techniques to essentially eliminate that overhead in a later article).

For platform variants, like multiple ABIs, and for localization concerns, this is a bit trickier. We need to bake multiple versions of a resource, and we need to be able to distinctly package them (so we can make a German language download and an English language download for our game, for instance). For pre-loading purposes especially, we only want to load the active variant’s resources (locale or ABI) and not all variants’ resources.

Given the engine likely already handles remapping resource identifiers to real files, we don’t need to invent a lot of new machinery here. For the most part, we just need to be careful and cognizant that the dependency database is going to need to track both resource identifier and real file names. The latter is necessary for packaging and distribution needs while the former is needed for pre-loader and manifest generation needs.
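A small sketch of tracking both, under assumptions of my own: the table layout, the "*" key meaning "same file for every variant", and the function names are all invented for illustration.

```python
# Hypothetical table: resource identifier -> variant -> real file name.
# "*" marks a file that is shared by every variant.
VARIANT_FILES = {
    "menu.ui": {"en_US": "menu.en_US.ui", "de_DE": "menu.de_DE.ui"},
    "logo.png": {"*": "logo.png"},
}


def real_file(identifier, variant):
    """Resolve an identifier to the real file for one variant, if any."""
    files = VARIANT_FILES[identifier]
    return files.get(variant, files.get("*"))


def files_for_package(identifiers, variant):
    """Packaging works on real files, so resolve each identifier first."""
    resolved = (real_file(identifier, variant) for identifier in identifiers)
    return sorted(name for name in resolved if name is not None)


# Manifests and pre-loading keep using identifiers; packaging uses real files.
manifest_ids = ["menu.ui", "logo.png"]
german_files = files_for_package(manifest_ids, "de_DE")
english_files = files_for_package(manifest_ids, "en_US")
```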

Source metadata is (mostly) immune to these concerns, thankfully, since it is almost exclusively concerned with real files and not resource identifier abstractions (mostly).

Summary

We covered the multiple types of and uses for resource dependencies in a resource pipeline. It’s a big topic and, frankly, I was a bit confused by it all when I first started working on resource pipelines myself. In particular, it’s hard to get a sense of why things need to be so complex until actually running into all those cases.

If an engine in early development doesn’t have any notion of pre-loading, for example, it’s hard to grasp the actual need for soft vs hard dependencies. The lack thereof will be obvious and painful if, down the road, pre-loading is added to the engine.

Likewise, many game developers don’t deal with packaging and distribution, and especially in minimizing download and install size, until a fair bit later in the project; maybe not even until getting close to alpha or even public beta. An engineer working on a resource pipeline may not see some of the value of a dependency database until they start needing to prune resources from an installer and are forced to do so manually or with big lists of file whitelists or blacklists.

Especially insidious is the temptation to allow those hidden dependency problems. It just doesn’t seem like a problem at all until it is, and at that point it can take considerable time and effort to find and fix all of them.

The lesson of this entry in the resource pipeline series then is to be aware of the myriad uses for resource dependencies and to keep those in mind during all stages of development. Even if a new engine’s tools don’t need to track dependencies yet, architect with the assumption that the engine will need to do so eventually.

Game Development by Sean