Resource Pipelines (Part 2 - Reference Identifiers)

October 12, 2018

Resource Identifiers
Types of Identifiers
Pros and Cons
Usability
The Trouble With Conversion
Recommendation
Up Next

This particular entry is not going to be the most exciting. We can’t quite get into the meat of resource pipelines without getting this particular topic out of the way, though, so we’ll start here.

Series Index

Resource Identifiers

The first important element of any engine and its accompanying pipeline is determining what sorts of resource identifiers are going to be used. That is, how do the engine and its accompanying tools uniquely identify a given resource? It could be path names, it could be GUIDs, it could be content hashes, it could be anything really.

It is important to remember that a resource identifier is always an abstraction. Identifiers might look like paths in some engines most of the time, but they are not real OS paths. The reason is two-fold.

First, resource identifiers should be portable. The same identifier should work whether the game is running on Windows, Android, Xbox, or the Web. It thus cannot be an absolute OS path, and it must be cognizant of the issues with different directory separators: backslash (\) and forward slash (/).
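
As a rough illustration of the portability point, here is a minimal sketch of normalizing a path-like string into a portable identifier. The function name `normalize_resource_id` and the decision to lowercase everything are assumptions made for this example, not something any particular engine mandates.

```python
import posixpath


def normalize_resource_id(raw: str) -> str:
    """Turn a path-like string into a portable, relative identifier."""
    # Treat backslash and forward slash as the same separator.
    unified = raw.replace("\\", "/")
    # Reject absolute OS paths such as "/home/..." or "C:/...".
    if unified.startswith("/") or (len(unified) > 1 and unified[1] == ":"):
        raise ValueError(f"resource identifiers must be relative: {raw!r}")
    # Collapse "." segments, duplicate slashes, and inner ".." segments.
    normalized = posixpath.normpath(unified)
    # Anything that is empty or escapes the content root is invalid.
    if normalized in (".", "..") or normalized.startswith("../"):
        raise ValueError(f"resource identifier escapes the root: {raw!r}")
    # Assumption: identifiers are case-insensitive, so fold to lowercase.
    return normalized.lower()
```

With this sketch, "Objects\Enemies\Orc.json" and "objects/enemies/orc.json" end up as the same identifier no matter which platform produced the string.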

Second, some resources have sub-resources. For example, an .fbx file can contain multiple meshes as well as animations, among other things. Game code will often need to uniquely reference a particular sub-resource. Engines that work primarily with path-like resource identifiers will then often have some way of referencing content within a file. Examples might be to treat composite resources as directories, as in "objects/enemies/orc.fbx/attack_animation", or they might work with specially-segmented strings, as in "objects/enemies/orc.fbx:attack_animation", or they might even store the resource identifier as a composite object.
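
To make the segmented-string and composite-object options concrete, here is a small sketch that parses the ":" form into a two-part object. The names `ResourceRef` and `parse_resource_ref` are hypothetical, and the choice of ":" as the separator simply follows the example above.

```python
from typing import NamedTuple, Optional


class ResourceRef(NamedTuple):
    """Composite identifier: a file plus an optional sub-resource name."""
    path: str
    sub_resource: Optional[str]


def parse_resource_ref(identifier: str) -> ResourceRef:
    """Split 'file:sub_resource' at the first colon."""
    path, separator, sub = identifier.partition(":")
    if not path:
        raise ValueError(f"missing file part: {identifier!r}")
    if separator and not sub:
        raise ValueError(f"empty sub-resource name: {identifier!r}")
    # No colon at all means the identifier refers to the whole file.
    return ResourceRef(path, sub if separator else None)
```

Note that a segmented string like this only works if the separator can never appear in the path part, which is one more reason identifiers should not be raw OS paths (think "C:").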

Types of Identifiers

There are plenty of options, each with strengths and weaknesses. It’s entirely possible to use different kinds of identifiers in different circumstances, too. For example, Unity uses GUIDs as unique identifiers in its editor and uses a relative path as the unique identifier at run time. The new Addressables system allows Unity developers to pick their own resource identifiers, too.

In addition to a full path, an engine may use hashes of paths rather than a full path. The advantage of such an approach is that a hash is typically a 32- or 64-bit integer which can be efficiently handled in memory at run-time in the engine (even more so than a GUID), especially as they’re PODs that fit in machine registers. Such hashes are then often paired with their string representation or an index mapping the hashes back to strings for debugging purposes, which may then be stripped out of the release game builds.
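
A minimal sketch of the hashed-path idea follows, using 32-bit FNV-1a purely as an example hash. The `HashedIdTable` class is a hypothetical stand-in for the debug-only index that maps hashes back to strings; a release build would keep just the integers.

```python
from typing import Dict

FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the prime."""
    value = FNV_OFFSET_BASIS_32
    for byte in data:
        value ^= byte
        value = (value * FNV_PRIME_32) & 0xFFFFFFFF
    return value


class HashedIdTable:
    """Debug-only index mapping path hashes back to their strings."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}

    def intern(self, path: str) -> int:
        hashed = fnv1a_32(path.encode("utf-8"))
        # Remember the first string seen for this hash.
        existing = self._names.setdefault(hashed, path)
        if existing != path:
            # Two different paths with the same hash cannot both be used.
            raise ValueError(f"hash collision: {existing!r} vs {path!r}")
        return hashed

    def name_of(self, hashed: int) -> str:
        return self._names.get(hashed, f"<unknown 0x{hashed:08x}>")
```

The collision check matters more than it might seem: with 32-bit hashes and a large content set, collisions are plausible, so they are best caught by the tools at build time rather than discovered at run-time.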

Yet another potential option is to use a content hash. The resource identifier for the file "orc_diffuse.tga" would be a hash of the byte contents of the file, be that a cryptographic hash like SHA-2 or a simple hash like FNV-1a. Either way, the idea is that duplicate copies of a file will all share the same hash because their contents are identical. That is especially helpful for resources that are streamed over the Internet, such as from a CDN, and it implicitly versions the resources since any changes would result in a new hash. This scheme typically only works at run-time since the files will change so often while content creators are authoring the content.
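
Here is a hedged sketch of content-hash identifiers, using SHA-256 from Python's standard `hashlib` as the SHA-2 variant. The `build_content_store` helper and its two-dictionary layout are assumptions for illustration; a real packaging step would write blobs to disk or upload them instead.

```python
import hashlib
from typing import Dict, Tuple


def content_id(data: bytes) -> str:
    """Identify a resource by the SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


def build_content_store(
    files: Dict[str, bytes],
) -> Tuple[Dict[str, str], Dict[str, bytes]]:
    """Return (name -> content id, content id -> bytes)."""
    ids_by_name: Dict[str, str] = {}
    blobs_by_id: Dict[str, bytes] = {}
    for name, data in files.items():
        identifier = content_id(data)
        ids_by_name[name] = identifier
        # Identical contents map to the same key, so duplicates are stored once.
        blobs_by_id[identifier] = data
    return ids_by_name, blobs_by_id
```

Two files with identical bytes end up with the same identifier and a single stored blob, and editing a file yields a new identifier, which is the implicit versioning described above.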

The most important element of a resource identifier is that it uniquely references a particular resource. Whether the identifier is the string "objects/enemies/orc.json" or the GUID {e81fa7c8-7fda-4201-af92-279cc01cb8dd} or the integer 12783475, all that really matters is that the identifier is unique, and that given the identifier you can determine which resource to load.

Pros and Cons

There are some reasons to use different kinds of identifiers, of course. Paths tend to be really friendly to regular users; it’s hard to know that 12783475 is the identifier of the Orc gameplay file, but it’s super easy to remember that "objects/enemies/orc.json" is the Orc. Paths also have the advantage that the resource identifier used in the engine looks familiar even in third-party tools like Maya or File Explorer which work only in terms of OS paths.

Paths, however, have some key weaknesses. Namely, they are fragile in the face of renaming. If the Orc gameplay file references "textures/orc/diffuse.tga" and an artist renames that file to "textures/enemies/orc_diffuse.tga" then the gameplay file now has a dangling and invalid reference.

Using an abstract identifier, like a GUID, provides a means of making resource identifiers durable. Unity does this by adding an accompanying .meta file with each resource, and assumes that users renaming files will also know to rename the .meta file. This .meta file is used to associate a GUID with the resource itself, as well as carry additional Unity-specific information about the resource.

This isn’t a perfect solution, as it is entirely possible to forget to move the .meta file, or to make a copy without changing the GUID. Most third-party tools won’t know to handle these files, either.
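
To show the general shape of the sidecar approach, here is a sketch that stores a GUID in a file next to the resource. The JSON layout used here is an assumption made for the example and is not Unity's actual .meta format; `ensure_meta` and `move_resource` are hypothetical helpers.

```python
import json
import uuid
from pathlib import Path


def meta_path_for(resource: Path) -> Path:
    """'diffuse.tga' gets its sidecar at 'diffuse.tga.meta'."""
    return resource.with_name(resource.name + ".meta")


def ensure_meta(resource: Path) -> str:
    """Return the resource's GUID, creating the sidecar if it is missing."""
    meta_path = meta_path_for(resource)
    if meta_path.exists():
        return json.loads(meta_path.read_text(encoding="utf-8"))["guid"]
    guid = str(uuid.uuid4())
    meta_path.write_text(json.dumps({"guid": guid}, indent=2), encoding="utf-8")
    return guid


def move_resource(resource: Path, destination: Path) -> None:
    """Rename a resource and its sidecar together so the GUID survives."""
    ensure_meta(resource)
    meta_path_for(resource).rename(meta_path_for(destination))
    resource.rename(destination)
```

The weakness is visible right in the sketch: the GUID only survives a rename if whoever moves the file goes through something like `move_resource`. A plain rename in File Explorer leaves the sidecar behind.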

It’s still possible to handle renames while using paths as a resource identifier. The last large in-house engine with which I worked used paths (and hashes thereof) as its resource identifiers. To handle the file moving issue, the tools would create redirector files when a resource was moved or renamed. These redirectors were specially supported by the engine’s file system module and acted essentially like POSIX symbolic links. A tool run periodically would scan all resources in source control and fix up any resource identifiers that had redirectors.

That approach avoids having tons of .meta files, but is still prone to misuse. Any file renamed must have a redirector file placed in its old location for the fix-up tool to function properly. Again, third-party tools won’t know to create those.
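
The redirector idea can be sketched with a plain dictionary standing in for the redirector files on disk. This is only an illustration of the concept, not how that engine implemented it; `resolve_redirects` and `fix_up_references` are hypothetical names.

```python
from typing import Dict, List


def resolve_redirects(identifier: str, redirectors: Dict[str, str]) -> str:
    """Follow old -> new redirectors until reaching a real resource."""
    seen = set()
    current = identifier
    while current in redirectors:
        if current in seen:
            # A -> B -> A would otherwise loop forever.
            raise ValueError(f"redirector cycle at {current!r}")
        seen.add(current)
        current = redirectors[current]
    return current


def fix_up_references(references: List[str], redirectors: Dict[str, str]) -> List[str]:
    """What a periodic fix-up tool might do to one resource's references."""
    return [resolve_redirects(reference, redirectors) for reference in references]
```

Once every reference has been fixed up, the redirector for an old path is no longer needed and could be deleted, which keeps chains from growing indefinitely.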

For these reasons, it’s often a bit of a toss-up as to which resource identifier scheme is best. Extra tooling and education are going to be required for content creators to work efficiently and reliably. Choosing an identifier scheme will thus rely a lot on purpose and workflow, such as whether a CDN will be used for content delivery.

Usability

Path names certainly represent the most convenient resource identifier for regular users, at least in basic cases. The need to care about sub-resources adds some complication, but nothing difficult to handle.

Using abstract resource identifiers is a lot more complicated. Users are not going to remember the GUID of a resource. Users’ third-party tools are going to be working with paths even if the engine isn’t.

It will be important for the engine’s content creation tools to be able to map back and forth between raw OS paths and whatever resource identifiers it uses. This mapping may not be needed at run-time for all engines, but it’ll certainly be needed during development.
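
As a sketch of that two-way mapping, here is a small in-memory index between GUIDs and paths. The `AssetIndex` class is hypothetical; real tools would persist this and keep it in sync with the file system.

```python
from typing import Dict


class AssetIndex:
    """Two-way lookup between abstract identifiers and paths."""

    def __init__(self) -> None:
        self._path_by_guid: Dict[str, str] = {}
        self._guid_by_path: Dict[str, str] = {}

    def register(self, guid: str, path: str) -> None:
        if guid in self._path_by_guid or path in self._guid_by_path:
            raise ValueError(f"duplicate registration: {guid} / {path}")
        self._path_by_guid[guid] = path
        self._guid_by_path[path] = guid

    def rename(self, old_path: str, new_path: str) -> None:
        """Move a resource; its GUID, and so every reference, stays valid."""
        if new_path in self._guid_by_path:
            raise ValueError(f"path already in use: {new_path}")
        guid = self._guid_by_path.pop(old_path)  # KeyError if unknown
        self._guid_by_path[new_path] = guid
        self._path_by_guid[guid] = new_path

    def path_for(self, guid: str) -> str:
        return self._path_by_guid[guid]

    def guid_for(self, path: str) -> str:
        return self._guid_by_path[path]
```

Tools can then show users the path while storing the GUID, so a file picker or error message never has to display a raw identifier.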

The point being that even when you work to avoid paths because of their instability, you’re still going to need to deal with them.

The Trouble With Conversion

This is part of a larger topic we’ll cover in its own installment later, but it’s worth bringing up now since it ties directly into resource identifiers.

During resource conversion, the type of a file will very probably change. Textures will be converted from PNG or TGA files into something like compressed DXT or ETC files. Text gameplay resources in JSON or XML will be converted into binary formats like BSON or MessagePack.

Typically of course we expect a file extension to match the file type. This poses a problem during conversion where file types are changed. If a source file is "textures/orc/diffuse.tga" and it’s converted into an ETC file, it might make sense to name that output "textures/orc/diffuse.etc".

This is another rename problem. If source files are referencing files by their unconverted path name and the name packaged for distribution is different, this naturally can break references if path names are used as resource identifiers.

There are various solutions available for this. Some engines that rely on paths just don’t rename files after conversion, so the converted version of a TGA file keeps the .tga extension even if the converted file isn’t actually a TGA anymore. Other options include maintaining a manifest that maps from the source file names to the converted file names; this is a necessary solution when using GUIDs or other abstract resource identifiers anyway, of course, so it isn’t quite the implementation burden it might seem.
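
The manifest option might look something like the following sketch. The extension table is made up for the example (".etc" follows the naming used above, ".msgpack" is an assumption), and `build_manifest` is a hypothetical helper rather than part of any real pipeline.

```python
import posixpath
from typing import Dict, Iterable

# Hypothetical mapping from source extension to converted extension.
CONVERTED_EXTENSIONS = {".tga": ".etc", ".png": ".etc", ".json": ".msgpack"}


def build_manifest(source_ids: Iterable[str]) -> Dict[str, str]:
    """Map each source identifier to the name of its converted output."""
    manifest: Dict[str, str] = {}
    outputs: Dict[str, str] = {}
    for source_id in source_ids:
        stem, extension = posixpath.splitext(source_id)
        # Unknown extensions are passed through unchanged.
        output = stem + CONVERTED_EXTENSIONS.get(extension.lower(), extension)
        if output in outputs:
            # e.g. diffuse.tga and diffuse.png both becoming diffuse.etc
            raise ValueError(f"{source_id!r} and {outputs[output]!r} both produce {output!r}")
        outputs[output] = source_id
        manifest[source_id] = output
    return manifest
```

Resources keep referring to "textures/orc/diffuse.tga", and the run-time consults the manifest to find out that the bytes actually live in "textures/orc/diffuse.etc".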

Recommendation

Resource identifiers affect various parts of the engine pipeline and tooling. I believe it’s fair to say that just about anything works and that you can’t go completely wrong with any choice. That said, it would be relatively unhelpful of me to not give some kind of recommendation, if for nothing else than as a starting point.

My recommendation for newer tech would be to stay compatible with Unity, i.e. using .meta files, which gives a number of advantages. An ecosystem and tooling already exist that treat those files specially because of Unity’s presence. Most of those tools don’t actually care about the specific contents of the files, either, which provides flexibility. If you’re in a pinch and not sure, you could do a lot worse than using GUIDs stored in .meta files next to the real source resources.

That said, my actual preference is a tad more complicated. I find working with paths and hashes a lot more natural during development. While there are complications in dealing with renames, the problem is not insurmountable.

Up Next

In the next installment of this series, we’ll talk about the engine side of resource identifiers. Namely, how identifiers are represented in code and how identifiers should be represented in resources.

Game Development by Sean