This is the second article in my Zero to Sixty Game Engine Development series.

Find the introduction article here: Zero to Sixty Game Engine Development Series.

The first choices we need to make concern which common software design patterns we will use. We will make these choices while keeping in mind our desire to get team collaborators and content creators working with us quickly.

As stated in the introduction article, we also do not want to sacrifice performance, or the abstractions that enable us to maintain both engine and client code.

Dependency Management

There are a few top-level components that must be acquired at system startup. Namely, these are the Platform and the Graphics Device abstractions. These can go by other names. For our system, we will refer to the Graphics Device as simply the Device.

The Platform component wraps some of the basic boilerplate that each GUI operating system needs, such as the message processing loop and the desktop window (or mobile surface).

It is important that we do not bring in too many unnecessary system dependencies via public include files.

If we end up putting a HWND hwnd private member variable inside of a public header file, then we end up polluting the engine and client code with #include "windows.h". Also, how would we go about making a Platform implementation for Mac or Linux, with "windows.h" in the header?

There are several solutions to this. The Pimpl and the handle/body idioms, as well as all of their variants, work well. These are also referred to as compilation firewalls. See GotW #24 and GotW #100.

These design patterns will keep the HWND type and windows.h include file hidden inside a C++ source module that is only compiled for the Windows platform.

This keeps our dependency leakage and bloat to a minimum and keeps compile times from getting out of hand.

We will apply these patterns to the Platform and Device abstractions, as well as many other classes throughout the engine.
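As a minimal sketch of the Pimpl idiom applied to Platform (the names and members here are illustrative assumptions, not the engine's actual declarations), both "halves" are shown in one listing; in the real engine the second half would live in a Windows-only source file:

```cpp
// Platform.h (public): forward-declares Impl; no OS headers leak out.
#include <memory>
#include <string>

namespace zto60 {

class Platform {
public:
    explicit Platform(std::wstring title);
    ~Platform();
    const std::wstring& Title() const;

private:
    struct Impl;                 // defined only in the platform-specific .cpp
    std::unique_ptr<Impl> impl_;
};

} // namespace zto60

// Platform_win32.cpp (private): the only file that would include <windows.h>.
// Here we fake the handle type so this sketch compiles anywhere.
using HWND = void*;              // stand-in for the real Win32 typedef

namespace zto60 {

struct Platform::Impl {
    std::wstring title;
    HWND hwnd = nullptr;         // the OS handle never appears in the header
};

Platform::Platform(std::wstring title) : impl_(new Impl{std::move(title)}) {}
Platform::~Platform() = default; // Impl is complete here, so this compiles
const std::wstring& Platform::Title() const { return impl_->title; }

} // namespace zto60
```

Because the destructor is defined in the source file, `std::unique_ptr<Impl>` works with the incomplete type in the header.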

Another dependency management technique is less of a C++ thing and involves dynamically loaded libraries, such as Dynamic Link Libraries (DLLs) on Windows and Shared Object (.so) libraries on *nix variants.

On some platforms, such as Windows, client code can choose to use either DirectX or OpenGL. We want to avoid loading all of the engine code and system libraries for these APIs unless the client code wants to use a particular implementation.

We can accomplish this by putting different Platform and Device implementations into dynamic libraries that are loaded at runtime. However, to get started quickly, we will keep everything in one library.

Class Construction

Class construction can be done in a number of ways. Both the RAII idiom and the Factory Method pattern are useful.

The factory method pattern will help us integrate our engine into many other environments. For example, if we want to use our rendering algorithms in the Unity 3D editor, we can do so by attaching an underlying hardware API device (e.g. ID3D11Device provided by Unity) to an instance of our engine’s Device.

I generally find custom factory methods more flexible than coupling a class type to its construction through its constructors. Sometimes I go even further than the factory method and implement a full Builder pattern for the construction of some classes.
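A small sketch of the two factory paths described above (the names `Create`, `Attach`, and the stub type are assumptions for illustration, not the engine's real API):

```cpp
// Stand-in for a real ID3D11Device handed to us by a host such as Unity.
struct ID3D11DeviceStub {};

class Device {
public:
    // Factory: the engine creates and owns the underlying API device.
    static Device* Create() { return new Device(/*external=*/nullptr); }

    // Factory: wrap an API device owned by a host application instead.
    static Device* Attach(ID3D11DeviceStub* external) { return new Device(external); }

    bool OwnsApiDevice() const { return external_ == nullptr; }

private:
    // Private constructor: clients must go through the factory methods.
    explicit Device(ID3D11DeviceStub* external) : external_(external) {}
    ID3D11DeviceStub* external_;
};
```

Keeping the constructor private means the choice between "create" and "attach" lives in named factory methods rather than in overloaded constructors, which reads more clearly at call sites.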

For consistency, and to allow for future integration with different memory managers, I prefer all core engine types to be dynamically allocated from a heap, as opposed to directly on the stack. The core engine types, such as Platform, Device, Renderer and Scene can be quite large and take a lot of stack space.

Polymorphism

There are two broad categories of polymorphism: dynamic and static.

Dynamic polymorphism involves C++ interfaces, abstract base classes, and concrete derived classes. Think virtual methods that can be overridden by derived classes. Classes, in C++, couple both the class interface and its representation. Basic inheritance therefore inherits both.

In our engine, we will use dynamic polymorphism for most classes. For some classes, such as graphics resources, we may use other techniques for code reuse and decoupling mechanisms.
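As a minimal illustration of dynamic dispatch (the types here are placeholders, not engine classes), calls through the base pointer resolve at runtime via the vtable:

```cpp
#include <string>

// Abstract base class: the interface.
class Resource {
public:
    virtual ~Resource() = default;
    virtual std::string Describe() const = 0;
};

// Concrete derived classes: the representations.
class Texture : public Resource {
public:
    std::string Describe() const override { return "texture"; }
};

class Buffer : public Resource {
public:
    std::string Describe() const override { return "buffer"; }
};
```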

The overhead of virtual function calls gets discussed frequently in the C++ community for high-performance applications. We will not worry about that as we get started. We can profile and let actual measurements guide our architectural changes over time.

For static polymorphism, where the type and behavior of a class is determined at compile-time, the Curiously Recurring Template Pattern (CRTP) is quite flexible. This pattern is especially useful with modern APIs, such as DirectX 12 and Vulkan, because these APIs require a lot more boilerplate code. The CRTP can be used to inject algorithms that are common to these classes across hardware APIs.

I find that CRTP is a good method for ensuring a contract is upheld by all concrete implementations of a class. Interfaces and inheritance do not often give the same contract assurances that CRTP can. Also, CRTP avoids the virtual function call overhead that interfaces and abstract base classes bring.
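Here is a small CRTP sketch of that idea (the class names are illustrative, not the engine's real types): the base template implements a shared algorithm and calls into the derived class without any virtual dispatch, and a derived class that forgets to provide `ApiName` fails to compile, which is the contract assurance mentioned above.

```cpp
#include <string>

template <typename Derived>
class TextureBase {
public:
    // Common boilerplate shared across hardware APIs.
    std::string Upload() {
        // Statically dispatches to the derived class; no vtable involved.
        return "uploading via " + static_cast<Derived*>(this)->ApiName();
    }
};

class D3DTexture : public TextureBase<D3DTexture> {
public:
    std::string ApiName() { return "DirectX"; }
};

class GLTexture : public TextureBase<GLTexture> {
public:
    std::string ApiName() { return "OpenGL"; }
};
```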

It is worth mentioning that dynamic polymorphism can also be achieved using the Component Object Model (COM). This model was first implemented at scale at Microsoft with their Object Linking and Embedding (OLE) and ActiveX technologies. Early versions of Netscape and Firefox also employed a similar technique, referred to as XPCOM. These techniques are quite useful, but can lead to additional maintenance overhead. We can always move to a system like COM, if necessary.

Threaded Rendering

Many graphics engines employ the Command pattern so that the client code can queue up render commands on threads other than the render thread. This can often lead to better CPU utilization, especially with legacy graphics APIs, such as OpenGL and DirectX 11. In these APIs, the driver does a lot of validation in many of the API calls. So, leveraging a separate thread for this work can be a benefit.
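The Command pattern here can be as simple as the following sketch (an assumption for illustration, not the engine's API): commands are recorded on one thread and executed later on the render thread, with any cross-thread synchronization left to the caller.

```cpp
#include <functional>
#include <vector>

// Records render commands for deferred execution on the render thread.
// Deliberately not internally synchronized; synchronization is external.
class CommandQueue {
public:
    void Push(std::function<void()> cmd) { commands_.push_back(std::move(cmd)); }

    // Called on the render thread: run and discard all recorded commands.
    void Execute() {
        for (auto& cmd : commands_) cmd();
        commands_.clear();
    }

private:
    std::vector<std::function<void()>> commands_;
};
```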

However, with modern graphics APIs, the situation is a little bit different. Multiple threads can be utilized to fill command queues, in parallel. These command buffers can then be submitted to separate queues on different threads with hardware-supported synchronization primitives protecting shared access to resources.

We do not want to prevent either of these performance optimization possibilities.

On the other hand, I am a firm believer that good multi-threaded systems and applications should be able to scale down to a single thread as well as up to many threads.

Therefore, we will first focus on single-threaded client rendering, in order to make fast progress. In future iterations, we can layer threaded rendering by allowing the client to create Graphics Context objects for different threads.

The result of all of this is that we will do very little internal synchronization within our graphics engine. Synchronization is expected to be external.

Object Lifetime Management

One way to manage object lifetime is via reference counting. There are two mechanisms for reference counted objects: intrusive and non-intrusive.

An example of an intrusive implementation is Microsoft’s COM API, with its IUnknown interface and AddRef and Release methods. A non-intrusive example is std::shared_ptr.

The COM API has been quite successful. Microsoft used a slimmed down version of COM (a.k.a. nano-COM and COM-Lite) for DirectX. Mozilla implemented a cross-platform version called XPCOM for the Firefox browser. However, XPCOM is no longer supported.

Performance profiles of Firefox during the 2010s revealed that excessive use of COM and COM smart pointers was a significant source of overhead. This is largely due to the overuse of COM for every single little component. Just take a look at how a COM object was used even to encapsulate a single parameter value: class ia2AccessibleValue. Even if we were to choose COM for our engine, we will not take it this far. See Diligent Engine for a good example of a custom COM-Lite in-use for a graphics engine.

On the other hand, a major critique of std::shared_ptr is that one pays the cost of multi-threaded memory synchronization even when used in single-threaded contexts. We want our engine to run well in both single-threaded and multi-threaded scenarios.

This can be mitigated by avoiding copy-by-value std::shared_ptr semantics at interface boundaries. However, most of our engine objects will not be internally synchronized anyway. So, the benefit of a synchronized smart pointer does not seem worth it.

On the other hand, we could make use of std::unique_ptr to signal ownership and uniqueness of an object or resource. These would be efficiently movable, but non-copyable, types.

However, using std::unique_ptr requires the full type definition of the underlying object everywhere it is returned from functions. This can interfere with compilation firewalls and cause more type coupling than desired in the public interfaces.

In conclusion, when creating most heavyweight resource objects, we will return raw pointers and leave it up to client code to wrap them in std::unique_ptr or std::shared_ptr, with custom deleters, as needed.

In high-performance systems, I prefer not to use a single solution for automated lifetime management in all situations. Also, I prefer to use smart pointers only at major system boundaries, such as between the rendering system and a game world editor, for example.

Resource Destruction

Hardware resources, such as textures and buffers, need to stay resident while they are in use by the graphics hardware. However, we want our engine objects to destroy themselves when they go out of scope.

A common pattern we will employ for these is a deletion queue, which ensures that the underlying resource is no longer in use by the hardware before it is destroyed.

An instance of the wrapping class, however, can be destroyed immediately by our single-threaded client code. Its destructor must move the hardware resources into a deferred-destruction object that is processed later by the deletion queue.
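A minimal sketch of such a deletion queue follows; the names and the frame-counting scheme are assumptions for illustration, not the engine's real API. Handles are deferred with the last frame that used them and released only once the GPU has completed that frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

struct GpuHandle { std::uint64_t id; };

class DeletionQueue {
public:
    // Called from a wrapper's destructor: park the handle instead of freeing it.
    void Defer(GpuHandle h, std::uint64_t lastUsedFrame) {
        pending_.push_back({h, lastUsedFrame});
    }

    // Destroy everything the GPU has finished with (completedFrame and earlier).
    std::size_t Flush(std::uint64_t completedFrame) {
        std::size_t destroyed = 0;
        while (!pending_.empty() && pending_.front().frame <= completedFrame) {
            // Real code would release the underlying API object here.
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    struct Entry { GpuHandle handle; std::uint64_t frame; };
    std::deque<Entry> pending_;   // entries stay ordered by frame number
};
```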

Backend vs. Frontend

The engine code is loosely separated between backend and frontend systems.

The backend systems are those that communicate directly with OS and hardware interfaces, such as DirectX, OpenGL, or the Windows API. These are Platform and Device.

The frontend is where our simulation algorithms are implemented; it is OS and hardware agnostic. It consists of Engine, Renderer, Scene, and View.

We could start our engine’s design and implementation from either side. Starting from the backend ensures that porting and supporting multiple platforms is a possibility. However, this can sometimes result in leaky backend abstractions.

We could also start from the frontend. This has the advantage that it forces us to decide on the simulation or game architecture that we want to build.

Personally, I like to bounce between the two and refine both in an iterative fashion. This makes sure that any architecture decision in either backend or frontend is validated by the systems depending on those choices as early as possible.

Example Client Code

Now for some code. Here, we will look at what these decisions mean for the client API code. Below is a basic setup of the engine along with the rendering and animation of a textured cube to a window.

#include "zto60/Platform.h"
#include "zto60/Engine.h"

#include <glm/vec3.hpp>

using namespace zto60;

int main()
{
    // Create default platform and the main window for the OS, given the window title
    auto platform = Platform::Create(L"Zero to Sixty");

    // Create default graphics device type for our platform
    auto device = platform->CreateDevice();

    // Create the engine frontend
    auto engine = Engine::Create(platform, device);

    // Create the Renderer
    auto renderer = engine->CreateRenderer();

    // Create a Scene
    auto scene = engine->CreateScene();

    // Construct a View, with a default perspective camera
    auto view = engine->CreateView();

    // Process platform message loop, animate, and render the scene using the view
    int result = platform->Run([&]()
        {
            if (renderer->BeginFrame())
            {
                renderer->Render(scene, view);
                renderer->EndFrame();
            }
        });

    // Destroy all objects in the reverse order
    engine->Destroy(view);
    engine->Destroy(scene);
    engine->Destroy(renderer);
    Engine::Destroy(engine);
    platform->Destroy(device);
    Platform::Destroy(platform);

    return result;
}

Notice how most of the design patterns are under-the-hood and not entirely visible in the client code. The backend Platform and Device objects are constructed from factory methods.

The same is true for the frontend Engine, Renderer, Scene, and View objects.

In the client code, we could wrap our objects in std::unique_ptr with a custom deleter. That way, we could directly return the result of calling the platform Run method and not worry about object destruction order.
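Such a wrapper could be sketched like this (the stand-in Engine and Scene types below are simplified placeholders, not the engine's real definitions): the custom deleter routes destruction back through the engine, so scope exit handles cleanup in the correct order.

```cpp
#include <functional>
#include <memory>

// Minimal stand-ins for the engine types used in the client code above.
struct Scene { };
struct Engine {
    int liveScenes = 0;
    Scene* CreateScene() { ++liveScenes; return new Scene(); }
    void Destroy(Scene* s) { --liveScenes; delete s; }
};

// A unique_ptr whose custom deleter routes destruction through the engine.
using ScenePtr = std::unique_ptr<Scene, std::function<void(Scene*)>>;

ScenePtr MakeScene(Engine& engine) {
    return ScenePtr(engine.CreateScene(),
                    [&engine](Scene* s) { engine.Destroy(s); });
}
```

Note the deleter captures the engine by reference, so the wrapped Scene must not outlive the Engine, which matches the reverse-order destruction shown in the client code.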

The Scene object will initially be hardcoded with a spinning, textured cube pushed into the front of the default camera.

In the next article, we will implement this much of the engine and run the spinning, textured cube across Microsoft Windows and Linux, with two different graphics device APIs: OpenGL and DirectX 11.