While I was reading Gabriel Sassone's blog posts about shader systems, I noticed some links to the Our Machinery blog. Then I found out that The Machinery is a new game engine project from the guys behind the Bitsquid/Stingray engine.

What I liked most is that they always lay out the advantages and disadvantages of each decision they make in their blog posts. You can smell the experience in each post. Then I was like: "This is amazing, I should do something like this!" So I decided to make a game engine project based on ideas from The Machinery blog posts.

When I started this project and this blog post, The Machinery was still in closed beta. Not long ago, it went into open beta. This made this "challenge project" a little easier, because now I have a second source of inspiration: The Machinery source code. (well... headers only but that's okay)

Philosophy

The Machinery is full of good ideas, but the main philosophy is simplicity. All of the ideas around it are based on that principle. They use C instead of a more complicated language. They write easy-to-configure, hackable code instead of a black-box system. I decided to go with C++ because I want some of its features, but I will be using their C-based API style.

Plugin system

One of the major features of The Machinery is its plugin system. That is where the name comes from: little plugins work together and bring a much more complex thing to life. Each plugin is a shared library that can be loaded at runtime. This isn't new for game engines. Most of them use shared libraries for hot-reloading code, bringing down compile times, etc. But they are generally limited. The common use case is that the game code is a shared library, and if you change anything in the game code, you can reload the shared library and see the result instantly. In The Machinery's case, every system is a shared library/plugin. You can change not only game code but also engine code and hot-reload it.

I implemented something similar to the system they described in their blog posts. Each system is a plugin, and a system called APIRegistry is responsible for loading and registering plugins. APIRegistry is a much more complex thing in The Machinery; I implemented a very basic version of it.
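To make the idea concrete, here is a minimal sketch of what such a registry could look like. This is not The Machinery's code and not my full implementation; `LogAPI`, `load_log_plugin` and the `set`/`get` signatures are names I made up for illustration. In a real plugin the load function would be exported from the DLL and looked up at runtime instead of being called directly.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <unordered_map>

// Hypothetical API struct exposed by a "log" plugin:
// nothing but plain function pointers, C style.
struct LogAPI {
    int (*message_length)(const char* text);
};

// Very basic registry: maps an API name to a pointer to its struct.
struct APIRegistry {
    std::unordered_map<std::string, void*> apis;

    // Called by a plugin to publish its API under a name.
    void set(const char* name, void* api) { apis[name] = api; }

    // Called by users of an API; returns nullptr if nothing was registered.
    void* get(const char* name) const {
        auto it = apis.find(name);
        return it == apis.end() ? nullptr : it->second;
    }
};

// --- what would live inside the plugin (hypothetical) ---
static int log_message_length(const char* text) {
    return static_cast<int>(std::strlen(text));
}

static LogAPI g_log_api = { log_message_length };

// Stand-in for the plugin's exported load function.
inline void load_log_plugin(APIRegistry& registry) {
    registry.set("LogAPI", &g_log_api);
}
```

Other plugins then only need the `LogAPI` struct declaration and the name `"LogAPI"`; they never link against the plugin that implements it.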

DLL Hotreloading and ClangCL

Implementing a hot-reload DLL system on Windows is unnecessarily complicated. For that reason, I decided to use ClangCL in this project. It is the MSVC-compatible Clang compiler. It works well with Visual Studio, MSBuild, etc. You can download ClangCL from the Visual Studio Installer.

The first problem I encountered is that premake's ClangCL support for vs2019 doesn't work. I opened an issue on their GitHub and it got solved, but the fix hasn't been released yet. If you are using Alpha 15 or a lower version, you can get ClangCL support by setting toolset to msc-ClangCL.

I can tell you this: If you are planning to implement DLL hot-reloading, you have to use ClangCL. It solves most of the bullshit you have to deal with when using MSVC. With ClangCL, there is no PDB locking. You can overwrite the PDB file without detaching and reattaching the debugger. (I'm not 100% sure how it does this, but I think it creates temp PDB files for the old PDB and unlocks the actual PDB) You don't have to deal with random PDB names and other kinds of stupid work.

I wasn't sure how the Visual Studio debugger would react to this type of behavior, but it works like a charm. It loads the DLL and the PDB correctly, and breakpoints work after a hot-reload. I can't say the same for the VSCode debugger. It works when you first hit a breakpoint, but after a hot-reload breakpoints don't work anymore. I opened an issue for this problem too, but it is still under investigation. (or forgotten)

There are some things that haven't been mentioned in the Our Machinery blog. One of them is plugin dependencies. Some plugins need other plugins to work, but The Machinery scans a plugin directory and loads all the DLLs. This means that plugins can be loaded in any order. When I got my hands on the source code (headers, actually), the documentation said this:

// Calling `get()` on an API that hasn't been loaded yet will return a struct of NULL function
// pointers. When the API is loaded (and calls `set()`), these NULL pointers will be replaced by
// the real function pointers of the API.
//
// To test whether an API has been loaded, you can test if it contains NULL function pointers or
// not.
void *(*get)(const char *name);

It returns a valid pointer that points to a struct of null function pointers. But I'm not sure how the APIRegistry knows the size of the requested API struct before it is registered.
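I don't know how The Machinery solves this, but one way I can imagine is to sidestep the size question entirely: give every API name a fixed-size, zero-filled slot and copy the real struct into it when the plugin registers. The sketch below is only that guess. `PlaceholderRegistry`, `kMaxAPIBytes`, `MathAPI` and the `set` signature with a size parameter are all my own assumptions, and it relies on a null function pointer being all zero bytes, which holds on the usual desktop platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Assumption: no API struct is ever larger than this.
constexpr std::size_t kMaxAPIBytes = 512;

// Zero-initialized storage, so every function pointer in it starts out null.
struct APISlot {
    alignas(std::max_align_t) unsigned char bytes[kMaxAPIBytes] = {};
};

struct PlaceholderRegistry {
    // std::map never moves its nodes, so pointers handed out by get()
    // stay valid when more names are added later.
    std::map<std::string, APISlot> slots;

    // Always returns a valid pointer; creates a zeroed slot on first use.
    void* get(const char* name) { return slots[name].bytes; }

    // Copies the real function pointers over the placeholder in place.
    bool set(const char* name, const void* api, std::size_t size) {
        if (size > kMaxAPIBytes) return false;
        std::memcpy(slots[name].bytes, api, size);
        return true;
    }
};

// Hypothetical API used to demonstrate load-order independence.
struct MathAPI {
    int (*add)(int a, int b);
};

static int math_add(int a, int b) { return a + b; }
```

With this layout a plugin can call `get()` before its dependency is loaded, keep the pointer, and test for null function pointers later, which matches the behavior the header comment describes.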

Physical Design

I think the boldest move in The Machinery is their physical design, which is essentially just one rule:

Header files cannot include other header files. (except some basic type defining headers)

This looks infeasible at first glance. How can I use some other class/struct from other headers in a header if I can't include it? Well, the answer is simple apparently: pointers.

Compilers need some information (size, etc.) about a struct when it is passed by value as a function argument or used as a member of another struct. But if you only use a pointer to that struct, the compiler doesn't need to know its size, because the pointer size is fixed. You can get away with just declaring the type instead of including its header. This is called a forward declaration:

struct SomeType;

struct SomeAPI
{
    void SomeFunction(SomeType* object);
};
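Here is a slightly fuller sketch of the same idea in the C-based API style, showing both sides: the part that would go in a header and the part that would go in the implementation file. `Texture`, `TextureAPI` and the function names are made up for this example.

```cpp
#include <cassert>

// --- what the header would contain: no includes, only a forward declaration ---
struct Texture;  // incomplete type, its size is unknown here

struct TextureAPI {
    Texture* (*create)(int width, int height);
    int      (*width)(const Texture* texture);
    void     (*destroy)(Texture* texture);
};

// --- what the implementation file would contain ---
// Only this file knows the layout of Texture.
struct Texture {
    int width;
    int height;
};

static Texture* texture_create(int width, int height) {
    return new Texture{width, height};
}

static int texture_width(const Texture* texture) { return texture->width; }

static void texture_destroy(Texture* texture) { delete texture; }

// The filled-in API struct a plugin would hand to the registry.
static TextureAPI g_texture_api = { texture_create, texture_width, texture_destroy };
```

Users of `TextureAPI` can pass `Texture*` around freely but cannot touch its members, which is also a nice side effect for encapsulation.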

In their blog, they talk about some of the drawbacks of this design. It makes it hard to use templates, inline functions, etc. For that, they use .inl files.

.inl files are header files that can include other headers.

While they are used very sparingly, I feel like this is still cheating. So, I came up with a different solution: I decided to use a precompiled header for templates and inline functions. Since there is a rule that you have to include the precompiled header first, we can put our type definitions there too. And the rule becomes this:

Header files cannot include other header files. (even some basic type defining headers)

You can put all of your templated structs, atomics, intrinsics, math functions and other things to be inlined into the precompiled header. And you can also leverage pch for build speed, you can put your big-boi headers there.

To be able to do that, we need a shared precompiled header for all of our plugins. This shouldn't be that hard, right? Well, yes, but not with MSVC, because it requires copying the PDB and IDB files of the pch; otherwise MSBuild deletes the .pch file for some reason. They posted a blog post about how to implement a shared pch with MSVC. This is the reason why I switched to ClangCL: I'm tired of dealing with MSVC bullshit.

One small note for shared pch is that premake doesn't have a way to set the pchpath like symbolspath. But you can use overriding for this:

-- Set every project's pch output to the same shared .pch file
if _ACTION:startswith("vs") then
    require("vstudio")

    local function precompiledHeaderOutputFile(prj)
        premake.w('<PrecompiledHeaderOutputFile>$(OutDir)sharedpch.pch</PrecompiledHeaderOutputFile>')
    end

    premake.override(premake.vstudio.vc2010.elements, "clCompile", function(base, prj)
        local calls = base(prj)
        table.insert(calls, precompiledHeaderOutputFile)
        return calls
    end)
end

With this shared pch implementation, I can create headers that don't include any headers without any exception. But it has an obvious drawback: If you change anything in the pch, it will trigger a full rebuild of your entire project. But I decided to use it anyway. As long as we don't change the precompiled header too often, we are good.

Memory allocation

There isn't an Our Machinery blog post dedicated to memory allocation, but they have amazing posts about containers and data structures. In this project, I decided to implement my own allocators and containers and not use the STL. Then I read their virtual memory blog post and implemented a linear allocator with the technique described there: reserve a big chunk of the 64-bit address space and commit pages when you need memory. (duh) My implementations of dynamic array and hash table use this linear allocator.

Generally, a dynamic array's push_back takes amortized constant time, because it has to relocate its contents whenever it needs to grow. With this implementation, push_back never relocates anything: when the array grows, you just commit more memory from the reserved address space, so existing elements stay where they are and the only extra cost is the occasional commit call.
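A minimal sketch of the reserve/commit technique could look like the following. This is not my actual allocator: the names (`LinearAllocator`, `linear_alloc`, `kCommitStep`), the 16-byte alignment and the 64 KiB commit step are assumptions for the example. On Windows it uses `VirtualAlloc`; I added an `mmap`/`mprotect` path only so the sketch also builds elsewhere. Releasing the memory is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Reserve address space without backing it with usable memory yet.
static void* vm_reserve(std::size_t size) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);
#else
    void* p = mmap(nullptr, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Make a page-aligned part of a reserved region readable and writable.
static bool vm_commit(void* ptr, std::size_t size) {
#ifdef _WIN32
    return VirtualAlloc(ptr, size, MEM_COMMIT, PAGE_READWRITE) != nullptr;
#else
    return mprotect(ptr, size, PROT_READ | PROT_WRITE) == 0;
#endif
}

constexpr std::size_t kCommitStep = 64 * 1024;  // assumed commit granularity

struct LinearAllocator {
    std::uint8_t* base = nullptr;
    std::size_t reserved = 0;   // total reserved address space in bytes
    std::size_t committed = 0;  // bytes that are actually usable
    std::size_t used = 0;       // bytes handed out so far
};

inline std::size_t round_up(std::size_t value, std::size_t step) {
    return (value + step - 1) / step * step;
}

inline bool linear_init(LinearAllocator& a, std::size_t reserve_bytes) {
    a.reserved = round_up(reserve_bytes, kCommitStep);
    a.base = static_cast<std::uint8_t*>(vm_reserve(a.reserved));
    a.committed = 0;
    a.used = 0;
    return a.base != nullptr;
}

// Bump allocation: grows by committing more pages, never by moving data.
inline void* linear_alloc(LinearAllocator& a, std::size_t bytes) {
    std::size_t new_used = a.used + round_up(bytes, 16);
    if (new_used > a.reserved) return nullptr;  // out of reserved space
    if (new_used > a.committed) {
        std::size_t new_committed = round_up(new_used, kCommitStep);
        if (!vm_commit(a.base + a.committed, new_committed - a.committed)) return nullptr;
        a.committed = new_committed;
    }
    void* result = a.base + a.used;
    a.used = new_used;
    return result;
}
```

A dynamic array built on top of this keeps its base pointer forever, so pointers to elements stay valid across growth. The price is that each array has to pick a maximum size up front.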

ECS

Again, there isn't a blog post dedicated to ECS, but I know they are using an archetype-based implementation similar to Unity's, where entities with the same set of components are stored together. (I know this because I asked Niklas about it)

One difference is that the systems that run on entities with certain components are called "engines". A "system" is something that runs on all entities, so there is no component filtering or anything like that in a "system". I went in a similar direction and implemented almost the same API for ECS. But I didn't implement "system", only "engine".
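To illustrate the archetype and "engine" idea, here is a toy sketch. It is not The Machinery's API and much simpler than my implementation: the bitmask, the hard-coded `Position`/`Velocity` components and the use of `std::vector` are simplifications I chose for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each component type gets one bit (hypothetical components).
using ComponentMask = std::uint64_t;
constexpr ComponentMask kPosition = 1u << 0;
constexpr ComponentMask kVelocity = 1u << 1;

struct Position { float x, y; };
struct Velocity { float x, y; };

// One archetype holds all entities with exactly the same component set.
// Each component lives in its own tightly packed array; index i in every
// array belongs to the same entity.
struct Archetype {
    ComponentMask mask = 0;
    std::vector<Position> positions;   // filled only if mask has kPosition
    std::vector<Velocity> velocities;  // filled only if mask has kVelocity
};

// An "engine" names the components it needs and gets whole archetypes to update.
struct Engine {
    ComponentMask required;
    void (*update)(Archetype& archetype, float dt);
};

inline void move_update(Archetype& archetype, float dt) {
    for (std::size_t i = 0; i < archetype.positions.size(); ++i) {
        archetype.positions[i].x += archetype.velocities[i].x * dt;
        archetype.positions[i].y += archetype.velocities[i].y * dt;
    }
}

// Runs the engine on every archetype that has all required components
// and returns how many archetypes matched.
inline int run_engine(const Engine& engine, std::vector<Archetype>& archetypes, float dt) {
    int visited = 0;
    for (Archetype& archetype : archetypes) {
        if ((archetype.mask & engine.required) == engine.required) {
            engine.update(archetype, dt);
            ++visited;
        }
    }
    return visited;
}
```

The filtering happens once per archetype instead of once per entity, which is the main reason this layout is attractive.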

Imgui

Like every hobby game engine project, I wanted to integrate Dear ImGui into this project. The Machinery has a custom imgui implementation, but I didn't want to implement one. While I was trying to integrate it, I realized one of the major drawbacks of The Machinery's general design: It is hard to integrate external systems into this architecture.

First, I decided to integrate Dear ImGui as a static library. But if you do that, you cannot leverage the plugin system architecture. Then I decided to make it a plugin/DLL. But in this architecture every plugin is loaded at runtime; there is no static linking. So, I had to make a wrapper function for every imgui function and put them into an ImguiAPI struct, which is registered through APIRegistry. I don't like this method either, but I went this route.

Conclusion

There are tons of things waiting to be implemented, such as a proper renderer API, a render graph and shader system for rendering, fiber-based multithreading, and their secret weapon, creation graphs. But this is a small foundation that I will build the other systems on top of. Here are my conclusions about this "challenge project" so far:

  • You should use ClangCL on Windows if you are planning to implement DLL hot-reload without headaches. It solves most of the stupid things coming from CL. But tooling can be problematic when using ClangCL. Be aware of it.

  • As I mentioned previously, I find that it's hard to integrate an external system into this architecture. Maybe they will make a blog post about it.

  • This C-based architecture is wildly type-unsafe. Both the ECS and APIRegistry systems use names for looking up components or API structs. Maybe I will try some of the C++ type stuff for identifying components and make it a little more type-safe.

I'm not sure whether this post gives any insight or is just a waste of time. But I am enthusiastic about The Machinery and wanted to share my thoughts. It is very exciting that The Machinery is in open beta now and more people can use it. I hope you enjoyed it.