Filip Smola

Categories

  • Open Sea

Tags

  • C++
  • data structure
  • data-oriented design
  • memory allocation
  • meta-programming
  • templates

When writing the few component managers that are already being used, I noticed that I used a very similar approach for storing the data in all of them. This got me interested in whether I could abstract this approach into a new data structure and base all future component managers on it. I wanted this data structure to efficiently store structures of simple data associated with entities, preferably giving a choice of how the data is laid out so that it works well with however it will be accessed. This took longer than expected to design and implement, due in part to how complicated the solution turned out to be, as well as my university work taking priority. But now I have what I consider a good basic implementation. I call it a table due to how similar it is to relational database tables with primary keys. It is contained in the open_sea::data namespace, and the relevant pull request is available here.

My implementation is based on the approach of Stefan Reinalter from his blog post on Implementing a semi-automatic structure-of-arrays data container. Some adjustments were made to better fit my purposes and style, but I wouldn’t have been able to finish this without his great description of how templates can be used to achieve what I wanted.

Design

What I wanted as a result was an easy way to set up data storage for components without repeating code, while being able to choose the appropriate layout in the implementation. I realised this as an abstract class Table that defines the general interface, with each data layout being a concrete subclass of that. As mentioned, this data structure acts very similarly to a relational database table, with the entity in this case acting as the primary key. I also usually refer to the data stored for a particular entity as a record.

The two currently supported layouts are Structure-of-arrays (SoA) and Array-of-structures (AoS), which I introduce in the next subsection. I then introduce a kind of optional index used in some table functions. In the last subsection I go into some detail on the table interface.

Array-of-Structures, Structure-of-Arrays

The data we store in the table is a set of plain data structures, which have to be laid out in some way within the allocated block of memory. This table implementation currently supports two layouts.

The simplest approach is to store this data one record after another, in the same way a simple array or std::vector would. The other approach is to think of the members of each record, and lay out the first members of all records followed by all the second members and so on. If we think of the whole table as a matrix with records as rows, we can think of the first method as row-major and the second method as column-major. We will call the first method Array-of-structures (AoS) and the second method Structure-of-arrays (SoA), because that corresponds directly to how they are implemented.
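
To illustrate, here is a minimal sketch of the two layouts for a hypothetical two-member record (the names and the fixed size 64 are just for illustration; the table implementations manage this memory manually rather than through static arrays):

struct Record { float a; int b; };

// Array-of-structures: whole records one after another, like a plain array
// memory: a0 b0 a1 b1 a2 b2 ...
Record aos[64];

// Structure-of-arrays: one array per member
// memory: a0 a1 a2 ... followed by b0 b1 b2 ...
struct RecordSoA {
    float a[64];
    int b[64];
};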

While AoS is a simple method that anyone who has programmed with arrays will know how to use, there are some advantages to using SoA instead. When we iterate over the data, the processor always caches a region around the accessed address to speed up later accesses to adjacent data. With AoS this means the rest of the current structure as well as a number of other structures. This is great if the processing we are doing on the records always uses all the data in them, but especially for components that is not always true. For example, as mentioned in Reinalter’s blog post, if the data is used for rendering but some of the records have their visibility flag set to false, most of the cached data for that record will not be used. This means that the processor brought in data unnecessarily, wasting performance.

But with the SoA layout (and some care when doing the processing), some of this waste is avoided. In the same example, accessing a visibility flag would cache the surrounding visibility flags, and only those, which speeds up the access for those records and fits more of them into the cache at once.

Another example where this difference can be seen is adding a constant to a single field of a large number of records, for example shifting a section of the world in a transformation component. With the AoS layout, each iteration would cache all the members (e.g. translation, rotation, scale, …) of the current record, modify one of them, and move on. On the other hand, with the SoA layout each iteration would cache a block of the relevant member for a number of records and modify all of them over the subsequent iterations.
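
A sketch of that shift under the SoA layout (a hypothetical helper, assuming a pointer into the table's contiguous position array):

#include <cstddef>
#include <glm/glm.hpp>

// position points into the contiguous position array, count is the number
// of records, shift is the constant to add
void shift_positions(glm::vec3 *position, std::size_t count, glm::vec3 shift) {
    for (std::size_t i = 0; i < count; i++) {
        position[i] += shift;  // touches only position data, nothing else
    }
}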

Which method is better suited for a particular component manager depends on what data it stores and how it will be accessed. While the AoS layout is simpler to understand, for some cases the SoA layout can offer better performance. For that reason I chose to support both and to make it as simple as possible to switch between them with minimal changes to how they are used.

Optional Index

For some purposes, it is useful to be able to talk about indices of records in the table, particularly when we want to introduce some kind of internal structure. The advantage of indices over pointers is that they stay valid through reallocation, and thus we only have to worry about breaking them when we change the structure of the contained data (e.g. removing some record). One example of this in action is the transformation component, which maintains tree-like relationships between records.

The basic idea is just a simple index into the data arrays. However, there is a problem when we try to look up the index of a record for an entity that has no associated record in the table. Then we need some way of returning an invalid index, something like Nothing in Haskell's Maybe monad.

Two naïve attempts are to use a signed integer type and dedicate negative values as invalid, or to use an unsigned integer type and dedicate zero as invalid. Both would work, but they have problems. The first sacrifices half of the type's range to express what is essentially a single special value. While the second approach sacrifices only one value, we now need to shift all values by one, which introduces a decrement operation every time we use the index.

Taking the second approach but designating the maximum value as invalid avoids both problems. So I chose to represent this optional index with a structure opt_index that wraps a single size_t value and provides a number of operations for checking validity and unwrapping the value. The designated invalid value is std::numeric_limits<size_t>::max(); we can check for it with the function is_set(), while the function get() simply returns the contained value without any extra operations.
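
A minimal sketch of what such a structure could look like (the actual opt_index in open_sea::data provides more operations):

#include <cstddef>
#include <limits>

struct opt_index {
    // Designated invalid value: the maximum of the type's range
    static constexpr std::size_t invalid = std::numeric_limits<std::size_t>::max();

    std::size_t value = invalid;

    bool is_set() const { return value != invalid; }   // validity check
    std::size_t get() const { return value; }          // unwrap, no extra operations
};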

Table Interface

As already mentioned, there is an abstract class providing the general interface for the tables. This makes it easy to switch the implementation used with minimal changes to the code that accesses the data. The implementations then differ in the efficiency of the offered actions, but not their range. In this section I will describe that interface, leaving some more interesting implementation details to later sections.

The table is a class template parametrised with two types – one for the key and one for the record. When being used in component managers, the key type is the ECS entity while the record is a structure of data the component associates with each entity. However, due to some requirements of the implementation, the record type also has to provide extra information about its contents: the number of members and a pointer variant of itself. See for example the structure used as record for the transformation component:

struct Data {
    static constexpr size_t count = 8;
    struct Ptr {
        glm::vec3 *position = nullptr;
        glm::quat *orientation = nullptr;
        glm::vec3 *scale = nullptr;
        glm::mat4 *matrix = nullptr;
        data::opt_index *parent = nullptr;
        data::opt_index *first_child = nullptr;
        data::opt_index *next_sibling = nullptr;
        data::opt_index *prev_sibling = nullptr;
    };

    glm::vec3 position;
    glm::quat orientation;
    glm::vec3 scale;
    glm::mat4 matrix;
    data::opt_index parent;
    data::opt_index first_child;
    data::opt_index next_sibling;
    data::opt_index prev_sibling;
};

To ease potential future changes and simplify some of the implementation code, the table defines the following three type aliases (a rough sketch of the class template follows the list):

  • key_t is the key type
  • record_t is the (direct) record type
  • record_ptr_t is the record reference type
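
A rough sketch of how the class template and these aliases could be declared (simplified to the single-record variants of the functions described in this subsection; the actual class has more members):

template <typename Key, typename Record>
class Table {
public:
    typedef Key key_t;                          // the key type
    typedef Record record_t;                    // the (direct) record type
    typedef typename Record::Ptr record_ptr_t;  // the record reference type

    virtual opt_index add(const key_t &key, const record_t &record) = 0;
    virtual opt_index remove(const key_t &key) = 0;
    virtual opt_index lookup(const key_t &key) const = 0;
    virtual record_t get_copy(const key_t &key) const = 0;
    virtual record_ptr_t get_reference(const key_t &key) = 0;
    virtual void increment_reference(record_ptr_t &ref) = 0;

    virtual ~Table() = default;
};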

The most basic actions are adding and removing records, named add and remove respectively. There are two versions of add, one for adding a single record and one for adding a batch of records. Both versions do nothing if any of the keys being added are already in the table. The main advantage of the single add is that it returns an opt_index which is set to the position of the added record, or unset if the adding failed (e.g. due to a key conflict). This is used, for example, in the transformation component to ensure the proper connections between added records, such as siblings under a common parent. The batch add, on the other hand, only returns a boolean which is true if the structure was modified, i.e. if any records were added.

The remove action also has two versions. This time both remove a single record, but they differ in their arguments: one removes a record based on the key it is associated with, while the other removes a record based on its opt_index. Both versions return the opt_index of the removed record, or an unset opt_index if removal failed, for example when the key has no associated record.

To convert between keys and indices there is the lookup function. Given a key, it returns the opt_index of its associated record, or unset if there is none. Given an opt_index it returns the key associated with the record at that index, or throws an std::out_of_range exception if the index is beyond the end of the table.

The first way to access the data stored in the table is to get a copy of it, using the function get_copy with either the record's index or the associated key. If such a record is found, the function returns an instance of the record type filled with a copy of the data in that record. If there is no such record, the function throws an std::out_of_range exception. This method is mainly useful when we only want to read the data. Its main advantage is that the copied data is immediately independent of the table, which means it stays valid when the table is reallocated.
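
For example, assuming a table of the transformation records shown above (the table and entity names are hypothetical):

// Throws std::out_of_range if the entity has no associated record
Data copy = table->get_copy(entity);
glm::vec3 position = copy.position;  // safe even after the table reallocates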

The other way to access the data is to get a reference to it using the get_reference function. This time there are four variants: one finds the record based on its index, another based on the associated key, another is a batch version based on an array of keys, and the last takes no argument and refers to the first record in the table. All three single versions return instances of the record reference type filled with pointers to the relevant data, while the batch version fills an array with these. The two single versions with arguments throw an exception when no such record can be found, the batch version fills the output element with null pointers when its key has no associated record, and the first-record version is undefined when the table is empty. Moreover, there is the function increment_reference, which increments a reference to point to the next record in whatever way the particular implementation requires. Note that this is done without checking the original reference, so the output can be invalid if given an invalid reference (e.g. due to reallocation) or a reference to the last record in the table.
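
Together, these two functions allow a simple iteration over all records regardless of the underlying layout. A sketch (update_record is a hypothetical helper working through the member pointers):

// Start at the first record and advance one record at a time
auto ref = table->get_reference();
for (size_t i = 0; i < table->size(); i++) {
    update_record(ref);               // process through the member pointers
    table->increment_reference(ref);  // implementation-specific advance
}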

Conceptually, there are three ways to refer to a particular record: key, index and reference. The most reliable is the key, because it will always find the right record even when the table is reshuffled (e.g. due to removal) or reallocated. However it is not always desirable, as it requires multiple steps to resolve. Less reliable is the index, because while it stays valid through reallocation it is invalidated by reshuffling. The advantage of the index is that it can skip the key lookup and directly access the data array. Least reliable is the reference, which is invalidated by both reshuffling and reallocating. However, the reference is fastest because it points directly to the data.

For example, the transformation component uses indices to maintain its tree structure, because indices are faster to resolve than keys and, unlike direct references, persist through reallocation. The problem of reshuffling is then handled explicitly in all actions of that component manager that can reshuffle the data (e.g. removal) by correcting all affected records.

Finally, there are also a number of functions meant to query the state of the table itself rather than its data:

  • size returns the number of stored records
  • keys returns a vector of the keys that have records in the table
  • allocated returns the number of records for which the table has allocated space
  • pages returns the number of pages allocated to the table (both implementations use page-based allocation)
  • type_name returns a string name of the implementation to easily identify it when debugging

In summary, the table interface supports adding and removing elements, and accessing the stored data by copy or reference. Most functions have variants to refer to records either by the associated key or by the index. Moreover, where that makes sense there is also a batch variant of the function allowing implementations to take advantage of potential optimisations. This should make the table interface fully usable at least in my intended use case of component managers, while allowing different implementations to take advantage of different access patterns.

Implementation

The previous section describes the setting and the interface, but not how the implementations can be achieved. That task is in itself not simple. The main difficulty is that any implementation that plays around with the memory layout of the record type needs to somehow be aware of its members. To be useful, this needs to be done in an automatic way, avoiding similar but not exactly equal reimplementations for each record type used. As already mentioned, my approach is largely based on Reinalter's blog post, which describes how templates and type-based dispatch can be used to achieve this.

This problem is felt to a lesser degree in the Array-of-structures implementation, because there the knowledge of the members is only needed in the batch add, reference get and reference increment. For example, consider the reference increment function, which takes a structure of pointers to members of the record (i.e. the record reference type) and increments each pointer in it. To do this, the function needs to know how many members there are in order to visit them all, what type they point to in order to increment them properly, and the layout in order to know what incrementing precisely means. This uses a templated function to invoke a helper function object for each member of the record type:

void increment_reference(record_ptr_t &ref) override {
    util::invoke_n<record_t::count, IncrementHelper>(ref);
}

This invoke_n<N, F>(args...) unwinds to:

std::invoke(F<0>(), std::forward<Args>(args)...);
std::invoke(F<1>(), std::forward<Args>(args)...);
...
std::invoke(F<N-1>(), std::forward<Args>(args)...);
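
A minimal sketch of how such an unrolling could be implemented with std::index_sequence and a C++17 fold expression (the version in the repo may differ):

#include <cstddef>
#include <functional>
#include <utility>

template <template <std::size_t> class F, typename... Args, std::size_t... Is>
void invoke_n_impl(std::index_sequence<Is...>, Args &&... args) {
    // Comma fold: invokes F<0>{}, F<1>{}, ..., F<N-1>{} in order,
    // passing args by reference each time (no repeated forwarding)
    (std::invoke(F<Is>{}, args...), ...);
}

template <std::size_t N, template <std::size_t> class F, typename... Args>
void invoke_n(Args &&... args) {
    invoke_n_impl<F>(std::make_index_sequence<N>{}, std::forward<Args>(args)...);
}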

In this case, the function object takes the Nth member of the record reference type, increments it by the size of the record type in bytes (as the rest of the current record and the start of the next lie in between), and writes the properly cast value back into the reference:

template <size_t N>
struct IncrementHelper {
    void operator()(record_ptr_t &ref) {
        // Type of the Nth member of the reference (a pointer type)
        typedef typename util::GetMemberType<record_ptr_t, N>::type member_type;
        // Pointer to the Nth member of the reference structure
        auto member_p = util::get_pointer_to_member<record_ptr_t, N>();
        // Advance that member pointer by one whole record
        std::invoke(member_p, ref) = (member_type) (((unsigned char *) std::invoke(member_p, ref)) + sizeof(record_t));
    }
};

Details of the implementation of invoke_n, GetMemberType and get_pointer_to_member can be seen in the repo, and they are better discussed in Reinalter’s blog post.

A similar approach is taken for all actions that rely on knowing the contents of the record type. This reduces the per-type work to specialising the relevant templates for each new record type used, specifying the order, type and name of all its members. I also made a macro for this purpose to keep it as simple as possible. See for example the calls required to establish all members of the transformation component record type:

SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 0, glm::vec3, position)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 1, glm::quat, orientation)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 2, glm::vec3, scale)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 3, glm::mat4, matrix)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 4, open_sea::data::opt_index, parent)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 5, open_sea::data::opt_index, first_child)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 6, open_sea::data::opt_index, next_sibling)
SOA_MEMBER(open_sea::ecs::TransformationTable::Data, 7, open_sea::data::opt_index, prev_sibling)

These macros, together with the correct member count constant and reference type inside the record type, are all that is needed to make a structure compatible with both table implementations.

Page-Based Allocation

When a table needs to add more records than it has allocated space for, it will first reallocate itself to have more space. This is done by computing the space required to store the current data and the data being added, requesting a memory block of at least that size, copying all the current data to this new space and deallocating the original space. After that, adding the new records can proceed, guaranteed to have enough space to complete.

In an earlier implementation, this allocation was done in bytes using the standard allocator. The size of the new block was computed by doubling the current capacity (or starting at 1 if the current capacity was 0) until it was at least as big as the required space. This meant that each request, no matter the number of records added, would result in at most one reallocation. Moreover, increasing the size geometrically (i.e. by a constant factor every time) means the cost of reallocation over a sequence of additions is amortised.

However, I later chose to switch to page-based allocation. The size of the allocated space is now considered in memory pages instead of bytes (although byte sizes are still used for the actual allocation call), and all allocation is page-aligned, meaning all allocated blocks start and end at a page boundary. The number of pages to allocate is computed as the least power of two that can cover the required space:

// size is the target capacity; sysconf and _SC_PAGESIZE come from <unistd.h>
long pagesize = sysconf(_SC_PAGESIZE);
unsigned long space_needed = size * sizeof(record_t);
unsigned long pages_needed = space_needed / pagesize + (space_needed % pagesize != 0);
size_t pages_target;
if ((pages_needed & (pages_needed - 1)) == 0) {
    // Pages needed is a power of two -> use that
    // See http://www.graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 for power of 2 test
    pages_target = pages_needed;
} else {
    // Not a power of two -> find the nearest greater one
    // The nearest greater power of 2 has 1 in the position of the last leading 0 and 0s everywhere else
    int shift_by = ((sizeof(unsigned long) * 8) - __builtin_clzl(pages_needed));
    pages_target = 1ul << shift_by;  // unsigned long literal avoids overflow
}
size_t space_target = pages_target * pagesize;

This space is then allocated using aligned_alloc with page-size alignment (note that aligned_alloc requires the size to be a multiple of the alignment, which holds here because the size is a whole number of pages):

record_t *target = static_cast<record_t *>(aligned_alloc(pagesize, space_target));

Thus there is at most one reallocation for any request, and because the number of pages scales geometrically, the cost is again amortised. On top of that, there is now no external fragmentation with respect to memory pages: any page that contains a part of the table is fully usable by that table. This should further help performance when tables are accessed frequently and store a large amount of data.

This step was inspired by Niklas Gray's blog post Virtual Memory Tricks on the Our Machinery blog. The post describes other interesting things we can do with the large address space 64-bit systems offer and with the memory mechanisms they use.

Known Issues

While I tried to find and fix any issues, there are still a handful of known issues with my implementation. I expect there are also a number of as yet unknown issues, due to the many moving parts in this system.

One issue I would like to solve some day is the type of data the tables can store. Currently that data is limited to plain data structures, but I would like to be able to store more complex objects as well, for example smart pointers. The problem lies in construction and destruction. While plain data can be copied/moved around memory freely, more complex objects require careful use of constructors and destructors. Fixing this would require changing the parts of the table implementations that manipulate the data in memory to adhere to the correct protocol, for example using placement new. Unfortunately I haven't been able to get this right so far.
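
As a rough sketch of the protocol non-trivial types would need (a hypothetical helper, not part of the current implementation):

#include <new>
#include <utility>

// Move one record to a new location, correctly for non-trivial types
template <typename T>
void move_record(T *dst, T *src) {
    new (dst) T(std::move(*src));  // placement new constructs in place
    src->~T();                     // moved-from source must be destroyed
}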

However, while being able to store more complex objects could be useful, the main use case of component managers doesn’t require it. Components are ideally comprised of simple data with complex operations done by outside systems.

Another issue is caused by how page-based allocation is implemented. The way the page size is queried and the way the allocation is performed are both Linux-specific. This is because I am developing and testing on Linux, and I don't have the experience to develop a more portable solution or a convenient way of testing it. This means that anyone trying to run this project on a different platform will first have to implement these two steps in a way that works for their platform.

The last issue is loosely connected to that. The whole idea of using page-based allocation assumes the large address space that 64-bit systems provide. While this particular use shouldn't take more than one extra page over the space actually used, I make no guarantees for 32-bit systems.

Cameras are not Components

One change that is not directly about tables, but came from my work on them, is the realisation that I was representing the relationship between cameras and entities wrong. Before this work, each camera had an assigned entity that it would follow, through the camera component. However, thought about this way, the assigned entity is effectively a component of the camera and not the other way around, which runs contrary to the ECS. For a while I played around with making this a one-to-one relationship, with the case of multiple cameras following one entity recoverable using child entities. However, I came to realise that a camera following an entity just isn't really a component. This is further supported by ideas for more complicated camera movement patterns that don't rely on exactly one entity.

The approach I chose instead is to separate cameras and their movement from the ECS. Each camera movement is now its own object, wrapping the camera and anything else it needs, and implementing the movement logic. In the case of the only current movement, following an entity, this is implemented by the class open_sea::camera::AtEntity. Its transform() method is then invoked in the main loop, instead of the now-removed camera-following system.

With this change, the cameras are now separated from the ECS. To me this makes more sense because the entities and components represent the state of the game universe and the systems are ways of updating it, while the cameras are just ways of viewing it.

Conclusion

All in all, this is probably the most complicated feature I have implemented for open-sea. Hopefully, with this in place, adding future components and making sure their data is accessed efficiently will be much easier and require less copying of code between components. As more components are added and use these tables, I hope to further refine their implementation and expand their interface. My hope is that they could also be useful in other applications, which is why I try to avoid being too component-specific in their specification.