Stuck

Sometimes you think you have everything working, then you go to call part of your program and it turns out that, no, it’s not working.

That is the predicament I find myself in now. While I have been successful in getting UE5 to load my pmo_library.dll, it turns out there’s a strange bug in flecs when it’s linked into a shared library. At first I thought it was a problem with the Unreal build system and how everything was linking, but when I ran my tests on Windows, only the flecs test failed.

This led me down a rabbit hole that has consumed the past 4-5 days. It appears that if you link flecs into a shared library, seemingly random bugs occur when importing my flecs modules. When I statically link flecs into my shared library, I get one crash. When I dynamically link it, I get a different crash.

If I statically link my library and statically link flecs, then I get no crashes. But that isn’t an option, because UE5 requires my library to be a shared library (DLL).

I’ve opened an issue in the flecs Discord and hope someone can help me out. While I wait, I continue to debug.

Let’s take a look at my process so far. First I updated to the latest version (I was 2 minor releases behind). Then I traced what happens when everything is statically linked versus when everything is built as shared libraries.

Here were my results.

Binary with statically linked flecs (flecs_static) and my statically linked library:

  1. Call ecs.import<movement::module>() from main binary
  2. Calls impl.hpp:13 (do_import)
  3. This calls flecs_cpp.c:348 ecs_cpp_component_register_explicit once with my module component movement::module (type.size = 1, alignment = 1) so it passes assertions
  4. impl.hpp:13 do_import calls world.emplace<T>(world)
  5. Emplace calls:
inline void emplace(world_t *world, flecs::entity_t entity, flecs::id_t id, Args&&... args) {
    ecs_assert(_::cpp_type<T>::size() != 0, ECS_INVALID_PARAMETER, NULL);
    T& dst = *static_cast<T*>(ecs_emplace_id(world, entity, id));
    
    FLECS_PLACEMENT_NEW(&dst, T{FLECS_FWD(args)...}); // <-- some c++ templating magic

  6. “some c++ templating magic” calls my movement::module constructor
  7. Program works as expected
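The “templating magic” in step 5 is, in essence, placement new with perfect forwarding: the storage already exists, and the constructor is run directly inside it. Here is a minimal standalone sketch of that pattern, assuming FLECS_PLACEMENT_NEW behaves like standard placement new and FLECS_FWD like std::forward. The names demo_module, emplace_into and construct_demo are invented for illustration and are not part of flecs.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical stand-in for a module type; not the real movement::module.
struct demo_module {
    int world_id;
    explicit demo_module(int id) : world_id(id) {}
};

// Construct a T inside storage that the caller already owns.
// `storage` must be at least sizeof(T) bytes and suitably aligned for T.
template <typename T, typename... Args>
T& emplace_into(void* storage, Args&&... args) {
    assert(storage != nullptr);
    // Placement new: runs T's constructor at `storage` without allocating.
    return *::new (storage) T{std::forward<Args>(args)...};
}

// Build a demo_module in a local buffer and return the id its constructor stored.
int construct_demo(int id) {
    alignas(demo_module) unsigned char buffer[sizeof(demo_module)];
    demo_module& m = emplace_into<demo_module>(buffer, id);
    int result = m.world_id;
    m.~demo_module();  // objects built with placement new are destroyed explicitly
    return result;
}
```

Calling construct_demo(7) runs the demo_module constructor inside the local buffer, which is the same shape of call that ends up invoking the module constructor in step 6.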

Binary with dynamically linked flecs (flecs) and my dynamically linked library:

  1. Call ecs.import<movement::module>() from main binary
  2. Calls impl.hpp:13 (do_import)
  3. This calls ecs_cpp_component_register_explicit once with my module component movement::module (type.size = 1, alignment = 1) so it passes assertions
  4. impl.hpp:13 (do_import) calls world.emplace<T>(world);
  5. emplace calls flecs_cpp.c:400 (ecs_component_init):
inline void emplace(world_t *world, flecs::entity_t entity, flecs::id_t id, Args&&... args) {
    ecs_assert(_::cpp_type<T>::size() != 0, ECS_INVALID_PARAMETER, NULL);
    T& dst = *static_cast<T*>(ecs_emplace_id(world, entity, id));
    
    FLECS_PLACEMENT_NEW(&dst, T{FLECS_FWD(args)...}); // <-- some c++ templating magic

  6. “some c++ templating magic” jumps into a number of unknown functions inside my library.dll until finally going back into ecs_cpp_component_register_explicit:
flecs.dll!ecs_cpp_component_register_explicit(ecs_world_t * world, unsigned __int64 s_id, unsigned __int64 id, const char * name, const char * type_name, const char * symbol, unsigned __int64 size, unsigned __int64 alignment, bool is_component, bool * existing_out) Line 400 (c:\Users\isaac\Documents\Unreal Projects\pmoclient\Plugins\PMO\Source\ThirdParty\pmo\build\_deps\flecs-src\src\addons\flecs_cpp.c:400)
pmo_library.dll!00007ffec7084415() (Unknown Source:0)
pmo_library.dll!00007ffec70829d7() (Unknown Source:0)
pmo_library.dll!00007ffec707f016() (Unknown Source:0)
pmo_library.dll!00007ffec707de6c() (Unknown Source:0)

  7. Now my module name is prefixed with :: (i.e. ::movement::module) inside ecs_cpp_component_register_explicit, as it tries to register the module, or check whether it is registered, a second time.
  8. entity.c:1888 (ecs_component_init), which calls flecs_check_component, checks whether the component’s const_ptr matches the size/alignment, but fails because size is 0.
  9. ecs_abort(ECS_INVALID_COMPONENT_SIZE, path); is called because ptr->size (1) != size (0)
  10. :crash: fatal: entity.c: 1878: movement.module (INVALID_COMPONENT_SIZE)
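The last few steps boil down to a consistency check: a component that was first recorded with size 1 is presented again with size 0, and the mismatch is fatal. The sketch below replays that sequence with a toy registry. It is only an illustration of the kind of check described above, not flecs code: component_registry, type_record and second_registration_is_rejected are hypothetical names, the alignment of 1 on the second call is an assumption, and a C++ exception stands in for the abort.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry entry: the size/alignment first recorded for a name.
struct type_record {
    std::size_t size;
    std::size_t alignment;
};

// Hypothetical registry; a stand-in for the bookkeeping a world keeps.
class component_registry {
public:
    // Records a component the first time its name is seen. On later calls,
    // compares the caller's size/alignment with the stored record and throws
    // if they differ.
    void register_component(const std::string& name,
                            std::size_t size, std::size_t alignment) {
        assert(!name.empty());
        auto it = records_.find(name);
        if (it == records_.end()) {
            records_.emplace(name, type_record{size, alignment});
            return;
        }
        if (it->second.size != size || it->second.alignment != alignment) {
            throw std::runtime_error(name + " (INVALID_COMPONENT_SIZE)");
        }
    }

private:
    std::map<std::string, type_record> records_;
};

// Registers the same name twice and reports whether the second call was rejected.
bool second_registration_is_rejected(std::size_t first_size,
                                     std::size_t second_size) {
    component_registry registry;
    registry.register_component("movement.module", first_size, 1);
    try {
        registry.register_component("movement.module", second_size, 1);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

With sizes (1, 0) the second registration is rejected, mirroring ptr->size (1) != size (0); with (1, 1) it passes, which matches the statically linked run where the component is only ever seen with one size.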

Clearly there is some oddness going on with the templating magic in step 6 between the statically and dynamically linked versions. I just cannot for the life of me figure out what. It does appear to be doing some sort of secondary lookup when it’s crash-y. This makes me think there is something wrong with the component registration between the DLL and the binary, but it’s not clear what.
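One hypothetical way a second registration attempt like this can arise, offered as a possibility rather than a diagnosis of this crash: on Windows, the executable and a DLL can each end up with their own copy of a template’s static data members unless those are explicitly exported, so the two images may disagree about whether a type has already been registered. The sketch below mimics that with an extra Image template parameter standing in for the binary image. Every name in it is invented for illustration and none of it is flecs code.

```cpp
#include <cassert>
#include <cstddef>

// Tags standing in for two binary images (hypothetical; real images are
// separate files, not template parameters).
struct exe_image {};
struct dll_image {};

// Per-type cached state, keyed additionally by image to mimic each image
// holding its own copy of a template's static members.
template <typename Image, typename T>
struct type_state {
    static bool registered;
    static std::size_t size;
};
template <typename Image, typename T>
bool type_state<Image, T>::registered = false;
template <typename Image, typename T>
std::size_t type_state<Image, T>::size = 0;

// Registers T from the point of view of one image.
// Returns true if this image believed T was not registered yet.
template <typename Image, typename T>
bool register_from() {
    if (type_state<Image, T>::registered) return false;
    type_state<Image, T>::registered = true;
    type_state<Image, T>::size = sizeof(T);
    return true;
}

// Size as cached by a given image; 0 means that image never registered T.
template <typename Image, typename T>
std::size_t cached_size() {
    return type_state<Image, T>::size;
}

struct demo_component { char tag; };

// Walks through the scenario; the asserts document what the sketch expects.
void run_scenario() {
    bool exe_first = register_from<exe_image, demo_component>();
    bool exe_again = register_from<exe_image, demo_component>();
    std::size_t dll_view = cached_size<dll_image, demo_component>();
    bool dll_first = register_from<dll_image, demo_component>();

    assert(exe_first && !exe_again);  // the exe registers exactly once
    assert(dll_view == 0);            // the DLL's copy never saw that registration
    assert(dll_first);                // so the DLL tries to register again
}
```

In this toy model the DLL side still sees a cached size of 0 after the executable has registered the type, and then starts its own registration. Whether anything like this is actually happening between pmo_library.dll and the main binary is exactly what still needs to be confirmed.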

To remove as many variables as possible, I created a minimal reproducible project, which I highly recommend doing any time you are investigating oddities like this. It also helps when filing bug reports for the maintainers! Speaking of issues, I did come across this bug report, which seems somewhat similar in that behavior differs when running as a DLL.

So if you’ve been wondering why I haven’t posted any updates lately, it’s because I’m quite stuck! I really hope I don’t have to get rid of flecs, so I’ll give this another few weeks of keyboard smashing to see if I, or the maintainers, can resolve the problem.