DLLs, Memory, and You

Before we get into the topic of DLLs and memory, I just want to comment that after two weeks I am finally unstuck. This was mostly due to Sander (the flecs maintainer) fixing an issue I identified. As you can see from that issue, I tried many different methods and combinations of linking flecs. Now with that fixed, onto the next problem!

And it didn’t take long. Fixing the above led me to clean up my CMake files quite a bit, and during this process I was able to build more debug information into my test project. When re-running my tests, I started to get some weird exceptions:

Taking a look at this in the debugger, I noticed these exceptions were occurring only in destructors as the test functions exited:

At first I thought I had an errant dangling pointer or something, but I’m exclusively using smart pointers (shared or unique). This required some googling.

Googling for __acrt_first_block == header leads you to a Stack Overflow post that EmDroid answered back in 2016.

In it they state:

As this is a DLL, the problem might lie in different heaps used for allocation and deallocation (try to build the library statically and check if that will work).

The problem is, that DLLs and templates do not agree together very well. In general, depending on the linkage of the MSVC runtime, it might be problem if the memory is allocated in the executable and deallocated in the DLL and vice versa (because they might have different heaps). …
Then you use the output vector in the executable and once it gets out of scope, it is deallocated, but inside the executable whose heap doesn’t know anything about the heap it has been allocated from. Bang, you’re dead.

This can be worked-around if both DLL and the executable use the same heap. To ensure this, both the DLL and the executable must use the dynamic MSVC runtime – so make sure, that both link to the runtime dynamically, not statically.

I am mixing static and dynamic linking (by statically linking my dependencies), hence I am getting different heaps! Also, I’m using smart pointers, which means I have to be very careful when a pointer goes out of scope to ensure it’s released on the correct heap. Another really good write-up on this topic can be found on Christian Aichinger’s website.
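To make the failure concrete, here is a minimal hypothetical sketch of the mismatch; MakeBuffer, the file names, and the exact build flags are illustrative, not from my project:

    // mylib.cpp, compiled into a DLL with /MT (static CRT),
    // so the DLL carries its own private heap.
    #include <cstdint>
    #include <vector>

    __declspec(dllexport) std::vector<uint8_t> MakeBuffer()
    {
        return std::vector<uint8_t>(1024); // storage comes from the DLL's CRT heap
    }

    // main.cpp, compiled with /MD (dynamic CRT).
    int main()
    {
        auto Buffer = MakeBuffer();
        return 0;
        // Buffer's destructor frees through the EXE's CRT, which trips the
        // __acrt_first_block == header assertion in debug builds.
    }

The allocation and the free each go through whichever CRT the containing binary linked against, and in a mixed setup those are two different heaps.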

Well, this sucks. What do I do about it? Turns out there are a few options:

  1. Dynamically link everything
  2. Don’t allocate anything in your DLL that crosses the ‘DLL boundary’ (e.g. passes a heap-allocated object back to an external caller)
  3. Pass in a custom allocator to the DLL that should be used
  4. Expose a ‘Release’ or ‘Free’ method that can take back any smart pointer to be released back on the DLL’s heap

I could do #1, but I don’t like the idea of having tons of DLLs as a requirement, and it complicates deployment a bit. My main problem is with #2: I’m allocating lots of smart pointers all over the place, especially in the Crypto library. I thought a bit about #3, but that ends up ‘tainting’ every allocation you make, as you need to change its type from, say, std::vector<uint8_t> ... to std::vector<uint8_t, SomeCustomAllocator> ... (see the sketch below). I don’t want that.
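For a sense of what #3 would have looked like, here is a rough sketch of that ‘taint’; SomeCustomAllocator is just the placeholder name from above, not a real type in my code:

    #include <cstddef>
    #include <cstdint>
    #include <new>
    #include <vector>

    // Hypothetical allocator; a real one would route allocate/deallocate
    // through functions exported by the DLL so both sides share one heap.
    template <typename T>
    struct SomeCustomAllocator
    {
        using value_type = T;
        SomeCustomAllocator() = default;
        template <typename U> SomeCustomAllocator(const SomeCustomAllocator<U> &) {}
        T *allocate(std::size_t Count) { return static_cast<T *>(::operator new(Count * sizeof(T))); }
        void deallocate(T *Ptr, std::size_t) { ::operator delete(Ptr); }
    };

    template <typename T, typename U>
    bool operator==(const SomeCustomAllocator<T> &, const SomeCustomAllocator<U> &) { return true; }
    template <typename T, typename U>
    bool operator!=(const SomeCustomAllocator<T> &, const SomeCustomAllocator<U> &) { return false; }

    // The allocator is now part of the type, so every signature and every
    // caller that touches a buffer has to change along with it.
    using TaintedBytes = std::vector<uint8_t, SomeCustomAllocator<uint8_t>>;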

I decided to refactor my code to attempt option #2. Refactoring the Crypto library to take in references to caller-owned buffers instead of returning smart pointers turned out to be really straightforward. Basically, I changed my methods from:

UniqueUCharPtr Encrypt(const uint32_t UserId, std::array<unsigned char, NONCE_BYTES> &Nonce, const unsigned char *MessageBuffer, unsigned long long MessageLen);

to this:

bool Encrypt(const uint32_t UserId, const std::array<uint8_t, crypto_aead_xchacha20poly1305_ietf_KEYBYTES> &UserKey, std::array<unsigned char, NONCE_BYTES> &Nonce, const unsigned char *MessageBuffer, std::vector<unsigned char> &CipherText, unsigned long long MessageLen);
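With the new signature, the caller owns every buffer, so nothing allocated inside the DLL ever needs to be freed outside of it. A rough usage sketch, where LoadUserKey and the message contents are hypothetical stand-ins:

    // Caller side: every buffer lives in, and is freed by, the caller's heap.
    const uint32_t UserId = 42;
    std::vector<unsigned char> Message = {'h', 'i'};
    std::array<unsigned char, NONCE_BYTES> Nonce{}; // nonce storage, passed by reference
    std::array<uint8_t, crypto_aead_xchacha20poly1305_ietf_KEYBYTES> UserKey =
        LoadUserKey(UserId); // hypothetical helper

    std::vector<unsigned char> CipherText; // Encrypt fills this in place
    if (!Encrypt(UserId, UserKey, Nonce, Message.data(), CipherText, Message.size()))
    {
        // handle the failure
    }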

I also removed the LRU cache of keys from the Crypto class and moved it to where it belongs: the Server. Clients just need a reference to their single UserKey.

With all of that refactored, I had one final problem: the UDP Client/Server code allocates memory for packets and deserialized flatbuffers messages. How the heck am I supposed to handle that use case?

At first I thought about adding the allocation to the LRU cache code, as that is defined by the caller, but that made the API really weird, so I deleted it. I also thought about passing in a custom allocator (#3 on the above list) but hated the idea of every allocation requiring some weird custom type with the allocator specified. I also thought about passing in a std::function that would allocate the data, but that would be weird too, because the callback’s signature would have to reference types from the DLL, creating a cyclical dependency.

I opted for #4 and exposed a single function:

        /**
         * @brief Necessary for releasing objects that crossed the DLL
         * boundary, really only used in tests. If we don't release in the
         * pmo_library, we get heap corruption.
         *
         * @param Message The message whose ownership is being taken back.
         */
        static inline void ReleaseSocketMessage(std::unique_ptr<SocketMessage> Message)
        {
            // release() detaches the raw pointer, so the consumer-side
            // unique_ptr destructor never runs delete against memory that
            // belongs to the library's heap.
            Message.release();
        }

Now, UE5 or my pmoclient/server executables can just call this method to hand the message back instead of corrupting the heap when it goes out of scope.
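On the consumer side, that looks something like this; ReceiveMessage is a hypothetical stand-in for whatever library call hands back the unique_ptr:

    // Hypothetical consumer code (UE5 or pmoclient/server).
    std::unique_ptr<SocketMessage> Message = ReceiveMessage();
    // ... read from Message ...
    ReleaseSocketMessage(std::move(Message)); // hand ownership back to the library
    // Message is now empty, so its destructor here is a harmless no-op.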

Now with that all working… it was time to get some logging in for my library and UE5. That, I will cover in the next post, because that turned out to be a nightmare.
