While continuing work on my attack animations and combat system, I started to notice that my hits sometimes weren’t registering. Up until now I’ve done no client-side correction using the server’s location data. I figured that since I was testing everything locally, the drift between the client’s location and the server’s location would be minimal.
Boy was I wrong.
Quick refresh of where we are at
Let’s step back. Right now this is how things work:
- Client presses a key/inputs
- Client input is collected per-frame tick
- Client input is handled by Jolt Physics to apply velocity/movement with the delta_time since the last frame of the client
- Client physics is run
- Client sends inputs to server
- Server collects inputs from all users per-tick
- Server iterates over users and runs their input via Jolt Physics to apply velocity/movement with the delta_time of the server’s last tick.
- Server physics is run
- Server sends the calculated locations and velocities of the players back to each client
- Clients do nothing with the server calculated position/velocity for their character
- Clients update/move the other characters that they can see (those in their area of interest).
- GOTO: 1
Determining client/server desynch
To determine what the drift between client and server is, I implemented a simple actor component (inherited from a StaticMeshComponent) that is placed at the position the server reports, so it shows up alongside the client’s own character. In my Character class, I then create this component in the constructor and detach it from the actor so it’s free to be moved with SetWorldLocation in world space:
NetworkDebugComponent = CreateDefaultSubobject<UPMONetworkDebugComponent>(TEXT("NetworkDebugComponent"));
FDetachmentTransformRules DetachmentRules(EDetachmentRule::KeepWorld, true);
NetworkDebugComponent->DetachFromComponent(DetachmentRules);
I update this component’s world location in my flecs UpdateLocalAttributes system that runs in the ClientGameMode class every tick:
Component->Update(Attributes, Traits);
auto Debug = LocalPlayer->GetComponentByClass<UPMONetworkDebugComponent>();
if (Debug && Debug->bIsEnabled)
{
    FVector ServerLoc = PMOConverters::Vector3ToFVector(NetPos.Position);
    Debug->SetWorldLocation(ServerLoc);
    UE_LOG(LogTemp, Error, TEXT("UpdateLocalAttributes ServerPos: %s LocalPos: %s!"), *ServerLoc.ToString(), *LocalPlayer->GetActorLocation().ToString());
}
And here’s the result (after configuring a cone as the mesh):
As you can see the server thinks I’m in a different position than the client. This was (partly) due to using different delta times when calculating velocity.
There are a few problems with what I’m (not) doing:
- I’m currently not doing any client correction from the server state
- I’m using the server’s delta_time to calculate character movement on the server. This leads to different physics simulation results between client and server (e.g. different locations/velocities), since the delta_time is slightly different between the two.
- I’m not doing any lag compensation on the server. This would require us to roll back physics N frames. While common in FPS and fighting games, I do not think this is feasible for MMORPGs with possibly hundreds of players all sending various inputs and timings.
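To make the delta_time point concrete, here is a minimal standalone sketch (no Jolt, no flecs). The `State`, `Step`, `Simulate` and `Drift` names, the 1D movement model and the speed/acceleration constants are all made up for illustration; the only thing it tries to show is that the same input integrated with two slightly different timesteps ends up in two different places.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1D character state; stands in for the real Jolt character.
struct State
{
    float Position{0.0f};
    float Velocity{0.0f};
};

// Accelerate toward a target speed, then integrate the position.
inline State Step(State S, float TargetSpeed, float Accel, float DeltaTime)
{
    assert(DeltaTime > 0.0f);
    const float Diff = TargetSpeed - S.Velocity;
    const float MaxChange = Accel * DeltaTime;
    // Clamp the velocity change to what this timestep allows.
    S.Velocity += std::fmax(-MaxChange, std::fmin(MaxChange, Diff));
    S.Position += S.Velocity * DeltaTime;
    return S;
}

// Run the same "hold forward" input for Ticks steps with a given timestep.
inline State Simulate(int Ticks, float DeltaTime)
{
    State S{};
    for (int i = 0; i < Ticks; ++i)
    {
        S = Step(S, 8.0f, 20.0f, DeltaTime);
    }
    return S;
}

// Absolute position difference between a client run and a server run.
inline float Drift(int Ticks, float ClientDelta, float ServerDelta)
{
    return std::fabs(Simulate(Ticks, ClientDelta).Position -
                     Simulate(Ticks, ServerDelta).Position);
}
```

With identical deltas `Drift` stays at zero. With a client delta of 0.03333333 and a slightly longer server delta such as 0.0336, the two runs are already visibly apart after 90 ticks, and the gap keeps growing for as long as the character moves.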
Questions I don’t have answers to:
- Do I need to send the client delta time and the physics simulation result? Most techniques I could find appear to send client delta times to the server for validation. This seems extremely risky to me, since you can now forge both your input and your timing information. I guess you could use it only for “double checking”.
- Do I need to roll back physics simulations on the server to determine effective hits between players? Pretty sure the answer is no. I’ve yet to find any evidence that MMORPG servers do lag compensation by rolling back physics simulations, but would love confirmation on this. The client, however, apparently must do this.
Where to go from here
Clearly I need to make sure the client and server are processing the same input(s) and timing information. After watching this awesome breakdown of how UE5’s Character Movement Component works, I learned UE sends the client input, the client time (delta time? it’s unclear) and the client’s calculated result.
For simplicity, I ended up just hardcoding both the client’s and the server’s physics delta time to 0.03333333 (1/30 of a second). This should work as long as both can run all calculations for a frame within budget. I thought this would fix my problem, but alas it did not. I still see pretty excessive drift even after a few seconds.
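A hardcoded delta only lines up with real time if each side actually runs one physics step per 1/30 of a second. One common way to get that on a client with a variable frame rate is a fixed-timestep accumulator. The sketch below is that general technique, not what the project does; `FixedStepper`, `kFixedDelta` and `MaxSteps` are names invented for the example.

```cpp
#include <cassert>

// Fixed simulation step, assumed to be shared by client and server (30 Hz).
constexpr float kFixedDelta = 1.0f / 30.0f;

// Accumulates variable frame time and reports how many fixed steps to run.
struct FixedStepper
{
    float Accumulator{0.0f};

    // Returns the number of fixed physics steps to run for this frame.
    // MaxSteps caps the work done in one frame; any time that was not
    // consumed stays in the accumulator for the next call.
    int Advance(float FrameDelta, int MaxSteps = 5)
    {
        assert(FrameDelta >= 0.0f);
        Accumulator += FrameDelta;
        int Steps = 0;
        while (Accumulator >= kFixedDelta && Steps < MaxSteps)
        {
            Accumulator -= kFixedDelta;
            ++Steps;
        }
        return Steps;
    }
};
```

Each frame the caller would run the physics update `Advance(FrameDelta)` times, always passing `kFixedDelta` as the step. A 16 ms frame runs no step, the following 18 ms frame runs one, and a 100 ms hitch runs three.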
Removing UE5 from the equation
I decided to remove some variables and use the headless client on the same host as the server. To my surprise I was NOT seeing any drift. Is this a Windows/Linux problem? This question set me on a very… long (2 1/2 week) path of trying to get the pmoclient to build on Windows. I ended up:
- Breaking both Linux & Windows builds by upgrading dependencies
- Removing dependencies I didn’t really need (cglm, fmt).
- Fighting to get spdlog to not crash on Windows
- Ensuring Jolt was built with set(CROSS_PLATFORM_DETERMINISTIC ON)
- Realizing statically linking libraries INTO my pmo_library is a terrible idea if ANY of the 3rd party library code is accessed outside of pmo_library. (Hooray CRT heap corruption fun times!)
- Creating a pmo_library_server and pmo_library to separate server and client code by using preprocessor directives such as #ifdef IS_SERVER ... #endif.
- Upgrading my Linux image to Ubuntu 24.04 from 20.
- Upgrading clang from 10 to 18.
After all of that I was able to get pmoclient to build on Windows and… surprise! I still see drift after the first few packets.
Doing some diffs
I ended up writing some logging code to see at what point the drift begins to occur. Locations always seemed correct until I noticed that the rotations between the server and client ticks stopped matching. Changes in rotation caused different velocities… which caused different positions. After a few hours (spread over a few days) of debugging, I decided to visually split up my logs by server tick to REALLY see what was happening and was greeted with this:
- Tick 12 processes movement, gets next input for next tick (client is 3 ticks ahead)
- Tick 13 processes movement
- Tick 14 processes movement, gets next input for next tick (client is 4 ticks ahead)
[2025-02-01 09:13:50.977] [server] [debug] ------------------------Tick: 12----------------------------
[2025-02-01 09:13:51.010] [server] [debug] ProcessInput: Processing 1 inputs delta: 0.033605646 using: 0.03333333
[2025-02-01 09:13:51.010] [server] [info] PreUpdate: ProcessInput: SeqId: 22 Calling HandleInput Input: Input: 0.70710677 0 0.70710677 Rot: 0 0.38268346 0 0.9238795
[2025-02-01 09:13:51.010] [server] [info] Calling ProcessPacket with 232 bytes
[2025-02-01 09:13:51.010] [server] [debug] recv 232 bytes from client 4723954
[2025-02-01 09:13:51.010] [server] [info] PreStore: PostPhysicSimulation: Pos: 3.3954582 0.9 3.3954582 Rot 0 0.376102 0 0.92657834 Vel 5.646764 -0.327 5.646764
[2025-02-01 09:13:51.010] [server] [info] OnStore: ServerUpdateState: EntityId: 5102 SeqId: 22 Pos Now: 3.3954582 0.9 3.3954582
[2025-02-01 09:13:51.010] [server] [info] Got ClientCommand message from: 4723954, opcode: 2
[2025-02-01 09:13:51.010] [server] [warning] User 4723954 sent opcode: 2 ClientOpCode_Input
[2025-02-01 09:13:51.010] [server] [info] (MainState) Got player 4723954 ack for: 20 cur seq: 23
[2025-02-01 09:13:51.010] [server] [warning] Loading 1 new inputs for UserId:4723954
[2025-02-01 09:13:51.010] [server] [debug] ------------------------Tick: 13----------------------------
[2025-02-01 09:13:51.043] [server] [debug] ProcessInput: Processing 1 inputs delta: 0.033623856 using: 0.03333333
[2025-02-01 09:13:51.043] [server] [info] PreUpdate: ProcessInput: SeqId: 23 Calling HandleInput Input: Input: 0.70710677 0 0.70710677 Rot: 0 0.38268346 0 0.9238795
[2025-02-01 09:13:51.044] [server] [info] PreStore: PostPhysicSimulation: Pos: 3.5836837 0.90000004 3.5836837 Rot 0 0.37720025 0 0.9261318 Vel 5.6492867 -0.327 5.6492867
[2025-02-01 09:13:51.044] [server] [info] OnStore: ServerUpdateState: EntityId: 5102 SeqId: 23 Pos Now: 3.5836837 0.90000004 3.5836837
[2025-02-01 09:13:51.044] [server] [debug] ------------------------Tick: 14----------------------------
[2025-02-01 09:13:51.045] [server] [info] Calling ProcessPacket with 232 bytes
[2025-02-01 09:13:51.045] [server] [debug] recv 232 bytes from client 4723954
[2025-02-01 09:13:51.077] [server] [info] PreUpdate: PrePhysicsUpdate
[2025-02-01 09:13:51.077] [server] [info] ProcessInput: Empty inputs
[2025-02-01 09:13:51.077] [server] [info] PreStore: PostPhysicSimulation: Pos: 3.7719932 0.90000004 3.7719932 Rot 0 0.37720025 0 0.9261318 Vel 5.6492867 -0.327 5.6492867
[2025-02-01 09:13:51.077] [server] [info] OnStore: ServerUpdateState: EntityId: 5102 SeqId: 0 Pos Now: 3.7719932 0.90000004 3.7719932
[2025-02-01 09:13:51.077] [server] [info] Got ClientCommand message from: 4723954, opcode: 2
[2025-02-01 09:13:51.077] [server] [warning] User 4723954 sent opcode: 2 ClientOpCode_Input
[2025-02-01 09:13:51.077] [server] [info] (MainState) Got player 4723954 ack for: 21 cur seq: 1
[2025-02-01 09:13:51.077] [server] [warning] Loading 1 new inputs for UserId:4723954
[2025-02-01 09:13:51.077] [server] [debug] ------------------------Tick: 15----------------------------
It’s probably hard to tell, but there’s a slight gap in the middle tick where the server doesn’t receive the client’s input in time! We don’t get it until the beginning of the last tick, which is too late as we are already processing our game systems (network traffic is handled in a separate thread, remember). So the server processes velocity with ProcessInput: Empty inputs, leading to a different value than what the client calculates (since the client obviously processed the input).
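The effect of a single late packet can be reproduced in a toy model. Everything in this sketch is hypothetical (`Body`, `Tick`, `RunClient`, `RunServer`, and the "ease halfway toward the input" movement rule); it only mirrors the behaviour seen in the log, where a tick with no input keeps the old velocity and every later input gets applied one tick late.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1D body; stands in for the simulated character.
struct Body
{
    float Position{0.0f};
    float Velocity{0.0f};
};

// One tick of movement. A null Input models the "Empty inputs" case:
// the velocity is left untouched and the body just keeps moving.
inline void Tick(Body& B, const float* Input, float DeltaTime)
{
    assert(DeltaTime > 0.0f);
    if (Input != nullptr)
    {
        B.Velocity += (*Input - B.Velocity) * 0.5f;
    }
    B.Position += B.Velocity * DeltaTime;
}

// Client: applies Inputs[i] on tick i.
inline Body RunClient(const std::vector<float>& Inputs, float DeltaTime)
{
    Body B{};
    for (const float& In : Inputs)
    {
        Tick(B, &In, DeltaTime);
    }
    return B;
}

// Server: same inputs, but the packet due on tick LateTick arrives too late.
// That tick runs with no input, and all later inputs shift by one tick.
inline Body RunServer(const std::vector<float>& Inputs, std::size_t LateTick, float DeltaTime)
{
    Body B{};
    std::size_t Next = 0; // index of the next unprocessed input
    for (std::size_t T = 0; T < Inputs.size(); ++T)
    {
        if (T == LateTick)
        {
            Tick(B, nullptr, DeltaTime);
        }
        else
        {
            Tick(B, &Inputs[Next], DeltaTime);
            ++Next;
        }
    }
    return B;
}
```

When no packet is late (pass a `LateTick` past the end of the inputs), client and server agree. With one late packet the two positions differ from that tick onward, even though both sides use the same fixed delta.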
I also noticed the sequence jumps up by one, but then later on drops back down to being only 3 ticks ahead again. This missed packet/jump in ticks happens a few times within a few seconds of traffic.
After 3-4 weeks I finally know what’s wrong and why they are desynchronized.
How to fix the desynchronization?
I’d love to know why the server isn’t getting these packets in time, but late packets are inevitable anyway, so I will have to solve this for real-world network connections. The idea is to store the server’s data, compare it with what we did locally, then roll back and re-run the simulation when drift is detected.
For this I’ve created a ServerStateBuffer struct which will track what the server sends to us in a rolling buffer:
#include <array>
#include <cstddef>
#include <cstdint>

/**
 * @brief Rolling buffer of what the server reports for the player, kept so the client can roll back physics if needed.
 */
struct ServerStateBuffer
{
    uint8_t SequenceId{};
    uint8_t AverageSequenceDelay{}; // number of sequences the server is "off" by
    std::array<units::Vector3, 24> Position{};
    std::array<units::Velocity, 24> Velocity{};
    std::array<units::Quat, 24> Rotation{};

    // Slot index of the previous sequence id.
    size_t LastSequenceId() const { return (SequenceId + 24 - 1) % 24; }

    // Store a value in the slot for the current sequence id.
    void AddPosition(const units::Vector3 NewPosition) { Position.at(SequenceId % 24) = NewPosition; }
    void AddVelocity(const units::Velocity NewVelocity) { Velocity.at(SequenceId % 24) = NewVelocity; }
    void AddRotation(const units::Quat NewRotation) { Rotation.at(SequenceId % 24) = NewRotation; }

    // Copy the previous slot's value into the current slot.
    void AddLastPosition() { Position.at(SequenceId % 24) = Position.at(LastSequenceId()); }
    void AddLastVelocity() { Velocity.at(SequenceId % 24) = Velocity.at(LastSequenceId()); }
    void AddLastRotation() { Rotation.at(SequenceId % 24) = Rotation.at(LastSequenceId()); }

    // Read a stored value; Index is wrapped into the buffer.
    units::Vector3 PositionAt(const size_t Index) const { return Position.at(Index % 24); }
    units::Velocity VelocityAt(const size_t Index) const { return Velocity.at(Index % 24); }
    units::Quat RotationAt(const size_t Index) const { return Rotation.at(Index % 24); }
};
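One thing worth checking with this layout is the interaction between the uint8_t SequenceId and the 24-slot arrays. This assumes SequenceId is incremented as a plain uint8_t and wraps from 255 back to 0, which the struct itself does not show. Because 256 is not a multiple of 24, the slot index jumps from 15 straight to 0 at the wrap, while LastSequenceId() reports slot 23. The helper names below (`SlotFor`, `PreviousSlotFor`, `PreviousSlotIsConsistent`) are invented just to demonstrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBufferSize = 24;

// Slot a given sequence id is written to, mirroring `SequenceId % 24`.
constexpr std::size_t SlotFor(std::uint8_t SequenceId)
{
    return SequenceId % kBufferSize;
}

// Slot that LastSequenceId() reports, mirroring `(SequenceId + 24 - 1) % 24`.
constexpr std::size_t PreviousSlotFor(std::uint8_t SequenceId)
{
    return (SequenceId + kBufferSize - 1) % kBufferSize;
}

// True when the reported "previous slot" is really the slot that was
// written for the sequence id one step earlier (with uint8_t wraparound).
inline bool PreviousSlotIsConsistent(std::uint8_t SequenceId)
{
    const std::uint8_t Previous = static_cast<std::uint8_t>(SequenceId - 1); // 0 wraps to 255
    return PreviousSlotFor(SequenceId) == SlotFor(Previous);
}
```

For every sequence id except 0 the check holds. At 0 it fails, because sequence 255 was written to slot 15 and not slot 23. If the wrap really happens this way, a buffer size that divides 256 (16 or 32, for example) or a wider sequence counter would avoid the jump.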
I’ll probably want to add the client’s physics calculations here as well so I can diff the two. If the client detects any drift, it will need to roll back, most likely using the technique outlined in Jolt’s documentation. Once we get the latest positions from the re-run physics updates on the client, we will lerp between the client’s current position and the adjusted position.
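The plan above (diff, roll back, re-run, lerp) can be sketched without any physics engine. This is a rough outline under several assumptions, not the eventual implementation: the server is assumed to echo the sequence id of the input it processed, the simulation is a made-up deterministic 1D step (`StepSample`), and `Predictor`, `Sample`, `Lerp`, `kHistory` and `kStepSeconds` are all hypothetical names. A real version would save and restore Jolt state instead of a two-float struct.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kHistory = 24;         // same window size as ServerStateBuffer
constexpr float kStepSeconds = 1.0f / 30.0f; // assumed fixed physics step

// Hypothetical 1D stand-in for position/velocity.
struct Sample
{
    float Position{0.0f};
    float Velocity{0.0f};
};

// Hypothetical deterministic step shared by prediction and replay.
inline Sample StepSample(Sample S, float Input)
{
    S.Velocity += (Input - S.Velocity) * 0.5f;
    S.Position += S.Velocity * kStepSeconds;
    return S;
}

struct Predictor
{
    std::array<float, kHistory> Inputs{};     // input applied at each sequence
    std::array<Sample, kHistory> Predicted{}; // state after applying that input
    std::size_t NextSeq{0};
    Sample Current{};

    // Predict locally and remember the input and result for this sequence.
    void ApplyInput(float Input)
    {
        Current = StepSample(Current, Input);
        Inputs[NextSeq % kHistory] = Input;
        Predicted[NextSeq % kHistory] = Current;
        ++NextSeq;
    }

    // Compare the server's state for ServerSeq with our prediction. On drift,
    // rewind to the server state and replay every newer input on top of it.
    // Returns true when a correction was applied.
    bool Reconcile(std::size_t ServerSeq, Sample ServerState, float Tolerance = 0.01f)
    {
        assert(ServerSeq < NextSeq && NextSeq - ServerSeq <= kHistory);
        const Sample& Mine = Predicted[ServerSeq % kHistory];
        if (std::fabs(Mine.Position - ServerState.Position) <= Tolerance)
        {
            return false;
        }
        Sample Replayed = ServerState;
        Predicted[ServerSeq % kHistory] = Replayed;
        for (std::size_t Seq = ServerSeq + 1; Seq < NextSeq; ++Seq)
        {
            Replayed = StepSample(Replayed, Inputs[Seq % kHistory]);
            Predicted[Seq % kHistory] = Replayed;
        }
        Current = Replayed;
        return true;
    }
};

// Blend the rendered position toward the corrected one over a few frames.
inline float Lerp(float From, float To, float Alpha)
{
    return From + (To - From) * Alpha;
}
```

The client would call `ApplyInput` every tick and `Reconcile` whenever a server update arrives. After a correction, the visible mesh can be moved with `Lerp(RenderedPosition, Current.Position, Alpha)` each frame so the fix does not show up as a snap.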
So yeah, guess it’s time to implement physics correction on the client using the server’s authoritative results. Next post will hopefully be on how I got this to work and not me breaking my build for another 2 weeks…
