Latest Posts

DebuggingDesyncs

Introduction

Widelands uses parallel simulation for networking. What this means is that (initial game state including RNG + set of all player commands) is supposed to uniquely determine the state of the game at any point in time. This enables networked game by simply exchanging the initial state and all player commands, rather than exchanging the actual game data, which is significantly bigger.

To ensure that parallel simulation actually works, i.e. the game stays in sync, all parts of the game logic can write to the so-called syncstream. Data written to this stream will be hashed and hash values exchanged across the network and written to replay files.

A desync becomes visible during network games when the hash values from different hosts no longer match. They become visible during replays when the hash value calculated during simulation does not match the hash value stored in the replay file.

I have a desync, what do I do?

Don't panic!

Desync bugs can be notoriously difficult to debug, but there are tools available to help.

Drinking from the firehose

You can enable printing all data written to the syncstream in src/game.cc. Watch out, this is a lot of data (multiple megabytes per second if you speed up the game). Consider piping it into a file and using tools like grep and diff to make sense of it.

If you work from an existing replay by watching it, you might consider adding code to src/game.cc such that syncstream data is written only after a certain gametime.

A typical snippet might be something like this:

[sync:00008017 t=9952900] de ad 00
[sync:00008020 t=9952900] 84 de 97 00
[sync:00008024 t=9952900] 80 00 00 00
[sync:00008028 t=9952900] ac 04 00 00
[sync:00008032 t=9952900] ac 04 00 00
[sync:00008036 t=9952900] 84 00
[sync:00008038 t=9952900] 8d 00

The first line (de ad 00) is merely a marker that Cmd_Queue inserts into the syncstream whenever it executes a command. The second line is the duetime of the command. The third line contains the command ID, see queue_cmd_ids.h. 0x80 is a Cmd_Act. The following line is printed by Cmd_Act and indicates the serial number of the acting Map_Object. The three lines following that are printed by Bob::set_position. The first line is the serial number again, followed by the X and Y coordinates.

Try to reproduce the bug

Find some way to reproduce the desync so that you can actually create a new replay that shows the same desync. Then create a new replay while having as much debug logging enabled as possible (pipe into a file!). After that, start a new session and watch the replay, again with as much debug logging as possible.

Afterwards, cut everything in the logs before the line that says

[sync] Reset

and create a diff of everything after that line. Figure out where the first real difference is in outputs. There will be superficial differences like AI player output missing etc.; don't worry about those. You'll know where the real differences start. Then work your way from there.

Use the record/playback system

When hunting for the bug, you might have to incrementally add more debug info to figure out where the simulation starts to diverge, so you will need logs of both the original session writing the replay, and logs of the session watching the replay.

But desync bugs may seem non-deterministic. In these cases, make a recording of a game that triggers the desync (using./widelands --record=desync.rec). After adding additional debug info, you can then playback this recording to recreate the session writing the replay.

Tagged with: Development, help