Preparing your game for deterministic netcode

If this looks slightly familiar, that's because it is

Considering that the frequency at which people ask me about what it takes to do netcode (and what they need ready for it) continues to rise year-to-year, I figured that it's about time that I finally made a blog post about this.

This one's about deterministic netcode as it is by far the most requested and also the kind where it's possible to give more specific advice than "it depends".

What is deterministic netcode?

Deterministic netcode relies on each game client coming to an identical state given the same initial state and inputs per frame.

Common kinds of deterministic netcode include lockstep and rollback.

Lockstep

Lockstep is a (relatively) simple implementation of a deterministic protocol: the next game frame is processed only once the inputs/actions of each player are known for that frame (hence the name).

Lockstep networking implies adding an input delay of about half of the median round-trip time to allow remote inputs to arrive on time, and requires stalling the game until all remote inputs for the frame are known, which can make it a suboptimal choice for games that may be played on devices with unstable connectivity (such as mobile or Switch).
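
To make the rule concrete, here is a minimal Python sketch of a lockstep frame loop. Everything in it is made up for illustration: `LockstepSession`, `step_game`, and the `INPUT_DELAY` of 3 frames are assumptions, and the actual sending/receiving of packets is left out.

```python
from dataclasses import dataclass, field

INPUT_DELAY = 3  # frames; assumed value, in practice derived from round-trip time


def step_game(state: int, inputs: tuple) -> int:
    """Stand-in for deterministic game logic: same state + inputs -> same result."""
    return (state * 31 + sum(inputs)) % 1_000_003


@dataclass
class LockstepSession:
    player_count: int
    local_player: int
    state: int = 0
    frame: int = 0
    # frame number -> {player index -> input value}
    inputs: dict = field(default_factory=dict)

    def __post_init__(self):
        # Nobody could have sent inputs for the first INPUT_DELAY frames,
        # so those frames run with neutral inputs for every player.
        for f in range(INPUT_DELAY):
            self.inputs[f] = {p: 0 for p in range(self.player_count)}

    def submit_local(self, value: int):
        """Schedule a local input INPUT_DELAY frames ahead.

        Returns the target frame so the caller can send (frame, value) to the
        other players, or None if that frame already has a local input
        (which happens while the game is stalled).
        """
        target = self.frame + INPUT_DELAY
        frame_inputs = self.inputs.setdefault(target, {})
        if self.local_player in frame_inputs:
            return None
        frame_inputs[self.local_player] = value
        return target

    def receive_remote(self, player: int, frame: int, value: int) -> None:
        self.inputs.setdefault(frame, {})[player] = value

    def try_advance(self) -> bool:
        """Run one logic frame only if every player's input for it is known."""
        known = self.inputs.get(self.frame, {})
        if len(known) < self.player_count:
            return False  # stall: still waiting on someone
        ordered = tuple(known[p] for p in range(self.player_count))
        self.state = step_game(self.state, ordered)
        del self.inputs[self.frame]
        self.frame += 1
        return True
```

Calling `try_advance()` once per displayed frame gives the stall behaviour described above: it returns `False` and leaves the state untouched until the missing inputs arrive.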

Rollback

Rollback improves upon lockstep by allowing players to guess remote inputs when they do not arrive on time, subsequently rewinding the game and re-playing game frames with corrected inputs later once the inputs do become known.

This means that if, say, there is a 100ms long connection hiccup, we can go on the assumption that the remote player kept holding the inputs they were already holding for those 6 frames or so (at 60 frames per second), and correct once the real inputs arrive, visually only resulting in their character briefly adjusting to a new location.

Of course, the more frames are predicted, the more likely it is that something goes completely differently once the state has been corrected - a few frames can be fine, but predicting a second's worth of inputs will most likely result in the remote character snapping to a completely different spot, which is why competitive games tend to cap the maximum number of predicted frames before the game would do a lockstep-like stall.
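
Here is a rough two-player Python sketch of that predict-then-correct loop. It is not how any particular library does it: `RollbackSession`, `step_game`, and the cap of 8 predicted frames are assumptions for illustration, remote inputs are assumed to arrive in frame order, and the "state" is a single integer so that snapshots are trivial.

```python
MAX_PREDICTED_FRAMES = 8  # assumed cap before falling back to a stall


def step_game(state: int, local_input: int, remote_input: int) -> int:
    """Stand-in for deterministic game logic."""
    return (state * 31 + local_input * 7 + remote_input) % 1_000_003


class RollbackSession:
    def __init__(self):
        self.state = 0
        self.frame = 0             # next frame to simulate
        self.snapshots = {}        # frame -> state at the start of that frame
        self.local_inputs = {}     # frame -> local input
        self.remote_inputs = {}    # frame -> confirmed remote input
        self.used_remote = {}      # frame -> remote input the simulation used
        self.last_remote = 0       # latest confirmed remote input (our guess)
        self.next_unconfirmed = 0  # first frame without a confirmed remote input
        self.rollbacks = 0

    def _simulate(self, frame: int) -> None:
        # Use the real remote input if known, else guess "still holding the same".
        remote = self.remote_inputs.get(frame, self.last_remote)
        self.snapshots[frame] = self.state
        self.used_remote[frame] = remote
        self.state = step_game(self.state, self.local_inputs[frame], remote)

    def advance(self, local_input: int) -> bool:
        if self.frame - self.next_unconfirmed >= MAX_PREDICTED_FRAMES:
            return False  # too many predicted frames: stall like lockstep would
        self.local_inputs[self.frame] = local_input
        self._simulate(self.frame)
        self.frame += 1
        return True

    def receive_remote(self, frame: int, value: int) -> None:
        """Assumes inputs arrive in frame order (reliable, ordered transport)."""
        self.remote_inputs[frame] = value
        self.last_remote = value
        self.next_unconfirmed = frame + 1
        if frame < self.frame and self.used_remote[frame] != value:
            # Mis-prediction: rewind to that frame and re-play up to the present.
            self.rollbacks += 1
            self.state = self.snapshots[frame]
            for f in range(frame, self.frame):
                self._simulate(f)
```

A real game would also drop snapshots and inputs older than the last confirmed frame; the sketch keeps everything to stay short.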

Advantages

Low bandwidth use
(for the most part only inputs/actions have to be transmitted)
Relative fairness
(host does not have an advantage over other players, neither do players that are closer to the host)
Game code remains relatively separate from netcode
That is, if you are, for example, adding an extra move to your fighting game, you do not usually have to touch any netcode, which can reduce the amount of back-and-forth required in teams where different people do gameplay code and netcode.

Disadvantages

Input delay
Although rollback can help with this, setting the delay lower than half of the median round-trip time will cause remote players to constantly glitch around due to consistent mis-prediction.
Regardless, many people prefer that to input latency.
Scaling
Each player needs to send their inputs to each other player, meaning that there will be Sum(1...playerCount-1), or playerCount*(playerCount-1)/2, connections between players in total - 1 for 2P, 3 for 3P, 6 for 4P, 10 for 5P, 28 for 8P, 66 for 12P... Needless to say, the more connections you have, the higher the odds of any given pair of players having a poor connection to each other and causing issues for everyone else.
Generally, deterministic netcode is not used in combination with mesh topology in games with >4 players - for games with humble input delay requirements (e.g. RTS games), star topology (everyone connects to host) can be used instead.
Cheating
Since each player has the entire game state at all times, people may come up with ways to view what they are not supposed to (see: map-hacking/world-hacking).
In fast-paced games, it is less of a concern - for example, in a fighting game there is rarely ever a difference between what you and your opponent see, and the only extra information that could be inferred is hitboxes/cooldowns, which many players already know by heart.
Desyncs
Your ultimate nemesis!
If some of your code is not, in fact, deterministic - say, you spawn an instance whenever a global variable reaches a set value and forget to reset it on session start - this can cause a state divergence, at which point the players will no longer see the same game state.
Games generally disconnect players on desync, as attempting to re-synchronize the game state is computationally expensive, can take up a substantial amount of bandwidth, and does not guarantee that the game will not desync again (causing a death loop of repeatedly re-synchronizing the game state).
Debugging desyncs is one of the more complicated parts of working with deterministic netcode and involves comparing game state dumps/logs.
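
One common way to narrow a desync down is to have each client log a checksum of its game state per frame and then look for the first frame where the logs disagree. The Python sketch below assumes the gameplay state can be dumped as a JSON-friendly dictionary; `state_checksum` and `first_divergent_frame` are hypothetical helper names, not part of any library.

```python
import hashlib
import json


def state_checksum(state: dict) -> str:
    """Hash a canonical (sorted-key) JSON dump of the gameplay state."""
    dump = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(dump.encode("utf-8")).hexdigest()


def first_divergent_frame(log_a: list, log_b: list):
    """Return the first frame whose checksums differ, or None if the logs agree.

    Only the frames present in both logs are compared.
    """
    for frame, (a, b) in enumerate(zip(log_a, log_b)):
        if a != b:
            return frame
    return None
```

Once you know the frame, comparing full state dumps for just that frame is far more manageable than comparing everything.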

Caveats

Importance of connection quality
Since running late on packets will cause either stalls (lockstep) or visual glitches (rollback), it becomes of utmost importance that you maintain as good a connection between players as possible.
Techniques like packet loss mitigation become a must, and recent games increasingly lean towards routing traffic through private networks (which may offer better latencies/stability than usual P2P).
Tool specifics
Some game engines/frameworks can be inherently less fit for deterministic netcode due to built-in components favouring performance over determinism, which is not an easy issue to address even if you have source code access.
For example, GameMaker is in a relatively good spot in this regard since it has a built-in epsilon for comparisons (close enough numbers are considered equal) and its built-in collision checking functions historically round the coordinates (which means that small floating-point errors can go completely unnoticed).
In contrast, Unity is not a good fit for determinism since most of the built-in API (including physics) is not deterministic and you end up re-implementing a good chunk of it with fixed-point structs if you desire determinism.
Variable framerate
Making your game run at a variety of screen refresh rates can be more challenging with deterministic netcode since the game logic must progress at the same pace between players while visuals will need to be interpolated/extrapolated.
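
For the variable framerate point, the usual approach is a fixed-timestep loop: game logic advances in whole ticks, and only the drawn position is interpolated. The Python sketch below assumes 60 logic ticks per second and a made-up object moving 4 units per tick; `FixedStepLoop` is a hypothetical name. In a networked game the number of ticks you may run is additionally limited by which inputs are available.

```python
TICK_NS = 1_000_000_000 // 60  # one logic frame at an assumed 60 ticks per second


class FixedStepLoop:
    def __init__(self):
        self.prev_x = 0       # logic position one tick ago
        self.curr_x = 0       # logic position now (integers keep logic exact)
        self.accumulator = 0  # real time not yet consumed by logic, in ns
        self.ticks = 0

    def tick(self) -> None:
        """One logic frame; must not depend on the display's refresh rate."""
        self.prev_x = self.curr_x
        self.curr_x += 4      # hypothetical movement: 4 units per logic frame
        self.ticks += 1

    def render_frame(self, elapsed_ns: int) -> float:
        """Run as many whole ticks as fit, then return the position to draw."""
        self.accumulator += elapsed_ns
        while self.accumulator >= TICK_NS:
            self.tick()
            self.accumulator -= TICK_NS
        # Blend between the last two logic positions; this is visual only
        # and never feeds back into the game state.
        alpha = self.accumulator / TICK_NS
        return self.prev_x + (self.curr_x - self.prev_x) * alpha
```

Two players with different refresh rates end up with the same logic state after the same number of ticks; only the in-between drawn positions differ.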

Which games need deterministic netcode?

Conventionally, deterministic netcode is used for

Fast-paced competitive games
For instance, almost any fighting or platformer fighter game that you can find has rollback (preferred) or lockstep netcode.
Mixed-genre fast-paced games (e.g. Lethal League) also lean towards rollback netcode.
Fast-paced cooperative games (occasionally)
In general, if your game is strictly cooperative, you can go for a classic client-server model and favor the player where you can, but games with both cooperative and competitive modes and/or high precision requirements may utilize rollback netcode.
Perhaps the most well-known recent example of this is Spelunky 2, but rollback netcode can also be found in higher-budget beat-em-up games.
RTS games (and other games with way too many entities)
If your game has hundreds of units moving around, effectively synchronizing information about them can be a challenge, which is the reason why RTS games historically leaned towards lockstep.
As of 2021, median internet speeds are generally sufficient for many RTS games to use a client-server model instead, which can also spare them some of the cheating issues.
Emulators
Modifying each game's ROM to include networking logic is generally unviable, and emulators are inherently deterministic, which makes them a great fit for deterministic netcode.

Preparation

This can be conveniently split into tiers of how much you want to bother:

Tier 0: General

These are good to do even if you're not sure if you'll be doing netcode:

I cannot stress it enough, but if you intend to have online multiplayer in your game, you should have local multiplayer working in some form - even if it's split-screen and is not really usable unless the player has a wide/big screen.
Making every part of the game acknowledge multiple players can be time-consuming, and in extreme cases with big games it can be cheaper to remake the game from scratch than implement netcode for it.
Keep documentation on where you are saving and loading the data, and what of it affects the game state - for example, if a specific entity on a level only appears after a player has unlocked something, the fact will need to be synchronized in multiplayer.

Tier 1: Lockstep on desktop

For this you want:

(steps from "General")
Organize your input polling to be in one place either by abstracting it to button_check(player_index, button_index) functions, or by just assigning variables to indicate each input's state somewhere.
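
As a sketch of what "in one place" can look like, here is a small Python input layer built around a `button_check(player_index, button_index)` function. The `Button` enum, the bitmask layout, and the `InputState` class are assumptions for illustration; the point is only that game logic asks this object and never the devices.

```python
from enum import IntEnum


class Button(IntEnum):
    # Hypothetical button set; use whatever your game needs.
    LEFT = 0
    RIGHT = 1
    JUMP = 2


class InputState:
    """Single source of truth for inputs; game logic never polls devices."""

    def __init__(self, player_count: int):
        # One bitmask per player; bit N is set while button N is held.
        self.masks = [0] * player_count

    def set_from_device(self, player_index: int, held: set) -> None:
        """Store what the device polling code found for a local player."""
        mask = 0
        for button in held:
            mask |= 1 << button
        self.masks[player_index] = mask

    def set_from_mask(self, player_index: int, mask: int) -> None:
        """Store inputs that came from the network or from a replay file."""
        self.masks[player_index] = mask

    def button_check(self, player_index: int, button_index: int) -> bool:
        return bool(self.masks[player_index] & (1 << button_index))
```

Packing each player's buttons into one integer also happens to be a compact thing to send over the network or write to a file.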

For a practical example, try implementing a replay system in your game.

A replay is a file containing any initial state (such as gameplay-affecting settings or unlocks) and player inputs per frame since match/session start.

A replay can then be used to play back the game by applying initial state and taking each frame's inputs from the file rather than polling the devices.

If you can get replays working without desyncing, you're good to go!
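
A bare-bones version of such a replay system might look like the Python sketch below. The JSON layout and the names `record_replay`, `play_replay`, and `step_game` are made up for illustration, and the "game" is a single integer that is updated deterministically from each frame's inputs.

```python
import json


def step_game(state: int, inputs: list) -> int:
    """Stand-in for one deterministic logic frame."""
    return (state * 31 + sum(inputs)) % 1_000_003


def record_replay(initial_state: int, frames: list) -> str:
    """Serialize the initial state plus every frame's inputs for all players."""
    return json.dumps({"initial_state": initial_state, "frames": frames})


def play_replay(replay_text: str) -> int:
    """Re-run the game from the replay instead of polling the devices."""
    replay = json.loads(replay_text)
    state = replay["initial_state"]
    for inputs in replay["frames"]:
        state = step_game(state, inputs)
    return state
```

If playing the replay back ever ends in a different state than the live session did, something in the game logic depends on more than the initial state and the inputs.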

Tier 2: Lockstep on web/mobile/consoles

First, to explain the distinction from above:

On desktop platforms, networking APIs generally have synchronous versions of functions, meaning that if you need to stall the game for a moment, you can do so by repeatedly polling the socket/API until the data becomes available.

In contrast, on other platforms synchronous polling can range from being discouraged to being impossible (which is the case with HTML5 specifically).

So, for this you would want:

(steps from above)
Make it so that the game is able to process an arbitrary number (including zero) of game logic frames per actual frame.
This is usually accomplished by moving game logic code to a different place that makes it easier to invoke on demand - e.g. moving Step event code to a User Event in GameMaker, or moving Update/FixedUpdate to your own function in Unity.
Note that you'll also have to take care of any logic that is being processed by your engine of choice automatically! (such as animations/related states)

For a practical example, implement the ability to pause and fast-forward (2x playback speed) in the replay system made earlier.
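
In code, pausing and fast-forwarding boil down to choosing how many logic frames to run per displayed frame. The Python sketch below uses a hypothetical `ReplayPlayer` with one integer input per frame; a `speed` of 0 pauses, 1 is normal playback, and 2 is 2x.

```python
def step_game(state: int, inputs: int) -> int:
    """Stand-in for one deterministic logic frame."""
    return (state * 31 + inputs) % 1_000_003


class ReplayPlayer:
    def __init__(self, initial_state: int, frames: list):
        self.state = initial_state
        self.frames = frames
        self.position = 0  # index of the next logic frame to run
        self.speed = 1     # logic frames per displayed frame (0 = paused)

    def update(self) -> int:
        """Called once per displayed frame; returns how many logic frames ran."""
        ran = 0
        while ran < self.speed and self.position < len(self.frames):
            self.state = step_game(self.state, self.frames[self.position])
            self.position += 1
            ran += 1
        return ran
```

The same `update()` shape is what netcode needs later: zero logic frames while waiting on inputs, several in a row when catching up.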

Tier 3: Rollback

This is hard to fully prepare/test for, but:

(steps from above)
Implement on-demand game state saving/loading.
This has to serialize/deserialize the entire game state (everything that affects gameplay) into some format that can be later read from - conventionally, binary serialization, but you can technically do whatever you want so long as it's fast enough (can execute in <10% of your game frame time).
The difficulty of this can vary wildly from game to game and engine to engine depending on how many game entities you have, how much data each contains, and what tools are at your disposal.

For a practical example, implement the ability to save/load position in the replay system made earlier - saving would mean saving the game state and the current file read position, while loading would mean loading the game state and resetting the file read position to the earlier saved one, thus effectively rewinding the replay.

If you can get that to work without causing desyncs, you're in a good spot!
(but, of course, some extra optimization might be necessary)
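
The save/load exercise can be sketched in Python as follows. This uses `pickle` purely because it is the quickest binary serialization available in the standard library, not because it is fast enough for a real game; `RewindableReplay`, the state dictionary, and its fields are assumptions for illustration.

```python
import pickle


def step_game(state: dict, inputs: int) -> None:
    """Stand-in for one deterministic logic frame; mutates state in place."""
    state["x"] += inputs
    # A seeded random generator is part of the game state too.
    state["rng"] = (state["rng"] * 1103515245 + 12345) % 2**31


class RewindableReplay:
    def __init__(self, initial_state: dict, frames: list):
        self.state = dict(initial_state)
        self.frames = frames
        self.position = 0  # index of the next frame to read from the replay
        self.saved = None  # (serialized state, replay position)

    def step(self) -> None:
        step_game(self.state, self.frames[self.position])
        self.position += 1

    def save(self) -> None:
        """Snapshot everything that affects gameplay, plus the read position."""
        self.saved = (pickle.dumps(self.state), self.position)

    def load(self) -> None:
        """Restore the snapshot, which effectively rewinds the replay."""
        blob, self.position = self.saved
        self.state = pickle.loads(blob)
```

Playing on after `load()` should land on exactly the same state as the first pass did; if it does not, something that affects gameplay is missing from the saved state.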

When to start on netcode?

Ideally, the sooner the better, but games take a while to make, and a game's code base can change dramatically throughout development, so it's not uncommon to have the game ready for netcode but only start on actual netcode when the game is closer to being feature-complete.

If you are implementing netcode yourself for the first time, make sure to have at least a few-month buffer to test and debug any potentially arising issues.

Retrofitting an existing game for online multiplayer can be trickier - especially for rollback.

For GameMaker games specifically, lockstep is achievable for the majority of games as there are fewer things that can go wrong (as per earlier).

Testing before netcode

The common motivation behind getting netcode in sooner rather than later is to make sure that multiplayer elements are well-tested and balanced, which can be harder to do if the game can only be played locally.

Fortunately, these days there's no lack of game streaming solutions, be it from one of the players' computers (NVIDIA GameStream, Moonlight, Parsec, Steam Remote Play, etc) or server-based (Parsec, GeForce Now, maybe more - hard to check).

And if your game is being made in GameMaker, you can use a tool I made to test the game online without having to implement anything game-side - through a variety of tricks, the tool injects lockstep netcode into games, offering better bandwidth and latency than game streaming can.

5 thoughts on “Preparing your game for deterministic netcode”

YAL-content-liker on Feb 3, 2024 at 02:17 said:

Thank you for this. this is VERY helpful for me.

MMH on Nov 20, 2023 at 22:22 said:

Thanks for the input! I’ll employ user events then.

I’ll be replacing the current built-in alarms with the system you demonstrated in that link, seeing as said alarms are covered within the step event to stall them without incurring any glitches with GM’s built-in alarms.

MMH on Nov 18, 2023 at 03:48 said:

Surprised this post hasn’t gotten any comments – definitely contains a lot of useful info for networking.

That said, since I’m getting into the weeds of deterministic lockstep in GameMaker myself, I’m wondering if you could shed some light on a few items.

1) You said:
“Make it so that the game able to process an arbitrary number (including zero) of game logic frames per actual frame. This is usually accomplished by moving game logic code to a different place that makes it easier to invoke on demand – e.g. moving Step event code to User Event in GameMaker”

In GameMaker, I attempted to utilize Meseta’s lockstep code to stall the game, using a keyboard check to simulate lag (i.e. “stalling”). Unfortunately the problem I experience is that the player object’s values seem to flicker during “stalling”. This is not too much of an issue for certain variables, but for countdown variables, alarms in particular, this leads to an incorrect value sometimes getting locked in after coming out of the stalling and either freezes at -1 or some other value and doesn’t count down – if the player needed this countdown to reach 0 to return to a neutral state, that can no longer happen and the player is frozen in their previous state. Image index flickering also means the player’s animation flickering. I’m wondering if you’d have some explanation or some kind of basic example to illustrate how you’d handle this stalling process for lockstep yourself?

2) Is using the user event to invoke step code really the only way to stall the game during lag? Why can’t I use something like deactivating the instance layers for the essential objects (players, projectiles etc.) when packets are lost/received late, then reactivate once the lag has passed?

Thanks for any guidance.

- Vadym on Nov 19, 2023 at 16:49 said:
  
  Deactivation is inherently janky – freshly activated instances will not run the current event (so activating something in Step leaves it without the Step event for that frame) and things like alarms are slightly more obscure in their function.
  
  You could prepend all event code with a macro that evaluates to `if (global.stalled) exit;`, but you’ll still have to do something with alarms.
  
  I think user events are preferable as this better prepares you for potentially doing rollback later.
  
  - MMH on Nov 20, 2023 at 22:24 said:
    
    Sorry, thought my reply to you above was going here – might want to move that post and delete this one. Thanks.
    

YellowAfterlife