Networked Games -- Time and Event Synchronization

Time and Event Synchonization

So we know that in distributed environments, it is very complicated to share objects. Sharing object updates to be exact. In the games that you are developing, as a requirement, you only need to implement something decent that handles distributed sharing. But for the purpose of mastering this subject, let us still introduce this subject in a bit more depth.

Key Concepts

The time kept in games is virtual. Like what you have implemented when taking CS302, they are in meansured in tick. A tick may relate to 1 minute, 1 second, or a strange number such as 319 milliseconds, etc. The concept of tick is universal. It started a long time ago (measured related to the life span of Computer Science :-)). As soon as computer were used to do simulation, tick was introduced. Time can not decrease, it either stays the same or increases. Due to this reason, it is safe to say that after a server generates a timestamp 100, you won't see it generate 99.

Something new to you is, however, ticks may or may not be updated in consecutive steps. For instance, first you are on tick 1234, the next update would advance the tick to 1248, assuming a step size of 4. This is very important for simulations run with distributed computers. Different computers, especially when carrying different workloads, may need to use different step sizes.

In game language, this is saying: the virtual time moves forward in discrete increments. In many systems, the step sizes are so small that players just perceive a continuously moving time.

All events, including commands, inquiries reports, object updates, etc., must be timestamped. This we have already talked about. Note: timestamp maybe for the time they are generated, or for the time they should be executed. The latter is more essential.

Finally, as before, we know that there is no guarantee of in-order delivery of the network messages. Actually, within a given amount of time, there is not guarantee of delivery at all.

The Problem Statement

The problem statement is: to synchronize events, the time and order, execution among a set of distributed servers. In your game implementation, we are talking about your game servers, i.e. zone servers. To synchronize your clients with servers is a similar but different problem.

Common Solution (1)

The most straightforward one is the best effort method. Each server process, participating in synchronization, buffers messages in order of increasing timestamps, and then execute them one by one.

This is assuming all messages sent during the period have been received. But what if not? Then when the process gets a late message with an old timestamp, whether to still execute the message is a tricky issue. There is no general solution here. It is strictly application dependent.

Due to the above issue, a bigger problem is created. How can any process tell whether other process executed the same order?

Common Solution (2)

The second approach uses a central timeserver, which is basically THE process that set the pace of execution for all other proceses. It determines when to move forward, when to slow down and when to stop.

In this method, all other server processes are made to the slave of the timeserver. This is better than "best effort" by providing a "hope" of a universal order of events. Although there is still no way to tell whether all processes actually executed the events, the likelihood is pretty high.

Another good thing with this method is that the timeserver can choose to ignore slow or heavily loaded processes. This is a rather elegant design.

The CMB Algorithm

The distributed computing/high performance computing community came up with many algorithms that improve upon the two common solutions mentioned above. The best one that we can use for networked games is the CMB algorithm.

CMB stands for three names: Chandy, Misral and Bryant. Just google to know more about them. Or, ask Dr. Plank.

CMB is a distributed k-reduction algorithm. It is distributed by nature. So there is no single master process. Note, here distributed refers more to all processes being independent. You can use CMB to synchronize 20 processes on the same machine too.

On each process, you keep an event queue sorted by timestamp. Note, here the timestamp are time to execute. You keep a queue for each participating process. So, if you have three processes, then on each process, there are three queues (one of those is the queue for the host process).

Based on the timestamps in the queues, each process can calculate a Global Virtual Time (GVT), by look at the first element in each queue. The lowest one is the current GVT.

GVT basically describes that all processes have gone into the "future" of GVT. No one will generate any events that "live in the past". We can then be pretty sure that all processes see the same order up to the instant of GVT. So all events up till the GVT event can be executed right away. All events on the queue with a timestamp higher than GVT remains in the queue for future release.

As events get released, it is possible for a queue to become empty. In that case, there is no way to know what time that corresponding process is up to. So everybody wait. The system remains in deadlock situation until a new message arrives from that event.

To avoid the problem of deadlock, null messages are used. One way is to generate null messages on regular intervals. The length of the interval is basically the longest period of deadlock that you can tolerate. Another way is to generate a null message after every real executable event. This causes the number of messages to double, but basically enforce of deadlock time of zero.

Improvements

Originally, you are supposed to do event synchronization for all objects, be it a mermaid, a fire, or a button; basically whatever the shared universe has.

But for networked games, this number can quickly go up to thousands, if not millions. This creates a real bottleneck. so, the next idea is, why not do synchronization on processes, not objects. All we want to find out is GVT, isn't it?

So everytime a process wants to move forward, it sends a AR request (advance request). The AR requests are then kept in a queue in the GVT space. Everytime GVT advances, all object requests within each process are then released for execution, assuming that each process arranges object requests according to the AR requests they are associated with.

Another improvement comes from an observation. Due to network latency, when process A knows process B has turned in timestamp 100, process B is already past timestamp 100. So technically, the next timestamp B can ever generate is 100 + stepsize_B. If this is the case, process A can actually look ahead and release more events. The benefit is a way to outpace the slowest server by its stepsize.

Gotcha's

First, timestamp with stepsize allows a convenience in adjusting the granularity of synchronization. But really, we need to sort events in order, not just in buckets of timestamp.

It is often useful to add to the timestamp an ID, or a counter. Say in 12223.005, 12233 is the timestamp. 005 is the counter. It can only go up and it will cycle back to 0 every time a new timestamp is generated. This way, two events of the same timestamp can still be distinguished, assuming a differnt ID.

Additionally, often there is an "age" associated as well. Age indicates generations. If event A of age 0 caused events B and C to be generated, B and C are of age 1. This can go on for ever. Based on age, you know that event A goes before B and C, even if they have the same timestamp and ID. So, this makes the timestamp something like 12223.001.005.

How about priority? If something exploded in the scene, it's got to be more important than chatting messages. In some cases, priority can be partially implemented using age, but not always. So depending on your game, the eventual timestamp looks like:

      timestamp.priority.age.id

Now, let's talk about how you can implement event synchronization?

What do we start?