CS302 --- Skip Lists

Brad Vander Zanden

Reference: Pugh, W. Skip Lists: A Probabilistic Alternative to Balanced Trees. CACM, 33(6), June 1990, 668-676.

Overview

Skip lists provide a way to keep a list of elements sorted and yet support search, insert, and delete operations in an expected time of O(log n). We say expected time rather than guaranteed time because skip lists rely on a probabilistic algorithm to decide how the list is organized. It can be shown that the chance of a search taking more than three times the expected search time in a list of 250 or more elements is less than one in a million. In exchange for forgoing guaranteed worst-case performance, we obtain an algorithm that is much easier to implement than balanced tree schemes and that performs insert and delete operations much faster, on average, than balanced tree schemes.

Key Properties of Skip Lists

Some elements, in addition to pointing to the next element, also point to elements even further down the list.
A level k element is a list element that has k forward pointers. The first pointer points to the next element in the list, the second pointer points to the next level 2 element, and in general, the ith pointer points to the next element of level i or higher.
The level of an element is chosen, in essence, by flipping a coin. For example, we could flip a coin until it comes up tails, count the number of times it came up heads before coming up tails, and add 1 to this number. The result is the level of the element. More generally, we choose a fraction p between 0 and 1, so that a fraction p of the elements with level i pointers will have level i+1 pointers as well. On average, a fraction (1-p) of the elements will be level 1 elements, p(1-p) will be level 2 elements, p^2(1-p) will be level 3 elements, and so on. This scheme corresponds to flipping a coin that has a p chance of coming up heads and a (1-p) chance of coming up tails.
The name skip lists comes from the fact that you can use the additional pointers to skip over intermediate elements and get to the desired element more quickly.

Data Structures

Skip List Node:
1. forward: a dynamic array of pointers to skip list nodes. The size of the array is determined at run-time, when the coin is flipped to determine the number of levels this node will contain.
2. key: the key of the record
3. val: a pointer to the record
  The C declaration for a skip list node looks as follows:
```
    typedef struct skip_list_node {
        struct skip_list_node **forward;
        int key;
        void *val;
    } *SkipListNode;
```
Skip List:
1. header: a pointer to a dummy skip list node that contains the initial set of forward pointers.
2. level: the current number of levels in the skip list.
3. MaxLevel: the maximum number of levels to which a skip list can grow.
4. p: the chance of getting a "head" when a level is chosen for a new node. It is stored once per list, since randomLevel reads it as list->p.
  The C declaration for a skip list looks as follows:
```
    typedef struct skip_list {
        SkipListNode header;
        int level;
        int MaxLevel;
        float p;
    } *SkipList;
```
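
The pseudo-code later in these notes calls makeNode without defining it. The following is a minimal sketch of what such a routine might look like in C, assuming 0-based arrays (so forward[0] holds the level 1 pointer) and a hypothetical companion routine freeNode that is not part of the original notes.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 pointer */
    int key;
    void *val;
} *SkipListNode;

/* Allocate a node with `level` forward pointers, all initially NULL.
   Returns NULL if memory cannot be obtained. */
SkipListNode makeNode(int level, int key, void *val)
{
    assert(level >= 1);
    SkipListNode node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->forward = malloc((size_t)level * sizeof *node->forward);
    if (node->forward == NULL) {
        free(node);
        return NULL;
    }
    for (int i = 0; i < level; i++)
        node->forward[i] = NULL;
    node->key = key;
    node->val = val;
    return node;
}

/* Hypothetical helper: release a node and its forward array.
   The record that val points to still belongs to the caller. */
void freeNode(SkipListNode node)
{
    if (node != NULL) {
        free(node->forward);
        free(node);
    }
}
```

Because the forward array is allocated separately from the node, a node must be released with two calls to free, which is why the sketch pairs makeNode with freeNode.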

Skip List Initialization

Set the level of the skip list to 1.
Use the header node as both the header and the last node. Hence make all the forward pointers of the header point to the header:
```
     for i = 1 to MaxLevel do
         header->forward[i] = header
     
```
To avoid having to check for the end of the list, store a sentinel value greater than the maximum possible value of a key into the last node (which is actually the header node). For example:
```
     header->key = MaxInt;
     
```
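
The initialization steps above might be sketched in C roughly as follows. The function name newSkipList is an assumption made for illustration, p is assumed to live in the list structure, arrays are 0-based (forward[0] is level 1), and INT_MAX plays the role of MaxInt, which means INT_MAX itself cannot be used as a real key.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 pointer */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;
    int MaxLevel;
    float p;
} *SkipList;

/* Hypothetical constructor: returns an empty skip list, or NULL if
   memory cannot be obtained. */
SkipList newSkipList(int maxLevel, float p)
{
    assert(maxLevel >= 1 && p > 0.0f && p < 1.0f);

    SkipList list = malloc(sizeof *list);
    SkipListNode header = malloc(sizeof *header);
    SkipListNode *forward = malloc((size_t)maxLevel * sizeof *forward);
    if (list == NULL || header == NULL || forward == NULL) {
        free(list);      /* free(NULL) is harmless */
        free(header);
        free(forward);
        return NULL;
    }

    header->forward = forward;
    header->key = INT_MAX;           /* sentinel: larger than any real key */
    header->val = NULL;
    for (int i = 0; i < maxLevel; i++)
        header->forward[i] = header; /* the header is also the last node */

    list->header = header;
    list->level = 1;
    list->MaxLevel = maxLevel;
    list->p = p;
    return list;
}
```

The header is given MaxLevel forward pointers up front so that the list can later grow to any permitted level without reallocating it.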

Search in Skip Lists

Searching a skip list is much like searching a set of lists---you start with the coarsest grain list and find where in that list the key resides, then drop down to the next coarsest grain list and repeat the search. You keep repeating this process until eventually you reach the finest grain list, which either contains the key or does not. The pseudo-code for a search in a skip list looks as follows:

Search(list, searchKey)
    x = list->header

    -- loop invariant: x is the header or x->key < searchKey
    for i = list->level downto 1 do
        while x->forward[i]->key < searchKey do
            x = x->forward[i]

    -- x->key < searchKey <= x->forward[1]->key
    -- (treating the header as smaller than every key)
    x = x->forward[1]
    if x->key == searchKey then
        return x->val
    else
        return failure
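
As a rough illustration, the search pseudo-code might be translated into C as shown below. The name skipListSearch is an assumption, arrays are 0-based (so level i in the pseudo-code is forward[i-1] here), and failure is reported by returning NULL, which means a stored NULL record cannot be told apart from a missing key.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 pointer */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;
    int MaxLevel;
    float p;
} *SkipList;

/* Returns the record stored under searchKey, or NULL if the key is absent. */
void *skipListSearch(SkipList list, int searchKey)
{
    /* The loops below never test for the end of the list, so they rely
       on the header holding the sentinel key (assumed to be INT_MAX). */
    assert(list->header->key == INT_MAX);

    SkipListNode x = list->header;
    for (int i = list->level - 1; i >= 0; i--) {
        /* Move right along the level i+1 list while the next key is too small. */
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
    }

    /* x is now the last node before searchKey; its level 1 successor is
       the only node that can contain the key. */
    x = x->forward[0];
    if (x != list->header && x->key == searchKey)
        return x->val;
    return NULL;
}
```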

Insertion into a Skip List

Insertion into a skip list is like inserting the same value into i different lists, where i is the number of levels the coin flip chooses for this element. The algorithm simply finds the element's appropriate position in each of the lists and inserts it there.

Actually, the following pseudo-code first determines whether the key is already in the list. If the key is already there, the pseudo-code simply replaces the key's record; otherwise it inserts the key and the record into the list. You should think about how this code can be modified to allow a list to contain multiple list elements with the same key (you might find such a question on a lab or a test!).

Here is the pseudo-code for insertion:

Insert(list, searchKey, newValue)

    -- update contains an array of pointers to the elements 
    -- which will be predecessors of the new element. 
    local update[1..list->MaxLevel]

    x = list->header
    for i = list->level downto 1 do
        while x->forward[i]->key < searchKey do
            x = x->forward[i]
        -- x->key < searchKey <= x->forward[i]->key
        update[i] = x

    x = x->forward[1]
    if x->key == searchKey then
        x->val = newValue
    else
        newLevel = randomLevel(list)
        -- if the newLevel is greater than the current level of the list,
        -- knock newLevel down so that it is only one level greater than
        -- the current level of the list. In other words, we will
        -- increase the level of the list by at most one on each insertion.
        if newLevel > list->level then
            newLevel = list->level + 1
            list->level = newLevel
            update[newLevel] = list->header

        x = makeNode(newLevel, searchKey, newValue)
        for i = 1 to newLevel do
            x->forward[i] = update[i]->forward[i]
            update[i]->forward[i] = x
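
The insertion pseudo-code might be sketched in C along the following lines. Several details are assumptions made for illustration: the name skipListInsert, the compile-time bound LEVEL_CAP on MaxLevel (used so that update can be a plain local array), the use of the standard rand() as the coin, p stored in the list, and 0-based arrays. The sketch also expects every forward pointer of the header to have been initialized to point back at the header.

```c
#include <assert.h>
#include <stdlib.h>

enum { LEVEL_CAP = 32 }; /* assumed upper bound on MaxLevel */

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 pointer */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;
    int MaxLevel;
    float p;
} *SkipList;

/* Flip a coin with heads-probability list->p until tails or MaxLevel. */
static int randomLevel(SkipList list)
{
    int newLevel = 1;
    while (newLevel < list->MaxLevel && rand() / (RAND_MAX + 1.0) < list->p)
        newLevel++;
    return newLevel;
}

/* Insert or replace; returns 0 on success, -1 if memory runs out. */
int skipListInsert(SkipList list, int searchKey, void *newValue)
{
    SkipListNode update[LEVEL_CAP]; /* predecessors of the new node */
    assert(list->MaxLevel <= LEVEL_CAP);
    assert(searchKey < list->header->key); /* keys must be below the sentinel */

    SkipListNode x = list->header;
    for (int i = list->level - 1; i >= 0; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
        update[i] = x;
    }

    x = x->forward[0];
    if (x != list->header && x->key == searchKey) {
        x->val = newValue; /* key already present: replace its record */
        return 0;
    }

    /* Grow the list by at most one level per insertion. */
    int newLevel = randomLevel(list);
    if (newLevel > list->level)
        newLevel = list->level + 1;

    SkipListNode node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->forward = malloc((size_t)newLevel * sizeof *node->forward);
    if (node->forward == NULL) {
        free(node);
        return -1;
    }
    node->key = searchKey;
    node->val = newValue;

    /* Raise the list level only after allocation has succeeded. */
    if (newLevel > list->level) {
        update[newLevel - 1] = list->header;
        list->level = newLevel;
    }

    /* Splice the node into each of its lists. */
    for (int i = 0; i < newLevel; i++) {
        node->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = node;
    }
    return 0;
}
```

Unlike the pseudo-code, this sketch postpones the change to list->level until the new node has been allocated, so a failed allocation leaves the list untouched.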

Deletion from a Skip List

Deletion from a skip list is like deleting the same value from i different lists, where i is the number of levels the coin flip chose for this element. The algorithm simply finds the element's predecessor in each of the lists, makes the predecessor point to the element that the deleted element points to, and finally deletes the element.

The following pseudo-code assumes there is only one element with the designated key in the list. You should think about how this code can be modified to allow a list to contain multiple list elements with the same key (e.g., several elements might have the same last name, but different first names). You might find such a question on a lab or a test!

Here is the pseudo-code for deletion:

Delete(list, searchKey)

    -- update contains an array of pointers to the
    -- predecessors of the element to be deleted.
    local update[1..list->MaxLevel]

    x = list->header
    for i = list->level downto 1 do
        while x->forward[i]->key < searchKey do
            x = x->forward[i]
        update[i] = x

    x = x->forward[1]
    if x->key == searchKey then
        for i = 1 to list->level do
            -- if the element to be deleted is a level j node, break out
            -- of the loop when level (j+1) is reached. Since the code
            -- does not store the level of an element, we determine
            -- that we have exhausted the levels of an element when
            -- a predecessor element points past it, rather than to it.
            if update[i]->forward[i] != x then break
            update[i]->forward[i] = x->forward[i]
        free(x)

        -- if deleting the element causes some of the highest level lists
        -- to become empty, decrease the list level until a non-empty
        -- list is encountered.
        while list->level > 1 and
                 list->header->forward[list->level] == list->header do
            list->level = list->level - 1
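
A possible C rendering of the deletion pseudo-code is sketched below. The name skipListDelete, the compile-time bound LEVEL_CAP on MaxLevel, and the 0-based arrays are assumptions made for illustration. The sketch also assumes that each node and its forward array were allocated separately with malloc, so both are freed.

```c
#include <assert.h>
#include <stdlib.h>

enum { LEVEL_CAP = 32 }; /* assumed upper bound on MaxLevel */

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 pointer */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;
    int MaxLevel;
    float p;
} *SkipList;

/* Returns 1 if an element with searchKey was removed, 0 if none was found.
   The record that val points to is not freed; it belongs to the caller. */
int skipListDelete(SkipList list, int searchKey)
{
    SkipListNode update[LEVEL_CAP]; /* predecessors of the doomed node */
    assert(list->MaxLevel <= LEVEL_CAP);

    SkipListNode x = list->header;
    for (int i = list->level - 1; i >= 0; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
        update[i] = x;
    }

    x = x->forward[0];
    if (x == list->header || x->key != searchKey)
        return 0;

    /* Unlink x from every list it belongs to. The node's level is not
       stored, so stop at the first level whose predecessor skips past x. */
    for (int i = 0; i < list->level; i++) {
        if (update[i]->forward[i] != x)
            break;
        update[i]->forward[i] = x->forward[i];
    }
    free(x->forward);
    free(x);

    /* Drop any top-level lists that have become empty. */
    while (list->level > 1 &&
           list->header->forward[list->level - 1] == list->header)
        list->level--;
    return 1;
}
```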

Choosing a Random Level

A level is chosen for an element, in effect, by flipping a coin that has probability p of coming up heads. We keep flipping until we get a "tails" or until the maximum number of levels is reached:

randomLevel(list)
    newLevel = 1
    --random() returns a random value in [0..1)
    while random() < list->p do
        newLevel = newLevel + 1
    return min(newLevel, list->MaxLevel)
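
One way to realize the coin flip in C is sketched below. The helper random01 and the choice to pass p and maxLevel directly, rather than reading them from the list, are assumptions made to keep the example self-contained. The standard rand() is used only for illustration; its quality varies between C libraries.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: a pseudo-random value in [0, 1), built on rand(). */
static double random01(void)
{
    return rand() / (RAND_MAX + 1.0);
}

/* Keep flipping a coin with heads-probability p until it comes up
   tails or maxLevel is reached. Returns a level in 1..maxLevel. */
int randomLevel(double p, int maxLevel)
{
    assert(p > 0.0 && p < 1.0 && maxLevel >= 1);
    int newLevel = 1;
    while (newLevel < maxLevel && random01() < p)
        newLevel++;
    return newLevel;
}
```

Stopping the loop at maxLevel yields the same distribution of levels as taking min(newLevel, MaxLevel) afterwards, and it avoids flips whose outcome cannot matter. Calling srand once at program start, for example with the current time, makes the levels differ from run to run.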

Probabilistic Analysis

The probabilistic analysis of skip lists is beyond the scope of this course. However, it can be shown that the expected search, insertion, and deletion times are all O(lg n). The choice of p determines the variability of these search times (i.e., the standard deviation). Intuitively, decreasing p will increase the variability since it will decrease the number of higher-level elements (i.e., the number of "skip" nodes in the list). The Pugh paper contains a number of graphs that show the probability of a search taking significantly longer than expected for given values of p. For example, if p is 0.5 and there are more than 256 elements in the list, the chances of a search taking 3 times longer than expected are less than 1 in a million. If p is decreased to 0.25, the chances rise to about 1 in a thousand.

Choosing p

One might think that p should be chosen to be 0.5. If p is chosen to be 0.5, then roughly half of our elements will be level 1 nodes, 0.25 will be level 2 nodes, 0.125 will be level 3 nodes, and so on. This will give us on average lg n search time and on average 2 pointers per node. However, empirical tests show that choosing p to be 0.25 results in roughly the same search time but only an average of 1.33 pointers per node. There is somewhat more variability in the search times and a greater chance of a search taking longer than expected, but the decrease in storage overhead seems to be worth it.
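
The pointer counts quoted above can be checked with a little arithmetic. A node has at least k pointers with probability p^(k-1), so the expected number of pointers per node is 1 + p + p^2 + ..., which approaches 1/(1-p). The following sketch, whose function name expectedPointersPerNode is an assumption for illustration, adds up these terms for a list capped at maxLevel levels.

```c
#include <assert.h>

/* Expected number of forward pointers in a node whose level is chosen
   by coin flips with heads-probability p, capped at maxLevel levels. */
double expectedPointersPerNode(double p, int maxLevel)
{
    assert(p > 0.0 && p < 1.0 && maxLevel >= 1);
    double expected = 0.0;
    double reach = 1.0; /* chance that a node has at least `level` pointers */
    for (int level = 1; level <= maxLevel; level++) {
        expected += reach; /* each level contributes its chance of being present */
        reach *= p;
    }
    return expected;
}
```

With maxLevel set to 32, the function returns a value very close to 2 for p = 0.5 and very close to 1.33 for p = 0.25, in agreement with the figures in the text.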