CS302 --- Skip Lists

Brad Vander Zanden


Overview

Skip lists provide a way to keep a list of elements sorted and yet support search, insert, and delete operations in an expected time of O(log n). We say an expected time rather than a guaranteed time because skip lists rely on a probabilistic algorithm, rather than explicit rebalancing, to decide how many levels each element occupies. It can be shown that the chance of a search taking more than three times the expected search time in a list of 250 or more elements is less than one in a million. In exchange for forgoing a guaranteed worst-case performance, we obtain an algorithm that is much easier to implement than balanced tree schemes and that performs insert and delete operations much faster, on average, than balanced tree schemes.

Key Properties of Skip Lists


Data Structures


Skip List Initialization

  1. Set the level of the skip list to 1.

  2. Use the header node as both the header and the last node. Hence make all the forward pointers of the header point to the header:
         for i = 1 to MaxLevel do
             header->forward[i] = header
         

  3. To avoid having to check for the end of the list, store a sentinel value greater than the maximum possible value of a key into the last node (which is actually the header node). For example:
         header->key = MaxInt;
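
The three initialization steps above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the course's reference implementation: the class names `Node` and `SkipList`, the defaults `max_level=16` and `p=0.5`, and the use of `float("inf")` in place of MaxInt are all choices made for this illustration. Slot 0 of `forward` is left unused so that levels are numbered from 1, as in the pseudo-code.

```python
class Node:
    """A skip-list node with forward pointers for levels 1..level."""

    def __init__(self, level, key, value=None):
        self.key = key
        self.value = value
        # Slot 0 is unused so indices match the 1-based pseudo-code.
        self.forward = [None] * (level + 1)


class SkipList:
    """An empty skip list whose header doubles as the end-of-list node."""

    def __init__(self, max_level=16, p=0.5):
        self.max_level = max_level
        self.p = p
        # Step 1: an empty list has level 1.
        self.level = 1
        # Step 3: the header holds a sentinel key larger than any real key
        # (float("inf") plays the role of MaxInt in this sketch).
        self.header = Node(max_level, float("inf"))
        # Step 2: every forward pointer of the header points back to the header.
        for i in range(1, max_level + 1):
            self.header.forward[i] = self.header


empty = SkipList(max_level=4)
print(empty.level, empty.header.forward[1] is empty.header)
```

Because the header is also the last node, an empty level is recognized by the header pointing to itself on that level.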
         


Search in Skip Lists

Searching a skip list is much like searching a set of lists: you start with the coarsest-grained list and find where in that list the key would reside, then drop down to the next coarsest-grained list and continue the search from that position. You keep repeating this process until you reach the finest-grained list, which either contains the key or does not. The pseudo-code for a search in a skip list looks as follows:
Search(list, searchKey)
    x = list->header

    -- loop invariant: x is the header or x->key < searchKey
    for i = list->level downto 1 do
        while x->forward[i]->key < searchKey do
            x = x->forward[i]

    -- x is the header or x->key < searchKey; searchKey <= x->forward[1]->key
    x = x->forward[1]
    x = x->forward[1]
    if x->key == searchKey then
        return x->value
    else
        return failure
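
The search above might look as follows in Python. This is a sketch under assumptions: `build` is a hypothetical helper that wires up a small list by hand from sorted (key, value, level) tuples so the example does not depend on insertion, `float("inf")` stands in for MaxInt, and failure is reported by returning `None`. The extra `x is not header` test is a guard added here so that the sentinel is never reported as a match.

```python
INF = float("inf")  # stands in for MaxInt


class Node:
    def __init__(self, level, key, value=None):
        self.key, self.value = key, value
        self.forward = [None] * (level + 1)  # slot 0 unused (1-based levels)


def build(items, max_level=4):
    """Hypothetical helper: build a list from sorted (key, value, level) tuples."""
    header = Node(max_level, INF)
    last = [header] * (max_level + 1)   # last node seen so far on each level
    for key, value, level in items:
        node = Node(level, key, value)
        for i in range(1, level + 1):
            last[i].forward[i] = node
            last[i] = node
    for i in range(1, max_level + 1):   # close every level back to the header
        last[i].forward[i] = header
    list_level = max([lvl for _, _, lvl in items], default=1)
    return header, list_level


def search(header, list_level, search_key):
    x = header
    for i in range(list_level, 0, -1):          # coarsest level first
        while x.forward[i].key < search_key:    # sentinel key stops the scan
            x = x.forward[i]
    x = x.forward[1]                            # candidate on the finest level
    return x.value if x is not header and x.key == search_key else None


header, level = build([(3, "c", 1), (7, "g", 3), (12, "l", 1), (19, "s", 2)])
print(search(header, level, 12), search(header, level, 8))  # prints: l None
```

In the search for 12, the level 3 list jumps straight to 7, levels 2 and 1 cannot advance further without passing 12, and the final step on level 1 lands on 12 itself.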

Insertion into a Skip List

Insertion into a skip list is like inserting the same value into i different lists, where i is the number of levels the coin flip chooses for this element. The algorithm simply finds the element's appropriate position in each of the lists and inserts it there.

Actually, the following pseudo-code first determines whether the key is already in the list. If the key is already there, the pseudo-code simply replaces the key's record; otherwise it inserts the key and the record into the list. You should think about how this code can be modified to allow a list to contain multiple list elements with the same key (you might find such a question on a lab or a test!).

Here is the pseudo-code for insertion:

Insert(list, searchKey, newValue)

    -- update contains an array of pointers to the elements 
    -- which will be predecessors of the new element. 
    local update[1..list->MaxLevel]

    x = list->header
    for i = list->level downto 1 do
        while x->forward[i]->key < searchKey do
            x = x->forward[i]
        -- x is the header or x->key < searchKey; searchKey <= x->forward[i]->key
        update[i] = x

    x = x->forward[1]
    if x->key == searchKey then
        x->value = newValue
    else
        newLevel = randomLevel(list)
        -- if newLevel is greater than the current level of the list,
        -- knock newLevel down so that it is only one level greater than
        -- the current level of the list. In other words, we will
        -- increase the level of the list by at most one on each insertion.
        if newLevel > list->level then
            newLevel = list->level + 1
            list->level = newLevel
            update[newLevel] = list->header

        x = makeNode(newLevel, searchKey, newValue)
        for i = 1 to newLevel do
            x->forward[i] = update[i]->forward[i]
            update[i]->forward[i] = x
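
As an illustration, the insertion pseudo-code above could be translated to Python roughly as follows. The class layout, the `seed` parameter (used only to make the demonstration repeatable), the `items` helper, and `float("inf")` as the sentinel are assumptions of this sketch rather than part of the original algorithm description.

```python
import random


class Node:
    def __init__(self, level, key, value=None):
        self.key, self.value = key, value
        self.forward = [None] * (level + 1)  # slot 0 unused (1-based levels)


class SkipList:
    def __init__(self, max_level=16, p=0.5, seed=None):
        self.max_level, self.p, self.level = max_level, p, 1
        self.rng = random.Random(seed)       # seeded only for repeatable demos
        self.header = Node(max_level, float("inf"))   # sentinel key
        for i in range(1, max_level + 1):
            self.header.forward[i] = self.header

    def random_level(self):
        level = 1
        while level < self.max_level and self.rng.random() < self.p:
            level += 1
        return level

    def insert(self, search_key, new_value):
        update = [None] * (self.max_level + 1)   # predecessors per level
        x = self.header
        for i in range(self.level, 0, -1):
            while x.forward[i].key < search_key:
                x = x.forward[i]
            update[i] = x
        x = x.forward[1]
        if x is not self.header and x.key == search_key:
            x.value = new_value                  # key present: replace record
            return
        new_level = self.random_level()
        if new_level > self.level:               # grow by at most one level
            new_level = self.level + 1
            self.level = new_level
            update[new_level] = self.header
        x = Node(new_level, search_key, new_value)
        for i in range(1, new_level + 1):        # splice into each level
            x.forward[i] = update[i].forward[i]
            update[i].forward[i] = x

    def items(self):
        x = self.header.forward[1]
        while x is not self.header:
            yield x.key, x.value
            x = x.forward[1]


s = SkipList(seed=1)
for k in [19, 3, 12, 7, 3]:
    s.insert(k, f"v{k}")
print(list(s.items()))
```

Whatever levels the coin flips choose, walking the level 1 list always yields the keys in sorted order (here 3, 7, 12, 19), and the second insertion of key 3 only replaces its record.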

Deletion from a Skip List

Deletion from a skip list is like deleting the same value from i different lists, where i is the number of levels the coin flip chose for this element. The algorithm simply finds the element's predecessor in each of the lists, makes the predecessor point to the element that the deleted element points to, and finally deletes the element.

The following pseudo-code assumes there is only one element with the designated key in the list. You should think about how this code can be modified to allow a list to contain multiple list elements with the same key (e.g., several elements might have the same last name, but different first names). You might find such a question on a lab or a test!

Here is the pseudo-code for deletion:

Delete(list, searchKey)

    -- update contains an array of pointers to the
    -- predecessors of the element to be deleted.
    local update[1..list->MaxLevel]

    x = list->header
    for i = list->level downto 1 do
        while x->forward[i]->key < searchKey do
            x = x->forward[i]
        update[i] = x

    x = x->forward[1]
    if x->key == searchKey then
        for i = 1 to list->level do
            -- if the element to be deleted is a level j node, break out
            -- of the loop when level (j+1) is reached. Since the code
            -- does not store the level of an element, we determine
            -- that we have exhausted the levels of an element when
            -- a predecessor element points past it, rather than to it.
            if update[i]->forward[i] != x then break
            update[i]->forward[i] = x->forward[i]
        free(x)

        -- if deleting the element causes some of the highest level lists
        -- to become empty, decrease the list level until a non-empty
        -- list is encountered.
        while list->level > 1 and
                 list->header->forward[list->level] == list->header do
            list->level = list->level - 1
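
A Python sketch of the deletion pseudo-code above follows. To keep the example short and deterministic, the constructor is a hypothetical one that builds the list directly from sorted (key, level) pairs instead of using random insertion. The boolean return value and the `keys` helper are also assumptions of this sketch. There is no counterpart to free(x), since Python reclaims the node once nothing points to it.

```python
class Node:
    def __init__(self, level, key, value=None):
        self.key, self.value = key, value
        self.forward = [None] * (level + 1)  # slot 0 unused (1-based levels)


class SkipList:
    def __init__(self, items, max_level=4):
        """Hypothetical constructor: build from sorted (key, level) pairs."""
        self.max_level = max_level
        self.header = Node(max_level, float("inf"))   # sentinel key
        last = [self.header] * (max_level + 1)
        for key, level in items:
            node = Node(level, key)
            for i in range(1, level + 1):
                last[i].forward[i] = node
                last[i] = node
        for i in range(1, max_level + 1):
            last[i].forward[i] = self.header
        self.level = max([lvl for _, lvl in items], default=1)

    def delete(self, search_key):
        update = [None] * (self.max_level + 1)   # predecessors per level
        x = self.header
        for i in range(self.level, 0, -1):
            while x.forward[i].key < search_key:
                x = x.forward[i]
            update[i] = x
        x = x.forward[1]
        if x is self.header or x.key != search_key:
            return False                         # key not in the list
        for i in range(1, self.level + 1):
            if update[i].forward[i] is not x:    # x has no pointer at level i
                break
            update[i].forward[i] = x.forward[i]  # unlink x from level i
        # Drop any top levels that the deletion left empty.
        while self.level > 1 and self.header.forward[self.level] is self.header:
            self.level -= 1
        return True

    def keys(self):
        x = self.header.forward[1]
        while x is not self.header:
            yield x.key
            x = x.forward[1]


s = SkipList([(3, 1), (7, 3), (12, 1), (19, 2)])
s.delete(7)
print(list(s.keys()), s.level)  # prints: [3, 12, 19] 2
```

Deleting 7, the only level 3 node, empties the level 3 list, so the list level drops to 2, where 19 is still linked.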

Choosing a Random Level

A level is chosen for an element, in effect, by flipping a coin that has probability p of coming up heads. We keep flipping until we get "tails" or until the maximum number of levels is reached:
randomLevel(list)
    newLevel = 1
    -- random() returns a random value in [0..1)
    while random() < list->p do
        newLevel = newLevel + 1
    return min(newLevel, list->MaxLevel)
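
The coin-flipping procedure above can be sketched in Python with the standard `random` module. The function signature and the seed are assumptions made for this illustration. The sketch stops flipping once the maximum level is reached rather than applying min() at the end. Both approaches produce the same distribution of levels, but this one never flips more coins than it needs.

```python
import random


def random_level(p, max_level, rng=random):
    """Flip a p-biased coin until tails or until max_level is reached."""
    new_level = 1
    # rng.random() returns a float in [0.0, 1.0)
    while new_level < max_level and rng.random() < p:
        new_level += 1
    return new_level


rng = random.Random(302)  # arbitrary seed so the demo is repeatable
levels = [random_level(0.5, 16, rng) for _ in range(10_000)]
share_level_1 = sum(1 for lvl in levels if lvl == 1) / len(levels)
print(min(levels), max(levels), round(share_level_1, 2))
```

With p = 0.5, about half of the generated levels should be 1, since the first flip comes up tails half the time.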

Probabilistic Analysis

The probabilistic analysis of skip lists is beyond the scope of this course. However, it can be shown that the expected search, insertion, and deletion times are all O(lg n). The choice of p determines the variability of these times (i.e., their standard deviation). Intuitively, decreasing p increases the variability, since it decreases the number of higher-level elements (i.e., the number of "skip" nodes in the list). The Pugh paper contains a number of graphs that show the probability of a search taking significantly longer than expected for given values of p. For example, if p is 0.5 and there are more than 250 elements in the list, the chances of a search taking more than 3 times the expected time are less than 1 in a million. If p is decreased to 0.25, the chances rise to about 1 in a thousand.

Choosing p

One might think that p should be chosen to be 0.5. If p is chosen to be 0.5, then roughly half of our elements will be level 1 nodes, a quarter will be level 2 nodes, an eighth will be level 3 nodes, and so on. This gives us, on average, lg n search time and 2 pointers per node (in general, the expected number of pointers per node is 1/(1-p)). However, empirical tests show that choosing p to be 0.25 results in roughly the same search time but only an average of 1.33 pointers per node. There is somewhat more variability in the search times and a greater chance of a search taking longer than expected, but the decrease in storage overhead seems to be worth it.
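
The pointer counts quoted above can be checked with a short calculation. A node has a pointer at level k with probability p^(k-1), so the expected number of pointers is the sum of those probabilities, which approaches 1/(1-p) as the maximum level grows. The function name and the default cap of 16 levels are assumptions of this sketch.

```python
def expected_pointers(p, max_level=16):
    """Expected forward pointers per node when levels are capped at max_level.

    A node reaches level k with probability p**(k - 1), so the expected level
    is the sum of those probabilities, which approaches 1 / (1 - p).
    """
    return sum(p ** (k - 1) for k in range(1, max_level + 1))


for p in (0.5, 0.25):
    print(p, round(expected_pointers(p), 2), round(1 / (1 - p), 2))
```

For p = 0.5 this gives about 2 pointers per node and for p = 0.25 about 1.33, matching the figures in the text.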