CS302 --- Skip Lists
- Reference: Pugh, W. Skip Lists: A Probabilistic Alternative to Balanced
Trees. CACM, 33(6), June 1990, 668-676.
Overview
Skip lists provide a way to keep a list of elements sorted and yet support
search, insert, and delete operations in an expected time of O(log n). We
say an expected time rather than a guaranteed time because skip lists rely
on a probabilistic algorithm to decide how many forward pointers each
element gets, and hence how well balanced the list is. With p = 0.5 (see
Choosing p), it can be shown that the chance of a search taking more than
three times the expected search time for a list of more than 250 elements
is less than one in a million. In exchange for forgoing guaranteed
worst-case performance, we obtain an algorithm that is much easier to
implement than balanced tree schemes and that performs insert and delete
operations much faster, on average, than balanced tree schemes.
Key Properties of Skip Lists
- Some elements, in addition
to pointing to the next element, also point to elements even further
down the list.
- A level k element is a list element that has k forward
pointers. The first pointer points to the next element in the list, the
second pointer points to the next level 2 element, and in general, the
ith pointer points to the next element of level i or higher.
- The level of an element is chosen in essence by flipping a coin.
For example, we could think of flipping a coin until it comes up tails.
We count the number of times the coin came up heads before coming up tails
and add 1 to this number. This number represents the level of the element.
More generally, we choose a
fraction p between 0 and 1, so that a fraction p of the elements
with level i pointers will have level i+1 pointers as well.
On average, a fraction (1-p) of the elements will be level 1 elements,
p(1-p) will be level 2 elements, p^2(1-p) will be level 3 elements, and
so on. This scheme corresponds to flipping a coin that has a p chance of
coming up heads and a (1-p) chance of coming up tails.
- The name skip lists comes from the fact that you can use the
additional pointers to skip over intermediate elements and get to the
desired element more quickly.
Data Structures
- Skip List Node:
  - forward: a dynamic array of pointers to skip list nodes. The size
    of the dynamic array is determined at run-time when the coin
    is flipped to determine the number of levels this node will
    contain.
  - key: the key of the record
  - val: a pointer to the record
The C declaration for a skip list node looks as follows:
    typedef struct skip_list_node {
        struct skip_list_node **forward;
        int key;
        void *val;
    } *SkipListNode;
- Skip List:
  - header: a pointer to a dummy skip list node that contains the
    initial set of forward pointers.
  - level: the current number of levels in the skip list.
  - MaxLevel: the maximum number of levels to which a skip list can grow.
  - p: the chance of getting a "head" when flipping for a node's level.
    It is stored in the list, not in each node, because randomLevel
    reads it as list->p.
The C declaration for a skip list looks as follows:
    typedef struct skip_list {
        SkipListNode header;
        int level;
        int MaxLevel;
        float p;
    } *SkipList;
Skip List Initialization
- Set the level of the skip list to 1.
- Use the header node as both the header and the last node. Hence make
all the forward pointers of the header point to the header:
    for i = 1 to MaxLevel do
        header->forward[i] = header
- To avoid having to check for the end of the list, store a sentinel
value greater than the maximum possible value of a key into the
last node (which is actually the header node). For example:
    header->key = MaxInt;
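As a rough illustration, the initialization steps above might look like this in C. This is only a sketch under a few assumptions: C arrays are 0-based, so forward[i] here plays the role of forward[i+1] in the pseudo-code; p is kept in the list, as randomLevel's list->p suggests; INT_MAX stands in for MaxInt; and the bodies of makeNode and newSkipList are our own guesses, not code from the Pugh paper.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 link */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;    /* levels currently in use */
    int MaxLevel; /* number of forward pointers in the header */
    float p;      /* chance of "heads" (assumed to live in the list) */
} *SkipList;

/* Assumed helper: allocate a node with `level` forward pointers.
   Returns NULL if memory runs out. */
static SkipListNode makeNode(int level, int key, void *val) {
    assert(level >= 1);
    SkipListNode n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->forward = malloc((size_t)level * sizeof *n->forward);
    if (n->forward == NULL) { free(n); return NULL; }
    n->key = key;
    n->val = val;
    return n;
}

/* Assumed helper: build an empty list whose header is also its last node. */
static SkipList newSkipList(int maxLevel, float p) {
    assert(maxLevel >= 1 && p > 0.0f && p < 1.0f);
    SkipList list = malloc(sizeof *list);
    if (list == NULL) return NULL;
    /* The header carries the sentinel key, so searches stop there. */
    list->header = makeNode(maxLevel, INT_MAX, NULL);
    if (list->header == NULL) { free(list); return NULL; }
    for (int i = 0; i < maxLevel; i++)
        list->header->forward[i] = list->header;
    list->level = 1;
    list->MaxLevel = maxLevel;
    list->p = p;
    return list;
}
```

A call such as newSkipList(16, 0.25f) would give an empty list with room for 16 levels. Because INT_MAX is used as the sentinel, it cannot also be stored as an ordinary key in this sketch.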
Search in Skip Lists
Searching a skip list is much like searching a set of lists---you start
with the coarsest grain list and find where in that list the key resides,
then drop down to the next coarsest grain list and repeat the search. You
keep repeating this process until eventually you reach the finest grain list,
which either contains the key or does not.
The pseudo-code for a search in a skip list looks as follows:
    Search(list, searchKey)
        x = list->header
        -- loop invariant: x is the header or x->key < searchKey
        for i = list->level downto 1 do
            while x->forward[i]->key < searchKey do
                x = x->forward[i]
        -- x is the header or x->key < searchKey;
        -- searchKey <= x->forward[1]->key
        x = x->forward[1]
        if x->key == searchKey then
            return x->val
        else
            return failure
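The search pseudo-code above might translate to C roughly as follows. This sketch assumes 0-based forward arrays (so level i in the pseudo-code is index i-1 here), a header whose key is INT_MAX, and p stored in the list; returning NULL for "failure" is our own choice and means a stored NULL record cannot be told apart from a miss.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 link */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;
    int MaxLevel;
    float p;
} *SkipList;

/* Returns the record stored under searchKey, or NULL if it is absent. */
static void *search(SkipList list, int searchKey) {
    assert(list != NULL);
    SkipListNode x = list->header;
    /* Start at the coarsest list and drop down one level at a time. */
    for (int i = list->level - 1; i >= 0; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
    }
    /* x is now the last node before the position where searchKey belongs. */
    x = x->forward[0];
    /* The header doubles as the end-of-list node: never report it as a hit. */
    if (x != list->header && x->key == searchKey)
        return x->val;
    return NULL;
}
```

The extra x != list->header test is not in the pseudo-code; it guards against a caller searching for the sentinel value itself.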
Insertion into a Skip List
Insertion into a skip list is like inserting the same value into i
different lists, where i is the number of levels the coin flip
chooses for this element.
The algorithm simply finds the element's appropriate position in each of
the lists and inserts it there.
The following pseudo-code first
determines whether the key is already in the list. If the key is already
there, the pseudo-code simply replaces the key's record; otherwise it
inserts the key and the record into the list. You should think about
how this code can be modified to allow a list to contain multiple
list elements with the same key (you might find such a question on a lab
or a test!).
Here is the pseudo-code for insertion:
    Insert(list, searchKey, newValue)
        -- update contains an array of pointers to the elements
        -- which will be predecessors of the new element.
        local update[1..list->MaxLevel]
        x = list->header
        for i = list->level downto 1 do
            while x->forward[i]->key < searchKey do
                x = x->forward[i]
            -- x is the header or x->key < searchKey;
            -- searchKey <= x->forward[i]->key
            update[i] = x
        x = x->forward[1]
        if x->key == searchKey then
            x->val = newValue
        else
            newLevel = randomLevel(list)
            -- if the newLevel is greater than the current level of the list,
            -- knock newLevel down so that it is only one level greater than
            -- the current level of the list. In other words, we will
            -- increase the level of the list by at most one on each insertion.
            if newLevel > list->level then
                newLevel = list->level + 1
                list->level = newLevel
                update[newLevel] = list->header
            x = makeNode(newLevel, searchKey, newValue)
            for i = 1 to newLevel do
                x->forward[i] = update[i]->forward[i]
                update[i]->forward[i] = x
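To make the splice step concrete, here is one way the insertion pseudo-code could be written in C. It is a sketch, not a reference implementation: it assumes 0-based forward arrays, INT_MAX as the header's sentinel key, p stored in the list, rand() as the coin, and our own bodies for makeNode and randomLevel. It also allocates the new node before raising the list level, a small reordering so that a failed allocation leaves the list untouched.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 link */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;    /* levels currently in use */
    int MaxLevel; /* number of forward pointers in the header */
    float p;      /* chance of "heads" */
} *SkipList;

/* Assumed helper: allocate a node with `level` forward pointers. */
static SkipListNode makeNode(int level, int key, void *val) {
    SkipListNode n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->forward = malloc((size_t)level * sizeof *n->forward);
    if (n->forward == NULL) { free(n); return NULL; }
    n->key = key;
    n->val = val;
    return n;
}

/* Assumed helper: flip the coin until tails or until maxLevel is reached. */
static int randomLevel(double p, int maxLevel) {
    int newLevel = 1;
    while (newLevel < maxLevel &&
           (double)rand() / ((double)RAND_MAX + 1.0) < p)
        newLevel++;
    return newLevel;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
static int insert(SkipList list, int searchKey, void *newValue) {
    assert(searchKey < INT_MAX); /* INT_MAX is reserved for the sentinel */
    SkipListNode *update = malloc((size_t)list->MaxLevel * sizeof *update);
    if (update == NULL) return -1;

    SkipListNode x = list->header;
    for (int i = list->level - 1; i >= 0; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
        update[i] = x; /* predecessor of the new key at this level */
    }
    x = x->forward[0];

    if (x != list->header && x->key == searchKey) {
        x->val = newValue; /* key already present: replace its record */
    } else {
        int newLevel = randomLevel(list->p, list->MaxLevel);
        if (newLevel > list->level)
            newLevel = list->level + 1; /* grow by at most one level */
        x = makeNode(newLevel, searchKey, newValue);
        if (x == NULL) { free(update); return -1; }
        if (newLevel > list->level) {
            list->level = newLevel;
            update[newLevel - 1] = list->header;
        }
        /* Splice x in after its predecessor in each of its lists. */
        for (int i = 0; i < newLevel; i++) {
            x->forward[i] = update[i]->forward[i];
            update[i]->forward[i] = x;
        }
    }
    free(update);
    return 0;
}
```

Whatever levels the coin happens to choose, walking forward[0] from the header afterwards should visit the keys in increasing order.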
Deletion from a Skip List
Deletion from a skip list is like deleting the same value from i
different lists, where i is the number of levels the coin flip
chose for this element.
The algorithm simply finds the element's predecessor in each of
the lists, makes the predecessor point to the element that the deleted
element points to, and finally deletes the element.
The following pseudo-code assumes there is only one element with
the designated key in the list. You should think about
how this code can be modified to allow a list to contain multiple
list elements with the same key (e.g., several elements might have
the same last name, but different first names). You might find such a
question on a lab or a test!
Here is the pseudo-code for deletion:
    Delete(list, searchKey)
        -- update contains an array of pointers to the
        -- predecessors of the element to be deleted.
        local update[1..list->MaxLevel]
        x = list->header
        for i = list->level downto 1 do
            while x->forward[i]->key < searchKey do
                x = x->forward[i]
            update[i] = x
        x = x->forward[1]
        if x->key == searchKey then
            for i = 1 to list->level do
                -- if the element to be deleted is a level j node, break out
                -- of the loop when level (j+1) is reached. Since the code
                -- does not store the level of an element, we determine
                -- that we have exhausted the levels of an element when
                -- a predecessor element points past it, rather than to it.
                if update[i]->forward[i] != x then break
                update[i]->forward[i] = x->forward[i]
            free(x)
            -- if deleting the element causes some of the highest level lists
            -- to become empty, decrease the list level until a non-empty
            -- list is encountered.
            while list->level > 1 and
                  list->header->forward[list->level] == list->header do
                list->level = list->level - 1
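A possible C rendering of the deletion pseudo-code is sketched below, under the same assumptions as before: 0-based forward arrays, INT_MAX as the header's sentinel key, and p stored in the list. The name deleteKey and its return convention are ours. The sketch also assumes that each node and its forward array were allocated separately with malloc, so both are freed.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct skip_list_node {
    struct skip_list_node **forward; /* forward[i] is the level i+1 link */
    int key;
    void *val;
} *SkipListNode;

typedef struct skip_list {
    SkipListNode header;
    int level;
    int MaxLevel;
    float p;
} *SkipList;

/* Returns 1 if the key was found and removed, 0 if it was not present,
   and -1 if the scratch array could not be allocated. */
static int deleteKey(SkipList list, int searchKey) {
    assert(list != NULL);
    SkipListNode *update = malloc((size_t)list->MaxLevel * sizeof *update);
    if (update == NULL) return -1;

    SkipListNode x = list->header;
    for (int i = list->level - 1; i >= 0; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
        update[i] = x; /* predecessor of searchKey at this level */
    }
    x = x->forward[0];

    int found = (x != list->header && x->key == searchKey);
    if (found) {
        for (int i = 0; i < list->level; i++) {
            /* Past x's own levels the predecessor no longer points at x. */
            if (update[i]->forward[i] != x) break;
            update[i]->forward[i] = x->forward[i];
        }
        free(x->forward);
        free(x);
        /* Drop top levels that have become empty. */
        while (list->level > 1 &&
               list->header->forward[list->level - 1] == list->header)
            list->level--;
    }
    free(update);
    return found;
}
```

The record that val points to is not freed here; whether the list owns its records is a design decision the notes leave open.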
Choosing a Random Level
A level is chosen for an element in effect by flipping a coin that has
probability p of coming up heads. We keep flipping until
we get a "tails" or until the maximum number of levels is reached:
    randomLevel(list)
        newLevel = 1
        -- random() returns a random value in [0..1)
        while random() < list->p do
            newLevel = newLevel + 1
        return min(newLevel, list->MaxLevel)
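In C, the coin flipping might be sketched as below. Two things here are assumptions rather than part of the notes: random01 is a hypothetical helper built on the standard rand(), and the function takes p and the maximum level directly instead of a list so that the example stands alone. It stops flipping once the maximum level is reached, which gives the same result as capping with min afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: a pseudo-random value in [0, 1) built on rand(). */
static double random01(void) {
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Flip a coin with probability p of heads; each head adds one level. */
static int randomLevel(double p, int maxLevel) {
    assert(p > 0.0 && p < 1.0 && maxLevel >= 1);
    int newLevel = 1;
    while (newLevel < maxLevel && random01() < p)
        newLevel++;
    return newLevel;
}
```

Calling srand once at program start varies the sequence between runs. rand() is adequate for a class exercise, though a better generator may be preferable in production code.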
Probabilistic Analysis
The probabilistic analysis of skip lists is beyond the scope of this course.
However, it can be shown that the expected search, insertion, and deletion
times are all O(log n). The choice of p determines the variability
of these search times (i.e., the standard deviation). Intuitively, decreasing
p will increase the variability since it will decrease the number
of higher-level elements (i.e., the number of "skip" nodes in the list).
The Pugh paper contains a number of graphs that show the probability of
a search taking significantly longer than expected for given values of
p. For example, if p is 0.5 and there are more than
250 elements in the list, the chance of a search taking more than 3 times
the expected time is less than 1 in a million. If p is decreased
to 0.25, the chance rises to about 1 in a thousand.
Choosing p
One might think that p should be chosen to be 0.5. If p
is chosen to be 0.5, then roughly half our elements will be level 1 nodes,
a quarter will be level 2 nodes, an eighth will be level 3 nodes, and so on.
This will give us on average lg n search time and on average 2 pointers per
node. However, empirical tests show that choosing p to be 0.25
results in roughly the same search time but only an average of 1.33
pointers per node. There is somewhat more variability in the search times
and a greater chance of a search taking longer than expected, but the
decrease in storage overhead seems to be worth it.
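The pointer counts quoted above can be checked with a short calculation. It assumes the coin-flipping scheme described earlier and ignores the MaxLevel cap, which changes the result only slightly when MaxLevel is reasonably large.

```latex
\begin{align*}
\Pr[\text{level} = k] &= p^{k-1}(1-p), \qquad k = 1, 2, 3, \dots \\
E[\text{pointers per node}] &= \sum_{k=1}^{\infty} k\,p^{k-1}(1-p) = \frac{1}{1-p} \\
p = 0.5 &:\quad \frac{1}{1-0.5} = 2 \\
p = 0.25 &:\quad \frac{1}{1-0.25} = \frac{4}{3} \approx 1.33
\end{align*}
```

A level k node reaches level k by getting k-1 heads followed by one tail, which is where the first line comes from.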