Spreadsheet Implementation

Brad Vander Zanden

I. Characteristics of a Spreadsheet

    A. A spreadsheet is a rectangular grid of cells
    B. Each cell may contain a value or a formula that computes a value
        based on other cells' values.
    C. When a cell is edited, all cells whose formulas depend on it, either
        directly or indirectly, are updated.

II. How the Spreadsheet Solver Works

    A. The spreadsheet solver maintains a directed graph

        1. The vertices are cells
	2. There is a directed edge from cell i to cell j if and only
	    if cell j has a formula that depends on cell i. We will
	    denote the edge as (c_i, c_j). 

    B. When the user edits a cell, the spreadsheet solver performs a
        depth-first search of the graph starting at the edited cell.

	1. The spreadsheet solver uses the depth-first search to create
	   a topological ordering of the affected cells

	   a. A topological ordering of the cells is an ordering which
	      satisfies the following property: if both c_i and
	      c_j are in the topological order and there is an edge
	      (c_i, c_j), then c_i appears before c_j.

	   b. The topological order is implemented as a stack. The first
	      cell in topological order is the top element of the stack
	      and the last cell in topological order is the bottom element
	      of the stack.

	   c. A cell is pushed onto the stack after all its neighbors have
	      been visited. This occurs right before a cell returns from
	      the depth-first search. Notice that if we push a cell
	      onto the stack after all its neighbors have been visited,
	      then all its neighbors are already on the stack. Since all
	      its neighbors are on the stack, they are after the
	      cell in the topological order. Therefore when we push the cell
	      onto the stack, the topological ordering property is maintained.
	      
	2. Once the spreadsheet solver completes the depth-first search,
	      it pops the cells off the stack and evaluates their
	      formulas.

    C. C++ pseudocode for the editing of a cell might look as follows:

        // FormulaStack is a class variable that keeps track of the
	// order in which cells should be re-evaluated.
        Cell::set_value(v) {
	    value = v;
	    dfs();
	    while (!FormulaStack.IsEmpty()) {
	        cell = FormulaStack.Pop()
		cell.visited = false
		if (cell.formula != 0)
		    cell.value = cell.formula()
	    }
	}

	Cell::dfs() {
	    visited = true;
	    for each cell in this->adj_list {
	        if (!cell.visited) 
		    cell->dfs()
	    }
	    FormulaStack.Push(this)
	}

    D. Adding a Formula to a Cell: If we add a formula to a cell, the code
        is identical to the code above except that we replace the lines:
	
	Cell::set_value(v)
	    value = v

	with the lines:

	Cell::set_value(f)
	    formula = f
	
    E. Removing a Formula from a Cell: When we edit the value of a cell,
        we will assume that any pre-existing formula is removed. When we
	remove a formula, we need to remove any directed edges that
	point to this cell, since the cell no longer depends on these
	other cells. In order to do this, we keep a set of backpointers
	to these cells. We call these cells parameter cells and the list
	of backpointers the parameter list. We can remove the directed
	edges by traversing the backpointers list and removing the cell
	from the adjacency list of each parameter cell:

	Cell::RemoveFormula () {
	    for each cell in parameter_list {
	        // the remove method removes its parameter from the 
		// appropriate list
	        cell.adj_list.remove(this);
	    }
        }

	The two set_value methods must now have a line added to them:

	Cell::set_value(v)
	    this->RemoveFormula()
	    value = v
	    ...

	Cell::set_value(f)
	    this->RemoveFormula()
	    formula = f
	    ...

III. How the Dependency Graph Gets Created

    A. A real spreadsheet: In a real spreadsheet a parser parses the 
        formula and determines which cells the formula depends on. It
	then adds the cell to which the formula is attached to the
	adjacency lists of each of the formula's parameter cells.

    B. Our spreadsheet: We don't want to have to write a parser so we
        need another way.

	1. Strategy 1: We can make the programmer provide a list of
	    parameter cells to set_value along with a formula. set_value
	    can then add the cell to the adjacency lists of each of the
	    parameter cells. This solution is a terrible one for two
	    reasons:

	    a. it forces the programmer to create a list of parameter cells
	       each time the programmer wants to add a formula to a cell.
	       The code to create the list can be cumbersome and time 
	       consuming to write.

	    b. the programmer may inadvertently leave a cell off the list
	       or include a spurious cell on the list. In either case the
	       resulting dependency graph will be incorrect.

	2. Strategy 2: We can automatically build the dependency graph
	    by observing the parameter cells that a formula requests when
	    it is executed. Our strategy will assume that the formula
	    has no conditionals. Things get a bit more complicated if
	    the formula does have conditionals. When a formula is about to be 
	    executed, we store a pointer to the formula's cell into a
	    class variable called current_cell. When the
	    formula requests the value of a parameter cell, the parameter
	    cell can determine which cell requested it by looking at
	    the value of current_cell. The parameter cell can then add this
	    cell to its adjacency list. The code looks as follows:

	    // FormulaStack is a class variable that keeps track of the
	    // order in which cells should be re-evaluated.

	    Cell::set_value(v) {
	        value = v;
		dfs();
		while (!FormulaStack.IsEmpty()) {
	            cell = FormulaStack.Pop()
		    cell.visited = false
		    if (cell.formula != 0)
		        current_cell = this
		        cell.value = cell.formula()
			current_cell = 0
	        }
	    }

	    Cell::get_value() {
	        if (current_cell != 0) {
		    if (!this.adj_list.member(current_cell)) {
		        this.adj_list.append(current_cell)
			current_cell.parameter_list.append(this)
		    }
		}
		return value
	    }