Constraint Implementation

In this set of notes we describe a number of algorithms for satisfying formulas, which we will call constraints.

Eager versus Lazy Evaluation

Constraints can be evaluated in either a lazy or eager fashion. A lazy evaluator re-evaluates a constraint only if it affects a result the user requests. Eager evaluation re-evaluates a constraint as soon as one of its inputs changes. Thus a lazy evaluation system can contain variables that are out-of-date. Lazy evaluation avoids unnecessary work if relatively few values are needed to compute the result the user requests. For example, if portions of a spreadsheet are off screen, they might not have to be re-evaluated. Lazy evaluation has two main drawbacks:

A constraint may not be evaluated when a user expects the constraint to be evaluated. This could happen because the application does not request the constraint's value. A simple fix to this problem is to allow a constraint to be marked as "eager", which means that the constraint should always be evaluated if one of its inputs changes.
If an edit introduces an error, the error may not be detected until much later in the program, because the constraint that detects the error may not be evaluated until much later. This problem is not so easy to solve and requires good debugging tools to help the programmer find where the error originates when the error is detected.

Eager evaluation is useful for immediately showing all the effects of any change. It will immediately detect any errors caused by the edited value. However, it can also unnecessarily re-evaluate constraints whose values are not currently needed. This unnecessary evaluation can slow an application's response time. If there are a large number of off-screen graphics or a lot of invisible graphics, this can be a particular problem.
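The trade-off can be made concrete with a minimal sketch. The Cell class, its eager flag, and the evaluations counter below are hypothetical names introduced only for illustration, and the propagation is deliberately naive (it is adequate for this one-level example, not for general graphs).

```python
class Cell:
    """A variable whose value is either set directly or computed by a formula."""

    def __init__(self, value=None, formula=None, eager=False):
        self.value = value
        self.formula = formula        # zero-argument callable, or None if editable
        self.eager = eager            # hypothetical flag: re-evaluate on every change
        self.dependencies = []        # cells whose formulas read this cell
        self.out_of_date = formula is not None
        self.evaluations = 0          # counts formula runs, for illustration only

    def get(self):
        # Lazy path: the formula runs only when somebody asks for the value.
        if self.formula is not None and self.out_of_date:
            self.out_of_date = False
            self.evaluations += 1
            self.value = self.formula()
        return self.value

    def set(self, value):
        self.value = value
        for dep in self.dependencies:
            dep.invalidate()

    def invalidate(self):
        self.out_of_date = True
        if self.eager:
            self.get()                # eager path: bring the cell up to date now
        for dep in self.dependencies:
            dep.invalidate()


width = Cell(value=10)
lazy_area = Cell(formula=lambda: width.get() * 2)
eager_area = Cell(formula=lambda: width.get() * 2, eager=True)
width.dependencies += [lazy_area, eager_area]

for w in (20, 30, 40):
    width.set(w)

print(eager_area.evaluations, lazy_area.evaluations)
print(lazy_area.get(), lazy_area.evaluations)
```

After the three edits the eager cell has run its formula three times, while the lazy cell has not run at all; it runs once when its value is finally requested.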

Another problem with eager evaluation is that it can evaluate constraints prematurely. In other words, one or more of a constraint's inputs may not have been initialized when the evaluator evaluates the constraint. In this case the constraint may crash the application. Typically the programmer will have to use the language debugger to find the source of the crash, which is very difficult to do unless the programmer has an intimate knowledge of how the constraint solver is implemented.

A solution to the premature evaluation problem can be devised as follows:

Allow the user to provide a default value for the constraint.
If the constraint is evaluated prematurely, it can throw an exception.
When a constraint throws an exception, the constraint solver can return the constraint's default value.
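The three steps above can be sketched as follows. The Constraint class and its evaluate method are hypothetical names, and catching every Exception is an assumption made for brevity; a real solver might catch only a dedicated "uninitialized input" exception.

```python
class Constraint:
    def __init__(self, formula, default):
        self.formula = formula    # zero-argument callable
        self.default = default    # user-supplied fallback value

    def evaluate(self):
        try:
            return self.formula()
        except Exception:
            # Premature evaluation: an input was not ready, so fall back.
            return self.default


inputs = {}                       # "width" has not been initialized yet

label_width = Constraint(lambda: inputs["width"] + 4, default=0)

early = label_width.evaluate()    # formula raises KeyError, default is returned
inputs["width"] = 10
late = label_width.evaluate()     # input is now available

print(early, late)
```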

Data Structures Used by a Dataflow Constraint Solver

The fundamental data structure used by a dataflow constraint solver is a dataflow graph. The dataflow graph keeps track of dependencies among variables. The variables are the vertices of the graph. There is a directed edge from an "input" variable to an "output" variable if the constraint for the output variable requests the value of the input variable. Formally the dataflow graph can be represented as G = (V, E), where V represents the set of variables and E represents the set of edges.

When a variable is edited, a constraint solver can find all the constraints that depend on this variable by using a depth-first search to follow the edges in the dataflow graph.
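As a small illustration, here is one way such a search might look when the graph is stored as a dictionary that maps each variable name to the variables that read it. The function name affected_variables and the sample graph are assumptions made for this sketch.

```python
def affected_variables(edges, edited):
    """Return every variable reachable from `edited` by following dataflow edges."""
    seen = set()
    stack = list(edges.get(edited, ()))
    while stack:                          # iterative depth-first search
        var = stack.pop()
        if var in seen:
            continue
        seen.add(var)
        stack.extend(edges.get(var, ()))
    return seen


# input variable -> output variables whose constraints read it
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["d"]}

print(sorted(affected_variables(edges, "a")))
print(sorted(affected_variables(edges, "e")))
```

Editing a affects b, c, and d; editing e affects only d.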

A constraint solver also typically uses a number of fields for each variable:

Value: the current value of the variable
Dependencies: a list of the variables whose constraints depend on this variable
Formula: a formula that computes the variable's value. It can be null, in which case the variable is an "editable" variable. An editable variable is a variable that can be modified by a user. Variables that are determined by formulas should not be edited by the user.
Out_Of_Date: whether this variable's constraint needs to be re-evaluated.
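These fields map naturally onto a small record type. The sketch below is one possible layout, not a prescribed one; the name field and the editable property are additions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass(eq=False)                  # compare variables by identity, not by contents
class Variable:
    name: str                                          # illustrative label only
    value: Any = None                                  # current value
    formula: Optional[Callable[[], Any]] = None        # None means "editable"
    dependencies: List["Variable"] = field(default_factory=list)
    out_of_date: bool = False                          # needs re-evaluation?

    @property
    def editable(self) -> bool:
        return self.formula is None


price = Variable("price", value=100)
total = Variable("total", formula=lambda: price.value * 2, out_of_date=True)
price.dependencies.append(total)      # total's constraint reads price

print(price.editable, total.editable)
```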

The Importance of Incremental Solvers

One possible approach to constraint satisfaction is to re-evaluate all the constraints when a variable changes. However, many constraints may not depend on the changed variable so a great many constraints could be unnecessarily evaluated. This would decrease the responsiveness of the application. As a result, almost all constraint solvers use some sort of incremental algorithm that tries to evaluate only those constraints that depend on a changed variable.

The simplest possible incremental algorithm is the following one:

    
	Change(cell, new_value) {
	    if cell.value != new_value then
	        cell.value = new_value
	        for each var in cell.dependencies do
	            val = var.eval(var.formula)
	            Change(var, val)
	}

Unfortunately, in the worst case, this algorithm is exponential in the number of variables that must be re-evaluated. In other words, if n variables have to be re-evaluated, this algorithm can evaluate as many as 2ⁿ constraints.

The following example graph shows the exponential case:

	A -------> C -------> E
	  \       /  \       /
	    \   /      \   /
	      B          D

	Edges: A -> C, A -> B, B -> C, C -> E, C -> D, D -> E

	Change A
	    Change(A) calls Change(C)
	    Change(C) calls Change(E)
	    Change(C) calls Change(D)
	    Change(D) calls Change(E)
	    Change(A) calls Change(B)
	    Change(B) calls Change(C)
	    ...
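To see the growth numerically, the following sketch runs the naive algorithm on a chain of such diamonds and counts formula evaluations. The helper names (build_diamond_chain, count_evaluations) and the particular formulas are assumptions chosen so that every re-evaluation produces a new value; with two stages the graph has the same shape as the one above.

```python
class Cell:
    def __init__(self, name, formula=None, value=0):
        self.name = name
        self.formula = formula
        self.value = value
        self.dependencies = []


evaluations = 0


def change(cell, new_value):
    """The naive incremental algorithm, instrumented with a counter."""
    global evaluations
    if cell.value != new_value:
        cell.value = new_value
        for var in cell.dependencies:
            evaluations += 1
            change(var, var.formula())


def build_diamond_chain(stages):
    """Build `stages` diamonds in a row and return the editable head cell."""
    head = Cell("v0")
    prev = head
    for i in range(1, stages + 1):
        side = Cell(f"s{i}", formula=lambda p=prev: p.value + 1)
        side.value = side.formula()
        main = Cell(f"v{i}", formula=lambda p=prev, s=side: p.value + s.value)
        main.value = main.formula()
        prev.dependencies = [main, side]   # same visiting order as the trace
        side.dependencies = [main]
        prev = main
    return head


def count_evaluations(stages):
    global evaluations
    evaluations = 0
    change(build_diamond_chain(stages), 1)
    return evaluations


for stages in (1, 2, 3, 10):
    print(2 * stages, "variables to update:", count_evaluations(stages), "evaluations")
```

For this family of graphs the count roughly doubles with each added diamond: 4 variables need 9 evaluations, and 20 variables need 3069.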

A Spreadsheet Solver

In order to reduce this exponential complexity, we need to be a little smarter about how we evaluate the variables. Basically we want to topologically sort the variables and then evaluate them in topological order. A list of variables is topologically sorted if for any two variables v_i and v_j such that i < j (where i and j denote the variables' positions in the list), either:

there is a directed path from v_i to v_j in the dataflow graph, or
there is no path between v_j and v_i in either direction in the dataflow graph.

The former condition says that v_j either directly or indirectly uses v_i as an input. Hence v_i should be evaluated prior to v_j. The latter condition says that v_i and v_j are independent of one another, and hence it does not matter in which order they are evaluated.
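The definition can be checked mechanically. The sketch below uses the diamond graph from the previous section; is_topologically_sorted is a hypothetical helper that tests the equivalent edge-by-edge condition, and graphlib.TopologicalSorter (Python 3.9 or later) is used only to produce one valid order for comparison.

```python
from graphlib import TopologicalSorter

# Dataflow edges: input variable -> output variables that read it.
edges = {"A": ["B", "C"], "B": ["C"], "C": ["D", "E"], "D": ["E"], "E": []}


def is_topologically_sorted(order, edges):
    """True if every input appears before every variable that reads it."""
    position = {var: i for i, var in enumerate(order)}
    return all(position[src] < position[dst]
               for src, dsts in edges.items() for dst in dsts)


# TopologicalSorter expects each node mapped to its predecessors (its inputs),
# so invert the edge direction first.
inputs = {var: set() for var in edges}
for src, dsts in edges.items():
    for dst in dsts:
        inputs[dst].add(src)

order = list(TopologicalSorter(inputs).static_order())

print(order, is_topologically_sorted(order, edges))
print(is_topologically_sorted(["A", "C", "B", "D", "E"], edges))
```

This graph contains the path A, B, C, D, E, so that sequence is its only topological order; placing C before B violates the edge from B to C.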

A simple way of topologically ordering variables is to perform a depth-first search of the dataflow graph starting at an edited variable. The depth-first search maintains a stack of variables. It pushes a variable onto the stack only after it has visited all of the variable's successors, so every successor of a variable lies below that variable on the stack. When the depth-first search terminates, the solver pops the variables off the stack and evaluates them in that order, so each variable is evaluated before all of its successors. This is the approach used by most spreadsheet solvers. It is an eager evaluation approach because all variables are immediately brought up to date.

The algorithm is as follows:

	Edit_Cell(cell, new_value)
	    cell.value = new_value
	    /* variables_to_be_evaluated is a global stack */
	    variables_to_be_evaluated = empty
	    Collect_Variables(variables_to_be_evaluated, cell)
	    while variables_to_be_evaluated is not empty do
	        var = variables_to_be_evaluated.pop()
	        Get(var)

	Collect_Variables(variables_to_be_evaluated, cell)
	    for each var in cell.dependencies do
	        if var.out_of_date = false then
	            var.out_of_date = true
	            Collect_Variables(variables_to_be_evaluated, var)
	            /* pushed after its successors, so popped before them */
	            variables_to_be_evaluated.push(var)

	Get(v)
	    if v.out_of_date = true then
	        v.out_of_date = false /* essential for cycles */
	        v.value = v.eval(v.formula)
	    return(v.value)

The important thing to notice is that the out_of_date flag is set to false before a constraint is evaluated. Doing so ensures that any cycle will terminate when it revisits this variable. For example, suppose we have the two constraints a = b and b = a. Suppose that both a and b are marked out-of-date and that we call Get(a). a is marked up-to-date and then its constraint is evaluated. The constraint requests b's value. b is out-of-date, so it is marked up-to-date and its constraint is evaluated. Its constraint requests a's value. Since a has been marked up-to-date, it returns whatever old value it has and b's constraint terminates, followed by a's constraint terminating (both variables get a's old value). If we did not set the out_of_date flag to false until after a constraint is evaluated, we could get into an infinite loop with cycles. Check out what happens in the above circular case and you will see that a and b end up in an infinite cycle, requesting each other's value.
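A runnable sketch of this solver, under a few assumptions, may help when tracing both the diamond graph and the circular case. Formulas are written as Python callables that call get on their inputs, the stack is passed as an argument rather than kept global, and the evaluations counter exists only to show that each variable is evaluated once.

```python
class Cell:
    def __init__(self, name, value=None, formula=None):
        self.name = name
        self.value = value
        self.formula = formula        # zero-argument callable, or None if editable
        self.dependencies = []        # cells whose formulas read this cell
        self.out_of_date = False
        self.evaluations = 0          # instrumentation for this example only


def get(v):
    if v.out_of_date:
        v.out_of_date = False         # cleared first so that cycles terminate
        v.evaluations += 1
        v.value = v.formula()
    return v.value


def collect_variables(stack, cell):
    for var in cell.dependencies:
        if not var.out_of_date:
            var.out_of_date = True
            collect_variables(stack, var)
            stack.append(var)         # pushed after all of its successors


def edit_cell(cell, new_value):
    cell.value = new_value
    stack = []
    collect_variables(stack, cell)
    while stack:
        get(stack.pop())              # popped before its successors


# The diamond graph from the exponential example.
a = Cell("a", value=0)
b = Cell("b", formula=lambda: get(a) + 1)
c = Cell("c", formula=lambda: get(a) + get(b))
d = Cell("d", formula=lambda: get(c) + 1)
e = Cell("e", formula=lambda: get(c) + get(d))
a.dependencies = [c, b]
b.dependencies = [c]
c.dependencies = [e, d]
d.dependencies = [e]

edit_cell(a, 1)
print([(v.name, v.value, v.evaluations) for v in (b, c, d, e)])

# The circular case: x = y and y = x, both out of date.
x = Cell("x", value=5, formula=lambda: get(y))
y = Cell("y", value=9, formula=lambda: get(x))
x.out_of_date = y.out_of_date = True
get(x)
print(x.value, y.value)
```

Each of b, c, d, and e is evaluated exactly once, and in the circular case both cells end up with x's old value, as described above.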

Establishing Dependencies

In a constraint system, it is often nice to be able to automatically construct the edges of a dataflow graph without forcing the user to declare what the edges are. In a spreadsheet, the user does not have to declare the edges because the formulas are so simple they can be parsed by a parser. However, if you allow a constraint to have arbitrary code, you may not want to write a parser to find out all the variables that the constraint uses. Further, if you also allow a constraint to have loops, then your parser may not even be able to discover all the variables that the constraint may reference. Consequently it would be nice if the constraint solver could automatically construct the edges of the dataflow graph. It turns out that this is possible if we make the constraint satisfaction phase a bit more sophisticated.

What we need to do to automatically construct dependency edges is to keep track of which cell requested a variable's value (in the following prose I will use cell and variable interchangeably). If we know which cell requested a variable's value, then we can add the cell to the variable's dependency list. An easy way to remember which cell requested a variable's value is to keep a stack of cells. Each time a cell's formula is about to be evaluated, the cell is pushed onto the stack. When the cell's formula is finished executing, the cell is popped off the stack. The cell that requests a variable's value is always the topmost cell on the stack. So a variable can establish a dependency to the appropriate cell by simply looking at the top of the stack.

Here is the algorithm for doing that (the arrows denote the statements that have been added to the new Get routine):

	Get(v)
->	   if cell_stack != empty then
->		v.dependencies.insert(cell_stack.top())

	   if v.out_of_date = true then
	       v.out_of_date = false /* essential for cycles */
->	       cell_stack.push(v)
	       v.value = v.eval(v.formula)
->	       cell_stack.pop()
	   return(v.value)

A number of things should be noted about this algorithm. First, the cell_stack is a global variable that is initialized at the start of the program. We need to check whether the cell_stack is empty because the variable may be requested by the application rather than a constraint. In this case, no dependency should be created.

Second, the code that inserts a cell into the dependency list should check to see that the cell is not already on the dependency list.
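Both points are visible in the following sketch of the extended Get routine. The Cell class and the sample formulas are assumptions; the try/finally around the formula call is an addition not present in the pseudocode, included so that the stack stays balanced if a formula raises.

```python
cell_stack = []                       # cells whose formulas are currently running


class Cell:
    def __init__(self, name, value=None, formula=None):
        self.name = name
        self.value = value
        self.formula = formula
        self.dependencies = []
        self.out_of_date = formula is not None


def get(v):
    if cell_stack:                    # empty means the application is asking
        requester = cell_stack[-1]
        if requester not in v.dependencies:   # avoid duplicate edges
            v.dependencies.append(requester)
    if v.out_of_date:
        v.out_of_date = False         # essential for cycles
        cell_stack.append(v)
        try:
            v.value = v.formula()
        finally:
            cell_stack.pop()
    return v.value


width = Cell("width", value=3)
height = Cell("height", value=4)
area = Cell("area", formula=lambda: get(width) * get(height))
# This formula reads width twice, which exercises the duplicate check.
perimeter = Cell("perimeter",
                 formula=lambda: get(width) + get(width) + 2 * get(height))

print(get(area), get(perimeter))
print([c.name for c in width.dependencies])
print([c.name for c in area.dependencies])
```

The edges from width and height to area and perimeter appear without being declared, width lists perimeter only once, and area gains no dependency from the application's own request.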