Names, Scopes, and Bindings


  1. Definitions
    1. Name: Identifiers that allow us to refer to variables, constants, functions, types, operations, and so on
    2. Binding: An association of a name with an object
    3. Scope: The lifetime of a binding of a name to an object
    4. Referencing environment: The complete set of bindings in effect at a certain point in a program
  2. Binding Times
    1. Language design time: The syntax and semantics of a language are typically set at language design time. Examples would be the control constructs, how variable declarations are written, whether static or dynamic scoping is used.
    2. Language implementation time: Most language manuals leave a variety of issues to the discretion of the language implementor. Typical examples might be the precision (number of bits) of the fundamental types, the organization and maximum sizes of the stack and the heap, and the handling of run-time exceptions, such as arithmetic overflow.
    3. Compile time: Compiler associates names with objects, for example, a type with a variable or the layout of a stack allocated object with a variable.
    4. Link time: References to functions may not be resolved until separate .o files are compiled together. Then a placeholder reference can be replaced with an "address" (actually an offset from a relocation register)
    5. Load time: A program is actually assigned physical memory when it executes
    6. Run time: Binding is delayed until run-time, such as dynamic typing in interpreted languages (the type of a variable cannot be determined until it is assigned a value in dynamic typing), the assignment of a heap-allocated object to a pointer variable, or the binding of a virtual method to a function call
    7. Static vs dynamic binding
      1. Static binding is any binding that occurs prior to run-time (either compiler or link time)
      2. Dynamic binding is any binding that occurs at run-time (either load or run time)
    8. Earlier binding decisions are associated with faster code (static typing), whereas later binding decisions are associated with more flexible code (dynamic typing, pointers point to different objects)
  3. Scope Rules: Determine the value of a variable when it is not in the current scope
    1. Types of Scope
      1. Static Scope: Scope can be determined at compile-time, typically by finding a matching declaration in the most closely enclosing block
        1. C/C++ only have the global level and the function level
        2. Some languages, like Python, Ada, or Pascal, allow functions to be nested within other functions (e.g., fig 3.4 and fig 3.5 in the Scott text)
        3. Implemented by maintaining a static link in each frame that points to the "parent" frame, which is the frame of the most recent invocation of the lexically surrounding subroutine. The number of links can be statically determined, so a fixed set of instructions can be emitted to follow the links.
        4. Many languages, including Java, allow nested inner classes, which are a form of static scoping. Each instance of an inner class can access the environment of its enclosing classes. Each inner class maintains a static class link to its "enclosing" object, so that at runtime the class chain can be followed to find a reference to an enclosing instance variable. As with static links, a fixed set of instructions can be emitted to follow the links.
        5. Inheritance is another form of nesting, but in this case scope resolution is done at compile time by following class pointers up the inheritance hierarchy. Since an object's record is simply a collection of all of its ancestor variables, plus its own variables, the compiler can insert a direct reference to the appropriate variable in the object's record, rather than requiring a run-time search of the class hierarchy.
        6. Some functional languages and many scripting languages do not require variable declarations in functions, and object-oriented languages do not require variable declarations a long as the reference is to an instance variable. Scripting languages handle undeclared variables in different ways.
          1. Python treats any variable that gets assigned a value as a local variable. A programmer can import a global variable using the global keyword
          2. Perl treats any variable that gets assigned a value as a global variable. A programmer can declare a variable as local using either the my keyword for static scoping (newer and preferred), or the local keyword for dynamic scoping (older and generally not wanted)
      2. Dynamic Scope: Scope cannot be determined until run-time, typically by following stack frames until a matching declaration is found.
        1. The number of links is unknown, so a loop must be used to follow the chain.
        2. Used in older languages, like Lisp, and initially in Perl. Perl now has a way of specifying static scope as well.
        3. Provides a convenient way to allow access to variables without passing parameters, but it can make code difficult to follow when reading it statically and code can break if a new subroutine with the same declaration gets inserted during maintanance
        4. See figure 3.9 in the Scott text for an example of the different outcomes that result when using static versus dynamic scoping
  4. Aliasing
    1. Definition: More than one name refers to the same object (aliasing usually is caused by references or pointers)
    2. Problem for compilers: Interferes with optimization. Here's an example:
      int a, b;
      int *p, *q;
      
      a = *p
      *q = 3
      b = *p; 
      
      The compiler would like to store the value of *p into a register and then assign this register value to b, without having to reload *p. However, if q refers to the same object as p, then the register value is no longer valid. Unless the compiler can prove that q and p can never point to the same object, the compiler must reload the value of *p.
    3. Languages without aliasing, like Fortran, require pointer-based data structures to be implemented using arrays, which is much more efficient, but less flexible. Arrays have contiguous storage and pre-allocate large blocks of memory, which is what makes them more efficient, plus they do not introduce an aliasing problem.
    4. In C99, you may add the restrict qualifier to a pointer declaration. This qualifier asserts that the object to which the pointer refers has no alias in the current scope.

  5. Overloading Built-in Operators: Some languages, such as C++, provide ways to overload built-in operators, such as + or (). Other languages, such as Java, do not allow operator overloading, on the premise that it makes code difficult to statically read, since the reader cannot know which version of the operator will get called.

    1. Forms of operator overloading

      1. Prefix overloading: The operator is treated as though it belongs to a class and takes a single argument (the first argument is the class). For example, in C++, you might have the following definition for an operator that adds two complex numbers:
        class complex {
            double real, imaginary;
            ...
            public:
               complex operator+(complex &other) {
                   return complex(real + other.real, imaginary + other.imaginary);
        };
        ...
        complex A, B, C;
        C = A + B;  // treated by C++ as C = A.operator+(B)
        
      2. Infix overloading: The operator is declared outside a class and its arguments are all provided in an argument list. This handles situations where a class object might not be the left operand. For example, how would one handle the following case with prefix overloading:
        string lname = "vander zanden";
        string name = "brad" + lname;
        
        An example of infix overloading in C++ that handles the above case might be:
        string operator+(char *operand1, string operand2) {
           return string(operand1) + operand2;
        }
        
        int main() {
          string lname = "vander zanden";
          string name = "brad " + lname;
        }
        
  6. Binding of Referencing Environments: When a subroutine can be passed as a parameter to a function, an issue arises as to whether the referencing environment it uses when it is called is the referencing environment in effect when the subroutine is passed as a parameter, or when the subroutine is actually called. It is called deep binding if the referencing environment is the one in effect when the subroutine is passed as a parameter and shallow binding if it is the one in effect when the subroutine is called. Notice that deep binding corresponds to an early binding of the referencing environment and shallow binding corresponds to a late binding of the referencing environment.

    1. Example: Look at the following C++ pseudo-code and assume that it is dynamically scoped
      class person {
         int age;
      }
      
      int threshold;
      database people;
      
      boolean older_than_threshold(person p) {
          return p.age >= threshold;
      }
      
      void print_person(person p) {
          // call appropriate I/O routines to print person on standard output
          // make use of nonlocal variable line_length to format data in columns
      }
      
      void print_selected_records(database db, procedure predicate, print_routine) {
          int line_length;
         
          if (device_type(stdout) == terminal) 
              line_length = 80
          else  // standard line length on older printers or files
              line_length = 132
          
          foreach person p in db do
              if predicate(p) 
                  print_routine(p)
      }
      
      main {
         threshold = 35;
         print_selected_records(people, older_than_threshold, print_person);
      }
      
      1. shallow binding: you would probably want shallow binding for the print routine, since you would want it to pick up the value of line_length set in print_selected_records
      2. deep binding: you would probably want deep binding for threshold, since it appears that you are trying to alter the value of threshold in main and hoping that that value is used for older_than_threshold
    2. Passing Functions as Parameters
      1. Design Considerations
        1. Statically scoped languages typically use deep binding, since you are usually looking to pick up the static scope, not the dynamic scope of a variable (note that shallow binding is essentially saying that you care about what's on the stack when the function is called, which is what dynamic scoping is all about)
        2. Dynamically scoped languages typically use shallow binding, since you typically do care about what's on the stack when the function is called
      2. Implementation
        1. Shallow binding: Trivial--same as dynamic scoping
        2. Deep binding: Need to save the current referencing environment as well as a pointer to the function. The bundle as a whole is referred to as a closure
          1. Statically scoped language: Save a pointer to the function and the static link that S would use right now if it were called in the current context. When S is finally called, we temporarily restore the saved static link, rather than creating a new one. When S follows its static chain to access a nonlocal object, it will find the object instance that was current at the time the closure was created. The instance may not have the value it had at the time the closure was created, but its identity, at least, will reflect the intent of the closure's creator. Note that you are passing a nested function to a more deeply nested function, so when S is called, the less deeply nested context is further down the stack.
          2. Dynamically scope language: Beyond the scope of this course, although our examples will assume that the closure contains references to variables, rather than copies of the variables' values. Hence changes made by a function will be to the current value of the variable, not the value in effect when the function was passed as a parameter.
    3. Returning Functions as Return Values
      1. Definitions
        1. First class value: can be returned from a function, passed as a parameter, and assigned to variables (e.g., functions in scripting languages, C, C++, functional languages)
        2. Second class: can be passed as a parameter but not returned from a function or assigned to a variable (e.g., functions in some languages)
        3. Third class: cannot be passed even as a parameter (e.g., a label)
      2. Implementation of closures when nested functions may be returned as a return value (note that this is not a problem for non-nested languages such as C)
        1. Example Problem:
          def plus_x(x):
             return def lambda(y):
                      return x+y
          
          plus_x returns an unnamed function (lambda functions denote unnamed functions in functional languages and have been adopted by scripting languages such as Python). This unnamed function takes y as a parameter, adds it to x, and returns the sum. The problem is that in a normal imperative language, x is reclaimed once plus_x exits, and hence would be undefined when our unnamed function is called. This problem is referred to as unlimited extent. Local variables with unlimited extent cannot be reclaimed. This in turn means they cannot be allocated on the stack but instead must be allocated on the heap.

        2. Functional languages: Functional languages simply allocate local objects on the heap and use garbage collection to reclaim them when they are no longer needed
        3. Imperative languages: Imperative languages like to maintain stack-based allocation and reclamation of local objects, so they may place restrictions on returning nested functions
          1. Languages like C and C++ that do not permit nested functions do not have this problem.
          2. Most other imperative languages, like Module-2 an Module-3, only allow outermost functions to be returned as return values and make nested functions either second or third class values
      3. Mimicing the context of nested functions with object closures: In object-oriented languages, we can encapsulate our function as a method and let the object's fields hold context for the method. Here's a Java example for the plus_x problem:
        interface IntFunc {
           int call(int i);
        }
        
        class PlusX implements IntFunc {
           int x;
           public PlusX(int n) { x = n; }
           public int call(int i) { return i + x; }
        }
        ...
        IntFunc f = new PlusX(2);
        System.out.println(f.call(3));  // prints 5
        
        The interface defines an abstract type for objects enclosing a function from integers to integers. Whereas Python would capture the value of x in a function closure, Java captures it in an object-closure (also called a function object or functor).