Data Types

  1. Advantages of Types
    1. Provide implicit context for many operations so that the programmer does not have to specify the operation explicitly (e.g., integer arithmetic for a+b when a and b are integers)
    2. Types limit the set of operations that may be performed in a semantically valid program. This prevents nonsensical operations from taking place.
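      A minimal C++ sketch of the first point (the function names are illustrative only): the same + and / symbols select integer or floating-point arithmetic purely from the operand types, so the programmer never has to name the operation.

```cpp
#include <cassert>

// With int operands, + selects integer addition; with double operands,
// the same symbol selects floating-point addition.
int add_ints(int a, int b) { return a + b; }
double add_doubles(double a, double b) { return a + b; }

// Division makes the difference visible: integer division truncates toward zero,
// while floating-point division keeps the fractional part.
int divide_ints(int a, int b) { return a / b; }
double divide_doubles(double a, double b) { return a / b; }
```

For example, divide_ints(7, 2) is 3 while divide_doubles(7.0, 2.0) is 3.5.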
  2. Type Issue for Compilers: Ensuring that types provided to an operation are compatible with that operation
    1. Type system = 1) a mechanism to define types and associate them with language constructs and 2) a set of rules for type equivalence, type compatibility, and type inference
      1. type equivalence: types of two values are the same
      2. type compatibility: when a value of a given type can be used in a given context (e.g., an integer can be used in floating-point arithmetic because it can be converted to a floating-point number, and in C++ a char * is an acceptable argument to a function expecting a std::string because the string class has a constructor that takes a char * and produces a string)
      3. type inference: determines the type of an expression from the types of its constituent parts
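      A small C++ sketch of compatibility and inference (length_of, half, and mixed are hypothetical names; std::is_same_v assumes C++17): the compiler accepts a string literal where a std::string is expected and an int where a double is expected, and it infers double as the type of 1 + 2.5.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Compatibility: a const char * argument is accepted where a std::string is
// expected, because std::string has a constructor taking const char *.
std::size_t length_of(const std::string& s) { return s.size(); }

// Compatibility: an int argument is accepted where a double is expected;
// the compiler inserts the int-to-double conversion.
double half(double x) { return x / 2; }

// Inference: auto takes the type inferred from the parts of the
// initializer; int + double yields double.
auto mixed = 1 + 2.5;
static_assert(std::is_same_v<decltype(mixed), double>, "int + double is double");
```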
    2. Ways of Viewing a Type
      1. Abstract: A type is a set of values and a set of operations that manipulate those values (often associated with an object-oriented style of programming)
      2. Constructive: Define a set of primitive types and a set of type constructors for creating more complex types (often associated with an imperative style of programming)

        Example: Here is an example of a constructive definition of a type system: A type expression may be inductively defined as follows:

        1. A basic type is a type expression (e.g., real, boolean, char).
        2. A type name is a type expression (e.g., foo after the declaration typedef int foo[];)
        3. A type constructor applied to a type expression is a type expression:
          1. arrays: If T is a type expression and I is an index set, then array(T, I) is a type expression.
          2. records: If T1, T2, ..., Tn are type expressions, then their Cartesian product T1 X T2 X ... X Tn is a type expression which is commonly referred to as a record
          3. sets: A set is a collection of distinct elements of a base type. In languages with a built-in set type (e.g., Pascal's set of T), the base type is usually an enumeration type (e.g., {Mon, Tues, Wed, ...}) or a subrange type (e.g., type grade = 1..100).
          4. pointers: If T is a type expression, then pointer(T) is a type expression. Pointers are often used to implement recursive types. A recursive type is one in which an object of type T may contain references to other objects of type T (e.g., a list node that contains next and prev fields).
          5. lists: If T is a type expression, then List(T) represents a sequence of elements of type T but, unlike an array, it has no notion of mapping or indexing. A list is defined recursively as either an empty list or a pair consisting of a head element and a reference to a sublist. Lists typically have a variable size, while arrays typically have a predetermined size. Some languages, such as many scripting languages and functional languages, support lists as a built-in type.
          6. files: If T is a type expression, then File(T) represents data of that type on a mass-storage device, outside the memory in which other program objects reside. The type T is normally a record. Files are conceptually much like arrays, in that they map an integer position to a record.
          7. functions: A function maps a domain type D, given by the function's parameters, to a range type R, the function's return type. Hence D -> R is a type expression. D is typically represented as a Cartesian product T1 X T2 X ... X Tn.
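        One way to make the constructive view concrete is to represent type expressions as a tree whose leaves are basic types and whose interior nodes are type constructors. The C++ sketch below is a hypothetical representation (Type, make_record, show, and the other names are illustrative, not from any library) covering basic types, arrays, records, pointers, and functions; sets, lists, and files would be extra Kind values built the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type-expression tree: basic types are leaves, and each
// constructor builds a new type expression from existing ones.
struct Type {
    enum Kind { Basic, Array, Record, Pointer, Function } kind;
    std::string label;                        // basic-type name, or array index set
    std::vector<std::shared_ptr<Type>> parts; // component type expressions
};
using TypeP = std::shared_ptr<Type>;

TypeP make_basic(std::string name) {
    return std::make_shared<Type>(Type{Type::Basic, std::move(name), {}});
}
TypeP make_array(TypeP elem, std::string index_set) {        // array(T, I)
    return std::make_shared<Type>(Type{Type::Array, std::move(index_set), {elem}});
}
TypeP make_record(std::vector<TypeP> fields) {               // T1 X T2 X ... X Tn
    return std::make_shared<Type>(Type{Type::Record, "", std::move(fields)});
}
TypeP make_pointer(TypeP target) {                           // pointer(T)
    return std::make_shared<Type>(Type{Type::Pointer, "", {target}});
}
TypeP make_function(TypeP domain, TypeP range) {             // D -> R
    return std::make_shared<Type>(Type{Type::Function, "", {domain, range}});
}

// Print a type expression in the notation used in these notes.
std::string show(const TypeP& t) {
    switch (t->kind) {
    case Type::Basic:    return t->label;
    case Type::Array:    return "array(" + show(t->parts[0]) + ", " + t->label + ")";
    case Type::Pointer:  return "pointer(" + show(t->parts[0]) + ")";
    case Type::Function: return show(t->parts[0]) + " -> " + show(t->parts[1]);
    case Type::Record: {
        std::string s = "(";
        for (std::size_t i = 0; i < t->parts.size(); ++i)
            s += (i > 0 ? " X " : "") + show(t->parts[i]);
        return s + ")";
    }
    }
    return "";
}
```

For example, show(make_function(make_record({make_basic("int"), make_pointer(make_basic("char"))}), make_basic("real"))) yields (int X pointer(char)) -> real.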

    3. Type Checking:
      1. Purpose: verify that the type of a construct matches the type expected by its context.

        Example: The modulus operator % expects integer operands, so the type checker must verify that both of its operands are integers.

        Type checking requires establishing rules for type equivalence, type compatibility, and type inference.
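        A minimal sketch of such a check in C++, assuming a toy language with only int, float, and bool types (Ty and check_binary are hypothetical): the checker computes a result type from the operand types, or reports failure with std::nullopt. It deliberately applies no coercions, so int + float is rejected rather than converted.

```cpp
#include <cassert>
#include <optional>

// Types in a hypothetical toy language.
enum class Ty { Int, Float, Bool };

// Returns the result type of `left op right`, or std::nullopt if the
// operand types are not what the operator expects. No coercions are applied.
std::optional<Ty> check_binary(char op, Ty left, Ty right) {
    bool same_numeric = (left == right) && left != Ty::Bool;
    switch (op) {
    case '%':  // modulus: integer operands only
        if (left == Ty::Int && right == Ty::Int) return Ty::Int;
        return std::nullopt;
    case '+':  // addition: two operands of the same numeric type
        if (same_numeric) return left;
        return std::nullopt;
    case '<':  // comparison: two numbers of the same type, result is Bool
        if (same_numeric) return Ty::Bool;
        return std::nullopt;
    default:   // unknown operator
        return std::nullopt;
    }
}
```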

      2. Types of Type Checking
        1. Static checking: type checking done at compile time
        2. Dynamic checking: type checking done at run time
        3. Strongly typed language: a language in which the compiler or the run-time system can guarantee that the programs it accepts will execute without type errors

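          1. Strong typing is orthogonal to static versus dynamic type checking: a strongly typed language may perform its checks at compile time, at run time, or both.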
          2. Python is a strongly and dynamically typed language
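          To contrast the two kinds of checking within one language, the C++ sketch below shows a statically rejected assignment (commented out, because it would not compile) next to a value whose type is checked at run time. std::any is used here only as a convenient stand-in for a run-time type tag, and holds_int is a hypothetical helper; requires C++17.

```cpp
#include <any>
#include <cassert>

// Static checking: the compiler rejects this at compile time, so the
// program never runs (left commented out so the sketch compiles):
//   int n = std::string("abc");  // error: no conversion to int

// Dynamic checking: std::any stores a value together with a run-time type
// tag. std::any_cast compares that tag against the requested type when it
// executes and throws std::bad_any_cast on a mismatch.
bool holds_int(const std::any& value) {
    try {
        (void)std::any_cast<int>(value);  // checked at run time
        return true;
    } catch (const std::bad_any_cast&) {
        return false;
    }
}
```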

      3. Equivalence of Type Expressions

        1. Structural equivalence: Names are replaced by the type expressions they define. If the resulting type expressions have the same structure, they are equivalent.

        2. Name equivalence: Names are not replaced by the type expressions they define. Two type expressions are equivalent if and only if they are identical, that is, structurally equivalent without any name substitution.

          Example:

            typedef struct cell *link;

            link next, last;
            struct cell *p;
            struct cell *q, *r;
          
          1. Under structural equivalence, all the variables are type equivalent
          2. Under name equivalence, next and last are type equivalent and p, q, and r are type equivalent.
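
          The C++ sketch below checks both rules at compile time with std::is_same_v (C++17): the structurally identical structs A and B (illustrative names) are distinct types, whereas the typedef from the example only introduces another name for struct cell *, so in C++ next and p end up with the same type.

```cpp
#include <type_traits>

// Name equivalence for class types: A and B have identical structure,
// but C++ treats them as two distinct types.
struct A { int x; double y; };
struct B { int x; double y; };
static_assert(!std::is_same_v<A, B>, "A and B are distinct types");
static_assert(!std::is_assignable_v<A&, B>, "a B cannot be assigned to an A");

// typedef introduces an alias, not a new type: link and struct cell *
// name the same type, so next, last, p, q, and r all share that type.
struct cell { int value; cell* next; };
typedef struct cell* link;
link next, last;
struct cell *p;
struct cell *q, *r;
static_assert(std::is_same_v<decltype(next), decltype(p)>, "alias, not a new type");
```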

        3. C, C++, and Java use name equivalence for their record-like types (struct and union in C, classes in C++ and Java)
        4. Variants of name equivalence: Consider the following two cases:
            typedef int stack_element;
            stack_element pop(Stack *);

            int a = pop(myStack);   // type error?

            typedef int fahrenheit;
            typedef int celsius;

            fahrenheit f;
            celsius c;

            c = f;  // type error?
          
          In the stack example, most programmers probably consider it okay to assign the result of pop to a, and would be annoyed if the compiler raised a type error. In contrast, the intent of the type declarations in the second example seems to be to create two separate types representing two different temperature systems, and we would probably want the compiler to raise a type error for the assignment of f to c.

          1. strict name equivalence: a language in which aliased types are considered distinct (hence raises a type error for the case c = f)
          2. loose name equivalence: a language in which aliased types are considered equivalent (hence allows the stack assignment)
          3. C/C++ use loose name equivalence. Java does not have a way to create aliased types.
          4. Ada has a mechanism that allows the programmer to specify whether they desire loose or strict name equivalence:
              subtype stack_element is integer;    -- loose name equivalence
              type celsius_temp is new integer;    -- strict name equivalence
              type fahrenheit_temp is new integer; -- strict name equivalence
            
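            Note that the first declaration defines stack_element as a subtype of integer, so its values can be used wherever an integer is expected. In contrast, the last two declarations introduce new types that are distinct from integer and from each other, so assigning a fahrenheit_temp value to a celsius_temp variable is a type error.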
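          C++ has no direct counterpart to Ada's "type ... is new", but a common idiom approximates strict name equivalence by wrapping the underlying int in a distinct struct per unit. The sketch below (Fahrenheit, Celsius, and to_celsius are hypothetical names; C++17) shows that typedef aliases mix freely while the wrapper types do not, so c = f would be rejected and conversions must be written explicitly.

```cpp
#include <type_traits>

// Loose name equivalence: typedef names are aliases for int.
typedef int stack_element;
typedef int fahrenheit;
typedef int celsius;
static_assert(std::is_assignable_v<celsius&, fahrenheit>, "aliases mix freely");

// Approximating strict name equivalence: each unit gets its own struct,
// so a Fahrenheit value can no longer be assigned to a Celsius variable.
struct Fahrenheit { int degrees; };
struct Celsius { int degrees; };
static_assert(!std::is_assignable_v<Celsius&, Fahrenheit>, "distinct types");

// Crossing between the types must now be spelled out explicitly.
// Integer division truncates any fractional degree.
constexpr Celsius to_celsius(Fahrenheit f) {
    return Celsius{(f.degrees - 32) * 5 / 9};
}
```

For example, to_celsius(Fahrenheit{212}).degrees is 100.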
    4. Type Conversions Versus Type Coercion
      1. Definitions
        1. Type Conversion: An explicit request by the programmer for the compiler to insert code into the program to convert one type to another, compatible type. For example:
          int x;
          float y = 6.3;
          x = (int)y;
          
        2. Type Coercion: The compiler performs an implicit conversion from one type to another, compatible type without the programmer requesting it. This conversion could cause a logic error at run time if it is not dynamically checked. For example:
          int x;
          short y;
          y = x;
          
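          On a typical implementation (16-bit short, two's complement), the C compiler performs the coercion silently by keeping the low-order 16 bits of x and storing them in y, so bit 15 of x becomes the sign bit of y. This works well if the value of x fits in a short (-32768 to 32767) and poorly otherwise; for out-of-range values, the C standard makes the result implementation-defined.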
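          Both examples above can be sketched in C++. This assumes the common case of a 16-bit, two's-complement short; C++20 guarantees the modulo-2^16 result for the out-of-range assignment, while C and earlier C++ standards leave it implementation-defined, though mainstream compilers behave the same way. The function names are illustrative.

```cpp
#include <cassert>

// Explicit conversion: the programmer asks for it with a cast,
// and float-to-int conversion truncates toward zero.
int explicit_conversion(float y) { return (int)y; }

// Implicit coercion: assigning an int to a short compiles silently
// (some compilers warn with -Wconversion). Values that fit are kept;
// others wrap modulo 2^16 on a 16-bit short.
short implicit_coercion(int x) {
    short y;
    y = x;
    return y;
}
```

On such a platform, explicit_conversion(6.3f) returns 6, implicit_coercion(100) returns 100, and implicit_coercion(40000) returns -25536 (40000 - 65536).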

      2. Type Conversions in C++: C++ provides four named cast operators for requesting a conversion. Apart from const_cast, which only adds or removes const or volatile qualifiers, they are:
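        1. static_cast<T>(value): e.g., static_cast<int>(6.3). This type of cast is checked by the compiler at compile time, and the compiler inserts code to perform the conversion if it knows how to do it. You can also use a static cast to convert a pointer or reference to a superclass into a pointer or reference to a subclass (a downcast). This is not safe, because nothing checks at run time that the object really is a subclass instance, but the compiler will not complain.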
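          A short C++ sketch of the uses of static_cast described in this item, plus an upcast for comparison, with hypothetical Shape and Circle classes: a numeric conversion, an always-safe upcast, and a downcast that compiles but is only correct if the object really is a Circle.

```cpp
#include <cassert>

struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };

// Numeric conversion resolved at compile time: double -> int truncates.
int to_int(double d) { return static_cast<int>(d); }

// Upcast: always safe, since every Circle is a Shape.
Shape* as_shape(Circle* c) { return static_cast<Shape*>(c); }

// Downcast: compiles without complaint, but is only valid if s really
// points to a Circle; otherwise using the result is undefined behavior.
Circle* unchecked_downcast(Shape* s) { return static_cast<Circle*>(s); }
```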
        2. dynamic_cast<T>(value): For example:
          Superclass *p1 = new Subclass();
          Subclass *s1 = dynamic_cast<Subclass *>(p1);  // success

          Superclass *p2 = new Superclass();
          Subclass *s2 = dynamic_cast<Subclass *>(p2);  // failure--null ptr returned

          A dynamic cast is not checked until run time, and it works only when Superclass is polymorphic (has at least one virtual function). An invalid cast either produces an error value or raises a run-time error: in C++, a failed dynamic_cast to a pointer type returns a null pointer, and a failed dynamic_cast to a reference type throws std::bad_cast. Note that this is different from a static cast, where the compiler will perform the downcast from a superclass to a subclass and hope for the best.
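          A self-contained version of the dynamic_cast example, keeping the Superclass and Subclass names from the text and adding the virtual destructor that makes Superclass polymorphic. It also shows the reference form of the cast, which reports failure by throwing std::bad_cast because there is no null reference. The helper names are hypothetical, and objects live on the stack here to avoid leaks.

```cpp
#include <cassert>
#include <typeinfo>

// dynamic_cast needs a polymorphic base (at least one virtual function).
struct Superclass { virtual ~Superclass() = default; };
struct Subclass : Superclass {};

// Pointer form: the run-time check yields nullptr on failure.
bool is_subclass(Superclass* p) {
    return dynamic_cast<Subclass*>(p) != nullptr;
}

// Reference form: failure throws std::bad_cast instead.
bool ref_cast_succeeds(Superclass& r) {
    try {
        (void)dynamic_cast<Subclass&>(r);
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}
```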
        3. reinterpret_cast<T>(value): A reinterpret_cast is often called a non-converting cast. It asks the compiler to reinterpret the bits it sees as a different type, without changing them. In C++ it is limited mainly to conversions between pointer types, between reference types, and between pointers and integers. For example, a high-performance scientific computing application might reinterpret the bits of a floating-point number as an integer in order to extract its exponent field directly. In this case, the program is taking advantage of its knowledge of how the machine architecture represents floating-point numbers internally, which quite possibly makes the code non-portable. A conversion from a char to an unsigned integer is another case where the bits need little or no change, but reinterpret_cast cannot be applied to a char value. You would write it either with a type coercion (unsigned int x = 'a') or with a static cast (static_cast<unsigned int>('a')), and the compiler can usually implement it with at most a zero- or sign-extension of the char's bits.
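          A sketch of the floating-point example, assuming float is an IEEE 754 binary32 value (true on mainstream hardware but not guaranteed by the C++ standard). It copies the bits with std::memcpy rather than reading the float through reinterpret_cast<std::uint32_t&>, because the latter violates C++'s strict-aliasing rule; C++20's std::bit_cast<std::uint32_t>(f) is the standard alternative. float_bits and unbiased_exponent are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumes a 32-bit IEEE 754 float; refuse to compile otherwise.
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// Non-converting "cast": reuse the float's bit pattern as an integer.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Bits 23..30 hold the biased exponent; subtracting the bias 127 gives
// the power of two for normal (non-zero, finite) numbers.
int unbiased_exponent(float f) {
    return static_cast<int>((float_bits(f) >> 23) & 0xFFu) - 127;
}
```

For example, unbiased_exponent(8.0f) is 3 and unbiased_exponent(0.75f) is -1, because 8 = 1.0 * 2^3 and 0.75 = 1.5 * 2^-1.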