Introduction to Databases


  1. Sample Uses of Databases
    1. Purchases from the supermarket
    2. Purchases using your credit card
    3. Booking a vacation with a travel agent
    4. Running a property-management business
  2. Some Definitions
    1. Database: A collection of related data and its description
    2. Database Management System (DBMS): Software that manages and controls access to the database. A DBMS typically provides the following functions
      1. Data Definition Language (DDL): Defines the structure of the database by allowing users to specify the tables, record fields and data types, and constraints on the data to be stored in the database (e.g., no salary can be more than $40,000 or no property manager can manage more than 100 properties).
      2. Data Manipulation Language (DML): A query language that allows users to insert, update, delete, and retrieve data from the database
      3. Access control
        1. security: prevents unauthorize users from accessing the database
        2. integrity: maintains the consistency of stored data (e.g., a person's salary will not appear differently in two different parts of the database).
        3. concurrency: allows multiple users to use the database simultaneously, without introducing integrity problems
        4. recovery control: restores the database to a previous consistent state following a hardware or software failure
        5. user-accessible catalog: contains descriptions of the data in the database
        6. views: allows each user to have an individualized view of the database
          1. provide security: can limit fields that are accessable by certain classes of users
          2. customization: fields can be given names that are more meaningful for a certain group of users
          3. data abstraction: hides changes to the database that do not affect the data presented by this view
          4. reduces complexity: users only needs to know about data that they care about
    3. Database Application: A program that interacts with the database at some point during its execution.
  3. File-Based Systems: An Alternative Paradigm
    1. What it is: A collection of un-related files and a collection of application programs that perform services for the end-users, such as the production of reports. Each program defines and manages its own data
      1. Traditionally each department in a company would maintain its own collection of files. For example, the sales office might keep track of properties for rent, clients, and owners. The contracts office might keep track of leases, properties, and clients (pp 9-11).
        1. The files might contain redundant information
        2. The files might contain different fields, depending on the informational needs of the two departments. For example, the sales office might need to keep property information about the number of rooms and the owner, while the contracts office might not need such information
      2. The data processing department would write programs for each application each office needed performed. The queries and the file structure would be hard-coded within each program
    2. Limitations of the File-Based Approach
      1. Separation and isolation of data: It is hard to cross reference information from two files, especially if those files exist in two different departments.
        1. For example, if we want to produce a list of all properties owned by Joe Keogh, we have to write a program that locates his ownerNo in the PrivateOwner file and then use his ownerNo to retrieve all records from the PropertyForRent file that match his ownerNo.
        2. If we further wanted all clients who live in Joe Keogh properties, we would have to ask the contracts office for access to their Lease file. The Lease file would allow us to look up the clientNo associated with each leased propertyNo, and then we could use the clientNo to search our own Client file for the name of the client.
      2. Duplication of data: Several departments might keep redundant data. For example, both the sales and contracts office might keep information about clients and properties.
        1. Duplication wastes time and money since data is entered more than once
        2. Duplication takes up additional storage space
        3. Duplication introduces the risk of a loss of data integrity, since changes to the data may not be accurately reflected in all files. For example, a client might notify the sales office of a change in telephone number, but the sales office might forget to forward the changed telephone number to the contracts office, leading to the contracts office having an out-of-date telephone number for the client.
      3. Incompatible File Formats: Different departments often used different programming languages for maintaining and accessing their files, which made it difficult to share information between departments.
        1. For example, the sales office might be a Cobol shop and the contracts office might be a C shop.
        2. It would then be difficult for the sales office to use the contracts office leases file, because the file formats would most likely be incompatible (Cobol and C use different file formats).
      4. Data dependence: The phsycial structure of the data files and records are defined in the application code, which can make it difficult to change their structure, without changing many different application programs.
        1. For example, if the PropertyForRent address field is changed from 40 to 50 characters, then all programs that use a file which contains the PropertyForRent field will need to be updated.
        2. Additionally, the DP department must write a program to transform each file that contains the PropertyForRent field to the new format.
      5. Fixed queries/proliferation of application programs: Each new report that an employee requires will necessitate the creation of a new program to produce the appropriate queries. The employee will have to wait for some period of time while the request for the program is processed and someone from DP writes the program. Hence there is a lack of flexibility associated with file systems and a lack of responsiveness.
      6. Inability to generate timely reports
        1. A file system requires a user to have excellent programming skills and a detailed picture of how the corporation's different file systems are laid out in order to create a report.
        2. Alternatively, the user has to rely on the DP department to write a program, which can be time consuming to wait for
        3. In either case it is not possible to quickly write one-off programs to create on-the-spot reports
  4. Advantages and Disadvantages of DBMSs
    1. Advantages
      1. Flexibility for the End User (the book calls this "More information from the same amount of data"): The big advantage of DBMSs is that they provide a declarative way to specify queries, so that users can quickly specify the data they want, without specifying how to obtain it. For example, to obtain the information about the names of the clients that live in Joe Keogh's properties, an employee could write the following query:
        SELECT Client.fName, Client.lName 
            FROM Client, PrivateOwner, PropertyForRent, Lease
            WHERE PrivateOwner.lName = Keogh AND PrivateOwner.fName = Joe
               AND PrivateOwner.ownerNo = PropertyForRent.ownerNo
               AND PropertyForRent.propertyNo = Lease.propertyNo
               AND Lease.clientNo = Client.clientNo
        
        1. The DBMS would be responsible for translating the query into a series of operations for accessing the information and returning it to the user.
        2. Note that the user only needs to know about the logical organization of the data. The user does not need to know anything about the physical organization of the data.
        3. This means that the user can write the query and immediately obtain the report, rather than having to place a request with DP to write a program to create the report.
      2. Control of data redundancy: Databases provide ways to "decompose" files so that similar information gets placed in one file. For example, all the information about a property being rented could be placed in a single file, rather than in two different files in Sales and Contracts.
      3. Data integrity: By eliminating redundancy, the risk of inconsistent data is reduced.
      4. Sharing of data: The corporate data is kept in a centralized repository, which makes it easier to cross reference data that used to be kept in separate departments (such as the earlier query involving the clients living in Joe Keogh properties).
      5. Decreased maintenance costs
        1. Since physical structure is not hard-wired in application programs, it is easier to change data without changing the underlying programs (a program should use a "catalog" to look up the size of a field rather than hard coding it).
        2. Backups are easier because data is centralized
        3. Security is easier to enforce because file access goes through one central location
        4. Concurrency is easier to control because file access goes through one central location and is less ad-hoc than when many different independently written programs are trying to access the same files simultaneously
    2. Disadvantages
      1. Complexity and Cost: Because of all the services they provide, DBMSs are very complicated pieces of software that typically cannot be written by a single DP department.
        1. Requires that you purchase the DBMS from a vendor, which makes it more difficult to tailor the system to your needs
        2. May require yearly licencing fees
      2. Size and Hardware: The complexity and size of a DBMS means that it is a large piece of software that may occupy many megabytes of disk space and require substantial amounts of memory to run efficiently. A set of custom-tailored programs written by DP might require much less disk storage and main memory.
      3. Performance: Hand written code for a file-based system will often out perform a more general DBMS which may not have the same domain specific information available for optimizing the query.
      4. Greater impact of a failure: Because the data is centralized, a hardware or software failure may have a greater impact because all data operations will come to a halt.