Physical Database Design

Goals of Physical Design
1. Map relations into files on the file system
2. Select indices to aid the query processor
Basic File Organization on Disk
1. Block storage
2. Elements of access time for hard drive
  1. Seek time: time to move mechanical read/write arm to the appropriate track
  2. Rotational latency: time for appropriate sector of track to rotate under the read/write arm
  3. 2021 seek times (source: wikipedia): Random access times have not improved significantly in recent years
    1. 3-4 ms for high end (3000-4000 microseconds)
    2. 9 ms for average desk top
    3. 12-15 ms for average mobile device or laptop
  4. 2021 rotational latency times: 2-7.15ms (source: wikipedia)
  5. 2021 transfer rates:
    1. consumer grade hard disk drive: 200 MB/second for both read and write (source: User Benchmark).
    2. enterprise grade hard disk drive: 200-300MB/second for both read and write
3. Solid State Devices (SSD): Uses persistent solid state memory to store data.
  1. Cost: About 24 cents per gigabyte in 2021, which is about 12 times the cost of a gigabyte for a hard disk drive (source: course zybook).
  2. Access times for Solid State Device (SSD) (source: course zybook)
    1. < 0.02 to 0.1ms seek time (no latency because no rotation)
    2. transfer rates
      1. 200-3000MB per second
      2. write times for less expensive SSD's can be 10 times slower than read rates
      3. write times for higher end SSD's is comparable to read times
Basic Organizations
1. Heap (unordered)
2. B+ tree
3. Extendible Hash
Indices
1. Importance of a cluster key: improves performance of range queries
2. Types of indices
  1. Primary Index: File is physically clustered using the primary key
  2. Cluster Index: File is physically clustered on a key other than the primary key--Often you do not cluster on the primary key if it is an artificial key since an artificial key is unlikely to be used in range queries
  3. Secondary Index: Index on a non-clustered attribute(s) that helps the query optimizer more efficienty locate records
3. Physical Implementation of Indices
  1. Sparse versus Dense
    1. Sparse: One index entry per block--useful for primary and cluster index files
    2. Dense: One index entry per row--necessary for secondary indices
  2. Types of implementation
    1. Single-level (i.e., no tree structure): Can be searched using binary search or interpolation search. Useful for small indices that are rarely updated
    2. B+-tree: Typically used for cluster/primary key indices
    3. Extendible hash table: Typically used for secondary key indices
4. Some queries, especially aggregate queries, might be solvable from the index alone
Creating Indices in MySQL
1. Syntax: CREATE INDEX indexName ON tableName(column1, column2, ..., column_n)
2. Example: CREATE INDEX PostalCodeIndex ON Address (PostalCode);
Considerations in Choosing Indices
1. Things to Index
  1. Index PK of a relation if it is not a cluster key of the file organization.
  2. Add secondary index to a FK since it is involved in joins
  3. Add secondary index to any attribute heavily used in selection queries
  4. Add secondary index on attributes involved in:
    1. ORDER BY;
    2. GROUP BY; and
    3. other operations involving sorting (such as UNION or DISTINCT).
  5. Add secondary index on attributes involved in aggregate functions.
  6. Add secondary index on attributes that could result in an index-only plan.
2. Things Not to Index
  1. small relations.
  2. an attribute or relation that is frequently updated.
  3. an attribute if the query will retrieve a significant proportion of the relation.
    1. If E[#records/block] << 1, then on average only a few blocks will be retrieved and the index is worth it
    2. If E[#records/block] is roughly 1, then on average you will access every block in the file and the index is a waste of time
  4. Avoid indexing attributes that consist of long character strings.
Transactions To Analyze
1. It's difficult to analyze all transactions so try to concentrate on a few that will be frequently executed
2. MySQL has a slow query log and an EXPLAIN statement that can help identify slow queries and show why they are slow.