Physical Database Design
- Goals of Physical Design
- Map relations into files on the file system
- Select indices to aid the query processor
- Basic File Organization on Disk
- Block storage
- Elements of access time for hard drive
- Seek time: time to move mechanical read/write arm to the appropriate
track
- Rotational latency: time for appropriate sector of track to
rotate under the read/write arm
- 2021 seek times (source: wikipedia): Random access times have not improved
significantly in recent years
- 3-4 ms for high end (3000-4000 microseconds)
- 9 ms for average desk top
- 12-15 ms for average mobile device or laptop
- 2021 rotational latency times: 2-7.15ms (source: wikipedia)
- 2021 transfer rates:
- consumer grade hard disk drive: 200
MB/second for both read and write (source: User Benchmark).
- enterprise grade hard disk drive: 200-300MB/second for both
read and write
- Solid State Devices (SSD): Uses persistent solid state memory to
store data.
- Cost: About 24 cents per gigabyte in 2021, which is about 12
times the cost of a gigabyte for a hard disk drive (source: course zybook).
- Access times for Solid State Device (SSD) (source: course zybook)
- < 0.02 to 0.1ms seek time (no latency because no rotation)
- transfer rates
- 200-3000MB per second
- write times for less expensive SSD's can be 10 times slower than
read rates
- write times for higher end SSD's is comparable to read times
- Basic Organizations
- Heap (unordered)
- B+ tree
- Extendible Hash
- Indices
- Importance of a cluster key: improves performance of range queries
- Types of indices
- Primary Index: File is physically clustered using the primary
key
- Cluster Index: File is physically clustered on a key other
than the primary key--Often you do not cluster on the primary
key if it is an artificial key since an artificial key is
unlikely to be used in range queries
- Secondary Index: Index on a non-clustered attribute(s) that
helps the query optimizer more efficienty locate records
- Physical Implementation of Indices
- Sparse versus Dense
- Sparse: One index entry per block--useful for primary and
cluster index files
- Dense: One index entry per row--necessary for secondary
indices
- Types of implementation
- Single-level (i.e., no tree structure): Can be searched
using binary search or interpolation search. Useful
for small indices that are rarely updated
- B+-tree: Typically used for cluster/primary key indices
- Extendible hash table: Typically used for secondary key
indices
- Some queries, especially aggregate queries, might be solvable
from the index alone
- Creating Indices in MySQL
- Syntax: CREATE INDEX indexName ON tableName(column1, column2, ..., columnn)
- Example: CREATE INDEX PostalCodeIndex ON Address (PostalCode);
- Considerations in Choosing Indices
- Things to Index
- Index PK of a relation if it is not a cluster key of the file organization.
- Add secondary index to a FK since it is involved in joins
- Add secondary index to any attribute heavily used in selection queries
- Add secondary index on attributes involved in:
- ORDER BY;
- GROUP BY; and
- other operations involving sorting (such as UNION or DISTINCT).
- Add secondary index on attributes involved in aggregate functions.
- Add secondary index on attributes that could result in an index-only plan.
- Things Not to Index
- small relations.
- an attribute or relation that is frequently updated.
- an attribute if the query will retrieve a significant proportion of the relation.
- If E[#records/block] << 1, then on average only a few
blocks will be retrieved and the index is worth it
- If E[#records/block] is roughly 1, then on average you will
access every block in the file and the index is a waste of time
- Avoid indexing attributes that consist of long character strings.
- Transactions To Analyze
- It's difficult to analyze all transactions so try to concentrate on a few
that will be frequently executed
- MySQL has a slow query log and an EXPLAIN statement that can help identify slow queries and show why they are slow.