Physical Database Design
- Goals of Physical Design
- Map relations into files on the file system
- Select indices to aid the query processor
- Basic File Organization on Disk
- Block storage
- Elements of access time for hard drive
- Seek time: time to move mechanical read/write arm to the appropriate
- Rotational latency: time for appropriate sector of track to
rotate under the read/write arm
- 2017 seek times (source: wikipedia): Random access times have not improved
significantly in recent years
- 3-4 ms for high end (3000-4000 microseconds)
- 9 ms for average desk top
- 12-15 ms for average mobile device or laptop
- 2017 rotational latency times: 2-7.15ms (source: wikipedia)
- 2017 transfer rates:
- consumer grade hard disk drive: 200
MB/second for both read and write (source: User Benchmark).
- enterprise grade hard disk drive: 200-300MB/second for both
read and write (source: Tom's IT Pro.
- Solid State Devices (SSD): Uses persistent solid state memory to
- Cost: About 24 cents per gigabyte in 2017, which is about 4
times the cost of a gigabyte for a hard disk drive.
- Access times for Solid State Device (SSD) (source: wikipedia).
- < 0.1ms seek time (no latency because no rotation)
- transfer rates
- 200-2500MB per second
- write times for less expensive SSD's can be 10 times slower than
- write times for higher end SSD's is comparable to read times
- Basic Organizations
- Heap (unordered)
- B+ tree
- Extendible Hash
- Importance of a cluster key: improves performance of range queries
- Types of indices
- Primary Index: File is physically clustered using the primary
- Cluster Index: File is physically clustered on a key other
than the primary key--Often you do not cluster on the primary
key if it is an artificial key since an artificial key is
unlikely to be used in range queries
- Secondary Index: Index on a non-clustered attribute(s) that
helps the query optimizer more efficienty locate records
- Some queries, especially aggregate queries, might be solvable
from the index alone
- Considerations in Choosing Indices
- Things to Index
- Index PK of a relation if it is not a cluster key of the file organization.
- Add secondary index to a FK since it is involved in joins
- Add secondary index to any attribute heavily used in selection queries
- Add secondary index on attributes involved in:
- ORDER BY;
- GROUP BY; and
- other operations involving sorting (such as UNION or DISTINCT).
- Add secondary index on attributes involved in aggregate functions.
- Add secondary index on attributes that could result in an index-only plan.
- Things Not to Index
- small relations.
- an attribute or relation that is frequently updated.
- an attribute if the query will retrieve a significant proportion of the relation.
- If E[#records/block] << 1, then on average only a few
blocks will be retrieved and the index is worth it
- If E[#records/block] is roughly 1, then on average you will
access every block in the file and the index is a waste of time
- Avoid indexing attributes that consist of long character strings.
- Transactions To Analyze
- It's difficult to analyze all transactions so try to concentrate on a few
that will be frequently executed