SQL Notes

SQL: Structured Query Language
1. many variations exist
2. mysql is a public domain version and one that we have running on the EECS servers
SQL is an example of a relational database
1. Relational databases are comprised of a collection of tables
2. Each table is comprised of a collection of named columns
3. Each column has a fixed type associated with it. Some examples include:
  1. int(size)
  2. decimal(size, d): size is the maximum number of digits and d is the number of decimal digits
  3. char(size): fixed length string--must have this number of characters
  4. varchar(size): variable length string--there are other data types that allow even larger size strings. Different implementations will specify different maximum sizes for varchar.
  5. date(yyymmdd): a date
4. Databases tradeoff speed for flexibility: Although the specifics differ, databases typically assume that each row will have the same set of columns and hence the data does not need to be tagged with markup elements. XML is more flexible in that each "row" can define different data but the data is identifiable to a processor via tags. However, this requires that an XML file contain not just data but also data definition commands. The data definition commands (i.e., markup) adds additional storage cost to a file and also makes it more difficult to optimize queries because the file structure is not completely known in advance.
Data Manipulation Language (DML)-Executes queries against a database
1. Select: extract data
2. Update: update a column in a row
3. Delete: delete a row from a table
4. Insert into: insert a new row into a table
Data Definition Language (DDL): Creates the structure of a table and deletes tables
1. Operation types
  1. Create Table: create a table
  2. Alter Table: add or delete a column from an existing table
  3. Drop Table: delete a table
  4. Create Index: creates a secondary index
  5. Drop Index: deletes an index
2. Indices and Keys
  1. primary key: A column which is guaranteed to have a unique value. When creating a table the designer can denote exactly one column as being the primary key
  2. clustering index: A clustering index is one in which the records (rows) of a table are stored in the leaves of the index (typically either a b-tree or extendible hash table is used).
    - The table is said to be physically sorted on this column
    - There can be only one clustering index per table
    - A clustering index is useful for range queries because several records can be usually stored in a single block, thus reducing the required number of block accesses.
  3. non-clustering index: An index consisting only of the keys in a column. It allows for fast random access on a single key but cannot be used to facilitate fast range queries.