Information Management

Brad Vander Zanden


  1. Definition (from Wikipedia): The acquisition of information from one or more sources, the custodianship and the distribution of that information to those who need it, and its ultimate disposition through archiving or deletion.
  2. What comprises information management?
    1. Content capture: The process of obtaining information and converting it to a storable format. Content capture may involve capturing information via sensors or other devices.
    2. Content management/storage: The process of maintaining information for an indeterminate amount of time in a manner that facilitates analysis and navigation.
    3. Content analysis/navigation: The process of interpreting information to provide insight or to make decisions.
    4. Content presentation: The process of communicating information via visual, audio, or other sensory techniques.
  3. Content capture: The process of obtaining information and converting it to a storable format. Content capture may involve capturing information via sensors or other devices.
    1. Digitization: Much information, such as sound or video, comes in an analog format but must be converted to a digital format in order to be stored. There are many different formats used to store information and these are discussed in storage formats.
    2. sampling: When you convert an analog signal to a digital signal, you must sample the signal at a certain frequency. For example, with video you might sample 30 times per second, thus getting 30 frames per second (a small sketch follows the examples below).
      1. The goal is to create a discretized signal that is not perceptibly different than the analog signal
      2. Examples
        1. The Library of Congress suggests sampling at 300 dots per inch for old photographs.
        2. 30 frames per second works well in video if the scene is relatively static and 60 frames per second is better for more dynamic scenes. Smartphones and cameras can take video at either 30 or 60 fps.
        3. Most movies are recorded at 24 frames per second.
        4. Most TV shows are recorded at 30 frames per second, although traditionally TVs have refreshed at 60 frames per second and more modern TVs may refresh at 120 or even 240 frames per second. Hertz (Hz), rather than frames per second, is the unit of measure used for TV refresh rates.
        5. 44.1kHz and 48kHz are the audio sampling rates most frequently used by professional musicians.
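      A minimal sketch of sampling, assuming the 44.1 kHz audio rate mentioned above (the tone, duration, and variable names are just for illustration):

        import math

        SAMPLE_RATE = 44_100   # samples per second (one of the audio rates above)
        FREQ = 440.0           # pitch of the analog tone in Hz
        DURATION = 0.01        # seconds of signal to capture

        # Evaluate the analog signal at evenly spaced points in time to
        # produce a discretized version of it.
        num_samples = int(SAMPLE_RATE * DURATION)
        samples = [math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
                   for n in range(num_samples)]
        print(f"captured {num_samples} samples for {DURATION} seconds of audio")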
    3. compression: Often the file size will still be too big after sampling and so you will employ a compression algorithm to either eliminate redundant information (lossless compression) or eliminate information deemed less significant (lossy compression).
    4. transformation/translation: Sometimes information may be captured in one digital format but must be transformed to another digital format before it is usable. For example, a document might be created using Open Office but then must be transformed from .odt (open office format) to .doc (Microsoft format) format so that it can be edited in Microsoft Word.
    5. migration: Migration is the act of transforming information to a new technology, whereas transformation/translation converts between similar technologies. Migration occurs when you need to update the format. For example, I find that many postscript files I have must be updated to .pdf files if I want to use them. As another example, you might want to migrate a .mpeg1 or .mpeg2 file to the newer .mp4 video standard.
    6. crawling/harvesting: Crawling/harvesting is the act of summarizing an information data set, often by indexing it. Google uses a crawler to visit web-sites, extract information from their meta-data, and then index it for use by their search engine. This step can also be considered as a form of content analysis. When a company is using crawling strategies to obtain information, as Google does, then crawling is a content capture strategy. When a company already has the information and is using crawling/harvesting to summarize and/or index the information, then crawling/harvesting is a content analysis strategy.
  4. Content management/storage: The process of maintaining information for an indeterminate amount of time in a manner that facilitates analysis and navigation.
    1. Local versus Cloud Storage
      1. Local: Information is stored on your own personal devices
        1. Advantages
          1. Secure
          2. Rapidly accessible
        2. Disadvantages
          1. May be hard to share
          2. More expensive
          3. You have to deal with migrating the information when upgrading to new devices.
      2. Cloud Storage: Information is stored remotely and managed for you by a company on its data servers.
        1. Advantages
          1. Tends to be cheaper than personal devices on a per gigabyte measure.
          2. Easier to share with others or to use your information on multiple personal devices.
        2. Disadvantages
          1. May be less secure, both because the data is available over the network and hence is subject to attacks by hackers and because the data may be requisitioned by governments.
          2. Access may be less rapid because it is over the network and may even be unavailable when you do not have network access.
    2. Compression: Many storage formats use some form of compression to reduce the storage footprint required by their data.
      1. Lossless versus Lossy
        1. Lossless: the data is made smaller, but at no detriment to the quality, by eliminating redundant information. For example, suppose there is a string of 100 0's. That string could be replaced by a significantly smaller code indicating that the next 100 bits are all 0's (see the run-length encoding sketch after this list).
        2. Lossy: the data is made even smaller, but at a detriment to the quality.
          1. Tries to drop information that is considered less significant. For example, image compression tries to drop color information that can't be detected by the human eye
          2. Repeatedly saving an image in a lossy format will progressively degrade the data quality
        3. Codec versus Container: You will often hear these two terms used when talking about compression.
          1. Codec: The algorithm used for encoding/decoding the data. The encoding often involves some type of compression.
          2. Container: The file(s) holding the data produced by the codec.
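        A minimal sketch of lossless compression using run-length encoding, the idea behind the 100-zeros example above (illustrative only, not the scheme used by any particular codec):

          def rle_encode(values):
              """Collapse runs of repeated values into (value, count) pairs."""
              pairs = []
              for v in values:
                  if pairs and pairs[-1][0] == v:
                      pairs[-1][1] += 1
                  else:
                      pairs.append([v, 1])
              return pairs

          def rle_decode(pairs):
              return [v for v, count in pairs for _ in range(count)]

          bits = [0] * 100 + [1, 0, 1]
          packed = rle_encode(bits)           # [[0, 100], [1, 1], [0, 1], [1, 1]]
          assert rle_decode(packed) == bits   # lossless: the original is recovered exactly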
    3. Storage formats
      1. Plain text: Text files often do not come with encoding information, so programs typically make educated guesses as to the character encoding being used.
        1. Ascii: 7-bit, 128 character code developed at Bell Laboratories. Prevalent in early computer and file systems. Was the most common encoding system on the World Wide Web until December 2007 when it was surpassed by UTF-8 (according to Wikipedia). Using 7 bits rather than 8 bits was meant to reduce data transmission costs, not provide a bit for parity.
          1. Advantage: Compact--characters take only 1 byte
          2. Disadvantage: Only encodes the English character and punctuation set.
        2. Unicode: Developed in the early 1990's to support multi-lingual character sets. Since expanded to also handle graphemes, such as Egyptian hieroglyphs, rare Kanji or Chinese characters, and emojis. A grapheme is a sequence of one or more code points that are displayed as a single, graphical unit that a reader recognizes as a single element of the writing system (e.g., ä).
          1. Standard
            1. defines a codespace of 1,114,112 codepoints in the range 0x0 to 0x10FFFF. The standard uses codepoint rather than "character" because a code point could be a character or grapheme.
            2. The first 128 codepoints are identical to the 128 Ascii character codes.
          2. Encoding: Various standards are used to encode Unicode characters. The simplest encoding is to simply use 4 bytes (although 3 would suffice) but this is wasteful since the codepoints for the most commonly used characters only require 1-2 bytes (a short sketch comparing the encodings follows the list below).

            1. UTF-8: The most common encoding, which is used for most web-pages, html, XML, etc. It can represent a character code using 1-4 bytes. The first bits of the first byte allows the remaining bytes to be decoded (x's denote the bits used for the character code):
              1. 1 byte: The encoding is 0xxxxxxx
              2. 2 bytes: The encoding is 110xxxxx 10xxxxxx
              3. 3 bytes: The encoding is 1110xxxx 10xxxxxx 10xxxxxx
              4. 4 bytes: The encoding is 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
              Notice how the start of the first byte always allows a decoder to determine how many bytes will be used to encode this character code. Note that all Ascii codes can be encoded in a single byte, which is much more compact than using 4 bytes.
            2. UTF-16: Uses 1 or 2 16-bit (i.e., 2 byte) code units to denote Unicode character codes. Used internally in Windows, Java, and Javascript but never caught on as an encoding for web pages.
            3. UTF-32: The simple 32 bit encoding of a unicode character. It is very time efficient because it requires no decoding but is very space inefficient and hence is rarely used.
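            A small sketch using Python's built-in string encoding to compare how many bytes each encoding uses for an ASCII character, an accented character, and an emoji (the -le variants simply omit the byte order mark):

              for ch in ["a", "ä", "€", "😀"]:
                  print(f"U+{ord(ch):04X}",
                        "utf-8:",  len(ch.encode("utf-8")),     "bytes,",
                        "utf-16:", len(ch.encode("utf-16-le")), "bytes,",
                        "utf-32:", len(ch.encode("utf-32-le")), "bytes")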
      2. Documents: Formats that encode documents must be able to store text, images, and graphics as well as specify how the document will appear when printed or displayed in a browser. The most common document formats are:
        1. PDF: Developed by Adobe prior to the advent of the internet. It is the most popular format for exchanging printable documents.
          1. All computations required to create the document have already been performed. The document is in its final printable format. As a result pdf documents originally were not meant to be modified once created.
          2. More recent pdf documents have type-in text boxes or other types of widgets that allow small changes to be made once the file has been created.
        2. HTML: Document is described using a variety of markup commands.
          1. Most widely used document format for the web
          2. Cascading style sheets (css) are used to specify the presentation of the document
          3. Easily modified with a text editor
          4. HTML documents are meant to be displayed in a browser and not printed. CSS does not support page layout algorithms for HTML, which makes HTML a poor choice for printed documents.
        3. JSON: An open-source standard used for transmitting human-readable text using attribute-value pairs and arrays. For example:
          {
            "name" : "brad",
            "age": 54,
            "hobbies": ["hiking","climbing","bridge","golf","travel"]
          }
          		
          Strictly speaking, JSON is not a document encoding format, but I had to put it somewhere because it has become so popular as a way of storing and transmitting data over the internet.
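          A minimal sketch of reading and writing this kind of JSON record with Python's standard json module:

            import json

            text = '{"name": "brad", "age": 54, "hobbies": ["hiking", "climbing"]}'
            person = json.loads(text)            # parse JSON text into a Python dict
            print(person["hobbies"][0])          # -> hiking
            print(json.dumps(person, indent=2))  # serialize back to formatted JSON text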
        4. DOC: A proprietary document format created by Microsoft.
          1. Most widely used format for documents that need to be edited and then printed.
        5. RTF (Rich Text Format): A format developed by Microsoft to allow cross-platform document exchange.
          1. RTF is typically not a final format for a document but an intermediate format that is used to transfer the document to a new platform, such as from Microsoft Word to OpenOffice or Apple Pages
        6. XML (from wikipedia): A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable
          1. HTML is an XML-based format
          2. Can be rendered into many forms, either printable or browsable, using XSL (Extensible Stylesheet Language)
          3. RSS (Rich Site Summary) is a common format used by news organizations to organize and store news content
          4. Many hope that it will become the future standard for technical file formats (source is wikipedia), but I have my doubts. XML seemed to be more popular a few years back, but I do not hear about it as much any more. XML is indeed flexible as a way of specifying arbitrary document formats that can be type-checked and parsed, but it is also heavy-weight and cumbersome. JSON seems to be supplanting it as the preferred format for storing and transmitting data.
        7. Open Document (.odt): An open source, XML-based standard for Open Office documents. Open Office is an open source suite of programs for editing documents, spreadsheets, and slide presentations that was meant as a competitor to proprietary systems such as Microsoft and Apple.
        8. postscript (.ps): An older format that is more of a legacy format these days. It is actually a programming language and .ps files are programs that when executed create a formatted, printable document
          1. The difference between a postscript file and a pdf file is that the postscript file is not in its finished form while a pdf file is.
          2. postscript was developed when CPU's were much slower and printing would waste valuable CPU time. Printers were equipped with CPU's that could interpret postscript programs and thus offload the computation from the computer's CPU. As CPU's became much more powerful, it became unnecessary to offload the print computations and postscript has been largely relegated to legacy status.
        9. .xls: A more specialized proprietary format from Microsoft for manipulating spreadsheets.
          1. .csv (comma separated values): An intermediate format, much like RTF, but used for transferring spreadsheet style data to different platforms
        10. .ppt: A more specialized proprietary format from Microsoft for manipulating documents meant for presentation, such as slide shows
      3. Images: Images are objects like bitmaps used for animations, photos, scanned documents, etc
        1. Key factors in storing digital images
          1. Compression
          2. Color Model
            1. Bitmap: 1 bit per pixel which is either on or off. Good for black and white images
            2. Gray-Scale: Several bits per pixel indicating a shade of gray between white and black. NeXT was an early example of a gray-scale system.
            3. Full Color: Bits are used to specify color values for red, green, and blue. The combination of these color values produces the desired color. Most existing systems allow 8 bits per color component for a total of 3 bytes per pixel.
            4. Color Map: Color values are an index into a table of fully specified (i.e. 24 bit) colors. Typically 8 bits are allocated for the color value giving a possibility of 256 colors.
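            A small back-of-the-envelope sketch of how the color model affects uncompressed storage size (the 1920x1080 resolution is just an example):

              width, height = 1920, 1080
              pixels = width * height
              print("bitmap, 1 bit per pixel:      ", pixels // 8, "bytes")
              print("gray-scale, 1 byte per pixel: ", pixels, "bytes")
              print("full color, 3 bytes per pixel:", pixels * 3, "bytes")
              print("color map, 1 byte per pixel plus a 256-entry 24-bit table:",
                    pixels + 256 * 3, "bytes")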
        2. Common image formats
          1. JPEG: A lossy, full color compression format that was designed for storing photographs. Not good for storing graphics since the lossiness causes the graphics to look too bitmappy (e.g., a smooth line may look jagged)
          2. GIF: An older lossless, color map, compression format that was designed for storing graphics. The patent for GIF was held for many years by CompuServe, its developer.
            1. Not good for storing photographs because of the limited colors a gif file can store.
            2. Can create animated GIF files that perform animations using multiple images.
            3. Not good for graphics involving color gradations because of the limited colors available. It's typically best to have shapes with solid colors.
          3. PNG: A newer, open source replacement for GIF developed in part to avoid paying fees to CompuServe for its GIF format. It is a full color, lossless compression format for storing images. It produces larger files than jpeg files so it is not frequently used for storing photographs.
            1. Good for storing graphics files, just like GIF
            2. It is a good format to use for storing screenshots, which involve a mix of images and text (note that screenshots have a much lower resolution than photos so the files will not be overly large).
          4. TIFF: A non-compressed, lossless, full color format that is good for storing documents or photos while they are being processed using software.
            1. TIFF images create very large files because they are uncompressed and lossless.
            2. TIFF files are good for storing multiple-page documents
            3. TIFF files are also good for storing photos or documents while they are being processed by photo or page layout software. The output can be stored in more compact formats, such as jpeg or pdf.
      4. Video: Generally stored using a lossy compression technique.
        1. MP4: Latest of the mpeg standards (superseding mpeg-1 and mpeg-2). It is a high compression format meant for low bandwidths and as a result its image quality is lower than other formats. It is widely used on social media sites and was popularized by Apple's iTunes store.
        2. MOV: Apple's QuickTime Movie format, which has essentially been superseded by MP4. Both MP4 and MOV use the same compression algorithm and hence produce the same video quality.
        3. AVI (Audio Video Interleave): Developed by Microsoft in the 1990's for storing movies. It is a bit dated and more modern file types have better compression but it is still widely used.
        4. WMV (Windows Media Video): A Microsoft standard that was meant for streaming video but variations now also support movies. It was meant as a replacement for AVI. It has excellent compression but as a result has worse video quality than AVI.
        5. MKV (Matroska): An open-source format for shows and movies. Popular for encoding because it is free but less popular for display on certain devices because, ta da!, the devices are proprietary and want to support their own proprietary formats.
        6. WEBM: Meant to be used for web-based videos using HTML5. It produces very small video sizes and hence very low load times. Of course the trade-off is less video quality. It has become less popular as online platforms develop more computational power and bandwidth, thus allowing for larger video files with better image quality.
        7. FLV (Flash Video Format): Adobe's standard for transmitting video files and displaying them using its flash player. One major downside is that Apple no longer supports Flash videos.
      5. Audio Files
        1. MP3: The most widely used audio format today. Uses a lossy compression algorithm and generates small file sizes. Its popularity is due to the excellent audio quality it achieves.
        2. WAV: Uncompressed format developed by Microsoft. Widely used with no loss of sound quality but very large files (up to 10MB per minute). Variations exist which support compression but uncompressed files seem to be most common.
        3. WebM: Open-source standard meant for use with HTML 5. Supported by most Android and Windows platforms but not by iOS. QuickTime, Apple's proprietary video playing software, can only play it through a third-party plugin.
    4. Ways to store content
      1. Databases: A structured collection of related data and its description. A database is controlled by software that manages and controls access to the database. Database management software (DBMS) typically provides the following functions:
        1. Data Definition Language (DDL): Defines the structure of the database by allowing users to specify the tables, record fields and data types, and constraints on the data to be stored in the database (e.g., no salary can be more than $40,000 or no property manager can manage more than 100 properties).
        2. Data Manipulation Language (DML): A query language that allows users to insert, update, delete, and retrieve data from the database (a small sketch of DDL and DML follows this list)
        3. Access control
          1. security: prevents unauthorized users from accessing the database
          2. integrity: maintains the consistency of stored data (e.g., a person's salary will not appear differently in two different parts of the database).
          3. concurrency: allows multiple users to use the database simultaneously, without introducing integrity problems
          4. recovery control: restores the database to a previous consistent state following a hardware or software failure
          5. user-accessible catalog: contains descriptions of the data in the database
          6. views: allows each user to have an individualized view of the database
            1. provide security: can limit fields that are accessible by certain classes of users
            2. customization: fields can be given names that are more meaningful for a certain group of users
            3. data abstraction: hides changes to the database that do not affect the data presented by this view
            4. reduces complexity: users only need to know about data that they care about
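        A minimal sketch of DDL and DML using Python's built-in sqlite3 module (the table, field names, and the $40,000 salary constraint are illustrations of the functions above, not features of any particular DBMS):

          import sqlite3

          db = sqlite3.connect(":memory:")

          # DDL: define the table's fields, data types, and a constraint on the data
          db.execute("""CREATE TABLE staff (
                            id     INTEGER PRIMARY KEY,
                            name   TEXT NOT NULL,
                            salary REAL CHECK (salary <= 40000)
                        )""")

          # DML: insert, update, and retrieve data
          db.execute("INSERT INTO staff (name, salary) VALUES (?, ?)", ("brad", 39000))
          db.execute("UPDATE staff SET salary = 39500 WHERE name = 'brad'")
          for row in db.execute("SELECT name, salary FROM staff"):
              print(row)                          # -> ('brad', 39500.0)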
      2. Document collections: A set of inter-related, but independent documents that are managed jointly by software.
        1. nosql databases, such as MongoDB, are one example of the type of software used to store collections of documents and provide query languages for retrieving subsets of information from these documents. MongoDB stores documents in JSON format.
        2. Social media companies, such as Facebook, Twitter, and Instagram, may be loosely considered as document collection companies.
        3. Collections of XML documents, such as the RSS documents created by news organizations, are another example of document collections. These collections do not have to be handled by nosql databases--they could be handled by customized software.
      3. Digital libraries (definition from Wikipedia): A digital library is an online database of digital objects that can include text, still images, audio, video, or other digital media formats.
        1. Objects can consist of digitized content like print or photographs, as well as originally produced digital content like word processor files or social media posts.
        2. Digital libraries should provide an electronic catalog that provides a means for organizing, searching, and retrieving the content contained in the collection.
        3. A digital library has the notion of a curator: an individual or group of individuals who actively select, organize, and look after items in the library.
        4. Good examples of digital libraries are the ACM digital library and the IEEE Xplore digital library, both of which contain collections of their technical publications.
        5. The distinction between a document collection and a digital library is that a digital library tends to be more actively curated and is available to the general public while a document collection tends to be managed more by software and is more likely to be managed by an organization for proprietary reasons. However, the distinction between these two terms is a blurry one. For example, many social media companies have been forced to adopt curation practices in response to outrage over some of the material published on their platforms.
      4. Repositories: A repository is a collection of information published by a single organization. A good example would be the software repositories published by many companies and organizations.
        1. Typically open-access as a way of getting the greatest possible dissemination of information.
        2. Typically not curated as actively as a digital library. A digital library is much more likely to try to extract meta-data from the collection and use that as an aid to users. Hence repositories are often a more "raw" form of information than digital libraries.
      5. Hypertext: Hypertext uses a graphical interface with links that allow users to navigate to associated topics/graphics by pointing and clicking on appropriate links.
        1. The most commonly used hypertext is HTML and other markup languages that are used in conjunction with web browsers.
      6. Spreadsheets: An electronic document in which data is arranged in the rows and columns of a grid and can be manipulated and used in calculations.
        1. Probably the most widely used format for modeling and analyzing information.
    5. Physical implementation:
      1. Types of storage devices
        1. Disks or solid state memory are typically used for permanent storage of data.
        2. USB devices are typically used to temporarily store data or to transport it between devices.
        3. Tapes are typically used to store information for archival reasons when it no longer needs to be actively accessed. Backup storage is often done on tapes.
      2. Basic File Organization on Disk
        1. Block storage
        2. Elements of access time for hard drive

          1. Seek time: time to move mechanical read/write arm to the appropriate track
          2. Rotational latency: time for appropriate sector of track to rotate under the read/write arm
          3. 2017 seek times (source: wikipedia): Random access times have not improved significantly in recent years

            1. 3-4 ms for high end (3000-4000 microseconds)
            2. 9 ms for average desktop
            3. 12-15 ms for average mobile device or laptop

          4. 2017 rotational latency times: 2-7.15ms (source: wikipedia)
          5. 2017 transfer rates:

            1. consumer grade hard disk drive: 200 MB/second for both read and write (source: User Benchmark).
            2. enterprise grade hard disk drive: 200-300MB/second for both read and write (source: Tom's IT Pro).

        3. Solid State Devices (SSD): Uses persistent solid state memory to store data.
          1. Cost: About 24 cents per gigabyte in 2017, which is about 4 times the cost of a gigabyte for a hard disk drive.
          2. Access times for Solid State Device (SSD) (source: wikipedia).

            1. < 0.1ms seek time (no latency because no rotation)
            2. transfer rates
              1. 200-2500MB per second
              2. write times for less expensive SSD's can be 10 times slower than read rates (writes require more energy because they need to flip the polarity of a bit, which means enough energy to overcome the current polarity and reverse it).
              3. write times for higher end SSD's are comparable to read times
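        A rough back-of-the-envelope sketch using mid-range values from the 2017 figures above: the time to read a 1 MB file from an average desktop hard drive versus a solid state drive.

          hdd_seek, hdd_latency, hdd_rate = 9e-3, 4e-3, 200e6   # seconds, seconds, bytes/second
          ssd_seek, ssd_rate = 0.1e-3, 500e6                    # seconds, bytes/second
          size = 1e6                                            # a 1 MB file

          hdd_time = hdd_seek + hdd_latency + size / hdd_rate   # seek + rotation + transfer
          ssd_time = ssd_seek + size / ssd_rate                 # no rotational latency
          print(f"HDD: {hdd_time * 1000:.1f} ms, SSD: {ssd_time * 1000:.1f} ms")   # roughly 18 ms vs 2 ms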

      3. Basic Organizations

        1. Heap (unordered)
        2. B+ tree
        3. Extendible Hash

      4. Indices

        1. Types of indices

          1. Primary Index: File is physically clustered using the primary key
          2. Cluster Index: File is physically clustered on a key other than the primary key--Often you do not cluster on the primary key if it is an artificial key since an artificial key is unlikely to be used in range queries
            1. Importance of a cluster key: improves performance of range queries (e.g., find all people who were hired between Jan. 1, 2018 and Dec. 31, 2018).
          3. Secondary Index: Index on a non-clustered attribute(s) that helps the query optimizer more efficiently locate records
            1. Secondary indices are good for point queries that ask for a single value (e.g., find all people who were hired on Feb. 1, 2018).
        2. Some queries, especially aggregate queries, might be solvable from the index alone. For example, if you ask for the total amount of salaries paid by your company, you could answer that by looking at an index on the salary field, if one exists.
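        A minimal sketch of the difference, using made-up records: data kept sorted (clustered) on hire date answers range queries with a binary search, while a secondary index on name answers point queries directly.

          import bisect
          from datetime import date

          records = sorted([
              (date(2017, 6, 1), "ann"), (date(2018, 2, 1), "bob"),
              (date(2018, 7, 15), "cat"), (date(2019, 3, 9), "dan"),
          ])
          hire_dates = [d for d, _ in records]        # clustering key, in sorted order

          # Range query on the clustering key: everyone hired during 2018
          lo = bisect.bisect_left(hire_dates, date(2018, 1, 1))
          hi = bisect.bisect_right(hire_dates, date(2018, 12, 31))
          print(records[lo:hi])                       # -> bob and cat

          # Point query answered by a secondary index on name
          by_name = {name: (hired, name) for hired, name in records}
          print(by_name["cat"])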
    6. Data modeling: The analysis of data objects and their relationships to other data objects.
      1. Data modeling typically involves identifying
        1. Entities: A group of objects with similar properties that have an independent existence, such as employees, offices, or books.
        2. Attributes: The properties of an entity. For a book they might be the book's title, author, publisher, page count, date of publication, and ISBN.
        3. Relationships among entities: Some type of association among entities. The relationship can usually be expressed as a sentence, with the entities being the nouns and the relationship being the verb. For example, a "library branch stores books". stores is the relationship between library branch and books. As another example, "an office has staff" is a relationship between an office and its staff.
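        A minimal sketch of the "library branch stores books" example as code (the class and field names are illustrative only):

          from dataclasses import dataclass, field
          from typing import List

          @dataclass
          class Book:                      # entity with its attributes
              title: str
              author: str
              publisher: str
              page_count: int
              isbn: str

          @dataclass
          class LibraryBranch:             # entity
              name: str
              books: List[Book] = field(default_factory=list)

              def stores(self, book: Book) -> None:   # the "stores" relationship
                  self.books.append(book)

          branch = LibraryBranch("downtown")
          branch.stores(Book("A Title", "A. Author", "A Publisher", 300, "000-0-00-000000-0"))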
      2. Example Data Models
        1. Databases
          1. Entities are relations
          2. Relationships may be represented as relations or foreign keys (more on this when we talk about databases)
        2. Spreadsheets
          1. Entities are typically rows or columns
          2. Relationships are the cells (e.g., Brad scored a 95 on the midterm) or equations among attributes
        3. Hypertext
          1. Entities are the web-pages
          2. Relationships are the links
  5. Content analysis/navigation: The process of interpreting information to provide insight or to make decisions.
    1. Navigation
      1. Query languages for databases
      2. Links/browsing for hypertext
      3. Search/Filters for document collections, digital libraries, and repositories
    2. Content Analysis
      1. indexing: We typically create indices to help us group information into logically organized units. Examples include:
        1. library catalogs that organize book collections
        2. tables of contents in documents that divide content into chapters and sections
        3. search engines attempt to use crawling algorithms to index content on the web and then use these indices to answer user queries.
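        A minimal sketch of indexing: build an inverted index that maps each word to the set of documents containing it, the basic structure behind catalogs and search engines (the documents are made up):

          from collections import defaultdict

          docs = {
              "doc1": "library catalogs organize book collections",
              "doc2": "search engines index content on the web",
              "doc3": "a library branch stores books",
          }

          index = defaultdict(set)
          for doc_id, text in docs.items():
              for word in text.lower().split():
                  index[word].add(doc_id)             # word -> documents containing it

          print(sorted(index["library"]))             # -> ['doc1', 'doc3']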
      2. data mining/pattern recognition (from Wikipedia): the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
      3. constraint solving: In spreadsheets content analysis involves solving the equations that users have created.
  6. Content presentation: The process of communicating information via visual, audio, or other sensory techniques.
    1. Things to consider in content presentation
      1. We gain the majority of our sensory information through the human eye. Roughly speaking, our visual (eye) system can process an order of magnitude more information than our auditory (ear) system, which in turn can process an order of magnitude more information than our olfactory (smell) system. As a result most content is presented visually, with music and radio being two very obvious exceptions. From a computer science perspective, we are most interested in how information is presented via computing devices, and most devices present it visually.
      2. Human Visual Properties
        1. Update Rates
          1. 20-30 frames per second is required to gain the illusion of smooth, continuous motion. Since human perception varies, 30 frames per second is the figure to shoot for.
          2. 5 frames per second is the lower acceptable bound for dragging interactions
          3. Non-continuous interactions
            1. 1-2 second delays are acceptable for retrieving information or finding information in a document
            2. After 10-15 seconds the illusion of interactivity is lost
            3. Percent-done indicators or some other indicator that gives an estimate of remaining time or at least an indication that something is happening are useful when a delay is necessary
        2. Color Sensitivity: The human eye consists of rods, which are color-agnostic but respond to intensity (i.e., brightness), and cones, which respond to both color and intensity. However, rods are about 100 times more sensitive to intensity than cones, and rods are about 20 times more numerous (about 94 million rods to 4.5 million cones).
          1. Human retina can only distinguish about 64-128 shades of hue (a hue is a single color)

            ==> we can represent all of the colors that can be sensed by the retina using 6-7 bits for each primary color

          2. Human ocular system can sense somewhat more shades. 256 levels of shade for each primary color is more than adequate for human visual needs which is why computer displays represent colors with 24 bits.
          3. Humans are 10 times more sensitive to variations in intensity than they are to variations in hue or the color wavelength of light (this has been shown experimentally but follows from the greater number of rods than cones and the fact that rods are far more sensitive to intensity than cones).

            ==> contrasting intensities of colors should be carefully considered when superimposing colors on a display (e.g., dark on light or vice versa works well).

          4. Humans are much less capable of distinguishing intensity at the fringes of the spectrum (deep red and violet) than in the center where the yellows and greens are found.

            ==> The reds and blues are harder for the eye to resolve than greens and yellows

          5. About 1% of the population is color blind to one or more of the primary colors--this does not affect the ability to sense intensity.

            ==> discrimination of colors should *not* be based solely on hue--also use intensity to discriminate colors

        3. Motion and Animation: The human eye is strongly attracted to motion which is why animation is so powerful. This attraction is so powerful that if you have multiple moving objects on a display, the eye will unconsciously cycle among these moving objects, thus distracting your brain. As a result, if you are trying to present information it is best to:

          1. only have one animation running at a time
          2. use animation to attract the eye to a particular part of the display but then pause the animation if the animation is not required in order for the user to understand the presentation.
      3. Color Models
        1. RGB Model: Colors formed from the additive primaries of red, green and blue.
          1. This is the model for generated light, such as from the sun or from a computer screen.
          2. This model is used most often by drawing packages
        2. HSV Model: Colors formed from hue, saturation, and value
          1. Hue = primary wavelength of light
          2. Saturation = pureness of the primary wavelength--high saturation means a relatively pure color whereas low saturation means a lot of white or gray is mixed in
          3. Value = intensity or brightness
          4. This model is what most "ordinary" people are used to.
        3. CMY Model: Colors defined by the subtractive primaries of cyan, magenta, and yellow
          1. This is the model for reflected light, such as from a printed document or a painting.
          2. Each color corresponds to the absence of one of the additive primaries (since light is reflected, the color is what the eye sees after one of the additive primaries has been absorbed by the reflective surface).

            Example: Cyan is the absence of red, magenta is the absence of green, and yellow is the absence of blue

          3. This model is most familiar to people in the visual arts and is used by printers
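        A small sketch using Python's standard colorsys module to convert between the RGB and HSV models described above (all components are on a 0-1 scale):

          import colorsys

          h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # pure red in the RGB model
          print(h, s, v)                                 # -> 0.0 1.0 1.0 (fully saturated, full intensity)

          # Mixing in white lowers the saturation but leaves the hue unchanged
          print(colorsys.rgb_to_hsv(1.0, 0.7, 0.7))      # -> approximately (0.0, 0.3, 1.0)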
      4. Presentation of Text: When presenting data, the primary consideration is the choice of the type of font:
        1. Serif Fonts: Fonts with curly decorations at the ends of the characters.
          1. Examples are Times New Roman and Cambria.
          2. The curly decorations help characters blend together, which makes them ideal for high resolution devices, such as printers. Hence serif fonts are more commonly used in printed documents.
        2. Sans serif fonts: Fonts that do not use curly decorations, or serifs, at the ends of the characters.
          1. Examples are Helvetica, Arial, and Calibri
          2. The lack of serifs provides good distinction between characters, which can prevent blurriness on low resolution devices, such as some computer screens. Hence sans serif fonts are more commonly used on computer screens, and should be preferred to serif fonts there because you cannot guarantee that your output will be viewed on a high resolution screen.
        3. Typewriter fonts
          1. Fonts where each character has a fixed width.
          2. A common example is Courier
          3. Typically used to represent code in either printed documents or on a computer screen.
          4. Typewriter fonts are less aesthetically appealing than serif or sans-serif fonts, so they are rarely used outside of presenting computer code.
  7. Modes of Presentation
    1. Textual modes
      1. Tables (relational databases and spreadsheets)
      2. Hypertext (e.g., Google, Digital libraries, repositories)
    2. Information visualizations
      1. Graphs/Plots: Useful for numerical data
      2. Animations: Useful for dynamic systems
      3. Diagrams/figures: Useful for showing more static relationships