Mongo DB


Overview

Mongo's principle design feature is that information associated with a document should be stored with a document rather than spread across tables.

  1. This principle eliminates expensive joins
  2. It works especially well for
    1. One to many relationships involving weak entities: As an example, you could store a list of sub-tasks for a project with the project. Each sub-task would be represented as a document. As another example, in our hotels scenario, you could store each hotel's room information with the appropriate hotel
    2. Multi-valued attributes: Multi-valued attributes can be stored as a list. For example, you might store a list of phone numbers, dependents, or keywords.
    3. Supertype/subtype entities: The objects can be intermixed in a list with a type field distinguishing them
    4. Composite attributes: Composite attributes can be stored as an object, such as an address, a date, or a name
    5. Tree structures (go to Mongo Data Models/Data Model Examples and Patterns/Model Tree Structures): When you have a taxonomy of objects, like categories of books or animals, and you want to be able to retrieve ancestor information.
  3. It does not work as well for modeling relationships between two strong entities where you want to be able to query both entities as stand-alone objects. For example, you could store a room's bookings with the rooms, but you also might want to store bookings with a guest and you might also want to access bookings as a standalone object. In this case you tend to use embedded references (i.e., foreign keys) and then you can use the $lookup aggregation operator to perform a left outer join between two collections, but that's not going to be as efficient as in a relational database.


Starting with Mongo

  1. https://docs.mongodb.com/manual/tutorial/insert-documents/ has an interactive mongo shell that you can use to try out the demo commands in these notes.

  2. Mongo stores documents in JSON format (on the disk they are stored in BSON format).
    1. A document is equivalent to a tuple
    2. A collection is equivalent to a table

  3. When you start with the mongo shell you can create a new database with the command:
    use <database-name>
    
    This command creates the db if it does not yet exist and establishes context for subsequent queries (Do not use this command in the sample Mongo DB shell).

  4. A collection is created the first time you insert a document into the collection:

    Here's an example insert command for a collection of inventory documents:

    db.inventory.insertMany( [
       { item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
       { item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
       { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
       { item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
       { item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
    ]);
    
  5. You can execute a series of commands in the Mongo shell by putting your commands into a .js file (for javascript) and then loading them with the load command:
    load ("commands.js")
    


Find Operations

  1. The find() function is the workhorse command for querying a mongo database:
        
    db.collection-name.find(<query>, <projection>).cursor()
    
    1. Mongo uses a pipeline architecture where operations are performed sequentially and the output is passed as input to the next operation in the pipeline.
    2. Here the query is performed first, then the projection operation is performed on the result set, and finally the cursor operation is applied to the output of the projection
    3. Mongo uses a cursor to allow you to traverse through the results, although in practice you would use a scripting language to process the results rather than the shell so we won't worry much about cursor operations in this class. We won't bother with the cursor in this class but the cursor function can do things such as limit the number of returned documents.
    4. The query object is a JSON object of the form { ... }. For example:
      db.inventory.find( { status: "D" } )      
            

  2. Filter operations on top-level documents (use the example MongoDB shell to execute the sample queries)
    1. Relational operators are $eq, $gt, $gte, $in, $lt, $lte, $ne, and $nin
    2. Boolean operators are $and, $or, and $not
    3. $exists matches documents that have the specified attribute
    4. $type matches documents whose attribute have the specified type, such as boolean, string, double, date, array, etc.
    5. $regex provides regular expression capabilities for pattern matching strings in queries. MongoDB uses Perl compatible regular expressions that are placed in //'s.
    6. $text performs a text search on the content of the fields indexed with a text index.
    7. We won't discuss the $type, $regex, or $text operators any further
    8. A comma operates as an AND operator

    Here are some examples:

    1. Equality: { <field1>: <value1>, <field2> : <value2>, ...} -- Returns all documents which have field(s) whose value(s) matches each of the values in the list. For example:
       db.inventory.find( { status: "D" } )
       db.inventory.find( { status: "D", qty: 75 } )
      
      In SQL the second example would be equivalent to the query:
      SELECT * FROM inventory 
          WHERE status = "D" AND qty = 75;
      
    2. Queries that use one or more of the find operators have the form:
      { <field1>: { <operator1>: <value1> }, ... }
      
      1. For example:
        db.inventory.find( { status: { $in: [ "A", "D" ] } } )
        db.inventory.find( { status: "A", qty: { $gt: 10, $lt: 30 } } )
        
      2. An example with the OR operator--notice the use of a list ([]) here
        db.inventory.find( { $or: [ { status: "A" }, { qty: { $lt: 30 } } ] } )
        
        The equivalent SQL query is:
        SELECT * FROM inventory
            WHERE status = "A" OR qty < 30;
        
      3. An example with the $exists operator:
        db.inventory.find( { status: { $exists : true } } )
        
        This query returns all documents that have a status field
        db.inventory.find( { status: { $exists : false } } )
        
        This query returns all documents that do not have a status field

    3. Sorting: Add the .sort() function and use 1 to indicate an ascending field and -1 to indicate a descending field
             db.inventory.find({}).sort({ qty: 1 })
      
      sorts the final results in ascending order by qty
             db.inventory.find({}).sort({ status: 1, qty: -1 })
      
      sorts the final results in ascending order by status and in case of ties in descending order by qty

    4. Queries on attributes whose values are documents (e.g., a spouse attribute that has information about a spouse or a report attribute for a student that has information about the report such as the student's text, the grade, the TA who graded it, and TA comments).

      For the following example queries use the following collection:

      	   
      db.inventory.insertMany( [
         { item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
         { item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
         { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
         { item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
         { item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
      ]);
      
      1. Equality query
        db.inventory.find( { "size.uom": "in" } )
        
        returns all documents whose size document has a "uom" attribute with the value of "in"

        • When querying using dot notation, the field and nested field must be inside quotation marks.

      2. AND query
        db.inventory.find( { "size.h": { $lt: 15 }, "size.uom": "in", status: "D" } )
        

    5. Queries on attributes whose values are arrays (i.e., multi-valued attributes): when you query an array, you will get all documents whose arrays match the query--the arrays that are returned with the documents will not be projected so that only elements satisfying the query are returned--all array elements will be returned. There are projection functions that can limit the number of returned array elements, but you cannot return exactly those array elements that match the query filter.

      Use the following collection to execute the example queries in this section:

      db.inventory.insertMany([
         { item: "journal", qty: 25, tags: ["blank", "red"], dim_cm: [ 14, 21 ] },
         { item: "notebook", qty: 50, tags: ["red", "blank"], dim_cm: [ 14, 21 ] },
         { item: "paper", qty: 100, tags: ["red", "blank", "plain"], dim_cm: [ 14, 21 ] },
         { item: "planner", qty: 75, tags: ["blank", "red"], dim_cm: [ 22.85, 30 ] },
         { item: "postcard", qty: 45, tags: ["blue"], dim_cm: [ 10, 15.25 ] }
      ]);
      
      1. Query if an array field contains at least one element with the specified value:
        	   
        	     db.inventory.find( { tags: "red" } )
        
      2. Query if an array field contains any of several values
        	  
        	     db.inventory.find( { tags: { $in : ["blank", "blue"] } } )
        
      3. Exact Match: order of tags is important and array must contain exactly those items db.inventory.find( { tags: ["red", "blank"] } ) Only the document with "notebook" gets returned.
      4. Exact Match: order of tags is unimportant and array must contain each of those items (it may contain additional items as well)
        	      
        	  db.inventory.find( { tags: { $all: ["red", "blank"] } } )
        
        All documents except "postcard" get returned.

      5. Query an Array with Multiple Filter Conditions and return all documents where the array values in combination satisfy the filter conditions
                  db.inventory.find( { dim_cm: { $gt: 15, $lt: 20 } } )
        
        Returns any documents where one array element is > 15 and one array element is < 20. All of the following arrays would work:

        • [ 10, 17 ] : 17 satisfies both of the conditions
        • [ 10, 25, 40]: 10 satisfies the < 20 condition and 25 and 40 both satisfy the > 15 condition

      6. Query for an Array Element that Meets Multiple Filter Conditions: Use $elemMatch if a single array element must meet all the filter conditions
                    db.inventory.find( { dim_cm: { $elemMatch: { $gt: 15, $lt: 20 } } } )
        
        Now [ 10, 17 ] satisfies the query but [10, 25, 40] does not

    6. Queries on an attribute that is an array of documents: You will get all documents where any document in the array matches the query. You cannot project out those documents in the array that do not match the query--you will get back all documents in the array. Mongo DB thinks that you want the documents associated with the array--not the elements in the array. You can use a scripting language to iterate through the array if you only want the documents that match the query.

      For the following example queries use the following collection:

      db.inventory.insertMany( [
         { item: "journal", instock: [ { warehouse: "A", qty: 5 }, { warehouse: "C", qty: 15 } ] },
         { item: "notebook", instock: [ { warehouse: "C", qty: 5 } ] },
         { item: "paper", instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 15 } ] },
         { item: "planner", instock: [ { warehouse: "A", qty: 40 }, { warehouse: "B", qty: 5 } ] },
         { item: "postcard", instock: [ { warehouse: "B", qty: 15 }, { warehouse: "C", qty: 35 } ] }
      ]);
      
      1. Query on a specific field: use dot notation with the name of the array and then the attribute in the embedded document:
                   db.inventory.find( { 'instock.qty': { $lte: 20 } } )
        
        This query return all documents where at least one of the instock objects has a qty field ≤ 20.

      2. Use $elemMatch if the query should succeed only if a nested document satisfies all of the query conditions
            db.inventory.find( { "instock": { $elemMatch: { qty: 5, warehouse: "A" } } } )
        
        This query matches any document where at least one of the instock objects has a qty of 5 and a warehouse equal to "A".

        Note that the above query is different than this query:

                 db.inventory.find( { "instock": { qty: 5, warehouse: "A" } }  )
        
        The above query only matches instock objects that have exactly these two fields, in exactly this order, and with exactly these values

        Another example, this time where we want any documents that have an instock object that matches the condition: 10 < qty ≤ 20

                 db.inventory.find( { "instock": { $elemMatch: { qty: { $gt: 10, $lte: 20 } } } } )
        
      3. Queries in which a combination of nested documents satisfies the filter criteria: If you do not use the $elemMatch operator, then when you specify a list of filter conditions, it is permissable for different objects in the array to match the conditions
                  db.inventory.find( { "instock.qty": { $gt: 10,  $lte: 20 } } )
        
        This query will be true if there is either one instock object whose qty field is between 10 and 20 or if there are two instock objects such that one's qty field is > 10 and the second's qty field is ≤ 20 (e.g., one whose qty field is 50 and one whose qty field is 5).

        Another example that shows how subtle changes to the operators produce much different results:

        	  db.inventory.find( { "instock.qty": 5, "instock.warehouse": "A" } )
        
        Again we query on the qty and warehouse fields but now this query is true if there are two instock objects such that one's qty field is 5 and the other's warehouse field is "A". Of course it is also true if one instock object meets both conditions.
  3. Projection: If you want to only print certain attributes in the returned documents you can include attributes by specifying their name and a 1 and you can exclude attributes by specifying their name and a 0. For example:
           db.inventory.find( { status: "A" }, { item: 1, status: 1 } )
    
    Returns the specified field and the _id field
           db.inventory.find( { status: "A" }, { item: 1, status: 1, _id: 0 } )
    
    Suppresses the id field
           db.inventory.find( { status: "A" }, { status: 0, instock: 0 } )
    
    Prints all fields except the status and instock fields In general:

    With the exception of the _id field, you cannot combine inclusion and exclusion statements in projection documents.

    1. Return Specific Fields in Embedded Documents
              db.inventory.find(
         	    { status: "A" },
          	    { item: 1, status: 1, "size.uom": 1 }
      	)
      
      Returns the item, status, and _id attributes in the outermost document and the uom attribute from the object embedded in the size field.

    2. Return Specific Fields from Documents Embedded in an Array (show the following query in the Mongo DB shell)
              db.inventory.find( { status: "A" }, { item: 1, status: 1, "instock.qty": 1 } )
      
      Prints the instock array as a list of objects but only the qty field appears

    3. Ways to restrict the values returned from an array: There is no way to restrict array results to the ones that satisfy a query filter but you can use the $, $elemMatch, and $slice operators to somewhat restrict the results.

      The following query uses this schema:

      	  zipcode, [ students { name, school, age } ]
      
      	  db.schools.find( { zipcode: "63109" },
                                 { students: { $elemMatch: { school: 102 } } } )
      
      The operation returns documents that have zipcode equal to 63109 and prints the first student from the students array whose school is 102.

      The following query uses this schema:

      	  
      	  name, school, age, [ grades { mean, grade } ] } ]
      	  
      	  db.students.find( { grades: { $elemMatch: {
                                                  mean: { $gt: 70 },
                                                  grade: { $gt: 90 }
                                                } } },
                                  { "grades.$": 1 } )
      
      For each student, $ returns the first object from the grades array whose mean is > 70 and whose grade is > 90
      	  db.students.find( { grades: { $slice: 5 } } )
      
      returns the first five documents from each student's grade array. It returns all documents from a student's grade array if there are fewer than 5 documents in that array.

      You can do more things with slice, like return a range by skipping some number of entries or return the last documents in an array by using a negative number.


Aggregation Operation

Mongo DB provides three ways to perform aggregation

  1. the aggregation pipeline,
  2. the map-reduce function, and
  3. single purpose aggregation methods that count the documents in a collection or return the distinct values for an attribute. We won't discuss them further in this course.

Mongo DB is designed to return documents, not perform complex computations so the aggregation mechanism is pretty weak unless you become an expert and use some involved syntax that we won't discuss in this course. For example, on the homework assignment I wanted to have you write a query that would compute the average of a quiz array and the average of an exam array. It turns out not to be easy to do with Mongo DB. Often you would do an aggregation computation in a scripting language.

  1. Aggregation pipeline: Query result sets are passed through a pipeline, with the output of one operator becoming the input to the next operator

    1. Go to this link--https://docs.mongodb.com/manual/aggregation/--for an animated example of how the following query executes
          db.orders.aggregate([
             { $match: { status: "A" } },
             { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
          ])
      

      1. First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents that have status equal to "A".

      2. Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id. The final fields are named _id and total.

      3. Further notes on this query:
        1. $group uses the _id field to perform its grouping. The _id field is required
        2. To specify a field in an aggregation operation, put a $ in front of it, as is done with $cust_id and $amount
        3. You need to put quotes around the attribute fields that you are aggregating on.

      Another example:

      db.inventory.insertMany([
         { item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
         { item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
         { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
         { item: "journal", qty: 25, size: { h: 16, w: 21, uom: "cm" }, status: "F" },
         { item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
         { item: "postcard", qty: 100, size: { h: 15, w: 15.25, uom: "cm" }, status: "A" },
         { item: "postcard", qty: 75, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" },
         { item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
      ]);
      
          db.inventory.aggregate([{ $group : {_id: "$item", entries : {$sum : 1}}}])
          db.inventory.aggregate([{ $group : {_id: "$item", quantity : {$sum : "$qty"}}}])
          db.inventory.aggregate([{ $group : {_id: "$item", "avg_ht" : {$avg : "$size.h"}}}])
      
      The first aggregate query counts the number of times each item occurs. Note that we get a count on the number of documents by $summing 1. Don't try to count using the $count function--it does not work with the group operator.

      The second aggregate query counts the quantity of each item and the third aggregate query computes the average height of each item.

    2. Aggregate operations include $sum, $min, $max, $avg

    3. To sort the output, use the $sort operator with the same format as for the sort() function with find. For example to sort the inventory in ascending order by avg_ht:
          db.inventory.aggregate([{ $group : {_id: "$item", 
           			                "avg_ht" : {$avg : "$size.h"}}},
      			    { $sort : { avg_ht : 1 } } ])
      
    4. If you want to include additional fields in the output and you know that all the documents in each group will have the same value for the field(s) that you want (e.g., the same value for the name field), then you can use the $first operator to get the value of the fields you want. For example:
          db.inventory.aggregate([{ $group : {_id: "$item",
                                              item : { $first : "$item" },
      					quantity : {$sum : "$qty"}}}])
      
      This prints the item twice, once as _id and once as item but it's the only field that will be the same for all documents in a group. For example the status, qty, and size fields are different for each document.

    5. Computing an aggregate function for a field that is an array: If you want to compute an aggregate function for a field that is an array or an array of embedded documents, you must use the $unwind operator to create a duplicate tuple for each array element (the only field that differs is the array field), then perform the aggregation operation.

      For example, create the following document

      db.inventory.insertMany([{ _id: 1, "item" : "shirt", sizes: [ 10, 15, 20] },
                               { _id: 2, "item" : "pant", sizes: [ 32, 34, 36, 40] }])
      
      Then the query
                 db.inventory.aggregate( [ { $unwind : "$sizes" } ] )
      
      returns the following results:
      { "_id" : 1, "item" : "shirt", "sizes" : 10 }
      { "_id" : 1, "item" : "shirt", "sizes" : 15 }
      { "_id" : 1, "item" : "shirt", "sizes" : 20 }
      { "_id" : 2, "item" : "pant", "sizes" : 32 }
      { "_id" : 2, "item" : "pant", "sizes" : 34 }
      { "_id" : 2, "item" : "pant", "sizes" : 36 }
      { "_id" : 2, "item" : "pant", "sizes" : 40 }
      
      To compute the minimum size for each item, you can write:
              db.inventory.aggregate([{ $unwind : "$sizes" },
      	                        { $group : { _id : "$item",
      				             minSize : { $min : "$sizes" }
      				} } ] )
      

  2. Map-Reduce Function: Go to https://docs.mongodb.com/manual/core/map-reduce/ to see examples. I won't require you to write Map-Reduce aggregation queries in this course but I do expect you to know what map-reduce is.

    Map-Reduce allows the user to

    1. perform an optional query to restrict the query set,
    2. then execute a javascript function over each document in the query set to extract the desired information (map step). This will give you a list of objects, with one object per document in the query set
    3. then execute a 2nd javascript function over the mapped list to obtain a scalar result (reduce step).
    4. the result can be returned as a document where the result is placed in the attribute named by the "out" command
    Here is a sample map-reduce query that sums the amount of orders for each customer whose status is "A" (go to the above link to see a pictorial illustration of how this query executes).
    db.orders.mapReduce(
       map     → function() { emit( this.cust_id, this.amount ); },
       reduce  → function(key, values) { return Array.sum( values ); },
                  {
       query   →   query: { status: "A" },
       output  →   out: "order_totals"
                  }
    )
         
    In this query the key parameter matches this.cust_id and the values parameter matches this.amount

    For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.


Updating Documents

For detailed information on updating documents, go to https://docs.mongodb.com/manual/tutorial/update-documents/

The 3 relevant commands are:

  1. db.collection.updateOne(<filter>, <update>, <options>): updates the first document that matches the filter.

  2. db.collection.updateMany(<filter>, <update>, <options>): updates all documents that match the filter.

  3. db.collection.replaceOne(<filter> <update>, <options>): replaces the first document matched by the filter by a new document.

  1. Example Update Queries

    We will use the following schema

            { item, qty, size { h, w, uom }, status }
    
    1.         db.inventory.updateOne(
         	    { item: "paper" },
          	    {
           	      $set: { "size.uom": "cm", status: "P" },
           	      $currentDate: { lastModified: true }
         	    }
              )
      
      Here's an explanation of this update operation:

      1. uses the $set operator to update the value of the size.uom field to "cm" and the value of the status field to "P",

      2. uses the $currentDate operator to update the value of the lastModified field to the current date. If lastModified field does not exist, $currentDate will create the field.

    2. This next query is like the previous query except that all documents whose qty field is < 50 get updated.
      	
              db.inventory.updateMany(
                 { "qty": { $lt: 50 } },
                 {
                   $set: { "size.uom": "in", status: "P" },
           	     $currentDate: { lastModified: true }
         	   }
      	)
      
    3. Here's a query that replaces the first document whose item attribute that matches "paper" with the new object that is shown. The old _id field is kept intact.
              db.inventory.replaceOne(
                { item: "paper" },
         	  { item: "paper",
      	     instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 40 } ] }
      	)
      

  2. Updating values in an Array: To add a new value to an array use the $push operator:

    1. Example: Append a Value to an Array

      Suppose you have the following document for a student:

                 { _id: 1, name: "Joe", scores: [50, 99, 73] }
      
      The following example appends 89 to the scores array:
                 db.students.update(
                     { _id: 1 },
                     { $push: { scores: 89 } }
                 )
      
    2. Example: Append a document to an array

      Suppose you have the following document for a student:

      	   
      {
         "_id" : 5,
         "quizzes" : [
            { "wk": 1, "score" : 10 },
            { "wk": 2, "score" : 8 },
            { "wk": 3, "score" : 5 },
            { "wk": 4, "score" : 6 }
         ]
      }
      
      The following command will add a new document to the quizzes array:
      db.students.update(
         { _id: 5 },
         { $push: { quizzes: { "wk" : 5, "score" : 7 } } }
      )
      
    3. If you want to update a document in an array and you know its exact location in the array, then you can use a positional index. The first document is at index 0. For example to change the third quiz to be the document {wk : 8, score: 10} I could write:
      db.students.update( 
          { _id: 5},
          { $set : { "quizzes.2" : { wk : 8, score : 10 } } }
      )
      
    4. If instead I want to modify some of the fields of an array document but not replace it, then I can use the positional syntax shown above and append the field name. For example, given the above set of quizzes for a student, you can update the score field in the third quiz to 7 and the wk field in the fourth quiz to 6 using the following command:
      db.students.update(
         {_id : 5},
         { $set : { "quizzes.2.score" : 7 ,
                    "quizzes.3.wk" : 6 } }
      }
           
    5. Finally suppose you do not know the exact position of a document but you know the value of one of the fields in the document that needs to change. For example, suppose we want to change the score of the wk 3 quiz to 7. Then we use the positional operator $ which acts as a placeholder for the first array document that matches the query filter.
      db.students.update(
         {_id : 5, "quizzes.wk" : 3},
         { $set : { "quizzes.$.score" : 7 } }
      )
      


Deletion

The relevant commands for deletion are:

  1. db.collection.deleteMany(<filter>): deletes all documents that match the filter

  2. db.collection.deleteOne(<filter>): deletes only the first document that matches the filter

  3. db.collection.deleteMany({}): deletes all documents in the collection

filter can be any of the filters previously discussed for the find command.