DF100
Storage and Retrieval I
MongoDB Developer Fundamentals
Release: 20240216
Topics we cover
Creating Documents
Cursors
Updating Documents
Absolute Changes
Relative Changes
Conditional Changes
Deleting Documents
Load Sample Data
In your Atlas cluster,
Click the three dots [...]
Select Load Sample Dataset
Click Browse Collections to view
the databases and collections we
loaded.
We will be using the
sample_training database.
Follow the instructions to load the sample dataset in Atlas.
Click the Collections button to see the list of databases; of these, we will be using the
sample_training database.
Validate Loaded Data
Connect to the MongoDB Atlas Cluster using mongosh
Verify that data is loaded
Database to use: sample_training
Collections to verify: grades and inspections
MongoDB> use sample_training
switched to db sample_training
MongoDB> db.grades.countDocuments({})
100000
MongoDB> db.inspections.countDocuments({})
80047
MongoDB>
Validate the loaded data by checking collection counts for sample_training.grades and
sample_training.inspections.
countDocuments() returns just the number of documents that match the query, not the documents themselves.
Basic Database CRUD Interactions
Operation   Single Document               Multiple Documents
Create      insertOne(doc)                insertMany([doc,doc,doc])
Read        findOne(query, projection)    find(query, projection)
Update      updateOne(query,change)       updateMany(query,change)
Delete      deleteOne(query)              deleteMany(query)
MongoDB APIs allow us to perform Create, Read, Update and Delete operations, with options to act on
a single document or on multiple documents.
Creating
Documents
Creating New Documents - insertOne()
insertOne() adds a document to a collection.
Documents are essentially Objects.
_id field must be unique, it will be added if not supplied.
MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Robert Smith", orders: [], spend: 0,
  lastpurchase: null
})
{ acknowledged: true, insertedId : "[email protected]" }
MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Bobby Smith", orders: [], spend: 0,
  lastpurchase: null
})
MongoServerError: E11000 duplicate key error ...
MongoDB> db.customers.insertOne({
  name: "Andi Smith", orders: [], spend: 0,
  lastpurchase: null
})
{acknowledged: true, insertedId: ObjectId("609abxxxxxx254")}
insertOne() adds a document to the collection on which it is called. It is the most basic way to
add a new document to a collection.
There are very few default constraints. The document, which is represented by a language
object (Document, Dictionary, Object), must be no larger than 16MB.
It must have a unique value for _id. If we don't provide one, MongoDB will assign it a value of type
ObjectId - a MongoDB GUID type 12 bytes long.
{ "acknowledged": true, ... } means it has succeeded in writing the data to one member of the
replica set; however, by default we have not specified whether we need it to be on more than one
member, or even flushed to disk.
We can request stronger write guarantees as we will explain later.
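The _id rules in these notes can be sketched in plain JavaScript. This is an in-memory illustration only, not how MongoDB stores data: TinyCollection and makeId are hypothetical names, and makeId only mimics the 12-byte length of an ObjectId, not its internal structure.

```javascript
// In-memory stand-in for a collection; NOT how MongoDB stores data.
class TinyCollection {
  constructor() {
    this.docs = new Map(); // keyed by _id, so uniqueness is checked on insert
  }

  // Hypothetical helper: 12 random bytes rendered as 24 hex characters.
  // A real ObjectId has internal structure; this only mimics its length.
  static makeId() {
    return Array.from({ length: 12 }, () =>
      Math.floor(Math.random() * 256).toString(16).padStart(2, "0")
    ).join("");
  }

  insertOne(doc) {
    // Use the supplied _id, or generate one when it is missing.
    const _id = doc._id !== undefined ? doc._id : TinyCollection.makeId();
    if (this.docs.has(_id)) {
      throw new Error(`E11000 duplicate key error: _id ${JSON.stringify(_id)}`);
    }
    this.docs.set(_id, { ...doc, _id });
    return { acknowledged: true, insertedId: _id };
  }
}

const customers = new TinyCollection();
const first = customers.insertOne({ _id: "bob", name: "Robert Smith" });

let duplicateError = null;
try {
  customers.insertOne({ _id: "bob", name: "Bobby Smith" }); // same _id again
} catch (err) {
  duplicateError = err.message;
}

const generated = customers.insertOne({ name: "Andi Smith" }); // no _id supplied
console.log(first, duplicateError, generated.insertedId.length);
```

The second insert is rejected because its _id is already present, while the third succeeds with a generated identifier, mirroring the three calls on the slide.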
Add Multiple Documents - insertMany()
Accepts an array of documents.
Single network call normally.
Reduces network time.
Returns an object with information about each insert.
// 1000 Network Calls
MongoDB> let st = ISODate()
for(let d=0;d<1000;d++) {
  db.orders.insertOne({ product: "socks", quantity: d})
}
print(`${ISODate()-st} milliseconds`)
9106ms
// 1 Network call, same data
MongoDB> let st = ISODate()
let docs = []
for(let d=0;d<1000;d++) {
  docs.push({ product: "socks", quantity: d})
}
db.orders.insertMany(docs)
print(`${ISODate()-st} milliseconds`)
51ms
insertMany() can add multiple new documents, often 1000 at a time.
This avoids the need for a network round trip per document, which is really slow.
It returns a document showing the success or failure of each insert and any primary keys assigned.
There is a limit of 48MB or 100,000 documents in a single call to the server, but a larger batch is
broken up behind the scenes by the driver.
There is also a way to bundle insert, update and delete operations into a single network call, called
bulkWrite().
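As a rough illustration of how a large insert might be broken up, the sketch below splits an array by document count alone. splitIntoBatches is a hypothetical helper, not a driver API, and it ignores the 48MB size limit that the notes also mention.

```javascript
// Hypothetical helper: split documents into batches no larger than maxDocs.
// Real drivers also respect a byte-size limit, which this sketch ignores.
function splitIntoBatches(docs, maxDocs) {
  const batches = [];
  for (let i = 0; i < docs.length; i += maxDocs) {
    batches.push(docs.slice(i, i + maxDocs));
  }
  return batches;
}

const orders = [];
for (let d = 0; d < 250000; d++) {
  orders.push({ product: "socks", quantity: d });
}

// 250,000 documents with a 100,000-document limit need three calls.
const batches = splitIntoBatches(orders, 100000);
console.log(batches.map((b) => b.length)); // [ 100000, 100000, 50000 ]
```

Three calls instead of 250,000 is where the saving in network time comes from.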
Order of operations in insertMany()
insertMany() can be ordered or unordered.
Ordered (default) stops on first error.
Unordered reports errors but continues; can be reordered by the server to make the operation faster.
MongoDB> let friends = [
  {_id: "joe" },
  {_id: "bob" },
  {_id: "joe" },
  {_id: "jen" }
]
MongoDB> db.collection1.insertMany(friends)
{ errmsg : "E11000 duplicate key error ...",
  nInserted : 2 }
MongoDB> db.collection2.insertMany(friends,{ordered:false})
{ errmsg : "E11000 duplicate key error ...",
  nInserted : 3 }
MongoDB> db.collection1.find()
{ _id : "joe" }
{ _id : "bob" }
MongoDB> db.collection2.find()
{ _id : "joe" }
{ _id : "bob" }
{ _id : "jen" }
If we opt for strict ordering then:
● It must stop on the first error.
● No reordering or parallelism is possible, so it is slower in a sharded cluster.
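The difference between the two modes can be sketched with an in-memory simulation. simulateInsertMany is a hypothetical function that mimics only the stop-or-continue behaviour on a duplicate key; it does not model server-side reordering or parallelism.

```javascript
// Simulated insertMany over an in-memory Set of _id values.
// Hypothetical function; it mimics only the stop-or-continue behaviour.
function simulateInsertMany(existingIds, docs, { ordered = true } = {}) {
  const errors = [];
  let nInserted = 0;
  for (const doc of docs) {
    if (existingIds.has(doc._id)) {
      errors.push(`E11000 duplicate key error: ${doc._id}`);
      if (ordered) break; // ordered: stop at the first error
      continue; // unordered: record the error and carry on
    }
    existingIds.add(doc._id);
    nInserted++;
  }
  return { nInserted, errors };
}

const friends = [{ _id: "joe" }, { _id: "bob" }, { _id: "joe" }, { _id: "jen" }];
const collection1 = new Set();
const collection2 = new Set();

const orderedResult = simulateInsertMany(collection1, friends);
const unorderedResult = simulateInsertMany(collection2, friends, { ordered: false });

console.log(orderedResult.nInserted, [...collection1]); // 2 [ 'joe', 'bob' ]
console.log(unorderedResult.nInserted, [...collection2]); // 3 [ 'joe', 'bob', 'jen' ]
```

In the ordered run "jen" is never attempted because processing stops at the duplicate "joe"; the unordered run skips the duplicate and still inserts "jen".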
Reading
Documents
Find and Retrieve documents
findOne() retrieves a single document.
Accepts a document as a filter to “query-by-example.”
Empty object (or no object) matches everything.
MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Timothy",
  orders: [], spend: 0,
  lastpurchase: null
})
{ acknowledged: true, insertedId : "[email protected]" }
MongoDB> db.customers.findOne({ _id : "[email protected]" })
{ _id : "[email protected]",
  name : "Timothy",
  orders : [ ],
  spend : 0,
  lastpurchase : null
}
MongoDB> db.customers.findOne({ spend: 0 })
MongoDB> db.customers.findOne({ spend: 0 , name: "Timothy" })
MongoDB> db.customers.findOne({ name: "timothy" }) // No match ✗
MongoDB> db.customers.findOne({ spend: "0" }) // No Match ✗
MongoDB> db.customers.findOne({}) // All Match - Return one
We can retrieve a document using findOne(). findOne() takes an object as an argument.
It returns the first document it finds where all the fields match. If there are multiple matches,
there is no way to predict which is 'first' in this case.
Here we add a record for customer Timothy using insertOne().
Then we query by the _id field, which holds the user’s email, and we find the record. This returns an
object, and mongosh prints what is returned.
We can also query by any other field, although only _id has an index by default, so the other queries
here are less efficient for now.
We can supply multiple fields, and if they all match we find the record: someone called Timothy
who has spent 0 dollars.
Note that the order of the fields in the query does not matter here - we can think of the comma as
just meaning AND.
db.customers.findOne({ spend: "0" }) finds nothing because it is looking for the String "0", not the
number 0, so it doesn't match.
An empty object matches everything. However, because findOne() returns a single document, we
still get only one document back.
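The matching rules described in these notes can be sketched as a small JavaScript predicate. matchesExample is a hypothetical helper that handles only strict equality on top-level fields; it is not MongoDB's matcher and ignores operators, arrays and nested documents.

```javascript
// Hypothetical matcher: every field in the query must strictly equal the
// same field in the document. Operators, arrays and nesting are ignored.
function matchesExample(doc, query = {}) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

const timothy = { _id: "tim", name: "Timothy", orders: [], spend: 0, lastpurchase: null };

console.log(matchesExample(timothy, { spend: 0 })); // true
console.log(matchesExample(timothy, { spend: 0, name: "Timothy" })); // true
console.log(matchesExample(timothy, { name: "Timothy", spend: 0 })); // true: field order is irrelevant
console.log(matchesExample(timothy, { name: "timothy" })); // false: case differs
console.log(matchesExample(timothy, { spend: "0" })); // false: string vs number
console.log(matchesExample(timothy, {})); // true: empty query matches everything
```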
Find and Retrieve documents
Regex can be used to find string values without needing exact matching related to case sensitivity
$regex operator is optional in syntax and can be omitted to get the same result
MongoDB> db.customers.findOne({ name: "timothy" }) // No match ✗
MongoDB> db.customers.findOne({ name: {$regex: /timothy/i }}) //Returns a match
MongoDB> db.customers.findOne({ name: /timothy/i }) //Returns a match
The example uses JavaScript regex syntax because mongosh is a JavaScript REPL. The way a regex
is written depends on the language and driver you are working with.
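To see what the /timothy/i pattern itself does, here is a sketch in plain JavaScript with no server involved. It shows only how the JavaScript regex literal behaves against strings; the sample strings are made up for illustration.

```javascript
const nameRegex = /timothy/i; // the "i" flag makes the match ignore case

console.log(nameRegex.test("Timothy")); // true
console.log(/timothy/.test("Timothy")); // false: case-sensitive without "i"
console.log(nameRegex.test("Mr. TIMOTHY Jr")); // true: unanchored, so a substring is enough
console.log(/^timothy$/i.test("Mr. TIMOTHY Jr")); // false: anchors require the whole string
```

Note that an unanchored pattern matches anywhere in the value, so anchors are needed when only a whole-string match should count.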
Projection: choosing the fields to return
Find operations can include a projection parameter.
Projections only return a subset of each document.
Projections include/exclude a set of fields.
MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Ann", orders: [], spend: 0,
  lastpurchase: null
})
MongoDB> db.customers.findOne({ name: "Ann" })
{ _id : "[email protected]",
  name : "Ann",
  orders : [], spend: 0, lastpurchase: null }
MongoDB> db.customers.findOne({ name:"Ann" },{name:1, spend:1})
{ _id : "[email protected]", name : "Ann", spend : 0 }
MongoDB> db.customers.findOne({ name:"Ann" },{name:0, orders:0})
{ _id : "[email protected]", spend : 0, lastpurchase : null }
MongoDB> db.customers.findOne({ name:"Ann" },{name:0, orders:1})
MongoServerError: "Cannot do inclusion on field orders in exclusion projection"
MongoDB> db.customers.findOne({ name:"Ann" },{_id: 0, name:1})
{ name : "Ann" }
We can select the fields to return by providing an object with those fields and a value of 1 for each.
Documents can be large; with the help of projection we can have MongoDB return only a subset of the
fields.
_id is always returned by default.
We can instead choose which fields NOT to return by providing an object with those fields set to 0.
We cannot mix and match 0 and 1, because it would be unclear what to do with any other fields.
The exception is _id: we can use _id: 0 to remove _id from an inclusion projection and return
only the fields that are required, for example { _id:0, name : 1 }
There are some more advanced projection options, including projecting parts of an array
and projecting computed fields using aggregation, but those are not covered here.
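The inclusion and exclusion rules above can be sketched as a small function over flat documents. project is a hypothetical helper written for illustration; it covers only the cases shown on the slide and is not MongoDB's projection engine.

```javascript
// Hypothetical projection helper for flat documents only.
function project(doc, spec) {
  const fields = Object.keys(spec).filter((f) => f !== "_id");
  const including = fields.some((f) => spec[f] === 1);
  const excluding = fields.some((f) => spec[f] === 0);
  if (including && excluding) {
    throw new Error("Cannot mix inclusion and exclusion in one projection");
  }
  const result = {};
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id") {
      if (spec._id !== 0) result._id = value; // _id is kept unless switched off
    } else if (including ? spec[field] === 1 : spec[field] !== 0) {
      result[field] = value;
    }
  }
  return result;
}

const ann = { _id: "ann", name: "Ann", orders: [], spend: 0, lastpurchase: null };

console.log(project(ann, { name: 1, spend: 1 })); // { _id: 'ann', name: 'Ann', spend: 0 }
console.log(project(ann, { name: 0, orders: 0 })); // { _id: 'ann', spend: 0, lastpurchase: null }
console.log(project(ann, { _id: 0, name: 1 })); // { name: 'Ann' }
```

Passing { name: 0, orders: 1 } to this sketch throws, which mirrors the error on the slide: once some fields are included and others excluded, there is no sensible rule for the fields that are not mentioned.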
Fetch multiple documents using find()
find() returns a cursor object rather than a single document
We fetch documents from the cursor to get all matches
mongosh fetches and displays 20 documents from the cursor object.
MongoDB> for(let x=0;x<200;x++) {
  db.taxis.insertOne({ plate: x })
}
MongoDB> db.taxis.find({})
{ _id : ObjectId("609b9aaccf0c3aa225ce9116"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 1 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce9129"), plate : 19 }
Type "it" for more
MongoDB> it
{ _id : ObjectId("609b9aaccf0c3aa225ce912a"), plate : 20 }
{ _id : ObjectId("609b9aaccf0c3aa225ce912b"), plate : 21 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce913d"), plate : 39 }
MongoDB> db.taxis.find({ plate: 5 })
{ _id : ObjectId("609b9aaccf0c3aa225ce911b"), plate : 5 }
find() returns a cursor object; by default the shell then tries to print it.
The cursor object prints by displaying its next 20 documents and setting the value of a
variable called it to itself.
If we type it, the shell prints the cursor again and displays the next 20 documents.
For us as programmers, cursors do nothing until we read from them.
We can add .pretty() to a cursor object to make the shell display larger documents with newlines
and indentation.
Cursors
Using Cursors
Here, we store the result of find to a variable.
We then manually iterate over the cursor.
The query is not actually run until we fetch results from the cursor.
MongoDB> let mycursor = db.taxis.find({})
MongoDB> while (mycursor.hasNext()) {
  let doc = mycursor.next();
  printjson(doc)
}
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 1 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce91dd"), plate : 199 }
MongoDB> let mycursor = db.taxis.find({}) // No Output
MongoDB> mycursor.forEach( doc => { printjson(doc) })
//This does nothing - does not even contact the server!
MongoDB> for(let x=0;x<100;x++) {
  let c = db.taxis.find({})
}
mycursor is a cursor object; it knows the database, collection and query we want to run.
Until we do something with it, it has not run the query - it has not even contacted the server.
It has methods - importantly, in mongosh, hasNext() and next() - to check for more values and
fetch them.
We can iterate over a cursor in various ways depending on our programming language.
If we don't fetch information from a cursor, it never executes the find - this might not be expected
when doing simple performance tests like the final loop on the slide.
To pull the results from a cursor in a shell for testing speed we can use
db.collection.find(query).itcount()
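A JavaScript generator gives a convenient analogy for this laziness, because a generator body does not start running until the first value is requested. In the sketch below lazyFind and serverCalls are hypothetical stand-ins; no real cursor or server is involved.

```javascript
let serverCalls = 0; // counts simulated round trips

// Hypothetical stand-in for find(): a generator body does not start
// running until the first next() call, much like an unopened cursor.
function* lazyFind(docs, predicate) {
  serverCalls++; // the "query" is only sent once iteration begins
  for (const doc of docs) {
    if (predicate(doc)) yield doc;
  }
}

const taxis = Array.from({ length: 200 }, (_, plate) => ({ plate }));

for (let x = 0; x < 100; x++) {
  lazyFind(taxis, () => true); // cursor created, never iterated
}
const callsAfterLoop = serverCalls;

let fetched = 0;
for (const doc of lazyFind(taxis, () => true)) {
  fetched++; // iterating is what triggers the work
}

console.log(callsAfterLoop, serverCalls, fetched); // 0 1 200
```

The loop that only creates cursors records zero simulated calls, which is why a timing test built that way measures nothing.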
Cursor modifiers
Cursors can include additional instructions like limit, skip, etc.
Skip and limit return us cursors.
MongoDB> for(let x=0;x<200;x++) {
  db.taxis.insertOne({plate:x})
}
MongoDB> db.taxis.find({}).limit(5)
{ _id : ObjectId("609b9aaccf0c3aa225ce9116"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 1 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 2 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9119"), plate : 3 }
{ _id : ObjectId("609b9aaccf0c3aa225ce911a"), plate : 4 }
MongoDB> db.taxis.find({}).skip(2)
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 2 }
... REMOVED for clarity ...
{ _id : ObjectId("609b9aaccf0c3aa225ce912b"), plate : 21 }
Type "it" for more
MongoDB> db.taxis.find({}).skip(8).limit(2)
{ _id : ObjectId("609b9aaccf0c3aa225ce911e"), plate : 8 }
{ _id : ObjectId("609b9aaccf0c3aa225ce911f"), plate : 9 }
We can add a limit instruction to the cursor to stop the query when it finds enough results.
We can add a skip instruction to the cursor to tell it to ignore the first N results.
The skip is always performed before the limit when computing the answer.
This can be used for simple paging of results - although it's not the optimal way of doing so.
Skip has a cost on the server - skipping a large number of documents is not advisable.
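Simple paging with skip and limit can be sketched over an in-memory array. skipThenLimit and pageOf are hypothetical helpers, and the array stands in for an already ordered result set; the sketch shows only the arithmetic, not the server-side cost.

```javascript
// Hypothetical helper: apply skip first, then limit, to an ordered array.
function skipThenLimit(docs, skip, limit) {
  return docs.slice(skip, skip + limit);
}

// Page numbers start at 1 in this sketch.
function pageOf(docs, pageNumber, pageSize) {
  return skipThenLimit(docs, (pageNumber - 1) * pageSize, pageSize);
}

const taxiPlates = Array.from({ length: 200 }, (_, plate) => ({ plate }));

console.log(skipThenLimit(taxiPlates, 8, 2)); // [ { plate: 8 }, { plate: 9 } ]

const thirdPage = pageOf(taxiPlates, 3, 20);
console.log(thirdPage.length, thirdPage[0]); // 20 { plate: 40 }
```

Each later page skips more documents, which is the cost the notes warn about when the skip value grows large.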
Sorting Results
Use sort() cursor modifier to retrieve results in a specific order
Specify an object listing fields in the order to sort and sort direction
MongoDB> let rnd = (x)=>Math.floor(Math.random()*x)
MongoDB> for(let x=0;x<100;x++) {
  db.scores.insertOne({ride:rnd(40),swim:rnd(40),run:rnd(40)})
}
//Unsorted
MongoDB> db.scores.find({},{_id:0})
{ ride : 5, swim : 11, run : 11 }
{ ride : 0, swim : 17, run : 12 }
{ ride : 17, swim : 2, run : 2 }
//Sorted by ride increasing
MongoDB> db.scores.find({},{_id:0}).sort({ride: 1})
{ ride : 0, swim : 38, run : 10 }
{ ride : 1, swim : 37, run : 37 }
{ ride : 1, swim : 30, run : 20 }
//Sorted by swim increasing then ride decreasing
MongoDB> db.scores.find({},{_id:0}).sort({swim: 1, ride: -1})
{ ride : 31, swim : 0, run : 14 }
{ ride : 11, swim : 0, run : 14 }
{ ride : 30, swim : 1, run : 34 }
{ ride : 21, swim : 1, run : 3 }
With skip and limit, sorting is very important: without a defined order we cannot be sure we skip and limit to the documents we expect.
We cannot assume anything about the order of unsorted results.
Sorting results without an index is very inefficient - we cover this when talking about indexes later.
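A multi-field sort specification such as { swim: 1, ride: -1 } can be read as a chain of comparisons: later fields are consulted only when earlier ones tie. The sketch below expresses that in plain JavaScript; compareBy is a hypothetical helper that handles numeric fields only, and the data is taken from the slide's sample output.

```javascript
// Hypothetical comparator builder for a sort spec such as { swim: 1, ride: -1 }.
// Only numeric fields are handled.
function compareBy(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] !== b[field]) return (a[field] - b[field]) * direction;
    }
    return 0; // equal on every sort key
  };
}

const scores = [
  { ride: 21, swim: 1, run: 3 },
  { ride: 11, swim: 0, run: 14 },
  { ride: 30, swim: 1, run: 34 },
  { ride: 31, swim: 0, run: 14 },
];

// swim increasing, then ride decreasing among equal swim values
const sorted = [...scores].sort(compareBy({ swim: 1, ride: -1 }));
console.log(sorted.map((s) => `${s.swim}/${s.ride}`)); // [ '0/31', '0/11', '1/30', '1/21' ]
```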
Cursors work in batches
Cursors fetch results from the server in batches.
The default batch size in the shell is 101 documents during the
initial call to find(), with a limit of 16MB.
If we fetch more than the first 101 documents from a cursor, it
fetches in 16MB batches in the shell or up to 48MB in some drivers.
Rather than make a call to the server every time we get the next document from a cursor, the
server returns the results in batches, which are stored at the client or shell end until we want them.
Fetching documents one by one would be slow.
Fetching all documents at once would use too much client RAM.
We can change the batch size on the cursor if we need to, but it's still limited to 16MB.
Fetching additional data from a cursor uses a command called getMore behind the scenes; it
fetches up to 16MB at a time.
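The trade-off between round trips and client memory can be sketched by counting simulated fetches. readAll is a hypothetical function, and its batch sizes are counted in documents only as a simplifying assumption; a real cursor is also bounded by bytes, and the later batch size of 500 is an arbitrary value chosen for illustration.

```javascript
// Hypothetical batched reader; sizes are counted in documents only,
// whereas a real cursor is also bounded by bytes.
function readAll(docs, firstBatch, laterBatch) {
  let position = 0;
  let roundTrips = 0;
  const received = [];
  while (position < docs.length) {
    const size = roundTrips === 0 ? firstBatch : laterBatch;
    received.push(...docs.slice(position, position + size)); // one simulated fetch
    position += size;
    roundTrips++;
  }
  return { received: received.length, roundTrips };
}

const plates = Array.from({ length: 1000 }, (_, plate) => ({ plate }));

console.log(readAll(plates, 101, 500)); // { received: 1000, roundTrips: 3 }
console.log(readAll(plates, 1, 1)); // { received: 1000, roundTrips: 1000 }
```

Both runs deliver the same 1000 documents, but the batched run needs 3 simulated fetches instead of 1000.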
Exercise
Add four documents to a collection called diaries using the commands shown here.
Write a find() operation to output only diary entries from dug.
Modify it to output the line below using skip, limit and a projection.
[{name: 'dug', txt: 'saw a squirrel'}]
MongoDB> db.diaries.drop()
MongoDB> db.diaries.insertMany([
  {
    name: "dug", day: ISODate("2014-11-04"),
    txt: "went for a walk"
  },
  {
    name: "dug", day: ISODate("2014-11-06"),
    txt: "saw a squirrel"
  },
  {
    name: "ray", day: ISODate("2014-11-06"),
    txt: "met dug in the park"
  },
  {
    name: "dug", day: ISODate("2014-11-09"),
    txt: "got a treat"
  }
])
Answers at the end
Quiz Time!
#1. When does a find() query get executed on the MongoDB server?
A. When a cursor is iterated
B. When you call the find() function
C. When the driver connects to the database
D. Every time we add a projection
E. Every time an index is created
Answer in the next slide.
#1. When does a find() query get executed on the MongoDB server?
A. When a cursor is iterated
B. When you call the find() function
C. When the driver connects to the database
D. Every time we add a projection
E. Every time an index is created
Answer: A
find() returns a cursor object rather than documents. The shell starts retrieving the first 20
results, but find() on its own does not retrieve documents.
Calling find() does not return any values until you start retrieving data with the cursor.
The find() query has no relationship with a connection pool or the driver connection.
Creating a cursor, adding a projection, or creating an index does not execute the find query.
#2. Why is insertMany() faster than multiple insertOne() operations?
A. Needs fewer writes to disk.
B. Reduces the network time.
C. Performs the writes as a single transaction.
D. Replicates to other servers faster.
E. Allows parallel processing of inserts in sharded clusters.
Answer in the next slide.
#2. Why is insertMany() faster than multiple insertOne() operations?
A. Needs fewer writes to disk.
B. Reduces the network time.
C. Performs the writes as a single transaction.
D. Replicates to other servers faster.
E. Allows parallel processing of inserts in sharded clusters.
Recap
Using bulk writes rather than single writes gives better network performance.
find() returns a cursor object, which the shell then pulls results from.
Exercise
Answers
Answer - Exercise: find, skip and limit
Write a find() to output only diary entries from "dug":
MongoDB> db.diaries.find({name:"dug"})
{"_id" : ObjectId("609ba812cf0c3aa225ce91de"), "name" : "dug", "day" : ISODate("2014-11-04T00:00:00Z"), "txt" : "went for a walk" }
{"_id" : ObjectId("609ba812cf0c3aa225ce91df"), "name" : "dug", "day" : ISODate("2014-11-06T00:00:00Z"), "txt" : "saw a squirrel" }
{"_id" : ObjectId("609ba812cf0c3aa225ce91e1"), "name" : "dug", "day" : ISODate("2014-11-09T00:00:00Z"), "txt" : "got a treat" }
Modify it to output the line below using skip, limit and a projection:
MongoDB> db.diaries.find({name:"dug"},{_id:0,day:0}).skip(1).limit(1)
{ name: "dug", txt: "saw a squirrel" }