MongoDB

Introduction

Your MongoDB cluster is provisioned with 3 nodes managed by MongoDB's Cloud Manager, each in a different subnet and availability zone.

Cluster topology

For data integrity and high availability, all MongoDB clusters are configured as a replica set with 1 primary node and 2 secondary nodes, with automatic failover enabled.

MongoDB Cluster

Data backups and integrity

As noted above, running MongoDB in a replica set configuration ensures that data is written to more than one node.

In addition, your cluster contains a backup agent which allows Cloud Manager to stream encrypted and compressed MongoDB oplog data to MongoDB's fault-tolerant, geographically distributed data centers, ensuring continuous backups.

Using Cloud Manager we're able to offer all our clients point-in-time data backups.

Identifying slow queries

As your traffic grows, so does the size of your database collections, which can mean that previously fast queries start slowing down more and more.

There are two recommended ways to identify slow queries:

  • Tailing the logs: If your app is performing slowly, but still servicing requests, it's easiest to identify slow queries by tailing the logs on the primary.
  • Using the mongo shell: If your app has completely ground to a halt and is no longer servicing requests, you'll want to use the mongo shell. Typically in this case queries have become so slow that the primary instance's thread pool is completely full. If that's the case, you'll want to use the mongo shell to identify and kill the long-running operations.

For either of these methods, we'll need to determine the primary node, so read on.

Determine the Mongo primary node

The first step to determining which queries are slow is to determine the primary node in your cluster.

This can be done by SSH'ing into a Mongo instance and running the following from the command line:

$ mongo --port 27000 --eval 'rs.status().members.find(r=>r.state===1).name'

After doing so, you'll get the hostname of the current primary, which includes its IP address:

~/code/example_app $ workarea production ssh mongo
info Connecting via SSH as tvendetta to 10.95.19.190.
tvendetta@ip-10-95-19-190:~$ mongo --port 27000 --eval 'rs.status().members.find(r=>r.state===1).name'
MongoDB shell version v3.4.3
connecting to: mongodb://127.0.0.1:27000/
ip-10-95-18-47.ec2.internal:27000
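
The --eval one-liner above is just filtering the rs.status() output. As a plain-JavaScript sketch (the sample members array below is made up for illustration, shaped like real rs.status() output):

```javascript
// rs.status().members is an array of replica-set members;
// state === 1 marks the primary.
function findPrimary(members) {
  var primary = members.find(function (m) { return m.state === 1; });
  return primary ? primary.name : null;
}

// Hypothetical members array shaped like rs.status() output:
var members = [
  { name: 'ip-10-95-18-47.ec2.internal:27000', state: 1 },  // PRIMARY
  { name: 'ip-10-95-19-190.ec2.internal:27000', state: 2 }, // SECONDARY
  { name: 'ip-10-95-20-12.ec2.internal:27000', state: 2 }   // SECONDARY
];

findPrimary(members); // → 'ip-10-95-18-47.ec2.internal:27000'
```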

Now that we know our primary is 10.95.18.47, let's SSH into that instance:

workarea production ssh 10.95.18.47

Once we're connected, we're ready to start hunting slow queries.

Tailing mongo logs to identify slow queries

In older environments, mongo logs will live in /mongodb/data/myclient-production/mongodb.log.

In newer environments, mongo logs will live in /var/log/mongodb.log.

After you've located your mongodb.log file, let's grep it for COMMAND entries (mongod logs any operation slower than 100ms by default):

# Prints all slow queries logged so far
grep COMMAND /var/log/mongodb.log

# Tails the mongo log for slow queries happening in real time
tail -f /var/log/mongodb.log | grep COMMAND

Here's an example from a real-world slow-query log, with line breaks added for readability:

query myclient_production.workarea_orders
query: { deleted_at: null, items.number: "OEIFJMDF28890-1" }
planSummary: IXSCAN { deleted_at: 1, email: 1, placed_at: 1 }
ntoreturn:0 ntoskip:0 keysExamined:79689 docsExamined:79689 cursorExhausted:1 numYields:624 nreturned:1 reslen:7742
locks:{ Global: { acquireCount: { r: 1250 } }, Database: { acquireCount: { r: 625 } }, Collection: { acquireCount: { r: 625 } } }
324ms

From the log above, we can see that our query is taking 324ms.

We also see planSummary: IXSCAN, which indicates our query is using an index, which is good.

Sometimes you'll forget to add an index, and you'll see planSummary: COLLSCAN, which is bad, as it means no index is being used. As the number of documents in a collection grows, these queries gradually get slower and slower.

From our log, we can see that the index being used isn't a good match for our query, aside from the fact that both reference deleted_at:

query: { deleted_at: null, items.number: "OEIFJMDF28890-1" }
index: { deleted_at: 1,      email: 1, placed_at: 1 }
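
When the log contains many COMMAND lines, it can help to rank them by their trailing duration. A plain-JavaScript sketch (the regex and the sample lines are assumptions based on the log format shown above):

```javascript
// Extract the trailing "NNNms" duration from a mongod log line,
// returning the duration in milliseconds (or null if absent).
function durationMs(line) {
  var match = line.match(/(\d+)ms\s*$/);
  return match ? parseInt(match[1], 10) : null;
}

durationMs('COMMAND ... workarea_orders ... 324ms'); // → 324

// Sort sample lines slowest-first (the lines are illustrative):
var lines = [
  'COMMAND ... query myclient_production.workarea_orders ... 324ms',
  'COMMAND ... query myclient_production.workarea_users ... 1210ms'
];
lines.sort(function (a, b) { return durationMs(b) - durationMs(a); });
```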

Now that we've identified a slow query, we're ready to add an index to our collection, which we go over below.

Using the mongo shell to identify slow queries

Using the mongo shell to identify slow queries is the right approach when mongo has ground to a halt.

To begin, you'll want to open a mongo shell and switch to the database you're looking to debug:

$ mongo --port 27000
exampleclient-production:PRIMARY> show dbs
admin                     0.000GB
local                     0.008GB
exampleclient_production  0.003GB
exampleclient-production:PRIMARY> use exampleclient_production

After you've switched to the proper database, you'll run the following script:

db.currentOp().inprog.forEach(function(op) { if(op.secs_running > 5) printjson(op); })

This iterates over all database operations currently in progress, and outputs relevant information in JSON format, including query, time running, and the index in use.
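
The same inprog array can be used to kill runaway operations with db.killOp, as mentioned earlier. A sketch, with the selection logic pulled into a plain function; the 60-second threshold and the sample data are illustrative choices, not prescriptions:

```javascript
// Given currentOp().inprog, return the opids of query operations
// that have been running longer than maxSecs.
function longRunningOpIds(inprog, maxSecs) {
  return inprog
    .filter(function (op) {
      return op.op === 'query' && op.secs_running > maxSecs;
    })
    .map(function (op) { return op.opid; });
}

// Hypothetical inprog entries shaped like currentOp() output:
var inprog = [
  { opid: 12345, op: 'query', secs_running: 125 },
  { opid: 12346, op: 'query', secs_running: 2 },
  { opid: 12347, op: 'insert', secs_running: 90 }
];

longRunningOpIds(inprog, 60); // → [12345]

// In the mongo shell, you would then kill each one:
//   longRunningOpIds(db.currentOp().inprog, 60).forEach(function (opid) {
//     db.killOp(opid);
//   });
```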

Adding indices to collections

In this section we'll use the query example from the "Tailing mongo logs to identify slow queries" section above.

Given the query below, and the index being used, our first step is to determine which fields in the document are relevant to the query.

myclient_production.workarea_orders
query: { deleted_at: null, items.number: "OEIFJMDF28890-1" }
index: { deleted_at: 1,      email: 1, placed_at: 1 }
324ms

In this case, we care about deleted_at and items.number.

Once we've logged into our primary instance, we want to:

  • Open a mongo shell: mongo --port 27000
  • Switch to the database we want to add an index to: use myclient_production
  • Add our index: db.workarea_orders.createIndex({ "deleted_at": -1, "items.number": 1 }, { "background": true })

Important to note:

The { "background": true } argument to the createIndex command tells MongoDB to index the collection in the background. Forgetting to set this means all reads/writes will be blocked while the index is being created.

On a large collection this may mean you can't read or write to the DB for a few minutes, which we don't want, especially in production.
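
While a background build is running, its progress appears in db.currentOp(). A sketch that filters for it; the msg format here is an assumption about how mongod reports index builds, so treat it as illustrative:

```javascript
// Given currentOp().inprog, return the progress messages of any
// in-progress index builds.
function indexBuildProgress(inprog) {
  return inprog
    .filter(function (op) {
      return op.msg && op.msg.indexOf('Index Build') === 0;
    })
    .map(function (op) { return op.msg; });
}

// Hypothetical inprog entries (the msg string is an assumption):
var inprog = [
  { opid: 501, msg: 'Index Build (background): 60000/79689 75%' },
  { opid: 502, op: 'query', secs_running: 1 }
];

var builds = indexBuildProgress(inprog);
// builds holds the single build-progress message above
```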

By the end, your terminal will look something like this:

~/code/example_app $ workarea production ssh mongo
info Connecting via SSH as tvendetta to 10.95.19.190.
tvendetta@ip-10-95-19-190:~$ mongo --port 27000
MongoDB shell version v3.4.0
connecting to: mongodb://127.0.0.1:27000
myclient-production:PRIMARY> use myclient_production
switched to db myclient_production
myclient-production:PRIMARY> db.workarea_orders.createIndex({ "deleted_at": -1, "items.number": 1 }, { "background": true })
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 7,
  "numIndexesAfter" : 8,
  "ok" : 1
}

But wait, there is one more step:

Now that we've added our index to the database, we also want to add it to the Workarea code base so the index is re-created automatically in future environments:

module Workarea
  decorate Order, with: :exampleclient do
    decorated do
      index({ 'deleted_at' => -1, 'items.number' => 1 }, { background: true })
    end
  end
end

After committing these changes to your repos, you're all set!