Best way to count documents in MongoDB

Jul 31, 2023
Hello everyone šŸ‘‹šŸ‘‹ I created a MongoDB database with 10 million documents.šŸ˜„ My schema is simple and is for a very very basic inventory management system.

Table of Content

Click to expand

Hello everyone šŸ‘‹šŸ‘‹

I created a MongoDB database with 10 million documents.šŸ˜„ My schema is simple and is for a very very basic inventory management system.

Before weĀ begin

name: string,
quantity: number,

Here is a sample of some of the documents.

The schema looks simple enough but I have been working with a huge number of documents. To be precise, I am working with 12734005 documents. For those wondering how did I insert these documents here is the python code that helped me did it.

Link to python Code

Letā€™s talk aboutĀ count

letā€™s find out which is better in which situation

By default, In MongoDB provides us with 3 different count methods.

  • countDocuments()
  • cursor.count()
  • estimatedDocumentCount()

1. countDocuments()

it took somewhere between 4 to 10 seconds. Probably caching helped reduce its speed. Letā€™s discuss this below.

If you know aggregation, you can see that countDocuments() is the most accurate but slowest count query among the three. Behind the scene, it does a sequential scan, ** fancy way of saying goes through all documents **, in all documents for the query gets the count. This is the slowest count I found.

db.collection.countDocuments() wraps the following aggregation operation and returns just the value of n:

db.collection.aggregate([
       { $match: <query> },
       { $group: { _id: null, n: { $sum: 1 } } }
])

Source: MongoDb Official Documentation

2. cursor.count()

MongoDB returns a cursor for collection.find(<query>) type of queries. The cursor has many methods andĀ .count() is one of them. cursor.count() is same as countDocuments() but when its aboutĀ .find().count() it returns count from collstats. It takes constant time

The collStats command returns a variety of storage statistics for a given collection. SourceĀ : MongoDb official documentation

3. estimatedDocumentCount()

Unlike cursor.count() and countDocuments(), estimatedDocumentCount() does not take any query parameters. It returns the total count of documents. This is an estimated count. But I think it is ok to return the estimated count as no one would actually bother about if the count is right or wrong up to million. SourceĀ : MongoDb official documentation

Suggestions

  1. Donā€™t be that guy above
  2. Use estimatedDocumentCount() when counting total number of documents.
  3. Use find.count() when dealing with the counting of total number of documents with query filters
  4. Whenever using query filters, always filter by indexes, as it makes querying faster resulting in faster counts.
  5. Try storing count with query filters in a data store like Redis or a new collection and update it periodically.
  6. It is better to use pre-computed counts than to hog your databaseā€™s CPU.
  7. Best way to deal with million documents is by not dealing with millions at one time.

Conclusion

I am not the ā€œknow allā€ type of guy. I might be wrong in this blog or you might have some better way to count the documents. Letā€™s discuss. I like to be proven wrong and I want an opportunity to learn from you guys as well but until then peace out. šŸ˜„


For the time being, comments are managed by Disqus, a third-party library. I will eventually replace it with another solution, but the timeline is unclear. Considering the amount of data being loaded, if you would like to view comments or post a comment, click on the button below. For more information about why you see this button, take a look at the following article.

Related Posts


You may also like

Have a problem to solve?
View my freelancing profile on upwork
Reading List Contact
server

© Nirjal Paudel