Best way to count documents in MongoDB

Jul 31, 2023
Hello everyone 👋👋 I created a MongoDB database with 10 million documents. 😄 My schema is simple and is for a very, very basic inventory management system.


Before we begin

name: string,
quantity: number,

Here is a sample of some of the documents.

The schema looks simple enough, but I have been working with a huge number of documents: 12,734,005 to be precise. For those wondering how I inserted these documents, here is the Python code that helped me do it.

Link to python Code

Let's talk about count

Let's find out which is better in which situation.

MongoDB provides us with three different count methods.

  • countDocuments()
  • cursor.count()
  • estimatedDocumentCount()

1. countDocuments()

It took somewhere between 4 and 10 seconds. Caching probably helped bring the time down. Let's discuss this below.

If you know aggregation, you can see that countDocuments() is the most accurate but slowest count query among the three. Behind the scenes, it runs the query as an aggregation, and when no index can serve the filter it does a sequential scan (a fancy way of saying it goes through all documents) to get the count. This is the slowest count I found.

db.collection.countDocuments() wraps the following aggregation operation and returns just the value of n:

db.collection.aggregate([
       { $match: <query> },
       { $group: { _id: null, n: { $sum: 1 } } }
])

Source: MongoDB official documentation
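To see why this pipeline gets slow on big collections, here is a minimal sketch in plain JavaScript of what $match followed by $group with { $sum: 1 } computes. It runs against a toy in-memory array, not against MongoDB; the sample values and the countMatching helper are made up for illustration, and only the name and quantity fields come from the schema above.

```javascript
// Toy in-memory stand-in for a collection (hypothetical sample data).
const inventory = [
  { name: "bolt", quantity: 40 },
  { name: "nut", quantity: 0 },
  { name: "washer", quantity: 15 },
  { name: "screw", quantity: 0 },
];

// Mirrors the pipeline: the predicate plays the role of $match,
// and adding 1 per surviving document plays the role of { $sum: 1 }.
function countMatching(docs, matches) {
  let n = 0;
  for (const doc of docs) {
    // Every document is visited once, so the work grows with collection size.
    if (matches(doc)) n += 1;
  }
  return n;
}

const inStock = countMatching(inventory, (doc) => doc.quantity > 0);
console.log(inStock); // 2
```

The loop touches every document, which is the behaviour you pay for when the real query cannot be answered from an index.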

2. cursor.count()

MongoDB returns a cursor for collection.find(<query>) type of queries. The cursor has many methods and .count() is one of them. With a query filter, cursor.count() works much like countDocuments(), but a bare .find().count() with no filter returns the count from collStats, so it takes constant time. Note that recent MongoDB drivers deprecate count() in favor of countDocuments() and estimatedDocumentCount().

The collStats command returns a variety of storage statistics for a given collection. Source: MongoDB official documentation

3. estimatedDocumentCount()

Unlike cursor.count() and countDocuments(), estimatedDocumentCount() does not take a query filter. It returns the total number of documents in the collection, and that number is an estimate. I think returning an estimate is fine, as hardly anyone will care whether the count is exactly right once it runs into the millions. Source: MongoDB official documentation
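The difference between a count read from stored statistics and a count that walks the documents can be sketched with a toy model. This is not how MongoDB's storage engine is implemented; ToyCollection, estimatedCount and exactCount are hypothetical names used only to illustrate why one read is constant time and the other is not.

```javascript
// Toy model only: illustrates a counter kept alongside the data.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.metadataCount = 0; // updated on every write, read in O(1)
  }

  insert(doc) {
    this.docs.push(doc);
    this.metadataCount += 1;
  }

  // Analogue of estimatedDocumentCount(): no filter, no scan.
  estimatedCount() {
    return this.metadataCount;
  }

  // Analogue of countDocuments(filter): walks the documents.
  exactCount(matches = () => true) {
    return this.docs.filter(matches).length;
  }
}

const items = new ToyCollection();
items.insert({ name: "bolt", quantity: 40 });
items.insert({ name: "nut", quantity: 0 });
console.log(items.estimatedCount());                      // 2
console.log(items.exactCount((doc) => doc.quantity > 0)); // 1
```

Reading the stored counter never looks at the documents, which is also why it cannot accept a filter.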

Suggestions

  1. Don't be that guy above.
  2. Use estimatedDocumentCount() when counting the total number of documents.
  3. Use find().count() when counting documents with query filters.
  4. Whenever you use query filters, always filter on indexed fields, as it makes querying faster and so makes counts faster.
  5. Try storing counts for common query filters in a data store like Redis or a new collection and update them periodically.
  6. It is better to use pre-computed counts than to hog your database's CPU.
  7. The best way to deal with millions of documents is to not deal with millions at one time.
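Suggestions 5 and 6 can be sketched as a small cache of pre-computed counts. This is a minimal in-memory version under stated assumptions: CountCache, store and read are hypothetical names, the Map stands in for Redis or a separate collection, and a periodic job (not shown) is assumed to run the expensive count query and call store() with the result.

```javascript
// Hypothetical helper: keeps pre-computed counts with a maximum age.
class CountCache {
  constructor(maxAgeMs) {
    this.maxAgeMs = maxAgeMs;
    this.entries = new Map(); // key -> { count, storedAt }
  }

  // Called by the periodic job after it has run the real count query.
  store(key, count, now = Date.now()) {
    this.entries.set(key, { count, storedAt: now });
  }

  // Returns the cached count, or null if it is missing or too old.
  read(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry || now - entry.storedAt > this.maxAgeMs) return null;
    return entry.count;
  }
}

// Usage with explicit timestamps so the behaviour is easy to follow.
const cache = new CountCache(60000);             // counts stay fresh for 60 seconds
cache.store("quantity>0", 12345, 0);             // hypothetical pre-computed value
console.log(cache.read("quantity>0", 30000));    // 12345
console.log(cache.read("quantity>0", 90000));    // null
```

When read() returns null, the caller can either wait for the next refresh or fall back to a live count, depending on how stale a number the application can tolerate.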

Conclusion

I am not the "know-it-all" type of guy. I might be wrong in this blog, or you might have a better way to count documents. Let's discuss. I like to be proven wrong, and I want an opportunity to learn from you guys as well, but until then, peace out. 😄


© Nirjal Paudel