A Practical Guide to MongoDB Index Optimization

Featuring a Real-World E-Commerce Case Study 

Introduction to Indexes

What Is an Index?

An index in MongoDB is a data structure that significantly improves query performance by allowing efficient data lookups, much like the index at the back of a book. Without indexes, MongoDB performs a full collection scan, which becomes inefficient as data grows. 

Why Indexes Matter

Indexes map field values to document locations, reducing the number of documents MongoDB needs to scan. For example, searching for orders with status: "pending" without an index means scanning every document. With an index, MongoDB can jump directly to the relevant results. 

How Indexes Work

MongoDB uses B-tree data structures to power its indexes. When a query runs, MongoDB consults the index to quickly locate matching documents, minimizing disk I/O and speeding up query execution. 
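
To make this concrete, here is a minimal mongosh sketch (using the orders collection from the case study below; the field choice is illustrative) of creating an index and inspecting what exists:

// Create a single-field ascending index, then list the collection's indexes
db.orders.createIndex({ order_date: 1 });
db.orders.getIndexes();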

Case Study: Optimizing a Slow Query in an E-Commerce Database

Let’s explore a common scenario: an e-commerce platform with an orders collection containing millions of documents. Each document includes fields like customer_id, order_date, status, and total_amount. 

A frequently run query looks like this: 

db.orders.find({ customer_id: "12345", status: "pending" });

Users report slow performance. Let’s diagnose and optimize this query using a step-by-step approach. 

Step-by-Step Guide to Identify and Optimize Slow Queries

Step 1: Analyze the Query with explain("executionStats")

(Screenshot: the Explain Plan view in MongoDB Compass.) 

Use explain("executionStats") to inspect how MongoDB executes the query: 

db.orders.find({ customer_id: "12345", status: "pending" }).explain("executionStats");

Sample Output:

{ 
  "queryPlanner": { 
    "namespace": "ecommerce.orders", 
    "indexFilterSet": false, 
    "parsedQuery": { 
      "$and": [ 
        { "customer_id": { "$eq": "12345" } }, 
        { "status": { "$eq": "pending" } } 
      ] 
    }, 
    "winningPlan": { 
      "stage": "COLLSCAN", 
      "filter": { 
        "$and": [ 
          { "customer_id": { "$eq": "12345" } }, 
          { "status": { "$eq": "pending" } } 
        ] 
      }, 
      "direction": "forward" 
    }, 
    "rejectedPlans": [] 
  }, 
  "executionStats": { 
    "executionSuccess": true, 
    "nReturned": 50, 
    "executionTimeMillis": 1200, 
    "totalKeysExamined": 0, 
    "totalDocsExamined": 5000000 
  } 
} 

Analysis:

  • COLLSCAN: The query performs a full collection scan, examining 5 million documents (totalDocsExamined: 5000000). 
  • Execution Time: 1200ms, indicating a slow query. 
  • No Index Used: totalKeysExamined: 0 shows no index was utilized. The snippet below shows one way to read these fields directly in mongosh. 
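
As a convenience, here is a minimal mongosh sketch for pulling the headline metrics out of the explain document; note that on newer server versions the winning plan may be nested under an extra queryPlan layer, so the exact path can differ slightly:

// Extract the key metrics from explain("executionStats")
const plan = db.orders.find({ customer_id: "12345", status: "pending" }).explain("executionStats");
print(plan.queryPlanner.winningPlan.stage);   // "COLLSCAN" before the index exists
print(plan.executionStats.totalDocsExamined); // 5000000
print(plan.executionStats.nReturned);         // 50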

Step 2: Identify the Issue

The query is slow because it scans the entire collection due to the absence of an index on customer_id and status. This is inefficient for a collection with millions of documents. 

Step 3: Create an Appropriate Index

Since the query filters on customer_id and status, create a compound index to cover both fields: 

db.orders.createIndex({ customer_id: 1, status: 1 });

Why a Compound Index? 
A compound index supports queries that filter on its fields in the order they are defined, i.e., on any leading prefix of the index. Here, it optimizes queries that filter on customer_id alone or on both customer_id and status, as shown below. 
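
A quick sketch of which queries this compound index can serve (collection and values are the ones from the case study):

// Served by { customer_id: 1, status: 1 } -- the full key or its leading prefix
db.orders.find({ customer_id: "12345" });
db.orders.find({ customer_id: "12345", status: "pending" });

// Not served efficiently by this index alone: status is not a leading prefix
db.orders.find({ status: "pending" });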

Step 4: Re-run explain() to Verify Improvement

Run the query again with explain("executionStats"): 

{ 
  "queryPlanner": { 
    "namespace": "ecommerce.orders", 
    "indexFilterSet": false, 
    "parsedQuery": { 
      "$and": [ 
        { "customer_id": { "$eq": "12345" } }, 
        { "status": { "$eq": "pending" } } 
      ] 
    }, 
    "winningPlan": { 
      "stage": "FETCH", 
      "inputStage": { 
        "stage": "IXSCAN", 
        "keyPattern": { "customer_id": 1, "status": 1 }, 
        "indexName": "customer_id_1_status_1", 
        "indexBounds": { 
          "customer_id": ["[12345, 12345]"], 
          "status": ["[pending, pending]"] 
        } 
      } 
    }, 
    "rejectedPlans": [] 
  }, 
  "executionStats": { 
    "executionSuccess": true, 
    "nReturned": 50, 
    "executionTimeMillis": 10, 
    "totalKeysExamined": 50, 
    "totalDocsExamined": 50 
  } 
} 

Analysis: 

  • IXSCAN: The query now uses the index (stage: "IXSCAN"). 
  • Improved Performance: Execution time dropped to 10ms, and only 50 documents were examined (totalDocsExamined: 50). 
  • Efficient: The index reduced the scan scope to exactly the matching documents. 

Step 5: Monitor and Maintain Indexes

  • Check Index Usage: Use db.orders.getIndexes() to review existing indexes. 

  • Remove Unused Indexes: Drop indexes not used by queries to save storage and reduce write overhead: 

db.orders.dropIndex("unused_index_name");

  • Monitor Slow Queries: Enable the database profiler to log slow operations and identify other problematic queries: 

db.setProfilingLevel(1, { slowms: 100 });
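
With profiling enabled at level 1, operations slower than the slowms threshold are recorded in the system.profile collection of the current database. A quick way to review the most recent entries (the 100 ms threshold mirrors the setting above):

// Show the five most recent profiled operations slower than 100 ms
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(5);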

Optimizing Aggregation Queries

Aggregation pipelines, used for complex data processing, can also benefit from indexes. Consider an aggregation to calculate total sales per customer: 

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customer_id", totalSales: { $sum: "$total_amount" } } }
]);

Step 1: Analyze Aggregation with explain()

Use explain("executionStats") for aggregations: 

db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customer_id", totalSales: { $sum: "$total_amount" } } }
]);

Sample Output (No Index):

{ 
  "stages": [ 
    { 
      "$cursor": { 
        "queryPlanner": { 
          "winningPlan": { "stage": "COLLSCAN" }, 
          "rejectedPlans": [] 
        }, 
        "executionStats": { 
          "nReturned": 1000000, 
          "executionTimeMillis": 1500, 
          "totalDocsExamined": 5000000 
        } 
      } 
    }, 
    { "$group": { ... } } 
  ] 
} 

Analysis: 

  • The $match stage causes a full collection scan (COLLSCAN), examining 5 million documents. 
  • Execution time is high (1500ms) due to the lack of an index. 

Step 2: Create an Index for Aggregation

The $match stage filters on status. Create an index to optimize it: 

db.orders.createIndex({ status: 1 });

Step 3: Re-run explain() for Aggregation

Re-run the aggregation with explain("executionStats"): 

{ 
  "stages": [ 
    { 
      "$cursor": { 
        "queryPlanner": { 
          "winningPlan": { 
            "stage": "FETCH", 
            "inputStage": { 
              "stage": "IXSCAN", 
              "keyPattern": { "status": 1 }, 
              "indexName": "status_1" 
            } 
          }, 
          "rejectedPlans": [] 
        }, 
        "executionStats": { 
          "nReturned": 1000000, 
          "executionTimeMillis": 200, 
          "totalDocsExamined": 1000000 
        } 
      } 
    }, 
    { "$group": { ... } } 
  ] 
} 

Analysis: 

  • IXSCAN: The $match stage now uses the index, reducing execution time to 200ms. 
  • Fewer Documents Scanned: Only documents matching status: "completed" are processed by the $group stage. 

Notes on Aggregation Indexes 

  • Indexable Stages: Only certain aggregation stages (e.g., $match, $sort, $geoNear) can use indexes. Place these stages early in the pipeline to filter documents before computationally expensive stages like $group. Here are some common aggregation stages that can take advantage of indexes: 

      • $match: Can use indexes (similar to find()), especially when placed at the beginning of the pipeline. 
      • $sort: Can use indexes if the sort field is indexed and a significant number of documents have already been filtered (e.g., via $match). 
      • $geoNear: Must be placed at the beginning of the pipeline and requires a geospatial index; otherwise, an error will occur. 
      • $search: A stage specific to Atlas Search; uses Atlas Search indexes (not regular MongoDB indexes). 
      • $limit: Can be optimized together with $sort under appropriate conditions, leveraging indexes. 

  • Compound Indexes: For aggregations with multiple filters in $match, use a compound index, e.g., { status: 1, order_date: 1 }. 
  • Sort Optimization: If the pipeline includes $sort, ensure an index exists on the sorted field to avoid in-memory sorting; a sketch follows below. 
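
As a minimal sketch (the index and pipeline below are illustrative, built on the case study's fields), a single compound index can serve both the $match filter and the $sort:

db.orders.createIndex({ status: 1, order_date: -1 });

db.orders.aggregate([
  { $match: { status: "completed" } }, // equality match uses the index prefix
  { $sort: { order_date: -1 } },       // matches the index order, avoiding an in-memory sort
  { $limit: 100 }
]);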

Additional Optimization Tips

Poor index design (such as using low-selectivity fields or creating redundant indexes) can lead to suboptimal query performance, slow down write operations, and consume excessive storage. 

Design for High Selectivity: 
High-selectivity fields are those with low repetition and high uniqueness (e.g., customer_id, email, order_id). The higher the selectivity, the better the index can filter data. For example: 

  • A query condition like status: "active" has poor selectivity if 90% of documents match this value. 
  • A query like customer_id: 123 has very high selectivity if the value is unique, making the index highly effective; a rough way to estimate selectivity in mongosh is sketched below. 
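
One rough heuristic (an illustrative sketch, not a built-in metric; for very high-cardinality fields prefer a $group-based count, since distinct() results must fit in a single 16 MB document) is to compare a field's distinct-value count to the collection size:

// Ratio of distinct values to total documents: closer to 1 means more selective
const totalDocs = db.orders.countDocuments();
const distinctStatuses = db.orders.distinct("status").length;
print(`status selectivity: ${distinctStatuses / totalDocs}`);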

Apply the ESR Rule (Equality, Sort, Range): 
When designing compound indexes, follow the order Equality → Sort → Range to maximize index efficiency: 

db.orders.createIndex({ customer_id: 1, order_date: -1, amount: 1 });
  • customer_id is used in equality conditions 
  • order_date is used for sorting 
  • amount is used for range queries (e.g., $gt, $lte); a matching example query follows below 
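
A hypothetical query shaped to this index under the ESR rule (the values are illustrative):

// Equality on customer_id, sort on order_date, range on amount
db.orders.find({ customer_id: "12345", amount: { $gt: 100 } }).sort({ order_date: -1 });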

Avoid Over-Indexing: 
Too many indexes increase storage usage and slow down write performance. Use db.collection.stats() to monitor index size, for example: 
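
In mongosh, the collection's stats() helper exposes per-index on-disk sizes:

// indexSizes in the collStats output reports each index's size in bytes
db.orders.stats().indexSizes;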

Periodically Review Indexes: 
Use the $indexStats aggregation stage and the Atlas Performance Advisor to identify unused indexes and consider removing them. 
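
For example, $indexStats reports per-index usage counters accumulated since the last server restart:

// Each result document includes the index name and an accesses.ops counter
db.orders.aggregate([{ $indexStats: {} }]);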

Use Covered Queries: If a query only needs indexed fields, ensure it’s covered by the index to avoid fetching documents:

db.orders.find({ customer_id: "12345" }, { customer_id: 1, _id: 0 });
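
You can confirm coverage from the explain output: assuming the { customer_id: 1, status: 1 } index from the case study exists, a covered query examines index keys only and no documents:

db.orders.find({ customer_id: "12345" }, { customer_id: 1, _id: 0 })
  .explain("executionStats")
  .executionStats.totalDocsExamined; // 0 for a covered query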

Regular Maintenance: Periodically review slow query logs and update indexes based on application changes. 

Summary

Indexes are critical for MongoDB performance, especially for large datasets. By analyzing queries with explain("executionStats"), developers can identify inefficiencies, create targeted indexes, and verify improvements. For aggregation pipelines, placing $match early and supporting it with proper indexes leads to faster, more efficient processing. 

Key takeaways: 

  • Use explain() to understand query behavior 
  • Design compound indexes using the ESR rule 
  • Periodically review and clean up unused indexes 
  • Use covered queries when possible 
  • Optimize aggregation pipelines with indexed $match stages 

Want Help Tuning Your Index Strategy?

At Delbridge, we help teams get the most out of MongoDB by: 

  • Auditing and optimizing index structures 
  • Eliminating expensive collection scans 
  • Improving performance across queries and aggregations 
  • Designing for long-term scalability on MongoDB Atlas 

Whether you’re troubleshooting performance issues or building a new data model, our experts can help. 

Book a MongoDB Index Review with Delbridge or reach out to our team to get started. 
