Free Wins in Storage and Read Speed: Flat Schemas vs Structured Schemas
When developers or admins who grew up with the “relational way” move to MongoDB, they often design flat documents. That habit makes sense, since relational modeling trains you to think in two dimensions, with data spread across tables.
MongoDB stores data as BSON documents. BSON is a binary encoding of JSON-like documents, with a few differences from JSON such as additional data types. Because a BSON value can itself be a document, schemas can have multiple levels. You can read more in the BSON specification and its differences from JSON.
A MongoDB document is a set of key and value pairs. A value can be any BSON type, including nested documents, arrays, or arrays of documents.
Using nested documents or arrays lets you model a structured schema, where one field groups related details. This is an alternative to a flat schema.
Consider the same user record expressed both ways:
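A minimal sketch of the two shapes, written as Python dictionaries. Only the name values (john, smith, oliver) come from this article. The address layout (home and work, each with street, number, zip, state, and country) follows the traversal discussion later on, `_id` is assumed to come first, and every other value is a hypothetical placeholder. The `flatten` helper is also hypothetical: it joins nested keys with an underscore to show that both shapes carry the same information.

```python
# Hypothetical documents: name values are from the article, the rest are placeholders.
flat_user = {
    "_id": 1,
    "name_first": "john",
    "name_last": "smith",
    "name_middle": "oliver",
    "address_home_street": "Main St",
    "address_home_number": "12",
    "address_home_zip": "10001",
    "address_home_state": "NY",
    "address_home_country": "USA",
    "address_work_street": "Park Ave",
    "address_work_number": "300",
    "address_work_zip": "10022",
    "address_work_state": "NY",
    "address_work_country": "USA",
}

structured_user = {
    "_id": 1,
    "name": {"first": "john", "last": "smith", "middle": "oliver"},
    "address": {
        "home": {"street": "Main St", "number": "12", "zip": "10001",
                 "state": "NY", "country": "USA"},
        "work": {"street": "Park Ave", "number": "300", "zip": "10022",
                 "state": "NY", "country": "USA"},
    },
}


def flatten(doc, prefix=""):
    """Rebuild the flat shape by joining nested keys with "_" (hypothetical helper)."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


print(flatten(structured_user) == flat_user)                # True: same fields and values
print(list(flatten(structured_user)) == list(flat_user))    # True: same field order too
```
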

Both versions hold identical information. In flatUser everything sits on one level. In structuredUser fields are nested to reflect related data.
Why pick structured instead of flat? The short answer is that structured schemas can use less disk space and they can be quicker to traverse. To see why, it helps to recall how BSON is laid out.
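For our purposes, think of a BSON document as a list of items, one per field and value. Each item holds a type byte, the field name as a string, a four-byte length for variable-sized values, and the serialized value bytes. Schematically, one item is laid out as [type: 1 byte][field name][length: 4 bytes][value bytes]. This is a simplified model: real BSON also ends each field name and each string value with a null byte, and the tables below leave those bytes out.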
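A minimal sketch of that simplified layout, building one item as raw bytes. It assumes a string value and, like the tables, omits BSON's null terminators, so it illustrates the accounting rather than producing valid BSON. The function name `simplified_item` is hypothetical.

```python
import struct


def simplified_item(field_name: str, value: str) -> bytes:
    """Build one field-and-value item under the simplified model (not exact BSON)."""
    type_byte = b"\x02"                      # 1 byte: BSON type code for a UTF-8 string
    name_bytes = field_name.encode("utf-8")  # field name bytes (null terminator omitted)
    data = value.encode("utf-8")             # value bytes (null terminator omitted)
    length = struct.pack("<i", len(data))    # 4-byte little-endian length prefix
    return type_byte + name_bytes + length + data


item = simplified_item("name_first", "john")
print(len(item))  # 19 = 1 + 10 + 4 + 4
```
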

Now let’s compare storage for the user’s name.
In flatUser, the storage table looks like this:
| field-and-value | Type | Field Name | Field Length | Field Data | Total |
|---|---|---|---|---|---|
| name_first: “john” | 1 byte | 10 bytes | 4 bytes | 4 bytes | 19 bytes |
| name_last: “smith” | 1 byte | 9 bytes | 4 bytes | 5 bytes | 19 bytes |
| name_middle: “oliver” | 1 byte | 11 bytes | 4 bytes | 6 bytes | 22 bytes |
Summing the totals, the flat approach spends 60 bytes for the name field and value.
For structuredUser, split the accounting into two tables. The first covers the fields inside the embedded name document. The second covers the name field itself, whose value is that embedded document.
First table, the fields inside the value of name:
| field-and-value | Type | Field Name | Field Length | Field Data | Total Size |
|---|---|---|---|---|---|
| first: “john” | 1 byte | 5 bytes | 4 bytes | 4 bytes | 14 bytes |
| last: “smith” | 1 byte | 4 bytes | 4 bytes | 5 bytes | 14 bytes |
| middle: “oliver” | 1 byte | 6 bytes | 4 bytes | 6 bytes | 17 bytes |
Those entries add up to 45 bytes for the value of name. Now the second table:
| field-and-value | Type | Field Name | Field Length | Field Data | Total Size |
|---|---|---|---|---|---|
| name: { … } | 1 byte | 4 bytes | 4 bytes | 45 bytes | 54 bytes |
Together, the structured approach uses 54 bytes for the user name.
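The difference comes from the field names. The flat design spends 30 bytes on field names, while the structured design spends 19, because every flat field repeats the substring “name_”. Those 11 saved bytes are partly offset by the 5 extra bytes (type byte plus length) of the name element itself, which leaves a net saving of 6 bytes.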
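The arithmetic in both tables can be reproduced with a short recursive function under the same simplified model (type byte, field name bytes, 4-byte length, data bytes, where an embedded document's data is the sum of its items). The function names are hypothetical:

```python
def simplified_size(field_name: str, value) -> int:
    """Item size under the simplified model: 1 + name bytes + 4 + data bytes.
    For an embedded document, the data bytes are the sum of its items."""
    if isinstance(value, dict):
        data = sum(simplified_size(k, v) for k, v in value.items())
    else:
        data = len(value.encode("utf-8"))
    return 1 + len(field_name.encode("utf-8")) + 4 + data


def field_name_bytes(doc: dict) -> int:
    """Bytes spent on field names, including names inside embedded documents."""
    total = 0
    for key, value in doc.items():
        total += len(key.encode("utf-8"))
        if isinstance(value, dict):
            total += field_name_bytes(value)
    return total


flat_name = {"name_first": "john", "name_last": "smith", "name_middle": "oliver"}
structured_name = {"name": {"first": "john", "last": "smith", "middle": "oliver"}}

print(sum(simplified_size(k, v) for k, v in flat_name.items()))        # 60
print(sum(simplified_size(k, v) for k, v in structured_name.items()))  # 54
print(field_name_bytes(flat_name), field_name_bytes(structured_name))  # 30 19
```
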
When these two full documents are stored in MongoDB, the flat version is 403 bytes and the structured version is 307 bytes. That is about a 24 percent space reduction with only a schema refactor, and the structured document is also easier to read.
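Stored sizes reflect the exact encoding, which the tables simplify: per the BSON specification, every field name and string ends with a null byte, and every embedded document carries its own 4-byte length and trailing null. A minimal sketch of an encoder that handles only strings and embedded documents (a full library, such as the `bson` package shipped with PyMongo, covers every type) shows that the name fields take 66 bytes flat versus 62 structured. The gap narrows slightly, but field names still drive it.

```python
import struct


def encode_cstring(text: str) -> bytes:
    """BSON cstring: UTF-8 bytes followed by a terminating null byte."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name: str, value) -> bytes:
    """Encode one element; only strings (0x02) and embedded documents (0x03)."""
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # The int32 length counts the string bytes plus the trailing null.
        return b"\x02" + encode_cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + encode_cstring(name) + encode_document(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """BSON document: int32 total size, the elements, then a terminating null byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


flat_name = {"name_first": "john", "name_last": "smith", "name_middle": "oliver"}
structured_name = {"name": {"first": "john", "last": "smith", "middle": "oliver"}}

# Subtract the 5 bytes of outer-document framing to compare only the name elements.
print(len(encode_document(flat_name)) - 5)        # 66
print(len(encode_document(structured_name)) - 5)  # 62
```
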
Next, consider traversal speed for a lookup like the work address zip code.
In flatUser, reaching address_work_zip from the document start requires 12 field name comparisons.
In structuredUser, reaching address.work.zip takes 8 comparisons. The reduction happens because some values are embedded documents, and each one starts with its length in bytes. When the cursor reaches a field such as name, which does not match the first path segment address, it uses that length to skip the whole embedded document without comparing any of the field names inside it. The same applies inside address: home does not match work, so the street, number, zip, state, and country fields under home are skipped as a unit.
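A minimal sketch of that counting, assuming `_id` comes first, followed by the three name fields and then the home and work address fields (street, number, zip, state, country), with placeholder values. It models only the number of field-name comparisons with skipping of non-matching embedded documents; it is not MongoDB's actual matching code, and `count_comparisons` is a hypothetical name.

```python
ADDRESS_FIELDS = ["street", "number", "zip", "state", "country"]

# Hypothetical documents: only the field order matters here.
flat_user = {"_id": 1, "name_first": "john", "name_last": "smith", "name_middle": "oliver"}
for place in ["home", "work"]:
    for field in ADDRESS_FIELDS:
        flat_user[f"address_{place}_{field}"] = "placeholder"

structured_user = {
    "_id": 1,
    "name": {"first": "john", "last": "smith", "middle": "oliver"},
    "address": {place: {field: "placeholder" for field in ADDRESS_FIELDS}
                for place in ["home", "work"]},
}


def count_comparisons(doc, path):
    """Walk `path` (a list of keys) one level at a time, counting field-name
    comparisons. Non-matching embedded documents are skipped without looking
    inside them. Returns (found, comparisons)."""
    comparisons = 0
    current = doc
    for depth, segment in enumerate(path):
        for key, value in current.items():
            comparisons += 1
            if key == segment:
                if depth == len(path) - 1:
                    return True, comparisons
                if not isinstance(value, dict):
                    return False, comparisons
                current = value  # descend into the matching embedded document
                break
        else:
            return False, comparisons  # segment not present at this level
    return False, comparisons


print(count_comparisons(flat_user, ["address_work_zip"]))              # (True, 12)
print(count_comparisons(structured_user, ["address", "work", "zip"]))  # (True, 8)
```
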
To measure the effect, we ran a focused test with this setup:
- The MongoDB instance used in-memory storage to isolate document traversal.
- Flat schemas used documents with 10, 25, 50, and 100 fields.
- Structured schemas used 2×5, 5×5, 10×5, and 20×5 layouts, where 2×5 means two document fields with five fields each.
- Each collection contained 10,000 documents generated with faker/npm.
- Queries searched for a field and value that did not exist, which forced a full scan of every document and field.
- Each query was run 100 times for every document size and schema.
- No concurrent workload ran during the tests.
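A minimal sketch of how documents with these shapes could be generated. The field names (`field_0`, `group_0`, ...) are hypothetical, and random lowercase strings stand in for faker values; the actual test code is in the repository mentioned at the end and may differ. The check confirms that each flat and structured pairing holds the same number of leaf fields.

```python
import random
import string


def random_value(length=8):
    """Random lowercase string; a stand-in for faker-generated values (assumption)."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


def make_flat(n_fields):
    """Flat document with n_fields top-level fields."""
    return {f"field_{i}": random_value() for i in range(n_fields)}


def make_structured(n_groups, per_group=5):
    """Structured document: n_groups embedded documents with per_group fields each."""
    return {
        f"group_{g}": {f"field_{i}": random_value() for i in range(per_group)}
        for g in range(n_groups)
    }


def leaf_count(doc):
    """Count non-document values at any depth."""
    return sum(leaf_count(v) if isinstance(v, dict) else 1 for v in doc.values())


# The pairings used in the test: 10 / 2x5, 25 / 5x5, 50 / 10x5, 100 / 20x5
for n_fields, n_groups in [(10, 2), (25, 5), (50, 10), (100, 20)]:
    flat, structured = make_flat(n_fields), make_structured(n_groups)
    print(n_fields, leaf_count(flat), leaf_count(structured))
```
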
Results:
| Fields per document (flat / structured) | Flat (ms) | Structured (ms) | Difference (ms) | Improvement (relative to structured time) |
|---|---|---|---|---|
| 10 / 2×5 | 487 | 376 | 111 | 29.5% |
| 25 / 5×5 | 624 | 434 | 190 | 43.8% |
| 50 / 10×5 | 915 | 617 | 298 | 48.3% |
| 100 / 20×5 | 1384 | 891 | 493 | 55.4% |
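The Improvement percentages appear to express the difference as a share of the structured time, that is, how much longer the flat query took. A small sketch that recomputes them from the rounded timings in the table reproduces 29.5%, 43.8%, and 48.3%, and gives 55.3% for the last row. It also prints the share of the flat time that the structured layout saves, a smaller figure of roughly 23% to 36%. The function name `summarize` is hypothetical.

```python
# Timings copied from the results table, in milliseconds: (fields, flat, structured)
RESULTS = [
    ("10 / 2x5", 487, 376),
    ("25 / 5x5", 624, 434),
    ("50 / 10x5", 915, 617),
    ("100 / 20x5", 1384, 891),
]


def summarize(flat_ms, structured_ms):
    """Return (difference in ms, difference as % of structured, difference as % of flat)."""
    difference = flat_ms - structured_ms
    return difference, 100 * difference / structured_ms, 100 * difference / flat_ms


for label, flat_ms, structured_ms in RESULTS:
    difference, vs_structured, vs_flat = summarize(flat_ms, structured_ms)
    print(f"{label}: {difference} ms, {vs_structured:.1f}% of structured, {vs_flat:.1f}% of flat")
```
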
As expected, structured documents were faster to traverse in this scenario, and the gap grew with document size, from 111 ms at 10 fields to 493 ms at 100 fields. Keep in mind that the gains depend on how you nest and organize fields.
This walkthrough showed how to get more from your MongoDB deployment by reshaping the schema while keeping the same information. You can also apply common MongoDB schema patterns to decide what belongs in each document. The article Building with Patterns covers widely used approaches and is a strong next step.
All test code is available in the GitHub repository.