Transforming Oil & Gas Data Access with AI and MongoDB
Field engineers needed faster access to drilling performance data stored in MongoDB Atlas. Delbridge developed an AI-powered solution that converts plain-English, domain-specific questions into structured MongoDB queries using OpenAI’s language models, MongoDB’s vector search, and metadata-driven context. This gives engineers real-time access to insights without requiring technical expertise.
The Challenge
A leading energy technology company needed to make its drilling data more accessible for non-technical users. While the data was stored in MongoDB Atlas, querying it required knowledge of MongoDB Query Language (MQL), which limited access for field engineers and slowed decision-making.
The goal was to build a solution that could understand domain-specific natural language, automatically generate accurate queries, and retrieve results through the company’s REST API.
Our Approach
Delbridge developed a two-phase proof of concept (POC) to demonstrate and scale the use of AI-powered natural language querying in MongoDB.
- Phase 1 focused on proving that natural language inputs could be accurately converted into structured MQL using vector embeddings and MongoDB’s native vector search.
- Phase 2 enhanced the system’s accuracy and usability by introducing an interactive chatbot, refining metadata filtering, and adding a multi-agent architecture to handle increasingly complex, context-specific queries.
The solution was built using Python 3 in a Google Colab environment and powered by OpenAI’s large language models. A metadata-driven approach extracted contextual information from MongoDB collections to generate highly relevant, precise queries.
Phase 1: Building the Foundational Model
The primary goal of Phase 1 was to demonstrate how MongoDB could support on-demand embeddings and serve as a vector database. It also aimed to prove that natural language questions could be reliably translated into structured MQL to retrieve data via the REST API.
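At its core, vector search ranks stored embeddings by similarity to the embedding of the user’s question. The toy sketch below illustrates that ranking with made-up three-dimensional vectors standing in for OpenAI embeddings; in the actual system, MongoDB Atlas performs this server-side over real embedding fields.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" keyed by the metadata snippet they
# describe (all values hypothetical, for illustration only).
stored = {
    "avg cost per foot by well section": [0.9, 0.1, 0.2],
    "avg feet per day per asset":        [0.1, 0.8, 0.3],
    "rig downtime by cause":             [0.2, 0.2, 0.9],
}

query_vec = [0.15, 0.75, 0.35]  # stand-in embedding of the user's question

# Rank stored snippets by similarity, as a vector search index would.
ranked = sorted(stored, key=lambda k: cosine_similarity(query_vec, stored[k]),
                reverse=True)
print(ranked[0])
```

The top-ranked metadata snippet is what gets handed to the LLM as context for query generation.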
Example user questions included:
- “What is my average cost per foot by well section for the assets in the Midland Basin?”
- “What is my average feet per day for all assets in the Midland Basin?”
Using OpenAI’s models and metadata from MongoDB collections, the system produced accurate MQL queries tied to drilling performance metrics.
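For the first question above, the generated MQL might resemble the aggregation pipeline sketched below. The collection layout and field names (`basin`, `well_section`, `cost_per_foot`) are illustrative assumptions, not the company’s actual schema.

```python
# Hypothetical pipeline for "average cost per foot by well section
# for the assets in the Midland Basin" (field names are assumptions).
pipeline = [
    {"$match": {"basin": "Midland Basin"}},
    {"$group": {
        "_id": "$well_section",
        "avg_cost_per_foot": {"$avg": "$cost_per_foot"},
    }},
    {"$sort": {"avg_cost_per_foot": 1}},
]

# With pymongo this would run as: db.drilling_metrics.aggregate(pipeline)
stages = [next(iter(stage)) for stage in pipeline]
print(stages)
```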
Solution Flow
- User Input: Users submitted technical drilling-related questions in plain language.
- LLM Processing: OpenAI’s model interpreted the query using metadata and vector embeddings stored in MongoDB.
- MQL Generation: A structured MQL query was generated from the interpreted context.
- Data Retrieval and Visualization: The query was executed via the REST API, and the data was returned in a visual format.
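The LLM Processing step can be sketched as prompt construction: metadata retrieved from MongoDB is folded into the prompt so the model emits schema-aware MQL rather than guessing at field names. The metadata document and prompt wording below are illustrative assumptions.

```python
# Hypothetical metadata document describing one collection's schema.
metadata = {
    "collection": "drilling_metrics",
    "fields": {"basin": "string", "asset_id": "string", "feet_per_day": "double"},
}

def build_prompt(question: str, meta: dict) -> str:
    """Fold collection metadata into the LLM prompt (sketch)."""
    field_list = ", ".join(f"{name} ({ftype})"
                           for name, ftype in meta["fields"].items())
    return (
        f"Collection: {meta['collection']}\n"
        f"Fields: {field_list}\n"
        f"Write a MongoDB aggregation pipeline answering: {question}"
    )

prompt = build_prompt(
    "What is my average feet per day for all assets in the Midland Basin?",
    metadata,
)
print(prompt)
```

The resulting prompt would then be sent to the OpenAI model, whose response is parsed into an executable pipeline.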
Using Python 3 in Google Colab, the team rapidly iterated on the model to refine both query generation and metadata mapping. The results validated MongoDB’s utility as a real-time vector database and laid a strong foundation for Phase 2.
Phase 2: Enhancing Accuracy and the User Experience
Phase 2 focused on improving accuracy and making the experience more intuitive. The system needed to handle complex, domain-specific queries and adapt to varied user phrasing.
Key enhancements included:
- An interactive chatbot to guide users through query refinement
- Metadata filtering to provide richer context
- A multi-agent LLM architecture where each agent handled a specialized task (e.g., keyword extraction, metadata lookup, historical query reference)
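The multi-agent idea can be sketched as a chain of single-purpose functions whose outputs compose into a query plan. In the real system each agent would wrap an LLM call; the keyword rules below are deliberately simple stand-ins, and all names and mappings are hypothetical.

```python
def knowledge_agent(question: str) -> list:
    """Identify key terms (stub: simple keyword spotting)."""
    vocab = ["average", "feet per day", "midland basin", "cost per foot"]
    return [term for term in vocab if term in question.lower()]

def metadata_agent(terms: list) -> str:
    """Select the relevant collection for the extracted terms (stub)."""
    return "drilling_metrics" if terms else "unknown"

def field_agent(terms: list) -> dict:
    """Map terms to schema fields and filter values (stub)."""
    mapping = {"feet per day": "feet_per_day",
               "midland basin": {"basin": "Midland Basin"}}
    fields = [mapping[t] for t in terms if isinstance(mapping.get(t), str)]
    filters = {k: v for t in terms if isinstance(mapping.get(t), dict)
               for k, v in mapping[t].items()}
    return {"fields": fields, "filters": filters}

question = "What is my average feet per day for all assets in the Midland Basin?"
terms = knowledge_agent(question)
plan = {"collection": metadata_agent(terms), **field_agent(terms)}
print(plan)
```

Keeping each agent narrow makes its behavior easier to test and swap out, which is the design motivation behind the multi-agent split.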
How It Works: The Query Flow
- User Input and Initial Prompt: The user enters a natural language question, such as “What is my average feet per day for all assets in the Midland Basin?”
- Chatbot Refinement: A conversational interface may ask clarifying questions, such as “Are you looking for specific assets or all available data in the Midland Basin?” This optional step continues until the user is satisfied with the prompt.
- Multi-Agent Analysis: The refined prompt is processed by multiple LLM agents:
  - The Knowledge Agent identifies key terms
  - The Metadata Agent selects relevant collections
  - The Field Agent extracts key fields and values
  - The Q&A Agent references past queries for consistency
- Query Generation and Optimization: The system synthesizes these inputs into a MongoDB query. If needed, it splits the query into smaller parts that run sequentially across collections.
- Validation and Execution: The MQL is validated, executed through the API, and returned in a visual format.
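One plausible shape for the validation step is an allowlist check: before an LLM-generated pipeline reaches the API, every stage is verified against a read-only set so a malformed or destructive query never executes. The allowlist below is an illustrative assumption, not the system’s actual rule set.

```python
# Read-only aggregation stages permitted for LLM-generated queries
# (hypothetical allowlist for illustration).
ALLOWED_STAGES = {"$match", "$group", "$sort", "$project", "$limit", "$unwind"}

def validate_pipeline(pipeline: list) -> bool:
    """Accept only pipelines whose every stage is on the allowlist."""
    return all(
        isinstance(stage, dict)
        and len(stage) == 1
        and next(iter(stage)) in ALLOWED_STAGES
        for stage in pipeline
    )

good = [{"$match": {"basin": "Midland Basin"}},
        {"$group": {"_id": "$asset_id"}}]
bad = [{"$merge": {"into": "other_collection"}}]  # write stage: rejected

print(validate_pipeline(good), validate_pipeline(bad))
```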
Outcomes and Impact
Phase 2 delivered a significant leap in both precision and usability. With metadata refinement, a conversational interface, and modular LLM design, the system can now interpret complex technical queries with high contextual accuracy.
It also confirmed MongoDB’s ability to support advanced Retrieval-Augmented Generation (RAG) use cases at scale. Native vector support and a flexible architecture enable scalable, AI-powered querying of highly specialized datasets.
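The RAG lookup that retrieves metadata context rests on an Atlas `$vectorSearch` stage of roughly the shape below; the index name, path, and stub vector are assumptions for illustration.

```python
query_embedding = [0.12, 0.83, 0.05]  # stand-in for an OpenAI embedding

# Shape of an Atlas Vector Search stage (index and field names hypothetical).
vector_stage = {
    "$vectorSearch": {
        "index": "metadata_vector_index",  # hypothetical Atlas index name
        "path": "embedding",               # field holding stored vectors
        "queryVector": query_embedding,
        "numCandidates": 100,              # breadth of the ANN candidate pool
        "limit": 5,                        # top matches passed to the LLM
    }
}

# Would run as the first stage of an aggregation, e.g.
# db.metadata.aggregate([vector_stage, {"$project": {"description": 1}}])
print(sorted(vector_stage["$vectorSearch"]))
```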
Looking Ahead
This proof of concept lays the groundwork for scalable, AI-driven data access. It empowers non-technical users to query complex operational data while providing a robust foundation for more advanced, real-time use cases.
Future enhancements may include:
- Real-time dashboards and visualizations
- Adaptive learning for continuous model improvement
- Multilingual support for global teams
- Deeper metadata tagging and contextual understanding
- Expansion into industries such as healthcare, finance, and logistics
By combining the power of OpenAI’s language models with MongoDB’s vector capabilities, this solution redefines how teams interact with data—making technical insights more accessible, responsive, and scalable.