# Deployment and DevOps Standards for MongoDB
This document outlines the Deployment and DevOps standards for MongoDB. Following these standards will help ensure maintainable, performant, and secure MongoDB deployments. It focuses on getting a MongoDB deployment into production and keeping it healthy, rather than on application code.
## 1. Infrastructure as Code (IaC)
### Standard
Use Infrastructure as Code (IaC) tools to provision and manage MongoDB infrastructure.
**Do This:**
* Use tools like Terraform, Ansible, or CloudFormation to define MongoDB infrastructure.
* Automate as much of the deployment process as possible.
* Treat infrastructure configurations as code and manage them in version control.
**Don't Do This:**
* Manually provision MongoDB instances.
* Make ad-hoc configuration changes to the infrastructure.
**Why:** IaC promotes reproducibility, reduces manual errors, and facilitates disaster recovery. It also simplifies scaling and environment parity.
**Example (Terraform):**
"""terraform
resource "aws_instance" "mongodb_replica_set" {
ami = "ami-xxxxxxxxxxxxxxxxx" # Replace with a suitable AMI
instance_type = "t3.medium"
subnet_id = "subnet-xxxxxxxxxxxxx" #Replace with your VPC Subnet
vpc_security_group_ids = ["sg-xxxxxxxxxxxxxxxxx"] # Replace with your Security Group
tags = {
Name = "mongodb-replica-set-member"
}
user_data = <<-EOF
#!/bin/bash
sudo apt-get update
sudo apt-get install -y mongodb
# Configure MongoDB as a replica set member
sudo sed -i "s/127.0.0.1/0.0.0.0/g" /etc/mongod.conf
sudo sed -i "s/#replication:/replication:\n replSetName: rs0/g" /etc/mongod.conf
sudo systemctl restart mongod
EOF
}
"""
**Anti-pattern:** Manually creating instances through the AWS Console or other WebUI, leading to configuration drift.
## 2. CI/CD Pipelines
### Standard
Implement CI/CD pipelines for MongoDB schema changes, configuration updates, and application deployments.
**Do This:**
* Use CI/CD tools like Jenkins, GitLab CI, CircleCI, or GitHub Actions.
* Automate unit and integration tests. These will largely focus on your application code rather than on specific MongoDB server features.
* Include schema validation in CI pipelines.
* Implement automated rollback procedures.
**Don't Do This:**
* Deploy schema changes directly to production without testing.
* Manually update configurations without version control.
**Why:** CI/CD pipelines automate deployment, reduce errors, and improve the speed of deployments. This enables faster iterations and quicker responses to changing business needs.
**Example (GitHub Actions):**
"""yaml
name: MongoDB Schema Validation
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Node.js
uses: actions/setup-node@v3
with:
node-version: '18.x' # Adjust the version
- name: Install dependencies
run: npm install joi # Example validation package
- name: Run schema validation
run: node ./scripts/validate-schema.js # Example
env:
MONGO_URI: ${{ secrets.MONGO_URI }}
"""
**Anti-pattern:** Running ad-hoc validation by hand (e.g., "db.collection.validate()") and hoping for the best. Validation should be automated and testable via CI/CD.
## 3. Monitoring and Alerting
### Standard
Implement comprehensive monitoring and alerting for MongoDB instances.
**Do This:**
* Use monitoring tools like MongoDB Atlas, Prometheus + Grafana, Datadog, or New Relic.
* Monitor key metrics such as CPU utilization, memory usage, disk I/O, connection count, and query performance.
* Set up alerts for critical events such as high latency, connection errors, replication lag, and low disk space. Use severity levels to avoid alert fatigue.
* Monitor oplog size and replication lag. This is useful for debugging replication issues and estimating time to resync a secondary.
**Don't Do This:**
* Rely solely on manual checks or logs.
* Ignore performance degradation or error logs.
**Why:** Monitoring provides visibility into the state of your MongoDB deployment, allowing you to proactively identify and resolve performance issues and avoid downtime.
**Example (Prometheus):**
"""yaml
# prometheus.yml
scrape_configs:
- job_name: 'mongodb'
static_configs:
- targets: ['mongodb-server:9000'] # Ensure this the MongoDB Exporter Port
"""
You will also need the MongoDB exporter, which collects MongoDB stats and serves them in a format Prometheus understands:
"""bash
wget https://github.com/percona/mongodb_exporter/releases/download/v0.38.0/mongodb_exporter-0.38.0.linux-amd64.tar.gz # Check for latest version
tar xvf mongodb_exporter-0.38.0.linux-amd64.tar.gz
cd mongodb_exporter-0.38.0.linux-amd64
./mongodb_exporter --mongodb.uri="mongodb://user:password@mongodb-server:27017"
"""
**Anti-pattern:** Only checking CPU usage or basic connectivity. Ensure you are monitoring replication lag, connection counts, slow query logs, and oplog sizes.
## 4. Security Hardening
### Standard
Implement security best practices to protect MongoDB instances.
**Do This:**
* Enable authentication and authorization using strong passwords.
* Use TLS/SSL encryption for communication.
* Restrict network access using firewalls.
* Regularly patch MongoDB software to address security vulnerabilities.
* Implement role-based access control (RBAC).
* Use field-level encryption for sensitive data.
* Audit log access and modifications.
**Don't Do This:**
* Use default credentials.
* Expose MongoDB instances directly to the internet without any security measures.
* Store sensitive data in plain text.
* Grant excessive privileges to users.
**Why:** Security measures protect your data from unauthorized access and prevent data breaches.
**Example (MongoDB Configuration):**
"""yaml
# mongod.conf
security:
authorization: enabled
net:
port: 27017
bindIp: 127.0.0.1,
tls:
mode: requireTLS
certificateKeyFile: /etc/ssl/mongodb.pem
setParameter:
enableLocalhostAuthBypass: false
"""
**Anti-pattern:** Relying solely on network security (firewalls) and ignoring authentication. Internal breaches are a major threat.
## 5. Backup and Restore
### Standard
Implement a robust backup and restore strategy for MongoDB.
**Do This:**
* Regularly back up your MongoDB data.
* Take consistent backups; for sharded clusters, use MongoDB Atlas backups or coordinated snapshots, since "mongodump" alone does not guarantee a consistent cluster-wide snapshot.
* Store backups offsite or in a separate data center.
* Test backup restoration procedures regularly.
* Use incremental backups to minimize backup time.
**Don't Do This:**
* Rely solely on manual snapshots.
* Store backups on the same server as the MongoDB instance.
* Fail to test the integrity and recoverability of backups.
**Why:** Backups protect against data loss from hardware failures, software bugs, or human errors.
**Example (Backup using "mongodump"):**
"""bash
mongodump --host <host> --port <port> --username <user> --password <password> --authenticationDatabase admin --out /path/to/backup
"""
**Example (Restore using "mongorestore"):**
"""bash
mongorestore --host <host> --port <port> --username <user> --password <password> --authenticationDatabase admin --drop /path/to/backup
"""
**Anti-pattern:** Having a backup schedule but never testing the restoration process. Regularly test recovering recent data.
## 6. Capacity Planning and Scaling
### Standard
Plan and scale your MongoDB deployment based on projected growth and performance requirements.
**Do This:**
* Monitor resource utilization and performance metrics.
* Estimate future growth.
* Scale horizontally by adding more shards to a sharded cluster or more secondaries to a replica set.
* Consider using MongoDB Atlas for managed scaling.
* Regularly review your infrastructure.
**Don't Do This:**
* Wait until performance suffers before scaling.
* Over-provision compute resources, wasting money on capacity you do not use.
**Why:** Scaling ensures that your MongoDB deployment can handle increasing workloads and maintain performance.
**Example (Scaling with MongoDB Atlas):** Use MongoDB Atlas UI or API to add more shards to a cluster.
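**Example (Scaling a Self-Managed Cluster - sketch):** For self-managed deployments, the equivalent operations can be run in "mongosh"; the host and replica set names below are placeholders.
"""javascript
// Add a new secondary to a replica set (run against the primary).
rs.add({ host: "mongodb-4.internal:27017", priority: 1, votes: 1 })

// Add a new shard to a sharded cluster (run against a mongos).
sh.addShard("shardRS2/shard2-a.internal:27017,shard2-b.internal:27017")
"""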
**Anti-pattern:** Reaching for vertical scaling (e.g., upgrading RAM) as the first response to load. Optimize queries, indexes, and schema first.
## 7. Schema Evolution and Management
### Standard
Manage schema changes in a controlled and documented manner.
**Do This:**
* Use schema validation and validation rules.
* Deploy schema changes using CI/CD.
* Maintain a record of schema changes.
* Use online schema changes where possible to avoid downtime.
**Don't Do This:**
* Apply schema changes directly in production.
* Make ad-hoc schema changes without documentation.
**Why:** Controlled schema evolution prevents data inconsistencies and application errors.
**Example (Schema Validation):**
"""javascript
db.createCollection("products", {
validator: {
$jsonSchema: {
bsonType: "object",
required: [ "name", "price", "quantity" ],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
price: {
bsonType: "double",
description: "must be a double and is required"
},
quantity: {
bsonType: "int",
description: "must be an integer and is required"
}
}
}
},
validationAction: "warn",
validationLevel: "moderate"
})
"""
**Anti-pattern:** Dropping entire collections and recreating them, which can lead to severe unreliability and data loss.
## 8. Networking
### Standard
Implement a strategy for networking between MongoDB instances, and also between application servers and the MongoDB cluster.
**Do This:**
* Use a virtual private cloud (VPC) to isolate MongoDB servers within your infrastructure.
* Use a firewall to restrict access to the MongoDB ports (default: 27017) from untrusted networks.
* Consider using a bastion host to limit SSH access to MongoDB instances.
* Ensure DNS resolution is reliable for all instances.
**Don't Do This:**
* Expose MongoDB ports directly to the public internet.
* Hard-code MongoDB credentials or keys on application servers; load them from a secrets manager or environment configuration instead.
**Why:** Network segmentation and firewalls are essential for security. Poor network setup is a common source of downtime.
**Example (Security Group):**
An example AWS Security Group configuration would only allow access from your application servers on the MongoDB port, and SSH access only from specific, trusted IPs.
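**Example (Terraform Security Group - sketch):** A hedged Terraform sketch of such a security group; the VPC ID, application security group ID, and trusted CIDR are placeholders for your own values.
"""terraform
resource "aws_security_group" "mongodb" {
  name   = "mongodb-sg"
  vpc_id = "vpc-xxxxxxxxxxxxx" # Replace with your VPC ID

  # MongoDB traffic only from the application tier's security group
  ingress {
    from_port       = 27017
    to_port         = 27017
    protocol        = "tcp"
    security_groups = ["sg-app-xxxxxxxxxxxxx"] # Replace with your app SG
  }

  # SSH only from a trusted management range (ideally via a bastion host)
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"] # Replace with your trusted range
  }

  # Allow outbound traffic so responses and OS updates work
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
"""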
**Anti-pattern:** Allowing servers to connect to the MongoDB instance but blocking the return (egress) traffic back to the calling server, for example with overly strict stateless network ACLs. Forgetting about return traffic is a common mistake.
## 9. Performance Tuning
### Standard
Optimize database and query performance using best practices.
**Do This:**
* Analyze slow query logs to identify performance bottlenecks.
* Use indexes where necessary.
* Optimize query shapes and use projections to return only necessary data.
* Review collection and index statistics with "db.collection.stats()".
* Monitor disk I/O and RAM usage.
* Tune connection pool settings.
**Don't Do This:**
* Create indexes without understanding query patterns.
* Guess at query behavior instead of checking "explain()" output.
* Ignore slow query warnings.
**Why:** Proper performance tuning ensures that your MongoDB deployment can handle the load efficiently and provide quick response times.
**Example (Index Creation):**
"""javascript
db.collection.createIndex( { "field": 1 } )
"""
**Anti-pattern:** Creating unnecessary indexes. Too many indexes can negatively impact write performance.
## 10. Automation and Tooling
### Standard
Use automation and tooling to streamline MongoDB operations.
**Do This:**
* Use scripts (e.g., Python, Bash) to automate repetitive tasks.
* Use configuration management tools (e.g., Ansible, Chef, Puppet) to manage server configurations.
* Utilize MongoDB's built-in automation features (e.g., Atlas automation or Ops Manager) to streamline routine tasks.
**Don't Do This:**
* Rely on manual, error-prone processes.
* Reinvent the wheel; use existing tools and libraries where possible.
**Why:** Automation reduces manual effort, improves accuracy, and enables faster incident response.
**Example (Automated User Creation Script - Python):**
"""python
import pymongo
def create_user(mongo_uri, db_name, username, password, roles):
client = pymongo.MongoClient(mongo_uri)
db = client[db_name]
db.command("createUser", username, pwd=password, roles=roles)
client.close()
if __name__ == "__main__":
mongo_uri = "mongodb://admin:password@localhost:27017/"
db_name = "mydatabase"
username = "myuser"
password = "mypassword"
roles = [{"role": "readWrite", "db": db_name}]
create_user(mongo_uri, db_name, username, password, roles)
print("User created successfully.")
"""
**Anti-pattern:** Not automating tasks such as backups, user creation, and index creation. These should be scripted.
By adhering to these Deployment and DevOps standards, you can ensure that your MongoDB deployments are reliable, secure, and optimized for performance, scalability, and maintainability.
# State Management Standards for MongoDB This document outlines the standards and best practices for managing application state with MongoDB. State management encompasses how data is stored, accessed, modified, and synchronized across different parts of an application. Effective state management ensures data consistency, improves application performance, and simplifies development workflows. This rule focuses specifically on principles applicable to MongoDB and differentiates between good code and exceptional solutions within the MongoDB ecosystem. ## 1. Overview of State Management in MongoDB Applications Managing state in MongoDB applications requires understanding how to leverage MongoDB's features to maintain data integrity and optimize application performance. Consider the following concepts: * **Data Modeling:** Designing schemas that accurately reflect the relationships and structure of your application's data. * **Atomicity:** Ensuring that operations are performed as a single, indivisible unit to prevent partial updates. * **Consistency:** Maintaining data integrity by enforcing constraints and validation rules. * **Isolation:** Preventing concurrent operations from interfering with each other. * **Durability:** Guaranteeing that once an operation is committed, it remains persistent even in the event of system failures. MongoDB provides the mechanisms for managing state directly, which includes transactions, schema validation, and change streams, but application-level state management requires judicious decisions about when and how to trigger these mechanisms. ## 2. Data Modeling and Schema Design Proper data modeling is fundamental to effective state management. A well-designed schema ensures data consistency, simplifies queries, and optimizes performance. ### 2.1. Standards for Schema Design * **Do This:** Use embedded documents for one-to-one and one-to-few relationships to reduce the need for joins and improve query performance. * **Why:** Embedding reduces the number of database operations required to retrieve related data. """javascript // Example: Embedding address information within a user document { _id: ObjectId(), username: "johndoe", email: "john.doe@example.com", address: { street: "123 Main St", city: "Anytown", zip: "12345" } } """ * **Do This:** Use references (dbrefs or manual references) for one-to-many and many-to-many relationships to avoid document growth and improve scalability. * **Why:** References allow you to link related documents without duplicating data. """javascript // Example: Using manual references to link a user to their orders // User document { _id: ObjectId("user123"), username: "johndoe", email: "john.doe@example.com" } // Order document { _id: ObjectId(), userId: ObjectId("user123"), // Reference to the user document orderDate: ISODate("2024-01-01T00:00:00Z"), items: [...] } """ *
# Performance Optimization Standards for MongoDB This document outlines coding standards and best practices for optimizing performance in MongoDB applications. These guidelines are designed to improve application speed, responsiveness, and resource utilization. They align with the latest MongoDB features and capabilities and aim to create maintainable, efficient, and scalable solutions. ## 1. Schema Design and Data Modeling ### 1.1. Choosing the Right Data Model * **Do This:** Carefully evaluate the one-to-many and many-to-many relationships in your data and choose the data model that best reflects the application's read and write patterns. Consider embedding, referencing, or a hybrid approach. * **Don't Do This:** Blindly normalize all data, which can lead to excessive joins (lookups) and poor performance in MongoDB. * **Why:** MongoDB excels when related data can be accessed in a single document. Minimizing the number of queries reduces latency. * **Example:** For a blog application, embed comments within the post document. """javascript // Embedded comments in a post document { _id: ObjectId("..."), title: "Optimizing MongoDB Performance", content: "...", comments: [ { author: "John Doe", text: "Great post!", date: ISODate("...") }, { author: "Jane Smith", text: "Very informative.", date: ISODate("...") } ] } """ ### 1.2. Data Size and Document Structure * **Do This:** Keep document sizes within reasonable limits (ideally, under 16MB). Avoid excessively large arrays or deeply nested structures. * **Don't Do This:** Store large binary files or multimedia content directly within the document. Use GridFS for these scenarios instead. * **Why:** Large documents can impact indexing performance and network transfer times. Extremely deep nesting can slow down query processing. * **GridFS Example:** Storing a large image file. """javascript // Uploading a file using GridFS const { GridFSBucket } = require('mongodb'); const fs = require('fs'); async function uploadFile(db, filePath, filename) { const bucket = new GridFSBucket(db, { bucketName: 'images' }); const uploadStream = bucket.openUploadStream(filename); fs.createReadStream(filePath).pipe(uploadStream); uploadStream.on('finish', () => { console.log('File uploaded successfully!'); }); } """ ### 1.3 Atomicity * **Do This:** Use transactions for operations that require atomicity across multiple documents, ensuring all changes are applied or none at all. * **Don't Do This:** Rely on application-level logic for atomicity, as this can lead to data inconsistencies. * **Why:** Transactions guarantee ACID properties, which are crucial for data integrity in complex operations. * **Example (MongoDB 4.0+):** """javascript const session = client.startSession(); try { session.startTransaction(); const coll1 = client.db("mydb").collection("inventory"); const coll2 = client.db("mydb").collection("customers"); await coll1.updateOne({ _id: 1 }, { $inc: { qty: -1 } }, { session }); await coll2.updateOne({ _id: 123 }, { $inc: { points: 10 } }, { session }); await session.commitTransaction(); console.log("Transaction committed successfully."); } catch (error) { await session.abortTransaction(); console.error("Transaction aborted due to error:", error); } finally { session.endSession(); } """ ## 2. Indexing Strategies ### 2.1. Index Selection and Creation * **Do This:** Create indexes on fields frequently used in queries, sort operations, and aggregations. Use the "explain()" method to analyze query performance and identify missing indexes. 
* **Don't Do This:** Create indexes indiscriminately, as each index adds overhead to write operations. Regularly review and remove unused indexes. * **Why:** Indexes significantly speed up query execution by allowing MongoDB to locate documents more quickly. * **Example:** Creating an index on the "userId" field for faster user lookups. """javascript db.collection('users').createIndex({ userId: 1 }); """ ### 2.2. Index Types * **Do This:** Use appropriate index types for your data and query patterns: * **Single Field Index:** Indexing a single field. * **Compound Index:** Indexing multiple fields (order matters!). * **Multikey Index:** Indexing array fields. * **Text Index:** For full-text search. * **Geospatial Index:** For geospatial queries. * **Don't Do This:** Rely solely on the default "_id" index for all queries. * **Why:** Different index types are optimized for specific query types. Choosing the correct index type maximizes performance. * **Example:** Creating a compound index for sorting and filtering. """javascript db.collection('products').createIndex({ category: 1, price: -1 }); // Sort by price descending within each category """ ### 2.3. Indexing Arrays * **Do This:** Use multikey indexes to efficiently query array fields. * **Don't Do This:** Underestimate the performance implications of querying arrays without proper indexing. * **Why:** Multikey indexes allow MongoDB to efficiently locate documents where the specified array field contains a specific value. * **Example:** Indexing the "tags" array in a blog post document. """javascript db.collection('posts').createIndex({ tags: 1 }); """ ### 2.4. Partial Indexes * **Do This:** Use partial indexes to index only a subset of documents based on a filter expression, reducing index size and improving write performance. * **Don't Do This:** Create indexes on all documents, even if a significant portion of them are rarely queried. * **Why:** Partial indexes optimize index size and write performance by excluding irrelevant documents. * **Example:** Creating a partial index on active users. """javascript db.collection('users').createIndex( { lastLogin: 1 }, { partialFilterExpression: { status: 'active' } } ); """ ### 2.5. Covered Queries * **Do This:** Strive for covered queries where MongoDB can retrieve all necessary data directly from the index without accessing the document itself. * **Don't Do This:** Assume that an index automatically covers a query; verify using "explain()". * **Why:** Covered queries are significantly faster because they eliminate the need for disk I/O. * **Example:** Considering a "products" collection with "category", "price", and "name" fields: """javascript db.collection('products').createIndex({ category: 1, price: 1, name: 1 }); // Covered query: only retrieves fields present in the index db.collection('products').find({ category: "electronics", price: { $lt: 100 } }, { projection: { category: 1, price: 1, name: 1, _id: 0 } }).explain("executionStats"); """ In the "explain" output, check for "coveredQuery" and "indexOnly" being true. ## 3. Query Optimization ### 3.1. Query Selectivity * **Do This:** Write queries that are highly selective, targeting a small subset of documents. * **Don't Do This:** Perform full collection scans with broad queries that return a large number of documents. * **Why:** Selective queries minimize the amount of data MongoDB needs to process, improving performance. * **Example:** Using specific criteria in a "find()" operation. 
"""javascript db.collection('orders').find({ userId: "123", status: "pending" }); """ ### 3.2. Projection * **Do This:** Use projection to return only the fields required by the application, reducing network traffic and memory usage. * **Don't Do This:** Retrieve the entire document ("{}") if only a few fields are needed. * **Why:** Projection reduces the amount of data transferred over the network and processed by the client. * **Example:** Retrieving only the "name" and "email" fields from a "users" collection. """javascript db.collection('users').find({ status: "active" }, { projection: { name: 1, email: 1, _id: 0 } }); """ ### 3.3. Limit and Skip * **Do This:** Use "limit()" to restrict the number of documents returned and "skip()" for pagination. Be mindful of the "skip()" performance implications with large offsets. Use more performant pagination methods such as range-based queries when possible. * **Don't Do This:** Use "skip()" with large offsets, as it can become inefficient, especially on large collections. * **Why:** "limit()" reduces the amount of data transferred, while "skip()" allows for pagination but becomes slow with large offsets as it still has to traverse the skipped records. * **Example:** Implementing pagination with "limit()" and "skip()". """javascript const page = 2; const pageSize = 10; db.collection('products') .find({}) .skip((page - 1) * pageSize) .limit(pageSize) .toArray(); """ * **Alternative Pagination with Range Queries (more efficient):** If you have a field that can be used for ordering (e.g., "_id", "createdAt"), you can use range queries for more efficient pagination, especially for large datasets: """javascript // First page db.collection('products').find({}).sort({ createdAt: 1 }).limit(pageSize).toArray(); // Subsequent pages - assuming you have stored the createdAt value of the last item of the previous page const lastCreatedAt = new Date('2024-01-01T12:00:00Z'); // Replace with the actual value. db.collection('products').find({ createdAt: { $gt: lastCreatedAt } }).sort({ createdAt: 1 }).limit(pageSize).toArray(); """ ### 3.4. Aggregation Pipeline Optimization * **Do This:** Structure aggregation pipelines to filter data as early as possible using "$match" to reduce the amount of data processed in subsequent stages. Use "$project" to reshape or reduce the size of documents as needed throughout the pipeline. Utilize indexes for stages that support them, particularly "$match" and "$sort". * **Don't Do This:** Perform expensive operations like "$unwind" or "$group" on large unfiltered datasets. Accumulate large amounts of data in memory within pipeline stages without reducing it effectively. * **Why:** Optimizing the order and operations in an aggregation pipeline can significantly reduce resource consumption and improve performance, especially for complex data transformations. Filtering early reduces the amount of data the pipeline needs to shuffle and process. 
* **Example:** Aggregating order data to calculate total sales per product category, optimized with early filtering: """javascript db.collection('orders').aggregate([ { $match: { // Filter early to reduce data processed in later stages orderDate: { $gte: new Date('2023-01-01'), $lt: new Date('2024-01-01') } } }, { $unwind: "$items" // Deconstruct the items array to process each item }, { $lookup: { // Enrich each item with product details from the products collection from: "products", localField: "items.productId", foreignField: "_id", as: "productDetails" } }, { $unwind: "$productDetails" // Deconstruct the productDetails array for access }, { $group: { // Group by product category to calculate total sales _id: "$productDetails.category", totalSales: { $sum: { $multiply: ["$items.quantity", "$productDetails.price"] } } } }, { $project: { // Reshape the output to show category and total sales category: "$_id", totalSales: 1, _id: 0 } }, { $sort: { totalSales: -1 } // Sort by total sales in descending order } ]).toArray(); """ * Adding an index "{ orderDate: 1 } " will improve performance when using the "$match" stage. ## 4. Data Access Patterns and Caching ### 4.1. Connection Pooling * **Do This:** Implement connection pooling to reuse database connections, reducing the overhead of establishing new connections for each operation. Configure an adequate pool size based on the application's concurrency. * **Don't Do This:** Create a new database connection for every operation, as this will significantly increase latency and resource consumption. * **Why**: Establishing database connections is a resource-intensive process. Connection pooling allows applications to efficiently reuse existing connections. * **Example (Node.js):** """javascript const { MongoClient } = require('mongodb'); const uri = "mongodb://user:password@host:port/database"; // Replace with your connection string const client = new MongoClient(uri, { maxPoolSize: 100, // Adjust based on needs minPoolSize: 10, // Other pool options per driver }); async function run() { try { await client.connect(); const db = client.db("mydb"); // ... perform operations using 'db' } finally { // Ensures that the client will close when you finish/error // await client.close(); // Keep the connection open across multiple function calls across app lifetime. } } run().catch(console.dir); """ ### 4.2. Caching Strategies * **Do This:** Implement caching at various levels (application, database, or dedicated caching layer like Redis) to store frequently accessed data. Use appropriate cache invalidation strategies to ensure data consistency. Implement TTL (Time-To-Live) based caching for data that becomes stale after a certain period. * **Don't Do This:** Cache data indefinitely without considering data changes or consistency requirements. Rely solely on the database for all data access without leveraging caching. * **Why:** Caching reduces the load on the database and improves application response times by serving data from memory. 
* **Example (basic in-memory caching in Node.js):** """javascript const cache = new Map(); async function getUser(userId) { if (cache.has(userId)) { console.log("Serving from cache"); return cache.get(userId); } const user = await db.collection('users').findOne({ _id: userId }); if (user) { cache.set(userId, user); console.log("Fetched from DB and cached"); } return user; } """ * **Example (using a TTL):** """javascript const ttlCache = require( 'ttl-cache' ) const myCache = new ttlCache({ ttl: 60 * 1000 }) //60 seconds async function getUser(userId) { if (myCache.has(userId)) { console.log("Serving from TTL cache"); return myCache.get(userId); } const user = await db.collection('users').findOne({ _id: userId }); if (user) { myCache.set(userId, user); console.log("Fetched from DB and cached"); } return user; } """ ### 4.3. Read Preference * **Do This:** Configure read preference settings (e.g., "primaryPreferred", "secondaryPreferred") based on the application's read consistency requirements and deployment architecture. * **Don't Do This:** Always read from the primary, especially in read-heavy applications, which can overload the primary node. * **Why:** Read preference allows you to distribute read operations across replica set members, improving read scalability and reducing load on the primary. * **Example (Node.js driver):** """javascript const { MongoClient, ReadPreference } = require('mongodb'); const uri = "mongodb://user:password@host1:port,host2:port/?replicaSet=myReplicaSet"; async function readFromSecondary(db) { const collection = db.collection('myCollection').withReadPreference(ReadPreference.SECONDARY_PREFERRED); const doc = await collection.findOne({}); return doc; } """ ## 5. Monitoring and Profiling * **Do This:** Regularly monitor MongoDB performance metrics using tools like MongoDB Atlas Performance Advisor, "mongostat", "mongotop", or the MongoDB Profiler. Enable the database profiler to identify slow-running queries and operations. * **Don't Do This:** Neglect monitoring and profiling, as this can lead to unnoticed performance bottlenecks. * **Why:** Monitoring and profiling provide valuable insights into database performance, allowing you to identify and address performance issues proactively. * **Example (enabling the MongoDB Profiler):** """javascript db.setProfilingLevel(2); // Log all operations slower than the slowms threshold (default 100ms). Level 0 is off, level 1 logs slow operations """ * **Example (using Atlas Performance Advisor)-** Atlas provides query suggestions, based on your workload, to improve performance. ## 6. Hardware and Configuration ### 6.1. Storage Engine * **Do This:** Use the WiredTiger storage engine, which is the default and generally recommended storage engine for most workloads due to its compression and concurrency features. * **Don't Do This:** Continue to use the older MMAPv1 storage engine unless there is a specific reason to do so, as it lacks the performance optimizations of WiredTiger. * **Why**: WiredTiger provides significant improvements in performance and storage efficiency over MMAPv1. * The WiredTiger storage engine supports document-level concurrency control, compression, and encryption at rest. ### 6.2. Memory and Disk Configuration * **Do This:** Provide sufficient RAM to accommodate the working set (frequently accessed data) and indexes. Use fast storage (SSD) for optimal performance. * **Don't Do This:** Underestimate memory and disk requirements, as this can lead to disk I/O bottlenecks and poor performance. 
* **Why:** Adequate memory and fast storage are crucial for minimizing disk I/O and maximizing performance. ### 6.3. Sharding * **Do This:** Consider sharding for very large datasets or high-write workloads to distribute data and load across multiple servers. Choose the shard key carefully based on query patterns and data distribution. * **Don't Do This:** Implement sharding prematurely without assessing the need for it, as it adds complexity to the architecture. * **Why:** Sharding allows you to scale horizontally by distributing data across multiple servers. ## 7. Security Considerations * **Do This:** Enable authentication, authorization, and encryption to protect sensitive data. Follow MongoDB's security best practices to minimize the risk of security vulnerabilities. Rotate database credentials regularly. * **Don't Do This:** Expose MongoDB instances directly to the internet without proper security measures. Store sensitive data in plain text. * **Why:** Security is paramount, and failure to secure MongoDB can result in data breaches and other serious consequences. ## 8. Language and Version * **Do This:** Use the latest stable version of MongoDB and the official drivers for your programming language of choice. Stay informed about new features and performance improvements in each release. * **Don't Do This:** Stay on outdated versions of MongoDB or drivers, as you will miss out on performance optimizations and security fixes. * **Why:** Newer versions of MongoDB and drivers often include performance optimizations and new features that can significantly improve application performance. Staying current also ensures access to the latest security patches.
# Security Best Practices Standards for MongoDB This document outlines the security best practices for MongoDB development. Following these standards will help protect against common vulnerabilities, promote secure coding patterns, and ensure the overall security of your MongoDB applications. ## 1. Authentication and Authorization ### 1.1. Enable Authentication and Authorization **Standard:** Always enable authentication and authorization in your MongoDB deployments. Relying on default settings without authentication is a significant security risk. * **Do This:** Enable authentication and authorization using the "--auth" option in "mongod" or "mongos" configurations or within the configuration file. * **Don't Do This:** Never run MongoDB instances without authentication enabled, especially in production environments. **Why:** Unauthenticated access allows anyone to read or modify data. Authentication ensures that only authorized users can access the MongoDB instance. **Code Example (Configuration File):** """yaml security: authorization: enabled """ **Anti-Pattern:** Forgetting to enable authentication after initial setup. ### 1.2. Use Strong Authentication Mechanisms **Standard:** Employ strong authentication mechanisms and avoid weak or deprecated methods. * **Do This:** Use SCRAM-SHA-256 as the default authentication mechanism and use x.509 certificate based authentication for enhanced security. For user management via "mongosh", ensure you're connecting with a secure and encrypted connection. Consider using MongoDB Atlas for easier credential management. * **Don't Do This:** Avoid using the deprecated MONGODB-CR authentication mechanism. Never store passwords in plain text. **Why:** SCRAM-SHA-256 provides better protection against password cracking compared to older mechanisms. x.509 certificates establish trust at the network level. **Code Example (Creating a User with SCRAM-SHA-256):** """javascript // Using mongosh db.createUser( { user: "myUser", pwd: passwordPrompt(), // Or a securely generated password roles: [ { role: "readWrite", db: "mydb" } ], mechanisms: [ "SCRAM-SHA-256" ] } ) """ **Anti-Pattern:** Using default or easily guessable passwords. ### 1.3. Role-Based Access Control (RBAC) **Standard:** Implement RBAC to control access to data and operations within the database. * **Do This:** Define granular roles with specific privileges and assign users to these roles based on their responsibilities. Use built-in roles when appropriate or create custom roles for specialized needs. * **Don't Do This:** Avoid granting overly permissive roles (e.g., "dbOwner") to users who only require limited access. **Why:** RBAC limits the potential damage from compromised accounts and enforces the principle of least privilege. **Code Example (Creating a Custom Role):** """javascript db.createRole( { role: "reportReader", privileges: [ { resource: { db: "reports
# Core Architecture Standards for MongoDB This document outlines the core architectural standards to be followed when developing and maintaining MongoDB applications. These standards are designed to promote maintainability, performance, security, and scalability, leveraging the latest features and best practices of MongoDB. Following these guidelines will ensure consistency across projects and facilitate collaboration, especially when using AI coding assistants. ## 1. Overall Architectural Principles ### 1.1 Monorepo vs. Polyrepo **Standard:** Favor a monorepo structure for tightly coupled microservices or components within a single product or domain. Use polyrepos for independent services or libraries with less frequent interaction. * **Do This:** Implement a monorepo if your application consists of several microservices that frequently interact and are deployed together. * **Don't Do This:** Use a polyrepo if services have intricate dependencies managed and released together. **Why:** * **Monorepo:** Simplifies dependency management, code reuse, and coordinated refactoring. Facilitates atomic changes across multiple components. * **Polyrepo:** Provides clear ownership and isolation for independent components, reducing the risk of unintended side effects during development. **Example:** * Monorepo Structure (Example for a social media app): """ / ├── services/ │ ├── user-service/ │ │ ├── src/ │ │ ├── Dockerfile │ ├── post-service/ │ │ ├── src/ │ │ ├── Dockerfile │ ├── notification-service/ │ │ ├── src/ │ │ ├── Dockerfile ├── libs/ │ ├── common-utils/ │ │ ├── src/ """ * Polyrepo Structure (Three independent repositories): * "user-service" repository * "post-service" repository * "notification-service" repository ### 1.2 Layered Architecture **Standard:** Structure applications into well-defined layers (e.g., presentation, application/service, domain/business logic, data access/persistence). * **Do This:** Separate concerns by clearly defining the responsibility of each layer. Use dependency injection to promote loose coupling. * **Don't Do This:** Create monolithic blocks of code that mix presentation logic with database interactions. **Why:** Layered architecture enhances maintainability, testability, and reusability. Changes in one layer have minimal impact on other layers. 
**Example:** """python # data_access_layer.py from pymongo import MongoClient class UserRepository: def __init__(self, connection_string, database_name): self.client = MongoClient(connection_string) self.db = self.client[database_name] self.users = self.db.users def get_user_by_id(self, user_id): return self.users.find_one({"_id": user_id}) # business_logic_layer.py class UserService: def __init__(self, user_repository): self.user_repository = user_repository def get_user_profile(self, user_id): user = self.user_repository.get_user_by_id(user_id) if user: return { "user_id": str(user["_id"]), "username": user["username"], "email": user["email"] } else: return None # presentation_layer.py (e.g., Flask route) from flask import Flask, jsonify # Assuming data_access_layer and business_logic_layer are in the same dir from data_access_layer import UserRepository from business_logic_layer import UserService app = Flask(__name__) # Configuration (replace with your actual values) CONNECTION_STRING = "mongodb://localhost:27017/" DATABASE_NAME = "mydatabase" user_repository = UserRepository(CONNECTION_STRING, DATABASE_NAME) user_service = UserService(user_repository) @app.route("/users/<user_id>", methods=["GET"]) def get_user(user_id): user_profile = user_service.get_user_profile(user_id) if user_profile: return jsonify(user_profile) else: return jsonify({"message": "User not found"}), 404 if __name__ == "__main__": app.run(debug=True) """ ### 1.3 Modular Design **Standard:** Decompose the system into independent, reusable modules. * **Do This:** Create modules with well-defined interfaces and minimal dependencies on other modules. * **Don't Do This:** Build tightly coupled modules with extensive shared state. **Why:** Modularity promotes code reuse, reduces complexity, and simplifies maintenance. **Example:** * Modular Python structure: """ / ├── modules/ │ ├── authentication/ │ │ ├── __init__.py │ │ ├── auth_service.py │ │ ├── auth_repository.py │ ├── user_management/ │ │ ├── __init__.py │ │ ├── user_service.py │ │ ├── user_repository.py """ ### 1.4 Event-Driven Architecture **Standard:** Employ event-driven architecture for decoupled communication between services, using message queues or similar mechanisms. * **Do This:** Use message queues (RabbitMQ, Kafka, or MongoDB change streams) to asynchronously communicate between services. * **Don't Do This:** Rely on direct synchronous calls between services that create tight coupling. **Why:** Event-driven architecture enables scalability, fault tolerance, and flexible integration. **Example:** """python # Producer (User Service) from pymongo import MongoClient client = MongoClient("mongodb://localhost:27017/") db = client["mydatabase"] users = db["users"] def create_user(user_data): result = users.insert_one(user_data) user_id = str(result.inserted_id) # Simulate sending an event print(f"UserCreated Event: User ID - {user_id}") # In real use-case, publish to a message queue return user_id # Consumer (Notification Service - simulates consuming a change event) def handle_user_created_event(user_id): print(f"Notification Service: Sending welcome email to user with ID: {user_id}") # Simulate a user creation user_id = create_user({"username": "testuser", "email": "test@example.com"}) # Simulate the consumption of the event. 
This would actually be triggered async by the queue/change stream handle_user_created_event(user_id) # Example using MongoDB Change Streams as a consumer from pymongo import MongoClient client = MongoClient("mongodb://localhost:27017/") db = client["mydatabase"] users = db["users"] # Start a change stream with users.watch() as stream: for change in stream: if change['operationType'] == 'insert': user_id = str(change['documentKey']['_id']) print(f"Received insert operation for user ID: {user_id}") # Send welcome email to the user. print(f"Simulating email to user: {user_id}") # The "with users.watch() as stream:" block maintains the connection open # and continues to listen for changes indefinitely unless interrupted or encounter an unrecoverable error # Simulate user creation in another terminal or through a different process to trigger change stream events. """ ### 1.5 Microservices Architecture Considerations **Standard:** Favor Microservices (with Bounded Contexts) for isolating failure domains, scaling specific features independently, and enabling autonomous team development. * **Do This:** Design microservices with clear boundaries based on business domains. Employ lightweight communication protocols like REST or gRPC. * **Don't Do This:** Create large, monolithic services that combine multiple unrelated functions, defeating the purpose of the microservices approach. **Why:** Microservices enable independent deployment, scaling, and technology choices for each service. **Example:** * Microservice Architecture (Ordering Service Example): """ / ├── order-service/ │ ├── src/ │ ├── Dockerfile │ ├── api.py # REST endpoints for order management │ ├── models.py # Order-related data models │ ├── order_processor.py # Business logic for order processing │ ├── requirements.txt / """ Each microservice would have its own corresponding repository and deployment pipeline. The "order-service" might interact with "payment-service" and "shipping-service" via REST APIs or message queues. ## 2. MongoDB Specific Architecture ### 2.1 Schema Design **Standard:** Schema design should prioritize query patterns. Embed related data when read together frequently. Use references for less frequently accessed data. * **Do This:** Embed addresses within customer documents for frequently accessed information. Reference product details in order documents. * **Don't Do This:** Normalize data to the extreme, causing unnecessary joins. Embed excessive array data leading to document growth issues. **Why:** Optimized schema design is crucial for MongoDB performance, minimizing disk I/O and network traffic. **Example:** """json // Embedded: { "_id": ObjectId("..."), "customer_name": "John Doe", "address": { "street": "123 Main St", "city": "Anytown", "zip": "12345" }, "orders": [ //Keep "orders" array reasonably small. { "order_id": ObjectId("..."), "product_id": ObjectId("..."), "quantity": 2 } ] } // Referenced: { "_id": ObjectId("..."), "customer_name": "John Doe", "address_id": ObjectId("...") // Reference to address document } // Separate Address document: { "_id": ObjectId("..."), "street": "123 Main St", "city": "Anytown", "zip": "12345" } """ ### 2.2 Data Modeling Patterns **Standard:** Utilize data modeling patterns such as the Polymorphic pattern, Attribute pattern, and Bucket pattern to optimize data storage and retrieval. * **Do This:** Use the Polymorphic pattern to store different types of products within the same collection. Employ the Bucket pattern to group time-series data into manageable chunks. 
* **Don't Do This:** Avoid using modeling patterns inappropriately, such as applying the Bucket pattern to non-time-series data. **Why:** Data modeling patterns improve query efficiency and accommodate evolving data structures. **Examples:** * **Polymorphic Pattern:** """json // Products Collection [ { "_id": ObjectId("654321abbced123..."), "productType": "Book", "title": "The MongoDB Handbook", "author": "John Doe", "isbn": "123-4567890" }, { "_id": ObjectId("987654zyxwvu321..."), "productType": "DVD", "title": "MongoDB for Beginners", "director": "Jane Smith", "runtime": 120 } ] """ * **Bucket Pattern (Time Series data for sensor readings)** """json // Reads Collection with the "bucket" field [ { "_id": ObjectId(), "bucket": "2023-11-01", "sensorId": "sensor123", "readings": [ { "timestamp": ISODate("2023-11-01T10:00:00Z"), "value": 22.5 }, { "timestamp": ISODate("2023-11-01T10:01:00Z"), "value": 22.6 } ] }, { "_id": ObjectId(), "bucket": "2023-11-02", "sensorId": "sensor123", "readings": [ { "timestamp": ISODate("2023-11-02T10:00:00Z"), "value": 22.7 }, { "timestamp": ISODate("2023-11-02T10:01:00Z"), "value": 22.8 } ] } ] """ ### 2.3 Indexing Strategy **Standard:** Create indexes to support common query patterns and optimize performance. Follow the ESR (Equality, Sort, Range) rule when defining compound indexes. * **Do This:** Create indexes on fields used in "find()", "sort()", and range queries. Consider using compound indexes when filtering on multiple fields. * **Don't Do This:** Over-index collections which can degrade write performance. Create indexes on fields that are rarely queried. **Why:** Proper indexing significantly reduces query latency and resource consumption. **Examples:** """javascript // Single field index db.collection.createIndex( { "field1": 1 } ) // Compound index (ESR rule) db.collection.createIndex( { "equalityField": 1, "sortField": 1, "rangeField": 1 } ) // Text index db.collection.createIndex( { "field1": "text" } ) """ ### 2.4 Aggregation Pipeline **Standard:** Leverage the aggregation pipeline for complex data transformations and reporting tasks. Optimize pipelines by using indexes efficiently. * **Do This:** Use "$match" early in the pipeline to reduce the amount of data processed. Utilize "$project" to reshape documents and remove unnecessary fields. * **Don't Do This:** Run complex aggregations without considering performance. Avoid using "$lookup" excessively, which can be slow for large datasets (consider denormalization instead). **Why:** The aggregation pipeline provides powerful data processing capabilities directly within MongoDB. **Example:** """javascript db.orders.aggregate([ { $match: { //Stage 1: Filter using an index "status": "active", "order_date": { $gte: ISODate("2023-01-01T00:00:00Z") } } }, { $lookup: { // Stage 2: Join with products collection. Use an index on "products.product_id" from: "products", localField: "product_id", foreignField: "_id", as: "product" } }, { $unwind: "$product" //Stage 3: Deconstruct the product array }, { $group: { // Stage 4: Group by customer and sum the order values _id: "$customer_id", total_spent: { $sum: { $multiply: [ "$product.price", "$quantity" ] } } } }, { $sort: { total_spent: -1 } //Stage 5: Sort by total spent } ]) """ ### 2.5 Change Streams **Standard:** Effectively utilize change streams to react to real-time data changes and build reactive applications. Configure streams according to your application's needs. 
* **Do This:** Use change streams for auditing, real-time analytics, and triggering notifications based on data modifications. Filter events to reduce overhead. * **Don't Do This:** Neglect error handling within the change stream listener. Overload the change stream with unnecessary event processing. **Example:** """python from pymongo import MongoClient client = MongoClient("mongodb://localhost:27017/") db = client["mydatabase"] collection = db["mycollection"] with collection.watch() as stream: for change in stream: print(f"Change detected: {change}") # Process the change event (e.g., update a cache, send a notification) #To watch only specific events resume_token = None #start from the current point try: with collection.watch(resume_after=resume_token, full_document='updateLookup') as stream: for change in stream: resume_token = stream.resume_token if change['operationType'] == 'update': print(f"Partial Update detected: {change['updateDescription']}") elif change['operationType'] == 'insert': print(f"Insert detected: {change['fullDocument']}") except Exception as e: print(f"Change stream error: {e}") """ ### 2.6 Transactions **Standard:** Use transactions when atomicity is required across multiple operations or documents. Design schemas to minimize the need for complex transactions. * **Do This:** Use multi-document transactions to ensure data consistency in critical operations, such as transferring funds between accounts. * **Don't Do This:** Overuse transactions, which can impact performance. Avoid long-running transactions that hold locks for extended periods. **Example:** """python from pymongo import MongoClient, TransactionOptions client = MongoClient("mongodb://localhost:27017/") db = client["mydatabase"] accounts = db["accounts"] def transfer_funds(from_account_id, to_account_id, amount): with client.start_session() as session: def callback(session): from_account = accounts.find_one({"_id": from_account_id}, session=session) to_account = accounts.find_one({"_id": to_account_id}, session=session) if not from_account or not to_account or from_account["balance"] < amount: raise ValueError("Insufficient funds or invalid accounts") accounts.update_one({"_id": from_account_id}, {"$inc": {"balance": -amount}}, session=session) accounts.update_one({"_id": to_account_id}, {"$inc": {"balance": amount}}, session=session) return True # Indicate success try: session.with_transaction(callback, read_concern=ReadConcern('snapshot'), write_concern=WriteConcern('majority')) print("Transaction completed successfully.") except Exception as e: print(f"Transaction failed: {e}") # Example Usage transfer_funds("account1", "account2", 100) """ ## 3. Technology Stack & Tooling ### 3.1 ODM/ORM Libraries **Standard:** Utilize ODM/ORM libraries like Mongoose (Node.js), MongoEngine (Python) or Morphia (Java) but understand their performance implications. * **Do This:** Use these libraries to simplify data validation, schema management, and object mapping. * **Don't Do This:** Neglect performance optimization. Be mindful of how the ORM translates queries into MongoDB operations. **Why:** These simplify interactions with MongoDB and promote structured coding. 
**Example (Mongoose - Javascript/Node.js)**: """javascript const mongoose = require('mongoose'); // Define a schema const userSchema = new mongoose.Schema({ username: { type: String, required: true, unique: true }, email: { type: String, required: true }, age: { type: Number, min: 18, max: 120 } }); // Create a model from the schema const User = mongoose.model('User', userSchema); // Example usage const newUser = new User({ username: 'johndoe', email: 'john.doe@example.com', age: 30 }); newUser.save() .then(() => console.log('User created')) .catch(err => console.error(err)); """ ### 3.2 Connection Pooling **Standard:** Implement connection pooling for efficient database access. * **Do This:** Configure connection pooling in your MongoDB driver to reuse connections and reduce overhead. Control the maximum and minimum pool sizes. * **Don't Do This:** Open and close connections frequently, which can drain resources and slow down performance. **Why:** Connection pooling minimizes overhead and utilizes efficient performance. **Example (Python with PyMongo):** """python from pymongo import MongoClient # Configure connection pooling client = MongoClient("mongodb://localhost:27017/", maxPoolSize=50, minPoolSize=10) db = client["mydatabase"] collection = db["mycollection"] # The client automatically manages the connection pool """ ### 3.3 Monitoring and Logging **Standard:** Implement robust monitoring, logging and tracing solutions. * **Do This:** Use MongoDB Atlas, or tools like Grafana, Prometheus or ELK stack for monitoring key metrics. Include detailed logging in your services to track errors and performance. * **Don't Do This:** Ignore database performance metrics. Fail to log errors or slow queries. **Why:** Monitoring allows you to pinpoint potential performance issues or security problems. **Example (Logging slow queries - Javascript/Node.js):** """javascript // Enable profiler to log slow queries (for development/debugging - use with caution on production) db.setProfilingLevel(1, 100); // Log queries slower than 100ms // Retrieve slow queries db.system.profile.find({ millis : { $gt : 100 } }).sort( { ts : -1 } ).limit( 10 ) // Proper logging using a library like Winston/Bunyan is recommended const logger = require('winston'); logger.log('info', 'Query executed', { query: 'db.collection.find({})', duration: 120 }); """ These standards provide a robust foundation for building reliable, scalable, and secure MongoDB applications, improving code clarity, and facilitating easier maintenance by development teams. They should be enforced via code reviews, automated linters, and regular training. AI tools should be configured to adhere to these standards.
These standards provide a robust foundation for building reliable, scalable, and secure MongoDB applications, improving code clarity and facilitating easier maintenance by development teams. They should be enforced via code reviews, automated linters, and regular training. AI tools should be configured to adhere to these standards.

# Component Design Standards for MongoDB

This document outlines the component design standards for MongoDB development. The goal is to promote the creation of reusable, maintainable, and performant components within MongoDB applications. These standards apply specifically to interactions with MongoDB, including schema design, query construction, data access, and aggregation pipelines. The best practices and modern approaches discussed here are based on the latest versions of MongoDB.

## I. General Principles of Component Design

Before diving into MongoDB-specific considerations, it's essential to establish general principles for component design. These principles promote modularity, reusability, and maintainability, which are crucial for building robust applications.

### A. Single Responsibility Principle (SRP)

* **Do This:** Ensure that each component has one, and only one, reason to change. For database interactions, this might mean a component is solely responsible for accessing or manipulating a specific collection or a defined subset of fields within a document.
* **Don't Do This:** Avoid creating "god" components that handle multiple unrelated tasks. This leads to tight coupling and makes the component difficult to understand, test, and modify. Also avoid unnecessary abstraction upfront; adhere to YAGNI ("You Ain't Gonna Need It") and DRY ("Don't Repeat Yourself").
* **Why:** SRP reduces complexity and improves maintainability. Changes in one area are less likely to affect other parts of the system. When creating components focused on database operations, SRP helps isolate issues related to data access and manipulation.

### B. Open/Closed Principle (OCP)

* **Do This:** Design components that are open for extension but closed for modification. Achieve this through interfaces, abstract classes, or configuration, not by directly modifying source code.
* **Don't Do This:** Directly modify the core logic of a component to add new functionality. This can introduce bugs and makes it harder to track changes and revert to previous versions.
* **Why:** OCP allows you to add new features without risking the stability of existing code. In a MongoDB context, this could mean using a configuration-driven approach to define query parameters or schema validation rules without altering the core data access logic.

### C. Liskov Substitution Principle (LSP)

* **Do This:** Ensure that subtypes (derived classes or implementations) of a component can be used interchangeably with their base type without altering the correctness of the program.
* **Don't Do This:** Create subtypes that violate the expectations of the base type. This can lead to unexpected behavior and runtime errors.
* **Why:** LSP ensures that polymorphism works as expected and that substituting one component for another does not break the system. In data access patterns, if you define an interface for data retrieval, all implementations of that interface should behave predictably.

### D. Interface Segregation Principle (ISP)

* **Do This:** Design interfaces that are specific to the needs of the client. Avoid forcing clients to depend on methods they don't use.
* **Don't Do This:** Create large "fat" interfaces that expose a wide range of functionality to all clients.
* **Why:** ISP reduces coupling and improves flexibility. Components only depend on the methods they need, making it easier to change or replace individual components without affecting others. In MongoDB, each interface should define only the specific database operations that a given component needs.
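To make ISP concrete, here is a minimal Python sketch. The interface, collection, and field names are illustrative assumptions: a reporting component depends only on a narrow read contract, while other components can depend on a write contract, so neither relies on operations it shouldn't use.

"""python
from abc import ABC, abstractmethod
from pymongo import MongoClient

class UserReader(ABC):
    # Narrow read-only contract for components that only query users
    @abstractmethod
    def find_by_status(self, status): ...

class UserWriter(ABC):
    # Narrow write contract for components that only create users
    @abstractmethod
    def insert(self, user): ...

class MongoUserRepository(UserReader, UserWriter):
    # One implementation can satisfy both contracts; clients depend on only one
    def __init__(self, collection):
        self._collection = collection

    def find_by_status(self, status):
        return list(self._collection.find({"status": status}))

    def insert(self, user):
        return self._collection.insert_one(user).inserted_id

# A reporting component receives only the UserReader contract
def count_active_users(reader: UserReader) -> int:
    return len(reader.find_by_status("active"))

if __name__ == "__main__":
    repo = MongoUserRepository(MongoClient("mongodb://localhost:27017/")["mydatabase"]["users"])
    print(count_active_users(repo))
"""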
### E. Dependency Inversion Principle (DIP)

* **Do This:** High-level modules should not depend on low-level modules; both should depend on abstractions. Abstractions should not depend on details; details should depend on abstractions.
* **Don't Do This:** Allow high-level modules to depend directly on low-level modules. This creates tight coupling and makes it difficult to test or replace the low-level modules.
* **Why:** DIP promotes loose coupling and improves testability. By depending on abstractions, components become more flexible and easier to adapt to changing requirements. In MongoDB scenarios, this could entail using repositories or data access objects (DAOs) that mediate between the rest of the application and the MongoDB driver.

## II. MongoDB-Specific Component Design

Here, we apply the general component design principles to the specifics of MongoDB development.

### A. Schema Design

* **Do This:** Design schemas that align with your application's data access patterns, querying needs, and consistency requirements. Use embedded documents, arrays (queried with operators such as "$elemMatch"), and denormalization strategically to optimize read performance and reduce the need for joins; a short embedded-document sketch follows the validation example below. Use schema validation to enforce document structure and data types. Consider shard keys early in the design process if sharding is anticipated.
* **Don't Do This:** Create overly normalized schemas that require numerous joins or inefficient queries. Design schemas that simply mirror relational database designs. Over-rely on schema validation to enforce application-level business rules.
* **Why:** Effective schema design directly impacts query performance, storage efficiency, and overall application scalability. Schema validation ensures data integrity and reduces errors. A well-designed schema enables efficient data access and manipulation, reduces the need for complex aggregation pipelines, and simplifies code.

"""javascript
// Example: Schema validation
db.createCollection("contacts", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["phone", "name", "age", "status"],
      properties: {
        phone: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        age: {
          bsonType: "int",
          minimum: 0,
          maximum: 120,
          description: "must be an integer in [0, 120] and is required"
        },
        status: {
          enum: ["Unknown", "Incomplete", "Complete"],
          description: "can only be one of the enum values and is required"
        }
      }
    }
  },
  validationLevel: "moderate",
  validationAction: "warn"
})
"""
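The embedded-document sketch referenced above: a hypothetical orders collection keeps each order's line items inside the order document, so the common "show the order" access pattern needs a single read and no join, and "$elemMatch" queries inside the embedded array. The collection and field names are illustrative assumptions.

"""python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
orders = client["mydatabase"]["orders"]

# Denormalized order: line items are embedded, so rendering an order takes one read
orders.insert_one({
    "orderId": "ORD-1001",
    "customer": {"name": "Jane Doe", "email": "jane.doe@example.com"},
    "items": [
        {"sku": "SKU-1", "qty": 2, "price": 9.99},
        {"sku": "SKU-2", "qty": 1, "price": 24.50},
    ],
    "status": "pending",
})

# $elemMatch matches a single array element that satisfies all conditions at once
match = orders.find_one({
    "items": {"$elemMatch": {"sku": "SKU-1", "qty": {"$gte": 2}}}
})
print(match["orderId"] if match else "no match")
"""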
### B. Query Construction

* **Do This:** Use the MongoDB query API effectively to retrieve data efficiently. Utilize indexes to speed up queries. Construct queries programmatically to avoid string concatenation and potential injection vulnerabilities. Leverage projection to retrieve only the necessary fields. Use aggregation pipelines for complex data transformations and analytics. Use "explain()" to view the query plan and identify performance bottlenecks.
* **Don't Do This:** Construct queries using string concatenation, which can lead to NoSQL injection vulnerabilities. Over-index collections, as each index adds overhead to write operations. Retrieve all fields from documents when only a subset is needed. Neglect aggregation pipelines for reporting and analytics.
* **Why:** Efficient query construction is crucial for application performance. Indexes can dramatically speed up queries, while projections reduce network traffic and memory usage. Aggregation pipelines enable powerful data analysis capabilities directly within the database. Avoiding manual string construction for queries prevents security vulnerabilities.

"""javascript
// Example: Programmatic query construction with projection
const query = { status: "active", "profile.age": { $gt: 18 } };
const projection = { _id: 0, name: 1, email: 1, "profile.age": 1 };

db.collection('users').find(query, { projection: projection }).toArray()
  .then(users => {
    console.log(users);
  })
  .catch(err => {
    console.error(err);
  });
"""

### C. Data Access Objects (DAOs) and Repositories

* **Do This:** Implement DAOs or repositories to abstract data access logic from the rest of the application. Define interfaces for DAOs/repositories to promote loose coupling and testability. Use dependency injection (DI) to provide DAOs/repositories to consuming components. Handle connection management (connecting and disconnecting) within the DAOs/repositories, and rely on MongoDB's built-in connection pooling.
* **Don't Do This:** Embed data access logic directly within business logic components. Create tight coupling between business logic and MongoDB driver code. Manually manage database connections in multiple places throughout the application, circumventing the driver's connection pooling.
* **Why:** DAOs and repositories provide a layer of abstraction between the application and the database, making it easier to test, maintain, and evolve the system. They centralize data access logic, enforce consistency, and promote code reuse. DI enables loose coupling and simplifies unit testing.

"""java
// Imports for the modern MongoDB Java driver
import static com.mongodb.client.model.Filters.eq;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.types.ObjectId;

import java.util.ArrayList;
import java.util.List;

// Example: DAO interface (Java)
public interface UserDAO {
    User findById(String id);
    List<User> findByStatus(String status);
    void save(User user);
    void delete(String id);
}

// Example: DAO implementation (Java)
public class MongoDBUserDAO implements UserDAO {

    private final MongoCollection<User> userCollection;

    public MongoDBUserDAO(MongoClient mongoClient, String databaseName, String collectionName) {
        MongoDatabase database = mongoClient.getDatabase(databaseName);
        this.userCollection = database.getCollection(collectionName, User.class); // Assuming you have a User class
    }

    @Override
    public User findById(String id) {
        return userCollection.find(eq("_id", new ObjectId(id))).first();
    }

    @Override
    public List<User> findByStatus(String status) {
        return userCollection.find(eq("status", status)).into(new ArrayList<>());
    }

    @Override
    public void save(User user) {
        if (user.getId() == null) {
            user.setId(new ObjectId());
            userCollection.insertOne(user);
        } else {
            userCollection.replaceOne(eq("_id", user.getId()), user);
        }
    }

    @Override
    public void delete(String id) {
        userCollection.deleteOne(eq("_id", new ObjectId(id)));
    }
}
"""

### D. Aggregation Pipelines

* **Do This:** Design aggregation pipelines to perform complex data transformations, analytics, and reporting directly within the database. Use indexes to optimize the performance of aggregation pipelines. Understand the different aggregation stages and choose the most appropriate ones for your needs. Construct pipelines modularly and reuse common stages where applicable. Test the correctness and performance of aggregation pipelines.
* **Don't Do This:** Perform complex data transformations in the application layer that could be done more efficiently within the database using aggregation pipelines. Neglect using indexes to optimize aggregation pipeline performance.
Construct overly complex pipelines that are difficult to understand and maintain.
* **Why:** Aggregation pipelines provide a powerful and efficient way to process large datasets directly within MongoDB. By performing data transformations within the database, you can reduce network traffic, memory usage, and CPU load on the application server. Modular pipelines are easier to understand, test, and maintain.

"""javascript
// Example: Aggregation pipeline to calculate average age by city
db.collection('users').aggregate([
  { $match: { status: "active" } },
  {
    $group: {
      _id: "$profile.city",
      averageAge: { $avg: "$profile.age" },
      userCount: { $sum: 1 }
    }
  },
  { $sort: { averageAge: -1 } }
]).toArray()
  .then(results => {
    console.log(results);
  })
  .catch(err => {
    console.error(err);
  });
"""

### E. Data Validation

* **Do This:** Implement MongoDB's built-in schema validation with JSON Schema syntax to ensure data integrity on insert and update operations. Consider using "validationLevel: 'moderate'" and "validationAction: 'warn'" during development and staging so the application can handle validation issues instead of hard-failing database operations.
* **Don't Do This:** Rely solely on application-level validation and bypass database-enforced schema. Set "validationAction" to "error" in production without adequately handling the resulting exceptions in the application.
* **Why:** Implementing validation at the database level provides a strong defense against malformed data. It improves data consistency, reduces errors, and simplifies application-level validation logic. Using "moderate" validation during development provides flexibility while still catching invalid data issues early.

"""javascript
// Example: Schema validation
db.createCollection("myCollection", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        age: {
          bsonType: "int",
          minimum: 0,
          description: "must be an integer >= 0 and is required"
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+\\..+$",
          description: "must be a valid email address"
        }
      }
    }
  },
  validationLevel: 'moderate', // or 'strict'
  validationAction: 'warn'     // or 'error'
})
"""

### F. Error Handling

* **Do This:** Implement robust error handling throughout the application. Catch MongoDB-specific exceptions and provide meaningful error messages to the user. Log errors appropriately. Implement retry logic for transient errors, such as network connectivity issues (see the retry sketch after the example below). Implement the circuit breaker pattern for database outages.
* **Don't Do This:** Ignore exceptions or provide generic error messages that don't help diagnose the problem. Expose sensitive database information in error messages.
* **Why:** Proper error handling is crucial for application stability and usability. It helps prevent unexpected crashes, provides informative feedback to the user, and simplifies debugging. Logging errors allows you to monitor the health of the system and identify potential problems.

"""javascript
// Example: Error handling with async/await
async function getUser(userId) {
  try {
    const user = await db.collection('users').findOne({ _id: userId });
    if (!user) {
      throw new Error(`User with ID ${userId} not found`);
    }
    return user;
  } catch (err) {
    console.error(`Error retrieving user with ID ${userId}:`, err);
    // Consider logging to a central error logging service
    throw new Error("Failed to retrieve user. Please try again later."); // Mask the underlying exception
  }
}
"""
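The retry sketch referenced above: a minimal Python example that retries a single read a few times on transient, network-level PyMongo errors with exponential backoff. The attempt count, backoff values, and collection and function names are illustrative assumptions; in practice this would complement the driver's built-in retryable reads/writes and a circuit breaker, not replace them.

"""python
import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect, ConnectionFailure, NetworkTimeout

client = MongoClient("mongodb://localhost:27017/")
users = client["mydatabase"]["users"]

TRANSIENT_ERRORS = (AutoReconnect, ConnectionFailure, NetworkTimeout)

def find_user_with_retry(user_id, attempts=3, base_delay=0.5):
    # Retry a single read on transient network errors, backing off exponentially
    for attempt in range(1, attempts + 1):
        try:
            return users.find_one({"_id": user_id})
        except TRANSIENT_ERRORS as exc:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            delay = base_delay * (2 ** (attempt - 1))
            print(f"Transient error ({exc}); retrying in {delay:.1f}s "
                  f"(attempt {attempt}/{attempts})")
            time.sleep(delay)

# Example usage
user = find_user_with_retry("account1")
"""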
## III. Further Considerations

* **Security:** Implement security measures to protect sensitive data. Use authentication and authorization to control access to the database. Use encryption to protect data at rest and in transit. Follow the principle of least privilege. Sanitize user inputs to prevent injection vulnerabilities (e.g., NoSQL injection). Avoid storing sensitive information, such as plaintext credentials or unencrypted secrets, directly in the database.
* **Performance Monitoring:** Implement performance monitoring to track database behavior and identify potential bottlenecks. Use MongoDB's built-in monitoring tools or external monitoring services. Monitor query performance, index usage, and resource utilization. Use "explain()" to analyze slow queries.
* **Logging:** Implement comprehensive logging to track application activity and diagnose problems. Log relevant events, such as user logins, data modifications, and errors. Use a structured logging format (e.g., JSON) to simplify analysis. Ensure logs are rotated and archived appropriately.
* **Testing:** Implement thorough testing to ensure the correctness and reliability of the application. Write unit tests to verify the behavior of individual components, integration tests to verify the interaction between components, and end-to-end tests to verify the overall functionality of the system. Use mocking to isolate components during testing (a minimal mocking sketch follows the closing paragraph below). Use test data that is representative of production data.

By adhering to these component design standards, development teams can create robust, maintainable, and performant MongoDB applications. Remember to always stay up to date with the latest MongoDB features and best practices by consulting the official MongoDB documentation.
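The mocking sketch referenced in the Testing bullet above: a small, self-contained unit test that isolates a hypothetical component from MongoDB by substituting the collection with a standard-library mock. The function and field names are illustrative assumptions, not part of any particular codebase.

"""python
import unittest
from unittest.mock import MagicMock

def count_active_users(collection):
    # Hypothetical component under test: counts documents with status 'active'
    return collection.count_documents({"status": "active"})

class CountActiveUsersTest(unittest.TestCase):
    def test_counts_active_users_without_a_real_database(self):
        # The mock stands in for a pymongo Collection, so no MongoDB instance is needed
        collection = MagicMock()
        collection.count_documents.return_value = 3

        self.assertEqual(count_active_users(collection), 3)
        collection.count_documents.assert_called_once_with({"status": "active"})

if __name__ == "__main__":
    unittest.main()
"""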