DocumentDB is a MongoDB-compatible, open-source document database built on PostgreSQL. It offers a native implementation of a document-oriented NoSQL database, enabling seamless CRUD (Create, Read, Update, Delete) operations on BSON (Binary JSON) data types within a PostgreSQL framework. Beyond basic operations, DocumentDB lets users run complex workloads, including full-text search, geospatial queries, and vector search, delivering robust functionality and flexibility for diverse data management needs.
PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads.
The project comprises three components that work together to support document operations.
- pg_documentdb_core : PostgreSQL extension introducing BSON datatype support and operations for native Postgres.
- pg_documentdb : The public API surface for DocumentDB providing CRUD functionality on documents in the store.
- pg_documentdb_gw : The gateway protocol translation layer that converts the user's MongoDB APIs into PostgreSQL queries.
At DocumentDB, we believe in the power of open source to drive innovation and collaboration. Our commitment to being a fully open-source, MongoDB-compatible document database means that we are dedicated to transparency, community involvement, and continuous improvement. The project is open-sourced under the permissive MIT license, so developers and organizations alike face no restrictions when incorporating it into new and existing solutions of their own. DocumentDB introduces the BSON data type to PostgreSQL and provides APIs for seamless operation within native PostgreSQL, enhancing efficiency and aligning with operational advantages.
DocumentDB also provides a powerful on-premise solution, allowing organizations to maintain full control over their data and infrastructure. This flexibility ensures that you can deploy it in your own environment, meeting your specific security, compliance, and performance requirements. With DocumentDB, you get the best of both worlds: the innovation of open-source and the control of on-premise deployment.
We chose PostgreSQL as our platform for several reasons:
- Proven Stability and Performance: PostgreSQL has a long history of stability and performance, making it a trusted choice for mission-critical applications.
- Extensibility: The extensible architecture of PostgreSQL allows us to integrate a DocumentDB API on the BSON data type seamlessly, providing the flexibility to handle both relational and document data.
- Active Community: PostgreSQL has a vibrant and active community that continuously contributes to its development, ensuring that it remains at the forefront of database technology.
- Advanced Features: PostgreSQL offers a rich feature set, including advanced indexing, full-text search, and powerful querying capabilities, which enhance the functionality of DocumentDB.
- Compliance and Security: PostgreSQL's robust security features and compliance with various standards make it an ideal choice for organizations with stringent security and regulatory requirements.
- Python 3.7+
- pip package manager
- Docker
- Git (for cloning the repository)
Step 1: Install the pymongo driver
pip install pymongo
Step 2: Install optional dependencies
pip install dnspython
Step 3: Set up DocumentDB using Docker
# Pull the latest DocumentDB Docker image
docker pull ghcr.io/microsoft/documentdb/documentdb-local:latest
# Tag the image for convenience
docker tag ghcr.io/microsoft/documentdb/documentdb-local:latest documentdb
# Run the container with your chosen username and password
docker run -dt -p 10260:10260 --name documentdb-container documentdb --username <YOUR_USERNAME> --password <YOUR_PASSWORD>
# Remove the original registry tag; the image stays available under the documentdb tag
docker image rm -f ghcr.io/microsoft/documentdb/documentdb-local:latest || echo "No existing documentdb image to remove"
Note: During the transition to the Linux Foundation, Docker images may still be hosted on Microsoft's container registry. These will be migrated to the new DocumentDB organization as the transition completes.
Note: Replace <YOUR_USERNAME> and <YOUR_PASSWORD> with your desired credentials. You must set these when creating the container for authentication to work.
Port Note: Port 10260 is used by default in these instructions to avoid conflicts with other local database services. You can use port 27017 (the standard MongoDB port) or any other available port if you prefer. If you do, be sure to update the port number in both your docker run command and your connection string accordingly.
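If you change the port or pick credentials that contain reserved characters such as @ or :, the connection string has to be adjusted to match. The sketch below shows one way to assemble it. The helper name build_connection_uri and the sample credentials are hypothetical and are not part of DocumentDB or pymongo; the TLS options are copied from the connection string used in this quick start.

```python
from urllib.parse import quote_plus


def build_connection_uri(username, password, host="localhost", port=10260):
    """Return a MongoDB connection string for a local DocumentDB container.

    Hypothetical helper for illustration only.
    """
    # Percent-encode the credentials so characters such as '@' or ':' do not
    # break the URI structure.
    user = quote_plus(username)
    secret = quote_plus(password)
    # Same TLS options as the connection string used in this quick start.
    options = "tls=true&tlsAllowInvalidCertificates=true"
    return f"mongodb://{user}:{secret}@{host}:{port}/?{options}"


# The port here must match the one published in your docker run command.
print(build_connection_uri("demo_user", "p@ss:word", port=27017))
```

The returned string can be passed directly to MongoClient in the next step.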
Step 4: Initialize the pymongo client with the credentials from the previous step
from pymongo import MongoClient
# Create a MongoDB client for the local DocumentDB container
client = MongoClient(
    'mongodb://<YOUR_USERNAME>:<YOUR_PASSWORD>@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true'
)
Step 5: Create a database and collection
quickStartDatabase = client["quickStartDatabase"]
quickStartCollection = quickStartDatabase.create_collection("quickStartCollection")
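With pymongo, create_collection raises CollectionInvalid when the collection already exists, so running this step a second time fails. A minimal sketch of a re-runnable variant follows. The helper get_or_create_collection is hypothetical, and it assumes the database object behaves like a pymongo Database (list_collection_names, create_collection, and item access).

```python
def get_or_create_collection(database, name):
    """Return the named collection, creating it only if it does not exist yet.

    Hypothetical helper; `database` is expected to behave like a pymongo
    Database object.
    """
    if name in database.list_collection_names():
        # The collection already exists, so just return a handle to it.
        return database[name]
    return database.create_collection(name)


# Usage with the client from Step 4 (requires a running container):
# quickStartCollection = get_or_create_collection(
#     client["quickStartDatabase"], "quickStartCollection"
# )
```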
Step 6: Insert documents
# Insert a single document
quickStartCollection.insert_one({
    'name': 'John Doe',
    'email': '[email protected]',
    'address': '123 Main St, Anytown, USA',
    'phone': '555-1234'
})
# Insert multiple documents
quickStartCollection.insert_many([
    {
        'name': 'Jane Smith',
        'email': '[email protected]',
        'address': '456 Elm St, Othertown, USA',
        'phone': '555-5678'
    },
    {
        'name': 'Alice Johnson',
        'email': '[email protected]',
        'address': '789 Oak St, Sometown, USA',
        'phone': '555-8765'
    }
])
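The insert calls above discard their return values. In pymongo, insert_one returns a result with an inserted_id attribute and insert_many returns one with inserted_ids, which is useful when you need the generated _id values later. The sketch below wraps insert_many. The helper insert_contacts is hypothetical and assumes the collection object behaves like a pymongo Collection.

```python
def insert_contacts(collection, contacts):
    """Insert contact documents and return the _id values assigned to them.

    Hypothetical helper; `collection` is expected to behave like a pymongo
    Collection object.
    """
    if not contacts:
        # pymongo's insert_many rejects an empty list, so skip the call.
        return []
    result = collection.insert_many(contacts)
    return list(result.inserted_ids)


# Usage with the collection from Step 5 (requires a running container):
# new_ids = insert_contacts(quickStartCollection, [{'name': 'Jane Smith'}])
```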
Step 7: Read documents
# Read all documents
for document in quickStartCollection.find():
    print(document)
# Read a specific document
singleDocumentReadResult = quickStartCollection.find_one({'name': 'John Doe'})
print(singleDocumentReadResult)
Step 8: Run aggregation pipeline operation
pipeline = [
    {'$match': {'name': 'Alice Johnson'}},
    {'$project': {
        '_id': 0,
        'name': 1,
        'email': 1
    }}
]
results = quickStartCollection.aggregate(pipeline)
print("Aggregation results:")
for eachDocument in results:
    print(eachDocument)
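The pipeline in this step is an ordinary Python list of stage dictionaries, so it can be built programmatically. As a rough sketch, the hypothetical helper below generalizes the $match and $project stages shown above to any name and field list. It only constructs the pipeline and does not contact the database.

```python
def build_lookup_pipeline(name, fields):
    """Build a $match + $project pipeline for a single contact name.

    Hypothetical helper for illustration only.
    """
    # Exclude _id and include only the requested fields.
    projection = {'_id': 0}
    projection.update({field: 1 for field in fields})
    return [
        {'$match': {'name': name}},
        {'$project': projection},
    ]


pipeline = build_lookup_pipeline('Alice Johnson', ['name', 'email'])
print(pipeline)
```

The resulting list can be passed to quickStartCollection.aggregate in the same way as the hand-written pipeline.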
- Check out our website to stay up to date with the latest on the project.
- Check out our docs for MongoDB API compatibility, quickstarts and more.
- Contributors and users can join the DocumentDB Discord channel for quick collaboration.
- Check out FerretDB and their integration of DocumentDB as a backend engine.