# MongoDB Notes

# MongoDB vs SQL

# Terms comparison

| RDBMS | MongoDB |
| --- | --- |
| database | database |
| table | collection |
| row | document / BSON document |
| column | field |
| index | index |
| table join | `$lookup`, embedded documents |
| primary key | primary key (automatically set to the `_id` field) |
| aggregation (e.g. group by) | aggregation pipeline |
| materialized views | On-Demand Materialized Views |
| transactions | transactions |

Full here - sql-comparison

| SQL comparison | MongoDB |
| --- | --- |
| `WHERE col IN (...)` | `$in` |
| `NOT IN (...)` | `$nin` |
| `>` | `$gt` |
| `>=` | `$gte` |
| `<` | `$lt` |
| `<=` | `$lte` |
| `!=` | `$ne` |

| SQL logical | MongoDB |
| --- | --- |
| AND | `$and` |
| NOT | `$not` |
| OR | `$or` |
| NOT ... AND NOT ... | `$nor` |
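To see how the operators in the two tables combine, here is a minimal sketch that builds a query filter as a plain object. The field names (`status`, `age`, `country`, `vip`, `orders`) and the helper `buildUserFilter` are hypothetical and only serve the illustration; top-level fields in a filter are combined with an implicit AND.

```javascript
// Hypothetical fields, chosen for illustration only.
// Rough SQL equivalent:
//   WHERE status IN ('active', 'trial')
//     AND age >= :minAge
//     AND country NOT IN (:excludedCountries)
//     AND (vip = true OR orders > 10)
function buildUserFilter(minAge, excludedCountries) {
  return {
    status: { $in: ['active', 'trial'] },
    age: { $gte: minAge },
    country: { $nin: excludedCountries },
    $or: [{ vip: true }, { orders: { $gt: 10 } }]
  };
}

const filter = buildUserFilter(18, ['XX']);
console.log(JSON.stringify(filter));
// With a connected driver, this object would be passed as: collection.find(filter)
```

Building the filter in a function keeps it easy to inspect and unit test without a running database.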

# Aggregation Pipelines

| SQL term | MongoDB |
| --- | --- |
| SELECT | `$project` |
| WHERE | `$match` |
| GROUP BY | `$group` |
| HAVING | `$match` |
| LIMIT | `$limit` |
| OFFSET | `$skip` |
| ORDER BY | `$sort` |
| SUM() | `$sum` |
| COUNT() | `$sum` and `$sortByCount` |
| JOIN | `$lookup` |
| SELECT INTO NEW_TABLE | `$out` |
| MERGE INTO TABLE | `$merge` |
| UNION ALL | `$unionWith` |

Full here - sql-to-aggregation-pipeline

Aggregation Pipeline Stages

# Example

For example, you can use the `$match` stage to filter documents, the `$group` stage to group documents by a specific field, and the `$project` stage to reshape the output.


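// Requires the official Node.js driver: npm install mongodb
const { MongoClient } = require('mongodb');

// Assumed connection string; adjust host, port, and database name for your deployment.
const client = new MongoClient('mongodb://localhost:27017/test');

async function runAggregation() {
  const pipeline = [
    // Keep only active products
    { $match: { status: 'active' } },
    // Sum the quantity per category
    { $group: { _id: '$category', total: { $sum: '$quantity' } } },
    // Expose the group key as "category" and hide _id
    { $project: { category: '$_id', total: 1, _id: 0 } }
  ];

  try {
    await client.connect();
    // db() without a name uses the database from the connection string
    const collection = client.db().collection('products');
    const result = await collection.aggregate(pipeline).toArray();
    console.log(result);
  } catch (error) {
    console.error('Error executing aggregation', error);
  } finally {
    await client.close();
  }
}

runAggregation();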
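The HAVING, ORDER BY, OFFSET and LIMIT rows of the mapping table can be sketched by extending the same `products` pipeline. This is only an illustration: the helper name `buildPagedTotalsPipeline`, the threshold, and the page size are assumptions, and the function just builds the stage array without talking to a server.

```javascript
// Rough SQL equivalent (threshold and paging values are hypothetical):
//   SELECT category, SUM(quantity) AS total FROM products
//   WHERE status = 'active'
//   GROUP BY category HAVING SUM(quantity) > :minTotal
//   ORDER BY total DESC LIMIT :pageSize OFFSET :page * :pageSize
function buildPagedTotalsPipeline(minTotal, page, pageSize) {
  return [
    { $match: { status: 'active' } },                                // WHERE
    { $group: { _id: '$category', total: { $sum: '$quantity' } } },  // GROUP BY + SUM()
    { $match: { total: { $gt: minTotal } } },                        // HAVING
    { $sort: { total: -1, _id: 1 } },                                // ORDER BY, _id breaks ties
    { $skip: page * pageSize },                                      // OFFSET
    { $limit: pageSize }                                             // LIMIT
  ];
}

const pagedPipeline = buildPagedTotalsPipeline(100, 2, 10);
console.log(pagedPipeline.map((stage) => Object.keys(stage)[0]).join(' -> '));
// prints "$match -> $group -> $match -> $sort -> $skip -> $limit"
```

A `$match` placed after `$group` filters on the computed totals, which is why the same stage covers both WHERE and HAVING.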

# Migration

https://dba.stackexchange.com/questions/203926/how-to-find-all-invalid-document-based-on-jsonschema-validator/203950#203950

https://www.mongodb.com/community/forums/t/can-i-update-a-schema/154565

# Indexes

# Single Field Index

This is the most basic type of index in MongoDB. It indexes a single field in a collection. Single field indexes can significantly improve the performance of queries that involve filtering or sorting based on that field.

Here's an example of creating a single field index on the "name" field of a collection called "users":

db.users.createIndex({ name: 1 });

# Compound Index

A compound index is created on multiple fields in a collection. It can improve the performance of queries that involve filtering or sorting based on multiple fields.

Here's an example of creating a compound index on the "name" and "age" fields of a collection called "users":

db.users.createIndex({ name: 1, age: -1 });
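A compound index is most useful when a query constrains its leading fields (the index prefix). The sketch below is a deliberately simplified model of that rule for equality filters; `usablePrefixLength` is a hypothetical helper, not a MongoDB API, and the real query planner considers more cases.

```javascript
// Simplified model: count how many leading index fields the query constrains.
// indexSpec mirrors the createIndex argument; queryFields are the filtered field names.
function usablePrefixLength(indexSpec, queryFields) {
  const wanted = new Set(queryFields);
  let length = 0;
  for (const field of Object.keys(indexSpec)) {
    if (!wanted.has(field)) break; // the prefix ends at the first unconstrained field
    length += 1;
  }
  return length;
}

const nameAgeIndex = { name: 1, age: -1 };
console.log(usablePrefixLength(nameAgeIndex, ['name']));        // 1
console.log(usablePrefixLength(nameAgeIndex, ['age', 'name'])); // 2
console.log(usablePrefixLength(nameAgeIndex, ['age']));         // 0
```

A result of 0 suggests the filter skips the leading `name` field, so this index is unlikely to help that query.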

# Text Index

Text indexes are used to perform full-text search on string content. They enable efficient searching of text fields for words or phrases. Text indexes are particularly useful when implementing search functionality in applications.

Here's an example of creating a text index on the "description" field of a collection called "products":

db.products.createIndex({ description: "text" });
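Once the text index exists, a search uses the `$text` operator. The following sketch only builds the filter and options objects; the helper name `buildTextSearch` and the search phrase are made up, and the option names assume the Node.js driver's `find(filter, options)` form.

```javascript
// Builds arguments for a text search against a collection that has a text index.
function buildTextSearch(phrase) {
  return {
    filter: { $text: { $search: phrase } },
    options: {
      // Include the relevance score in each result and sort by it.
      projection: { score: { $meta: 'textScore' } },
      sort: { score: { $meta: 'textScore' } }
    }
  };
}

const search = buildTextSearch('wireless headphones');
console.log(JSON.stringify(search.filter));
// Assumed usage with a connected driver:
//   db.collection('products').find(search.filter, search.options)
```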

# Geospatial Index

Geospatial indexes are used to optimize queries that involve geospatial data. They enable efficient querying of data based on location and support various geospatial operations like finding nearby points or calculating distances.

Here's an example of creating a geospatial index on the "location" field of a collection called "places":

db.places.createIndex({ location: "2dsphere" });
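A typical query against a `2dsphere` index is "find places near a point". The sketch below builds such a filter with `$near`; the helper name and the coordinates are illustrative assumptions. Note that GeoJSON points list longitude first, then latitude, and `$maxDistance` is in meters.

```javascript
// Builds a $near filter for a field that holds GeoJSON points.
function buildNearFilter(longitude, latitude, maxDistanceMeters) {
  return {
    location: {
      $near: {
        $geometry: { type: 'Point', coordinates: [longitude, latitude] },
        $maxDistance: maxDistanceMeters
      }
    }
  };
}

// Example coordinates (roughly central Berlin), within 1 km.
const nearFilter = buildNearFilter(13.405, 52.52, 1000);
console.log(JSON.stringify(nearFilter));
// Assumed usage with a connected driver: db.collection('places').find(nearFilter)
```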

# Hashed Index

Hashed indexes are useful for sharding data across multiple servers in a MongoDB cluster. They distribute data evenly based on the hash value of a field. Hashed indexes are typically used for fields with high cardinality.

Here's an example of creating a hashed index on the "user_id" field of a collection called "orders":

db.orders.createIndex({ user_id: "hashed" });
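To actually shard on the hashed field, the collection is sharded with a hashed shard key. The sketch below builds the `shardCollection` admin command document; the database name `shop` and the helper name are assumptions, and the command only works when run against a sharded cluster.

```javascript
// Builds the admin command that shards a collection on a hashed key.
// namespace is "<database>.<collection>".
function buildShardCollectionCommand(namespace, field) {
  return { shardCollection: namespace, key: { [field]: 'hashed' } };
}

const shardCommand = buildShardCollectionCommand('shop.orders', 'user_id');
console.log(JSON.stringify(shardCommand));
// Assumed usage in mongosh, connected to a mongos: db.adminCommand(shardCommand)
```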

These are just a few examples of the types of indexes available in MongoDB. Each index type serves a specific purpose and can greatly enhance the performance of queries in different scenarios. It's important to choose the appropriate index type based on the nature of your data and the types of queries you need to optimize.

Remember, creating indexes comes with a trade-off in terms of increased storage space and write performance. Therefore, it's essential to carefully analyze your application's requirements and query patterns before deciding on the indexes to create.

# Example data

Sample Data

# Best practices

# General Rules for MongoDB Schema Design:

Rule 1: Favor embedding unless there is a compelling reason not to.

Rule 2: Needing to access an object on its own is a compelling reason not to embed it.

Rule 3: Avoid joins/lookups if possible, but don't be afraid if they can provide a better schema design.

Rule 4: Arrays should not grow without bound.

- If there are more than a couple of hundred documents on the "many" side, don't embed them.
- If there are more than a few thousand documents on the "many" side, don't use an array of ObjectID references.
- High-cardinality arrays are a compelling reason not to embed.

Rule 5: As always, with MongoDB, how you model your data depends – entirely – on your particular application's data access patterns. You want to structure your data to match the ways that your application queries and updates it.

Recap:

- One-to-One: Prefer key value pairs within the document
- One-to-Few: Prefer embedding
- One-to-Many: Prefer embedding
- One-to-Squillions: Prefer referencing
- Many-to-Many: Prefer referencing
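The difference between embedding and referencing can be shown with plain documents. All names and values below are invented for illustration: a user with a few embedded addresses (one-to-few), and log lines that reference their host by `_id` instead of living in an unbounded array on the host (one-to-squillions).

```javascript
// One-to-few: the addresses are embedded in the user document.
const user = {
  _id: 'user-1',
  name: 'Ada',
  addresses: [
    { city: 'London', zip: 'N1' },
    { city: 'Paris', zip: '75001' }
  ]
};

// One-to-squillions: each log line references its host by _id.
const host = { _id: 'host-1', name: 'web-01' };
const logLines = [
  { _id: 'log-1', hostId: host._id, message: 'started' },
  { _id: 'log-2', hostId: host._id, message: 'listening' },
  { _id: 'log-3', hostId: 'host-2', message: 'started' }
];

// Application-side lookup: select the child documents that point at the parent.
function logsForHost(hostDoc, logs) {
  return logs.filter((line) => line.hostId === hostDoc._id);
}

console.log(user.addresses.length, logsForHost(host, logLines).length); // 2 2
```

With embedding, one read returns the user and all addresses; with referencing, the host document stays small no matter how many log lines accumulate.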

Full here

https://developer.mongodb.com/article/schema-design-anti-pattern-massive-arrays

# Distributed Architecture

Prefer Consistency

# Replica set

Automatic Failover: The replica set cannot process write operations until the election completes successfully.

# Sharding

Sharded Cluster

# Tunable Consistency

| Read concern | Description |
| --- | --- |
| Local | Returns data from the instance without guaranteeing the data has been written to a majority of the instances. This is equivalent to `read uncommitted` isolation in a relational database. |
| Available | Same as local. |
| Majority | Guarantees that a majority of the cluster members acknowledged the request. A majority read returns only committed data. |

| Write concern | Description |
| --- | --- |
| 0 | Does not require an acknowledgment of the write. |
| 1 | Requires acknowledgment from the primary member only. |
| `<number>` | Checks if the operation has replicated to the specified number of instances. |
| Majority | Checks if the operations have propagated to the majority. |
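In application code, read and write concerns are usually passed as options. The sketch below builds option objects in the shape the Node.js driver accepts; the helper `buildConcernOptions` is hypothetical and, as an assumption for this example, only allows the three read concern levels listed in the table.

```javascript
// Read concern levels taken from the table; MongoDB defines others that this sketch ignores.
const READ_LEVELS = ['local', 'available', 'majority'];

function buildConcernOptions(readLevel, w) {
  if (!READ_LEVELS.includes(readLevel)) {
    throw new Error(`Unsupported read concern level: ${readLevel}`);
  }
  return {
    readConcern: { level: readLevel },
    writeConcern: { w } // 0, 1, a member count, or 'majority'
  };
}

// Favor consistency: majority reads and writes.
const strictOptions = buildConcernOptions('majority', 'majority');
// Favor latency: local reads, primary-only acknowledgment.
const fastOptions = buildConcernOptions('local', 1);
console.log(JSON.stringify(strictOptions));
// Assumed usage with a connected client (database and collection names are made up):
//   client.db('shop').collection('orders', strictOptions)
```

Stronger concerns trade latency for durability and consistency, so it is common to choose them per collection or per operation rather than globally.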

# Performance

Use a projection to return only the fields you need instead of whole documents.
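A projection is a document that lists the fields to return. The sketch below builds one from a list of field names; `buildProjection` is a hypothetical helper, the field names are examples, and the `projection` option name assumes the Node.js driver's `find(filter, options)` form.

```javascript
// Builds an inclusion projection; _id is returned by default, so it is excluded explicitly.
function buildProjection(fields, includeId = false) {
  const projection = {};
  for (const field of fields) {
    projection[field] = 1;
  }
  if (!includeId) {
    projection._id = 0;
  }
  return projection;
}

const findOptions = { projection: buildProjection(['name', 'price']) };
console.log(JSON.stringify(findOptions)); // {"projection":{"name":1,"price":1,"_id":0}}
// Assumed usage with a connected driver:
//   db.collection('products').find({ status: 'active' }, findOptions)
```

Returning fewer fields reduces the data sent over the network and the work needed to deserialize each document.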

https://www.mongodb.com/docs/manual/indexes/

https://www.mongodb.com/docs/manual/core/query-plans/

# Aggregation Pipelines

https://studio3t.com/knowledge-base/articles/build-mongodb-aggregation-queries/

https://studio3t.com/knowledge-base/categories/import-export/

# Tools

- Compass: The GUI for MongoDB
- Studio 3T
- Robomongo

# Others

# Running on docker

https://hub.docker.com/_/mongo

docker run --name some-mongo -d -p 27017:27017 mongo:latest
