# Database types
# Key-value DB
# Document-oriented DB
A document-oriented database, also known as a document store, is a type of NoSQL database that stores data in documents. These documents are similar to JSON objects, containing key-value pairs.
Key Characteristics:
- Flexible Schema: Unlike relational databases, document databases don't enforce a strict schema. Documents can have different structures within the same collection.
- Scalability: Most are designed to scale horizontally by partitioning (sharding) collections across nodes, so they can handle large data volumes and high traffic.
- Performance: Often optimized for rapid read and write operations.
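- JSON-like Format: Documents are exposed in a human-readable, JSON-like format (some systems, such as MongoDB, store a binary encoding like BSON internally), which maps naturally to objects in application code.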
How it Works:
- Data Storage: Documents are stored as individual units with key-value pairs.
- Indexing: Indexes can be created on specific fields within documents for efficient querying.
- Querying: Document databases typically use query languages like MongoDB's query language or specialized APIs to retrieve data.
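
The storage, indexing, and querying steps above can be sketched with a toy in-memory collection. This is only an illustration: `TinyDocumentStore` and its methods are hypothetical names, not the API of MongoDB or any other product, and the index assumes the indexed values are hashable.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Toy in-memory document collection with optional field indexes."""

    def __init__(self):
        self.docs = {}       # doc_id -> document (a plain dict)
        self.indexes = {}    # field -> {value -> set of doc_ids}
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = doc
        # Keep every existing index up to date with the new document.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        return doc_id

    def create_index(self, field):
        # Build the index from the documents already stored.
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def find(self, field, value):
        if field in self.indexes:
            # Indexed lookup: no need to scan every document.
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # Fallback: full scan of the collection.
        return [d for d in self.docs.values() if d.get(field) == value]


store = TinyDocumentStore()
# Documents in the same collection may have different fields.
store.insert({"type": "article", "title": "Intro", "tags": ["db"]})
store.insert({"type": "product", "name": "Lamp", "price": 19.9})
store.insert({"type": "article", "title": "Part 2"})
store.create_index("type")
articles = store.find("type", "article")
print([a["title"] for a in articles])  # ['Intro', 'Part 2']
```

Note that the three documents do not share a structure, which is the flexible-schema property in practice.
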
Use Cases:
- Content Management Systems: Storing articles, blog posts, and other content with rich structure.
- E-commerce: Managing product catalogs, customer data, and order information.
- Real-time Analytics: Processing and analyzing large volumes of data quickly.
- Mobile and Web Applications: Storing user data, preferences, and session information.
Examples: MongoDB, Amazon DocumentDB, Couchbase, Firebase Firestore, ...
# Wide column DB
Examples: Bigtable, HBase, Cassandra.
# Vector DB
A vector database is a specialized database designed to store and search high-dimensional numerical data, known as vectors. These vectors are often representations of complex data like text, images, or audio, transformed into mathematical formats using techniques like embedding.
How it works:
- Data Transformation: Raw data (text, image, etc.) is converted into numerical vectors using machine learning models.
- Vector Storage: These vectors are stored in the vector database.
- Similarity Search: When a query vector is provided, the database efficiently finds the most similar vectors based on a distance or similarity metric (e.g., Euclidean distance, cosine similarity).
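
A minimal sketch of the similarity-search step, assuming tiny hand-written 3-dimensional vectors in place of real model embeddings. It compares the query against every stored vector (brute force); production vector databases generally rely on approximate nearest-neighbor indexes instead. The names `cosine_similarity` and `top_k` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), key) for key, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Made-up "embeddings"; a real system would get these from an ML model.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], vectors, k=2)
print(nearest)  # ['cat', 'kitten']
```
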
Use Cases:
- Recommendation Systems: Finding similar products, movies, or music based on user preferences.
- Image Search: Searching for similar images based on visual content.
- Text Search: Finding semantically similar text documents.
- Fraud Detection: Identifying anomalous patterns in financial data.
- Drug Discovery: Finding similar molecules for potential drug candidates.
# Graph DB
A graph database is a specialized database that uses graph structures to store and manage data. Unlike relational databases that organize data in tables, rows, and columns, graph databases represent data as nodes (entities) and edges (relationships) between them.
How it Works:
- Data Modeling: Data is modeled as a graph, where entities are nodes and connections between them are edges.
- Data Storage: The database stores nodes, edges, and their properties.
- Querying: Queries are performed by traversing the graph structure to find related information.
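
Querying by traversal can be sketched as a breadth-first search over an adjacency map. The edge list, the `FRIEND` label, and the helper `within_hops` are made up for illustration; a real graph database would expose this through its own query language.

```python
from collections import deque

# Edges as (source node, relationship label, target node).
edges = [
    ("alice", "FRIEND", "bob"),
    ("bob", "FRIEND", "carol"),
    ("carol", "FRIEND", "dave"),
    ("alice", "FRIEND", "erin"),
]

# Build an undirected adjacency map: node -> set of neighbors.
adjacency = {}
for src, _label, dst in edges:
    adjacency.setdefault(src, set()).add(dst)
    adjacency.setdefault(dst, set()).add(src)


def within_hops(start, max_hops):
    """Return {node: hop distance} for nodes reachable in at most max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


# "Friends and friends of friends" of alice.
reachable = within_hops("alice", 2)
print(reachable == {"bob": 1, "erin": 1, "carol": 2})  # True
```
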
Use Cases:
- Social Networks: Analyzing friend connections, recommendations, and influence.
- Fraud Detection: Identifying patterns of fraudulent activities.
- Recommendation Systems: Suggesting products or content based on user preferences and behavior.
- Knowledge Graphs: Representing complex information and relationships.
- Supply Chain Management: Tracking the flow of goods and materials.
# Object-oriented DB
In an object-oriented database, data is organized and stored as objects, which are self-contained units that contain both data and the operations or methods that can be performed on that data.
# Time Series Database (TSDB)
A Time Series Database (TSDB) is specifically designed to handle and store data points that are collected at specific points in time. Think of it as a database optimized for time-stamped data.
Key Characteristics:
- Time-stamped Data: Every data point is associated with a timestamp.
- High Ingestion Rate: TSDBs are built for sustained high write throughput, since data points typically arrive as a continuous, append-mostly stream.
- Efficient Querying: They are optimized for queries that involve time ranges, aggregations, and filtering.
- Data Retention: Often configured to automatically delete old data to manage storage.
How it Works: TSDBs typically use a combination of techniques to efficiently store and query time-series data:
- Compression: To reduce storage space.
- Indexing: To quickly locate data points based on time ranges.
- Partitioning: To distribute data across multiple storage nodes for scalability.
- Aggregation: To summarize data for faster query performance.
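
Two of these techniques, time-based indexing and aggregation, can be sketched with a sorted list and binary search. `TinySeries` is a hypothetical toy class that assumes points arrive in timestamp order and keeps everything in memory; compression and partitioning are left out.

```python
from bisect import bisect_left, bisect_right


class TinySeries:
    """Toy time series: parallel sorted lists of timestamps and values."""

    def __init__(self):
        self.timestamps = []  # seconds, kept in ascending order
        self.values = []

    def append(self, ts, value):
        # Assumption: writes arrive in time order (append-only).
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order point")
        self.timestamps.append(ts)
        self.values.append(value)

    def range(self, start, end):
        """Points with start <= ts <= end, located by binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def downsample(self, start, end, bucket):
        """Average value per fixed-size time bucket (bucket in seconds)."""
        buckets = {}
        for ts, value in self.range(start, end):
            key = ts - (ts % bucket)  # start of the bucket containing ts
            buckets.setdefault(key, []).append(value)
        return {k: sum(v) / len(v) for k, v in buckets.items()}


cpu = TinySeries()
for ts, value in [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 70.0)]:
    cpu.append(ts, value)

print(cpu.range(30, 90))           # [(30, 20.0), (60, 30.0), (90, 50.0)]
print(cpu.downsample(0, 120, 60))  # {0: 15.0, 60: 40.0, 120: 70.0}
```
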
Common Use Cases:
- IoT Sensor Data: Collecting and storing data from devices like temperature sensors, humidity sensors, etc.
- Financial Data: Tracking stock prices, trading volumes, and market data.
- IT Monitoring: Storing metrics like CPU usage, memory utilization, network traffic.
- Web Analytics: Analyzing website traffic, user behavior, and performance metrics.
# NewSQL
NewSQL databases are a relatively new class of database management systems that aim to bridge the gap between traditional relational databases (RDBMS) and NoSQL databases.
Key Characteristics:
- ACID Compliance: They maintain the ACID properties (Atomicity, Consistency, Isolation, Durability) of traditional RDBMS, ensuring data integrity and reliability.
- Scalability: They offer the horizontal scalability of NoSQL databases, allowing them to handle increasing workloads and data volumes.
- High Performance: NewSQL databases strive to achieve high performance, comparable to NoSQL systems, while maintaining ACID compliance.
- SQL Support: They typically support SQL for querying data, making them familiar to developers.
How It Works: NewSQL databases achieve these characteristics through a combination of techniques:
- Distributed Architecture: Data is distributed across multiple nodes for scalability and fault tolerance.
- Sharding: Data is partitioned based on specific criteria to improve performance and scalability.
- In-Memory Computing: Some NewSQL databases utilize in-memory data storage for faster processing.
- Hybrid Storage: A combination of disk-based and in-memory storage to balance performance and cost.
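
The sharding idea can be sketched as hash-based routing of keys to partitions. This is a simplified illustration, not how any particular NewSQL system does it: the shard count, the `shard_for`, `put`, and `get` helpers, and the use of plain dicts as "nodes" are all assumptions, and distributed transactions and rebalancing are ignored.

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed number of partitions


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for the storage of one node.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, row):
    shards[shard_for(key)][key] = row


def get(key):
    # Only the shard that owns the key needs to be consulted.
    return shards[shard_for(key)].get(key)


put("user:42", {"name": "Ada"})
put("user:43", {"name": "Linus"})
put("user:44", {"name": "Grace"})
print(get("user:42"))  # {'name': 'Ada'}
```

A stable hash such as SHA-256 is used here rather than Python's built-in `hash()`, because `hash()` on strings varies between interpreter runs and would route the same key to different shards after a restart.
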
Use Cases: NewSQL databases are suitable for a wide range of applications, including:
- Online Transaction Processing (OLTP): Handling high volumes of concurrent transactions.
- Online Analytical Processing (OLAP): Some systems also support complex analytical queries on large datasets alongside transactional workloads.
- Real-time Analytics: Processing and analyzing data in real-time.
- Internet of Things (IoT): Handling massive amounts of time-series data.
# Spatial DB
A spatial database is a specialized database designed to store, manage, and analyze geographic data. This type of data represents objects defined in a geometric space, such as points, lines, and polygons.
Key Characteristics:
- Spatial Data Types: Supports specific data types to represent geographic features like points, lines, polygons, and multi-polygons.
- Spatial Indexes: Uses specialized indexes to efficiently query and retrieve spatial data based on location and proximity.
- Spatial Functions: Offers functions for performing spatial operations like distance calculations, intersections, and overlays.
- Coordinate Systems: Handles different coordinate systems (e.g., latitude/longitude, UTM) to accurately represent geographic data.
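
As a sketch of a spatial function, the snippet below computes great-circle distance between latitude/longitude points with the haversine formula and uses it for a proximity search. The helper names, the sample places, and their coordinates are made up, and the search scans every point; a spatial database would use a spatial index (such as an R-tree) to avoid that scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(places, lat, lon, radius_km):
    """Names of places within radius_km of (lat, lon), nearest first."""
    hits = []
    for name, (plat, plon) in places.items():
        distance = haversine_km(lat, lon, plat, plon)
        if distance <= radius_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]


# Illustrative sample points as (latitude, longitude).
places = {
    "cafe": (48.8570, 2.3530),
    "museum": (48.8606, 2.3376),
    "airport": (49.0097, 2.5479),
}
nearby = within_radius(places, 48.8566, 2.3522, radius_km=0.5)
print(nearby)  # ['cafe']
```
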
Use Cases:
- Geographic Information Systems (GIS): Storing and analyzing geographic data for mapping, analysis, and visualization.
- Location-Based Services (LBS): Providing location-aware services like navigation, proximity search, and geofencing.
- Urban Planning: Managing and analyzing urban data for planning and development.
- Environmental Science: Storing and analyzing environmental data for monitoring and modeling.
Examples: PostgreSQL/PostGIS, Oracle Spatial, MongoDB (GeoJSON objects with geospatial indexes), Redis (geospatial commands) ...
# Comparison
# SQL vs NoSQL
# Row vs Column based
# Document vs Key value
# Centralized vs Decentralized
# OLTP and OLAP systems
Feature | OLTP system (operational data) | OLAP system (data warehouse) |
---|---|---|
Data source | Original operational sources, where transactions are first recorded | Consolidated from diverse databases |
Purpose of data | Running the organization's basic day-to-day activities | Planning, problem solving, and decision support |
Data type | Snapshot of ongoing business transactions | Business activities from various sections of the organization |
Insertion and updating | Fast inserts and updates initiated by end users | Periodic refreshes in the form of batch jobs |
Queries | Simple queries that return a small number of records | Complex aggregate queries |
Efficiency | Very fast query responses | Queries can take longer depending on the requirement, but can be improved with indexes |
Space held | Holds very little history | Comprehensive history, including aggregated data |
Database design | Highly normalized, with many tables | Highly denormalized, with fewer tables, using star and snowflake schemas |
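
The difference in query shape from the table above can be sketched with Python's built-in `sqlite3` module. This is only an illustration on a single in-memory table with made-up data; SQLite is used here because it needs no setup, not because it is a typical OLTP or OLAP engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("ann", "EU", 20.0), ("bob", "US", 35.0), ("ann", "EU", 15.0), ("cy", "US", 5.0)],
)
conn.commit()

# OLTP-style work: a short transaction touching one row by primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))
one = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (1,)
).fetchone()
print(one)  # ('ann', 25.0)

# OLAP-style work: an aggregate query that scans many rows.
totals = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('EU', 40.0, 2), ('US', 40.0, 2)]
```
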