# Apache Cassandra

Developed by Facebook in 2007. Gifted to Apache 2/2010
NoSQL but has structure
A database - column family
Can use SQL to manipulate with data
Peer to peer database, sharding, more nodes more performance

# Column-oriented Database

Its architecture uses (a) persistent, sparse matrix, multi-dimensional mapping (row-value, column-value, and timestamp) in a tabular format meant for massive scalability (over and above the petabyte scale).

Column-oriented

Source:

https://stackoverflow.com/questions/62010368/what-exactly-is-a-wide-column-store
https://www.baeldung.com/cassandra-column-family-data-model

# Ring cluster

cluster

Each nodes will be assigned a range of token.
Client could connect any nodes to write, that node will become coordinator node.
Partition keys will be hashed into a token. Coordinator will base on the token to know which node we can store the data.
When a Cassandra node becomes unavailable, processing continues and failed writes are temporarily saved as hints on the coordinator. If the hints have not expired, they are applied to the node when it becomes available.

# Replication Factor

Replication Factor (RF) = number copies we want to store
Replication node will be defined by Replication Strategy
Simple strategy = next 2 nodes

# Consistent hashing

... TODO

# Data consistency

WRITE	READ	Consistent	Read Availability	Write Availability
All	All	✅	Low	Low
Quorum	All	✅	Low	Medium
✅ One	All	✅	Low	High
All	Quorum	✅	Medium	Low
✅ Quorum	Quorum	✅	Medium	Medium
One	Quorum	❌	Medium	High
✅ All	One	✅	High	Low
Quorum	One	❌	High	Medium
One	One	❌	High	High

# Keys

Each table requires a unique primary key.
The first field listed is the partition key, since its hashed value is used to determine the node to store the data.
If those fields are wrapped in parentheses then the partition key is composite. Otherwise the first field is the partition key.
Any fields listed after the partition key are called clustering columns. These store data in ascending or descending order within the partition for the fast retrieval of similar values.
All the fields together are the primary key.

All data has the same partison key will be store at the same node

We should only access data using the partison key

Every node has 256 virtual nodes in default

# CQL

Tables should reflect the queries we are trying to make

# Keyspace

# Table

# Selection

Write time

SELECT car_make, car_model, writetime(car_model)
FROM employee_by_car_make;

Allow filtering

SELECT * FROM employee_by_id
WHERE name = 'John' ALLOW FILTERING;

# Update

Time to live (60seconds), value will be null after the given time.

Use case: token / coupon is only valid in 24hours

UPDATE employee_by_car_make 
USING TTL 60 
SET car_model='TRUCK'
WHERE car_make'BMW' AND id='2';

# Collections

UPDATE employee_by_id
SET phone = {'123', '321'}
WHERE id = 1

Append

UPDATE employee_by_id
SET phone = phone + {'555', '444'}
WHERE id = 1

Remove

UPDATE employee_by_id
SET phone = phone - {'555'}
WHERE id = 1

# UUID

# Refs

← MongoDB Notes Elastic Search →