# Kafka notes
# Architecture
Kafka is a distributed event streaming platform.
# ZooKeeper
manages cluster metadata: broker membership, topic configuration, controller election, ... (newer Kafka versions can run without ZooKeeper using KRaft mode)
# Kafka cluster
- is a distributed system
- is a set of broker servers (in production, at least 3)
- highly scalable & fault tolerant
# Kafka Broker
- Brokers form the storage layer
- Applications (producers & consumers) use brokers to exchange data asynchronously
- A broker organizes events into one or many topics
# Event / Message
- An event is any data captured in real time
- An event stream is a time-ordered sequence of events from many data sources
- Kafka stores events durably in log files on disk
- Kafka message anatomy: each message has a key, a value, a timestamp, and optional headers
# Producer & Consumer
- Producer: sends events to topics in brokers
- Consumer: reads events from topics in brokers
- Producers & consumers are decoupled; both must provide a serializer/deserializer for the event's data types, converting events to and from the stream of bytes that Kafka stores & transmits (see the sketch below)
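A minimal producer sketch in Java, assuming a local broker on localhost:9092 and a topic named 'orders' (both placeholders); the serializers are the standard ones from the Kafka client library:
```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        // Serializers turn key and value into the byte stream Kafka stores & transmits
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key "order-1" determines the partition; the value is the payload
            producer.send(new ProducerRecord<>("orders", "order-1", "{\"amount\": 42}"));
        } // close() flushes any pending messages
    }
}
```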
# Consumer Group
- is a set of consumers which cooperate to consume data from topics
- each partition of a topic is read by exactly one consumer in the group
- If there are fewer consumers than partitions, some consumers read more than one partition
- If there are more consumers than partitions, some consumers will be idle; consumers in the same group cannot read the same partition
- One consumer group can consume many topics (see the consumer sketch below)
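A matching consumer sketch, assuming the same placeholder broker and topic as above; group.id is what ties consumers into one group, and each record carries its partition and offset:
```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "order-group");             // consumers sharing this id split the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```
Running two copies of this program with the same group.id splits the topic's partitions between them; a copy beyond the partition count would sit idle.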
# Kafka topic
- a category used to organize events in a broker
- identified by a name
- a broker can have many topics
- Multiple producers can write events to a topic
- Multiple consumers can read events from one or many topics
# Kafka topic partitions
- A topic is partitioned: broken into multiple smaller parts
- Partitions are located on different brokers
The producer decides the target partition for each message, depending on:
- the partition id, if it's specified within the message
- hash(key) % number of partitions, if no partition id is given but the message has a key
- round robin if neither a partition id nor a message key is present, i.e. only the value is available (newer clients use a sticky partitioner here instead); see the sketch below
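A sketch of the three cases using the Java client's ProducerRecord constructors (topic name and values are placeholders):
```java
import org.apache.kafka.clients.producer.ProducerRecord;

// 1. Explicit partition id: goes to partition 2 regardless of the key
ProducerRecord<String, String> byPartition =
        new ProducerRecord<>("orders", 2, "order-1", "payload");

// 2. Key only: partition = hash(key) % numPartitions, so the same key
//    always lands in the same partition (preserving per-key order)
ProducerRecord<String, String> byKey =
        new ProducerRecord<>("orders", "order-1", "payload");

// 3. Value only: no key, no partition id -> round robin
//    (sticky partitioning in newer client versions)
ProducerRecord<String, String> byValueOnly =
        new ProducerRecord<>("orders", "payload");
```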
# Kafka Event's Offset
- An offset is the unique id of a message: the event's position within its partition
- Offsets are immutable; an event's position never changes (see the seek sketch below)
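A small sketch of offsets as positions, assuming the consumer object from the earlier sketch; seek() rewinds a partition to a given offset so events can be re-read:
```java
import org.apache.kafka.common.TopicPartition;

// Rewind partition 0 of "orders" to offset 42 and re-read from there.
// (Topic, partition, and offset values are placeholders.)
TopicPartition partition = new TopicPartition("orders", 0);
consumer.assign(List.of(partition));   // manual assignment instead of subscribe()
consumer.seek(partition, 42L);
```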
# Kafka Partition's Replication
- Every topic partition is replicated n times (the replication factor) on different brokers in the cluster
- One replica is the leader, the others are followers. The producer writes to the leader; followers copy it.
- If the leader fails, a follower is elected to become the new leader.
- Replication makes sure that messages remain available when some brokers fail
- In production, the factor should be odd: 3, 5, or 7 (see the topic-creation sketch below)
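A minimal sketch of setting the replication factor when creating a topic through the Java AdminClient (topic name and counts are placeholders):
```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, each replicated on 3 different brokers
            NewTopic topic = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(List.of(topic)).all().get(); // wait for completion
        }
    }
}
```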
# Comparison
| Kafka | Database |
|---|---|
| Topic | Table |
| Event | Row |
| Offset | Primary key |
| Partition | Sharding / Horizontal partition |
| Partition replication | Master / Slave or Leader / Follower |
# Use cases
- Traditional messaging systems
- Website activity tracking
- Metrics
- Log aggregation
- Stream processing
- Event sourcing
- Commit log
# Error handling
- register a callback when publishing a message (see the sketch below)
- store failed messages in a database
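A sketch of a publish callback with the Java client, assuming the producer from the earlier sketch and a hypothetical FailedMessageStore for persisting failures:
```java
producer.send(new ProducerRecord<>("orders", "order-1", "payload"),
        (metadata, exception) -> {
            if (exception != null) {
                // Publish failed: persist the message so a background job can retry it.
                // failedMessageStore is a hypothetical persistence helper.
                failedMessageStore.save("orders", "order-1", "payload", exception);
            } else {
                System.out.printf("delivered to partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
            }
        });
```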
# Polling Service
- Scheduler to configure & schedule tasks in the background
- Read data from a source (e.g. messages in the database whose publishing failed), then send them to the Kafka server
- If successful, update the status or remove the record
- If it fails, update the status & re-send next time, or raise a notification (see the sketch below)
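A sketch of such a polling service with Spring's @Scheduled (requires @EnableScheduling on a configuration class), assuming Spring Kafka 3.x where send() returns a CompletableFuture; FailedMessageRepository and FailedMessage are hypothetical:
```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class FailedMessagePoller {

    private final FailedMessageRepository repository; // hypothetical repository over the failure table
    private final KafkaTemplate<String, String> kafkaTemplate;

    public FailedMessagePoller(FailedMessageRepository repository,
                               KafkaTemplate<String, String> kafkaTemplate) {
        this.repository = repository;
        this.kafkaTemplate = kafkaTemplate;
    }

    @Scheduled(fixedDelay = 60_000) // run every minute
    public void resendFailedMessages() {
        for (FailedMessage msg : repository.findAllPending()) { // hypothetical entity & query
            kafkaTemplate.send(msg.getTopic(), msg.getKey(), msg.getPayload())
                    .whenComplete((result, ex) -> {
                        if (ex == null) {
                            repository.delete(msg);        // success: remove the record
                        } else {
                            msg.incrementAttempts();       // failure: try again next run
                            repository.save(msg);
                        }
                    });
        }
    }
}
```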
# Dead-letter Topic
- for messages whose consumption failed many times => move them to a dead-letter topic
- Method used for consumers that care about the sequence of events.
- Avoids blocking an ordered consumer
- Avoids data loss
- Named like the original topic with a DLT postfix. E.g: 'statistic.DLT'
- use the default handler:
```java
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean
DefaultErrorHandler errorHandler(KafkaOperations<String, Object> template) {
    // Retry twice with a 1-second interval, then publish the failed record to the DLT
    return new DefaultErrorHandler(new DeadLetterPublishingRecoverer(template), new FixedBackOff(1000L, 2));
}
```
# Kafka Consumer non-block retry
With Spring Kafka, @RetryableTopic retries failed messages via separate retry topics, so the main consumer is not blocked:
```java
@RetryableTopic(attempts = "5", dltTopicSuffix = "-dlt", backoff = @Backoff(delay = 2000, multiplier = 2))
@KafkaListener(id = "statGroup", topics = "stat")
public void listen(Stat stat) {
    // process the event; failed attempts are redelivered from retry topics
    // with exponentially increasing delays (2s, 4s, 8s, ...)
}
```
# Schema registry
A sketch using kafkajs with the @kafkajs/confluent-schema-registry client (broker and registry addresses are placeholders):
```js
const { Kafka } = require('kafkajs');
const { SchemaRegistry, SchemaType } = require('@kafkajs/confluent-schema-registry');

// Configure the Kafka client and the Schema Registry client
const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['localhost:9092'],
});
const registry = new SchemaRegistry({ host: 'http://localhost:8081' });

// Define the Avro schema
const userSchema = {
  type: 'record',
  name: 'User',
  fields: [
    { name: 'id', type: 'int' },
    { name: 'name', type: 'string' },
    { name: 'email', type: 'string' },
  ],
};

async function main() {
  // Register the Avro schema with the Schema Registry
  const { id } = await registry.register({
    type: SchemaType.AVRO,
    schema: JSON.stringify(userSchema),
  });

  // Produce a message encoded with the registered schema
  const producer = kafka.producer();
  await producer.connect();
  const user = { id: 1, name: 'John Doe', email: 'johndoe@example.com' };
  const encodedMessage = await registry.encode(id, user);
  await producer.send({
    topic: 'my-topic',
    messages: [{ value: encodedMessage }],
  });
  await producer.disconnect();

  // Consume and decode messages with the same schema
  const consumer = kafka.consumer({ groupId: 'my-group' });
  await consumer.connect();
  await consumer.subscribe({ topics: ['my-topic'], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const decodedMessage = await registry.decode(message.value);
      console.log(decodedMessage);
    },
  });
}

main().catch(console.error);
```
# Why is Kafka fast?
Kafka achieves low-latency message delivery through sequential I/O (appending to log files rather than seeking randomly) and the zero-copy principle.
Zero copy is a shortcut that skips the redundant data copies between application context and kernel context: the kernel moves data from the page cache straight to the network socket. This reportedly brings transfer time down by approximately 65%.
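On the JVM, zero copy is exposed through FileChannel.transferTo, which Kafka uses to move log segments from disk to the network socket; a minimal sketch with placeholder file name and socket address:
```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = new FileInputStream("segment.log").getChannel(); // placeholder file
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            // transferTo hands the copy to the kernel (sendfile on Linux):
            // no round-trip through a user-space buffer
            long transferred = file.transferTo(0, file.size(), socket);
            System.out.println("bytes sent: " + transferred);
        }
    }
}
```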
# References
https://www.tutorialspoint.com/apache_kafka/index.htm