# Rate Limiter Pattern

The rate limiter pattern is a design technique used to control the rate at which incoming requests or actions are processed by a system. It's essentially a mechanism to manage the throughput of a service by limiting the number of requests it can handle within a specific timeframe.

Why is it important?

  • Protects from overload: Prevents system crashes due to excessive traffic.
  • Fairness: Ensures equal access to resources for all users.
  • API abuse prevention: Stops malicious attacks like DDoS.
  • Resource management: Optimizes resource utilization.

# Algorithms

  • Fixed Window Counter: Simplest approach, counts requests within a fixed time window.
  • Sliding Window Log: Keeps a log of request timestamps and counts the requests whose timestamps fall inside the sliding window.
  • Token Bucket: Continuously generates tokens at a fixed rate. Each request consumes a token.
  • Leaky Bucket: Requests queue in a fixed-size bucket and are processed at a constant outflow rate, which smooths bursts rather than allowing them.

# Token bucket

A token bucket is a container with a pre-defined capacity. Tokens are put into the bucket periodically at a preset rate. Once the bucket is full, no more tokens are added.

E.g., the token bucket capacity is 4. The refiller puts 2 tokens into the bucket every second. Once the bucket is full, extra tokens overflow.

Each request consumes one token. When a request arrives, we check if there are enough tokens in the bucket.

  • If there are enough tokens, we take one token out for each request, and the request goes through.
  • If there are not enough tokens, the request is dropped.

The token bucket algorithm takes two parameters:

  • Bucket size: the maximum number of tokens allowed in the bucket
  • Refill rate: number of tokens put into the bucket every second
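Putting the two parameters together, the algorithm can be sketched as a small in-memory class. This is a minimal illustration, not a production implementation; the names (`TokenBucket`, `tryConsume`) are made up for this example, and the refill is computed lazily from elapsed time instead of running a background refiller:

```typescript
// Minimal token-bucket sketch: lazily refills tokens based on elapsed time.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // bucket size: max tokens allowed
    private readonly refillRate: number, // tokens added per second
    now: number = Date.now(),            // injectable clock for testing
  ) {
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  // Returns true if a token was available and consumed, false if the
  // request should be dropped.
  tryConsume(now: number = Date.now()): boolean {
    // Refill for the elapsed time; extra tokens overflow (capped at capacity).
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With the example values from above (capacity 4, refill rate 2/s), four requests in the same instant succeed, the fifth is dropped, and one second later two more tokens are available.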

How many buckets do we need? This varies, and it depends on the rate-limiting rules. Here are a few examples.

  • It is usually necessary to have different buckets for different API endpoints. For instance, if a user is allowed to make 1 post per second, add 150 friends per day, and like 5 posts per second, 3 buckets are required for each user.
  • If we need to throttle requests based on IP addresses, each IP address requires a bucket.
  • If the system allows a maximum of 10,000 requests per second, it makes sense to have a global bucket shared by all requests.

Pros:

  • The algorithm is easy to implement.
  • Memory efficient.
  • Token bucket allows a burst of traffic for short periods. A request can go through as long as there are tokens left.

Cons:

  • Tuning the two parameters, bucket size and token refill rate, properly can be challenging.

# Leaky bucket

Imagine a leaky bucket. Water (packets) is poured into the bucket at an arbitrary rate. However, there's a hole at the bottom, allowing water to leak out at a constant rate. If the water is poured in faster than it can leak out, the bucket will eventually overflow.

# How it Works

In the context of network traffic, the leaky bucket algorithm operates similarly:

  • Bucket: Represents a buffer to store incoming packets.
  • Input Rate: Packets arrive at the bucket at a variable rate.
  • Output Rate: Packets are removed from the bucket and sent at a constant rate.
  • Overflow: If the bucket fills up before packets can be removed, incoming packets are typically dropped.

# Key Characteristics

  • Smooths traffic: It ensures a constant output rate regardless of the input burstiness.
  • Simple implementation: Relatively easy to implement compared to other algorithms.
  • Packet loss: Can lead to packet loss if the input rate exceeds the output rate for an extended period.
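The queue-based behavior above can be sketched as follows. This is only an illustration; the names (`LeakyBucket`, `offer`, `leak`) are invented for this example, and driving `leak()` at a constant rate (e.g. from a timer) is left to the caller:

```typescript
// Minimal leaky-bucket sketch: a bounded FIFO buffer with constant outflow.
class LeakyBucket<T> {
  private queue: T[] = [];

  constructor(private readonly capacity: number) {} // buffer size

  // Try to enqueue an incoming packet. Returns false (packet dropped)
  // when the bucket has overflowed.
  offer(packet: T): boolean {
    if (this.queue.length >= this.capacity) return false; // overflow
    this.queue.push(packet);
    return true;
  }

  // Called at a constant rate by the caller: remove and return the next
  // packet to send, or undefined if the bucket is empty.
  leak(): T | undefined {
    return this.queue.shift();
  }
}
```

Note how the constant output rate comes entirely from how often `leak()` is invoked; the bucket itself only buffers and drops.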

# Sliding window counter

A per-window counter can be kept in Redis. This Lua script atomically increments the counter for a key and starts its expiry on the first hit of the window:

```lua
-- KEYS[1]: the rate-limit key; ARGV[1]: window length in milliseconds
local totalHits = redis.call("INCR", KEYS[1])
local timeToExpire = redis.call("PTTL", KEYS[1])
if timeToExpire <= 0
then
    -- First hit in this window: start the expiry clock
    redis.call("PEXPIRE", KEYS[1], tonumber(ARGV[1]))
    timeToExpire = tonumber(ARGV[1])
end
return { totalHits, timeToExpire }
```

The script can be invoked from Node.js, here via an ioredis-style client assumed to be available as `this.redis`:

```typescript
const scriptSrc = `...`; // the Lua script above

const results: number[] = (await this.redis.call(
    'EVAL',
    scriptSrc,
    1,          // number of KEYS
    key,        // KEYS[1]
    ttl * 1000, // ARGV[1]: window in ms (ttl is in seconds)
)) as number[];
```
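The Redis script above counts hits per window; the sliding window counter then estimates the current rate as the current window's count plus the previous window's count weighted by how much of the sliding window still overlaps it. A minimal in-memory sketch of that calculation (class and method names are illustrative, not from any library):

```typescript
// Minimal sliding-window-counter sketch: approximates the rate from the
// current window's count and an overlap-weighted share of the previous one.
class SlidingWindowCounter {
  private current = 0;        // hits in the current fixed window
  private previous = 0;       // hits in the previous fixed window
  private windowStart: number;

  constructor(
    private readonly limit: number,    // max requests per window
    private readonly windowMs: number, // window length in ms
    now: number = Date.now(),
  ) {
    this.windowStart = now;
  }

  allow(now: number = Date.now()): boolean {
    // Roll the windows forward if one or more full windows have elapsed.
    const elapsed = now - this.windowStart;
    if (elapsed >= this.windowMs) {
      this.previous = elapsed >= 2 * this.windowMs ? 0 : this.current;
      this.current = 0;
      this.windowStart = now - (elapsed % this.windowMs);
    }
    // Fraction of the previous window still covered by the sliding window.
    const overlap = 1 - (now - this.windowStart) / this.windowMs;
    const estimated = this.current + this.previous * overlap;
    if (estimated < this.limit) {
      this.current++;
      return true;
    }
    return false;
  }
}
```

With a limit of 2 per 1-second window, two requests at t=0 pass, a third is rejected, and halfway through the next window the previous window only counts for half, so one more request gets through.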

# Bulkhead vs Rate Limiter

Bulkhead

Limits the number of concurrent calls at a time.

Ex: Allow 5 concurrent calls at a time.

Rate Limiter

Limits the total number of calls in a given period of time.

Ex: Allow 5 calls every 2 seconds.
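The bulkhead side of the comparison can be sketched as a counting semaphore that caps in-flight calls: no matter how many requests arrive per second, at most N run at once. `Bulkhead` and `run` are illustrative names, not a real library API:

```typescript
// Minimal bulkhead sketch: caps concurrent in-flight calls with a
// promise-based counting semaphore.
class Bulkhead {
  private inFlight = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight < this.maxConcurrent) {
      this.inFlight++; // free slot: claim it immediately
    } else {
      // All slots busy: wait until a finishing task hands us its slot.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next();       // transfer the slot directly to a waiter
      else this.inFlight--;   // no one waiting: release the slot
    }
  }
}
```

Contrast with the rate limiters above: a bulkhead with `maxConcurrent = 5` happily serves 1,000 short calls per second as long as no more than 5 overlap, while a rate limiter would reject most of them.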