Redis Streams for Real-Time Pipeline Coordination
Peter Bieda
In many of my projects, I’ve had to orchestrate real-time data pipelines that process thousands of events per second. My first instinct was to reach for Kafka, the de facto standard for streaming, but it quickly became clear that Kafka was overkill for my case. I didn’t need distributed durability, complex brokers, or exactly-once semantics everywhere. What I needed was fast, low-latency coordination across multiple pipeline stages, and that’s where Redis Streams came in.
Redis Streams hit the sweet spot for millisecond-level coordination. It’s lightweight, simple to deploy, and has enough persistence and consumer group features to let you orchestrate pipelines reliably without adding unnecessary operational complexity. In this article, I want to walk you through why I switched to Redis Streams, how I built my real-time pipeline, the challenges I faced, and the scripts I used to test the system.
Why Kafka Was Overkill
When I first considered Kafka, I quickly ran into friction:
- Heavy broker setup for just a handful of microservices
- Multiple ZooKeeper instances for cluster coordination
- Extra operational overhead for replication and partition management
- Higher latency than needed for sub-millisecond coordination
My pipeline didn’t need global durability or massive fan-out. What I really needed was a lightweight, reliable queue that could coordinate microservices in real time, and Redis Streams provided exactly that.
Redis Streams is essentially a log-based data structure built into Redis that lets producers append events and consumers read them sequentially. Unlike Kafka, you don’t need a full cluster just to get high-throughput messaging, and you can run everything on a single Redis instance for testing or development.
Basic Architecture
The pipeline I designed had multiple stages:
- Data Ingest – capturing events from external sources
- Transformation – cleaning, enriching, and normalizing data
- Simulation/Execution – running matching logic or calculations
- Analytics/Monitoring – storing metrics and logs
I used Redis Streams to coordinate these stages. Every stage is essentially a consumer group, reading from a stream and writing processed events to the next stream in the pipeline.
Here’s a conceptual diagram of the flow:
External Source
│
▼
+----------------+
| Ingest Service |
+----------------+
│ XADD
▼
+--------------------+
| Transformation Svc |
+--------------------+
│ XADD
▼
+--------------------+
| Simulation/Engine |
+--------------------+
│ XADD
▼
+----------------------+
| Analytics/Monitoring |
+----------------------+
Each arrow represents an event appended to a Redis stream using XADD. Consumer groups read events with XREADGROUP and acknowledge them with XACK. This simple mechanism ensures at-least-once processing with minimal latency.
Getting Started With Redis Streams
I started by defining a stream for each stage:
import redis
r = redis.Redis(host='localhost', port=6379, db=0)
# Define stream names
INGEST_STREAM = 'ingest_stream'
TRANSFORM_STREAM = 'transform_stream'
SIM_STREAM = 'simulation_stream'
Producers append events like this:
event_id = r.xadd(INGEST_STREAM, {'symbol': 'AAPL', 'price': 172.5, 'qty': 10})
print(f"Produced event {event_id}")
Consumers read from the stream using consumer groups:
GROUP_NAME = 'ingest_group'
CONSUMER_NAME = 'worker_1'
# Create the group (ignore if already exists)
try:
    r.xgroup_create(INGEST_STREAM, GROUP_NAME, id='0', mkstream=True)
except redis.exceptions.ResponseError:
    pass
# Read events
while True:
    entries = r.xreadgroup(GROUP_NAME, CONSUMER_NAME, {INGEST_STREAM: '>'}, count=10, block=1000)
    for stream_name, events in entries:
        for event_id, data in events:
            # Process event
            print(f"Processing {event_id}: {data}")
            # Acknowledge
            r.xack(INGEST_STREAM, GROUP_NAME, event_id)
This approach scales naturally. Each stage can have multiple consumers, and Redis Streams ensures each event is delivered to only one consumer within a group.
Handling Millisecond Coordination
One of the main reasons I chose Redis Streams was latency. In my pipeline, certain stages required sub-millisecond responsiveness. Python prototypes worked fine for logic, but I needed the message coordination to be extremely fast. Redis Streams delivered:
- Append latency ~100 µs per message
- Consumer read latency < 200 µs with small batch sizes
- Consistent order guarantees within a stream
For example, when testing 10,000 events in sequence:
import time
start = time.time()
for i in range(10_000):
    r.xadd(INGEST_STREAM, {'id': i, 'value': i*2})
end = time.time()
print("Total time for 10k events:", end-start)
On a local Redis instance, this finished in ~1 second, or roughly 100 µs per event, which is acceptable for my pipeline.
Consumer Groups for Reliable Processing
A key feature I leveraged was consumer groups. Each stage had a group to distribute events to multiple workers. This prevents double processing while still allowing parallelism.
Example:
GROUP_NAME = 'transform_group'
CONSUMER_NAME = 'worker_1'
# Read pending messages first
pending = r.xpending(TRANSFORM_STREAM, GROUP_NAME)
print("Pending messages:", pending)
This allows me to recover events if a worker crashes. Redis keeps track of pending entries per group, so unacknowledged messages can be reassigned to another worker automatically. This is much simpler than Kafka’s partition management for my use case.
Simple Multi-Stage Test Script
To validate the pipeline, I wrote a simple test script that simulates three stages: ingest, transform, and analytics.
import redis
import time
import random
r = redis.Redis()
# Stage streams
streams = ['ingest_stream', 'transform_stream', 'analytics_stream']
# Create consumer groups
for stream in streams:
    try:
        r.xgroup_create(stream, f'{stream}_group', id='0', mkstream=True)
    except redis.exceptions.ResponseError:
        pass
# Produce 100 events
for i in range(100):
    r.xadd('ingest_stream', {'id': i, 'value': random.randint(1, 100)})
# Consumer function
def process_stream(input_stream, output_stream=None):
    GROUP_NAME = f'{input_stream}_group'
    CONSUMER_NAME = 'worker_1'
    entries = r.xreadgroup(GROUP_NAME, CONSUMER_NAME, {input_stream: '>'}, count=10, block=500)
    for stream_name, events in entries:
        for event_id, data in events:
            print(f"{input_stream} processed {event_id}: {data}")
            if output_stream:
                r.xadd(output_stream, data)
            r.xack(input_stream, GROUP_NAME, event_id)
# Simulate pipeline processing
process_stream('ingest_stream', 'transform_stream')
process_stream('transform_stream', 'analytics_stream')
process_stream('analytics_stream')
This script showed that Redis Streams easily handles multi-stage pipelines with minimal overhead.
What I Learned
1. Redis Streams is lightweight but powerful
I initially underestimated it because I was thinking in Kafka terms. But for pipelines where millisecond-level coordination is the actual requirement, Redis Streams is more than capable.
2. At-least-once semantics are enough for most internal pipelines
I don’t always need exactly-once delivery. For analytics and simulation pipelines, at-least-once delivery works fine as long as downstream logic can handle duplicates.
3. Consumer groups simplify parallelism
Managing multiple workers per stage is trivial with Redis Streams. Redis keeps track of pending messages and allows easy reassignment.
4. Sub-millisecond latency is achievable
With a properly tuned Redis instance, append and consume operations are fast enough for many real-time applications.
5. Pipeline architecture becomes modular
Each stage can operate independently. You can add or remove stages without affecting other parts of the pipeline. Streams become a natural event bus.
Advanced Tips
- Batching: Use the count parameter in XREADGROUP to process events in batches and amortize per-event overhead.
- Monitoring: Use XLEN to monitor stream length and XPENDING to track unacknowledged messages.
- Scaling: Add more consumers per group to handle higher throughput. Redis Streams scales well horizontally for read-heavy workloads.
- Persistence: Enable AOF or snapshotting for durability if needed.
Why This Matters for Real-Time Trading Infrastructure
In a trading environment:
- Market data updates need to propagate quickly
- Execution events must be coordinated reliably
- Analytics must see data in near real time
Kafka is great for enterprise-scale durability and fan-out, but for real-time microservice coordination in millisecond windows, Redis Streams is often a better fit. It’s simpler, faster, and easier to manage for these specific use cases.
Conclusion
Redis Streams gave me a sweet spot between speed and reliability. It allowed me to:
- Build low-latency pipelines
- Coordinate multiple stages reliably
- Scale horizontally across workers
- Keep operational complexity low
Kafka was overkill for this scenario. Redis Streams allowed me to focus on logic and reliability, not cluster management.
This project reinforced a lesson I’ve learned over and over: the right tool isn’t always the “industry standard,” it’s the tool that matches the problem’s real requirements.
Next Steps
I plan to extend the pipeline with:
- Multiple event sources feeding the same streams
- Time-series aggregation in Redis for analytics
- Latency monitoring dashboards using Grafana
- Integration with Python/C++ engines for simulation and execution
Redis Streams has become my go-to tool for real-time coordination whenever I need fast, simple, reliable pipelines without the overhead of full Kafka clusters.
Diagram: Multi-Stage Pipeline Using Redis Streams
+-----------------+ +---------------------+ +--------------------+ +----------------------+
| External Source | ---> | Ingest Stream | ---> | Transformation Svc | ---> | Simulation/Execution |
+-----------------+ +---------------------+ +--------------------+ +----------------------+
|
v
+----------------+
| Analytics/Mon. |
+----------------+
Each arrow represents XADD to a Redis Stream, and each block is a consumer group reading, processing, and acknowledging events.