Building My First Latency-Sensitive Trading Bot in Python
Peter Bieda
When I built my first real trading bot, I wasn’t trying to create anything complex or profitable. I was simply curious: How fast could I consume live ticks and react to them? How close could I get to “real-time” on a retail-level internet connection using only Python?
That simple experiment turned into my first practical exposure to latency engineering — and it taught me more about trading infrastructure than any book ever has.
This article breaks down the journey: what I built, where it failed, what I learned about latency, and how those lessons shaped the way I now design trading systems.
Why Latency Matters More Than You Think
Like most developers entering quantitative trading or market-driven systems, I knew “latency” was important. Everyone talks about microseconds like they’re dollars. But the actual impact doesn’t fully click until you see it in your own system.
In live trading:
- A delayed tick leads to a delayed decision.
- A delayed decision leads to a worse price.
- A worse price leads to a losing strategy.
Even if a strategy operates at the millisecond level instead of nanoseconds, engineering discipline around timing still matters. The trading bot I built taught me this the hard way.
My First Approach: The Asyncio Streaming Bot
My first version was intentionally simple. I used Python’s asyncio and aiohttp libraries to subscribe to a WebSocket feed and stream incoming ticks.
import asyncio
import aiohttp
import time

async def stream_ticks():
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect("wss://api.exchange.example/ticks") as ws:
            async for msg in ws:
                tick = msg.json()
                print(f"{tick['symbol']} {tick['price']:.2f} @ {time.time():.6f}")

asyncio.run(stream_ticks())

It worked flawlessly in early tests — which is exactly why I trusted it.
But then I connected it to a real feed during market hours.
That’s when the truth showed up in blinking, ugly delays.
Instead of printing ticks with minimal drift, I started seeing 200–300 milliseconds of lag. Sometimes it even spiked above 500 ms when the feed was busy. In a market where prices can change multiple times per millisecond, my bot was effectively driving with half-second delayed eyesight.
The surprising part? Nothing in my logic was “slow.”
The event loop itself was the bottleneck.
The First Real Lesson: Async Isn’t a Magic Latency Cure
Python’s async model is cooperative, not preemptive. That means your code must yield control frequently or the loop gets backed up.
Under heavy tick flow:
- JSON parsing slowed down frame dispatch.
- Printing to console (a classic mistake) blocked the loop.
- Occasional Python GC pauses created micro-bursts of delay.
Even though each operation seemed small, together they created measurable latency.
This was my first real exposure to how microsecond-scale systems behave:
Latency isn’t created by big slowdowns. It’s created by dozens of tiny, invisible ones.
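To make that concrete, here is a minimal, self-contained sketch (not from the bot itself) showing how a single blocking call in one coroutine stalls every other task on the loop:

import asyncio
import time

async def heartbeat():
    # Measures how far apart "once per 10 ms" wakeups actually land.
    prev = time.perf_counter()
    for _ in range(20):
        await asyncio.sleep(0.01)
        now = time.perf_counter()
        print(f"heartbeat drift: {(now - prev - 0.01) * 1000:.2f} ms")
        prev = now

async def blocker():
    # One synchronous call blocks the entire event loop, so the
    # heartbeat task cannot run until it finishes.
    await asyncio.sleep(0.05)
    time.sleep(0.1)  # stands in for a slow parse, a print, or a GC pause

async def main():
    await asyncio.gather(heartbeat(), blocker())

asyncio.run(main())

Run it and one heartbeat interval jumps by roughly the length of the blocking call, even though the heartbeat code itself never changed.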
Profiling async code became one of the most valuable engineering skills I gained from this project.
Profiling the Event Loop: Seeing the Invisible
To understand what was actually happening, I instrumented the loop with timestamps:
prev = time.time()
async for msg in ws:
    now = time.time()
    drift = (now - prev) * 1000
    prev = now
    print(f"Loop drift: {drift:.3f} ms")
What I learned shocked me:
- Even when ticks were arriving quickly, the loop drift varied unpredictably.
- GC was causing occasional 5–10 ms spikes.
- Printing anything slowed down the loop significantly.
- JSON decoding could introduce 1–4 ms of jitter on its own.
In a normal backend system, these numbers are irrelevant.
In a trading system, they’re catastrophic.
This is when I realized latency engineering is a mindset, not a library.
Optimizing the Pipeline: What Actually Worked
After identifying the bottlenecks, I iterated through several improvements.
1. Removing All Stdout Logging
Printing to the console, even sparingly, wrecks latency on the hot path.
I replaced print() with a lock-free ring buffer logger that flushed asynchronously. The improvement was immediate: tens of milliseconds of jitter disappeared.
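The exact logger matters less than the pattern. A minimal sketch of the idea (a simplified stand-in, not my original implementation; in a single event loop a plain deque is enough, so "lock-free" here just means no blocking I/O on the hot path):

import asyncio
from collections import deque

class RingLogger:
    """Buffers log lines in memory; a background task flushes them off the hot path."""

    def __init__(self, capacity=4096):
        self.buffer = deque(maxlen=capacity)  # oldest lines are dropped, never blocked on

    def log(self, line):
        self.buffer.append(line)  # O(1), no I/O on the tick-handling path

    async def flush_forever(self, interval=0.25):
        while True:
            await asyncio.sleep(interval)
            while self.buffer:
                print(self.buffer.popleft())  # real code would write to a file or socket

In the hot loop you call logger.log(...), and at startup you schedule asyncio.create_task(logger.flush_forever()).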
2. Switching to Faster JSON Parsing
Python’s built-in json module is convenient but slow.
Using orjson or ujson reduced decode times drastically.
Example:
import orjson
tick = orjson.loads(msg.data)
This alone cut ~2 ms per tick during heavy flow.
3. Pre-Allocating Objects and Avoiding GC Pressure
I moved temporary objects outside the loop and reused them where possible.
I also called:
import gc
gc.disable()
With GC disabled, jitter became more predictable.
(Obviously, in real systems GC must be handled more carefully, but this was a prototype.)
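As a rough illustration of the reuse idea (a sketch with hypothetical field names and a hypothetical handle_tick callback), the per-message dictionaries can be replaced by a single holder that is overwritten on every tick:

import time
import orjson

tick_holder = {"symbol": None, "price": 0.0, "recv_ts": 0.0}  # allocated once, reused per tick

def on_message(raw, handle_tick):
    data = orjson.loads(raw)
    tick_holder["symbol"] = data["symbol"]
    tick_holder["price"] = float(data["price"])
    tick_holder["recv_ts"] = time.time()
    handle_tick(tick_holder)  # the consumer must not keep a reference across ticks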
4. Offloading Tick Processing to a Separate Task
Instead of doing everything in the WebSocket loop, I pushed ticks into an asyncio.Queue and processed them elsewhere:
queue = asyncio.Queue()

async def producer(ws, queue):
    async for msg in ws:
        await queue.put(msg)

async def consumer(queue):
    while True:
        tick = await queue.get()
        handle_tick(tick)
This decoupling reduced back-pressure and preserved ordering.
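Wired together, the two tasks share the same connection, roughly like this (a sketch; the URL is the same placeholder as before and handle_tick is left to the strategy):

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect("wss://api.exchange.example/ticks") as ws:
            await asyncio.gather(producer(ws, queue), consumer(queue))

asyncio.run(main())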
My First Real Backtest vs. Live Feed Discrepancy
One unexpected lesson was how latency impacts strategy performance predictions.
My backtests assumed I was reacting to ticks immediately.
But in reality, I was 50–300 ms late.
That discrepancy meant:
- Entries occurred too late.
- Exits occurred too late.
- Stops slipped more often than expected.
Even with the exact same logic, the live version performed worse simply because of timing.
This taught me that a trading strategy is not just algorithmic — it’s infrastructural.
If the infrastructure is slower than assumed, the strategy must be redesigned.
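One simple way to account for this in a backtest is to charge the simulation for the delay: fill at the first tick that arrives after an assumed reaction latency instead of at the tick that generated the signal. A minimal sketch (synthetic data layout, not my actual backtester):

def delayed_fill_price(ticks, signal_index, latency_ms):
    """Return the price the bot would actually trade at, given reaction latency.

    ticks is a time-ordered list of (timestamp_seconds, price) tuples;
    the signal fires on ticks[signal_index].
    """
    signal_ts, signal_price = ticks[signal_index]
    deadline = signal_ts + latency_ms / 1000.0
    for ts, price in ticks[signal_index:]:
        if ts >= deadline:
            return price          # first price observable after the delay
    return signal_price           # feed ended before the latency elapsed

Comparing delayed_fill_price(ticks, i, 200) against ticks[i][1] gives a rough per-trade slippage estimate for a 200 ms reaction time.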
Experimenting With Multithreading and Multiprocessing
To push the boundaries further, I tested whether true parallelism would help.
Using Threads
Threads in Python are limited by the GIL, but I/O releases it, and some C-extension work can as well. In practice, threading helped slightly, but not significantly.
Using Processes
With multiprocessing, the tick throughput increased noticeably, especially when distributing tasks like:
- signal generation
- risk calculation
- order routing simulation
Example architecture:
[WebSocket Process] → [Tick Queue] → [Strategy Worker] → [Order Router]
But processes introduced overhead too — mainly serialization cost. In a real prop shop environment, this architecture would likely be replaced with shared memory or a C++ microservice.
Still, as a Python experiment on a laptop, it provided a clean separation of responsibilities and clearer profiling.
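A stripped-down version of that layout, with synthetic ticks standing in for the WebSocket process and a toy signal in the strategy worker (a sketch of the structure, not production code):

import time
from multiprocessing import Process, Queue

def feed(tick_queue):
    # Stands in for the WebSocket process; generates synthetic ticks.
    for i in range(10_000):
        tick_queue.put({"price": 100.0 + i * 0.01, "ts": time.time()})
    tick_queue.put(None)  # sentinel: no more ticks

def strategy_worker(tick_queue, order_queue):
    while True:
        tick = tick_queue.get()
        if tick is None:
            order_queue.put(None)
            break
        if tick["price"] > 150.0:          # toy signal
            order_queue.put({"side": "sell", "price": tick["price"]})

def order_router(order_queue):
    while True:
        order = order_queue.get()
        if order is None:
            break
        # a real router would hand the order to an exchange or execution simulator here

if __name__ == "__main__":
    ticks, orders = Queue(), Queue()
    procs = [
        Process(target=feed, args=(ticks,)),
        Process(target=strategy_worker, args=(ticks, orders)),
        Process(target=order_router, args=(orders,)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

The queues here are the serialization cost mentioned above: every tick is pickled on the way in and unpickled on the way out, which is exactly what a shared-memory or C++ design would avoid.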
The Most Important Lesson: Python Has Limits — But Also a Purpose
This project didn’t make me say, “Python is too slow for trading.”
Instead, it taught me:
- Python is excellent for prototyping strategies.
- Python is good for orchestration and tooling.
- Python can handle moderately latency-sensitive pipelines with careful engineering.
- But Python alone cannot match C++ or Rust in ultra-low-latency execution.
In other words:
Python is the glue. C++ is the engine.
This mirrors exactly how modern quant firms structure their stack:
- Python for research, modeling, simulation, diagnostics
- C++ for actual execution, market data handlers, and low-latency paths
Understanding where Python fits into a trading infrastructure is crucial — and this first bot helped me understand it intuitively.
My Final Architecture: Version 3 of the Bot
By the end of my iterations, my architecture looked like this:
┌───────────────────────────────┐
│      WebSocket Data Feed      │
└───────────────┬───────────────┘
                ▼
     [Async Data Ingestion]
                ▼
      Lock-Free Ring Buffer
                ▼
     [Multiprocess Pipeline]
        ├─ Strategy Worker
        ├─ Risk Worker
        └─ Logging Worker
                ▼
       Execution Simulator
Key improvements:
- No console I/O
- Faster JSON decoding
- Event loop drift < 5 ms under load
- Tick throughput increased from ~1,500 ticks/sec to 8,000+ ticks/sec
- Live results more closely matched backtests
Was it HFT-grade? Of course not.
But it transformed my understanding of system design for trading.
What This Project Taught Me About Real Trading Infrastructure
Looking back, this early experiment taught me the foundational skills required for quantitative engineering roles:
✔ Working with high-frequency streaming data
Not all engineers have experience handling thousands of events per second while preserving order and avoiding back-pressure.
✔ Profiling and diagnosing latency issues
I now treat microseconds like currency. I can identify, measure, and eliminate bottlenecks at the event-loop, system, or networking level.
✔ Building clean, decoupled data pipelines
The producer/consumer model, multiprocessing pipelines, and clean separation of concerns are all skills that translate directly to trading systems.
✔ Understanding the interplay between strategy and infrastructure
A strategy’s success is not just mathematical — it depends on execution mechanics.
✔ Moving toward C++ for critical path logic
I now understand when Python is appropriate and when switching to compiled languages is mandatory.
These lessons are exactly what drive strong engineering in quantitative teams, especially those working between research, data, and infrastructure — the type of hybrid role this field demands.
Conclusion: The Start of a Much Bigger Journey
My first latency-sensitive Python trading bot was not profitable.
It wasn’t sophisticated.
And it certainly wasn’t low-latency by prop-shop standards.
But it changed the way I think as an engineer.
It forced me to see code as a real-time system.
It taught me to measure everything.
It showed me that great trading infrastructure is equal parts math, software architecture, and microsecond-level engineering.
That single project set the foundation for the work I now do — building more advanced strategy engines, machine-learning-powered flow predictors, and real-time monitoring dashboards.
And it all started with one simple WebSocket loop.