Building My First Latency-Sensitive Trading Bot in Python
Peter Bieda
When I built my first real trading bot, I wasn’t trying to create anything complex or profitable. I was simply curious: How fast could I consume live ticks and react to them? How close could I get to “real-time” on a retail-level internet connection using only Python?
That simple experiment turned into my first practical exposure to latency engineering — and it taught me more about trading infrastructure than any book ever has.
This article breaks down the journey: what I built, where it failed, what I learned about latency, and how those lessons shaped the way I now design trading systems.
Why Latency Matters More Than You Think
Like most developers entering quantitative trading or market-driven systems, I knew “latency” was important. Everyone talks about microseconds like they’re dollars. But the actual impact doesn’t fully click until you see it in your own system.
In live trading:
- A delayed tick leads to a delayed decision.
- A delayed decision leads to a worse price.
- A worse price leads to a losing strategy.
Even if a strategy operates at the millisecond level instead of nanoseconds, engineering discipline around timing still matters. The trading bot I built taught me this the hard way.
My First Approach: The Asyncio Streaming Bot
My first version was intentionally simple. I used Python’s asyncio and aiohttp libraries to subscribe to a WebSocket feed and stream incoming ticks.
import asyncio
import aiohttp
import time

async def stream_ticks():
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect("wss://api.exchange.example/ticks") as ws:
            async for msg in ws:
                tick = msg.json()
                print(f"{tick['symbol']} {tick['price']:.2f} @ {time.time():.6f}")

asyncio.run(stream_ticks())

It worked flawlessly in early tests — which is exactly why I trusted it.
But then I connected it to a real feed during market hours.
That’s when the truth showed up in blinking, ugly delays.
Instead of printing ticks with minimal drift, I started seeing 200–300 milliseconds of lag. Sometimes it even spiked above 500 ms when the feed was busy. In a market where prices can change multiple times per millisecond, my bot was effectively driving with half-second delayed eyesight.
The surprising part? Nothing in my logic was “slow.”
The event loop itself was the bottleneck.
The First Real Lesson: Async Isn’t a Magic Latency Cure
Python’s async model is cooperative, not preemptive. That means your code must yield control frequently or the loop gets backed up.
Under heavy tick flow:
- JSON parsing slowed down frame dispatch.
- Printing to console (a classic mistake) blocked the loop.
- Occasional Python GC pauses created micro-bursts of delay.
Even though each operation seemed small, together they created measurable latency.
This was my first real exposure to how microsecond-scale systems behave:
Latency isn’t created by big slowdowns. It’s created by dozens of tiny, invisible ones.
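To make that concrete, here is a minimal, self-contained sketch (not from the bot itself) showing how a single blocking call in one coroutine stalls every other task on the loop:

import asyncio
import time

async def heartbeat():
    # Measures how far apart "once per 10 ms" wakeups actually land.
    prev = time.perf_counter()
    for _ in range(20):
        await asyncio.sleep(0.01)
        now = time.perf_counter()
        print(f"heartbeat drift: {(now - prev - 0.01) * 1000:.2f} ms")
        prev = now

async def blocker():
    # One synchronous call blocks the entire event loop, so the
    # heartbeat task cannot run until it finishes.
    await asyncio.sleep(0.05)
    time.sleep(0.1)  # stands in for a slow parse, a print, or a GC pause

async def main():
    await asyncio.gather(heartbeat(), blocker())

asyncio.run(main())

Run it and one heartbeat interval jumps by roughly the length of the blocking call, even though the heartbeat code itself never changed.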
Profiling async code became one of the most valuable engineering skills I gained from this project.
Profiling the Event Loop: Seeing the Invisible
To understand what was actually happening, I instrumented the loop with timestamps:
prev = time.time()
async for msg in ws:
    now = time.time()
    drift = (now - prev) * 1000
    prev = now
    print(f"Loop drift: {drift:.3f} ms")
What I learned shocked me:
- Even when ticks were arriving quickly, the loop drift varied unpredictably.
- GC was causing occasional 5–10 ms spikes.
- Printing anything slowed down the loop significantly.
- JSON decoding could introduce 1–4 ms of jitter on its own.
In a normal backend system, these numbers are irrelevant.
In a trading system, they’re catastrophic.
This is when I realized latency engineering is a mindset, not a library.
Optimizing the Pipeline: What Actually Worked
After identifying the bottlenecks, I iterated through several improvements.
1. Removing All Stdout Logging
Printing to the console, even sparingly, wrecks latency on the hot path.
I replaced print() with a lock-free ring buffer logger that flushed asynchronously. The improvement was immediate: tens of milliseconds of jitter disappeared.
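The exact logger matters less than the pattern. A minimal sketch of the idea (a simplified stand-in, not my original implementation; in a single event loop a plain deque is enough, so "lock-free" here just means no blocking I/O on the hot path):

import asyncio
from collections import deque

class RingLogger:
    """Buffers log lines in memory; a background task flushes them off the hot path."""

    def __init__(self, capacity=4096):
        self.buffer = deque(maxlen=capacity)  # oldest lines are dropped, never blocked on

    def log(self, line):
        self.buffer.append(line)  # O(1), no I/O on the tick-handling path

    async def flush_forever(self, interval=0.25):
        while True:
            await asyncio.sleep(interval)
            while self.buffer:
                print(self.buffer.popleft())  # real code would write to a file or socket

In the hot loop you call logger.log(...), and at startup you schedule asyncio.create_task(logger.flush_forever()).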
2. Switching to Faster JSON Parsing
Python’s built-in json module is convenient but slow.
Using orjson or ujson reduced decode times drastically.
Example:
import orjson
tick = orjson.loads(msg.data)
This alone cut ~2 ms per tick during heavy flow.
3. Pre-Allocating Objects and Avoiding GC Pressure
I moved temporary objects outside the loop and reused them where possible.
I also called:
import gc
gc.disable()
With GC disabled, jitter became more predictable.
(Obviously, in real systems GC must be handled more carefully, but this was a prototype.)
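As a rough illustration of the reuse idea (a sketch with hypothetical field names and a hypothetical handle_tick callback), the per-message dictionaries can be replaced by a single holder that is overwritten on every tick:

import time
import orjson

tick_holder = {"symbol": None, "price": 0.0, "recv_ts": 0.0}  # allocated once, reused per tick

def on_message(raw, handle_tick):
    data = orjson.loads(raw)
    tick_holder["symbol"] = data["symbol"]
    tick_holder["price"] = float(data["price"])
    tick_holder["recv_ts"] = time.time()
    handle_tick(tick_holder)  # the consumer must not keep a reference across ticks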
4. Offloading Tick Processing to a Separate Task
Instead of doing everything in the WebSocket loop, I pushed ticks into an asyncio.Queue and processed them elsewhere:
queue = asyncio.Queue()

async def producer(ws, queue):
    async for msg in ws:
        await queue.put(msg)

async def consumer(queue):
    while True:
        tick = await queue.get()
        handle_tick(tick)
This decoupling reduced back-pressure and preserved ordering.
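Wired together, the two tasks share the same connection, roughly like this (a sketch; the URL is the same placeholder as before and handle_tick is left to the strategy):

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect("wss://api.exchange.example/ticks") as ws:
            await asyncio.gather(producer(ws, queue), consumer(queue))

asyncio.run(main())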
My First Real Backtest vs. Live Feed Discrepancy
One unexpected lesson was how latency impacts strategy performance predictions.
My backtests assumed I was reacting to ticks immediately.
But in reality, I was 50–300 ms late.
That discrepancy meant:
- Entries occurred too late.
- Exits occurred too late.
- Stops slipped more often than expected.
Even with the exact same logic, the live version performed worse simply because of timing.
This taught me that a trading strategy is not just algorithmic — it’s infrastructural.
If the infrastructure is slower than assumed, the strategy must be redesigned.
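One simple way to account for this in a backtest is to charge the simulation for the delay: fill at the first tick that arrives after an assumed reaction latency instead of at the tick that generated the signal. A minimal sketch (synthetic data layout, not my actual backtester):

def delayed_fill_price(ticks, signal_index, latency_ms):
    """Return the price the bot would actually trade at, given reaction latency.

    ticks is a time-ordered list of (timestamp_seconds, price) tuples;
    the signal fires on ticks[signal_index].
    """
    signal_ts, signal_price = ticks[signal_index]
    deadline = signal_ts + latency_ms / 1000.0
    for ts, price in ticks[signal_index:]:
        if ts >= deadline:
            return price          # first price observable after the delay
    return signal_price           # feed ended before the latency elapsed

Comparing delayed_fill_price(ticks, i, 200) against ticks[i][1] gives a rough per-trade slippage estimate for a 200 ms reaction time.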
Experimenting With Multithreading and Multiprocessing
To push the boundaries further, I tested whether true parallelism would help.
Using Threads
Threads in Python are limited by the GIL, but I/O releases it, and some C-extension work can as well. In practice, threading helped slightly, but not significantly.
Using Processes
With multiprocessing, the tick throughput increased noticeably, especially when distributing tasks like:
- signal generation
- risk calculation
- order routing simulation
Example architecture:
[WebSocket Process] → [Tick Queue] → [Strategy Worker] → [Order Router]
But processes introduced overhead too — mainly serialization cost. In a real prop shop environment, this architecture would likely be replaced with shared memory or a C++ microservice.
Still, as a Python experiment on a laptop, it provided a clean separation of responsibilities and clearer profiling.
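A stripped-down version of that layout, with synthetic ticks standing in for the WebSocket process and a toy signal in the strategy worker (a sketch of the structure, not production code):

import time
from multiprocessing import Process, Queue

def feed(tick_queue):
    # Stands in for the WebSocket process; generates synthetic ticks.
    for i in range(10_000):
        tick_queue.put({"price": 100.0 + i * 0.01, "ts": time.time()})
    tick_queue.put(None)  # sentinel: no more ticks

def strategy_worker(tick_queue, order_queue):
    while True:
        tick = tick_queue.get()
        if tick is None:
            order_queue.put(None)
            break
        if tick["price"] > 150.0:          # toy signal
            order_queue.put({"side": "sell", "price": tick["price"]})

def order_router(order_queue):
    while True:
        order = order_queue.get()
        if order is None:
            break
        # a real router would hand the order to an exchange or execution simulator here

if __name__ == "__main__":
    ticks, orders = Queue(), Queue()
    procs = [
        Process(target=feed, args=(ticks,)),
        Process(target=strategy_worker, args=(ticks, orders)),
        Process(target=order_router, args=(orders,)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

The queues here are the serialization cost mentioned above: every tick is pickled on the way in and unpickled on the way out, which is exactly what a shared-memory or C++ design would avoid.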
The Most Important Lesson: Python Has Limits — But Also a Purpose
This project didn’t make me say, “Python is too slow for trading.”
Instead, it taught me:
- Python is excellent for prototyping strategies.
- Python is good for orchestration and tooling.
- Python can handle moderately latency-sensitive pipelines with careful engineering.
- But Python alone cannot match C++ or Rust in ultra-low-latency execution.
In other words:
Python is the glue. C++ is the engine.
This mirrors exactly how modern quant firms structure their stack:
- Python for research, modeling, simulation, diagnostics
- C++ for actual execution, market data handlers, and low-latency paths
Understanding where Python fits into a trading infrastructure is crucial — and this first bot helped me understand it intuitively.
My Final Architecture: Version 3 of the Bot
By the end of my iterations, my architecture looked like this:
┌───────────────────────────────┐
│      WebSocket Data Feed      │
└───────────────┬───────────────┘
                ▼
     [Async Data Ingestion]
                ▼
      Lock-Free Ring Buffer
                ▼
     [Multiprocess Pipeline]
        ├─ Strategy Worker
        ├─ Risk Worker
        └─ Logging Worker
                ▼
       Execution Simulator
Key improvements:
- No console I/O
- Faster JSON decoding
- Event loop drift < 5 ms under load
- Tick throughput increased from ~1,500 ticks/sec to 8,000+ ticks/sec
- Live results more closely matched backtests
Was it HFT-grade? Of course not.
But it transformed my understanding of system design for trading.
What This Project Taught Me About Real Trading Infrastructure
Looking back, this early experiment taught me the foundational skills required for quantitative engineering roles:
✔ Working with high-frequency streaming data
Not all engineers have experience handling thousands of events per second while preserving order and avoiding back-pressure.
✔ Profiling and diagnosing latency issues
I now treat microseconds like currency. I can identify, measure, and eliminate bottlenecks at the event-loop, system, or networking level.
✔ Building clean, decoupled data pipelines
The producer/consumer model, multiprocessing pipelines, and clean separation of concerns are all skills that translate directly to trading systems.
✔ Understanding the interplay between strategy and infrastructure
A strategy’s success is not just mathematical — it depends on execution mechanics.
✔ Moving toward C++ for critical path logic
I now understand when Python is appropriate and when switching to compiled languages is mandatory.
These lessons are exactly what drive strong engineering in quantitative teams, especially those working between research, data, and infrastructure — the type of hybrid role this field demands.
Conclusion: The Start of a Much Bigger Journey
My first latency-sensitive Python trading bot was not profitable.
It wasn’t sophisticated.
And it certainly wasn’t low-latency by prop-shop standards.
But it changed the way I think as an engineer.
It forced me to see code as a real-time system.
It taught me to measure everything.
It showed me that great trading infrastructure is equal parts math, software architecture, and microsecond-level engineering.
That single project set the foundation for the work I now do — building more advanced strategy engines, machine-learning-powered flow predictors, and real-time monitoring dashboards.
And it all started with one simple WebSocket loop.