System Architecture | Ultra-Low Latency HFT Simulator

Pipeline Overview

The system is a pipeline of six components connected by lock-free SPSC (Single-Producer Single-Consumer) queues. Each component can be pinned to a dedicated CPU core for maximum throughput and minimum latency.

Market Data Feed (FIX Protocol)
      |
      v
[Market Data Handler] --SPSC queue--> [Order Book Engine]
      (Core 2)                              (Core 4)
                                                |
                                                v
                                       [Strategy Engine] --> [Execution Engine]
                                          (Core 6)              (Core 8)
                                                |
                                          [Risk Manager]
                                        (pre-trade checks)
                                                |
                                       [Performance Monitor]
                                          (Core 10)

Data Flow

FIX messages arrive from the market data feed
Zero-copy parser extracts fields via string_view into the raw buffer
Parsed messages pass through a lock-free SPSC queue to the order book
Order book updates price levels using price-time priority
Updated book state triggers strategy signal generation
Signals pass through the risk manager for pre-trade validation (~20ns)
Approved orders route to the execution engine via another SPSC queue
Execution reports feed back for position tracking

Threading Model

Thread	Core	Components	Communication
Main	2	MD Handler + Order Book + Strategy + Risk	Produces to order queue
Execution	8	Execution Engine	Consumes order queue, produces exec reports
Logger	any	Background log drain	Consumes log queue

The main hot loop runs market data parsing, order book updates, strategy signal generation, and risk checks on a single thread to minimize queue hops. The execution engine runs on a separate thread since it simulates exchange latency.

Key Data Structures

Order Book

Bids: std::map<Price, PriceLevel, std::greater<>> (descending by price)
Asks: std::map<Price, PriceLevel> (ascending by price)
Order lookup: std::unordered_map<OrderId, OrderBookEntry*> for O(1) cancel
Price levels: intrusive doubly-linked list of orders (O(1) insert/remove)
Trade output: std::span over thread_local static array (zero allocation)

Fixed-Point Prices

All prices stored as int64_t with 2 decimal places. For example, $150.50 is stored as 15050. This eliminates floating-point overhead and enables exact comparison — critical for price-time priority matching.

Lock-Free SPSC Ring Buffer

Power-of-2 capacity for efficient index masking (index & (Capacity - 1))
Producer: write data, then store(tail, memory_order_release)
Consumer: load(tail, memory_order_acquire), then read data
Head and tail on separate cache lines (alignas(64)) to prevent false sharing
Heap-allocated buffer (single allocation at construction, not on hot path)

Memory Pool

Contiguous pre-allocated array (single heap allocation at startup)
Index-based intrusive free list
O(1) allocate: pop from free list head
O(1) deallocate: push to free list head
Single-threaded — no atomics overhead needed

Design Decisions

Cache-Line Alignment

alignas(64) on all hot data structures prevents false sharing between CPU cores. Head and tail indices of the SPSC queue are on separate cache lines.

Acquire/Release Ordering

Uses the minimum necessary memory ordering. seq_cst is never used — it adds unnecessary overhead. Own-index loads use relaxed since only one thread writes.

Zero-Copy Parsing

FIX parser uses string_view into the raw buffer. Tag lookup via a flat array indexed by tag number gives O(1) field access.

No Exceptions

Compiled with -fno-exceptions -fno-rtti. All error handling via return codes. Eliminates exception table overhead and enables better code generation.