Building a Home Lab Trading System: Architecture of a Personal Market Intelligence Platform¶

A deep dive into the architecture of a personal cryptocurrency market intelligence system built for a home lab environment. This post explores the design decisions, technical challenges, and lessons learned from building a real-time data processing pipeline that combines microservices, machine learning, and cloud-hybrid execution.

The Vision: Personal Market Intelligence¶

The cryptocurrency market operates 24/7, generating millions of data points per day. Professional trading firms spend millions on infrastructure to process this data in real-time. But what if you could build a sophisticated market intelligence system in your own home lab?

This project, which I've been developing over the past several months, attempts to answer that question. The goal wasn't to build a "get rich quick" trading bot, but rather to create a comprehensive platform for:

Real-time market data ingestion from multiple exchanges
Signal processing with 60+ technical and microstructure indicators
Machine learning inference for market regime detection
Risk management with position sizing and circuit breakers
Full observability through monitoring and alerting

High-Level Architecture¶

The system follows a microservices architecture pattern, running entirely in Docker containers on a single home server with a dedicated GPU for ML training.

flowchart TB
    subgraph External["External Data Sources"]
        EX1[("Exchange APIs<br/>(WebSocket)")]
        EX2[("Market Data<br/>(REST)")]
        EX3[("Sentiment<br/>Feeds")]
    end

    subgraph Ingestion["Data Ingestion Layer"]
        COL1["Real-time<br/>Collectors"]
        COL2["Historical<br/>Backfill"]
        COL3["Macro Data<br/>Collector"]
    end

    subgraph Processing["Processing Layer"]
        REDIS[("Redis<br/>Hot State")]
        SIG["Signal<br/>Engine"]
        RISK["Risk<br/>Engine"]
    end

    subgraph Storage["Persistence Layer"]
        PG[("TimescaleDB<br/>Cold Storage")]
        PERSIST["Persister<br/>Service"]
    end

    subgraph ML["ML Pipeline"]
        TRAIN["Model<br/>Training"]
        INF["Real-time<br/>Inference"]
        GPU["GPU<br/>(CUDA)"]
    end

    subgraph App["Application Layer"]
        API["REST API<br/>(FastAPI)"]
        UI["Dashboard<br/>(Next.js)"]
    end

    subgraph Monitoring["Observability"]
        PROM["Prometheus"]
        GRAF["Grafana"]
        ALERT["Discord/Telegram<br/>Alerts"]
    end

    EX1 & EX2 & EX3 --> COL1 & COL2 & COL3
    COL1 & COL2 & COL3 --> REDIS
    REDIS --> SIG
    SIG --> RISK
    SIG --> PERSIST
    PERSIST --> PG
    REDIS <--> INF
    PG --> TRAIN
    TRAIN --> GPU
    TRAIN --> INF
    INF --> RISK
    REDIS --> API
    PG --> API
    API --> UI
    SIG & RISK & INF --> PROM
    PROM --> GRAF
    RISK --> ALERT

The Data Ingestion Challenge¶

Real-time WebSocket Streams¶

The first challenge was reliably ingesting real-time market data. Exchange WebSocket APIs provide tick-by-tick trade data, order book snapshots, and other market microstructure information.

sequenceDiagram
    participant Exchange
    participant Collector
    participant Redis
    participant SignalEngine
    participant Persister
    participant DB

    Exchange->>Collector: WebSocket Stream<br/>(trades, depth, funding)

    activate Collector
    Collector->>Redis: Publish to channels<br/>(market.trades.*, market.depth.*)
    deactivate Collector

    Redis-->>SignalEngine: Subscribe
    Redis-->>Persister: Subscribe

    activate SignalEngine
    SignalEngine->>SignalEngine: Calculate Indicators
    SignalEngine->>Redis: Publish signals<br/>(signals.*)
    deactivate SignalEngine

    activate Persister
    Persister->>Persister: Batch accumulation<br/>(5-second window)
    Persister->>DB: Bulk INSERT
    deactivate Persister

Key design decisions for the ingestion layer:

Redis as the event bus: All real-time data flows through Redis Pub/Sub channels, decoupling producers from consumers
Batch persistence: Rather than writing every tick to the database, the persister accumulates data in 5-second windows before bulk inserting
Automatic reconnection: Collectors implement exponential backoff and automatic reconnection when WebSocket connections drop
Gap detection: A separate health monitor detects data gaps and triggers backfill jobs

Historical Data Management¶

For machine learning, you need historical data - lots of it. The system maintains several years of minute-level price data, along with derived signals.

flowchart LR
    subgraph Sources["Data Sources"]
        REST["Exchange REST API<br/>(Historical)"]
        LOCAL["Local Archives<br/>(CSV/Parquet)"]
    end

    subgraph Pipeline["Backfill Pipeline"]
        DETECT["Gap<br/>Detector"]
        BACKFILL["Backfill<br/>Worker"]
        VALIDATE["Data<br/>Validator"]
    end

    subgraph Storage["Storage"]
        TS[("TimescaleDB<br/>Hypertables")]
        COMPRESS["Compression<br/>Policies"]
    end

    REST --> BACKFILL
    LOCAL --> BACKFILL
    DETECT --> BACKFILL
    BACKFILL --> VALIDATE
    VALIDATE --> TS
    TS --> COMPRESS

    style COMPRESS fill:#4a90d9

TimescaleDB's hypertables and compression policies are essential here - without compression, several years of minute-level data would consume hundreds of gigabytes.

The Signal Processing Engine¶

The heart of the system is the signal processing engine, which calculates 60+ indicators in real-time.

Signal Categories¶

The signals are organized into functional categories:

mindmap
  root((Signal<br/>Engine))
    Technical
      Momentum
        RSI
        MACD
        ADX
      Volatility
        ATR
        Bollinger
        Keltner
      Volume
        OBV
        VWAP
        Volume Profile
    Microstructure
      Order Flow
        CVD
        OFI
        Taker Imbalance
      Book Analysis
        Depth Imbalance
        Liquidity Levels
    Research-Based
      Statistical
        Funding Z-Score
        OI Divergence
      Advanced
        Hawkes Process
        Conditional OI
    Composite
      Multi-Signal
        Confluence Score
        MTF Alignment

The Base Signal Pattern¶

Each signal inherits from a base class that provides:

Standardized calculation interface
Hyperparameter registration for optimization
Redis caching for intermediate results
Metrics exposure for monitoring

# Conceptual pattern (simplified)
class Signal:
    def __init__(self):
        self.hyperparameters = []

    def register_hyperparameters(self):
        """Define tunable parameters"""
        pass

    async def calculate(self, df: pd.DataFrame) -> float:
        """Core calculation logic"""
        raise NotImplementedError

Multi-Timeframe Analysis¶

One of the more interesting patterns is multi-timeframe (MTF) confluence detection. The system calculates signals across multiple timeframes and looks for alignment:

flowchart TB
    subgraph Timeframes["Timeframe Hierarchy"]
        TF1["1-minute<br/>(Tactical)"]
        TF2["1-hour<br/>(Short-term)"]
        TF3["4-hour<br/>(Medium-term)"]
        TF4["Daily<br/>(Strategic)"]
    end

    subgraph Analysis["MTF Analysis"]
        CALC["Signal<br/>Calculation"]
        ALIGN["Alignment<br/>Detection"]
        CONF["Confluence<br/>Score"]
    end

    TF1 & TF2 & TF3 & TF4 --> CALC
    CALC --> ALIGN
    ALIGN --> CONF

    CONF -->|"All timeframes<br/>aligned"| STRONG["Strong Signal<br/>(High confidence)"]
    CONF -->|"Mixed<br/>signals"| WEAK["Weak Signal<br/>(Low confidence)"]

The ML Pipeline¶

Architecture¶

The ML pipeline uses modern time-series forecasting techniques to predict market direction and volatility regimes.

flowchart TB
    subgraph Data["Data Preparation"]
        RAW[("Historical<br/>Data")]
        FE["Feature<br/>Engineering"]
        SPLIT["Train/Val/Test<br/>Split"]
    end

    subgraph Training["Training Pipeline"]
        LOADER["Data<br/>Loader"]
        MODEL["Time-Series<br/>Model"]
        OPT["Hyperparameter<br/>Optimization"]
        TRACK["Experiment<br/>Tracking"]
    end

    subgraph Deploy["Deployment"]
        REG["Model<br/>Registry"]
        SERVE["Feature<br/>Store"]
        INF["Inference<br/>Service"]
    end

    RAW --> FE
    FE --> SPLIT
    SPLIT --> LOADER
    LOADER --> MODEL
    MODEL <--> OPT
    MODEL --> TRACK
    TRACK --> REG
    REG --> INF
    SERVE --> INF

    GPU["GPU<br/>(Training)"] -.-> MODEL

Feature Engineering¶

The feature engineering pipeline transforms raw market data and signals into model-ready features:

Feature Type	Examples	Purpose
Time-varying known	Hour of day, day of week, minutes to funding	Temporal patterns
Time-varying unknown	Price, volume, signals	Core predictive features
Static	Average daily volume, historical volatility	Per-asset characteristics

Training Infrastructure¶

Training runs on a dedicated GPU in the home lab. The system uses:

Experiment tracking for comparing model versions
Hyperparameter optimization with Bayesian search
Model versioning for rollback capability

TPU Training on Kaggle: The Cloud Factory Pattern¶

For computationally intensive experiments, the home GPU isn't always enough. The project leverages Kaggle's free TPU v5e (8-core) for training cutting-edge architectures that would take days on a single GPU.

flowchart LR
    subgraph Home["Home Lab"]
        RAW["Raw Data<br/>(15GB JSON)"]
        CONVERT["Vectorized<br/>Parquet"]
    end

    subgraph Kaggle["Kaggle Cloud"]
        DATASET[("Dataset<br/>mercuryg-lob-data")]
        TPU["TPU v5e<br/>(8-core)"]
        MAMBA["Mamba SSM<br/>Training"]
        WANDB["W&B<br/>Monitoring"]
    end

    subgraph Output["Artifacts"]
        WEIGHTS["Model<br/>Weights"]
        ONNX["ONNX<br/>Export"]
    end

    RAW --> CONVERT
    CONVERT -->|"Upload"| DATASET
    DATASET --> TPU
    TPU --> MAMBA
    MAMBA --> WANDB
    MAMBA --> WEIGHTS
    WEIGHTS --> ONNX

    style TPU fill:#4285f4
    style Kaggle fill:#20beff22

The "Cloud Factory" pattern works as follows:

Local preprocessing: Convert 15GB of raw order book JSON to 42 vectorized Parquet files (4.2M snapshots)
Upload to Kaggle: The processed dataset becomes a reusable Kaggle dataset
TPU training: Run training kernels on TPU v5e with torch_xla
Live monitoring: Weights & Biases tracks loss curves and metrics
Download weights: Retrieve the trained model for local ONNX conversion

The Mamba State Space Model¶

The latest experiments use Mamba, a state-space model architecture that's particularly well-suited for sequential financial data:

flowchart TB
    subgraph MambaBlock["Mamba Block"]
        IN["Input"] --> PROJ["Linear Projection"]
        PROJ --> CONV["1D Conv<br/>(Depthwise)"]
        CONV --> SSM["State Space<br/>Model"]
        SSM --> OUT["Output"]

        PROJ -->|"Gate"| GATE["SiLU Gate"]
        GATE --> OUT
    end

    subgraph SSM["Selective State Space"]
        DT["Δt (Discretization)"]
        A["A (State Matrix)"]
        B["B (Input Matrix)"]
        SCAN["Parallel<br/>Associative Scan"]
    end

    style SSM fill:#6a4a6a

Why Mamba for trading?

Property	Benefit for Trading
Linear complexity	O(L) vs O(L²) for Transformers - handles long sequences
Selective state	Learns which past information to retain or forget
Hardware efficient	Optimized for TPU/GPU with parallel scan algorithms
Continuous-time	Natural fit for irregularly-sampled tick data

TPU Optimizations:

The training script includes several TPU-specific optimizations:

Parallel Associative Scan: Hillis-Steele algorithm for O(log L) SSM computation
XLA compilation: torch_xla with PJRT runtime for TPU VM
RMSNorm: More stable than LayerNorm on TPU at scale
Zero-allocation loops: Pre-allocated buffers to avoid XLA recompilation

# Hillis-Steele parallel scan (simplified)
def parallel_scan_mamba(dA, dB, x):
    """Autograd-safe parallel scan for TPU."""
    i = 1
    while i < seq_len:
        # Recursive doubling - no in-place ops for XLA
        new_b = dA[:, i:] * b[:, :-i] + b[:, i:]
        b = torch.cat([b[:, :i], new_b], dim=1)
        i *= 2
    return b

This approach enables training on millions of order book snapshots in hours rather than days.

The "Split Brain" Cloud Architecture¶

One of the most interesting architectural decisions was the "Split Brain" pattern for execution.

The Problem¶

The home lab has a GPU for training, but it's physically far from exchange servers. Low-latency execution requires proximity to exchange data centers. Sending Level 2 order book data from Tokyo to a home lab introduces ~200ms of latency - an eternity in high-frequency trading.

The Solution¶

Separate the "brain" (strategy/ML) from the "hands" (execution), deploying a lightweight executor in AWS Tokyo (ap-northeast-1), close to exchange matching engines.

flowchart TB
    subgraph Home["Home Lab (Brain)"]
        GPU["GPU Training"]
        STRAT["Strategy<br/>Service"]
        REGIME["Regime<br/>Detection"]
    end

    subgraph AWS["AWS Tokyo (Hands)"]
        EC2["EC2 t4g.micro<br/>(ARM64/Graviton2)"]
        RUST["Rust Executor<br/>(Tokio)"]
        ONNX["ONNX Runtime<br/>(Inference)"]
    end

    subgraph Exchange["Exchange (Tokyo)"]
        WS["WebSocket<br/>L2 Data"]
        API["Order<br/>API"]
    end

    subgraph Sync["State Synchronization"]
        S3[("S3 Bucket<br/>Models + Context")]
    end

    GPU --> STRAT
    STRAT --> REGIME
    REGIME -->|"Push context.json<br/>(every 5 min)"| S3
    S3 -->|"Pull models<br/>+ context"| RUST

    WS -->|"Real-time<br/>depth data"| RUST
    RUST --> ONNX
    ONNX --> RUST
    RUST -->|"<10ms"| API

    style Home fill:#2d5a27
    style AWS fill:#1e3a5f
    style Exchange fill:#5a2d27

AWS Infrastructure (Free Tier Friendly)¶

The cloud infrastructure is managed with Terraform and designed for minimal cost:

Component	Choice	Rationale
Compute	EC2 `t4g.micro` (ARM64)	Graviton2 efficiency, free tier eligible
Storage	S3 Standard	ONNX models + context files
Container Registry	ECR	Docker image storage
Access	SSM Session Manager	Zero open ports (no SSH)
IP	Elastic IP	Required for exchange API whitelisting

The infrastructure is self-healing: if the instance is terminated, Terraform recreates it and a User Data script automatically pulls the latest code bundle from S3 and restarts the executor.

The Rust Tokyo Executor¶

The execution layer is written in Rust for maximum performance. Python is great for ML research, but for sub-millisecond tick processing, Rust's zero-cost abstractions and predictable latency are essential.

flowchart LR
    subgraph DataSources["Market Data"]
        BN["Binance<br/>WebSocket"]
        CB["Coinbase<br/>WebSocket"]
    end

    subgraph RustExecutor["Rust Executor (Tokio)"]
        AGG["Aggregator<br/>Actor"]
        SIG["Signal<br/>Engine"]
        TAM["TamTam<br/>Pipeline"]
        INF["ONNX<br/>Inference"]
        EXEC["Execution<br/>Actor"]
    end

    subgraph Output["Actions"]
        ORD["Orders"]
        MET["Metrics"]
    end

    BN & CB -->|"MicroTicks"| AGG
    AGG --> SIG
    AGG --> TAM
    SIG --> INF
    TAM -->|"Anomaly<br/>Detection"| EXEC
    INF -->|"ML Signal"| EXEC
    EXEC --> ORD
    EXEC --> MET

    style RustExecutor fill:#b7410e

Key Rust Crates in the Workspace:

Crate	Purpose
`core`	Domain types and shared logic
`features`	Signal engine (RSI, Z-Score) ported to Rust
`execution`	Main async runtime with Tokio actors
`aggregator`	Tick aggregation and snapshot latching
`reservoir`	Echo State Network for temporal patterns
`detector`	Multivariate streaming anomaly detection
`tamtam`	Novel pipeline combining ESN + Gaussian detector

Performance Optimizations:

jemalloc: Custom allocator for predictable memory behavior
SIMD JSON: Fast parsing of WebSocket messages
Zero-allocation hot paths: Pre-allocated buffers in the tick processing loop
Actor model: Tokio channels for concurrent, non-blocking data flow

The TamTam Pipeline¶

One of the more experimental components is the "TamTam" pipeline - a combination of reservoir computing and streaming anomaly detection:

flowchart LR
    TICK["Market<br/>Tick"] --> FEAT["Feature<br/>Extraction"]
    FEAT --> ESN["Echo State<br/>Network"]
    ESN -->|"Reservoir<br/>State"| GAUSS["Streaming<br/>Gaussian"]
    GAUSS -->|"Anomaly<br/>Score > 3σ"| ALERT["Stealth<br/>Order Logic"]

    style ESN fill:#4a4a6a
    style GAUSS fill:#6a4a4a

The idea is to use the Echo State Network as a "tuning fork" that resonates with normal market patterns. When the market behaves unusually, the reservoir state deviates, and the Multivariate Streaming Gaussian detector flags it as an anomaly.

State Synchronization¶

The Brain and Hands communicate asynchronously via S3:

Direction	File	Contents	Frequency
Brain → Hands	`context.json`	Regime, bias, thresholds	Every 5 min
Brain → Hands	`*.onnx`	ML models	On retrain
Hands → Brain	`positions.json`	Current inventory	On change

This design means the executor can operate autonomously for extended periods, even if the home lab goes offline. It simply continues executing within its last-known strategic context.

Monitoring and Observability¶

For a system running 24/7, observability is critical. The monitoring stack includes:

flowchart LR
    subgraph Services["Services"]
        S1["Collectors"]
        S2["Signal Engine"]
        S3["ML Service"]
        S4["API"]
    end

    subgraph Exporters["Metric Exporters"]
        E1["Redis Exporter"]
        E2["Postgres Exporter"]
        E3["cAdvisor"]
    end

    subgraph Monitoring["Monitoring Stack"]
        PROM["Prometheus"]
        GRAF["Grafana"]
        WATCH["Watchtower<br/>(Health Monitor)"]
    end

    subgraph Alerting["Alerting"]
        DISCORD["Discord"]
        TELEGRAM["Telegram"]
    end

    S1 & S2 & S3 & S4 -->|"/metrics"| PROM
    E1 & E2 & E3 --> PROM
    PROM --> GRAF
    WATCH -->|"Dead service<br/>detection"| DISCORD
    PROM -->|"Alert rules"| TELEGRAM

    style Monitoring fill:#4a4a6a

The Watchtower Pattern¶

Services write heartbeats to Redis. A dedicated "Watchtower" service polls these heartbeats and flags services that haven't updated within the threshold:

sequenceDiagram
    participant Service
    participant Redis
    participant Watchtower
    participant Discord

    loop Every 5 seconds
        Service->>Redis: SET health:service-id:heartbeat<br/>(timestamp)
    end

    loop Every 30 seconds
        Watchtower->>Redis: GET health:*:heartbeat
        Redis-->>Watchtower: All heartbeat timestamps

        alt Heartbeat > 30s old
            Watchtower->>Discord: ALERT: Service X is DOWN
            Watchtower->>Watchtower: Update Prometheus metric<br/>(service_health_status=0)
        end
    end

Lessons Learned¶

What Worked Well¶

Redis as the central nervous system: Using Redis for both pub/sub and caching simplified the architecture significantly
Containerization from day one: Every service runs in Docker, making deployment and scaling straightforward
Comprehensive testing: With 900+ tests, refactoring is much less risky
Incremental development: Building signals and features incrementally allowed for continuous validation

Challenges Encountered¶

Data quality issues: "Garbage in, garbage out" is very real. Significant effort went into data validation and gap detection
Feature engineering: The ML model is only as good as its features - this required substantial iteration
Complexity management: With 20+ services, understanding the system requires good documentation

Future Directions¶

The system continues to evolve. Current areas of exploration include:

Advanced order book modeling: Using Level 2 data for better execution
State-space models: Experimenting with newer architectures for time-series prediction
Reinforcement learning: For execution optimization rather than signal generation

Conclusion¶

Building a personal trading system has been an incredible learning experience. It touches on:

Distributed systems design
Real-time data processing
Machine learning pipelines
DevOps and monitoring
Financial data engineering

Whether or not the system ever generates alpha, the engineering challenges alone make it a rewarding project. The architecture patterns developed here - event-driven microservices, split-brain cloud deployment, comprehensive observability - are applicable far beyond trading.

If you're considering a similar project, my advice would be:

Start simple: Get data flowing before worrying about ML
Invest in observability early: You'll thank yourself later
Automate testing: With financial data, bugs can be expensive
Document as you go: Your future self is your most important reader

This post describes a personal project built for educational purposes. Nothing here constitutes financial advice.