Building an HFT Data Pipeline on Solana: A Reference Architecture

Think in stages and legs

Every low-latency Solana system, under the branding, is the same pipeline:

pipeline.txt

        [ source ]                                   [ leader ]
            │  ① network leg                              ▲
            ▼                                             │ ⑤ network leg
   ┌────────────────┐   ┌──────────┐   ┌───────────┐   ┌──┴───────┐
   │ ② ingest       │──▶│ ③ decode │──▶│ ④ strategy│──▶│ execute  │
   │ shreds/gRPC/RPC│   │ normalize│   │ evaluate  │   │ TPU/Jito │
   └────────────────┘   └──────────┘   └───────────┘   └──────────┘
                                                           │
                                                           ▼
                                                 ⑥ reconcile + observe

Two of those arrows are network legs - source-to-you and you-to-leader - and they're pure distance. The boxes in the middle are compute - your code, your decode, your strategy. You optimize the boxes with engineering; you optimize the legs with location. A pipeline that's fast in the middle and slow on the legs still loses.

Your pipeline is compute bracketed by distance. You refactor the compute; you can only relocate the distance. Most teams optimize the half they can see and ignore the half that's actually costing them.

① + ② Ingest: read the right source for the job

The ingest stage is where most of the edge is won or lost, because it decides how early you see the world. Don't pick one source - layer them:

Source	Gives you	Use for
Decoded shreds	First-seen, pre-confirmation transactions	The signal that triggers a strategy
Yellowstone gRPC	Live account/slot state, pushed, at scale	Tracking pool state, balances, markets
JSON-RPC	Confirmed/finalized truth, writes, history	Settlement, reconciliation, backfill

Shreds give you the earliest possible trigger (see Reading Decoded Transactions Before Confirmation), gRPC keeps your view of live state current without polling (see Why RPC Polling Can't Keep Up), and RPC anchors you to finalized truth. Aggregate shreds from multiple upstreams and act on whichever copy of a given shred arrives first, dropping the later duplicates - redundancy that doubles as latency insurance.

ingest.ts

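// Assumes an initialized `rpcedge` client; RAYDIUM, PUMP_FUN, and RAYDIUM_AMM
// are program-ID constants defined elsewhere.
// First-seen signal: decoded shreds, filtered to the programs of interest.
const shreds = await rpcedge.subscribe({ source: "shreds", decode: true,
  filter: { programs: [RAYDIUM, PUMP_FUN] } });
// Live state: account updates for AMM-owned pools, pushed over gRPC (Geyser).
const state = await rpcedge.subscribe({ source: "geyser",
  accounts: { pools: { owner: [RAYDIUM_AMM] } }, commitment: "processed" });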
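
Taking the first copy from several upstreams implies a dedupe step in front of the strategy. A minimal sketch, assuming each upstream delivers decoded transactions that can be keyed by signature. The `SeenTx` shape, the `FirstSeenFilter` class, and the 150-slot retention default are illustrative assumptions, not part of any provider SDK:

```typescript
// Hypothetical shape of one decoded transaction as delivered by an upstream feed.
interface SeenTx {
  signature: string; // dedupe key
  slot: number;
  upstream: string; // which feed delivered this copy
}

// Passes the first copy of each signature and rejects later duplicates.
class FirstSeenFilter {
  private readonly seen = new Map<string, number>(); // signature -> slot
  private readonly maxSlotAge: number;

  constructor(maxSlotAge: number = 150) {
    this.maxSlotAge = maxSlotAge;
  }

  // Returns true only for the first copy of a signature.
  accept(tx: SeenTx): boolean {
    if (this.seen.has(tx.signature)) return false;
    this.seen.set(tx.signature, tx.slot);
    return true;
  }

  // Drops entries older than maxSlotAge slots so memory stays bounded.
  prune(currentSlot: number): void {
    this.seen.forEach((slot, signature) => {
      if (currentSlot - slot > this.maxSlotAge) this.seen.delete(signature);
    });
  }

  get size(): number {
    return this.seen.size;
  }
}
```

Call `accept` on every copy from every upstream and forward only the ones that return true. The same check is what makes re-seen data harmless further down the pipeline.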

③ Decode and normalize

Raw transactions aren't strategy input - you want instructions, accounts, and program-level meaning. The decode stage reassembles entries, deserializes transactions, and decodes them against the relevant program layouts into a normalized shape your strategy understands.

Two rules keep this stage off your critical path:

Decode close to the data. If decoding happens server-side, co-located, you receive ready-to-use objects instead of raw bytes you still have to process. rpc edge decodes shreds before they hit your gRPC stream for exactly this reason.
Normalize once. Convert every source - shred-decoded, gRPC, RPC - into one internal event shape early, so your strategy never branches on where data came from.
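
The "normalize once" rule can be sketched as one internal event type plus a small adapter per source. The raw shapes below (`ShredTx`, `GrpcTx`, `RpcTx`) and their field names are hypothetical; real payloads depend on the provider and decoder you use:

```typescript
type Source = "shred" | "grpc" | "rpc";
type Commitment = "unconfirmed" | "processed" | "confirmed" | "finalized";

// The single internal shape the strategy consumes.
interface PipelineEvent {
  source: Source;
  commitment: Commitment;
  slot: number;
  signature: string;
  programId: string;
  accounts: string[];
}

// Hypothetical raw shapes, one per source.
interface ShredTx { sig: string; slot: number; program: string; keys: string[]; }
interface GrpcTx { signature: string; slot: number; programId: string; accounts: string[]; }
interface RpcTx {
  signature: string; slot: number; programId: string; accounts: string[];
  commitment: "confirmed" | "finalized";
}

// Shred-decoded data is pre-confirmation by definition.
function fromShred(tx: ShredTx): PipelineEvent {
  return { source: "shred", commitment: "unconfirmed", slot: tx.slot,
    signature: tx.sig, programId: tx.program, accounts: tx.keys };
}

// Assumes the gRPC subscription was opened at "processed" commitment.
function fromGrpc(tx: GrpcTx): PipelineEvent {
  return { source: "grpc", commitment: "processed", slot: tx.slot,
    signature: tx.signature, programId: tx.programId, accounts: tx.accounts };
}

// RPC reads carry the commitment they were requested at.
function fromRpc(tx: RpcTx): PipelineEvent {
  return { source: "rpc", commitment: tx.commitment, slot: tx.slot,
    signature: tx.signature, programId: tx.programId, accounts: tx.accounts };
}

// Strategy-side check that looks at commitment, never at the source.
function isProvisional(e: PipelineEvent): boolean {
  return e.commitment !== "finalized";
}
```

Everything downstream of the adapters sees only `PipelineEvent`, so adding a fourth source later means writing one more adapter and nothing else.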

④ Strategy: the part that's actually yours

Everything else in the pipeline exists to get clean events to this box as early as possible. Keep it that way: the strategy stage should be doing decision work, not parsing, not waiting on I/O, not re-deriving state it could have cached. Pre-compute what you can, keep hot state in memory, and treat every allocation on the hot path as latency.

This is also where the first-seen discipline lives: react to pre-confirmation events for speed, but tag those actions as provisional so the reconcile stage knows to verify them.
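
A minimal sketch of a strategy box that only decides: it reads cached state from memory, does no I/O, and tags its output as provisional when the trigger was not finalized. The `PoolSnapshot`, `Trigger`, and `Decision` shapes are hypothetical, and the price-threshold rule is a placeholder standing in for a real strategy:

```typescript
// Hypothetical cached view of one pool, kept current by the state stream.
interface PoolSnapshot { price: number; slot: number; }

// Hypothetical trigger derived from a normalized event.
interface Trigger { pool: string; signature: string; impliedPrice: number; finalized: boolean; }

interface Decision {
  pool: string;
  side: "buy" | "sell";
  edgeBps: number;
  provisional: boolean; // true when the trigger was not yet finalized
  triggerSignature: string;
}

// Hot state lives in memory; the decision path never waits on I/O.
const hotPools = new Map<string, PoolSnapshot>();

// Placeholder rule: act when the implied price differs from the cached price
// by at least thresholdBps. Returns null when there is nothing to do.
function evaluate(t: Trigger, thresholdBps: number): Decision | null {
  const cached = hotPools.get(t.pool);
  if (cached === undefined || cached.price <= 0) return null;
  const edgeBps = ((t.impliedPrice - cached.price) / cached.price) * 10000;
  if (Math.abs(edgeBps) < thresholdBps) return null;
  return {
    pool: t.pool,
    side: edgeBps > 0 ? "buy" : "sell",
    edgeBps,
    provisional: !t.finalized,
    triggerSignature: t.signature,
  };
}
```

The `provisional` flag and `triggerSignature` are what the reconcile stage needs later to look the action up and verify it.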

⑤ Execute: deliver, don't broadcast

Output is a delivery race. Resolve the current and upcoming leaders from the leader schedule and send straight to their TPUs, or submit a Jito bundle when you need atomicity, guaranteed ordering, or a way to bid for inclusion - the full breakdown is in Landing Transactions: Leader Paths and the Jito Block Engine. Set priority fees dynamically, own your retries, and keep this leg short by being close to the leaders.
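
Resolving send targets from the schedule can be sketched as a pure function. This assumes the slot leaders arrive as an ordered array of validator identities starting at `startSlot` (the shape the JSON-RPC `getSlotLeaders` method returns) and that you maintain an identity-to-TPU-address map yourself, for example from `getClusterNodes`. `LeaderTarget` and `resolveTargets` are hypothetical names:

```typescript
interface LeaderTarget {
  identity: string;
  tpuAddr: string;
  firstSlot: number; // first slot in the window where this leader appears
}

// Collapses consecutive slots held by the same leader and returns up to
// maxLeaders distinct upcoming targets, in schedule order.
function resolveTargets(
  startSlot: number,
  slotLeaders: string[],
  tpuByIdentity: Map<string, string>,
  maxLeaders: number,
): LeaderTarget[] {
  const targets: LeaderTarget[] = [];
  slotLeaders.forEach((identity, offset) => {
    if (targets.length >= maxLeaders) return;
    const last = targets[targets.length - 1];
    if (last !== undefined && last.identity === identity) return; // same leader, later slot
    const tpuAddr = tpuByIdentity.get(identity);
    if (tpuAddr === undefined) return; // no known TPU address, so skip this leader
    targets.push({ identity, tpuAddr, firstSlot: startSlot + offset });
  });
  return targets;
}
```

Sending to the current leader and the next one or two covers the case where the transaction arrives just as leadership rotates.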

⑥ Reconcile and observe

A pipeline that acts on first-seen data must close the loop. Reconcile every provisional action against a finalized read: did it land? On the winning fork? At the price you assumed? Anything that didn't needs an explicit path - unwind, hedge, or cancel.
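
The three questions above map onto a small classifier. A sketch, assuming each provisional action records the last block height at which its transaction can still land and the price it expected; `FinalizedRead` is a hypothetical result of looking the signature up at finalized commitment, with `null` meaning it was not found there:

```typescript
type Outcome = "settled" | "pending" | "expired" | "failed" | "mispriced";

interface ProvisionalAction {
  signature: string;
  lastValidBlockHeight: number; // after this height the transaction can no longer land
  expectedPrice: number;
  maxSlippageBps: number;
}

// Hypothetical finalized lookup result for one signature.
interface FinalizedRead { succeeded: boolean; fillPrice: number; }

function reconcile(
  action: ProvisionalAction,
  read: FinalizedRead | null,
  finalizedBlockHeight: number,
): Outcome {
  if (read === null) {
    // Not finalized yet: either still in flight or it can never land.
    return finalizedBlockHeight > action.lastValidBlockHeight ? "expired" : "pending";
  }
  if (!read.succeeded) return "failed"; // landed on the finalized fork but errored
  const slippageBps =
    (Math.abs(read.fillPrice - action.expectedPrice) / action.expectedPrice) * 10000;
  return slippageBps > action.maxSlippageBps ? "mispriced" : "settled";
}
```

Every outcome other than `settled` and `pending` should route to an explicit handler (unwind, hedge, or cancel) rather than being logged and forgotten.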

And you can't optimize what you don't measure. Stamp each event as it moves through the stages so you can see where latency accrues - source-to-decode, decode-to-decision, decision-to-send. Most teams discover their real bottleneck isn't where they assumed.
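
Stage stamping needs very little code. A sketch with hypothetical names, assuming timestamps come from a monotonic local clock in milliseconds. A local clock can only measure from the moment an event is received, so "source-to-decode" here means received-to-decoded:

```typescript
type Stage = "received" | "decoded" | "decided" | "sent";

// One trace per event; stages are filled in as the event moves along.
type Trace = Partial<Record<Stage, number>>;

function stamp(trace: Trace, stage: Stage, nowMs: number): Trace {
  trace[stage] = nowMs;
  return trace;
}

// Elapsed time between two stages, or null if either stamp is missing.
function delta(trace: Trace, from: Stage, to: Stage): number | null {
  const a = trace[from];
  const b = trace[to];
  return a === undefined || b === undefined ? null : b - a;
}

interface Breakdown {
  sourceToDecode: number | null;
  decodeToDecision: number | null;
  decisionToSend: number | null;
}

function breakdown(trace: Trace): Breakdown {
  return {
    sourceToDecode: delta(trace, "received", "decoded"),
    decodeToDecision: delta(trace, "decoded", "decided"),
    decisionToSend: delta(trace, "decided", "sent"),
  };
}
```

Aggregate the breakdowns as percentiles rather than averages; the tail is usually where a trading pipeline loses.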

The latency budget, honestly

Add it up and your end-to-end latency is: source→you network + decode + strategy + you→leader network. You can shave the decode and strategy boxes with good engineering - and you should. But the two network legs are set by geography, and they often dominate. This is the uncomfortable truth most "fast RPC" pitches skip: a brilliantly optimized pipeline a region away from the cluster loses to an ordinary one that's co-located.
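
The sum is simple enough to write down. In the sketch below every number is an illustrative placeholder, not a measurement, and the 200 km per millisecond figure is the usual rough approximation for light in optical fiber, which gives a lower bound on a network leg before any routing or queuing is added:

```typescript
interface Budget {
  sourceToYouMs: number; // network leg in
  decodeMs: number;
  strategyMs: number;
  youToLeaderMs: number; // network leg out
}

function totalMs(b: Budget): number {
  return b.sourceToYouMs + b.decodeMs + b.strategyMs + b.youToLeaderMs;
}

// Fraction of the end-to-end budget spent on the two network legs.
function networkShare(b: Budget): number {
  return (b.sourceToYouMs + b.youToLeaderMs) / totalMs(b);
}

// Rough one-way propagation floor over fiber, at about 200 km per millisecond.
function fiberFloorMs(distanceKm: number): number {
  return distanceKm / 200;
}

// Placeholder comparison: tightly optimized code far from the cluster
// versus ordinary code that is co-located.
const remoteOptimized: Budget = {
  sourceToYouMs: fiberFloorMs(2000), decodeMs: 0.25, strategyMs: 0.25,
  youToLeaderMs: fiberFloorMs(2000),
};
const colocatedOrdinary: Budget = {
  sourceToYouMs: 0.5, decodeMs: 1, strategyMs: 1, youToLeaderMs: 0.5,
};
```

With these placeholder inputs the remote pipeline spends almost its entire budget on the two legs, even though its compute stages are four times faster than the co-located one.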

That's the entire premise of rpc edge: rack the ingest, decode, and send stages beside Solana stake clusters and the Jito Block Engine, so the legs you can't refactor are as short as physics allows.

The takeaway

An HFT pipeline on Solana is five stages bracketed by two network legs. Read the right source per job, decode close to the data, keep the strategy box doing only decisions, deliver straight to the leaders, and reconcile against finality. Optimize the compute with engineering - and the legs with location. Get both right and you're not just reacting to Solana; you're early to it, every slot.

Frequently asked questions

What are the stages of an HFT data pipeline on Solana?

Ingest (shreds + gRPC + RPC), decode and normalize, strategy evaluation, execution (direct TPU or Jito bundles), and reconcile/observe. Each stage adds latency, so each is a place to optimize - and co-location shrinks the network legs that bracket the whole thing.

What's a realistic latency budget for Solana HFT?

Think in legs: source-to-you network time, decode time, strategy compute, and you-to-leader send time. Co-location compresses the two network legs; lean decoding and tight strategy code handle the middle. The goal is to remove every millisecond you didn't choose to spend.

Should I use shreds, gRPC, or RPC?

All three, for different jobs. Shreds give first-seen signal, gRPC gives live account state at scale, and RPC gives confirmed truth and handles writes and history. A serious pipeline layers them rather than forcing one to do everything.

Why does co-location matter so much for a trading pipeline?

Two of your pipeline's legs are pure network distance - source-to-you and you-to-leader. No amount of faster software removes geography. Racking the pipeline beside the stake clusters and the Jito Block Engine cuts those legs to the minimum physics allows.

How do I make a pre-confirmation pipeline safe?

Separate signal from settlement: act on first-seen data for speed, but reconcile every action against a finalized commitment, bound your exposure to survive reorgs, and make the pipeline idempotent so re-seen data converges instead of double-acting.
