
Designing a Ride-Hailing System

Yash Sachdeva
Software Engineer | Turning Complex Problems into Simple Solutions

Software engineer by trade, problem solver by nature. I write about the systems I build in my free time and the experiences that shape them.

Introduction

In this post, we will design a real-time matching and dispatch system for a ride-hailing platform, focusing on geospatial indexing, low-latency matching, streaming data, pricing, and consistency trade-offs.

1. Requirements

Functional Requirements

Rider:

  • Request a ride (pickup, dropoff, ride type).
  • Get a quote with ETA and estimated price.
  • Confirm the ride; see the assigned driver and live location.

Driver:

  • Receive ride offers, accept or reject.
  • Stream location while en route and during trip.
  • Go online/offline.

System:

  • Select the best driver (multi-factor, not just nearest).
  • Compute ETAs for pickup and drop-off.
  • Run dynamic pricing / surge per area.
  • Persist trips, payments, ratings, and audit logs.

Non-Functional Requirements

Key tradeoffs:

  • Availability over strong consistency on read paths: Riders should almost always see some ETA and price, even if it’s slightly stale.
  • Low latency: Matching decision in 200-300ms under typical load; location update end-to-end (driver -> rider map) in < 500ms.
  • Scalability: Handle >= 1M GPS updates per second globally. Scale horizontally across regions and data centers.
  • Correctness: A driver should never be committed to two trips at once - strong consistency around assignment.
  • Observability and cost efficiency: Streaming-first, cheap writes, heavy analytics off the hot path.

Note: We accept eventual consistency for ETAs and surge levels, but require strong consistency for ride assignment and payment settlement.

2. High-Level Architecture

Core Components

We can think in terms of 3 planes:

  1. Online request plane - matching and real-time user interactions.
  2. Streaming plane - GPS ingestion, event processing, feature computation.
  3. Batch/analytics plane - offline modeling and business insights.

Mobile Apps -> API Gateway -> Connection Service (WebSockets)
                                    |
                                    v
                                --- Ride Service -> Payment
                                |
        Location Service        |       Matching Service -> Geospatial Index
    GPS -> Ingestion -> Stream ------>  ETA  / Routing -> Pricing and Surge
                                |
                                |----> Analytics Plane

APIs

  • POST /rides/quote

    • Input: pickup_lat, pickup_lng, dropoff_lat, dropoff_lng, ride_type
    • Output: estimated_price, estimated_eta
  • POST /rides

    • Input: pickup, drop, price, ride_type, client_token
    • Output: ride_id, status (eg: requested)
  • POST /rides/{ride_id}/cancel (for both rider and driver)

  • POST /rides/{ride_id}/complete (driver)

  • POST /driver/{driver_id}/status (whether driver is online or offline)

Real Time Over Sockets

  • Rider Channel:

    • ride_update (matched, driver en route, started, completed).
    • driver_location (stream)
  • Driver Channel:

    • ride_offer (new ride request with countdown)
    • ride_update (GPS updates, cancelled ride, ride started, ride completed)

Component Diagram

Geospatial Indexing: Quadtrees vs S2 vs H3

We need to handle millions of GPS updates per second, and indexing raw latitude/longitude in a traditional RDBMS does not scale. Instead, we will use a geospatial index that will:

  • Partition the world into cells
  • Map each (lat, lng) to a cell ID
  • Maintain in-memory state per cell and small neighbourhoods.

Quadtrees / Geohash

  • Idea: Recursively subdivide the world into quadrants; each level adds precision.
  • Geohash encodes lat/lng into a string; prefix similarity means proximity.
  • Pros:
    • Simple to implement
    • String prefix ranged queries.
  • Cons:
    • Cell shapes distort with latitude
    • Neighbourhood queries across cell boundaries are messier.
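To make the prefix property concrete, here is a minimal geohash encoder — a sketch, not a production library:

```python
# Minimal geohash encoder: interleaves longitude/latitude bits
# (longitude first) and packs them 5 at a time into the standard
# geohash base32 alphabet.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 9) -> str:
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    bits = []
    use_lng = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        rng, value = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lng = not use_lng
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)
```

Because the encoding is a deterministic prefix of interleaved bits, a coarser geohash is literally a string prefix of a finer one over the same point — which is exactly what makes prefix range scans work.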

Google S2

S2 partitions the sphere into hierarchical cells using a space-filling curve (Hilbert) over faces of a cube projected onto Earth.

  • Pros:
    • Native spherical geometry (good for global systems).
    • Good locality properties via Hilbert curve.
    • Widely used (Google Maps, BigQuery GIS, MongoDB etc).
  • Cons:
    • High complexity

Uber H3

H3 is Uber’s open-source hexagonal hierarchical spatial index.

  • Hexagonal grid over the globe, resolutions 0 - 15.
  • Each cell has a 64-bit index; supports k-ring neighbours, polygons etc.
  • Pros:
    • Hexagons have more uniform neighbour distances than squares.
    • Built-in functions make nearest-neighbour lookup, k-ring queries (neighbours within N rings), and aggregation across resolutions straightforward.

Location Tracking: Handling Millions of GPS Pings

When a driver sends a GPS update every 1-5 seconds:

  1. Ingress
    • Driver -> WebSocket -> Connection Service -> Location Service
  2. Matching / Tracking
    • Convert (lat, lng) -> cell ID (eg: H3 index)
    • Update in-memory driver state:
      • drivers_by_cell[cell_id] -> set of available drivers in that cell
      • driver_state[driver_id] -> last location, status
    • Redis cache
      • Set TTL for driver locations to automatically remove stale drivers.

This ensures matching queries never hit the database and are served from memory.
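A minimal in-process sketch of this index; in production Redis with TTLs would play this role, and the 30-second TTL below is an assumption:

```python
import time
from typing import Dict, Set, Tuple

class LocationIndex:
    """In-memory driver index keyed by cell; stale entries expire by TTL,
    mirroring what a Redis key TTL would do for us in production."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self.drivers_by_cell: Dict[Tuple[int, int], Set[str]] = {}
        self.driver_state: Dict[str, dict] = {}

    def update(self, driver_id: str, cell: Tuple[int, int], now: float = None):
        now = time.time() if now is None else now
        old = self.driver_state.get(driver_id)
        if old and old["cell"] != cell:
            # Driver moved: remove them from the previous cell's set.
            self.drivers_by_cell[old["cell"]].discard(driver_id)
        self.driver_state[driver_id] = {"cell": cell, "seen": now}
        self.drivers_by_cell.setdefault(cell, set()).add(driver_id)

    def available_in(self, cell: Tuple[int, int], now: float = None) -> Set[str]:
        """Drivers in this cell whose last ping is within the TTL."""
        now = time.time() if now is None else now
        return {d for d in self.drivers_by_cell.get(cell, set())
                if now - self.driver_state[d]["seen"] <= self.ttl}
```

Both writes (GPS pings) and reads (candidate lookups) are O(1) dictionary operations, which is what keeps the matching path off the database.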

Partitioning and Scaling

To handle high write volume:

  • Partition by region and cell ID:
    • Each shard manages a subset of cells.
    • Topic partition key: region, cell_id prefix
  • Tunable update frequency:
    • Adaptive throttling (slower updates when driver is stopped)
  • Backpressure and drop policies:
    • If downstream is overloaded, drop older updates for the same driver - only the latest matters per driver.
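The drop-older policy can be sketched as a coalescing buffer that keeps at most one pending fix per driver:

```python
from collections import OrderedDict

class CoalescingBuffer:
    """Keeps at most one pending GPS update per driver: a newer update
    overwrites the older one, so under backpressure we ship only the
    latest known position instead of a growing backlog."""

    def __init__(self):
        self._pending = OrderedDict()  # driver_id -> latest update

    def offer(self, driver_id, update):
        # Discard any stale update for this driver, then record the new one.
        self._pending.pop(driver_id, None)
        self._pending[driver_id] = update

    def drain(self):
        """Hand all pending (driver_id, update) pairs downstream and reset."""
        items = list(self._pending.items())
        self._pending.clear()
        return items
```

This bounds memory at one entry per active driver regardless of how far downstream processing falls behind.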

Note: We aim for a write-optimized, streaming-first architecture: We treat the GPS stream as the ground truth, update an in-memory index for sub-second queries, and persist raw events separately to a data lake for analytics and ML.

Matching Logic: Finding the “Best” Driver

Best is rarely “closest” - it’s a weighted combination of:

  • ETA to pickup (road, traffic)
  • Driver rating and cancellation history
  • Driver’s current trip (direction)
  • Vehicle constraints (type)
  • Platform incentives (driver utilization etc)

Step 1: Candidate Retrieval

Given a pickup point:

  • Map (lat,lng) -> cell_id
  • Query current cell + k-ring neighbours (eg: 2-3 km).
  • Collect available drivers from these cells, with a cap (eg: top 100 drivers by proximity).

Using H3:

  • geoToH3(lat,lng,res) -> cell_id
  • kRing(cell_id, k) -> set of neighbour cell ids
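H3 provides these primitives out of the box. To illustrate the retrieval pattern without pulling in the h3 library, here is the same idea on a square grid; the cell size and candidate cap are assumptions:

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

def cell_of(lat: float, lng: float, cell_deg: float = 0.01) -> Cell:
    """Map a coordinate to a (row, col) grid cell (~1 km at the equator)."""
    return (int(lat // cell_deg), int(lng // cell_deg))

def k_ring(cell: Cell, k: int) -> Set[Cell]:
    """All cells within k steps (Chebyshev distance) on a square grid —
    the square-grid analogue of H3's kRing over hexagons."""
    r, c = cell
    return {(r + dr, c + dc)
            for dr in range(-k, k + 1) for dc in range(-k, k + 1)}

def candidates(drivers_by_cell: Dict[Cell, Set[str]],
               lat: float, lng: float, k: int = 2,
               cap: int = 100) -> List[str]:
    """Collect available drivers from the pickup cell and its k-ring,
    capped so downstream scoring stays cheap."""
    found: List[str] = []
    for cell in k_ring(cell_of(lat, lng), k):
        found.extend(drivers_by_cell.get(cell, ()))
    return found[:cap]
```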

Step 2: Scoring Function

Define a match score: score = w1*eta + w2*rating + w3*vehicle_type + w4*driver_utilization + ... (weights can be negative; e.g. a higher pickup ETA should lower the score).
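A sketch of this scoring step; the weight values and the chosen features are purely illustrative:

```python
def match_score(candidate: dict, weights: dict) -> float:
    """Weighted linear score. ETA carries a negative weight so that
    drivers who can arrive sooner score higher."""
    return (weights["eta"] * candidate["eta_min"]
            + weights["rating"] * candidate["rating"]
            + weights["utilization"] * candidate["utilization"])

# Illustrative weights — in practice these come from offline tuning / ML.
WEIGHTS = {"eta": -1.0, "rating": 0.5, "utilization": 0.2}

def best_driver(candidates: list) -> dict:
    """Pick the candidate with the highest match score."""
    return max(candidates, key=lambda c: match_score(c, WEIGHTS))
```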

Step 3: Offer and Reservation

To avoid double-assignment, we use short-lived reservations:

  • Matching Service chooses top candidate driver.
  • It calls a Driver Allocation Store (Redis) with compare-and-set (CAS):
    • Set driver_status = "reserved" if current status is "available".
  • If CAS succeeds:
    • Create RideOffer
    • Send offer via WebSocket to driver.
  • Driver accepts or rejects via WebSocket -> ConnectionService -> Matching / Ride Service
  • If accepted, Ride Service updates trip status to "assigned" and notifies both rider and driver.
  • On timeout / rejection, Matching Service removes reservation, makes driver available and tries next candidate.

Note: Two riders trying to get the same driver concurrently won’t both succeed, because the CAS enforces exclusive assignment.
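A sketch of the reservation CAS, using an in-process lock to stand in for Redis (in production this would be a `SET ... NX` with a TTL, or a small Lua script, so the check-and-set is atomic on the server):

```python
import threading

class DriverAllocationStore:
    """Compare-and-set on driver status. The lock makes the
    check-then-set atomic, as Redis would for a single command."""

    def __init__(self):
        self._status = {}
        self._lock = threading.Lock()

    def set_available(self, driver_id: str):
        with self._lock:
            self._status[driver_id] = "available"

    def try_reserve(self, driver_id: str) -> bool:
        """Atomically flip available -> reserved; False if already taken."""
        with self._lock:
            if self._status.get(driver_id) == "available":
                self._status[driver_id] = "reserved"
                return True
            return False

    def release(self, driver_id: str):
        """On offer timeout / rejection, return the driver to the pool."""
        with self._lock:
            if self._status.get(driver_id) == "reserved":
                self._status[driver_id] = "available"
```

Whichever matching request wins the CAS sends the offer; the loser simply moves on to its next candidate.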

WebSockets for Real-Time Signalling

  • Both rider and driver maintain persistent WebSocket connections to Connection Service.
  • The Connection Service knows which instance holds the WebSocket for each user.

Benefits:

  • Full-duplex, low latency updates vs polling.
  • Separates stateful connection management from stateless matching logic.

Data Consistency: Preventing Double Booking

  • Per-driver linearizability: A single “home” shard / key for each driver. All state transitions go through that shard.
  • Idempotent APIs: Client includes request_id on mutation calls. Server stores (request_id -> result) with some TTL.
  • Ride state machine with explicit transitions: Requested -> OfferPending -> Matched -> Ongoing -> Completed/Cancelled. Every transition is a compare-and-set on current state.
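The state machine with compare-and-set transitions might look like this (state names lower-cased for illustration):

```python
# Legal ride-state transitions; anything else is rejected.
TRANSITIONS = {
    "requested": {"offer_pending", "cancelled"},
    "offer_pending": {"matched", "requested", "cancelled"},
    "matched": {"ongoing", "cancelled"},
    "ongoing": {"completed", "cancelled"},
    "completed": set(),
    "cancelled": set(),
}

def transition(ride: dict, expected: str, new: str) -> bool:
    """Compare-and-set: apply only if the ride is still in `expected`
    and `expected -> new` is a legal edge. A stale caller (whose view
    of the state is outdated) gets False instead of corrupting state."""
    if ride["status"] != expected or new not in TRANSITIONS[expected]:
        return False
    ride["status"] = new
    return True
```

In a database this is the classic conditional update: `UPDATE rides SET status = :new WHERE id = :id AND status = :expected`, checking the affected-row count.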

ETA Engine

A basic ETA engine has 2 layers:

  • Routing layer: Represents the road network as a graph
    • Nodes: intersection / waypoints.
    • Edges: road segments with base travel times and constraints

Given source and destination:

  • Use A* or Dijkstra’s algorithm to find the shortest path
  • Integrate with real-time traffic data to adjust edge weights
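A minimal sketch of the routing layer: Dijkstra over a toy graph whose base edge times are scaled by live traffic multipliers. The graph and multipliers below are made up for illustration:

```python
import heapq

def eta_seconds(graph: dict, source: str, dest: str,
                traffic: dict = None) -> float:
    """Dijkstra over a road graph. Edge weights are base travel times
    (seconds) scaled by a per-edge traffic multiplier (1.0 = free flow)."""
    traffic = traffic or {}
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dest:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, base_secs in graph.get(node, {}).items():
            nd = d + base_secs * traffic.get((node, nbr), 1.0)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None  # destination unreachable
```

A* is the same loop with a heuristic (e.g. straight-line time) added to the priority, which prunes the search toward the destination.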

Pricing and Surge

We can come up with a base fare model: base_fare = base_charge + (per_km * distance) + (per_min * time), and final_price = surge_multiplier * base_fare.

Measuring Supply and Demand

For each cell:

  • Supply: number of available drivers in / around that cell.
  • Demand: recent ride requests / search queries originating from that cell.
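A toy surge computation from these two signals; the clamping bounds are assumptions, and real systems smooth the ratio over time and across neighbouring cells:

```python
def surge_multiplier(demand: int, supply: int,
                     floor: float = 1.0, cap: float = 3.0) -> float:
    """Surge as the demand/supply ratio for a cell, clamped to
    [floor, cap] so prices never dip below base or spike unboundedly."""
    if supply <= 0:
        return cap  # no drivers around: maximum surge
    return max(floor, min(cap, demand / supply))
```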

Computing Price:

  1. Map pick-up and drop to cells
  2. Estimate ETA and trip duration using ETA engine
  3. Compute base fare components
  4. Fetch surge multiplier for pickup cell
  5. Compute final price and show to user with time-limited validity (eg: 2 minutes).
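The fare computation in steps 3-5 can be sketched as follows; the base charge and per-km / per-min rates are illustrative, not real tariffs:

```python
def quote(distance_km: float, duration_min: float, surge: float,
          base_charge: float = 2.0, per_km: float = 1.2,
          per_min: float = 0.3) -> float:
    """Base fare = base charge + distance + time components;
    the surge multiplier then scales the whole base fare."""
    base_fare = base_charge + per_km * distance_km + per_min * duration_min
    return round(base_fare * surge, 2)
```

The distance and duration inputs come from the ETA engine, and the surge value from the pickup cell's current multiplier.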

3. Potential Bottlenecks and Mitigations

  • Hotspots (eg: New Year's Eve): A sudden spike in demand in a few cells can cause:
    • A high queue backlog in the Matching Service
    • Connection Service saturation, i.e. too many WebSockets in a single region

Mitigations can include:

  • Limit search radius or candidate count for driver matching during overload
  • Degrade gracefully i.e. approximate ETAs, cached routes, cheaper heuristics
  • Admission Control: if Matching Service queue length exceeds limit, surface an error - “High demand in your area, try again later”.
  • Connection Service Scaling: Use consistent hashing of ride_ids across nodes. During high demand, auto scale and rebalance.
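A sketch of consistent hashing with virtual nodes for the Connection Service; node names and the vnode count are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing of ride_ids across Connection Service nodes.
    Virtual nodes smooth the key distribution, and adding or removing a
    node only remaps the keys that fell on its ring segments."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, ride_id: str) -> str:
        """First ring position clockwise from the key's hash."""
        i = bisect.bisect(self._keys, self._hash(ride_id)) % len(self._ring)
        return self._ring[i][1]
```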

4. Global Scaling

  • Multi-AZ (Availability Zone) deployment with automatic failover to prevent downtime. In this setup, you have a Primary instance in one AZ and a Standby instance in a second AZ. If the Primary goes dark, the system automatically swaps them so your application stays online.
  • Per-region H3/S2 index: Global coordination only at higher levels (e.g., cross-border trips).
  • Caching layers: Surge multipliers, popular routes, static map data.
  • Backpressure signals from downstream services: ETA / routing service indicates overload → Matching Service switches to heuristic ETAs.
  • Replayability: All requests and location streams are written to Kafka, enabling reprocessing with improved algorithms and post-hoc incident debugging.
