In this post, we’ll cover the design of a modern, production-grade Order Management System (OMS), focusing on multi-fulfillment, cancellations, refunds, inventory synchronization, and multi-region deployment.
Let’s start with the requirements.
Requirements #
Functional Requirements #
- Core order lifecycle: Create order with multiple line items, shipping options, and payment methods.
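- Order state machine: Support states such as PENDING, CONFIRMED, PARTIALLY_FULFILLED, FULFILLED, CANCELLED, and REFUNDED, with explicit rules for which transitions are allowed. The happy path is PENDING → CONFIRMED → PARTIALLY_FULFILLED → FULFILLED; CANCELLED and REFUNDED are exit branches, not steps every order passes through.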
- Split shipments: Support split shipments and partial fulfillment when items originate from multiple locations or arrive at different times.
- Cancellations: Allow customer and system-initiated cancellations in various states (pre-fulfillment, mid-fulfillment) with clear rules.
- Refunds: Support refunds (full and partial), including multi-payment or mixed-method scenarios (card, wallet, store credit).
- Multi-fulfillment: Route each line item to an optimal fulfillment node (warehouse, store, 3PL, marketplace drop-shipper).
- Multiple shipments: Track multiple shipments per order with independent tracking IDs and statuses.
- Backorders and preorders: Support delayed fulfillment while the order remains active.
- Inventory and payments: Reserve inventory atomically as part of the order creation saga; release on failure or cancellation.
- Inventory sync: Prevent overselling across channels with near real-time inventory sync and event-driven updates.
- Payment gateways: Integrate with one or more payment gateways for authorization, capture, and refund.
- Multi-channel and integrations: Receive orders from internal checkout, marketplaces, and POS; normalize into a canonical order model.
- Fulfillment updates: Push fulfillment updates and cancellations back to channels and customer notification systems.
- Multi-region deployment: Deploy OMS in multiple regions, each with a full stack of services fronted by a global load balancer.
- Data synchronization: Keep critical data (orders, payments, inventory) synchronized across regions using a mix of strong and eventual consistency depending on domain constraints.
Non-Functional Requirements #
- High Availability and resilience: A failure in one downstream dependency should not take down the entire order flow.
- Scalability: Capable of handling peak events such as flash sales and promotions.
- Consistency: Clear consistency model for orders and inventory (strong vs eventual consistency).
- Observability: Comprehensive logging, monitoring, and tracing.
- Extensibility: Easy to add new fulfillment types, payment methods, or regions without major rewrites.
High Level Design #
Order Lifecycle and Domain Model #
Order Lifecycle Stages #
A typical e-commerce order lifecycle contains the following high-level stages:
- Order captured: Request received from channel with cart items, prices, and customer data.
- Order validated: Items, pricing, and addresses verified; taxes and shipping calculated.
- Payment authorization: Payment instrument authorized for total amount.
- Inventory reservation: Stock reserved or allocated at chosen location(s).
- Fulfillment: Warehouse/store picks, packs, and ships or hands over for pickup.
- Shipment and delivery: Carrier tracking pushed; order marked shipped/delivered.
- Post-order events: Cancellations, returns, exchanges, refunds, inventory adjustments.
Core Entities #
Key domain entities include:
- Order: Immutable identity, with overall status (CREATED, CONFIRMED, FULFILLING, SHIPPED, COMPLETED, CANCELLED, RETURNED).
- OrderItem: Per-SKU line with quantity, price, and fulfillment status.
- Payment: Records authorization, capture, refund events with idempotent transaction keys.
- InventoryItem / StockLevel: Tracks on-hand quantity, availability, and reservations per SKU and location.
- FulfillmentRequest: A unit of work sent to a fulfillment node (warehouse, store, 3PL).
- ReturnRequest: Tracks customer-initiated returns, RMA, and refund disposition.
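The overall Order status listed above is easier to reason about once the allowed transitions are written down explicitly. The following is a minimal sketch, not a definitive model: the `OrderStatus` enum mirrors the statuses named for the Order entity, while `OrderStateMachine` and the specific transition rules are assumptions made for illustration and would need to match real business policy.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Statuses taken from the Order entity description; the enum itself is hypothetical.
enum OrderStatus {
    CREATED, CONFIRMED, FULFILLING, SHIPPED, COMPLETED, CANCELLED, RETURNED
}

final class OrderStateMachine {
    // Assumed transition table: each status maps to the statuses it may move to.
    private static final Map<OrderStatus, Set<OrderStatus>> ALLOWED =
            new EnumMap<>(OrderStatus.class);

    static {
        ALLOWED.put(OrderStatus.CREATED, EnumSet.of(OrderStatus.CONFIRMED, OrderStatus.CANCELLED));
        ALLOWED.put(OrderStatus.CONFIRMED, EnumSet.of(OrderStatus.FULFILLING, OrderStatus.CANCELLED));
        ALLOWED.put(OrderStatus.FULFILLING, EnumSet.of(OrderStatus.SHIPPED, OrderStatus.CANCELLED));
        ALLOWED.put(OrderStatus.SHIPPED, EnumSet.of(OrderStatus.COMPLETED, OrderStatus.RETURNED));
        ALLOWED.put(OrderStatus.COMPLETED, EnumSet.of(OrderStatus.RETURNED));
        // Terminal statuses: no outgoing transitions.
        ALLOWED.put(OrderStatus.CANCELLED, EnumSet.noneOf(OrderStatus.class));
        ALLOWED.put(OrderStatus.RETURNED, EnumSet.noneOf(OrderStatus.class));
    }

    static boolean canTransition(OrderStatus from, OrderStatus to) {
        return ALLOWED.get(from).contains(to);
    }

    // Returns the new status, or throws if the move is not in the table.
    static OrderStatus transition(OrderStatus from, OrderStatus to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
        return to;
    }
}
```

Keeping the table in one place means a rule such as "a shipped order can no longer be cancelled" is enforced by the domain model instead of being scattered across event handlers.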
Core Services #
Saga Orchestration #
In a microservices-based Order Management System (OMS), you can’t easily use a single “giant” database transaction to ensure everything succeeds or fails together. If the Payment service is down but the Inventory service already deducted stock, you have a data consistency nightmare.
The Saga Pattern solves this by breaking a large, distributed transaction into a sequence of smaller, local transactions.
How a Saga Works
Instead of one big lock on the data, each service performs its own local transaction and publishes an event or message. This triggers the next service in the chain. If any step fails, the Saga executes compensating transactions, essentially “undo” operations, to revert the changes made by previous steps.
There are two primary ways to coordinate these steps:
- Event-Based (Choreography): There is no central “boss.” Each service listens for events and decides what to do next.
  - Pros: Simple to start; low coupling.
  - Cons: Hard to track the “state” of an order as the number of services grows. It can become a “spaghetti” of events.
- Orchestration (Centralized): A central “Orchestrator” (the Saga Manager) tells each service what to do and when.
  - Pros: Easier to debug and monitor; the logic for the entire business process is in one place.
  - Cons: Risk of the orchestrator becoming a “fat” service that knows too much about everyone else’s business.
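For comparison, the orchestrated variant can be sketched in a few lines. This is an in-memory illustration only: `SagaStep`, `SimpleStep`, and `SagaOrchestrator` are hypothetical names, and a production orchestrator would also persist its progress and require idempotent compensations so that it can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical abstraction: one step wraps one service's local transaction.
interface SagaStep {
    String name();
    void execute() throws Exception; // forward action
    void compensate();               // "undo" action for a completed step
}

// Stub step used for illustration; it can be told to fail on execute.
final class SimpleStep implements SagaStep {
    private final String name;
    private final boolean failOnExecute;

    SimpleStep(String name, boolean failOnExecute) {
        this.name = name;
        this.failOnExecute = failOnExecute;
    }

    public String name() { return name; }

    public void execute() throws Exception {
        if (failOnExecute) {
            throw new Exception(name + " failed");
        }
    }

    public void compensate() {
        // A real step would call the owning service here (e.g. release stock).
    }
}

final class SagaOrchestrator {
    private final List<String> log = new ArrayList<>();

    // Runs steps in order; on failure, compensates completed steps in reverse order.
    boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                log.add("done:" + step.name());
                completed.push(step);
            } catch (Exception e) {
                log.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensate();
                    log.add("compensated:" + done.name());
                }
                return false;
            }
        }
        return true;
    }

    List<String> log() { return log; }
}
```

Running a saga whose second step fails leaves a log showing the first step completed, the second failed, and the first compensated, which is the "undo in reverse order" behavior described above.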
Let’s start with an event-based design for Order, Payment, and Inventory services:
// Choreography-style OrderService: it reacts to events from the Payment and Inventory services.
// @EventListener is assumed to come from the event framework in use (for example Spring's
// org.springframework.context.event.EventListener). EventBus, OrderRepository, and the
// event classes are application-defined types.
public class OrderService {
    private final EventBus eventBus;
    private final OrderRepository orderRepository;

    public OrderService(EventBus eventBus, OrderRepository orderRepository) {
        this.eventBus = eventBus;
        this.orderRepository = orderRepository;
    }

    // Step 1: persist the order as PENDING and announce it to the other services.
    public Order createOrder(CreateOrderCommand cmd) {
        Order order = Order.pending(cmd);
        orderRepository.save(order);
        eventBus.publish(new OrderCreatedEvent(order));
        return order;
    }

    // Payment succeeded: record it on the order.
    @EventListener
    public void on(PaymentCompletedEvent event) {
        Order order = orderRepository.findById(event.orderId());
        order.markPaymentCompleted(event.paymentId());
        orderRepository.save(order);
    }

    // Inventory reserved: the order can now be confirmed.
    @EventListener
    public void on(InventoryReservedEvent event) {
        Order order = orderRepository.findById(event.orderId());
        order.confirm();
        orderRepository.save(order);
        eventBus.publish(new OrderConfirmedEvent(order.getId()));
    }

    // Payment failed: cancel the order; nothing has been reserved or charged yet.
    @EventListener
    public void on(PaymentFailedEvent event) {
        Order order = orderRepository.findById(event.orderId());
        order.cancel("PAYMENT_FAILED");
        orderRepository.save(order);
        eventBus.publish(new OrderCancelledEvent(order.getId(), "PAYMENT_FAILED"));
    }

    // Inventory failed after payment: start compensation by asking Payment to undo its step.
    @EventListener
    public void on(InventoryFailedEvent event) {
        Order order = orderRepository.findById(event.orderId());
        order.startCompensation("INVENTORY_FAILED");
        orderRepository.save(order);
        eventBus.publish(new CompensatePaymentCommand(order.getId(), event.reason()));
    }
}
Multi-Fulfillment #
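Before the data model, it may help to see how an order could be split across nodes in the first place. The sketch below is a deliberately simple greedy allocator, not an optimal router: it walks nodes in a caller-supplied preference order and splits a line across nodes when one node cannot cover it. `LineRequest`, `Allocation`, and `FulfillmentRouter` are hypothetical names, and a real router would also score nodes on shipping cost, distance, and delivery promise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input: one order line to be sourced.
final class LineRequest {
    final long orderItemId;
    final String sku;
    final int quantity;

    LineRequest(long orderItemId, String sku, int quantity) {
        this.orderItemId = orderItemId;
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Hypothetical output: part (or all) of a line assigned to one node.
final class Allocation {
    final long orderItemId;
    final String sku;
    final int quantity;

    Allocation(long orderItemId, String sku, int quantity) {
        this.orderItemId = orderItemId;
        this.sku = sku;
        this.quantity = quantity;
    }
}

final class FulfillmentRouter {
    // stockByNode: nodeId -> (sku -> available units). Iteration order is treated as the
    // routing preference, so callers should pass an ordered map such as LinkedHashMap.
    static Map<String, List<Allocation>> route(List<LineRequest> lines,
                                               Map<String, Map<String, Integer>> stockByNode) {
        // Work on a copy so the caller's stock snapshot is not mutated.
        Map<String, Map<String, Integer>> working = new LinkedHashMap<>();
        stockByNode.forEach((node, stock) -> working.put(node, new HashMap<>(stock)));

        Map<String, List<Allocation>> groups = new LinkedHashMap<>();
        for (LineRequest line : lines) {
            int remaining = line.quantity;
            for (Map.Entry<String, Map<String, Integer>> node : working.entrySet()) {
                if (remaining == 0) {
                    break;
                }
                int available = node.getValue().getOrDefault(line.sku, 0);
                int take = Math.min(available, remaining);
                if (take > 0) {
                    node.getValue().put(line.sku, available - take);
                    groups.computeIfAbsent(node.getKey(), k -> new ArrayList<>())
                          .add(new Allocation(line.orderItemId, line.sku, take));
                    remaining -= take;
                }
            }
            if (remaining > 0) {
                throw new IllegalStateException("Cannot fully allocate SKU " + line.sku);
            }
        }
        return groups;
    }
}
```

Each entry in the returned map corresponds to one fulfillment group (one node, one shipment), and each `Allocation` corresponds to one fulfillment line within that group.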
public class FulfillmentGroup {
    private Long id;
    private Long orderId;
    private String fulfillmentNodeId; // warehouse, store, 3PL
    private FulfillmentStatus status;
    private ShippingMethod shippingMethod;
    private String trackingNumber;
    private List<FulfillmentLine> lines;
}

public class FulfillmentLine {
    private Long id;
    private Long fulfillmentGroupId;
    private Long orderItemId;
    private int quantity;
}
Inventory Reservation #
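// Needs java.util.ArrayList, java.util.Comparator, and java.util.List.
@Transactional
public ReservationResult reserveItems(String orderId, List<ReservationRequest> requests) {
    // Lock rows in a consistent (SKU, location) order so that two concurrent orders touching
    // the same rows cannot deadlock. Assumes getSku() and getLocationId() return Comparable values.
    List<ReservationRequest> ordered = new ArrayList<>(requests);
    ordered.sort(Comparator.comparing(ReservationRequest::getSku)
            .thenComparing(ReservationRequest::getLocationId));

    List<InventoryReservation> reservations = new ArrayList<>();
    for (ReservationRequest req : ordered) {
        // Row-level lock (e.g. SELECT ... FOR UPDATE) held until the transaction ends.
        InventoryRow row = inventoryRepository.lockForUpdate(req.getSku(), req.getLocationId());
        int available = row.getOnHand() - row.getReserved();
        if (available < req.getQuantity()) {
            // Throwing rolls back every reservation made so far in this transaction.
            throw new InsufficientInventoryException(req.getSku(), req.getLocationId());
        }
        row.setReserved(row.getReserved() + req.getQuantity());
        inventoryRepository.save(row);
        reservations.add(new InventoryReservation(orderId, req.getSku(), req.getLocationId(), req.getQuantity()));
    }
    reservationRepository.saveAll(reservations);
    // Note: publishing inside the transaction means the event can go out even if the commit
    // later fails; an outbox table or an after-commit hook avoids that.
    eventBus.publish(new InventoryReservedEvent(orderId, reservations));
    return new ReservationResult(reservations);
}
Cancellations and Refunds #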
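The code in this section covers cancellation and inventory release; the refund side of the requirement (full and partial refunds across mixed payment methods) can be sketched separately. The following is a minimal illustration under stated assumptions: `CapturedPayment`, `RefundInstruction`, and `RefundAllocator` are hypothetical names, amounts are in minor units (cents), and refunds are drawn from payments in the order the caller lists them. Real policies often differ, for example refunding store credit last or splitting pro rata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of one captured payment on an order.
final class CapturedPayment {
    final String paymentId;
    final String method;      // e.g. CARD, WALLET, STORE_CREDIT
    final long capturedCents;
    long refundedCents;       // running total of refunds already issued

    CapturedPayment(String paymentId, String method, long capturedCents) {
        this.paymentId = paymentId;
        this.method = method;
        this.capturedCents = capturedCents;
    }

    long refundableCents() {
        return capturedCents - refundedCents;
    }
}

// Hypothetical instruction to send to a payment gateway.
final class RefundInstruction {
    final String paymentId;
    final long amountCents;

    RefundInstruction(String paymentId, long amountCents) {
        this.paymentId = paymentId;
        this.amountCents = amountCents;
    }
}

final class RefundAllocator {
    // Splits refundCents across payments in list order, never exceeding what each captured.
    static List<RefundInstruction> allocate(long refundCents, List<CapturedPayment> payments) {
        if (refundCents <= 0) {
            throw new IllegalArgumentException("Refund amount must be positive");
        }
        long refundable = 0;
        for (CapturedPayment p : payments) {
            refundable += p.refundableCents();
        }
        if (refundCents > refundable) {
            throw new IllegalArgumentException("Refund exceeds refundable balance");
        }

        List<RefundInstruction> instructions = new ArrayList<>();
        long remaining = refundCents;
        for (CapturedPayment p : payments) {
            if (remaining == 0) {
                break;
            }
            long take = Math.min(remaining, p.refundableCents());
            if (take > 0) {
                p.refundedCents += take;
                instructions.add(new RefundInstruction(p.paymentId, take));
                remaining -= take;
            }
        }
        return instructions;
    }
}
```

Tracking `refundedCents` per payment is what makes repeated partial refunds safe: a second refund can only draw on whatever the first one left.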
// In Order Service
public void cancelOrder(String orderId, CancelReason reason) {
    Order order = orderRepository.findById(orderId);
    order.cancel(reason); // the domain model should reject cancellation in states where it is not allowed
    orderRepository.save(order);
    eventBus.publish(new OrderCancelledEvent(orderId, reason));
}

// In Inventory Service
@Transactional // lockForUpdate needs an open transaction to hold the row locks
@EventListener
public void on(OrderCancelledEvent event) {
    // Events may be delivered more than once, so a production handler should skip
    // reservations that were already released before decrementing the counter.
    reservationRepository.findByOrderId(event.orderId()).forEach(res -> {
        InventoryRow row = inventoryRepository.lockForUpdate(res.getSku(), res.getLocationId());
        row.setReserved(row.getReserved() - res.getQuantity());
        inventoryRepository.save(row);
    });
    // Publish release event to other consumers if needed
    eventBus.publish(new InventoryReleasedEvent(event.orderId()));
}
Multi-Region Strategy #
- Each region has its own Order, Payment, Inventory, and Fulfillment services plus local databases.
- Orders are sticky to a “home” region determined by user profile or channel.
- Events that need to be globally visible (e.g., inventory changes, loyalty updates) are replicated to other regions asynchronously using topics or cross-region database replication.
- Global reporting and reconciliation use eventually consistent data.
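The "home region" idea above can be made concrete by encoding the region into the order ID, so any region can tell from the ID alone whether to handle a request locally or forward it. This is only a sketch under assumptions: `HomeRegionRouter`, the country-to-region map, the `region:sequence` ID format, and the region names are all hypothetical, and the source only says the home region is derived from user profile or channel.

```java
import java.util.Map;

final class HomeRegionRouter {
    private final String localRegion;
    private final Map<String, String> regionByCountry; // assumed lookup from user profile country
    private final String defaultRegion;

    HomeRegionRouter(String localRegion, Map<String, String> regionByCountry, String defaultRegion) {
        this.localRegion = localRegion;
        this.regionByCountry = regionByCountry;
        this.defaultRegion = defaultRegion;
    }

    // Picks the home region for a new order; unknown countries fall back to the default.
    String homeRegionFor(String countryCode) {
        return regionByCountry.getOrDefault(countryCode, defaultRegion);
    }

    // Assumed ID format "<region>:<sequence>", so the home region travels with the order.
    String newOrderId(String countryCode, long sequence) {
        return homeRegionFor(countryCode) + ":" + sequence;
    }

    // Extracts the home region from an order ID created by newOrderId.
    static String regionOf(String orderId) {
        int separator = orderId.indexOf(':');
        if (separator <= 0) {
            throw new IllegalArgumentException("Order ID has no region prefix: " + orderId);
        }
        return orderId.substring(0, separator);
    }

    // True if this region owns the order; otherwise the request should be forwarded.
    boolean isLocal(String orderId) {
        return localRegion.equals(regionOf(orderId));
    }
}
```

With this in place, writes for an order always land in one region, which keeps the order itself strongly consistent while the asynchronously replicated events give other regions an eventually consistent view.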