Distributed Transactions
What are Distributed Transactions?
Distributed transactions are operations that span multiple databases or services and must execute atomically: either all operations succeed or all fail. They are challenging because a single database’s ACID guarantees do not extend across network boundaries. Each participant can commit or fail independently, and messages between them can be lost or delayed.
The Challenge
Traditional Transaction (Single Database):
- BEGIN TRANSACTION
- Update account A: -$100
- Update account B: +$100
- COMMIT
- Database ensures atomicity
Distributed Transaction (Multiple Databases):
- Service A updates its database
- Service B updates its database
- Network can fail between operations
- How to ensure both succeed or both fail?
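The failure mode above can be reproduced in a few lines. This is a minimal sketch, assuming two in-memory objects stand in for the two services' databases; `serviceA`, `serviceB`, `debit`, `credit`, and `naiveTransfer` are illustrative names, not part of any real API.

```javascript
// Two independent in-memory "databases", each owned by a different service.
const serviceA = { balance: 500 };
const serviceB = { balance: 200 };

function debit(db, amount) {
  if (db.balance < amount) throw new Error('Insufficient funds');
  db.balance -= amount;
}

function credit(db, amount, networkUp) {
  // Simulate the network failing before the request reaches service B.
  if (!networkUp) throw new Error('Network timeout');
  db.balance += amount;
}

function naiveTransfer(amount, networkUp) {
  debit(serviceA, amount);             // commits locally in service A
  credit(serviceB, amount, networkUp); // may never arrive
}

try {
  naiveTransfer(100, false);
} catch (err) {
  // Service A has already committed; nothing rolls it back automatically.
  console.log(err.message, serviceA.balance, serviceB.balance);
  // prints: Network timeout 400 200
}
```

The $100 has left account A but never reached account B, which is exactly the inconsistency the rest of this section tries to prevent.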
Two-Phase Commit (2PC)
Purpose: Coordinate a distributed transaction across multiple participants so that all of them commit or all of them abort
Roles:
- Coordinator: Orchestrates the transaction
- Participants: Services/databases involved
Phase 1: Prepare
- Coordinator asks all participants: “Can you commit?”
- Participants prepare transaction (lock resources)
- Participants vote: YES or NO
- If any participant votes NO, transaction aborts
Phase 2: Commit
- If all voted YES, coordinator sends COMMIT
- If any voted NO, coordinator sends ROLLBACK
- Participants execute decision
- Participants acknowledge completion
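The two phases above can be sketched as a small coordinator function. This is a simplified illustration, not a production protocol: the `Participant` class and `twoPhaseCommit` function are hypothetical, everything lives in memory, and there is no durable log, timeout, or crash recovery.

```javascript
// A participant that stages a change in prepare() and applies it in commit().
// A real participant would persist its prepared state so it survives a crash.
class Participant {
  constructor(name, canCommit = true) {
    this.name = name;
    this.canCommit = canCommit;
    this.state = 'IDLE';
  }
  async prepare() {
    // Lock resources and vote: true = YES, false = NO.
    this.state = this.canCommit ? 'PREPARED' : 'ABORTED';
    return this.canCommit;
  }
  async commit() { this.state = 'COMMITTED'; }
  async rollback() { this.state = 'ABORTED'; }
}

async function twoPhaseCommit(participants) {
  // Phase 1: collect votes. A thrown error counts as a NO vote.
  const votes = await Promise.all(
    participants.map(p => p.prepare().catch(() => false))
  );
  const allYes = votes.every(v => v === true);

  // Phase 2: broadcast the same decision to every participant.
  await Promise.all(
    participants.map(p => (allYes ? p.commit() : p.rollback()))
  );
  return allYes ? 'COMMITTED' : 'ABORTED';
}
```

Note that every participant stays in the `PREPARED` state, holding its locks, until the coordinator's Phase 2 message arrives.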
Example Flow:
Coordinator: "Prepare to transfer $100 from A to B"
Participant 1 (Account A):
- Check balance ($500 available)
- Lock account
- Vote: YES
Participant 2 (Account B):
- Check account exists
- Lock account
- Vote: YES
Coordinator: "All voted YES, COMMIT"
Participant 1: Deduct $100, unlock, ACK
Participant 2: Add $100, unlock, ACK
Coordinator: "Transaction complete"
Advantages:
- Strong consistency
- Clear commit/rollback semantics
Disadvantages:
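- Blocking protocol (participants hold locks from prepare until the coordinator’s decision arrives)
- Coordinator is single point of failure (if it crashes after the prepare phase, participants stay blocked until it recovers)
- Poor performance (multiple network round trips)
- Poorly suited to microservices (every participant must implement the protocol and stay available for the whole transaction)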
Saga Pattern
Purpose: Manage distributed transactions without distributed locks, using compensating transactions. The trade-off is eventual rather than immediate consistency.
How it Works:
- Break transaction into sequence of local transactions
- Each step has compensating transaction (undo)
- If step fails, execute compensating transactions for completed steps
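The three rules above can be captured in a small generic runner. This is a minimal sketch: the `runSaga` name and the `{ name, action, compensate }` step shape are assumptions made for illustration, and the sketch does not handle a compensation that itself fails.

```javascript
// Runs steps in order. On failure, runs the compensations of the
// steps that already completed, newest first.
async function runSaga(steps, context = {}) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.action(context);
      completed.push(step);
    } catch (error) {
      // Copy before reversing so `completed` itself is left untouched.
      for (const done of [...completed].reverse()) {
        await done.compensate(context);
      }
      return { success: false, failedStep: step.name, error: error.message };
    }
  }
  return { success: true };
}
```

The step that failed is not compensated, because its local transaction never committed; only the steps before it are undone.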
Types:
Choreography-Based Saga
Approach: Services communicate via events, no central coordinator
Example: Order Processing
1. Order Service: Create order (status: PENDING)
→ Publishes: OrderCreated
2. Payment Service: Process payment
→ Success: Publishes PaymentCompleted
→ Failure: Publishes PaymentFailed
3. Inventory Service: Reserve items
→ Success: Publishes InventoryReserved
→ Failure: Publishes InventoryFailed
4. Shipping Service: Create shipment
→ Success: Publishes ShipmentCreated
→ Order status: COMPLETED
Compensation Flow (if inventory fails):
1. Inventory Service: Publishes InventoryFailed
2. Payment Service: Listens to InventoryFailed
→ Refund payment (compensating transaction)
→ Publishes PaymentRefunded
3. Order Service: Listens to PaymentRefunded
→ Update order status: CANCELLED
Implementation:
// Order Service
class OrderService {
  async createOrder(orderData) {
    const order = await db.orders.create({
      ...orderData,
      status: 'PENDING'
    });
    await eventBus.publish('OrderCreated', {
      orderId: order.id,
      userId: order.userId,
      items: order.items,
      total: order.total
    });
    return order;
  }
  // Listen for success/failure events
  async onPaymentFailed(event) {
    await db.orders.update(event.orderId, {
      status: 'PAYMENT_FAILED'
    });
  }
  // Last step of the compensation flow: the payment has been refunded
  async onPaymentRefunded(event) {
    await db.orders.update(event.orderId, {
      status: 'CANCELLED'
    });
  }
  async onShipmentCreated(event) {
    await db.orders.update(event.orderId, {
      status: 'COMPLETED'
    });
  }
}
// Payment Service
class PaymentService {
  async onOrderCreated(event) {
    try {
      const payment = await this.processPayment(event);
      await eventBus.publish('PaymentCompleted', {
        orderId: event.orderId,
        paymentId: payment.id
      });
    } catch (error) {
      await eventBus.publish('PaymentFailed', {
        orderId: event.orderId,
        reason: error.message
      });
    }
  }
  // Compensating transaction
  async onInventoryFailed(event) {
    const payment = await db.payments.findByOrderId(event.orderId);
    if (payment) {
      await this.refundPayment(payment.id);
      await eventBus.publish('PaymentRefunded', {
        orderId: event.orderId,
        paymentId: payment.id
      });
    }
  }
}
Advantages:
- No central coordinator
- Services loosely coupled
- Scales well
Disadvantages:
- Complex to understand and debug
- Difficult to track saga state
- Cyclic dependencies possible
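One way to soften the "difficult to track saga state" problem is a passive tracker that listens to every saga event and records it per order. The sketch below is an assumption-laden illustration: `SagaTracker` is a hypothetical class, it stores history in memory rather than in a durable store, and the default list of terminal events is taken from the order-processing example above.

```javascript
// Records every saga event per order so the current state can be queried.
class SagaTracker {
  constructor() {
    this.history = new Map(); // orderId -> [{ type, at }]
  }
  record(type, event) {
    const entries = this.history.get(event.orderId) ?? [];
    entries.push({ type, at: Date.now() });
    this.history.set(event.orderId, entries);
  }
  // Latest event type seen for an order, or undefined if none.
  lastEvent(orderId) {
    const entries = this.history.get(orderId);
    return entries ? entries[entries.length - 1].type : undefined;
  }
  // Orders whose latest event is not one of the terminal events.
  inFlight(terminal = ['ShipmentCreated', 'PaymentFailed', 'PaymentRefunded']) {
    return [...this.history.keys()].filter(
      id => !terminal.includes(this.lastEvent(id))
    );
  }
}
```

In use, the tracker's `record` method would be subscribed to each event type on the bus, and `inFlight()` would feed a dashboard or an alert for sagas that have been stuck too long.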
Orchestration-Based Saga
Approach: Central orchestrator coordinates saga steps
Example:
class OrderSagaOrchestrator {
  async executeOrderSaga(orderData) {
    const sagaId = generateId();
    const state = {
      sagaId,
      currentStep: 0,
      completedSteps: [],
      orderData
    };
    try {
      // Step 1: Create order
      const order = await this.orderService.createOrder(orderData);
      state.completedSteps.push('createOrder');
      state.orderId = order.id;
      // Step 2: Process payment
      const payment = await this.paymentService.processPayment({
        orderId: order.id,
        amount: order.total
      });
      state.completedSteps.push('processPayment');
      state.paymentId = payment.id;
      // Step 3: Reserve inventory
      await this.inventoryService.reserveItems({
        orderId: order.id,
        items: order.items
      });
      state.completedSteps.push('reserveInventory');
      // Step 4: Create shipment
      await this.shippingService.createShipment({
        orderId: order.id,
        address: order.shippingAddress
      });
      state.completedSteps.push('createShipment');
      // Complete saga
      await this.orderService.completeOrder(order.id);
      return { success: true, orderId: order.id };
    } catch (error) {
      // Compensate completed steps in reverse order
      await this.compensate(state);
      return { success: false, error: error.message };
    }
  }
  async compensate(state) {
    // Copy before reversing: Array.prototype.reverse() mutates in place,
    // which would silently reorder state.completedSteps
    const steps = [...state.completedSteps].reverse();
    for (const step of steps) {
      try {
        switch (step) {
          case 'createShipment':
            await this.shippingService.cancelShipment(state.orderId);
            break;
          case 'reserveInventory':
            await this.inventoryService.releaseItems(state.orderId);
            break;
          case 'processPayment':
            await this.paymentService.refundPayment(state.paymentId);
            break;
          case 'createOrder':
            await this.orderService.cancelOrder(state.orderId);
            break;
        }
      } catch (error) {
        console.error(`Compensation failed for ${step}:`, error);
        // Log for manual intervention
      }
    }
  }
}
Advantages:
- Centralized logic, easier to understand
- Clear saga state
- Easier to monitor and debug
Disadvantages:
- Orchestrator is single point of failure
- Services coupled to orchestrator
- Orchestrator can become complex
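The single-point-of-failure concern is usually reduced by persisting saga state after every step, so a restarted orchestrator can find sagas that were interrupted and finish or compensate them. The following is a rough sketch of that idea only: `SagaLog` and `recoverSagas` are hypothetical names, the `status` field is an assumption added to the state object, and the in-memory `Map` stands in for a durable store such as a database table.

```javascript
// In-memory stand-in for a durable saga store.
class SagaLog {
  constructor() {
    this.records = new Map(); // sagaId -> saved state
  }
  save(sagaId, state) {
    // Store a copy so later mutation of `state` does not alter the log.
    this.records.set(sagaId, JSON.parse(JSON.stringify(state)));
  }
  unfinished() {
    return [...this.records.values()].filter(s => s.status === 'RUNNING');
  }
}

// After a restart, compensate every saga that was still running.
// `compensate` is expected to undo state.completedSteps, as the
// orchestrator's compensate method does in the example above.
async function recoverSagas(log, compensate) {
  const recovered = [];
  for (const state of log.unfinished()) {
    await compensate(state);
    state.status = 'COMPENSATED';
    log.save(state.sagaId, state);
    recovered.push(state.sagaId);
  }
  return recovered;
}
```

This sketch always rolls back an interrupted saga. Resuming forward from the last completed step is an equally valid design choice when the remaining steps are idempotent.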
Event Sourcing
Concept: Store all changes as a sequence of events, and rebuild state by replaying those events
Benefits for Distributed Transactions:
- Natural audit trail
- Can replay events to recover state
- Events are immutable facts
Example:
// Events
const events = [
  { type: 'OrderCreated', orderId: '123', total: 100 },
  { type: 'PaymentProcessed', orderId: '123', paymentId: 'p456' },
  { type: 'InventoryReserved', orderId: '123', items: [/* ... */] },
  { type: 'ShipmentCreated', orderId: '123', trackingId: 't789' }
];
// Rebuild order state
function rebuildOrderState(orderId, events) {
  const orderEvents = events.filter(e => e.orderId === orderId);
  let state = { status: 'UNKNOWN' };
  for (const event of orderEvents) {
    switch (event.type) {
      case 'OrderCreated':
        state = { status: 'PENDING', total: event.total };
        break;
      case 'PaymentProcessed':
        state.status = 'PAID';
        state.paymentId = event.paymentId;
        break;
      case 'InventoryReserved':
        state.status = 'RESERVED';
        break;
      case 'ShipmentCreated':
        state.status = 'SHIPPED';
        state.trackingId = event.trackingId;
        break;
    }
  }
  return state;
}
Idempotency
Critical for Distributed Transactions: Operations must be safely retryable
Idempotent Operation: Executing multiple times has same effect as executing once
Example:
// NOT idempotent
async function addToBalance(userId, amount) {
  const user = await db.users.findById(userId);
  await db.users.update(userId, {
    balance: user.balance + amount
  });
}
// Retry adds amount again!
// Idempotent
async function addToBalance(userId, amount, transactionId) {
  // Check if already processed
  const existing = await db.transactions.findById(transactionId);
  if (existing) {
    return existing; // Already processed
  }
  const user = await db.users.findById(userId);
  await db.users.update(userId, {
    balance: user.balance + amount
  });
  // Record transaction. In production, the balance update and this insert
  // should run in one database transaction, with a unique constraint on the
  // transaction ID, so that concurrent retries cannot both pass the check.
  return db.transactions.create({
    id: transactionId,
    userId,
    amount,
    processedAt: new Date()
  });
}
// Safe to retry with same transactionId
.NET Distributed Transaction Example
public class DistributedTransactionService
{
    // Dependencies are assumed to be supplied through constructor injection (omitted here)
    private readonly IOrderRepository _orderRepo;
    private readonly IPaymentService _paymentService;
    private readonly IInventoryService _inventoryService;
    private readonly IMessageBus _messageBus;
    private readonly ILogger<DistributedTransactionService> _logger;
    // Saga orchestrator
    public async Task<Result> ProcessOrderAsync(OrderRequest request)
    {
        var sagaId = Guid.NewGuid();
        // Stack gives last-in, first-out order, so compensations run newest first
        var compensations = new Stack<Func<Task>>();
        try
        {
            // Step 1: Create order
            var order = await _orderRepo.CreateAsync(request);
            compensations.Push(async () =>
                await _orderRepo.CancelAsync(order.Id));
            // Step 2: Process payment
            var payment = await _paymentService.ProcessAsync(
                order.Id, order.Total);
            compensations.Push(async () =>
                await _paymentService.RefundAsync(payment.Id));
            // Step 3: Reserve inventory
            await _inventoryService.ReserveAsync(
                order.Id, order.Items);
            compensations.Push(async () =>
                await _inventoryService.ReleaseAsync(order.Id));
            // Step 4: Publish success event
            await _messageBus.PublishAsync(new OrderCompletedEvent
            {
                OrderId = order.Id,
                SagaId = sagaId
            });
            return Result.Success(order.Id);
        }
        catch (Exception ex)
        {
            // Execute compensations
            await CompensateAsync(compensations);
            return Result.Failure(ex.Message);
        }
    }
    private async Task CompensateAsync(Stack<Func<Task>> compensations)
    {
        while (compensations.Count > 0)
        {
            var compensation = compensations.Pop();
            try
            {
                await compensation();
            }
            catch (Exception ex)
            {
                // Log compensation failure for manual intervention
                _logger.LogError(ex, "Compensation failed");
            }
        }
    }
}
Best Practices
- Avoid distributed transactions when possible - Design services with clear boundaries
- Prefer the saga pattern for microservices - It avoids the blocking and tight coupling of 2PC
- Make operations idempotent - Safe retries
- Implement compensating transactions - Undo completed steps
- Monitor saga state - Track progress and failures
- Set timeouts - Don’t wait forever
- Log everything - Essential for debugging
- Plan for partial failures - System must handle them
- Use unique transaction IDs - Prevent duplicates
- Test failure scenarios - Chaos engineering
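Two of the practices above, timeouts and retries keyed by a unique transaction ID, combine naturally. The helpers below are a minimal sketch under stated assumptions: `withTimeout` and `retryWithId` are hypothetical names, the defaults are arbitrary, and there is no backoff between attempts. Note that the timeout only stops waiting; it does not cancel the underlying operation, which is one more reason the receiver must be idempotent.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Retries `operation` with the same transaction ID on every attempt,
// so an idempotent receiver can detect and ignore duplicates.
async function retryWithId(operation, transactionId, { attempts = 3, timeoutMs = 1000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await withTimeout(operation(transactionId), timeoutMs);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}
```

A caller would generate the transaction ID once, before the first attempt, and pass an operation such as `id => addToBalance(userId, amount, id)`.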
Interview Tips
- Explain the challenge: ACID doesn’t work across services
- Show 2PC: Traditional approach, blocking
- Demonstrate saga: Modern approach for microservices
- Discuss compensation: How to undo steps
- Mention idempotency: Critical for retries
- Show orchestration vs choreography: Trade-offs
Summary
Distributed transactions coordinate operations across multiple services or databases. Two-Phase Commit (2PC) uses a coordinator and voting, but it blocks while locks are held and the coordinator is a single point of failure. The saga pattern breaks a transaction into local transactions, each with a compensating action. Choreography-based sagas use events and keep services loosely coupled; orchestration-based sagas use a central orchestrator and are easier to understand. Make operations idempotent so retries are safe, and use unique transaction IDs to prevent duplicates. Event sourcing provides an audit trail. Avoid distributed transactions when possible through better service boundaries; when they are unavoidable, these patterns are essential for building reliable microservices.