Circuit breakers for external services
Circuit Breaker Pattern Guide
This guide explains the circuit breaker pattern implementation for external services in the e-commerce application.
Overview
The circuit breaker pattern prevents cascading failures when external services are down. It acts as a safety mechanism that:
- Monitors external service calls
- Opens the circuit after consecutive failures
- Blocks requests when the circuit is open
- Probes the service after a timeout period
- Closes the circuit when the service recovers
Architecture
Generic Circuit Breaker
A generic CircuitBreaker concern (app/services/concerns/circuit_breaker.rb) provides the base implementation that can be used for any external service.
Service-Specific Implementations
- Payment Gateway (PaymentCircuitBreaker)
  - Wrapper around generic circuit breaker
  - Service name: "payment_gateway"
- Shipping Calculator (ShippingCalculatorService)
  - Uses generic circuit breaker
  - Service name: "shipping_calculator"
- Tax Service (TaxService)
  - Uses generic circuit breaker
  - Service name: "tax_service"
Circuit Breaker States
1. CLOSED (Normal Operation)
- Circuit is closed, allowing all requests through
- Failures are tracked
- After 5 consecutive failures, circuit opens
2. OPEN (Service Down)
- Circuit is open, blocking all requests
- Requests fail immediately with a CircuitBreakerOpen exception, without calling the service
- After 60 seconds, circuit moves to HALF_OPEN
3. HALF_OPEN (Probing)
- Circuit allows one probe request
- If probe succeeds: circuit closes
- If probe fails: circuit re-opens
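The three states above can be sketched as a small in-memory state machine. This is a minimal illustration, not the application's CircuitBreaker concern: the class name SketchCircuitBreaker, its OpenError, and the injectable clock are assumptions made for the example, and it keeps state in instance variables rather than in the Rails cache.

```ruby
# Minimal single-process sketch of CLOSED / OPEN / HALF_OPEN.
# All names here are hypothetical; the real concern stores state in the cache.
class SketchCircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state

  # The clock is injectable so the timeout can be exercised without sleeping.
  def initialize(failure_threshold: 5, timeout_seconds: 60, clock: nil)
    @failure_threshold = failure_threshold
    @timeout_seconds = timeout_seconds
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    @state = :closed
    @failures = 0
    @opened_at = nil
  end

  def call
    # OPEN -> HALF_OPEN once the timeout has elapsed.
    if @state == :open && @clock.call - @opened_at >= @timeout_seconds
      @state = :half_open
    end
    raise OpenError, "circuit is open" if @state == :open

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end

    # Any success closes the circuit and resets the consecutive-failure count.
    @state = :closed
    @failures = 0
    result
  end

  private

  def record_failure
    @failures += 1
    # A failed probe re-opens at once; otherwise open when the threshold is hit.
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end
end
```

The sketch does not guard against two callers probing at the same time; a shared implementation needs some form of locking or a probe-in-progress flag for that.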
Configuration
Default Settings
All services use the same default configuration:
- Failure Threshold: 5 consecutive failures
- Timeout: 60 seconds (the circuit stays open for this long, then moves to half-open)
- Close: After a successful probe request in the half-open state
Environment Variables
# Payment Gateway
PAYMENT_GATEWAY_CIRCUIT_BREAKER_THRESHOLD=5
PAYMENT_GATEWAY_CIRCUIT_BREAKER_TIMEOUT=60
# Shipping Calculator
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_THRESHOLD=5
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_TIMEOUT=60
# Tax Service
TAX_SERVICE_CIRCUIT_BREAKER_THRESHOLD=5
TAX_SERVICE_CIRCUIT_BREAKER_TIMEOUT=60
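As a rough illustration of how these variables could be resolved, the sketch below derives the variable names from the service name and falls back to the documented defaults. The helper name circuit_breaker_settings and its env: parameter are assumptions for the example; the concern's actual lookup code may differ.

```ruby
# Hypothetical helper: reads threshold and timeout for one service from the
# environment, defaulting to 5 failures and 60 seconds when a variable is unset.
def circuit_breaker_settings(service_name, env: ENV)
  prefix = "#{service_name.upcase}_CIRCUIT_BREAKER"
  {
    # Integer() raises on malformed values instead of silently using 0.
    failure_threshold: Integer(env.fetch("#{prefix}_THRESHOLD", 5)),
    timeout_seconds: Integer(env.fetch("#{prefix}_TIMEOUT", 60))
  }
end
```

Passing a plain Hash as env: makes the helper easy to exercise in tests without touching the real environment.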
Per-Service Configuration
You can override configuration per service:
with_circuit_breaker(
  service_name: "shipping_calculator",
  failure_threshold: 3, # Override default
  timeout_seconds: 30   # Override default
) do
  # Service call
end
Usage Examples
Payment Gateway
class PaymentGatewayService
  include PaymentCircuitBreaker

  def process_payment(order)
    with_payment_circuit_breaker do
      # Payment processing code
      Razorpay::Order.create(...)
    end
  end
end
Shipping Calculator
class ShippingCalculatorService
  include CircuitBreaker

  def calculate_shipping(address, weight)
    with_circuit_breaker(service_name: "shipping_calculator") do
      # External shipping API call
      ShippingAPI.calculate(...)
    end
  rescue CircuitBreaker::CircuitBreakerOpen => e
    # Fallback to local calculation
    calculate_locally(address, weight)
  end
end
Tax Service
class TaxService
  include CircuitBreaker

  def calculate_tax(address, subtotal)
    with_circuit_breaker(service_name: "tax_service") do
      # External tax API call
      TaxAPI.calculate(...)
    end
  rescue CircuitBreaker::CircuitBreakerOpen => e
    # Fallback to local calculation
    calculate_locally(address, subtotal)
  end
end
State Transitions
CLOSED → (5 failures) → OPEN → (60 seconds) → HALF_OPEN → (success) → CLOSED
                                                  ↓
                                              (failure)
                                                  ↓
                                                OPEN
Error Handling
Circuit Breaker Open Exception
When the circuit is open, a CircuitBreakerOpen exception is raised:
begin
  with_circuit_breaker(service_name: "payment_gateway") do
    process_payment
  end
rescue CircuitBreaker::CircuitBreakerOpen => e
  # Handle circuit breaker open
  Rails.logger.warn("Circuit breaker open: #{e.service_name}")
  # Fallback logic
end
Fallback Strategies
- Local Calculation (Shipping/Tax)
  - Use local calculation when external service is unavailable
  - Ensures order processing continues
- Queue for Retry (Payment)
  - Queue payment for later processing
  - Notify user of delay
- Default Values
  - Use safe default values
  - Continue with degraded functionality
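The queue-for-retry strategy could look roughly like the sketch below. It is self-contained for illustration only: SketchCircuitOpen stands in for CircuitBreaker::CircuitBreakerOpen, and the method name, the queue: parameter, and the returned hash are all hypothetical. A real application would enqueue a background job rather than push onto an array.

```ruby
# Stand-in for CircuitBreaker::CircuitBreakerOpen so the sketch runs on its own.
class SketchCircuitOpen < StandardError; end

# Hypothetical fallback wrapper: runs the payment block, and if the circuit is
# open, records the order for a later retry and returns a "delayed" result
# that the caller can show to the user.
def process_payment_with_retry_queue(order_id, queue:)
  yield
rescue SketchCircuitOpen
  queue << order_id
  { status: :queued, message: "Payment is delayed and will be retried." }
end
```

Only the open-circuit error is rescued here, so genuine payment errors still propagate to the caller and still count as failures.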
Monitoring & Alerts
Metrics Tracked
- Circuit breaker state changes
- Failure counts
- Time in each state
- Probe success/failure rates
Sentry Integration
Circuit breaker openings are automatically reported to Sentry:
Sentry.capture_message(
  "Circuit breaker opened",
  level: :error,
  tags: { service: service_name },
  extra: { failure_count: count, error: error }
)
Event System
Events are emitted for circuit breaker state changes:
- circuit_breaker.opened - Circuit opened
- circuit_breaker.closed - Circuit closed
- circuit_breaker.half_open - Circuit moved to half-open
Best Practices
1. Always Implement Fallbacks
def calculate_shipping(address, weight)
  with_circuit_breaker(service_name: "shipping_calculator") do
    external_api.calculate(...)
  end
rescue CircuitBreaker::CircuitBreakerOpen => e
  # Always have a fallback
  calculate_locally(address, weight)
end
2. Use Appropriate Timeouts
- Short timeouts (30-60s) for fast-recovering services
- Long timeouts (120-300s) for slow-recovering services
3. Monitor Circuit Breaker State
# Check circuit breaker state
service = ShippingCalculatorService.new
state = service.get_circuit_state("shipping_calculator")
# => "closed", "open", or "half_open"
4. Test Circuit Breaker Behavior
# In tests, you can manually open the circuit
Rails.cache.write("circuit_breaker:shipping_calculator:state", "open")
# Or simulate failures
allow(ShippingAPI).to receive(:calculate).and_raise(StandardError)
Configuration Recommendations
For High-Traffic Services
# More lenient (fewer false positives)
PAYMENT_GATEWAY_CIRCUIT_BREAKER_THRESHOLD=10
PAYMENT_GATEWAY_CIRCUIT_BREAKER_TIMEOUT=120
For Critical Services
# More strict (faster failure detection)
PAYMENT_GATEWAY_CIRCUIT_BREAKER_THRESHOLD=3
PAYMENT_GATEWAY_CIRCUIT_BREAKER_TIMEOUT=30
For Non-Critical Services
# Very lenient (avoid unnecessary fallbacks)
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_THRESHOLD=15
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_TIMEOUT=180
Troubleshooting
Circuit Breaker Stuck Open
Symptom: Circuit breaker remains open even after service recovers
Solution:
- Check service health manually
- Manually reset the circuit breaker, replacing service_name with the affected service (for example, shipping_calculator):
Rails.cache.delete("circuit_breaker:service_name:state")
Rails.cache.delete("circuit_breaker:service_name:failures")
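The manual reset above clears the state and failure count. If a stale opened_at or probe_in_progress entry is suspected as well, a small helper that clears all four keys listed under State Storage may be useful. The sketch below is an assumption, not part of the concern: reset_circuit_breaker and the constant are hypothetical names, and cache: can be any object that responds to delete(key), such as Rails.cache.

```ruby
# Key suffixes taken from the State Storage section of this guide.
CIRCUIT_BREAKER_FIELDS = %w[state failures opened_at probe_in_progress].freeze

# Hypothetical helper: removes every cached entry for one service so the
# breaker starts again from its default (closed) state.
def reset_circuit_breaker(service_name, cache:)
  CIRCUIT_BREAKER_FIELDS.each do |field|
    cache.delete("circuit_breaker:#{service_name}:#{field}")
  end
end

# In a Rails console this might be called as:
#   reset_circuit_breaker("shipping_calculator", cache: Rails.cache)
```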
Too Many False Positives
Symptom: Circuit opens too frequently
Solution:
- Increase failure threshold
- Increase timeout period if probes keep failing and re-opening the circuit
- Review error handling logic
Circuit Breaker Not Opening
Symptom: Service is down but circuit remains closed
Solution:
- Check error handling (errors must be raised)
- Verify failure counting logic
- Check cache configuration
Implementation Details
State Storage
Circuit breaker state is stored in Rails cache (Solid Cache):
- State: circuit_breaker:#{service_name}:state
- Failures: circuit_breaker:#{service_name}:failures
- Opened At: circuit_breaker:#{service_name}:opened_at
- Probe In Progress: circuit_breaker:#{service_name}:probe_in_progress
Advisory Locks
PostgreSQL advisory locks serialize state updates across threads and processes:
- Prevents race conditions
- Ensures atomic state transitions
- Uses service-specific lock names
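PostgreSQL advisory locks are keyed by integers, so a service-specific lock name has to be mapped to a number somewhere. One possible mapping is sketched below using CRC32; the helper name and the choice of hash are assumptions, and the concern may derive its keys differently.

```ruby
require "zlib"

# Hypothetical helper: maps a service-specific lock name to a stable integer.
# CRC32 yields the same unsigned 32-bit value in every process, which fits in
# the bigint argument accepted by pg_advisory_xact_lock.
def circuit_breaker_lock_key(service_name)
  Zlib.crc32("circuit_breaker:#{service_name}")
end

# Inside a database transaction the key could then be used as:
#   SELECT pg_advisory_xact_lock(<key>)
# The lock is released automatically when the transaction ends.
```

Ruby's String#hash is not suitable for this purpose because it is randomized per process, so two app servers would compute different keys for the same service.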
Cache Expiration
- State: 1 hour
- Failures: 5 minutes
- Opened At: 1 hour
- Probe In Progress: 1 hour
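The key names and expiry times above can be kept together in one place. The sketch below is illustrative only; the constant and helper names are hypothetical and are not taken from the concern.

```ruby
# TTLs in seconds, mirroring the Cache Expiration list in this guide.
CIRCUIT_BREAKER_TTLS = {
  state: 3600,             # 1 hour
  failures: 300,           # 5 minutes
  opened_at: 3600,         # 1 hour
  probe_in_progress: 3600  # 1 hour
}.freeze

# Hypothetical helper: returns the cache key and write options for one field.
# Raises KeyError for a field that is not in the table.
def circuit_breaker_cache_entry(service_name, field)
  ttl = CIRCUIT_BREAKER_TTLS.fetch(field)
  ["circuit_breaker:#{service_name}:#{field}", { expires_in: ttl }]
end

# With Rails loaded, a write might look like:
#   key, options = circuit_breaker_cache_entry("tax_service", :failures)
#   Rails.cache.write(key, 1, **options)
```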