# Circuit breakers for external services

## Circuit Breaker Pattern Guide

This guide explains the circuit breaker pattern implementation for external services in the e-commerce application.

## Overview

The circuit breaker pattern prevents cascading failures when external services are down. It acts as a safety mechanism that:

1. **Monitors** external service calls
2. **Opens** the circuit after consecutive failures
3. **Blocks** requests when the circuit is open
4. **Probes** the service after a timeout period
5. **Closes** the circuit when the service recovers

## Architecture

### Generic Circuit Breaker

A generic `CircuitBreaker` concern (`app/services/concerns/circuit_breaker.rb`) provides the base implementation that can be used for any external service.

### Service-Specific Implementations

1. **Payment Gateway** (`PaymentCircuitBreaker`)
   - Wrapper around generic circuit breaker
   - Service name: `"payment_gateway"`

2. **Shipping Calculator** (`ShippingCalculatorService`)
   - Uses generic circuit breaker
   - Service name: `"shipping_calculator"`

3. **Tax Service** (`TaxService`)
   - Uses generic circuit breaker
   - Service name: `"tax_service"`

## Circuit Breaker States

### 1. CLOSED (Normal Operation)

- Circuit is closed, allowing all requests through
- Failures are tracked
- After 5 consecutive failures, circuit opens

### 2. OPEN (Service Down)

- Circuit is open, blocking all requests
- Immediate failure without calling the service
- After 60 seconds, circuit moves to HALF_OPEN

### 3. HALF_OPEN (Probing)

- Circuit allows one probe request
- If probe succeeds: circuit closes
- If probe fails: circuit re-opens

## Configuration

### Default Settings

All services use the same default configuration:

- **Failure Threshold**: 5 consecutive failures
- **Timeout**: 60 seconds (half-open after timeout)
- **Close**: After successful request

### Environment Variables

```bash
# Payment Gateway
PAYMENT_GATEWAY_CIRCUIT_BREAKER_THRESHOLD=5
PAYMENT_GATEWAY_CIRCUIT_BREAKER_TIMEOUT=60

# Shipping Calculator
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_THRESHOLD=5
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_TIMEOUT=60

# Tax Service
TAX_SERVICE_CIRCUIT_BREAKER_THRESHOLD=5
TAX_SERVICE_CIRCUIT_BREAKER_TIMEOUT=60
```

### Per-Service Configuration

You can override configuration per service:

```ruby
with_circuit_breaker(
  service_name: "shipping_calculator",
  failure_threshold: 3,  # Override default
  timeout_seconds: 30    # Override default
) do
  # Service call
end
```

## Usage Examples

### Payment Gateway

```ruby
class PaymentGatewayService
  include PaymentCircuitBreaker

  def process_payment(order)
    with_payment_circuit_breaker do
      # Payment processing code
      Razorpay::Order.create(...)
    end
  end
end
```

### Shipping Calculator

```ruby
class ShippingCalculatorService
  include CircuitBreaker

  def calculate_shipping(address, weight)
    with_circuit_breaker(service_name: "shipping_calculator") do
      # External shipping API call
      ShippingAPI.calculate(...)
    end
  rescue CircuitBreaker::CircuitBreakerOpen => e
    # Fallback to local calculation
    calculate_locally(address, weight)
  end
end
```

### Tax Service

```ruby
class TaxService
  include CircuitBreaker

  def calculate_tax(address, subtotal)
    with_circuit_breaker(service_name: "tax_service") do
      # External tax API call
      TaxAPI.calculate(...)
    end
  rescue CircuitBreaker::CircuitBreakerOpen => e
    # Fallback to local calculation
    calculate_locally(address, subtotal)
  end
end
```

## State Transitions

```
CLOSED → (5 failures) → OPEN → (60 seconds) → HALF_OPEN → (success) → CLOSED
                                                  ↓
                                              (failure)
                                                  ↓
                                                OPEN
```

## Error Handling

### Circuit Breaker Open Exception

When the circuit is open, a `CircuitBreakerOpen` exception is raised:

```ruby
begin
  with_circuit_breaker(service_name: "payment_gateway") do
    process_payment
  end
rescue CircuitBreaker::CircuitBreakerOpen => e
  # Handle circuit breaker open
  Rails.logger.warn("Circuit breaker open: #{e.service_name}")
  # Fallback logic
end
```

### Fallback Strategies

1. **Local Calculation** (Shipping/Tax)
   - Use local calculation when external service is unavailable
   - Ensures order processing continues

2. **Queue for Retry** (Payment)
   - Queue payment for later processing
   - Notify user of delay

3. **Default Values**
   - Use safe default values
   - Continue with degraded functionality

## Monitoring & Alerts

### Metrics Tracked

- Circuit breaker state changes
- Failure counts
- Time in each state
- Probe success/failure rates

### Sentry Integration

Circuit breaker openings are automatically reported to Sentry:

```ruby
Sentry.capture_message(
  "Circuit breaker opened",
  level: :error,
  tags: { service: service_name },
  extra: { failure_count: count, error: error }
)
```

### Event System

Events are emitted for circuit breaker state changes:

- `circuit_breaker.opened` - Circuit opened
- `circuit_breaker.closed` - Circuit closed
- `circuit_breaker.half_open` - Circuit moved to half-open

## Best Practices

### 1. Always Implement Fallbacks

```ruby
def calculate_shipping(address, weight)
  with_circuit_breaker(service_name: "shipping_calculator") do
    external_api.calculate(...)
  end
rescue CircuitBreaker::CircuitBreakerOpen => e
  # Always have a fallback
  calculate_locally(address, weight)
end
```

### 2. Use Appropriate Timeouts

- **Short timeouts** (30-60s) for fast-recovering services
- **Long timeouts** (120-300s) for slow-recovering services

### 3. Monitor Circuit Breaker State

```ruby
# Check circuit breaker state
service = ShippingCalculatorService.new
state = service.get_circuit_state("shipping_calculator")
# => "closed", "open", or "half_open"
```

### 4. Test Circuit Breaker Behavior

```ruby
# In tests, you can manually open the circuit
Rails.cache.write("circuit_breaker:shipping_calculator:state", "open")

# Or simulate failures
allow(ShippingAPI).to receive(:calculate).and_raise(StandardError)
```

## Configuration Recommendations

### For High-Traffic Services

```bash
# More lenient (fewer false positives)
PAYMENT_GATEWAY_CIRCUIT_BREAKER_THRESHOLD=10
PAYMENT_GATEWAY_CIRCUIT_BREAKER_TIMEOUT=120
```

### For Critical Services

```bash
# More strict (faster failure detection)
PAYMENT_GATEWAY_CIRCUIT_BREAKER_THRESHOLD=3
PAYMENT_GATEWAY_CIRCUIT_BREAKER_TIMEOUT=30
```

### For Non-Critical Services

```bash
# Very lenient (avoid unnecessary fallbacks)
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_THRESHOLD=15
SHIPPING_CALCULATOR_CIRCUIT_BREAKER_TIMEOUT=180
```

## Troubleshooting

### Circuit Breaker Stuck Open

**Symptom**: Circuit breaker remains open even after service recovers

**Solution**:

1. Check service health manually
2. Manually reset circuit breaker:

   ```ruby
   Rails.cache.delete("circuit_breaker:service_name:state")
   Rails.cache.delete("circuit_breaker:service_name:failures")
   ```

### Too Many False Positives

**Symptom**: Circuit opens too frequently

**Solution**:

1. Increase failure threshold
2. Increase timeout period
3. Review error handling logic

### Circuit Breaker Not Opening

**Symptom**: Service is down but circuit remains closed

**Solution**:

1. Check error handling (errors must be raised)
2. Verify failure counting logic
3. Check cache configuration

## Implementation Details

### State Storage

Circuit breaker state is stored in Rails cache (Solid Cache):

- State: `circuit_breaker:#{service_name}:state`
- Failures: `circuit_breaker:#{service_name}:failures`
- Opened At: `circuit_breaker:#{service_name}:opened_at`
- Probe In Progress: `circuit_breaker:#{service_name}:probe_in_progress`

### Advisory Locks

PostgreSQL advisory locks ensure thread-safe state management:

- Prevents race conditions
- Ensures atomic state transitions
- Uses service-specific lock names

### Cache Expiration

- State: 1 hour
- Failures: 5 minutes
- Opened At: 1 hour
- Probe In Progress: 1 hour

## References

- [Circuit Breaker Pattern](https://martinfowler.com/bliki/CircuitBreaker.html)
- [Resilience Patterns](https://www.oreilly.com/library/view/release-it/9781680500264/)
- [PostgreSQL Advisory Locks](https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS)
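## Appendix: Minimal State Machine Sketch

The CLOSED, OPEN, and HALF_OPEN transitions described in this guide can be illustrated with a small, self-contained Ruby class. This is only a sketch under stated assumptions, not the application's `CircuitBreaker` concern: it keeps state in memory rather than in Rails cache, takes no advisory locks, and does not limit concurrent probes. The names `SketchBreaker`, `OpenError`, and the injectable `clock` parameter are hypothetical and exist only for illustration.

```ruby
# Hypothetical in-memory sketch of the three-state circuit breaker.
# Not thread-safe; the real concern uses Rails cache and advisory locks.
class SketchBreaker
  # Raised when a call is rejected because the circuit is open.
  class OpenError < StandardError; end

  attr_reader :state

  # clock is injectable so the timeout can be exercised without sleeping.
  def initialize(failure_threshold: 5, timeout_seconds: 60, clock: -> { Time.now.to_f })
    @failure_threshold = failure_threshold
    @timeout_seconds = timeout_seconds
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
  end

  def call
    if @state == :open
      # Fail fast until the timeout elapses, then allow a probe (HALF_OPEN).
      raise OpenError, "circuit is open" if @clock.call - @opened_at < @timeout_seconds
      @state = :half_open
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end

    # Any success, including a successful probe, closes the circuit.
    @state = :closed
    @failures = 0
    result
  end

  private

  def record_failure
    @failures += 1
    # A failed probe re-opens immediately; in CLOSED the threshold applies.
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end
end
```

Passing a fake clock (for example `clock: -> { now }` with a local `now` variable) lets a test advance time past `timeout_seconds` and observe the OPEN to HALF_OPEN to CLOSED path without waiting.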