# A taxonomy of error handling in a Rails monolith

## Error Handling & Resilience Guide

This guide explains the error handling and resilience mechanisms implemented for the order system.

## Overview

The system implements multi-layered error handling for:

1. **Inventory Conflicts** - Retry with exponential backoff, user notifications
2. **Payment Failures** - Inventory release, error logging, user notifications, retry support
3. **Database Deadlocks** - Automatic retry, exponential backoff, queue fallback

## Error Handling Architecture

### OrderErrorHandler Service

A centralized error handling service (`app/services/order_error_handler.rb`) that provides:

- Consistent error handling across all order operations
- Retry logic with exponential backoff
- User notifications
- Inventory release on failures
- Queue-based fallback for deadlocks

## Error Scenarios

### 1. Inventory Conflicts

**Scenario**: A user tries to order items that are out of stock or that have insufficient inventory available.

**Handling**:

1. **Retry with Exponential Backoff**
   - Automatic retry up to 3 attempts
   - Exponential backoff: 0.1s, 0.2s, 0.4s with the default 3 attempts, doubling on each further attempt
   - Max delay: 10 seconds

2. **User Notification**
   - Email notification with unavailable items list
   - Clear error messages indicating which items are out of stock
   - Available quantities shown

3. **Partial Order Fulfillment** (if applicable)
   - Create order with available items
   - Mark unavailable items in order metadata
   - Notify user of partial fulfillment

**Implementation**:

```ruby
result = OrderErrorHandler.handle_inventory_conflict(
  order: order,
  unavailable_items: [
    { variant_id: 1, variant_name: "Product A", requested: 2, available: 0 }
  ],
  user: user,
  reservations: reservations,
  retry_count: 0,
  max_retries: 3
)
```

**User Experience**:

- Clear error message: "Product A: Only 0 available (requested 2)"
- Email notification with details
- Option to retry with updated cart

### 2. Payment Failures

**Scenario**: The payment gateway returns a failure or a payment processing error.

**Handling**:

1. **Error Logging**
   - Detailed error information logged
   - Sentry integration for error tracking
   - Payment gateway error details captured

2. **Inventory Release**
   - Automatic release of inventory reservations
   - Prevents inventory from being held indefinitely
   - Ensures inventory is available for other users

3. **User Notification**
   - Email notification of payment failure
   - Clear message explaining what happened
   - Instructions for retry

4. **Retry Support**
   - Order remains in pending state
   - User can retry payment
   - New payment intent created on retry

**Implementation**:

```ruby
result = OrderErrorHandler.handle_payment_failure(
  order: order,
  payment_intent: payment_intent,
  error: payment_error,
  user: user,
  gateway_type: "razorpay"
)
```

**User Experience**:

- Email: "Payment Failed for Order #12345"
- Order shows `payment_status: :failed`
- Inventory released automatically
- Can retry payment from order page

### 3. Database Deadlocks

**Scenario**: Concurrent transactions cause database deadlocks or serialization failures.

**Handling**:

1. **Automatic Retry**
   - Up to 3 retry attempts
   - Exponential backoff: 0.1s, 0.2s, 0.4s
   - Tracks retry count and delays

2. **Exponential Backoff**
   - Formula: `base_delay * (2 ** retry_count)`
   - Base delay: 0.1 seconds
   - Max delay: 10 seconds
   - Spaces out retries to reduce thundering-herd contention

3. **Queue-Based Fallback**
   - If retries are exhausted, enqueue to a background job
   - Process asynchronously to avoid blocking
   - User notified that request is queued

**Implementation**:

```ruby
result = OrderErrorHandler.handle_database_deadlock(
  error: deadlock_error,
  operation: :create_order,
  context: { user_id: user.id, items_count: 5 },
  retry_count: 0,
  max_retries: 3
)
```

**User Experience**:

- First 3 attempts: Automatic retry (transparent to user)
- After 3 attempts: "Request queued due to high load. You will be notified when processing completes."
- Email notification when order is processed

## Error Response Format

All error handlers return consistent result hashes:

```ruby
{
  success: false,                # true or false
  retry: true,                   # Whether to retry
  retry_delay: 0.2,              # Seconds to wait before retry
  retry_count: 1,                # Current retry attempt
  fallback_to_queue: false,      # Whether to queue for async processing
  notify_user: true,             # Whether user was notified
  released_inventory: true,      # Whether inventory was released
  errors: ["Error message"],     # Array of error messages
  message: "User-friendly message",
  unavailable_items: [],         # For inventory conflicts (array of item hashes)
  queued: false                  # Whether request was queued
}
```

## Exponential Backoff Algorithm

```ruby
# Doubles the delay on each retry, starting at base_delay and capped at max_delay.
def calculate_exponential_backoff(retry_count, base_delay: 0.1, max_delay: 10.0)
  delay = base_delay * (2 ** retry_count)
  [delay, max_delay].min
end
```

**Retry Delays**:

- Attempt 1 (`retry_count: 0`): 0.1 seconds
- Attempt 2 (`retry_count: 1`): 0.2 seconds
- Attempt 3 (`retry_count: 2`): 0.4 seconds
- Attempt 4 (`retry_count: 3`): 0.8 seconds
- Max: 10.0 seconds (the cap applies from attempt 8 onward)

## User Notifications

### Email Notifications

Error scenarios trigger the following email notifications:

1. **Inventory Conflict** (`OrderMailer.inventory_conflict`)
   - Lists unavailable items
   - Shows available quantities
   - Provides link to update cart

2. **Payment Failure** (`OrderMailer.payment_failed`)
   - Explains payment failure
   - Provides retry instructions
   - Includes order details

3. **Partial Fulfillment** (`OrderMailer.partial_fulfillment`)
   - Lists fulfilled items
   - Lists unavailable items
   - Explains next steps

### Event System

All errors emit events to the Rails event system:

- `order.inventory_conflict`
- `order.payment_failed`
- `order.partial_fulfillment`
- `order.deadlock_retry`

## Metrics & Monitoring

### Tracked Metrics

1. **Inventory Conflicts**
   - `inventory.reservation.conflicts.total`
   - `inventory.reservation.conflicts.variant.#{variant_id}`

2. **Payment Failures**
   - `payments.failed.total`
   - `payments.failed.gateway.#{gateway_type}`

3. **Database Deadlocks**
   - `database.deadlock.total`
   - `database.serialization_failure.total`

### Sentry Integration

All errors are reported to Sentry with:

- Error class and message
- Order context (ID, number, user)
- Payment intent details (if applicable)
- Gateway information
- Retry count

## Best Practices

1. **Always Release Inventory on Failure**
   - Prevents inventory from being held indefinitely
   - Ensures fair access for all users

2. **Notify Users Promptly**
   - Email notifications sent immediately
   - Clear, actionable error messages

3. **Log Detailed Information**
   - Full error context for debugging
   - Retry attempts tracked
   - Performance metrics recorded

4. **Graceful Degradation**
   - Queue-based fallback for high load
   - Partial fulfillment when possible
   - Clear user communication

5. **Monitor Error Rates**
   - Track error frequencies
   - Alert on high error rates
   - Analyze error patterns

## Testing

### Test Error Scenarios

```ruby
# Test inventory conflict
RSpec.describe "Inventory Conflict Handling" do
  it "retries with exponential backoff" do
    # Mock inventory conflict
    # Verify retry logic
    # Check user notification
  end
end

# Test payment failure
RSpec.describe "Payment Failure Handling" do
  it "releases inventory and notifies user" do
    # Mock payment failure
    # Verify inventory release
    # Check email sent
  end
end

# Test database deadlock
RSpec.describe "Database Deadlock Handling" do
  it "retries and falls back to queue" do
    # Mock deadlock
    # Verify retry logic
    # Check queue fallback
  end
end
```

## Configuration

### Retry Settings

```bash
# Maximum retry attempts for order creation
ORDER_CREATION_MAX_RETRIES=3

# Base delay for exponential backoff (seconds)
ORDER_RETRY_BASE_DELAY=0.1

# Maximum delay for exponential backoff (seconds)
ORDER_RETRY_MAX_DELAY=10.0
```

## Troubleshooting

### High Inventory Conflict Rate

**Symptom**: Many inventory conflicts reported

**Solution**:

1. Check inventory levels
2. Review reservation expiration times
3. Consider increasing inventory
4. Analyze concurrent order patterns

### High Payment Failure Rate

**Symptom**: Many payment failures

**Solution**:

1. Check payment gateway status
2. Review payment gateway logs
3. Verify payment gateway configuration
4. Check for gateway-specific issues

### Frequent Database Deadlocks

**Symptom**: Many deadlock errors

**Solution**:

1. Review transaction isolation levels
2. Optimize database queries
3. Reduce transaction duration
4. Consider read replicas for reads

## References

- [Rails Error Handling](https://guides.rubyonrails.org/active_record_validations.html#working-with-validation-errors)
- [PostgreSQL Deadlocks](https://www.postgresql.org/docs/current/explicit-locking.html)
- [Exponential Backoff](https://en.wikipedia.org/wiki/Exponential_backoff)
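
## Appendix: Retry Loop Sketch

The deadlock handling described in this guide (retry up to a limit, wait with exponential backoff, then fall back to a queue) can be sketched as a small retry loop. This is a minimal illustration under stated assumptions, not the actual `OrderErrorHandler` code: `SimulatedDeadlock`, `backoff_delay`, `with_deadlock_retry`, and the `sleeper` parameter are hypothetical names chosen for the example. In a Rails app the rescued error would more likely be `ActiveRecord::Deadlocked`, and the fallback would enqueue a background job rather than only setting a flag.

```ruby
# Hypothetical stand-in for a database deadlock error.
class SimulatedDeadlock < StandardError; end

# Same formula as the backoff algorithm in this guide: the delay doubles per retry, capped.
def backoff_delay(retry_count, base_delay: 0.1, max_delay: 10.0)
  [base_delay * (2**retry_count), max_delay].min
end

# Runs the block, retrying on SimulatedDeadlock up to max_retries times.
# `sleeper` is injectable (an assumption of this sketch) so tests can skip real sleeping.
def with_deadlock_retry(max_retries: 3, sleeper: ->(seconds) { sleep(seconds) })
  retry_count = 0
  delays = []
  begin
    value = yield(retry_count)
    { success: true, value: value, retry_count: retry_count,
      delays: delays, fallback_to_queue: false }
  rescue SimulatedDeadlock => e
    if retry_count < max_retries
      delay = backoff_delay(retry_count)
      delays << delay
      sleeper.call(delay)
      retry_count += 1
      retry # re-runs the begin block with the incremented retry_count
    end
    # Retries exhausted: signal that the caller should queue the work instead.
    { success: false, retry_count: retry_count, delays: delays,
      fallback_to_queue: true, errors: [e.message] }
  end
end

# Usage: the operation fails twice with a deadlock, then succeeds on the third attempt.
attempts = 0
result = with_deadlock_retry(sleeper: ->(_seconds) {}) do
  attempts += 1
  raise SimulatedDeadlock, "deadlock detected" if attempts < 3

  :order_created
end
puts result.inspect
```

Returning a hash with `retry_count` and `fallback_to_queue` mirrors the result format used by the handlers in this guide, so the caller can decide whether to enqueue the operation. Adding random jitter to each delay is a common refinement when many clients retry at the same moment, though the guide does not describe one.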