# Observing the Rails cache in production

## Cache Performance Monitoring & Auto-Adjustment Guide

## Overview

This guide describes a production monitoring system for cache hit rates that automatically adjusts the cache pre-warming frequency based on performance metrics.

---

## System Components

### 1. CachePerformanceAdjusterService

**Purpose**: Automatically adjusts cache pre-warming frequency based on hit rates

**Frequency Adjustment Logic**:
- **Low Hit Rate (< 60%)**: Pre-warm every **10 minutes** (high frequency)
- **Medium Hit Rate (60-75%)**: Pre-warm every **15 minutes** (normal frequency)
- **High Hit Rate (> 75%)**: Pre-warm every **30 minutes** (low frequency)

**Features**:
- Analyzes cache hit rates
- Recommends optimal pre-warming frequency
- Stores adjustment history
- Tracks performance trends

**Methods**:
- `check_and_adjust` - Check performance and adjust frequency
- `get_current_frequency` - Get current pre-warming frequency
- `recommend_frequency(hit_rate)` - Get recommended frequency
- `get_adjustment_history(limit: 20)` - View adjustment history
- `get_performance_trends(window: 24)` - Analyze trends

---

### 2. MonitorCachePerformanceJob

**Purpose**: Periodic job to monitor cache performance and send alerts

**Configuration**:
- **Schedule**: Every 1 hour (production & development)
- **Queue**: `analytics` (low priority)

**Features**:
- Monitors cache hit rates
- Checks cache health
- Adjusts pre-warming frequency automatically
- Sends alerts when performance degrades
- Stores monitoring history
- Tracks performance trends

**Alert Thresholds**:
- **Critical** (< 50% hit rate): Error-level alert + Sentry notification
- **Warning** (< 60% hit rate): Warning-level alert + Sentry notification

---

## Monitoring Dashboard

### Admin Interface

**Route**: `/admin/cache_monitoring`

**Features**:
- View current cache statistics
- View cache health status
- View current pre-warming frequency
- View adjustment history
- View performance trends
- Manual actions:
  - Trigger pre-warming
  - Check and adjust frequency
  - Run performance monitoring

### JSON API

**Route**: `/admin/cache_monitoring/stats`

**Response**:
```json
{
  "status": "success",
  "data": {
    "show": {
      "hits": 100,
      "misses": 20,
      "total": 120,
      "hit_rate": 83.33,
      "avg_hit_duration_ms": 50.2,
      "avg_miss_duration_ms": 250.5,
      "improvement_ms": 200.3
    },
    "overall": {
      "hits": 500,
      "misses": 100,
      "total": 600,
      "hit_rate": 83.33
    }
  },
  "timestamp": "2025-01-09T12:00:00Z"
}
```

---

## Automatic Frequency Adjustment

### How It Works

1. **MonitorCachePerformanceJob** runs every hour
2. Collects cache statistics (hit rates, request counts)
3. **CachePerformanceAdjusterService** analyzes performance
4. Compares current frequency with recommended frequency
5. Adjusts if the difference is at least 5 minutes
6. Logs adjustment and stores in history

### Adjustment Criteria

- **Minimum Data**: Requires at least 100 requests before adjusting
- **Threshold**: Only adjusts if the difference is at least 5 minutes (prevents thrashing)
- **Recommendation**: Based on overall hit rate

### Manual Adjustment

```ruby
# Check and adjust
result = CachePerformanceAdjusterService.check_and_adjust
# => { adjusted: true, previous_frequency: 15, new_frequency: 10, ... }

# Get current frequency
frequency = CachePerformanceAdjusterService.get_current_frequency
# => 15

# Get adjustment history
history = CachePerformanceAdjusterService.get_adjustment_history(limit: 10)
# => [{ timestamp: "...", frequency: 10, hit_rate: 58.5 }, ...]
```

---

## Alerting

### Alert Types

1. **Critical Alert** (Hit Rate < 50%)
   - Logged as ERROR
   - Sent to Sentry (error level)
   - Includes recommendations
2. **Warning Alert** (Hit Rate < 60%)
   - Logged as WARNING
   - Sent to Sentry (warning level)
   - Includes recommendations

### Alert Channels

- **Logs**: Rails logger (ERROR/WARNING level)
- **Sentry**: Automatic error tracking (if configured)
- **Future**: Email/Slack notifications (can be added)

---

## Performance Metrics

### Tracked Metrics

1. **Cache Hit Rate**: Percentage of requests served from cache
2. **Response Times**: Average duration for hits vs misses
3. **Performance Improvement**: Time saved by cache hits
4. **Trend Analysis**: Performance over time (improving/degrading/stable)

### Storage

- **Latest Monitoring**: Stored in `cache_monitoring:latest` (2 hour expiry)
- **History**: Stored in `cache_monitoring:history` (last 100 entries, 7 day expiry)
- **Adjustment History**: Stored in `cache_prewarming:adjustment_history` (last 50 entries, 7 day expiry)

---

## Configuration

### Recurring Jobs (`config/recurring.yml`)

```yaml
production:
  prewarm_product_cache:
    class: PrewarmProductCacheJob
    queue: low
    schedule: every 15 minutes # Base frequency (adjusted automatically)
  monitor_cache_performance:
    class: MonitorCachePerformanceJob
    queue: analytics
    schedule: every 1 hour
```

**Note**: The pre-warming frequency in `recurring.yml` is the base frequency. The system will recommend adjustments, but actual schedule changes require updating `recurring.yml` and restarting the job scheduler.

---

## Usage Examples

### Manual Monitoring

```ruby
# Run monitoring job
result = MonitorCachePerformanceJob.perform_now
# => { stats: {...}, health: {...}, adjustment: {...}, trends: {...} }

# Check cache health
health = ProductCacheMonitoringService.check_cache_health
# => { healthy: true, overall_hit_rate: 85.5, recommendations: [] }

# Get cache statistics
stats = ProductCacheMonitoringService.get_cache_stats
# => { show: {...}, index: {...}, overall: {...} }
```

### Frequency Adjustment

```ruby
# Check and adjust
adjustment = CachePerformanceAdjusterService.check_and_adjust
# => { adjusted: true, previous_frequency: 15, new_frequency: 10, ... }

# Get current frequency
frequency = CachePerformanceAdjusterService.get_current_frequency
# => 10

# Get performance trends
trends = CachePerformanceAdjusterService.get_performance_trends
# => { recent_hit_rate: 58.5, daily_hit_rate: 62.3, trend: "degrading", ... }
```

---

## Production Deployment

### Initial Setup

1. **Deploy Code**: All services and jobs are ready
2. **Configure Recurring Jobs**: Already configured in `recurring.yml`
3. **Start Job Scheduler**: Ensure Solid Queue is running
4. **Monitor**: Check `/admin/cache_monitoring` after deployment

### Monitoring Checklist

- [ ] Verify `MonitorCachePerformanceJob` runs every hour
- [ ] Verify `PrewarmProductCacheJob` runs at base frequency (15min)
- [ ] Check cache hit rates in admin dashboard
- [ ] Review adjustment history
- [ ] Set up Sentry alerts (if not already configured)
- [ ] Monitor logs for warnings/errors

### Expected Behavior

1. **First Hour**: System collects baseline metrics
2. **After 100+ Requests**: System can start adjusting frequency
3. **Low Hit Rate**: Pre-warming becomes more frequent (every 10 minutes)
4. **High Hit Rate**: Pre-warming becomes less frequent (every 30 minutes)
5. **Alerts**: Sent when hit rate drops below thresholds

---

## Troubleshooting

### Low Hit Rates

**Symptoms**: Hit rate < 60%

**Possible Causes**:
- Cache expiry too short
- Pre-warming not running
- Too many product updates
- Cache invalidation too frequent

**Solutions**:
1. Check pre-warming job is running
2. Review cache expiry times
3. Increase pre-warming frequency (manual or automatic)
4. Review cache invalidation logic

### Frequency Not Adjusting

**Symptoms**: Frequency stays at base (15 minutes)

**Possible Causes**:
- Insufficient data (< 100 requests)
- Hit rate in normal range (60-75%)
- Difference < 5 minutes (no adjustment needed)

**Solutions**:
1. Wait for more requests
2. Check hit rate is outside 60-75% range
3. Manually trigger adjustment: `CachePerformanceAdjusterService.check_and_adjust`

### Alerts Not Firing

**Symptoms**: Low hit rate but no alerts

**Possible Causes**:
- Monitoring job not running
- Sentry not configured
- Alerts disabled

**Solutions**:
1. Check `MonitorCachePerformanceJob` is scheduled
2. Verify Sentry configuration
3. Check logs for alert messages

---

## Files Created

1. **`app/services/cache_performance_adjuster_service.rb`** - Frequency adjustment service
2. **`app/jobs/monitor_cache_performance_job.rb`** - Monitoring job

---

## Files Modified

1. **`config/recurring.yml`** - Added monitoring job schedule
2. **`app/controllers/admin/cache_monitoring_controller.rb`** - Added monitoring dashboard features
3. **`config/routes.rb`** - Added monitoring routes

---

## Next Steps

1. ✅ **System Complete**: Monitoring and auto-adjustment implemented
2. 🔄 **Deploy**: Deploy to production
3. 📊 **Monitor**: Watch cache hit rates in admin dashboard
4. 🚨 **Alert**: Set up additional alert channels (email/Slack) if needed
5. 📈 **Optimize**: Review adjustment history and fine-tune thresholds

---

## Conclusion

The cache performance monitoring system is now fully operational:

- ✅ Automatic monitoring every hour
- ✅ Automatic frequency adjustment based on hit rates
- ✅ Alerting for degraded performance
- ✅ Admin dashboard for visibility
- ✅ Performance trend analysis

**Expected Result**: Cache hit rates will be automatically optimized, and alerts will notify you when performance degrades.
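
For readers who want to reason about the adjustment rules without opening the service, the thresholds described in this guide can be sketched in plain Ruby. This is an illustrative sketch only, not the actual `CachePerformanceAdjusterService`: the module name `FrequencySketch`, its constants, and the `reason` keys are hypothetical. It also assumes that a change of 5 minutes or more triggers an adjustment, which is what the 15 to 10 minute example in this guide implies.

```ruby
# Hypothetical sketch of the rules described in this guide.
# Names and return shapes are assumptions, not the real service API.
module FrequencySketch
  LOW_HIT_RATE = 60.0        # below this, pre-warm every 10 minutes
  HIGH_HIT_RATE = 75.0       # above this, pre-warm every 30 minutes
  MIN_REQUESTS = 100         # minimum data before any adjustment
  MIN_CHANGE_MINUTES = 5     # assumed: smaller changes are skipped

  module_function

  # Map an overall hit rate (percent) to a pre-warming interval in minutes.
  def recommend_frequency(hit_rate)
    if hit_rate < LOW_HIT_RATE
      10
    elsif hit_rate > HIGH_HIT_RATE
      30
    else
      15
    end
  end

  # Decide whether to change the interval, given current stats.
  def check_and_adjust(current_frequency:, hit_rate:, total_requests:)
    if total_requests < MIN_REQUESTS
      return { adjusted: false, reason: :insufficient_data }
    end

    recommended = recommend_frequency(hit_rate)
    if (recommended - current_frequency).abs < MIN_CHANGE_MINUTES
      return { adjusted: false, reason: :within_threshold }
    end

    {
      adjusted: true,
      previous_frequency: current_frequency,
      new_frequency: recommended,
      hit_rate: hit_rate
    }
  end
end
```

Under these assumptions, `FrequencySketch.check_and_adjust(current_frequency: 15, hit_rate: 58.5, total_requests: 600)` reports an adjustment from 15 to 10 minutes, while the same call with fewer than 100 requests, or with a hit rate in the 60-75% band, leaves the interval unchanged.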