Observing the Rails cache in production
Cache Performance Monitoring & Auto-Adjustment Guide
Overview
This guide describes a production monitoring system that tracks cache hit rates and adjusts the cache pre-warming frequency based on them.
System Components
1. CachePerformanceAdjusterService
Purpose: Automatically adjusts cache pre-warming frequency based on hit rates
Frequency Adjustment Logic:
- Low Hit Rate (< 60%): Pre-warm every 10 minutes (high frequency)
- Medium Hit Rate (60-75%): Pre-warm every 15 minutes (normal frequency)
- High Hit Rate (> 75%): Pre-warm every 30 minutes (low frequency)
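The tiers above can be read as a simple mapping from hit rate to pre-warming interval. The following is a minimal sketch of that mapping, not the service's actual code: the free-standing method and its boundary handling (exactly 60% and 75% fall into the medium tier) are assumptions based on the list above.

```ruby
# Hypothetical stand-in for CachePerformanceAdjusterService.recommend_frequency.
# Takes the overall hit rate in percent and returns an interval in minutes.
def recommend_frequency(hit_rate)
  if hit_rate < 60.0
    10 # low hit rate: pre-warm more often
  elsif hit_rate > 75.0
    30 # high hit rate: pre-warm less often
  else
    15 # 60-75%: keep the normal interval
  end
end

puts recommend_frequency(58.5)
puts recommend_frequency(70.0)
puts recommend_frequency(83.33)
```

Keeping the tier boundaries in one place like this makes it easier to fine-tune them later.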
Features:
- Analyzes cache hit rates
- Recommends optimal pre-warming frequency
- Stores adjustment history
- Tracks performance trends
Methods:
- check_and_adjust - Check performance and adjust frequency
- get_current_frequency - Get current pre-warming frequency
- recommend_frequency(hit_rate) - Get recommended frequency
- get_adjustment_history(limit: 20) - View adjustment history
- get_performance_trends(window: 24) - Analyze trends
2. MonitorCachePerformanceJob
Purpose: Periodic job to monitor cache performance and send alerts
Configuration:
- Schedule: Every 1 hour (production & development)
- Queue: analytics (low priority)
Features:
- Monitors cache hit rates
- Checks cache health
- Adjusts pre-warming frequency automatically
- Sends alerts when performance degrades
- Stores monitoring history
- Tracks performance trends
Alert Thresholds:
- Critical (< 50% hit rate): Error-level alert + Sentry notification
- Warning (< 60% hit rate): Warning-level alert + Sentry notification
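As an illustration of the thresholds above, the sketch below maps a hit rate to a log level and message. The method name, keyword arguments, and message wording are hypothetical. The level symbols are chosen to match the logger methods `error` and `warn`.

```ruby
# Hypothetical helper: decides whether a hit rate (percent) warrants an alert.
# Returns nil when the hit rate is at or above the warning threshold.
def alert_for(hit_rate, critical_below: 50.0, warning_below: 60.0)
  if hit_rate < critical_below
    { level: :error, message: format("Cache hit rate critical: %.1f%%", hit_rate) }
  elsif hit_rate < warning_below
    { level: :warn, message: format("Cache hit rate low: %.1f%%", hit_rate) }
  end
end

alert = alert_for(45.0)
puts "#{alert[:level]}: #{alert[:message]}" if alert
```

A caller could pass the result to a logger with `logger.public_send(alert[:level], alert[:message])` and forward the same message to Sentry when it is configured.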
Monitoring Dashboard
Admin Interface
Route: /admin/cache_monitoring
Features:
- View current cache statistics
- View cache health status
- View current pre-warming frequency
- View adjustment history
- View performance trends
- Manual actions:
  - Trigger pre-warming
  - Check and adjust frequency
  - Run performance monitoring
JSON API
Route: /admin/cache_monitoring/stats
Response:
{
  "status": "success",
  "data": {
    "show": {
      "hits": 100,
      "misses": 20,
      "total": 120,
      "hit_rate": 83.33,
      "avg_hit_duration_ms": 50.2,
      "avg_miss_duration_ms": 250.5,
      "improvement_ms": 200.3
    },
    "overall": {
      "hits": 500,
      "misses": 100,
      "total": 600,
      "hit_rate": 83.33
    }
  },
  "timestamp": "2025-01-09T12:00:00Z"
}
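The derived fields in this response follow from the raw counters. The sketch below shows one way they could be computed. The helper name is hypothetical, and reading `improvement_ms` as the average miss duration minus the average hit duration is an inference from the sample numbers, not something the guide states.

```ruby
# Hypothetical helper: builds one stats entry from raw counters.
def build_stats(hits:, misses:, avg_hit_duration_ms: nil, avg_miss_duration_ms: nil)
  total = hits + misses
  stats = {
    hits: hits,
    misses: misses,
    total: total,
    # Guard against division by zero before any request has been recorded.
    hit_rate: total.zero? ? 0.0 : (hits * 100.0 / total).round(2)
  }
  if avg_hit_duration_ms && avg_miss_duration_ms
    stats[:avg_hit_duration_ms] = avg_hit_duration_ms
    stats[:avg_miss_duration_ms] = avg_miss_duration_ms
    # Assumed definition: average time saved per request served from cache.
    stats[:improvement_ms] = (avg_miss_duration_ms - avg_hit_duration_ms).round(2)
  end
  stats
end

p build_stats(hits: 100, misses: 20, avg_hit_duration_ms: 50.2, avg_miss_duration_ms: 250.5)
```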
Automatic Frequency Adjustment
How It Works
- MonitorCachePerformanceJob runs every hour
- Collects cache statistics (hit rates, request counts)
- CachePerformanceAdjusterService analyzes performance
- Compares current frequency with recommended frequency
- Adjusts if the difference is at least 5 minutes
- Logs adjustment and stores in history
Adjustment Criteria
- Minimum Data: Requires at least 100 requests before adjusting
- Threshold: Only adjusts if the difference is at least 5 minutes (prevents thrashing)
- Recommendation: Based on overall hit rate
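The criteria above can be sketched as a single decision function. This is an illustrative outline, not the service's implementation. The method signature, the `reason` values, and the shape of the non-adjusted result are hypothetical. It also assumes that a change of exactly 5 minutes counts as large enough, since the examples in this guide move from 15 to 10 minutes.

```ruby
# Hypothetical stand-in for the service's recommendation logic.
def recommend_frequency(hit_rate)
  return 10 if hit_rate < 60.0
  return 30 if hit_rate > 75.0
  15
end

# Hypothetical decision function mirroring the adjustment criteria.
def check_and_adjust(current_frequency:, hit_rate:, total_requests:,
                     min_requests: 100, min_change: 5)
  # Minimum data: do nothing until enough requests have been observed.
  if total_requests < min_requests
    return { adjusted: false, reason: :insufficient_data, frequency: current_frequency }
  end

  recommended = recommend_frequency(hit_rate)

  # Threshold: ignore small differences to avoid thrashing.
  if (recommended - current_frequency).abs < min_change
    return { adjusted: false, reason: :within_threshold, frequency: current_frequency }
  end

  { adjusted: true, previous_frequency: current_frequency,
    new_frequency: recommended, hit_rate: hit_rate }
end

p check_and_adjust(current_frequency: 15, hit_rate: 58.5, total_requests: 600)
```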
Manual Adjustment
# Check and adjust
result = CachePerformanceAdjusterService.check_and_adjust
# => { adjusted: true, previous_frequency: 15, new_frequency: 10, ... }
# Get current frequency
frequency = CachePerformanceAdjusterService.get_current_frequency
# => 15
# Get adjustment history
history = CachePerformanceAdjusterService.get_adjustment_history(limit: 10)
# => [{ timestamp: "...", frequency: 10, hit_rate: 58.5 }, ...]
Alerting
Alert Types
- Critical Alert (Hit Rate < 50%)
  - Logged as ERROR
  - Sent to Sentry (error level)
  - Includes recommendations
- Warning Alert (Hit Rate < 60%)
  - Logged as WARNING
  - Sent to Sentry (warning level)
  - Includes recommendations
Alert Channels
- Logs: Rails logger (ERROR/WARNING level)
- Sentry: Automatic error tracking (if configured)
- Future: Email/Slack notifications (can be added)
Performance Metrics
Tracked Metrics
- Cache Hit Rate: Percentage of requests served from cache
- Response Times: Average duration for hits vs misses
- Performance Improvement: Time saved by cache hits
- Trend Analysis: Performance over time (improving/degrading/stable)
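Trend analysis can be pictured as a comparison between a recent hit rate and a longer-window hit rate. The sketch below is one possible classification. The method name and the 2-point tolerance are assumptions, and the guide does not say how the service actually decides between improving, degrading, and stable.

```ruby
# Hypothetical trend classifier: compares the recent hit rate with the
# daily hit rate (both in percent). Differences within the tolerance
# count as stable.
def classify_trend(recent_hit_rate:, daily_hit_rate:, tolerance: 2.0)
  delta = recent_hit_rate - daily_hit_rate
  trend =
    if delta > tolerance
      "improving"
    elsif delta < -tolerance
      "degrading"
    else
      "stable"
    end
  { recent_hit_rate: recent_hit_rate, daily_hit_rate: daily_hit_rate, trend: trend }
end

p classify_trend(recent_hit_rate: 58.5, daily_hit_rate: 62.3)
```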
Storage
- Latest Monitoring: Stored in cache_monitoring:latest (2 hour expiry)
- History: Stored in cache_monitoring:history (last 100 entries, 7 day expiry)
- Adjustment History: Stored in cache_prewarming:adjustment_history (last 50 entries, 7 day expiry)
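A capped history with an expiry, as described above, can be kept by reading the current list, appending the new entry, trimming to the limit, and writing it back. The sketch below uses a small in-memory class as a stand-in for the cache store so that it runs on its own. `MemoryStore` and `append_history` are hypothetical names, and storing the newest entry last is an assumption.

```ruby
# In-memory stand-in for the cache store. A Rails app would use
# Rails.cache, whose read and write(key, value, expires_in:) calls have
# the same shape as the two methods below.
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    entry = @data[key]
    return nil if entry.nil? || entry[:expires_at] <= Time.now
    entry[:value]
  end

  def write(key, value, expires_in:)
    @data[key] = { value: value, expires_at: Time.now + expires_in }
  end
end

# Hypothetical helper: appends an entry and keeps only the newest max_entries.
# The default expiry is 7 days, expressed in seconds.
def append_history(store, key, entry, max_entries:, expires_in: 7 * 24 * 60 * 60)
  history = store.read(key) || []
  history = (history + [entry]).last(max_entries)
  store.write(key, history, expires_in: expires_in)
  history
end

store = MemoryStore.new
append_history(store, "cache_prewarming:adjustment_history",
               { frequency: 10, hit_rate: 58.5 }, max_entries: 50)
p store.read("cache_prewarming:adjustment_history")
```

Note that a read-modify-write like this is not atomic, so two jobs writing at the same moment could drop an entry. With one monitoring run per hour that is unlikely to matter.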
Configuration
Recurring Jobs (config/recurring.yml)
production:
  prewarm_product_cache:
    class: PrewarmProductCacheJob
    queue: low
    schedule: every 15 minutes # Base frequency (adjusted automatically)
  monitor_cache_performance:
    class: MonitorCachePerformanceJob
    queue: analytics
    schedule: every 1 hour
Note: The pre-warming frequency in recurring.yml is the base frequency. The system will recommend adjustments, but actual schedule changes require updating recurring.yml and restarting the job scheduler.
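Because the schedule itself lives in recurring.yml, it can help to compare the recommended interval with the configured one and flag when the file needs editing. The sketch below does that for the text of the file. Both helper names are hypothetical, and the parser only understands schedules written as "every N minutes".

```ruby
require "yaml"

# Hypothetical helper: reads the base pre-warming interval (in minutes)
# from the text of config/recurring.yml. Returns nil if the task is
# missing or the schedule is not of the form "every N minutes".
def base_frequency_minutes(recurring_yaml, env: "production", task: "prewarm_product_cache")
  config = YAML.safe_load(recurring_yaml) || {}
  schedule = config.dig(env, task, "schedule").to_s
  match = schedule.match(/\Aevery (\d+) minutes?\z/)
  match ? match[1].to_i : nil
end

# True when the recommended interval differs from the configured base,
# meaning recurring.yml would have to be updated by hand.
def schedule_update_needed?(recurring_yaml, recommended_minutes)
  base = base_frequency_minutes(recurring_yaml)
  !base.nil? && base != recommended_minutes
end

sample = <<~CONFIG
  production:
    prewarm_product_cache:
      class: PrewarmProductCacheJob
      queue: low
      schedule: every 15 minutes # Base frequency
CONFIG

puts base_frequency_minutes(sample)
puts schedule_update_needed?(sample, 10)
```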
Usage Examples
Manual Monitoring
# Run monitoring job
result = MonitorCachePerformanceJob.perform_now
# => { stats: {...}, health: {...}, adjustment: {...}, trends: {...} }
# Check cache health
health = ProductCacheMonitoringService.check_cache_health
# => { healthy: true, overall_hit_rate: 85.5, recommendations: [] }
# Get cache statistics
stats = ProductCacheMonitoringService.get_cache_stats
# => { show: {...}, index: {...}, overall: {...} }
Frequency Adjustment
# Check and adjust
adjustment = CachePerformanceAdjusterService.check_and_adjust
# => { adjusted: true, previous_frequency: 15, new_frequency: 10, ... }
# Get current frequency
frequency = CachePerformanceAdjusterService.get_current_frequency
# => 10
# Get performance trends
trends = CachePerformanceAdjusterService.get_performance_trends
# => { recent_hit_rate: 58.5, daily_hit_rate: 62.3, trend: "degrading", ... }
Production Deployment
Initial Setup
- Deploy Code: All services and jobs are ready
- Configure Recurring Jobs: Already configured in recurring.yml
- Start Job Scheduler: Ensure Solid Queue is running
- Monitor: Check /admin/cache_monitoring after deployment
Monitoring Checklist
- Verify MonitorCachePerformanceJob runs every hour
- Verify PrewarmProductCacheJob runs at base frequency (15 minutes)
- Check cache hit rates in admin dashboard
- Review adjustment history
- Set up Sentry alerts (if not already configured)
- Monitor logs for warnings/errors
Expected Behavior
- First Hour: System collects baseline metrics
- After 100+ Requests: System can start adjusting frequency
- Low Hit Rate: Pre-warming interval shortens to 10 minutes (higher frequency)
- High Hit Rate: Pre-warming interval lengthens to 30 minutes (lower frequency)
- Alerts: Sent when hit rate drops below thresholds
Troubleshooting
Low Hit Rates
Symptoms: Hit rate < 60%
Possible Causes:
- Cache expiry too short
- Pre-warming not running
- Too many product updates
- Cache invalidation too frequent
Solutions:
- Check pre-warming job is running
- Review cache expiry times
- Increase pre-warming frequency (manual or automatic)
- Review cache invalidation logic
Frequency Not Adjusting
Symptoms: Frequency stays at base (15 minutes)
Possible Causes:
- Insufficient data (< 100 requests)
- Hit rate in normal range (60-75%)
- Difference < 5 minutes (no adjustment needed)
Solutions:
- Wait for more requests
- Check hit rate is outside 60-75% range
- Manually trigger adjustment: CachePerformanceAdjusterService.check_and_adjust
Alerts Not Firing
Symptoms: Low hit rate but no alerts
Possible Causes:
- Monitoring job not running
- Sentry not configured
- Alerts disabled
Solutions:
- Check MonitorCachePerformanceJob is scheduled
- Verify Sentry configuration
- Check logs for alert messages
Files Created
- app/services/cache_performance_adjuster_service.rb - Frequency adjustment service
- app/jobs/monitor_cache_performance_job.rb - Monitoring job
Files Modified
- config/recurring.yml - Added monitoring job schedule
- app/controllers/admin/cache_monitoring_controller.rb - Added monitoring dashboard features
- config/routes.rb - Added monitoring routes
Next Steps
- ✅ System Complete: Monitoring and auto-adjustment implemented
- 🔄 Deploy: Deploy to production
- 📊 Monitor: Watch cache hit rates in admin dashboard
- 🚨 Alert: Set up additional alert channels (email/Slack) if needed
- 📈 Optimize: Review adjustment history and fine-tune thresholds
Conclusion
The cache performance monitoring system is now fully operational:
- ✅ Automatic monitoring every hour
- ✅ Automatic frequency adjustment based on hit rates
- ✅ Alerting for degraded performance
- ✅ Admin dashboard for visibility
- ✅ Performance trend analysis
Expected Result: Cache hit rates will be automatically optimized, and alerts will notify you when performance degrades.