worker-bee

Ecosystem health verification and logging

The guardian of your AI infrastructure

Worker-bee continuously monitors the health of all F3L1X realms, logs activity for audit trails, and alerts you when services degrade or fail. It is your operational safety net, noticing when things go wrong before you do.


What It Does

Worker-bee runs background checks on all 40+ realms in the F3L1X ecosystem. It:

  1. Pings each realm - Verifies every service is responsive
  2. Checks dependencies - Ensures Herald, Redis, and PostgreSQL are running
  3. Monitors resources - Tracks CPU, memory, disk usage
  4. Logs all activity - Creates audit trail for compliance and debugging
  5. Alerts on failures - Notifies you when services degrade

Think of worker-bee as your on-call operations engineer, working 24/7 without needing coffee breaks.


Key Capabilities

Realm Monitoring

  • Health Checks: Ping all realms every 30 seconds
  • Dependency Verification: Ensure Herald, Redis, and PostgreSQL are accessible
  • Port Availability: Detect port conflicts and blocked ports
  • Service Responsiveness: Track response times (P50, P95, P99 latency)
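
As an illustration of the latency tracking listed above, here is a minimal sketch of how P50, P95, and P99 could be derived from collected response times. It uses the nearest-rank method; the function names and the millisecond unit are assumptions made for this example, not worker-bee's actual implementation.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of response times; pct is a whole number 1-100."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    # Integer form of ceil(pct * n / 100), which avoids float rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_summary(samples):
    """Return the three percentiles shown for each realm (hypothetical helper)."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }
```

Nearest-rank always returns a value that was actually observed, which keeps the numbers easy to cross-check against the request logs.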

Auto-Recovery

  • Service Restart: Auto-restart crashed realms (configurable per realm)
  • Dependency Recovery: Restart Redis if message queuing fails
  • Graceful Degradation: Continue operating if secondary services fail
  • Recovery Logging: Record all restart events for audit trail

Activity Logging

  • Request Logs: Every HTTP request logged with timestamp and status
  • Error Logs: Full stack traces for exceptions
  • Audit Trail: Compliance logging for security-sensitive operations
  • Compressed Storage: Logs rotated and compressed daily
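
The daily compression step could look roughly like the sketch below, which gzips one log file and removes the uncompressed original. The helper name and the `.gz` suffix convention are assumptions for illustration; the source does not say how worker-bee stores rotated logs.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(path):
    """Gzip a log file next to the original, then delete the original."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large logs never sit in memory
    source.unlink()
    return target
```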

Alerting System

  • Email Alerts: Critical failures sent to configured email
  • Slack Integration: Channel notifications for service issues
  • Severity Levels: Critical/warning/info classification
  • Silence Rules: Prevent alert fatigue on expected downtime
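
To make the idea of silence rules concrete, the sketch below suppresses notifications for a realm during a planned maintenance window. The `SilenceRule` class, its fields, and `should_notify` are hypothetical names chosen for this example; worker-bee's real rule format may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SilenceRule:
    """A window during which alerts for one realm are suppressed (hypothetical)."""
    realm: str
    start: datetime
    end: datetime

    def covers(self, realm, when):
        # The window is half-open: start is included, end is not.
        return realm == self.realm and self.start <= when < self.end


def should_notify(realm, when, rules):
    """Return False if any silence rule covers this realm at this time."""
    return not any(rule.covers(realm, when) for rule in rules)
```

For example, a rule covering `analysis-tools` from 01:00 to 03:00 silences an alert raised at 02:00 but not one raised at 04:00, and it never affects other realms.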

Accessing worker-bee

Web Dashboard

URL: http://127.0.0.1:8082

The worker-bee dashboard shows:
- Realm status grid (green/yellow/red)
- Uptime percentages for each service
- Resource usage graphs
- Recent alerts and recovery actions

Command Line Interface

# Run ecosystem verification
python manage.py verify-ecosystem

# Check specific realm
python manage.py check-realm <realm-name>

# View recent alerts
python manage.py alerts --limit 50

# Generate health report
python manage.py health-report

Metrics & Alerts API

Endpoint               Purpose                     Method
/api/health/           Overall ecosystem status    GET
/api/realms/           Status of all realms        GET
/api/realms/<name>/    Status of a single realm    GET
/api/alerts/           Recent alerts               GET
Common Use Cases

Use Case 1: Check Ecosystem Health Before Work

Goal: Verify all services are healthy before starting a session

  1. Open the worker-bee dashboard at http://127.0.0.1:8082
  2. View "Realm Status Grid"
  3. Check for red (down) or yellow (degraded) services
  4. Green = all healthy, ready to work
  5. If problems found, click realm for details

Use Case 2: Investigate Service Outage

Goal: Understand what happened when a realm crashed

  1. Open worker-bee dashboard
  2. Click "Recent Alerts" section
  3. Find alert for failed realm
  4. Review timestamp and error message
  5. Click realm name to see recovery log
  6. Check if auto-restart succeeded

Use Case 3: Monitor Long-Running Job

Goal: Ensure services stay healthy during intensive tasks

  1. Start long-running operation
  2. Open worker-bee dashboard
  3. Monitor resource graphs (CPU, memory, disk)
  4. Watch for service degradation
  5. Worker-bee alerts you if problems occur

Use Case 4: Configure Auto-Recovery

Goal: Enable automatic service restart on failure

Configuration in worker-bee settings:
- Essential Realms (auto-restart enabled):
  - Herald (critical dependency)
  - PostgreSQL (critical dependency)
  - Redis (critical dependency)
- Non-Critical Realms (manual restart):
  - Dashboard features
  - Analysis tools
  - Experimental services
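
The split above can be pictured as a simple per-realm mapping. The sketch below is only an illustration: the `AUTO_RESTART` name, the realm keys, and the lookup helper are all hypothetical, since this page does not show worker-bee's actual settings format.

```python
# Hypothetical settings fragment; the real setting names are not documented here.
AUTO_RESTART = {
    "herald": True,           # critical dependency
    "postgresql": True,       # critical dependency
    "redis": True,            # critical dependency
    "analysis-tools": False,  # non-critical, restarted manually
}


def auto_restart_enabled(realm, config=AUTO_RESTART):
    """Unlisted realms fall back to manual restart."""
    return config.get(realm, False)
```

Defaulting unlisted realms to manual restart mirrors the rule that only critical services restart on their own.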

Important Notes

Monitoring Standards

Health Check Criteria:
- Service responds to HTTP request within 5 seconds
- PostgreSQL accepts connections
- Redis responds to PING command
- Disk space >5% available
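
Two of these criteria can be sketched with the standard library: the 5-second HTTP check and the 5% free-disk check. The function names are assumptions for illustration, and the PostgreSQL and Redis checks are left out because they need client libraries this page does not name.

```python
import shutil
from urllib.request import urlopen


def http_ok(url, timeout=5):
    """True if the service answers an HTTP request without an error.

    Note: the timeout bounds each blocking socket operation, which only
    approximates a total 5-second budget.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # covers URLError, HTTPError, refusals, and timeouts
        return False


def free_fraction_ok(free_bytes, total_bytes, min_free=0.05):
    """True if strictly more than min_free of the disk is available."""
    return free_bytes / total_bytes > min_free


def disk_ok(path="/", min_free=0.05):
    """Apply the free-space criterion to the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return free_fraction_ok(usage.free, usage.total, min_free)
```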

Alert Thresholds:
- Critical: Service down 2+ minutes
- Warning: Response time >2 seconds
- Info: Resource usage >80%
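
The thresholds above amount to a small classification rule, sketched below. The function signature is an assumption made for this example; in particular, checking the levels in order from most to least severe is a design choice, not something the page specifies.

```python
def classify(down_seconds, response_seconds, resource_fraction):
    """Map raw measurements to an alert severity, or None if all is well.

    down_seconds: how long the service has been unreachable
    response_seconds: latest response time, or None if there was no response
    resource_fraction: highest resource usage as a fraction (0.0 to 1.0)
    """
    if down_seconds >= 120:  # down for 2+ minutes
        return "critical"
    if response_seconds is not None and response_seconds > 2:
        return "warning"
    if resource_fraction > 0.80:
        return "info"
    return None
```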

Auto-Recovery Behavior

When a realm crashes:
1. Worker-bee detects failure
2. Sends alert if configured
3. If auto-restart enabled: Waits 10 seconds, restarts realm
4. If auto-restart disabled: Alerts you to manually restart
5. Logs recovery attempt with result
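
The sequence above can be sketched as a single function. The restart, alert, and log actions are passed in as callables because this page does not describe how worker-bee performs them; every name here is hypothetical.

```python
import time


def handle_crash(realm, auto_restart, restart, alert, log,
                 delay=10, sleep=time.sleep):
    """Follow the documented steps once a failure has been detected.

    restart(realm) -> bool, alert(message), and log(realm, result) are
    caller-supplied stand-ins for worker-bee's real mechanisms.
    """
    alert(f"{realm} is down")
    if not auto_restart:
        alert(f"{realm} needs a manual restart")
        log(realm, "manual-restart-required")
        return False
    sleep(delay)  # wait 10 seconds before restarting
    succeeded = restart(realm)
    log(realm, "restart-succeeded" if succeeded else "restart-failed")
    return succeeded
```

Injecting `sleep` keeps the 10-second wait out of tests while leaving the default behavior unchanged.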

Critical services will auto-restart:
- Herald (all realms depend on it)
- PostgreSQL (data storage)
- Redis (message queuing)

Optional services require manual restart:
- Most realm services
- Analysis/experimentation tools

Log Retention & Compliance

Log Storage:
- Recent 30 days: Full detail, searchable
- 31-90 days: Compressed, archive access
- Over 90 days: Long-term archive (rarely accessed)

Log Deletion:
- Automatically deleted after 1 year
- Manual deletion requires confirmation
- Compliance mode keeps all logs indefinitely
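
Taken together, the storage and deletion rules map a log's age to a tier. The sketch below encodes that mapping; the tier labels are invented for illustration, and treating "1 year" as 365 days is an assumption.

```python
def storage_tier(age_days, compliance_mode=False):
    """Return where a log of the given age lives under the retention rules."""
    if age_days <= 30:
        return "full"        # full detail, searchable
    if age_days <= 90:
        return "compressed"  # compressed, archive access
    if age_days <= 365 or compliance_mode:
        return "archive"     # long-term archive
    return "deleted"         # past 1 year (assumed 365 days), not in compliance mode
```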

Search logs:

python manage.py search-logs --realm <name> --date 2026-02-17

Troubleshooting

Dashboard shows red for realm but service is running

Symptom: Realm appears down but you know it's running
Fix: The realm's port may be blocked. Check the firewall, then restart the realm.

Constant alerts for non-critical service

Symptom: Getting too many alerts for optional realm
Fix: Adjust alert severity rules or disable alerts for that realm

Auto-restart not working

Symptom: Realm crashes but doesn't automatically restart
Fix: Check if auto-restart is enabled for that realm in worker-bee settings

Logs growing too large

Symptom: Disk usage increasing rapidly
Fix: Run python manage.py rotate-logs to compress old logs

Resource graphs show memory leak

Symptom: Memory usage increasing over time
Fix: Check the realm logs for errors. The affected service may need a restart.


Related Realms

  • herald - Worker-bee monitors herald for overall ecosystem health
  • doc-u-me - Worker-bee logs activity accessible via doc-u-me search
  • f3l1x-dashboard - Shows worker-bee health data in realm status
  • All other realms - All services monitored by worker-bee

Further Reading