hive supports two fundamentally different ways to drive an agent. Choose based on how much control you want to keep.
In Model A, the agent registers its own cron job (`/loop 15m …`) and fires on that cadence indefinitely. The supervisor’s job is to keep the session alive and respawn it if it crashes. Operator involvement is low; the agent runs autonomously.
Best for: Single-agent setups, batch jobs, anything where the cadence is fixed and you trust the agent to stay on task.
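The supervisor’s keep-alive duty in Model A can be sketched as a poll loop over a named tmux session. This is a minimal sketch, not hive’s supervisor. It uses the real `tmux has-session` and `tmux new-session -d -s` commands, and the `=` prefix makes `-t` match the session name exactly. The function names, the launch command, and the `HIVE_TMUX` override (there only so tmux can be stubbed) are hypothetical. hive’s own service also does TUI-ready detection and prompt injection, which this omits.

```bash
#!/usr/bin/env bash
# Sketch only: not hive's actual supervisor code.
# HIVE_TMUX is a hypothetical override so the tmux binary can be stubbed.

# Ensure a detached tmux session exists; recreate it if it is gone.
# Usage: ensure_session <session-name> <command-string>
ensure_session() {
  local session="$1" cmd="$2"
  local tmux_bin="${HIVE_TMUX:-tmux}"
  # "=name" forces an exact match; has-session exits non-zero if absent.
  if "$tmux_bin" has-session -t "=$session" 2>/dev/null; then
    return 0
  fi
  echo "respawning $session" >&2
  # -d: start detached; -s: session name; cmd runs inside the new session.
  "$tmux_bin" new-session -d -s "$session" "$cmd"
}

# Poll forever, in the spirit of hive.service (AGENT_POLL_SEC, default 10s).
supervise() {
  local session="$1" cmd="$2"
  while true; do
    ensure_session "$session" "$cmd"
    sleep "${AGENT_POLL_SEC:-10}"
  done
}
```

For example, `supervise scanner "claude"` would check the `scanner` session every 10 seconds and restart it if the session has disappeared.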
In Model B (EXECUTOR MODE), the agent starts, reads its policy, then waits at the prompt for the supervisor to send work orders via `tmux send-keys`. There is no cron and no self-scheduling. The supervisor (another Claude Code session, a script, or a human) decides when to fire and what to do.
Best for: Multi-agent setups where you want a single controller to prioritize across several agents, production workflows where you need to inspect output before triggering the next step, or any situation where the agent kept restarting its own loop despite being told not to.
Gotcha: session restore bakes in old crons. Claude Code restores its previous conversation context on respawn. If the agent ever registered a `/loop` cron before, that cron comes back with the restored context, even if the new `AGENT_LOOP_PROMPT` says not to. The preferred fix is to enforce EXECUTOR MODE through policy files that the agent re-reads on every firing, not to have the supervisor send a cron-nuke message. The supervisor should never inspect or delete crontabs; policy is the enforcement mechanism.
Gotcha: tmux `-l` makes Enter literal. When dispatching work orders, always split the text and the Enter keypress into two separate `tmux send-keys` calls:

```bash
tmux send-keys -t session -l "do the thing"
sleep 1
tmux send-keys -t session Enter
```

Combining them as `tmux send-keys -t session -l "do the thing" Enter` sends the word “Enter” as part of the literal text, leaving the agent stuck with text in its input box.
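The two-call pattern can be wrapped in a small helper so every work order is sent the same way. This is a sketch assuming bash. `send_work_order`, its optional delay argument, and the `HIVE_TMUX` override (used only to stub tmux) are hypothetical names, not part of hive.

```bash
#!/usr/bin/env bash
# Sketch only. HIVE_TMUX is a hypothetical override used to stub tmux.

# Send literal text to a tmux session, then press Enter in a second call.
# Usage: send_work_order <session> <text> [delay-seconds]
send_work_order() {
  local session="$1" text="$2" delay="${3:-1}"
  local tmux_bin="${HIVE_TMUX:-tmux}"
  # -l disables key-name lookup, so the text is typed exactly as given.
  "$tmux_bin" send-keys -t "$session" -l "$text" || return 1
  # Give the TUI a moment to absorb the typed text.
  sleep "$delay"
  # Without -l, "Enter" is looked up as a key name and submits the input.
  "$tmux_bin" send-keys -t "$session" Enter
}
```

A call such as `send_work_order scanner "do the thing"` then always produces the two separate `send-keys` invocations.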
| # | Unit | Trigger | Catches |
|---|---|---|---|
| 1 | hive.service | Always running; internal poll every AGENT_POLL_SEC (default 10s) | Agent process crash, tmux session killed, TUI-ready detection for startup prompt injection, auto-approval of a known sensitive-file prompt |
| 2 | hive-renew.timer | Every 6 days + 5 min after boot | Claude Code /loop crons auto-expire after 7 days; this timer kills the session so the supervisor re-registers a fresh one. Disable it in EXECUTOR MODE, where there is no cron to renew. |
| 3 | hive-healthcheck.timer | Every 20 min + 5 min after boot | Agent is “alive” but not making progress (auth loop, stuck prompt, model stuck thinking) — watches heartbeat-file mtime |
| 4 | ntfy push inside the healthcheck | On stall, on recovery, on escalation | Operator not watching the box — phone push |
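Rows 3 and 4 can be approximated by a stale-heartbeat check that pushes to ntfy. This is a minimal sketch that assumes the agent touches a heartbeat file as it makes progress. The file path, the 20-minute threshold, the topic, and the `HIVE_NOTIFY` override are hypothetical. It posts to ntfy’s public HTTP endpoint (`https://ntfy.sh/<topic>`) with `curl` and only covers the stall case, not recovery or escalation.

```bash
#!/usr/bin/env bash
# Sketch only: names, paths, and thresholds are illustrative, not hive's.

# Exit 0 when the heartbeat file is missing or older than N minutes.
heartbeat_is_stale() {
  local file="$1" minutes="$2"
  [ -e "$file" ] || return 0
  # find prints the path only if its mtime is more than N minutes old.
  # -mmin works with both GNU find and BSD/macOS find.
  [ -n "$(find "$file" -mmin +"$minutes" -print)" ]
}

# Push a message to an ntfy topic on the public ntfy.sh server.
ntfy_push() {
  local topic="$1" message="$2"
  curl -fsS -d "$message" "https://ntfy.sh/$topic" >/dev/null
}

# Return 1 and notify when the agent looks stalled; return 0 otherwise.
healthcheck() {
  local file="$1" minutes="$2" topic="$3"
  if heartbeat_is_stale "$file" "$minutes"; then
    "${HIVE_NOTIFY:-ntfy_push}" "$topic" "hive: no heartbeat for over $minutes min ($file)"
    return 1
  fi
  return 0
}
```

Run from a timer, `healthcheck /path/to/heartbeat 20 my-topic` would stay silent while the file keeps being touched and push a phone notification once it goes stale.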
When running several agents on the same machine, the EXECUTOR pattern lets a single supervisor session coordinate all of them without the agents conflicting:
┌─────────────────────────────────────┐
│ supervisor session (Mac) │
│ /loop — sweeps every 20-25 min │
│ sends tmux work orders to agents │
└──────┬──────────┬──────────┬────────┘
│ │ │
▼ ▼ ▼
scanner reviewer outreach
(Opus 4.7) (Sonnet) (Sonnet)
hive hive hive
tmux tmux tmux
Each agent:
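- claims work in the shared tracker (`bd` / beads) using `--actor <name>`
- checks what other agents already have in progress before claiming (`bd list --actor=<other> --status=in_progress`)

Renew timers are disabled for all agents in EXECUTOR MODE. The supervisor sends a fresh startup prompt plus a cron-nuke message on every respawn automatically.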
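When the supervisor is a script rather than a Claude Code session, the sweep in the diagram reduces to a loop over agents with a jittered 20 to 25 minute sleep. This is a sketch under stated assumptions. The agent names come from the diagram. `next_sweep_delay`, `sweep_once`, the `HIVE_DISPATCH` hook, and the work-order wording are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: agent names from the diagram; everything else is illustrative.

AGENTS=(scanner reviewer outreach)

# Seconds until the next sweep: 20 minutes plus 0 to 5 minutes of jitter.
next_sweep_delay() {
  echo $(( 20 * 60 + RANDOM % (5 * 60 + 1) ))
}

# Default dispatcher: literal text first, then Enter as a separate call.
dispatch() {
  tmux send-keys -t "$1" -l "$2" && sleep 1 && tmux send-keys -t "$1" Enter
}

# Send one work order to every agent. The message text is a placeholder.
sweep_once() {
  local agent
  for agent in "${AGENTS[@]}"; do
    "${HIVE_DISPATCH:-dispatch}" "$agent" "Work order: run one cycle per your policy file."
  done
}

supervisor_loop() {
  while true; do
    sweep_once
    sleep "$(next_sweep_delay)"
  done
}
```

The jitter keeps the sweep from landing at the same wall-clock minute every cycle. Deciding *what* each agent should work on is left to the supervisor’s own logic.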
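Keep credentials out of `agent.env`. The agent should source them from its own credential store (`~/.claude/.credentials.json` for Claude Code, a vault or secrets manager for anything else).

Models A and B both put the AI agent on a periodic loop. A third pattern, used in production on KubeStellar, decouples scanning from fixing:

- A deterministic scanner (`worker.sh`, pure bash, started by launchd or cron) pulls issues and PRs from GitHub with `gh` and records them in a SQLite state database.
- The AI agent reads that database on its own trigger (a `/loop` cron, or an EXECUTOR work order) and fixes what’s actionable.

This is not a new scheduling model. It is a composition of the existing patterns, with a deterministic scanner in front and GitHub as an event source.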
| Problem | How the hybrid solves it |
|---|---|
| AI session restarts / rate limits cause missed scans | Scanner runs independently — state is never lost |
| Scanning is deterministic but consumes LLM tokens | Scanner is pure bash — zero LLM cost |
| No audit trail of what was scanned | cycles table in SQLite records every scan |
| Workflow failures go unnoticed for days | workflow-failure-issue.yml auto-files issues within minutes |
| Fix attempts need backoff | fix_attempts counter prevents infinite retries |
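The `fix_attempts` row implies a retry gate. One way to express it, as a sketch, is to allow another attempt only while the counter is under a cap and an exponential backoff window has passed since the last attempt. The cap of 3, the 30-minute base delay, the doubling schedule, and the name `may_attempt_fix` are assumptions. The actual schema and queries live in `examples/sqlite-state.md`.

```bash
#!/usr/bin/env bash
# Sketch only: cap, base delay, and doubling schedule are assumptions.

# Exit 0 if another fix attempt is allowed.
# Usage: may_attempt_fix <fix_attempts> <last_attempt_epoch> <now_epoch> [max] [base_sec]
may_attempt_fix() {
  local attempts="$1" last="$2" now="$3" max="${4:-3}" base="${5:-1800}"
  # Hard cap: never retry forever.
  [ "$attempts" -lt "$max" ] || return 1
  # The first attempt needs no waiting.
  [ "$attempts" -eq 0 ] && return 0
  # Wait base * 2^(attempts-1) seconds after the previous attempt.
  local wait=$(( base * (1 << (attempts - 1)) ))
  [ $(( now - last )) -ge "$wait" ]
}
```

With the defaults, a second attempt waits 30 minutes, a third waits 60, and a fourth is never made.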
┌──────────────────────┐
│ GitHub (source of │
│ truth for issues/PRs)│
└──────────┬───────────┘
│
gh issue list / gh pr list
│
┌──────────────────────────────────┼──────────────────────────────┐
│ Local machine (Mac / Linux) │ │
│ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐ │
│ │ launchd │───▶│worker.sh │───▶│ state.db │◀───│ AI agent │ │
│ │ / cron │ │(scanner) │ │ (SQLite) │ │(reads DB, │ │
│ └─────────┘ └────┬─────┘ └──────────┘ │ fixes) │ │
│ │ └─────┬─────┘ │
│ ntfy push git push │
│ │ gh pr create │
│ ▼ │ │
│ ┌──────────┐ │ │
│ │ phone │ │ │
│ └──────────┘ │ │
└────────────────────────────────────────────────────────┼────────┘
│
mutates GitHub state (PRs, merges)
│
▼
┌──────────────────────────────────────┐
│ GitHub Actions (automated responders)│
│ │
│ workflow-failure-issue.yml │
│ → auto-files issue on failure │
│ │
│ ai-fix.yml │
│ → auto-dispatches fix on label │
└──────────────────────────────────────┘
Data flow boundary: GitHub Actions write to GitHub (issues, labels). The local scanner reads from GitHub and writes to SQLite. The AI agent reads SQLite and writes to GitHub. No component writes directly to another’s state store.
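The scanner’s read-GitHub, write-SQLite step can be sketched as a `gh issue list` query piped through a small SQL generator into `sqlite3`. This assumes a hypothetical two-table schema (`issues` with a `fix_attempts` counter, plus `cycles`); the real schema is documented in `examples/sqlite-state.md`, and `emit_scan_sql` and `scan_once` are illustrative names. The upsert leaves `fix_attempts` untouched when an issue is seen again, and each run appends one `cycles` row for the audit trail.

```bash
#!/usr/bin/env bash
# Sketch only: schema and names are hypothetical, not hive's worker.sh.
# The upsert syntax needs SQLite 3.24 or newer.

SCHEMA='CREATE TABLE IF NOT EXISTS issues (
  number       INTEGER PRIMARY KEY,
  title        TEXT NOT NULL,
  fix_attempts INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS cycles (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  scanned_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  issues_seen INTEGER NOT NULL
);'

# Wrap a value in single quotes, doubling embedded single quotes (SQL escaping).
sql_quote() {
  local q="'"
  printf "'%s'" "${1//$q/$q$q}"
}

# Read "number<TAB>title" lines on stdin and print one SQL transaction:
# an upsert per issue (fix_attempts left alone) plus one cycles row.
emit_scan_sql() {
  local number title count=0
  echo "BEGIN;"
  while IFS=$'\t' read -r number title; do
    # Skip anything that is not a plain issue number.
    [[ "$number" =~ ^[0-9]+$ ]] || continue
    echo "INSERT INTO issues (number, title) VALUES ($number, $(sql_quote "$title")) ON CONFLICT(number) DO UPDATE SET title = excluded.title;"
    count=$((count + 1))
  done
  echo "INSERT INTO cycles (issues_seen) VALUES ($count);"
  echo "COMMIT;"
}

# One scan cycle; needs an authenticated gh and sqlite3 on PATH.
scan_once() {
  local db="$1" rows
  sqlite3 "$db" "$SCHEMA" || return 1
  # --limit raises gh's default of 30 results; @tsv keeps one issue per line.
  rows=$(gh issue list --state open --limit 500 --json number,title \
    --jq '.[] | [(.number | tostring), .title] | @tsv') || return 1
  printf '%s\n' "$rows" | emit_scan_sql | sqlite3 "$db"
}
```

Capturing the `gh` output before writing means a failed GitHub query aborts the cycle instead of recording an empty scan.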
- `examples/worker.sh.example`: the scanner script
- `examples/sqlite-state.md`: SQLite schema and query patterns
- `examples/kubestellar-fixer.md`: full case study with results
- `launchd/`: macOS plist templates for the scanner and supervisor