Autonomous Agent
Nightshift
An autonomous engineering system with two shipped loops and a self-maintaining control plane.
Run it before bed. Wake up to a reviewed worktree, a shift log, and a machine-readable record of what the agent actually did. Loop 1 hardens your codebase. Loop 2 builds new features from natural-language specs. One unified daemon auto-selects from six operators — Builder, Reviewer, Overseer, Strategist, Achiever, and a Security Checker preflight — and evaluates itself against a real target repo between sessions.
Install
One-liner
$ curl -sL https://raw.githubusercontent.com/Recusive/Nightshift/main/nightshift/scripts/install.sh | bash
Installs to both ~/.codex/skills/nightshift and ~/.claude/skills/nightshift.
Add runtime artifacts to .gitignore
Runtime/Nightshift/worktree-*/
Runtime/Nightshift/*.runner.log
Runtime/Nightshift/*.state.json
How it works
A Python orchestrator runs the agent in cycles. Each cycle: read repo instructions, find improvements, fix or log them, verify results, record outcomes. The runner enforces policy — the agent provides intelligence.
Discover
Agent reads repo instructions and shift log, picks an area and strategy.
Fix or Log
Small fixes get committed. Larger issues get logged for human review.
Verify
Runner runs the verify command. Failed verification reverts the cycle.
Record
Shift log and state.json updated with what changed and why.
Evaluate
Runner checks guard rails — file counts, fix quality, halt conditions.
Repeat
Next cycle reads the updated log and picks different files and strategies.
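The cycle above can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not Nightshift's cycle.py. `CycleResult`, `run_shift`, and the `run_cycle` callback are hypothetical names. The two halt limits borrow the `stop_after_failed_verifications` and `stop_after_empty_cycles` keys from the configuration file, and the sketch assumes both count consecutive cycles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CycleResult:
    """Hypothetical per-cycle outcome reported back to the runner."""
    verified: bool  # did the repo's verify command pass?
    fixes: int      # number of fixes committed this cycle


def run_shift(
    run_cycle: Callable[[int], CycleResult],
    hours: int = 8,
    cycle_minutes: int = 30,
    stop_after_failed_verifications: int = 2,
    stop_after_empty_cycles: int = 2,
) -> str:
    """Run cycles until the time budget is spent or a halt condition fires.

    Assumes both halt limits count *consecutive* cycles. Returns a halt reason.
    """
    max_cycles = hours * 60 // cycle_minutes  # 8 h of 30-minute cycles -> 16 cycles
    failed_streak = empty_streak = 0
    for cycle in range(1, max_cycles + 1):
        result = run_cycle(cycle)  # discover, fix or log, verify, record
        failed_streak = 0 if result.verified else failed_streak + 1
        empty_streak = 0 if result.fixes else empty_streak + 1
        if failed_streak >= stop_after_failed_verifications:
            return f"halted: {failed_streak} failed verifications in a row"
        if empty_streak >= stop_after_empty_cycles:
            return f"halted: {empty_streak} empty cycles in a row"
    return f"completed {max_cycles} cycles"
```

With the defaults, 8 hours of 30-minute cycles gives at most 16 cycles. Under the consecutive-count assumption, an agent that finds nothing twice in a row ends the shift early.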
Pluggable agents
Same runner, same verification, same policy enforcement. The only difference is which CLI the adapter calls.
Codex
OpenAI Codex CLI. Uses the codex adapter to construct commands and parse structured output.
$ python3 -m nightshift run --agent codex
Claude
Claude Code CLI. Uses the claude adapter with the same prompt, verification, and policy engine.
$ python3 -m nightshift run --agent claude
Usage
Overnight run
$ python3 -m nightshift run --agent claude
$ python3 -m nightshift run --agent codex
$ python3 -m nightshift test --agent claude --cycles 2
$ python3 -m nightshift summarize
$ python3 -m nightshift verify-cycle --worktree-dir PATH --pre-head HASH
Default: 8 hours, 30-minute cycles. If --agent is omitted, Nightshift prompts you to choose an agent.
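The `verify-cycle` command checks a single cycle against the commit it started from. The sketch below shows the verify-then-revert idea under two assumptions: the verify command is a shell string run inside the worktree, and reverting means `git reset --hard` to the pre-cycle HEAD. `verify_cycle` and its `revert` hook are hypothetical names, not Nightshift's API.

```python
import subprocess
from typing import Callable


def git_reset_hard(worktree: str, pre_head: str) -> None:
    """Discard everything the cycle committed or changed in the worktree."""
    subprocess.run(["git", "-C", worktree, "reset", "--hard", pre_head], check=True)


def verify_cycle(
    worktree: str,
    verify_command: str,
    pre_head: str,
    revert: Callable[[str, str], None] = git_reset_hard,
) -> bool:
    """Run the repo's verify command in the worktree; revert the cycle if it fails."""
    proc = subprocess.run(verify_command, shell=True, cwd=worktree)
    if proc.returncode == 0:
        return True
    revert(worktree, pre_head)  # failed verification -> roll back to the pre-cycle HEAD
    return False
```

Injecting `revert` keeps the policy decision (pass or roll back) separate from the git mechanics, so the decision can be exercised without a real repository.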
From the installed skill bundle
$ ~/.codex/skills/nightshift/nightshift/scripts/run.sh --agent claude
$ ~/.codex/skills/nightshift/nightshift/scripts/test.sh --agent claude --cycles 2 --cycle-minutes 5
Multi-repo run
$ python3 -m nightshift multi /repo1 /repo2 --agent claude
$ python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1
Validates all repos up front, runs a full shift on each one sequentially, and prints an aggregate summary.
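The multi-repo flow has three phases: validate everything first, run the shifts one after another, then aggregate the results. `validate_repos`, `run_multi`, and the `run_shift` callback (assumed here to return a fix count) are hypothetical. The sketch also assumes that upfront validation only checks that each path is a directory containing `.git`.

```python
import os
from typing import Callable, Dict, List


def validate_repos(paths: List[str]) -> List[str]:
    """Return a list of problems; an empty list means every path looks like a git repo."""
    problems = []
    for path in paths:
        if not os.path.isdir(path):
            problems.append(f"{path}: not a directory")
        elif not os.path.exists(os.path.join(path, ".git")):  # .git may be a dir or a file
            problems.append(f"{path}: not a git repository")
    return problems


def run_multi(paths: List[str], run_shift: Callable[[str], int]) -> Dict[str, int]:
    """Validate all repos before starting, then run one shift per repo sequentially."""
    problems = validate_repos(paths)
    if problems:
        # Fail fast: no shift starts if any repo is invalid.
        raise ValueError("; ".join(problems))
    results = {path: run_shift(path) for path in paths}  # sequential, in the order given
    print(f"{len(results)} repos, {sum(results.values())} fixes total")  # aggregate summary
    return results
```

Validating first means a typo in the last path fails immediately, not after hours of shifts on the earlier repos.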
Feature Builder (Loop 2)
$ python3 -m nightshift plan "feature description"
$ python3 -m nightshift build "feature description" --yes
$ python3 -m nightshift build --status
$ python3 -m nightshift build --resume
$ python3 -m nightshift module-map --write
Daemon modes
$ make daemon
$ make tasks
$ make check
$ make test
$ make dry-run
$ make quick-test
$ make clean
Daemon auto-start, task queue summary, local CI gate, full test suite, dry-run preview, quick validation, and cleanup.
Runner-enforced guard rails
Not prompt discipline — real enforcement. Seven verification stages run after every cycle. Failures revert or halt the shift.
Commit + shift log update included?
Touched blocked files or lockfiles? Instant rejection
Repo verification command pass?
Deleted files? Zero tolerance
Balanced across categories and paths?
Exploring different codebase areas?
Prompt or control-file modifications? Flagged explicitly
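Several of these checks reduce to inspecting the list of paths a cycle touched. The sketch below makes three assumptions: `blocked_paths` entries are directory prefixes, `blocked_globs` are matched against the file name with `fnmatch`, and changes arrive as git-style status letters. `check_cycle` and `ChangedFile` are hypothetical names, and the blocked lists are trimmed copies of the `.nightshift.json` defaults in the Configuration section.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import List, NamedTuple


class ChangedFile(NamedTuple):
    status: str  # git-style status letter: "A" added, "M" modified, "D" deleted
    path: str    # repo-relative POSIX path


BLOCKED_PATHS = [".github/", "deploy/", "infra/", "vendor/"]
BLOCKED_GLOBS = ["*.lock", "package-lock.json", "yarn.lock"]


def check_cycle(changes: List[ChangedFile], max_files_per_cycle: int = 12) -> List[str]:
    """Return violations; an empty list means the cycle passes these checks."""
    violations = []
    for change in changes:
        name = PurePosixPath(change.path).name
        if change.status == "D":
            violations.append(f"deleted file: {change.path}")  # zero tolerance
        if any(change.path.startswith(prefix) for prefix in BLOCKED_PATHS):
            violations.append(f"blocked path: {change.path}")
        if any(fnmatch(name, pattern) for pattern in BLOCKED_GLOBS):
            violations.append(f"blocked file: {change.path}")
    if len(changes) > max_files_per_cycle:
        violations.append(f"{len(changes)} files touched (limit {max_files_per_cycle})")
    return violations
```

Because the runner computes these violations from the diff, the checks hold whatever the agent was told in its prompt.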
Configuration
Drop a .nightshift.json in your repo root to override defaults. If verify_command is omitted, Nightshift infers one from package.json, Cargo.toml, go.mod, or pyproject.toml.
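Inference can be pictured as checking for manifest files in a fixed order. The manifest order in this sketch follows the sentence above. The command mapped to each manifest is an assumption for illustration, not necessarily what Nightshift picks, and `infer_verify_command` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Assumed manifest -> command mapping; Nightshift's real choices may differ.
MANIFEST_COMMANDS = [
    ("package.json", "npm test"),
    ("Cargo.toml", "cargo test"),
    ("go.mod", "go test ./..."),
    ("pyproject.toml", "python -m pytest"),
]


def infer_verify_command(repo_root: str, configured: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit verify_command; otherwise guess from the first manifest found."""
    if configured:
        return configured
    root = Path(repo_root)
    for manifest, command in MANIFEST_COMMANDS:
        if (root / manifest).is_file():
            return command
    return None  # nothing recognizable: no verify command inferred
```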
.nightshift.json
{
"agent": "claude",
"hours": 8,
"cycle_minutes": 30,
"verify_command": null,
"blocked_paths": [".github/", "deploy/", "deployment/", "infra/", "k8s/", "ops/", "terraform/", "vendor/"],
"blocked_globs": ["*.lock", "package-lock.json", "pnpm-lock.yaml", "yarn.lock", "bun.lockb", "Cargo.lock"],
"max_fixes_per_cycle": 3,
"max_files_per_fix": 5,
"max_files_per_cycle": 12,
"max_low_impact_fixes_per_shift": 4,
"stop_after_failed_verifications": 2,
"stop_after_empty_cycles": 2,
"score_threshold": 3,
"test_incentive_cycle": 3,
"backend_forcing_cycle": 3,
"category_balancing_cycle": 3,
"claude_model": "claude-opus-4-6",
"claude_effort": "max",
"codex_model": "gpt-5.4",
"codex_thinking": "extra_high",
"notification_webhook": null,
"readiness_checks": ["secrets", "debug_prints", "test_coverage"],
"eval_frequency": 5,
"eval_target_repo": "https://github.com/fazxes/Phractal"
}
7 discovery strategies
The agent rotates through these strategies every 30-45 minutes for breadth. The shift log should read like a senior engineer explored the whole codebase, not like a linter ran on one folder.
Security
Hardcoded secrets, injection vectors, unsafe eval, path traversal, overly broad permissions
Error Resilience
Missing error boundaries, unhandled promises, API calls without try/catch, crash paths under edge cases
Test Coverage
Critical paths flying blind — business logic without tests, happy-path-only coverage, untested utilities
Accessibility
Missing aria-labels, broken keyboard navigation, focus traps, forms without labels, color-only information
Code Hygiene
Aged TODOs, debug logging in production, dead exports, convention violations, type safety gaps
Performance
Memory leaks, missing lazy loading, N+1 queries, unbounded re-renders, event listeners never cleaned up
Production Polish
Missing loading states, blank empty states, unhelpful error messages, broken responsive layouts
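One simple way to get that breadth is to always pick the strategy used least recently. The sketch below shows that idea only. The rotation rule and `next_strategy` are assumptions, not a description of Nightshift's actual anti-tunnel-vision steering.

```python
from typing import List

STRATEGIES = [
    "Security", "Error Resilience", "Test Coverage", "Accessibility",
    "Code Hygiene", "Performance", "Production Polish",
]


def next_strategy(history: List[str]) -> str:
    """Pick the least recently used strategy; never-used ones win, in list order."""
    def last_used(strategy: str) -> int:
        # Index of the most recent use in history, or -1 if it was never used.
        for i in range(len(history) - 1, -1, -1):
            if history[i] == strategy:
                return i
        return -1

    # min() keeps the first of tied candidates, so ties resolve in STRATEGIES order.
    return min(STRATEGIES, key=last_used)
```

Over seven consecutive picks this visits every strategy once before repeating any.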
What you wake up to
Four artifacts — all in an isolated git worktree so your working directory is untouched.
Human-readable shift log — executive summary, numbered fixes with reasoning, logged issues that exceeded autonomous scope, and recommendations.
Machine-readable cycle state — cycle counts, categories touched, files changed, verification status, and halt reasons. Enables quick auditing.
Raw runner output — every orchestrator decision, verification result, and policy check logged for debugging.
Isolated review branch with atomic, prefixed commits. Cherry-pick individual fixes or merge the whole branch.
Morning review
Read the shift log, check the state file, review the branch, merge what you want.
$ cat Runtime/Nightshift/2026-04-03.md
$ cat Runtime/Nightshift/2026-04-03.state.json
$ git log nightshift/2026-04-03 --oneline
# Merge and clean up
$ git merge nightshift/2026-04-03
$ git worktree remove Runtime/Nightshift/worktree-2026-04-03
$ git branch -d nightshift/2026-04-03
Architecture
The orchestrator is a Python package with strict typing, pluggable agent adapters, and a real control plane — not just a prompt file.
types.py, constants.py, errors.py, shell.py, state.py
config.py, eval_targets.py
cycle.py, scoring.py, readiness.py
profiler.py, planner.py, decomposer.py, subagent.py, coordination.py, integrator.py, e2e.py, summary.py, feature.py
worktree.py, multi.py, module_map.py
nightshift.schema.json, feature.schema.json, task.schema.json
cli.py, __init__.py, __main__.py
.nightshift.json: optional per-repo configuration override
What makes it different
Real control plane, not prompt discipline
The original version relied on prompts to enforce safety. This version has a Python orchestrator with typed configs, verification gates, cycle state tracking, and halt conditions. The runner enforces policy — the agent provides intelligence.
Pluggable agents
Codex and Claude run through the same runner, same verification, same policy. The only difference is the CLI adapter. Adding a new agent means writing one adapter module.
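In this design, an adapter turns a prompt into a CLI invocation and turns the CLI's output back into something the runner understands. The sketch below expresses that contract as a `typing.Protocol`. The `AgentAdapter`, `build_command`, `parse_output`, and `get_adapter` names are hypothetical. The toy adapter uses `echo` as a placeholder rather than guessing at real Codex or Claude Code flags.

```python
import json
from typing import Dict, List, Protocol


class AgentAdapter(Protocol):
    """What the runner needs from any agent CLI (hypothetical interface)."""
    name: str

    def build_command(self, prompt: str) -> List[str]: ...

    def parse_output(self, stdout: str) -> Dict[str, object]: ...


class EchoAdapter:
    """Toy adapter for a CLI that prints one JSON object on stdout."""
    name = "echo"

    def build_command(self, prompt: str) -> List[str]:
        # A real adapter would add the agent CLI's own model/effort/output flags here.
        return ["echo", prompt]

    def parse_output(self, stdout: str) -> Dict[str, object]:
        return json.loads(stdout)


ADAPTERS: Dict[str, AgentAdapter] = {"echo": EchoAdapter()}


def get_adapter(name: str) -> AgentAdapter:
    """Look up an adapter by the value passed to --agent."""
    try:
        return ADAPTERS[name]
    except KeyError:
        raise ValueError(f"unknown agent {name!r}; choose from {sorted(ADAPTERS)}") from None
```

Under this shape, adding an agent means writing one class and registering it. The verification and policy code never sees which CLI ran.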
Machine-readable output
The state.json file records every cycle — what was fixed, what categories were covered, verification results, and why the shift ended. You can audit a shift programmatically, not just by reading Markdown.
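For example, a short script can answer "did every cycle verify, and why did the shift stop?" The field names used here (`cycles`, `verified`, `files_changed`, `halt_reason`) are assumptions about the schema. Check a real `.state.json` file before relying on them. `audit_state` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def audit_state(path: str) -> Dict[str, Any]:
    """Summarize a shift's state file (field names are assumed, not Nightshift's schema)."""
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    cycles = state.get("cycles", [])
    return {
        "cycles": len(cycles),
        "failed_verifications": sum(1 for c in cycles if not c.get("verified", False)),
        "files_changed": sorted({f for c in cycles for f in c.get("files_changed", [])}),
        "halt_reason": state.get("halt_reason"),
    }
```

A call such as `audit_state("Runtime/Nightshift/2026-04-03.state.json")` would return the summary dictionary, if the assumed fields match.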
Preflight pentest
The builder starts each session with a red-team security-check preflight. Severity-classified pentest reports surface exploit paths and brittle automation edges before the fixer writes any code.
Self-maintaining control plane
Six autonomous operators — Builder, Reviewer, Overseer, Strategist, Achiever, and Security Checker — maintain task queues, documentation, learnings databases, and cost tracking between sessions. The system manages its own operations.
Session memory
Handoffs (.recursive/handoffs/) carry context between sessions. Learnings (.recursive/learnings/) accumulate 90+ hard-won patterns — "mypy rejects .get() on required TypedDict fields", "sessions die at 500 max turns without warning."
Feature Builder (Loop 2)
Beyond hardening — plan, decompose, build, and test new features overnight. Profiles the target repo, decomposes work into waves, coordinates sub-agents, and maintains build state for resumable workflows.
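Decomposing work into waves can be modeled as topological layering. Every task whose dependencies are finished joins the next wave, so tasks within a wave can run in parallel. The sketch below computes waves with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names are made up, `plan_waves` is a hypothetical helper, and this is one way to compute waves, not necessarily how Nightshift's planner does it.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; each task lands in the first wave after all its dependencies."""
    sorter = TopologicalSorter(deps)  # maps each task to the tasks it depends on
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # treat the whole wave as finished before the next one
    return waves


waves = plan_waves({
    "schema": set(),
    "docs": set(),
    "api": {"schema"},
    "ui": {"api"},
    "tests": {"api", "ui"},
})
print(waves)  # [['docs', 'schema'], ['api'], ['ui'], ['tests']]
```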
Security hardening
After-task injection protection via environment variables, PR title sanitization against adversarial input, XML boundary escaping for pentest reports, and a watchdog service with rate-limited auto-restart.
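Two of these defenses are easy to picture. The sketch below makes two assumptions. PR titles are cleaned by replacing control characters, collapsing whitespace, and capping the length. Untrusted report text is escaped with `xml.sax.saxutils.escape` before it goes between XML boundary tags. The function names, the 72-character cap, and the `<pentest_report>` tag are hypothetical.

```python
import re
from xml.sax.saxutils import escape

MAX_TITLE_LEN = 72  # assumed cap, not a Nightshift setting


def sanitize_pr_title(title: str) -> str:
    """Replace control characters, collapse whitespace, and cap the length."""
    title = re.sub(r"[\x00-\x1f\x7f]", " ", title)  # control chars, including \n, \r, \t
    title = re.sub(r"\s+", " ", title).strip()
    return title[:MAX_TITLE_LEN]


def wrap_report(report: str) -> str:
    """Escape &, <, > so report text cannot close or forge the boundary tag."""
    return f"<pentest_report>\n{escape(report)}\n</pentest_report>"
```

After escaping, a report containing `</pentest_report>` can no longer end the boundary early, so everything between the tags stays data.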
Requirements
Roadmap
Pluggable agent adapters (Codex, Claude)
Runner-enforced guard rails
Diff scoring (1-10) via Scoring Engine v2
Anti-tunnel-vision steering
Multi-repo support
6 operators with signal-driven role selection
Handoffs + learnings (session memory)
Self-evaluation + self-maintaining control plane
Loop 1: Hardening
Loop 2: Feature Builder pipeline
Feature CLI (plan, build, status, resume)
Prompt injection protection
Self-modification guard + snapshot recovery
Cost tracking + budget limits
Configurable models + effort per agent
Interactive daemon setup
Log rotation and cleanup
Watchdog service with rate-limited auto-restart
Security hardening (injection, sanitization, escaping)
5-agent sub-agent review pipeline
GitHub Issues auto-sync to task queue
Autonomy measurement and dependency elimination
Real-repo evaluation fidelity on rejected runs
Automated release tagging and changelog updates
Budget limiter triple-failure fix
Task queue hygiene and session-index fidelity
Monitoring and alerting integrations
847 tests passing
155+ PRs merged
92% vision complete
30 modules
Open source. MIT license.
847 tests, 155+ merged PRs, 30 modules. An engineering system, not a script. Install in 30 seconds and let it run tonight.