Autonomous Agent

Nightshift

Two products. One repo. An autonomous engineering product with two shipped loops and a portable orchestration framework for any codebase.

Run it before bed. Wake up to a reviewed worktree, a shift log, and a machine-readable record of what the agent actually did. Owl (Loop 1) hardens your codebase. Raven (Loop 2) builds new features from natural-language specs. The Recursive framework underneath auto-selects from six operators — Builder, Reviewer, Overseer, Strategist, Achiever, and a Security Checker preflight — and evaluates itself against a real target repo between sessions.

Codex | Claude Code

Install

One-liner

$ curl -sL https://raw.githubusercontent.com/Recusive/Nightshift/main/nightshift/scripts/install.sh | bash

Installs to both ~/.codex/skills/nightshift and ~/.claude/skills/nightshift

Add runtime artifacts to .gitignore

Runtime/Nightshift/worktree-*/
Runtime/Nightshift/*.runner.log
Runtime/Nightshift/*.state.json

How it works

A Python orchestrator runs the agent in cycles. Each cycle: read repo instructions, find improvements, fix or log them, verify results, record outcomes. The runner enforces policy — the agent provides intelligence.

Discover

Agent reads repo instructions and shift log, picks an area and strategy.

Fix or Log

Small fixes get committed. Larger issues get logged for human review.

Verify

Runner runs the verify command. Failed verification reverts the cycle.

Record

Shift log and state.json updated with what changed and why.

Evaluate

Runner checks guard rails — file counts, fix quality, halt conditions.

Repeat

Next cycle reads the updated log and picks different files and strategies.
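The cycle above can be sketched as a small control loop. This is a minimal illustration, not Nightshift's actual runner: `run_shift`, `CycleResult`, `ShiftState`, and the callback parameters are hypothetical names, and the halt rule simply mirrors the idea of stopping after repeated failed verifications.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CycleResult:
    """Outcome of one cycle (hypothetical shape, not the real state schema)."""
    cycle: int
    verified: bool
    reverted: bool


@dataclass
class ShiftState:
    results: List[CycleResult] = field(default_factory=list)
    halt_reason: str = ""


def run_shift(
    run_agent: Callable[[int], None],
    verify: Callable[[], bool],
    revert: Callable[[], None],
    max_cycles: int,
    stop_after_failed_verifications: int = 2,
) -> ShiftState:
    """Run cycles until the budget is spent or verification fails too often."""
    state = ShiftState()
    consecutive_failures = 0
    for cycle in range(1, max_cycles + 1):
        run_agent(cycle)  # Discover, then Fix or Log: the agent's job
        ok = verify()     # Verify: the runner's job
        if not ok:
            revert()      # a failed verification reverts the cycle
        state.results.append(CycleResult(cycle, ok, not ok))  # Record
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= stop_after_failed_verifications:  # Evaluate
            state.halt_reason = "too many failed verifications"
            break
    return state
```

Passing the agent, verifier, and revert step in as callables keeps the policy in the loop and the intelligence outside it, which is the split the section describes.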

Pluggable agents

Same runner, same verification, same policy enforcement. The only difference is which CLI the adapter calls.

Codex

OpenAI Codex CLI. Uses the codex adapter to construct commands and parse structured output.

$ python3 -m nightshift run --agent codex

Claude

Claude Code CLI. Uses the claude adapter with the same prompt, verification, and policy engine.

$ python3 -m nightshift run --agent claude
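One way to picture the adapter boundary is a small interface plus a registry keyed by the --agent value. The sketch below is an assumption about the shape, not the project's real adapter API: `AgentAdapter`, `CatAdapter`, `ADAPTERS`, and `get_adapter` are hypothetical, and the stand-in adapter wraps the POSIX `cat` utility so that no Codex or Claude CLI flags are guessed at.

```python
from typing import Dict, List, Protocol


class AgentAdapter(Protocol):
    """Hypothetical adapter interface; the real Nightshift adapters may differ."""
    name: str

    def build_command(self, prompt_file: str) -> List[str]: ...

    def parse_output(self, raw: str) -> Dict[str, str]: ...


class CatAdapter:
    """Stand-in adapter that wraps `cat` instead of a real agent CLI."""
    name = "cat"

    def build_command(self, prompt_file: str) -> List[str]:
        return ["cat", prompt_file]

    def parse_output(self, raw: str) -> Dict[str, str]:
        return {"agent": self.name, "output": raw.strip()}


# Registry keyed by the value passed to --agent (illustrative only).
ADAPTERS: Dict[str, AgentAdapter] = {"cat": CatAdapter()}


def get_adapter(agent: str) -> AgentAdapter:
    """Look up an adapter by name and fail loudly on an unknown one."""
    try:
        return ADAPTERS[agent]
    except KeyError:
        raise ValueError(
            f"unknown agent {agent!r}; choose from {sorted(ADAPTERS)}"
        ) from None
```

Under this shape, supporting another agent would mean adding one class and one registry entry, while the runner and verification code stay unchanged.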

Usage

Overnight run

$ python3 -m nightshift run --agent claude

$ python3 -m nightshift run --agent codex

$ python3 -m nightshift test --agent claude --cycles 2

$ python3 -m nightshift summarize

$ python3 -m nightshift verify-cycle --worktree-dir PATH --pre-head HASH

Defaults: 8 hours, 30-minute cycles. If --agent is omitted, the CLI prompts for a selection.

From the installed skill bundle

$ ~/.codex/skills/nightshift/nightshift/scripts/run.sh --agent claude

$ ~/.codex/skills/nightshift/nightshift/scripts/test.sh --agent claude --cycles 2 --cycle-minutes 5

Multi-repo run

$ python3 -m nightshift multi /repo1 /repo2 --agent claude

$ python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1

Validates all repos upfront, runs a full shift on each sequentially, prints aggregate summary.
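The validate-first, run-sequentially behavior could look roughly like the sketch below. It assumes "validation" means checking that each path contains a .git entry, which the page does not specify; `run_multi` and its `run_shift` callback are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_multi(repos: List[Path], run_shift: Callable[[Path], bool]) -> Dict[str, bool]:
    """Validate every repo first, then run shifts one at a time and collect results."""
    missing = [str(r) for r in repos if not (r / ".git").exists()]
    if missing:
        # Fail before any shift starts so a bad path does not waste a night.
        raise ValueError(f"not a git repository: {', '.join(missing)}")
    # Dicts preserve insertion order, so the summary follows the command-line order.
    return {str(repo): run_shift(repo) for repo in repos}
```

Checking all paths before starting the first shift means a typo in the last argument surfaces immediately rather than hours into the run.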

Feature Builder (Loop 2)

$ python3 -m nightshift plan "feature description"

$ python3 -m nightshift build "feature description" --yes

$ python3 -m nightshift build --status

$ python3 -m nightshift build --resume

$ python3 -m nightshift module-map --write

Daemon modes

$ make daemon

$ make tasks

$ make check

$ make test

$ make dry-run

$ make quick-test

$ make clean

Daemon auto-start, task queue summary, local CI gate, full test suite, dry-run preview, quick validation, and cleanup.

Runner-enforced guard rails

Not prompt discipline — real enforcement. Eight verification stages run after every cycle. Failures revert or halt the shift.

Commit + shift log update included?

Touched blocked files or lockfiles? Instant rejection

Repo verification command pass?

Deleted files? Zero tolerance

Balanced across categories and paths?

Exploring different codebase areas?

Prompt or control-file modifications? Flagged explicitly

Circuit breaker: stops after 3 consecutive failures
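A few of these checks are easy to picture in code. The sketch below covers only blocked paths, blocked globs, deleted files, and a per-cycle file cap; `check_changed_files` is a hypothetical helper, the blocked lists are abbreviated from the default configuration, and the real runner's stages may work differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List, Sequence

# Abbreviated from the default configuration for illustration.
BLOCKED_PATHS = (".github/", "deploy/", "vendor/")
BLOCKED_GLOBS = ("*.lock", "package-lock.json", "Cargo.lock")


def check_changed_files(
    changed: Sequence[str],
    deleted: Sequence[str] = (),
    max_files_per_cycle: int = 12,
) -> List[str]:
    """Return a list of violations; an empty list means these checks pass."""
    violations: List[str] = []
    for path in changed:
        if any(path.startswith(prefix) for prefix in BLOCKED_PATHS):
            violations.append(f"blocked path: {path}")
        elif any(fnmatchcase(PurePosixPath(path).name, g) for g in BLOCKED_GLOBS):
            violations.append(f"blocked file: {path}")
    for path in deleted:
        # Zero tolerance: any deletion is a violation.
        violations.append(f"deleted file: {path}")
    if len(changed) > max_files_per_cycle:
        violations.append(f"too many files: {len(changed)} > {max_files_per_cycle}")
    return violations
```

Because the function inspects the resulting file list rather than the agent's stated intent, the check holds even if the agent ignores its prompt.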

Configuration

Drop a .nightshift.json in your repo root to override defaults. If verify_command is omitted, Nightshift infers one from package.json, Cargo.toml, go.mod, or pyproject.toml.

.nightshift.json

{
  "agent": "codex or claude",
  "hours": 8,
  "cycle_minutes": 30,
  "verify_command": null,
  "blocked_paths": [".github/", "deploy/", "deployment/", "infra/", "k8s/", "ops/", "terraform/", "vendor/"],
  "blocked_globs": ["*.lock", "package-lock.json", "pnpm-lock.yaml", "yarn.lock", "bun.lockb", "Cargo.lock"],
  "max_fixes_per_cycle": 3,
  "max_files_per_fix": 5,
  "max_files_per_cycle": 12,
  "max_low_impact_fixes_per_shift": 4,
  "stop_after_failed_verifications": 2,
  "stop_after_empty_cycles": 2,
  "score_threshold": 3,
  "test_incentive_cycle": 3,
  "backend_forcing_cycle": 3,
  "category_balancing_cycle": 3,
  "claude_model": "claude-opus-4-6",
  "claude_effort": "max",
  "codex_model": "gpt-5.4",
  "codex_thinking": "extra_high",
  "notification_webhook": null,
  "readiness_checks": ["secrets", "debug_prints", "test_coverage"],
  "eval_frequency": 5,
  "eval_target_repo": "https://github.com/fazxes/Phractal"
}
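Loading the override file and inferring a verify command might look roughly like this. It is a sketch under assumptions: `load_config`, `infer_verify_command`, and the marker-to-command table are hypothetical, and the specific commands chosen for each marker file are guesses for illustration, not what Nightshift actually runs.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# A small subset of the defaults shown above.
DEFAULTS: Dict[str, Any] = {"hours": 8, "cycle_minutes": 30, "verify_command": None}

# Marker file -> guessed command. These commands are assumptions, not Nightshift's.
MARKERS = {
    "package.json": "npm test",
    "Cargo.toml": "cargo test",
    "go.mod": "go test ./...",
    "pyproject.toml": "python3 -m pytest",
}


def load_config(repo: Path) -> Dict[str, Any]:
    """Merge .nightshift.json (if present) over the defaults."""
    config = dict(DEFAULTS)
    path = repo / ".nightshift.json"
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def infer_verify_command(repo: Path, config: Dict[str, Any]) -> Optional[str]:
    """Use the configured command, else guess from the first marker file found."""
    if config.get("verify_command"):
        return config["verify_command"]
    for marker, command in MARKERS.items():
        if (repo / marker).is_file():
            return command
    return None
```

An explicit verify_command always wins in this sketch, so inference only applies when the key is null or missing.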

7 discovery strategies

The agent rotates through these strategies every 30-45 minutes for breadth. The shift log should read like a senior engineer explored the whole codebase, not like a linter ran on one folder.

Security

Hardcoded secrets, injection vectors, unsafe eval, path traversal, overly broad permissions

Error Resilience

Missing error boundaries, unhandled promises, API calls without try/catch, crash paths under edge cases

Test Coverage

Critical paths flying blind — business logic without tests, happy-path-only coverage, untested utilities

Accessibility

Missing aria-labels, broken keyboard navigation, focus traps, forms without labels, color-only information

Code Hygiene

Aged TODOs, debug logging in production, dead exports, convention violations, type safety gaps

Performance

Memory leaks, missing lazy loading, N+1 queries, unbounded re-renders, event listeners never cleaned up

Production Polish

Missing loading states, blank empty states, unhelpful error messages, broken responsive layouts
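A simple way to get that breadth is to pick whichever strategy has been used least so far. The sketch below is one possible rotation rule, not the one Nightshift uses: `next_strategy` and the snake_case strategy keys are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Snake_case keys for the seven strategies above (naming is an assumption).
STRATEGIES = (
    "security",
    "error_resilience",
    "test_coverage",
    "accessibility",
    "code_hygiene",
    "performance",
    "production_polish",
)


def next_strategy(history: Iterable[str]) -> str:
    """Pick the least-used strategy so far; ties go to the earliest in STRATEGIES."""
    used = Counter(history)
    return min(STRATEGIES, key=lambda s: used[s])
```

Feeding the function the strategies recorded in the shift log would steer each new cycle toward an area that has had the least attention.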

What you wake up to

Four artifacts — all in an isolated git worktree so your working directory is untouched.

Runtime/Nightshift/YYYY-MM-DD.md

Human-readable shift log — executive summary, numbered fixes with reasoning, logged issues that exceeded autonomous scope, and recommendations.

Runtime/Nightshift/YYYY-MM-DD.state.json

Machine-readable cycle state — cycle counts, categories touched, files changed, verification status, and halt reasons. Enables quick auditing.

Runtime/Nightshift/YYYY-MM-DD.runner.log

Raw runner output — every orchestrator decision, verification result, and policy check logged for debugging.

nightshift/YYYY-MM-DD

Isolated review branch with atomic, prefixed commits. Cherry-pick individual fixes or merge the whole branch.
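Because the state file is JSON, a shift can be audited with a few lines of Python. The sketch below uses an invented sample document: the field names (`cycles`, `category`, `files_changed`, `verified`, `halt_reason`) are assumptions for illustration, so check them against a real state.json before relying on this.

```python
import json
from collections import Counter
from typing import Any, Dict

# Hypothetical state.json content; the real field names may differ.
SAMPLE_STATE = """
{
  "cycles": [
    {"category": "security", "files_changed": 3, "verified": true},
    {"category": "test_coverage", "files_changed": 2, "verified": false},
    {"category": "security", "files_changed": 1, "verified": true}
  ],
  "halt_reason": "time budget exhausted"
}
"""


def summarize_state(raw: str) -> Dict[str, Any]:
    """Condense a state file into the numbers worth checking first."""
    state = json.loads(raw)
    cycles = state.get("cycles", [])
    return {
        "cycles": len(cycles),
        "verified": sum(1 for c in cycles if c.get("verified")),
        "files_changed": sum(c.get("files_changed", 0) for c in cycles),
        "categories": dict(Counter(c.get("category", "unknown") for c in cycles)),
        "halt_reason": state.get("halt_reason"),
    }


summary = summarize_state(SAMPLE_STATE)
print(summary)
```

The same function could be pointed at the text of Runtime/Nightshift/YYYY-MM-DD.state.json once the keys are adjusted to the actual schema.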

Morning review

Read the shift log, check the state file, review the branch, merge what you want.

$ cat Runtime/Nightshift/2026-04-03.md

$ cat Runtime/Nightshift/2026-04-03.state.json

$ git log nightshift/2026-04-03 --oneline

# Merge and clean up

$ git merge nightshift/2026-04-03

$ git worktree remove Runtime/Nightshift/worktree-2026-04-03

$ git branch -d nightshift/2026-04-03

Architecture

Nightshift is a Python package with strict typing and pluggable agent adapters. Recursive is the orchestration framework underneath — daemon loop, role selection, operator prompts, sub-agent reviews, and session memory.

core/

types.py, constants.py, errors.py, shell.py, state.py

settings/

config.py, eval_targets.py

owl/ (Loop 1)

cycle.py, scoring.py, readiness.py

raven/ (Loop 2)

profiler.py, planner.py, decomposer.py, subagent.py, coordination.py, integrator.py, e2e.py, summary.py, feature.py

infra/

worktree.py, multi.py, module_map.py

schemas/

nightshift.schema.json, feature.schema.json, task.schema.json

CLI surface

cli.py, __init__.py, __main__.py

.nightshift.json

Optional per-repo configuration override

.recursive/engine/

daemon.sh, lib-agent.sh, pick-role.py, watchdog.sh, format-stream.py

.recursive/operators/

build/, review/, oversee/, strategize/, achieve/, security-check/

.recursive/agents/

code-reviewer.md, architecture-reviewer.md, docs-reviewer.md, safety-reviewer.md, meta-reviewer.md

.recursive/lib/

cleanup.py, compact.py, config.py, costs.py, evaluation.py

Python 3.9+ | 30 modules | Git worktree isolation | Pluggable adapters | Strict typing (mypy --strict) | Ruff linting | make check gate | MIT license

What makes it different

Real control plane, not prompt discipline

The original version relied on prompts to enforce safety. This version has a Python orchestrator with typed configs, verification gates, cycle state tracking, and halt conditions. The runner enforces policy — the agent provides intelligence.

Pluggable agents

Codex and Claude run through the same runner, same verification, same policy. The only difference is the CLI adapter. Adding a new agent means writing one adapter module.

Machine-readable output

The state.json file records every cycle — what was fixed, what categories were covered, verification results, and why the shift ended. You can audit a shift programmatically, not just by reading Markdown.

Preflight pentest

The builder starts each session with a red-team security-check preflight. Severity-classified pentest reports surface exploit paths and brittle automation edges before the fixer writes any code.

Self-maintaining control plane

Six autonomous operators (Builder, Reviewer, Overseer, Strategist, Achiever, and Security Checker) maintain task queues, documentation, learnings databases, and cost tracking between sessions. A pipeline of five review sub-agents (code, architecture, docs, safety, meta) reviews every PR before merge.

Session memory

Handoffs (.recursive/handoffs/) carry context between sessions. Learnings (.recursive/learnings/) accumulate 90+ hard-won patterns — "mypy rejects .get() on required TypedDict fields", "sessions die at 500 max turns without warning."

Feature Builder (Loop 2)

Beyond hardening — plan, decompose, build, and test new features overnight. Profiles the target repo, decomposes work into waves, coordinates sub-agents, and maintains build state for resumable workflows.

Security hardening

After-task injection protection via environment variables, PR title sanitization against adversarial input, XML boundary escaping for pentest reports, and a watchdog service with rate-limited auto-restart.
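Two of these defenses, title sanitization and XML boundary escaping, can be illustrated with the standard library. This is a sketch of the general technique, not the project's code: `sanitize_pr_title`, `wrap_report`, the allowed-character set, the 72-character cap, and the `pentest_report` tag name are all assumptions.

```python
import re
from xml.sax.saxutils import escape


def sanitize_pr_title(title: str, max_length: int = 72) -> str:
    """Collapse to one line, keep a conservative character set, and truncate."""
    one_line = " ".join(title.split())
    # Drop anything outside word characters, spaces, and mild punctuation.
    cleaned = re.sub(r"[^\w \-.,:()/#]", "", one_line)
    return cleaned[:max_length].strip()


def wrap_report(report: str) -> str:
    """Escape a report so its text cannot close or open the surrounding tag."""
    return f"<pentest_report>{escape(report)}</pentest_report>"
```

For example, a title containing backticks, a dollar sign, and embedded newlines comes back as a single line with those characters removed, and a report that tries to emit a closing tag is rendered as inert text inside the wrapper.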

Requirements

Python 3.9+ | Git | GitHub CLI (gh) | Codex CLI or Claude CLI | tmux (optional)

Roadmap

Pluggable agent adapters (Codex, Claude)

Runner-enforced guard rails

Diff scoring (1-10) via Scoring Engine v2

Anti-tunnel-vision steering

Multi-repo support

6 operators with signal-driven role selection

Handoffs + learnings (session memory)

Self-evaluation + self-maintaining control plane

Loop 1: Hardening

Loop 2: Feature Builder pipeline

Feature CLI (plan, build, status, resume)

Prompt injection protection

Self-modification guard + snapshot recovery

Cost tracking + budget limits

Configurable models + effort per agent

Interactive daemon setup

Log rotation and cleanup

Watchdog service with rate-limited auto-restart

Security hardening (injection, sanitization, escaping)

5-agent sub-agent review pipeline

GitHub Issues auto-sync to task queue

Autonomy measurement and dependency elimination

Real-repo evaluation fidelity on rejected runs

Automated release tagging and changelog/tracker updates

Budget limiter triple-failure fix (daemon cost tracking)

Task queue hygiene and session-index fidelity

Monitoring and alerting integrations

847 tests passing

155+ PRs merged

92% vision complete

30 modules

Open source. MIT license.

847 tests, 155+ merged PRs, 30 modules. An engineering system, not a script. Install in 30 seconds and let it run tonight.
