motspilot Documentation

Multi-model consensus (3 AIs) + 5-phase pipeline for building features in existing codebases — safely, with human approval at every step.

Framework-Agnostic · Security-First · Human-in-the-Loop · Checkpoint & Resume

  • 3 — AI models in consensus
  • 5 — pipeline phases
  • 6 — prompt engineering techniques
  • 0 — lines written before reading

What Changes When You Add Structure

Most AI coding tools generate first and fix later. motspilot flips that — it understands your codebase before touching a single file.

Typical AI-assisted workflow
  • Prompt an AI, paste output, hope it works in your codebase
  • One model's blind spots become your blind spots
  • No architecture phase — straight to code that doesn't fit existing patterns
  • Security testing is an afterthought (if it happens at all)
  • No verification — the developer who wrote it also reviews it
  • Context lost between sessions — start over each time
VS
With motspilot
  • AI reads your codebase first, then designs changes that fit naturally
  • Three models cross-check each other — consensus catches what one model misses
  • Dedicated architecture phase maps blast radius before any code is written
  • Security tested first — auth bypass, CSRF, IDOR checked before happy paths
  • Independent verification phase reads actual code and quotes findings line-by-line
  • Artifacts persist on disk — checkpoint and resume across sessions
motspilot.sh — The Filing System

Shell utility that manages named tasks, requirements, state, checkpoints, and artifacts. Does not invoke AI directly.

Claude Code — The Engine

AI orchestrator that reads work orders and runs each pipeline phase as a Task subagent. You approve each phase before it proceeds.

Pipeline at a Glance

Requirements → Consensus → Architecture → Development → Testing → Verification → Delivery

Consensus queries 3 AIs (Claude + GPT-4o + Gemini) before the pipeline starts — human approval gates sit between every phase

Quick Start

  1. Initialize (first time only)
     $ ./motspilot.sh init
     # Edit .motspilot/config with your language, framework, and test command
  2. Create a task
     $ ./motspilot.sh go --task=csv-export "Add CSV export to reports"
  3. Run the pipeline (in Claude Code chat)
     run motspilot pipeline

Task Lifecycle

Create Task
$ ./motspilot.sh go --task=my-feature "Description of what to build"
Creates task directory, requirements template, and work order. Status: pending
Fill Requirements

Edit tasks/<name>/01_requirements.md — describe what you want built, acceptance criteria, constraints.

Run Pipeline
run motspilot pipeline
Runs multi-model consensus (3 AIs), then Claude Code orchestrates 5 phases. Status: in_progress
Approve Each Phase

Review summary after each phase → [A]pprove, [R]eject with feedback, or [V]iew full artifact.

Auto-Archive

When delivery is approved, task moves to archive/ automatically. Status: completed

Reactivate (if needed)
$ ./motspilot.sh reactivate my-feature
$ ./motspilot.sh go --task=my-feature --from=development
Restores from archive for bug fixes. Re-runs from specified phase.

Multi-Task Scenarios

New Feature
$ ./motspilot.sh go --task=new-feature "description"
# Previous task paused automatically
Resume Paused Task
$ ./motspilot.sh go --task=old-task --from=development
Bug Fix on Archived Task
$ ./motspilot.sh reactivate task-name
$ ./motspilot.sh go --task=task-name --from=development
See All Tasks
$ ./motspilot.sh tasks # active
$ ./motspilot.sh tasks --all # + archived

Checkpointing & Resumption

After each approved phase, a checkpoint is saved to tasks/<name>/checkpoint.

  • Format: <phase>|approved (e.g., development|approved)
  • Resume from any phase: ./motspilot.sh go --task=name --from=testing
  • State persists on disk — works across sessions
  • Artifacts serve as the persistent context bridge between sessions
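Because the checkpoint is a plain `<phase>|approved` text file, resume logic can stay simple. A minimal sketch of how the next phase might be derived from it (the `next_phase` helper is illustrative, not motspilot.sh's actual code):

```shell
# Determine the next phase for a task from its checkpoint file.
# Checkpoint format: "<phase>|approved", e.g. "development|approved".
PHASES="architecture development testing verification delivery"

next_phase() {
  checkpoint="$1/checkpoint"
  # No checkpoint yet: start from the first phase.
  [ -f "$checkpoint" ] || { echo architecture; return; }
  done_phase=$(cut -d'|' -f1 "$checkpoint")
  emit_next=0
  for p in $PHASES; do
    if [ "$emit_next" -eq 1 ]; then echo "$p"; return; fi
    [ "$p" = "$done_phase" ] && emit_next=1
  done
  echo done   # final phase already approved
}

mkdir -p tasks/demo
printf 'development|approved' > tasks/demo/checkpoint
next_phase tasks/demo   # prints "testing"
```

The real script also tracks task status (pending → in_progress → completed); this sketch covers only phase ordering.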

Before the phases begin, a multi-model consensus step queries 3 AIs in parallel and synthesizes a unified starting context. Each phase then uses a thinking framework (not a checklist) and, when one exists, receives the guide for your project's framework. Prompt engineering techniques applied across all phases: XML-tagged prompt assembly, investigate-before-acting guards, anti-overengineering clauses, phase-specific self-checks, few-shot examples, and quote-grounded findings.

01 · Requirements — You write this · No code

Describe what you want built: user stories, acceptance criteria, data requirements, UI specs, security constraints, and what's out of scope.

Artifact: 01_requirements.md
Multi-Model Consensus — 3 AIs before the pipeline starts · Automatic

Fans out the task requirements to Claude + GPT-4o + Gemini in parallel, collects all responses, then synthesizes via a 3-pass process (Extract → Reconcile → Synthesize) with completeness verification.

Output Files (per task)
consensus/01_claude.md — Full Claude response
consensus/02_gpt4o.md — Full GPT-4o response
consensus/03_gemini.md — Full Gemini response
consensus/04_synthesis.md — Judge-merged unified output
consensus/05_differences.md — Unique contributions per AI (what each brought that others missed)
Standalone script: bin/consensus.php — PHP 8+, no framework dependencies. API keys in .env. Fault-tolerant: works with 1-3 models. Dynamic timeouts scale with prompt length (60–300s).
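The dynamic-timeout behavior can be pictured with a few lines of shell; the scaling ratio below is an assumption for illustration only, and the actual formula in bin/consensus.php may differ:

```shell
# Timeout that grows with prompt length, clamped to the documented 60-300s range.
# The "+1s per 200 bytes" ratio is an assumed value for illustration.
prompt_timeout() {
  len=$1                      # prompt length in bytes
  t=$(( 60 + len / 200 ))
  [ "$t" -gt 300 ] && t=300   # never wait longer than 300s
  echo "$t"
}

prompt_timeout 1000     # short prompt:  65
prompt_timeout 100000   # long prompt:  300 (clamped)
```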
02 · Architecture — Design only · No code

Explores the codebase, understands existing patterns, traces blast radius, and produces a comprehensive design document. Includes <investigate_before_designing> guard, <anti_overengineering> clause, assumption-gating, and <self_check> verification before finalizing.

  1. Start with the person — Who uses this? What's their goal? What do they see?
  2. Get a feel for the codebase — Read landmark files. What conventions exist? What would look natural?
  3. Trace the blast radius — What could break? Map dependencies. Every line is a potential failure.
  4. Design by asking questions — Data, logic, security, failure. Not by following templates.
  5. Consider alternatives — Think of 2+ approaches, explain why you chose one.

The architecture document covers:
  1. User Experience
  2. Codebase Analysis
  3. Blast Radius
  4. Data Design
  5. Component Design
  6. Security Design
  7. Failure Modes
  8. Alternatives Considered
  9. File Map (new + modified)
  10. Rollback Plan
Artifact: 02_architecture.md
03 · Development — Build in tiny loops · Writes code

Implements the feature following the architecture document. Creates and modifies actual files in the codebase. Includes <investigate_before_coding> guard, <anti_overengineering> clause, follow-through policy, and <self_check> verification.

Loop 1 — Foundation: Schema/migration → model/entity → relationships → verify

Loop 2 — Logic: Service method → test → green → next method → test → green

Loop 3 — Interface: Controller/handler → template/view → routes → verify in browser

  • Read every file you'll modify before changing it
  • Surgical additions only — never reformat or restructure
  • Match the existing coding style exactly
  • One logical action per migration file
  • Legacy patterns: work with them, don't refactor
  • Record test baseline before any changes
Artifact: 03_development.md
04 · Testing — Security-first · Writes code

Writes comprehensive tests starting with what scares you most, not what's easiest. Includes <investigate_before_testing> guard, few-shot <example> blocks for test naming, and <self_check> verification. Falls back to manual test checklists when no test framework exists.

Test Priority Order
  1. Integration safety — existing tests still pass
  2. Security — auth bypass, CSRF, mass assignment, IDOR, XSS
  3. Business logic edge cases — service layer decisions
  4. Happy path — basic "it works" confirmation
  5. Error paths — validation, duplicates, missing data
Artifact: 04_testing.md
05 · Verification — Skeptical senior reviewer · No code

Reads actual code (not reports), traces data flows, checks for security issues, verifies framework API correctness. Includes <investigate_before_judging> guard requiring quoted code for every finding. Completeness contract: must read every file. Requirements coverage matrix (MET/NOT MET) per requirement. <self_check> verification.

Verification Checklist
  1. Did anything existing break? (full test suite)
  2. Does the feature meet requirements? (every criterion)
  3. Is the code secure? (validation, escaping, auth, IDOR, CSRF)
  4. Is the code clean? (readable, maintainable)
  5. Correct framework API for the project's version?
Artifact: 05_verification.md — READY or NOT READY
06 · Delivery — Safe and reversible · No code

Produces deployment guide with copy-paste commands, rollback steps, smoke tests, and monitoring instructions. Includes <investigate_before_documenting> guard and <self_check> verification. Surfaces unresolved verification issues with [VERIFY] markers.

  • What changed (human summary)
  • Files (new, modified, deleted)
  • Deployment steps
  • Rollback steps
  • Configuration changes
  • Git commit message
  • What to watch after deploy
  • Known limitations / future work
Artifact: 06_delivery.md

Project Configuration

Edit .motspilot/config — this file is sourced by motspilot.sh and read by the orchestrator.

PROJECT_ROOT — Path to the project root (relative to motspilot/, or absolute). Example: "..". Default: "..".
LANGUAGE — Programming language. Example: "php". Default: "" (empty).
LANGUAGE_VERSION — Language version. Example: "8.2". Default: "" (empty).
FRAMEWORK — Framework name; loads the matching guide from prompts/frameworks/. Example: "cakephp". Default: "" (empty).
AUTO_APPROVE — Skip approval gates between phases. Example: "all" or "architecture,delivery". Default: "none".
MAX_RETRIES — Retry budget per phase on failure. Example: 2. Default: 2.
TEST_CMD — How to run tests. Example: "pytest". Default: "" (empty).
APP_URL — URL for smoke testing in verification. Example: "http://localhost:3000". Default: "http://localhost:8080".
DEPLOY_CMD — Deployment command. Example: "./deploy.sh". Default: "echo 'Deploy not configured'".

Example Configurations

CakePHP Project
LANGUAGE="php"
LANGUAGE_VERSION="8.2"
FRAMEWORK="cakephp"
TEST_CMD="./vendor/bin/phpunit"
APP_URL="http://localhost:8080"
Django Project
LANGUAGE="python"
LANGUAGE_VERSION="3.12"
FRAMEWORK="django"
TEST_CMD="python manage.py test"
APP_URL="http://localhost:8000"
Next.js Project
LANGUAGE="typescript"
LANGUAGE_VERSION="5.3"
FRAMEWORK="nextjs"
TEST_CMD="npm test"
APP_URL="http://localhost:3000"
Rails Project
LANGUAGE="ruby"
LANGUAGE_VERSION="3.3"
FRAMEWORK="rails"
TEST_CMD="bundle exec rspec"
APP_URL="http://localhost:3000"
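Since the config is a plain shell fragment, applying the documented defaults takes only a few lines. A sketch of how the script might load it (the `${var:=default}` fallback pattern is an assumed implementation detail, not motspilot.sh's verified code):

```shell
# Source the project config if present, then fill in documented defaults
# for anything left unset.
CONFIG=".motspilot/config"
[ -f "$CONFIG" ] && . "$CONFIG"

: "${PROJECT_ROOT:=..}"
: "${AUTO_APPROVE:=none}"
: "${MAX_RETRIES:=2}"
: "${APP_URL:=http://localhost:8080}"
: "${DEPLOY_CMD:=echo 'Deploy not configured'}"

echo "root=$PROJECT_ROOT retries=$MAX_RETRIES approve=$AUTO_APPROVE"
```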

Command Reference

./motspilot.sh init — First-time setup; creates config and directory structure
./motspilot.sh go --task=<name> "description" — Create task + generate work order
./motspilot.sh go --task=<name> — Re-prepare existing task (regenerate work order)
./motspilot.sh go --task=<name> --from=<phase> — Prepare to re-run from a specific phase
./motspilot.sh tasks — List active tasks with phase progress bars
./motspilot.sh tasks --all — List active + archived tasks
./motspilot.sh status [--task=<name>] — Detailed task status with artifact sizes
./motspilot.sh archive --task=<name> — Manually archive a completed task
./motspilot.sh reactivate <name> — Restore task from archive for bug fixes
./motspilot.sh reset --task=<name> — Delete phase artifacts (keeps requirements)
./motspilot.sh view <phase> [--task=<name>] — View a phase artifact in the terminal
Phase Names & Shortcuts
Full Phase Names
architecture · development · testing · verification · delivery
View Shortcuts
req · arch · dev · test · verify · wo (workorder)

Framework Guides

Framework-specific knowledge lives in prompts/frameworks/<name>.md. When a guide exists for your configured FRAMEWORK, it's automatically included in every subagent prompt alongside the thinking framework.
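The lookup itself can be sketched in a few lines (the `framework_guide` function name is illustrative, not the orchestrator's actual code):

```shell
# Emit the framework guide for the configured FRAMEWORK, if one exists.
# A missing guide is not an error: the pipeline runs without it.
framework_guide() {
  guide="prompts/frameworks/${FRAMEWORK}.md"
  if [ -n "$FRAMEWORK" ] && [ -f "$guide" ]; then
    cat "$guide"
  fi
}
```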

Available Guides
cakephp.md — CakePHP 4.x: version reference, naming conventions, migration/entity/table/service/controller/template/route/test patterns, verification checks, deployment commands
Adding a New Framework Guide
  1. Create motspilot/prompts/frameworks/<framework-name>.md
  2. Set FRAMEWORK="<framework-name>" in .motspilot/config
  3. The pipeline automatically includes it in all phases
Recommended Sections for a Framework Guide
Version Reference — Correct vs. incorrect API for the framework version (all phases)
Naming Conventions — How files, classes, and routes are named (Architecture, Development)
Files to Explore — Key landmark files for codebase understanding (Architecture)
Migration / Schema Patterns — How to create and roll back schema changes (Development)
Model / Entity Patterns — Access control, validation, relationships (Development)
Service / Logic Patterns — Where business logic lives (Development)
Controller / Handler Patterns — Request handling conventions (Development)
Template / View Patterns — Output escaping, form helpers, layout (Development, Verification)
Test Patterns — Test setup, fixtures/factories, security test examples (Testing)
Verification Checks — Grep patterns to catch common mistakes (Verification)
Deployment Commands — Deploy, rollback, and cache-clear commands (Delivery)

Core Philosophy

These principles guide every phase of the pipeline. They're not rules to follow — they're a way of thinking.

Start with the person

Think about who uses the feature before touching code. What's their goal? Their emotional state? What do they see when things go wrong?

Explore before building

Read existing code. Understand patterns. Don't assume anything about the codebase structure. Your new code should feel like it was always there.

Trace the blast radius

Before changing anything, ask: "What could this break?" Map every dependency between your feature and existing code.

Build in tiny loops

Write a little → verify it works → write more. Never write 500 lines and hope for the best.

Security mindset

Think like an attacker at every step. Test auth bypass, CSRF, IDOR, mass assignment, and XSS before testing the happy path.

Never greenfield

You're always integrating into an existing app. Match its patterns; work with legacy code, don't refactor it.

One action per migration

Never combine unrelated changes in a single migration. Each migration should do exactly one thing — easier to debug, rollback, and review.

Human in the loop

Approval gates between every phase. You review, reject, or approve. The AI proposes, you decide.

Prompt Architecture

Each subagent receives a layered prompt assembled by the orchestrator using XML tags for unambiguous section parsing:

<motspilot_phase name="[phase name]">

<thinking_framework>
[Full contents of prompts/<phase>.md]
Includes <investigate_before_*>, <anti_overengineering>, <self_check>, <example> blocks
</thinking_framework>

<framework_guide>
[Full contents of prompts/frameworks/<FRAMEWORK>.md]
Framework-specific patterns, API reference, verification checks
</framework_guide>

<project_config>
Language, version, framework, test command, app URL
</project_config>

<requirements>
Full contents of 01_requirements.md
</requirements>

<consensus>
Multi-model synthesis (omitted if unavailable)
</consensus>

<previous_phases>
Cumulative context from all completed prior phases
</previous_phases>

<task>
Phase-specific instructions
</task>

</motspilot_phase>

Why XML tags? Anthropic's prompt-engineering guidance for Claude recommends XML tags over plain-text markers for unambiguous parsing of multi-section prompts. Each section is clearly delineated, reducing misinterpretation when mixing instructions, context, examples, and variable inputs.
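The assembly above can be sketched as a shell function that concatenates files into the tagged layout (a hypothetical `assemble_prompt` helper; the framework-guide, project-config, and previous-phase sections are omitted for brevity):

```shell
# Assemble a reduced version of the layered phase prompt from files on disk.
assemble_prompt() {
  phase="$1"; task_dir="$2"
  cat <<EOF
<motspilot_phase name="$phase">
<thinking_framework>
$(cat "prompts/$phase.md")
</thinking_framework>
<requirements>
$(cat "$task_dir/01_requirements.md")
</requirements>
<consensus>
$(cat "$task_dir/consensus/04_synthesis.md" 2>/dev/null)
</consensus>
</motspilot_phase>
EOF
}

# Usage: assemble_prompt testing tasks/csv-export > prompt.xml
```

The consensus section is allowed to be empty, mirroring the "omitted if unavailable" behavior described above.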

Built For Developers Who Ship Into Production

motspilot isn't for greenfield prototypes. It's for adding features to real applications where things can break.

🛠
Solo Developer

You maintain a production app alone. The pipeline gives you a second pair of eyes — architecture review, security testing, and verification you'd otherwise skip.

Side projects · Freelance · Startups
👥
Small Team Lead

Your team uses AI for coding but quality is inconsistent. motspilot standardizes the process — same phases, same security checks, every time.

Consistency · Code review · Standards
🔒
Security-Conscious Dev

You've seen AI-generated code ship with auth bypasses and SQL injection. motspilot tests security before the happy path — by design, not afterthought.

OWASP · Auth testing · CSRF/IDOR
📚
Legacy Codebase Owner

Your app has years of conventions baked in. motspilot explores first, matches your patterns, and integrates — no random refactors or style conflicts.

Existing patterns · Migration safety · Rollback

Frequently Asked Questions

Common questions about how motspilot works and what it expects.

Does it work with my language and framework?
Yes. The pipeline is framework-agnostic — it works with PHP, Python, TypeScript, Ruby, Go, or anything else. You set LANGUAGE and FRAMEWORK in the config, and the pipeline adapts. Framework-specific guides (like the included CakePHP guide) add deeper knowledge, but the pipeline runs without them.

How does the multi-model consensus work?
Before the pipeline starts, your requirements are sent to Claude, GPT-4o, and Gemini in parallel. A synthesis pass merges their responses, highlights where they agree, and surfaces the unique insights each model contributed. This reduces single-model blind spots and gives the pipeline a stronger starting context.

Do I need API keys for all three models?
No. The consensus script is fault-tolerant — it works with one, two, or three models. If a key is missing or a model times out, the pipeline continues with whatever responses it gets. You need at least one API key (you already need an Anthropic key to run Claude Code).

Can I re-run just one phase?
Yes. Use --from=<phase> to start from any phase. If architecture is done and you just need to re-run development after a bug fix, use ./motspilot.sh go --task=name --from=development. Previous phase artifacts are preserved and fed in as context.

What happens if I close my session mid-pipeline?
Nothing is lost. Every approved phase writes a checkpoint to disk. When you come back, the pipeline knows which phases are done and resumes from where you left off. Artifacts serve as the persistent context bridge between sessions.

Is motspilot an IDE plugin or a hosted service?
Neither. motspilot lives inside your project as a directory. You interact with it through Claude Code's CLI. The shell script (motspilot.sh) manages files and state; Claude Code is the AI engine that runs the phases. No IDE plugins, no SaaS dependency.

How is this different from just asking Claude to write the code?
When you ask Claude directly, it generates code based on your prompt alone. motspilot adds structure: three models cross-check your requirements, an architecture phase maps impact before coding starts, security gets tested before happy paths, an independent verifier reads the actual code, and everything is checkpointed. It's the difference between a sketch and a blueprint.