EvanFlow: Test-Driven Feedback Loop Transforms Claude Code Development Workflow

Source: Hacker News · evanklem2004

New Framework Brings Structured Iteration and Human Control to AI-Assisted Coding

A comprehensive development framework called EvanFlow has been introduced to guide software projects through a disciplined, checkpoint-driven cycle using Claude Code. The system orchestrates 16 integrated skills and two custom subagents, walking ideas from initial brainstorm through final implementation while maintaining human oversight at every critical decision point.

Rather than operating as an autopilot system, EvanFlow functions as a conductor that pauses before major transitions. The framework enforces a clear sequence: brainstorm → plan → execute → test-driven development → iterate → stop for human direction. Git operations never proceed without explicit user approval, and auto-commits are deliberately disabled.

Core Architecture and Safety Mechanisms

The feedback loop builds discipline through repeated checkpoints. Design approval gates the brainstorming phase, plan approval gates the planning phase, and a quality review precedes the completion of any iteration. The execution phase stops short of every git operation and waits for user instruction. There is no forced ceremony: skills function as tools rather than mandatory gatekeepers.

Four hard rules derived from 2025-2026 industry research on agentic coding failures are embedded throughout:

  • Never invent values such as file paths, environment variables, IDs, or function names; when uncertain, the agent halts and asks for clarification
  • Treat generated assertions as suspect: assertion-correctness warnings flag test cases, because the cited research indicates that 62 percent of LLM-generated assertions contain errors
  • Detect context drift: a warning triggers on repeated questions or contradictory decisions, a failure mode reported to account for approximately 65 percent of enterprise AI coding failures
  • Screen every iteration with the Five Failure Modes checklist: hallucinated actions, scope creep, cascading errors, context loss, and tool misuse
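The context-drift rule above can be pictured with a small sketch. This is not EvanFlow's implementation; the class name `DriftDetector`, the normalization step, and the repeat threshold of 2 are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so near-identical questions compare equal."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


class DriftDetector:
    """Hypothetical detector for repeated questions and contradictory decisions."""

    def __init__(self, repeat_threshold: int = 2) -> None:
        # Assumed threshold; the article does not state one.
        self.repeat_threshold = repeat_threshold
        self.questions: Counter = Counter()
        self.decisions: Dict[str, str] = {}

    def record_question(self, question: str) -> bool:
        """Return True once the same question has been asked too often."""
        key = normalize(question)
        self.questions[key] += 1
        return self.questions[key] >= self.repeat_threshold

    def record_decision(self, topic: str, choice: str) -> bool:
        """Return True when a choice contradicts an earlier one on the same topic."""
        previous = self.decisions.get(topic)
        self.decisions[topic] = choice
        return previous is not None and previous != choice
```

A caller would treat a `True` return as a signal to pause and re-establish context rather than continue.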

Skill Organization and Capabilities

Five core skills drive the default loop:

  • evanflow-brainstorming clarifies intent and proposes 2–3 approaches with embedded stress-testing.
  • evanflow-writing-plans structures file organization into bite-sized tasks and offers parallelization when appropriate.
  • evanflow-executing-plans runs tasks sequentially with inline verification.
  • evanflow-tdd enforces vertical-slice test-driven development, with one failing test leading to a minimal implementation.
  • evanflow-iterate performs self-review with a hard cap of 5 iterations, running quality checks and visual verification of UI changes via headless Chromium.
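The 5-iteration hard cap can be illustrated with a bounded review loop. The sketch below is an assumption about how such a cap might behave, not the skill's actual logic; `run_checks` and `fix` are hypothetical callbacks standing in for quality checks and corrective edits.

```python
from typing import Callable, Dict, List

MAX_ITERATIONS = 5  # the hard cap stated in the article


def iterate(
    run_checks: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
) -> Dict[str, object]:
    """Fix and re-check until the checks are clean or the cap is reached."""
    findings = run_checks()
    iterations = 0
    while findings and iterations < MAX_ITERATIONS:
        fix(findings)
        findings = run_checks()
        iterations += 1
    # Whatever is still failing at the cap goes back to the human.
    return {"clean": not findings, "iterations": iterations, "remaining": findings}
```

The point of the cap is that a loop which cannot converge stops and reports its remaining findings instead of running indefinitely.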

Eight special-purpose skills address specific needs:

  • evanflow-go serves as the single entry point: users simply say "let's evanflow this" to activate the conductor.
  • evanflow-glossary extracts canonical domain terms into documentation.
  • evanflow-improve-architecture surfaces refactoring opportunities.
  • evanflow-design-interface spawns parallel sub-agents with radically different constraints to compare approaches.
  • evanflow-debug enforces root-cause discipline with explicit hypotheses.
  • evanflow-review covers both giving and receiving code review.
  • evanflow-prd synthesizes product requirements documents from existing context.
  • evanflow-qa converts conversational bug discovery into issue drafts.

Two cross-cutting and meta skills provide support. evanflow-compact manages long-session context through proactive summarization at clean boundaries. The evanflow skill serves as an index documenting shared vocabulary and invocation timing.

Parallel Execution and Integration Testing

For plans containing three or more truly independent units, the loop forks into parallel execution. A dedicated coder handles each unit using vertical-slice TDD, with a RED checkpoint ensuring that all coders write failing tests before any implementation begins. Per-coder overseers perform read-only review and cannot modify code. An integration overseer runs named integration tests at every touchpoint; these executable contracts prevent interface drift, because both sides must satisfy the same passing tests.
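The fork decision and the RED checkpoint can be sketched as two small predicates. This is an illustrative reading of the paragraph above: the `Unit` type and the definition of "independent" (no dependency on, or from, another unit in the plan) are assumptions, not the framework's own definitions.

```python
from dataclasses import dataclass, field
from typing import List

PARALLEL_THRESHOLD = 3  # "three or more truly independent units"


@dataclass
class Unit:
    """Hypothetical plan unit; the field names are illustrative."""
    name: str
    depends_on: List[str] = field(default_factory=list)
    failing_test_written: bool = False


def independent_units(units: List[Unit]) -> List[Unit]:
    """Units that neither depend on nor are depended on by another unit."""
    names = {u.name for u in units}
    depended_upon = {d for u in units for d in u.depends_on if d in names}
    return [
        u for u in units
        if not names.intersection(u.depends_on) and u.name not in depended_upon
    ]


def should_fork(units: List[Unit]) -> bool:
    """Fork into parallel execution only at or above the threshold."""
    return len(independent_units(units)) >= PARALLEL_THRESHOLD


def red_checkpoint_passed(units: List[Unit]) -> bool:
    """Implementation may begin only after every coder has a failing test."""
    return all(u.failing_test_written for u in units)
```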

Installation and Deployment

Three installation paths exist, listed in order of recommendation. The plugin marketplace path provides the cleanest integration, automatically installing the skills, the agents, and the bundled guardrail hook through Claude Code's native plugin system. The npx skills@latest add command is an alternative that installs the skills via CLI but without hook automation. A manual copy path gives full control to users who prefer to avoid CLI dependencies.

All paths require Claude Code and Bash. The guardrail hook needs the jq utility to parse JSON tool input; jq can be installed through a package manager on Linux and macOS. Chromium or Google Chrome is optional but recommended for visual verification during iteration phases.
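To show what "parsing JSON tool input" might involve, here is a Python stand-in for the guardrail check. The bundled hook itself is described as Bash plus jq, so this is not its code. The payload shape (`tool_name`, `tool_input.command`) and the list of git subcommands treated as write operations are assumptions made for this sketch.

```python
import json
import re

# Assumed payload shape: {"tool_name": "Bash", "tool_input": {"command": "..."}}
# The subcommand list is illustrative, not taken from the real hook.
GIT_WRITE = re.compile(r"\bgit\s+(add|commit|push|merge|rebase|reset)\b")


def should_block(raw_json: str) -> bool:
    """Return True when the tool input looks like a git write operation."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return False  # unparseable input: this sketch chooses not to block
    if not isinstance(payload, dict) or payload.get("tool_name") != "Bash":
        return False
    tool_input = payload.get("tool_input")
    if not isinstance(tool_input, dict):
        return False
    return bool(GIT_WRITE.search(str(tool_input.get("command", ""))))
```

Read-only commands such as `git status` pass through in this sketch, while staging and committing are held for explicit user approval.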

Key Requirements and Customization

Four absolute rules apply to every skill: never auto-commit or auto-stage; never invent values without asking; do not impose skill-invocation overhead on ad-hoc questions; and verify completion through quality checks before declaring success. Project-specific customization involves replacing placeholder paths in skills such as evanflow-writing-plans, documenting the exact typecheck and lint commands in project documentation, and adapting the visual verification step if Chromium is unavailable.
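The "never invent values" rule amounts to refusing to guess. A minimal sketch, assuming a hypothetical `NeedsClarification` exception and helper names that do not come from the framework:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


class NeedsClarification(Exception):
    """Raised instead of guessing; the caller should stop and ask the user."""


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return an environment variable, or halt rather than invent a value."""
    env = os.environ if environ is None else environ
    if name not in env:
        raise NeedsClarification(f"Environment variable {name} is not set; please confirm it.")
    return env[name]


def require_path(path: str) -> Path:
    """Return a path only if it exists, or halt rather than assume it does."""
    candidate = Path(path)
    if not candidate.exists():
        raise NeedsClarification(f"Path {path} does not exist; please confirm the location.")
    return candidate
```

The design choice is that a missing value surfaces as a question to the human, never as a plausible-looking default.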

Every skill is designed as a starting point for adaptation rather than immutable instruction. Users can fork variants for specific vendors or organizational needs, and the framework actively welcomes proposals to reduce ceremony or add evidence-backed improvements.

End-to-End Workflow

Initiating the workflow requires a single user statement: "let's evanflow this, I want to add a feature that does X." The evanflow-go conductor then manages six phases:

  • Phase 0 restates the idea and checks scope.
  • Phase 1 engages brainstorming, with a design approval checkpoint.
  • Phase 2 creates plans, with a plan approval checkpoint and an optional parallelization assessment.
  • Phase 3 executes sequentially or in parallel via custom subagents.
  • Phase 4 iterates with the Five Failure Modes checklist and the 5-iteration hard cap.
  • Phase 5 stops execution, generates a report, and awaits user direction.

Throughout, evanflow-compact can activate at clean session boundaries when context becomes heavy.
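The six phases described above can be modeled as a small state machine. This is a sketch under stated assumptions: the enum member names and the `advance` function are hypothetical, and only the two approval checkpoints named in the article are modeled as gates.

```python
from enum import IntEnum


class Phase(IntEnum):
    SCOPE = 0       # restate the idea and check scope
    BRAINSTORM = 1  # ends at the design approval checkpoint
    PLAN = 2        # ends at the plan approval checkpoint
    EXECUTE = 3     # sequential or parallel
    ITERATE = 4     # Five Failure Modes checklist, 5-iteration cap
    STOP = 5        # report and await user direction


# Phases that cannot be left without explicit human approval.
NEEDS_APPROVAL = {Phase.BRAINSTORM, Phase.PLAN}


def advance(current: Phase, approved: bool = False) -> Phase:
    """Move to the next phase, holding at gated phases until approved."""
    if current is Phase.STOP:
        return current  # terminal: only the user decides what happens next
    if current in NEEDS_APPROVAL and not approved:
        return current
    return Phase(current + 1)
```

For example, `advance(Phase.BRAINSTORM)` stays in `BRAINSTORM` until it is called with `approved=True`.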

Foundation and Research Basis

The framework synthesizes concepts from established methodologies, including vertical-slice TDD, deep-modules vocabulary, the deletion test pattern, design-it-twice approaches, and ubiquitous language principles. The design also incorporates patterns for parallel agent dispatch and code review disciplines. The hard rules are grounded in Anthropic's 2026 Agentic Coding Trends Report, Columbia University's DAPLab research on nine critical failure patterns, and academic findings on assertion correctness in test-driven code generation.
