Open Framework · 9 Core Principles

Principled Scaffolding
for Reliable AI Agents

A structured framework for designing agentic workflows that are deterministic, verifiable, and scalable. Move beyond prompt engineering — engineer the process.

Workflow.md — Agentic Code Assurance
# Root Workflow Decision Tree
 
Q1 Does docs/codearch/overall_report.md exist and pass validation?
  → YES: proceed to Q2
  → NO: execute Stage 1 (Code Cognition)
 
Q2 Does docs/risk_tasks/task_list.md exist, and is it non-empty?
  → YES: proceed to Q3
  → NO: execute Stage 2 (Risk Assessment)
 
Q3 Is build_success=true AND tests_runnable=true?
  → YES: execute Stage 3 (Bug Remediation)
  → NO: ⛔ HALT — circuit breaker triggered
9 Core Design Principles · 3 Workflow Stages · 0% Ambiguous Decision Points · ∞ Domains Applicable

9 Core Design Principles

Each principle addresses a specific failure mode of naive agentic systems. Together, they form a complete framework for reliable, production-grade workflows.

PRINCIPLE 01 · PROCESS MANAGEMENT
🌲
Deterministic Decision Trees
Every stage entry is governed by an explicit Q&A decision tree. Each question has a binary Yes/No outcome determined by verifiable file-system checks or shell commands — never by the LLM's subjective judgment. This eliminates ambiguity, enables idempotent re-runs, and enforces prerequisite ordering.
Eliminates Ambiguity
PRINCIPLE 02 · PROCESS MANAGEMENT
📄
Explicit State & Contracts
All workflow state is persisted to the file system. Inter-stage communication is governed by formal contracts — predefined directory structures and file schemas. docs/codearch/ is the contract between Stage 1 and Stage 2; task_list.md is the contract between Stage 2 and Stage 3.
Observable & Decoupled
PRINCIPLE 03 · COGNITIVE ARCHITECTURE
🎯
Minimum Context Principle
Workflow instructions explicitly forbid loading all documents upfront. Summary reports link to detailed sub-reports. Skills are loaded on-demand only when the current Phase requires them. A 100-module project is analyzed with ~12k tokens per query instead of hundreds of thousands.
Scales to Any Codebase
PRINCIPLE 04 · COGNITIVE ARCHITECTURE
🏗️
Layered Abstraction
Instructions are organized into four layers: Workflow (the Why — orchestration), Phase (the What — goals & acceptance criteria), Skill (the How — step-by-step execution), and Definitions (shared knowledge, referenced by link). Modify any layer without touching the others.
Maintainable & Reusable
PRINCIPLE 05 · QUALITY ASSURANCE
🧠
Knowledge-Driven Analysis
Stage 1 builds a rich semantic knowledge base — capturing ownership models, data flow paths, concurrency invariants. Stage 2 then uses this knowledge to perform targeted, semantic analysis rather than keyword scanning. To find a data race, the agent reads the declared mutex from the module report and traces unprotected access paths.
Low False-Positive Rate
PRINCIPLE 06 · QUALITY ASSURANCE
🧪
Test-Driven Verification & Remediation
A bug is only "confirmed" when a failing test case reproduces it. A fix is only "accepted" when the verification test passes AND the full regression suite shows no new failures. Confirmed tests are permanently integrated into the test suite, accumulating coverage with every fix.
Objective Correctness
PRINCIPLE 07 · QUALITY ASSURANCE
🔄
Iteration & Convergence Control
Controlled feedback loops allow the workflow to self-correct. After module decomposition, a structured review checks quality and can trigger a rollback. After a remediation cycle, a new, deeper cycle can target high-risk modules. Explicit convergence criteria (iteration caps, empty change lists) prevent infinite loops.
Progressive Refinement
PRINCIPLE 08 · QUALITY ASSURANCE
🚦
Gatekeeping & Circuit Breakers
Hard quality gates at critical junctures prevent wasted effort. If build_success=false or tests_runnable=false, the entire workflow halts immediately with a clear error report. There is no point analyzing code that cannot be compiled or tested.
Fail Fast & Clearly
PRINCIPLE 09 · CROSS-STAGE COORDINATION
🔁
Feedback Loop & Self-Healing Knowledge
Later stages can correct the knowledge base built in earlier stages. When Stage 2 or Stage 3 discovers that a module report is inaccurate, a unified feedback protocol allows immediate in-place correction. The knowledge base heals itself as the agent digs deeper, rather than accumulating stale information.
Self-Correcting System

Principles in Detail

A comprehensive examination of each design principle — the rationale, design significance, and how it manifests in a real workflow.

From the Reference Implementation: All examples and file references below are drawn from the Agentic Code Assurance open-source project — a concrete instantiation of these 9 principles for C/C++ code quality analysis. Links point directly to the relevant source files so you can inspect the real implementation.
PRINCIPLE 01 · PROCESS MANAGEMENT
🌲 Deterministic Decision Trees
The core scheduling mechanism is a deterministic decision tree. Each stage's entry-point Workflow.md begins with a quick decision tree (Q1, Q2, Q3...) in which every question offers clear Yes/No branches that map directly to specific actions. Each question is paired with an independent "Judgment Basis" section containing exhaustive checklists and executable verification commands.
Design Significance
  • Eliminates ambiguity: Agent decisions are strictly constrained to objective verification of checklists and command outputs — no open-ended reasoning required. This provides better compatibility across different model versions and capabilities.
  • Breakpoint resumption: Decisions are based on persistent file-system artifacts. The workflow can be interrupted at any step; on restart, re-traversing the decision tree automatically locates the interruption point, achieving idempotency.
  • Enforced prerequisites: The sequential ordering (Q1 → Q2 → Q3) guarantees task dependencies — e.g., bug fixing (Stage 3) cannot start until risk assessment (Stage 2) is complete.
  • On-demand navigation: The decision tree enables the agent to load only documents relevant to the current task, saving token consumption and improving accuracy.
Example from Reference Implementation

In 1-code-cognition/Workflow.md, Q1 "Does the overall report exist?" has four explicit checks: (1) docs/codearch/overall_report.md exists; (2) it contains a non-empty "Project Goals" section; (3) it has "Inputs", "Outputs", "Main Flow" sections; (4) it has an "Information Sources" section. If any check fails, Phase 01 is triggered — zero ambiguity.
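The checks above are, by design, mechanical enough to script. A minimal POSIX-shell sketch, not the reference implementation itself: the heading names follow the example, the `## ` heading level is an assumption, and a sample report is fabricated so the sketch runs end-to-end.

```shell
#!/bin/sh
# Sketch of Q1 as mechanical checks (Principle 01). A minimal sample
# report is created here so the checks can be demonstrated end-to-end.
cd "$(mktemp -d)"
mkdir -p docs/codearch
cat > docs/codearch/overall_report.md <<'EOF'
## Project Goals
Analyze the target project for memory-safety risks.
## Inputs
## Outputs
## Main Flow
## Information Sources
EOF

report="docs/codearch/overall_report.md"
q1=YES
[ -f "$report" ] || q1=NO                        # check 1: file exists
for sec in "Project Goals" "Inputs" "Outputs" "Main Flow" "Information Sources"; do
  # checks 2-4: sections present (the reference also requires non-empty content)
  grep -q "^## $sec" "$report" || q1=NO
done
echo "Q1 = $q1"   # YES: proceed to Q2; NO: execute Phase 01
```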

PRINCIPLE 02 · PROCESS MANAGEMENT
📄 Explicit State & Contracts
All workflow state is explicitly persisted to the file system. Inter-stage handoffs strictly follow predefined input/output "contracts" — docs/codearch/ represents Stage 1 completion, docs/risk_tasks/ represents Stage 2 output, and docs/remediation/ represents Stage 3 artifacts. Every key artifact has a corresponding structure definition document specifying required sections, fields, and downstream usage conventions.
Design Significance
  • Observability: Any person or agent can inspect the file system to understand workflow progress. Issues can be precisely traced back to the non-compliant artifact.
  • Stage decoupling: Clear contracts allow stages to be independently developed and improved. As long as input/output agreements are honored, internal implementations (models, skills) can be swapped without affecting other stages.
  • Collaborative execution: Different agents or human experts can work in parallel — one agent produces the knowledge base in Stage 1, another consumes it in Stage 2.
Example from Reference Implementation

task_output_structure.md defines not only the required fields for each task record (location, description, risk type, related module, reasoning chain, excluded protections, preconditions to verify, impact level), but also a "Downstream Usage Conventions" section that explicitly guides Stage 3 on how to consume this information — how to locate code, understand reasoning chains, and design targeted verification tests, ensuring lossless and efficient cross-stage information transfer.
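A hedged sketch of what a contract check for one task record could look like. The eight field names are taken from the list above; the bullet syntax, file name, and sample values are illustrative, not the real task_output_structure.md schema.

```shell
#!/bin/sh
# Sketch of a Stage 2 output-contract check (Principle 02). Field names
# follow the prose; the record layout and values are invented examples.
cd "$(mktemp -d)"
cat > task_001.md <<'EOF'
- Location: src/net/parser.c:142
- Description: length field trusted before bounds check
- Risk Type: memory-safety
- Related Module: net
- Reasoning Chain: attacker-controlled length flows into a copy loop
- Excluded Protections: no upstream sanitization found
- Preconditions to Verify: parse() reachable with external input
- Impact Level: High
EOF

valid=YES
for field in "Location" "Description" "Risk Type" "Related Module" \
             "Reasoning Chain" "Excluded Protections" \
             "Preconditions to Verify" "Impact Level"; do
  grep -q "^- $field:" task_001.md || { echo "missing field: $field"; valid=NO; }
done
echo "task record valid = $valid"
```

A record that fails this kind of check is precisely traceable to the missing field, which is the observability property the contract is designed for.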

PRINCIPLE 03 · COGNITIVE ARCHITECTURE
🎯 Minimum Context Principle
Under the constraint of limited LLM context windows, the minimum context principle is the key design enabling complex workflows to run efficiently. It operates at three levels: at the document level, summary reports contain only module lists and links without embedding full reports; at the instruction level, Workflow and Phase documents explicitly forbid pre-loading all Skill documents; at the execution level, the agent processes modules in small batches, loading only relevant reports per batch.
Design Significance
  • Reduced token consumption: Ensures each LLM interaction stays within context window limits, preventing truncation or execution failures.
  • Improved focus: Limiting per-interaction information forces the agent to concentrate fully on the current task, improving analysis depth and quality.
  • Faster execution: Loading and processing less data means faster execution cycles and lower cost.
Example from Reference Implementation

A 100-module project might exceed hundreds of thousands of tokens if all module reports are combined. Under this principle, when analyzing a specific bug, the agent loads only overall_report.md (~2k tokens) and 1–2 relevant module reports (~5k tokens each) — keeping total consumption extremely low at approximately 12k tokens per query.
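The 12k figure is plain budget arithmetic: one overview plus two module reports at the quoted sizes.

```shell
# Token budget from the example above: one overview (~2k tokens) plus
# two relevant module reports (~5k tokens each).
overview=2000
per_module=5000
modules_loaded=2
total=$((overview + modules_loaded * per_module))
echo "tokens per query ~ $total"
```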

PRINCIPLE 04 · COGNITIVE ARCHITECTURE
🏗️ Layered Abstraction & Shared Definitions
The workflow achieves separation of concerns through a Workflow → Phase → Skill → Definition multi-layer structure, while eliminating redundancy through cross-stage shared definitions. Workflow (Why) defines orchestration; Phase (What) defines goals and acceptance criteria; Skill (How) provides step-by-step execution; Definitions (Knowledge) store stable conventions referenced by link from any layer.
Design Significance
  • Maintainability: Modifying specific operations requires updating only the corresponding Skill. Adjusting stage goals doesn't require changing lower-level implementations — consistent with interface-implementation separation in software engineering.
  • Readability: Humans or agents can drill down layer by layer — Workflow for the big picture, Phase for stage details, Skill for specific operations.
  • Consistency & reuse: Shared definitions ensure uniform terminology, operational standards, and quality criteria throughout the workflow, avoiding inconsistencies from duplicated maintenance.
Example from Reference Implementation

02-review.md (Phase) states the task is "perform deep review." The review method is detailed in skill-02-review.md (Skill) — covering batching strategy, depth determination, and pattern application. "Review patterns" are abstracted into review_patterns.md (Definition), loaded on demand by the Skill. Feedback conventions are defined centrally in the root definitions/feedback_protocol.md, referenced by all stages via link.

PRINCIPLE 05 · QUALITY ASSURANCE
🧠 Knowledge-Driven Analysis
The core analysis strategy is "deep understanding first, targeted analysis second" — not blind keyword scanning. Stage 1 requires module reports to capture deep semantic information (key data flow paths, lifecycle and ownership models, concurrency invariants). Stage 2's review patterns then guide the agent along code paths using this semantic context, performing path-aware analysis rather than pattern matching.
Design Significance
  • Low false-positive rate: Traditional static analysis produces excessive false positives due to lack of understanding of code intent and ownership models. Knowledge-driven methods let the agent understand contexts like "pointer ownership has been transferred" or "shared variable is protected by a specific lock."
  • Deep logic defect discovery: Semantic path-tracing can reveal issues that simple pattern matching cannot — deadlocks (inconsistent lock acquisition order across paths), use-after-free (object destroyed before callback fires).
  • Multi-dimensional risk coverage: Review patterns cover security and correctness (M-1 through E-3), performance (PERF-1 through PERF-3), maintainability (MAINT-1 through MAINT-3), and portability (PORT-1 through PORT-2).
Example from Reference Implementation

When reviewing concurrency race conditions (Pattern C-1: Lock Order Consistency), the agent does not simply search for lock keywords. Instead, it loads "concurrency invariants" from the module report, identifies the agreed lock acquisition order (e.g., lock A before lock B), then searches the code for reverse acquisition paths. This agreement-based check is far more precise than undirected scanning.
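The real check is semantic path tracing, but the shape of the pattern can be illustrated with a toy grep over invented lock names: given the declared invariant "acquire A before B", search for the reverse acquisition on any path.

```shell
#!/bin/sh
# Toy stand-in for Pattern C-1 (Principle 05). The file, function, and
# lock names are invented; a real review traces call paths semantically
# rather than matching single lines.
cd "$(mktemp -d)"
cat > worker.c <<'EOF'
void ok(void)  { acquire(A); acquire(B); }
void bad(void) { acquire(B); acquire(A); }
EOF

# Declared invariant from the module report: A is acquired before B.
violations=$(grep -c 'acquire(B).*acquire(A)' worker.c)
echo "C-1 violations found: $violations"
grep -n 'acquire(B).*acquire(A)' worker.c
```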

PRINCIPLE 06 · QUALITY ASSURANCE
🧪 Test-Driven Verification & Remediation
Stage 3 (Bug Remediation) adopts a test-driven methodology to ensure objective verification and safe remediation. A bug is only "confirmed" when a failing test reproduces it. A fix is only "accepted" when the verification test passes AND the full regression suite shows no new failures. Confirmed tests are permanently integrated into the test suite, and impact levels (Critical / High / Medium / Low) guide fix priority.
Design Significance
  • Objective verification: Transforms "does the bug exist?" from the agent's subjective judgment to an objective test result (Pass/Fail), dramatically increasing reliability.
  • Side-effect prevention: Mandatory full regression testing ensures fixes don't break other functionality.
  • Accumulated test assets: Effective verification tests are permanently integrated, continuously enhancing project test coverage in a positive feedback loop.
Example from Reference Implementation

For a suspected buffer overflow, the agent writes a test_buffer_overflow test passing an overly long string. If the program crashes, the bug is "confirmed." After the fix (e.g., adding length validation), the verification test passes. A full regression suite run confirms no side effects, and the test is permanently integrated into the official test suite.
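The confirm, fix, verify sequence can be sketched as a script. A shell stand-in replaces the C program, and a "crash" is simulated with exit code 139 (the SIGSEGV convention); real remediation would drive the project's own test suite.

```shell
#!/bin/sh
# Toy sketch of the confirm -> fix -> verify loop (Principle 06).
cd "$(mktemp -d)"
long_input=$(printf '%064d' 0)   # 64-character oversized input

buggy_prog='[ ${#1} -gt 8 ] && exit 139   # unchecked length: simulated crash
echo ok'
fixed_prog='[ ${#1} -gt 8 ] && { echo rejected; exit 1; }   # length validated
echo ok'

crash_test() {  # the verification test: passes iff the program does not "crash"
  sh prog.sh "$long_input" >/dev/null
  [ "$?" -ne 139 ]
}

printf '%s\n' "$buggy_prog" > prog.sh
crash_test && confirmed=NO || confirmed=YES   # failing test confirms the bug
echo "bug confirmed = $confirmed"

printf '%s\n' "$fixed_prog" > prog.sh
crash_test && accepted=YES || accepted=NO     # passing test accepts the fix
echo "fix accepted = $accepted"
```

In the full workflow this verification test would then run alongside the regression suite and be integrated permanently.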

PRINCIPLE 07 · QUALITY ASSURANCE
🔄 Iteration & Convergence Control
The workflow transforms a linear process into a self-improving closed loop through two layers of iteration. Within Stage 1, a decomposition review checks module partitioning quality and can trigger rollback. Between Stages 2 and 3, iterative deepening narrows scope with each round — the second round focuses on high-risk modules identified in the first, the third on cross-module interaction verification. Explicit convergence criteria (iteration caps, empty change lists) prevent infinite loops.
Design Significance
  • Progressive refinement: For complex projects, a perfect first-pass analysis is nearly impossible. Iteration allows starting from an initial version and converging toward high quality through feedback and correction.
  • Resource focusing: Iterative deepening avoids wasting compute on low-risk areas, concentrating analysis resources where they yield the most value.
  • Infinite loop prevention: Explicit convergence criteria (iteration caps of 2–3 rounds, empty task lists) ensure the workflow always terminates.
Example from Reference Implementation

After the agent's first module decomposition, the structured review discovers that the utils module is overly broad. The review fails, generating a change list: "split utils into string_utils, net_utils, math_utils." The workflow rolls back to Phase 02 to regenerate reports for the three new modules, then reviews again — repeating until the decomposition is sound.
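The convergence mechanics, an iteration cap plus an empty-change-list exit condition, can be sketched in a few lines; change_list.txt is an illustrative artifact name, and the review itself is simulated.

```shell
#!/bin/sh
# Sketch of convergence control (Principle 07): loop until the review's
# change list is empty or a hard iteration cap is hit.
cd "$(mktemp -d)"
printf 'split utils into string/net/math utils\n' > change_list.txt  # round-1 finding

max_rounds=3
round=1
while [ "$round" -le "$max_rounds" ] && [ -s change_list.txt ]; do
  echo "round $round: applying $(wc -l < change_list.txt | tr -d ' ') change(s)"
  : > change_list.txt            # simulate rework resolving every finding
  round=$((round + 1))
done

if [ -s change_list.txt ]; then
  echo "cap reached: halt for human review"
else
  echo "converged after $((round - 1)) rework round(s)"
fi
```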

PRINCIPLE 08 · QUALITY ASSURANCE
🚦 Gatekeeping & Circuit Breakers
Hard quality gates at critical junctures prevent the workflow from investing effort on an unstable foundation. The canonical gate is in Stage 1 Phase 03 (Build & Test System): if the project cannot be compiled (build_success=false) or unit tests cannot run (tests_runnable=false), the entire workflow halts immediately with a clear error report and blocking reason.
Design Significance
  • Fail fast: Performing code analysis and bug fixing on a project that cannot compile or run tests is meaningless and resource-wasteful. The gate ensures the agent only proceeds when basic conditions are met.
  • Clear accountability: Halting exposes the problem clearly — "environment configuration issue" or "basic build broken" — prompting the user to fix foundational issues before re-executing.
Example from Reference Implementation

During Stage 1 Phase 03, the agent attempts compilation but fails due to a missing dependency. build_and_tests.md marks build_success as false. The decision tree evaluates Q3 as "No," triggering the hard gate. The agent reports the error and halts the workflow, waiting for the user to fix the build environment before re-executing.
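A sketch of the gate itself. The two flag names appear in the prose above, but the key=value layout of build_and_tests.md is an assumption about the artifact format.

```shell
#!/bin/sh
# Circuit-breaker sketch (Principle 08): both flags must be true or the
# workflow halts. A failing build is simulated here.
cd "$(mktemp -d)"
printf 'build_success=false\ntests_runnable=true\n' > build_and_tests.md

get() { sed -n "s/^$1=//p" build_and_tests.md; }

if [ "$(get build_success)" = true ] && [ "$(get tests_runnable)" = true ]; then
  q3=YES
  echo "Q3 = YES: execute Stage 3 (Bug Remediation)"
else
  q3=NO
  echo "Q3 = NO: HALT (build_success=$(get build_success), tests_runnable=$(get tests_runnable))"
fi
```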

PRINCIPLE 09 · CROSS-STAGE COORDINATION
🔁 Feedback Loop & Self-Healing Knowledge
A unified feedback protocol enables later stages to correct the knowledge base built in earlier stages. When Stage 2 or Stage 3 discovers discrepancies between documentation and actual code (e.g., inaccurate module descriptions, missing external dependencies, misdrawn module boundaries), the agent updates the corresponding knowledge base documents in-place following standardized feedback operations, optionally recording feedback type and modification summary in the change log.
Design Significance
  • Self-healing knowledge base: Understanding gaps discovered during deeper code exploration in later stages are immediately corrected, rather than accumulating stale information. This maintains the timeliness and accuracy of the knowledge base throughout the workflow.
  • Standardized operations: A unified feedback protocol prevents each stage from modifying the knowledge base in ad-hoc ways, ensuring modifications are consistent and auditable.
Example from Reference Implementation

During Stage 2 review, the agent discovers that a module's concurrency model description doesn't match the actual code — the report marks it as "single-threaded," but the code uses std::thread. The agent immediately updates the module report's "Concurrency Model" section and records this feedback in the change log, keeping the knowledge base accurate for downstream consumers.
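A sketch of such an in-place correction; the module file name, heading, and change-log format are illustrative assumptions rather than the real feedback protocol.

```shell
#!/bin/sh
# Feedback-protocol sketch (Principle 09): Stage 2 corrects a stale
# "Concurrency Model" entry in place and logs the change for audit.
cd "$(mktemp -d)"
printf '## Concurrency Model\nsingle-threaded\n' > module_net.md

# In-place correction, then an auditable entry in the change log.
sed 's/^single-threaded$/multi-threaded (std::thread worker pool)/' \
  module_net.md > tmp && mv tmp module_net.md
echo "feedback: module_net concurrency model corrected by Stage 2" >> change_log.md

grep -A1 '^## Concurrency Model' module_net.md
```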

Three-Stage Pipeline

The Agentic Code Assurance workflow instantiates all 9 principles across three sequential, contract-bound stages.

STAGE 01
Code Cognition
Build a rich semantic knowledge base of the target project. Decompose it into modules, then generate detailed reports capturing responsibilities, API surfaces, memory ownership models, data flow paths, and concurrency invariants. A structured decomposition review ensures quality before proceeding.
→ docs/codearch/
STAGE 02
Risk Assessment
Consume the knowledge base to perform targeted, semantic bug hunting. Apply a library of review patterns (memory safety, concurrency, I/O, error handling, performance) guided by the semantic context from Stage 1. Output a structured, prioritized task list of confirmed risk items.
→ docs/risk_tasks/task_list.md
STAGE 03
Bug Remediation
For each task: first write a failing test to confirm the bug exists, then implement the fix, then verify the test passes and the full regression suite is clean. Confirmed tests are permanently integrated. The circuit breaker ensures this stage only runs on a compilable, testable project.
→ docs/remediation/
1-code-cognition/Workflow.md — Decision Tree Example
## Quick Decision Tree

Q1: Does `docs/codearch/overall_report.md` exist and pass validation?
  Checklist:
  - [ ] File exists at the specified path
  - [ ] Contains a non-empty "Project Goals" section
  - [ ] Contains "Inputs", "Outputs", "Main Flow" sections
  - [ ] Contains "Information Sources" section
  → ALL PASS: proceed to Q2
  → ANY FAIL: execute Phase 01 (Project Overview)

Q2: Do module reports exist for all modules in the overall report?
  Command: grep -c "^## " docs/codearch/overall_report.md
  Check: count of module files in docs/codearch/modules/ matches
  → PASS: proceed to Q2b
  → FAIL: execute Phase 02 (Module Analysis)

Q2b: Does the decomposition pass the structured review?
  Run: skill-decomposition-review checklist
  → PASS: proceed to Q3
  → FAIL: rollback per decomposition_review.md rules

How It Compares

Agentic workflows with principled scaffolding address the fundamental limitations of existing approaches.

| Capability | Traditional Static Analysis (Coverity, Clang-Tidy) | Monolithic LLM Agent (SWE-agent, OpenDevin) | Principled Agentic Workflow (This Framework) |
| --- | --- | --- | --- |
| Semantic Understanding | Pattern-based only | ~ Context-limited | Deep semantic knowledge base |
| False Positive Rate | High — no intent model | Medium — hallucination risk | Low — knowledge-grounded analysis |
| Deterministic Behavior | Rule-based | Non-deterministic | Decision-tree driven |
| Verifiable Correctness | ~ Alerts only | No test enforcement | TDD-based verification |
| Scales to Large Codebases | File-by-file | Context window limits | Minimum context principle |
| Resumable / Idempotent | ✗ | Stateless per session | File-system state |
| Detects Logic Bugs | Syntax/pattern only | ~ Unreliable | Semantic path tracing |
| Debuggable Process | ~ Report only | Black box | Every decision is traceable |
| Accumulates Test Assets | ✗ | ✗ | Tests integrated permanently |

See It in Action

Explore a real-world implementation of these principles applied to C/C++ code quality and test refactoring.

⚙️
GITHUB · OPEN SOURCE
refactor-skills-for-tests
A practical implementation of the Agentic Workflow framework applied to C/C++ projects. Demonstrates how the 9 design principles translate into a concrete, file-based workflow specification that any LLM-based coding agent (Claude Code, OpenCode, etc.) can execute to analyze and improve code quality.
C/C++ Agentic Workflow Code Quality Claude Code OpenCode
Open on GitHub →
Recommended directory layout
# Root entry point
Workflow.md                     # Global decision tree & stage orchestration

# Stage 1: Code Cognition
1-code-cognition/
  Workflow.md                   # Stage decision tree (Q1 → Q2 → Q2b → Q3)
  phases/                       # 01-overview, 02-modules, 03-build-tests, 04-reports
  skills/                       # Detailed how-to for each phase task
  definitions/                  # complexity_levels, output_structure, cpp_notes…

# Stage 2: Risk Assessment
2-risk-assessment/
  Workflow.md
  phases/                       # 01-scope, 02-review, 03-summary
  skills/
  definitions/                  # risk_types, review_patterns, task_output_structure

# Stage 3: Bug Remediation
3-bug-remediation/
  Workflow.md
  phases/                       # 01-verify, 02-fix, 03-regression
  skills/
  definitions/                  # remediation_output_structure

# Generated knowledge base (output)
docs/
  codearch/overall_report.md    # Stage 1 → Stage 2 contract
  codearch/modules/             # Per-module semantic reports
  risk_tasks/task_list.md       # Stage 2 → Stage 3 contract
  remediation/                  # Fix reports & integrated tests

Apply This Framework

Adopt these principles to build your own domain-specific agentic workflow in four steps.

1
Define Your Stages & Contracts
Break your task into sequential stages. For each stage, define its input contract (what files must exist and what they must contain) and its output contract (what files it will produce). These contracts are the backbone of your workflow.
Stage 1 Output Contract: docs/knowledge_base/*.md
Stage 2 Input Contract: docs/knowledge_base/ must exist & be valid
Stage 2 Output Contract: docs/task_list.md
2
Write the Decision Tree for Each Stage
For each stage, write a Workflow.md with a Q&A decision tree. Each question must be answerable by a shell command or file check — never by the LLM's opinion. Include the exact verification commands in a "Judgment Basis" subsection.
Q1: Does docs/knowledge_base/index.md exist?
Command: [ -f docs/knowledge_base/index.md ] && echo YES || echo NO
→ YES: proceed to Q2
→ NO: execute Phase 01
3
Decompose into Phases and Skills
For each stage, write Phase documents (what to do, acceptance criteria) and Skill documents (how to do it, step by step). Extract reusable knowledge into Definition documents. Keep Skills focused — they should be loadable on-demand, not preloaded.
4
Add Gatekeepers & Feedback Loops
Identify the critical preconditions for your workflow (e.g., "data must be valid", "environment must be configured"). Add hard circuit breakers that halt the workflow with a clear error if these fail. Add structured review checkpoints that can trigger controlled rollbacks to refine earlier outputs.

Frequently Asked Questions

Is this framework specific to C/C++ code analysis?
No. The 9 design principles are domain-agnostic. The Agentic Code Assurance workflow is one instantiation applied to C/C++ security analysis. The same framework can be applied to scientific literature review, legal contract analysis, data pipeline validation, or any complex multi-step task requiring reliability.
Which LLM agents are compatible with this workflow format?
Any agent that can read files and execute shell commands. This includes Claude Code, OpenCode, SWE-agent, OpenDevin, Devin, Cursor, and similar tools. The workflow is defined in plain Markdown files, making it universally compatible. The agent reads the Workflow.md, follows the decision tree, and loads Phase/Skill documents as instructed.
How does this differ from a simple system prompt?
A system prompt is a monolithic instruction loaded once. This framework is a structured, multi-file specification where instructions are loaded on-demand based on the current state. This enables the Minimum Context Principle, allows the workflow to scale to arbitrarily complex tasks, and makes the process transparent, debuggable, and resumable — none of which are possible with a single prompt.
What is the upfront cost of designing a workflow this way?
Significant domain expertise is required to define the stages, contracts, and review patterns. This is not a plug-and-play solution. It is a framework for experts to encode their process knowledge so that an agent can execute it reliably and repeatedly. The investment pays off when the workflow needs to run many times, or when reliability and verifiability are non-negotiable requirements.
Can multiple agents collaborate on a single workflow?
Yes. Because all state is persisted to the file system via explicit contracts, different agents (or humans) can work on different stages independently. Agent A can execute Stage 1 and produce the knowledge base; Agent B can then consume it to execute Stage 2. The contract-based design enables this kind of parallel and collaborative execution naturally.
What happens when the LLM makes a mistake mid-workflow?
The decision tree is idempotent. If a stage produces incorrect output, the output file either fails the next decision tree check (triggering a re-run) or is corrected via the feedback loop mechanism. The workflow can always be re-entered from the beginning; it will automatically detect the current state and resume from the correct point. No work is lost.

Start Building Reliable Agents

Explore the open-source example, study the workflow structure, and apply these principles to your own domain.