Principled scaffolding for reliable, deterministic AI agents.
A structured framework for designing agentic workflows that are deterministic, verifiable, and scalable. Move beyond prompt engineering — engineer the process.
The framework was created to address a fundamental limitation of today’s AI agents: non-determinism and lack of verifiability. Here is the problem we solve and where this approach sits among existing methods.
Powerful LLMs have enabled autonomous agents (e.g. SWE-agent, Devin) that promise to automate complex tasks. The dominant design puts the LLM at the center of the cognitive loop—entrusting it with both task execution and high-level planning. That creates a core weakness: LLMs are probabilistic and opaque, so agent behavior is inherently non-deterministic. The same input can succeed today and fail tomorrow. This blocks adoption in mission-critical settings where we need reliability and verifiability—for example, modifying a production codebase.
We shift control out of the LLM and into a structured, machine-readable workflow. The agent’s “brain” is a deterministic engine that runs a predefined plan based on explicit, file-system state. The LLM becomes a controlled tool—a Skill—invoked by that engine for specific, well-defined tasks. The framework defines a standard Workflow Unit: five components (Workflow.md, phases/, skills/, definitions/, templates/) and a Deterministic Decision Tree that branches only on file/directory existence. It also supports workflow composition: master workflows orchestrate sub-workflows with serial execution, gates, and iterative loops. The result is a process that combines LLM capability with the reliability of classical software engineering.
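A minimal sketch of such a deterministic engine, assuming a hypothetical `next_phase` helper and the `docs/codearch/` layout described later in this document (the `modules/` subdirectory name is illustrative). Every branch tests only explicit file-system state, never LLM output:

```python
from pathlib import Path

def next_phase(root):
    """Deterministic decision tree: each branch tests explicit
    file-system state, so the same state always selects the same phase."""
    docs = Path(root) / "docs" / "codearch"
    if not (docs / "overall_report.md").exists():
        return "phases/01-overview.md"   # Q1: overall report missing
    if not (docs / "modules").is_dir():
        return "phases/02-analysis.md"   # Q2: module reports missing
    return None                          # all required artifacts exist: done
```

Re-running the engine on the same directory tree always yields the same phase, which is what makes the workflow resumable and idempotent.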
The chart below places Agentic Workflow relative to other approaches. The horizontal axis is determinism (low → high); the vertical axis is scope (single-task → end-to-end). We aim for high determinism and broad scope—reliable, verifiable, end-to-end workflows—which is where Agentic Workflow (and similar designs) sit, distinct from LLM-driven agents (broad scope, lower determinism) and traditional static analysis (high determinism, narrower scope).
A standard workflow is a directory with five kinds of components that together define a complete, executable process. The same structure also supports workflow composition: multiple self-contained sub-workflows can be orchestrated by a master workflow to run in sequence, with gates and iterative loops.
- Workflow.md (entry point) — The brain of the unit. It holds a deterministic decision tree that branches only on file/directory existence (explicit state), not on LLM output. It decides which phases run and when to move on.
- phases/ (execution steps) — An ordered sequence of Markdown files (e.g. 01-overview.md, 02-analysis.md). Each phase defines a sub-goal and orchestrates the skills needed to achieve it.
- skills/ (atomic capabilities) — One Markdown file per atomic task, often one LLM call. A skill is stateless: it takes context, applies logic (e.g. a detailed prompt), and produces an output. Skills are the "how" that phases invoke.
- definitions/ (domain knowledge) — Stable, reusable knowledge (coding standards, review patterns, output schemas, validation rules). Skills reference these so that knowledge lives outside prompts and is easy to update.
- templates/ (output formats) — Structures for what skills produce (e.g. report templates). They keep outputs consistent and machine-readable.

Together: Workflow orchestrates phases → phases invoke skills → skills use definitions and templates to produce structured output.
When a phase runs, it delegates to one or more skills. Each skill reads the current context, pulls in the relevant definitions and templates, and writes structured output. The flow is: Phase → Skill → (Definitions + Templates) → structured output.
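A sketch of one skill invocation in Python. The file names (`review_patterns.md`, the template path) and the `run_skill` signature are hypothetical; the point is the shape: a stateless function that reads context plus shared definitions, makes one well-scoped LLM call, and writes structured output to an explicit location:

```python
import json
from pathlib import Path

def run_skill(name, context, definitions_dir, template_path, out_path, llm):
    """Stateless skill: pull in definitions and the output template,
    make one well-scoped LLM call, write structured output to disk."""
    rules = Path(definitions_dir, "review_patterns.md").read_text()
    template = Path(template_path).read_text()
    prompt = f"{rules}\n\n{template}\n\nContext:\n{context}"
    result = llm(prompt)                          # the only probabilistic step
    Path(out_path).write_text(json.dumps({"skill": name, "output": result}))
    return result
```

Because the skill is stateless and its output lands on disk, the decision tree can verify completion by checking for the output file rather than trusting the LLM's self-report.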
Sub-workflows can each be a full workflow unit (with their own Workflow.md, phases/, skills/, etc.). A master workflow treats each sub-workflow as a single step and can run them serially, insert verification gates between them, and loop a sub-workflow until its gate passes.
definitions/ can be inherited by all sub-workflows, which may add their own local definitions. For a detailed walkthrough with real file layouts, see the Example section and the Agentic Code Assurance repository.
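The master-workflow pattern can be sketched as a small orchestration loop. `run_sub`, `gate`, and `max_iterations` are hypothetical names; the gate is any deterministic check of the sub-workflow's on-disk artifacts:

```python
def run_master(sub_workflows, run_sub, gate, max_iterations=3):
    """Serial composition: run each sub-workflow to completion, then apply a
    deterministic gate; on gate failure the same sub-workflow is re-run."""
    for sub in sub_workflows:
        for _ in range(max_iterations):
            run_sub(sub)                 # execute the whole sub-workflow unit
            if gate(sub):                # gate inspects artifacts, not LLM text
                break
        else:
            raise RuntimeError(f"gate for {sub!r} still failing "
                               f"after {max_iterations} iterations")
```

The iteration cap turns an otherwise open-ended retry loop into a bounded, reportable failure, which matters for unattended runs.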
Each principle addresses a specific failure mode of naive agentic systems. Together, they form a complete framework for reliable, production-grade workflows.
docs/codearch/ is the contract between Stage 1 and Stage 2; task_list.md is the contract between Stage 2 and Stage 3.
If build_success=false or tests_runnable=false, the entire workflow halts immediately with a clear error report. There is no point in analyzing code that cannot be compiled or tested.
A comprehensive examination of each design principle — the rationale, design significance, and how it manifests in a real workflow.
Workflow.md begins with a quick decision tree (Q1, Q2, Q3...) where each question provides clear Yes/No branches that map directly to specific actions. Each question is paired with an independent "Judgment Basis" section containing exhaustive checklists and executable verification commands.

In 1-code-cognition/Workflow.md, Q1 "Does the overall report exist?" has four explicit checks: (1) docs/codearch/overall_report.md exists; (2) it contains a non-empty "Project Goals" section; (3) it has "Inputs", "Outputs", "Main Flow" sections; (4) it has an "Information Sources" section. If any check fails, Phase 01 is triggered — zero ambiguity.
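The Q1 judgment basis could be mechanized as below. This is a simplified sketch: it assumes `## `-style Markdown headings and, slightly stricter than the checks above, requires a non-empty body under every required section, not just "Project Goals":

```python
import re
from pathlib import Path

REQUIRED_SECTIONS = ["Project Goals", "Inputs", "Outputs",
                     "Main Flow", "Information Sources"]

def overall_report_complete(path):
    """Judgment basis for Q1: the report file exists and every required
    section heading is present with a non-empty body beneath it."""
    p = Path(path)
    if not p.exists():
        return False                                   # check (1)
    text = p.read_text()
    # Map each "## Heading" to the text below it, up to the next heading.
    sections = dict(re.findall(r"^## (.+?)\n(.*?)(?=^## |\Z)",
                               text, re.M | re.S))
    return all(sections.get(s, "").strip() for s in REQUIRED_SECTIONS)
```

A boolean check like this is exactly what lets the decision tree branch with zero ambiguity.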
docs/codearch/ represents Stage 1 completion, docs/risk_tasks/ represents Stage 2 output, and docs/remediation/ represents Stage 3 artifacts. Every key artifact has a corresponding structure definition document specifying required sections, fields, and downstream usage conventions.

task_output_structure.md defines not only the required fields for each task record (location, description, risk type, related module, reasoning chain, excluded protections, preconditions to verify, impact level), but also a "Downstream Usage Conventions" section that explicitly guides Stage 3 on how to consume this information — how to locate code, understand reasoning chains, and design targeted verification tests, ensuring lossless and efficient cross-stage information transfer.
A 100-module project might exceed hundreds of thousands of tokens if all module reports are combined. Under this principle, when analyzing a specific bug, the agent loads only overall_report.md (~2k tokens) and 1–2 relevant module reports (~5k tokens each) — keeping total consumption extremely low at approximately 12k tokens per query.
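The minimum-context principle sketched in Python, assuming the hypothetical layout `docs/codearch/overall_report.md` plus per-module reports under a `modules/` subdirectory:

```python
from pathlib import Path

def load_context(docs_dir, relevant_modules):
    """Minimum-context loading: always include the small overview, then only
    the module reports that the current query actually touches."""
    docs = Path(docs_dir)
    parts = [(docs / "overall_report.md").read_text()]
    for module in relevant_modules:
        parts.append((docs / "modules" / f"{module}.md").read_text())
    return "\n\n".join(parts)
```

Context cost therefore scales with the number of relevant modules, not with the size of the codebase.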
02-review.md (Phase) states the task is "perform deep review." The review method is detailed in skill-02-review.md (Skill) — covering batching strategy, depth determination, and pattern application. "Review patterns" are abstracted into review_patterns.md (Definition), loaded on demand by the Skill. Feedback conventions are defined centrally in the root definitions/feedback_protocol.md, referenced by all stages via link.
When reviewing concurrency race conditions (Pattern C-1: Lock Order Consistency), the agent does not simply search for lock keywords. Instead, it loads "concurrency invariants" from the module report, identifies the agreed lock acquisition order (e.g., lock A before lock B), then searches the code for reverse acquisition paths. This agreement-based check is far more precise than undirected scanning.
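A toy sketch of such an agreement-based check. The `lock(X)`/`unlock(X)` call syntax is a deliberate simplification (real C++ would need an AST, not a regex); the point is that the detector is parameterized by the documented lock order rather than scanning blindly:

```python
import re

def reversed_lock_acquisitions(source, agreed_order):
    """Agreement-based concurrency check: given the module's documented lock
    order, flag places that acquire two locks in the reverse order."""
    rank = {lock: i for i, lock in enumerate(agreed_order)}
    held, violations = [], []
    for m in re.finditer(r"lock\((\w+)\)|unlock\((\w+)\)", source):
        if m.group(1):                               # lock(X)
            for h in held:
                if rank.get(m.group(1), -1) < rank.get(h, -1):
                    violations.append((h, m.group(1)))   # X taken after h: reversed
            held.append(m.group(1))
        elif m.group(2) in held:                     # unlock(X)
            held.remove(m.group(2))
    return violations
```

Feeding the detector the invariant from the module report is what keeps its precision high compared with undirected keyword search.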
For a suspected buffer overflow, the agent writes a test_buffer_overflow test passing an overly long string. If the program crashes, the bug is "confirmed." After the fix (e.g., adding length validation), the verification test passes. A full regression suite run confirms no side effects, and the test is permanently integrated into the official test suite.
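A Python stand-in for that verification test (the real target is a C function; `parse_field` and `max_len` are hypothetical). The test encodes the expectation that over-long input is now rejected cleanly rather than crashing:

```python
def parse_field(buf, max_len=64):
    """Post-fix parser: rejects over-long input instead of overflowing.
    (Stand-in for the patched C function; max_len is illustrative.)"""
    if len(buf) > max_len:
        raise ValueError("input exceeds maximum field length")
    return buf.strip()

def test_buffer_overflow():
    """The confirming test: an overly long string must be rejected
    cleanly, not crash the program."""
    try:
        parse_field("A" * 10_000)
    except ValueError:
        return True      # fix verified: bad input handled gracefully
    return False         # still vulnerable (or silently accepting)
```

Once it passes, the test joins the permanent suite, so the same bug cannot silently regress.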
After the agent's first module decomposition, the structured review discovers that the utils module is overly broad. The review fails, generating a change list: "split utils into string_utils, net_utils, math_utils." The workflow rolls back to Phase 02 to regenerate reports for the three new modules, then reviews again — repeating until the decomposition is sound.
If the build fails (build_success=false) or unit tests cannot run (tests_runnable=false), the entire workflow halts immediately with a clear error report and blocking reason.

During Stage 1 Phase 03, the agent attempts compilation but fails due to a missing dependency. build_and_tests.md marks build_success as false. The decision tree evaluates Q3 as "No," triggering the hard gate. The agent reports the error and halts the workflow, waiting for the user to fix the build environment before re-executing.
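The hard gate reduces to a few lines once the flags live in an explicit artifact. This sketch assumes build_and_tests.md records them as plain `key=value` lines (the storage format is an assumption; the document only names the flags):

```python
from pathlib import Path

def enforce_build_gate(report_path):
    """Hard gate: parse explicit key=value flags from the build report and
    halt the whole workflow if the environment is broken."""
    text = Path(report_path).read_text()
    flags = dict(line.split("=", 1) for line in text.splitlines() if "=" in line)
    if flags.get("build_success") != "true":
        raise SystemExit("HALT: build failed; fix the environment and re-run")
    if flags.get("tests_runnable") != "true":
        raise SystemExit("HALT: unit tests cannot run; see blocking reason")
```

Halting here is deliberate: downstream analysis of uncompilable code would only produce noise.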
During Stage 2 review, the agent discovers that a module's concurrency model description doesn't match the actual code — the report marks it as "single-threaded," but the code uses std::thread. The agent immediately updates the module report's "Concurrency Model" section and records this feedback in the change log, keeping the knowledge base accurate for downstream consumers.
The Agentic Code Assurance workflow instantiates all 9 principles across three sequential, contract-bound stages.
Agentic workflows with principled scaffolding address the fundamental limitations of existing approaches.
| Capability | Traditional Static Analysis (Coverity, Clang-Tidy) | Monolithic LLM Agent (SWE-agent, OpenDevin) | Principled Agentic Workflow (This Framework) |
|---|---|---|---|
| Semantic Understanding | ✗ Pattern-based only | ~ Context-limited | ✓ Deep semantic knowledge base |
| False Positive Rate | High — no intent model | Medium — hallucination risk | Low — knowledge-grounded analysis |
| Deterministic Behavior | ✓ Rule-based | ✗ Non-deterministic | ✓ Decision-tree driven |
| Verifiable Correctness | ~ Alerts only | ✗ No test enforcement | ✓ TDD-based verification |
| Scales to Large Codebases | ✓ File-by-file | ✗ Context window limits | ✓ Minimum context principle |
| Resumable / Idempotent | ✓ | ✗ Stateless per session | ✓ File-system state |
| Detects Logic Bugs | ✗ Syntax/pattern only | ~ Unreliable | ✓ Semantic path tracing |
| Debuggable Process | ~ Report only | ✗ Black box | ✓ Every decision is traceable |
| Accumulates Test Assets | ✗ | ✗ | ✓ Tests integrated permanently |
Explore a real-world implementation of these principles applied to C/C++ code quality and test refactoring.
Adopt these principles to build your own domain-specific agentic workflow in four steps.
Write a Workflow.md with a Q&A decision tree. Each question must be answerable by a shell command or file check — never by the LLM's opinion. Include the exact verification commands in a "Judgment Basis" subsection.

Explore the open-source example, study the workflow structure, and apply these principles to your own domain.