Execution Model

Anvil executes declarative task workflows across one or more AWS organizations, many accounts within each organization, and one or more configured AWS regions.

At a high level:

Each target is defined independently in YAML.
Each target declares its own profile, role, regions, worker limits, region concurrency, task graph, account filters, dry-run behavior, fail-fast behavior, and metadata.
Each YAML can declare max_parallel_targets to bound how many configured targets may execute at once.
Anvil validates YAML against the packaged JSON Schema and semantic target rules before execution starts.
Organization targets authenticate, create an organization-scoped base session, discover eligible accounts, discover region statuses, validate configured regions, and build the effective account execution set.
Accounts execute in parallel within a target, bounded by max_workers.
Within an account, tasks execute in dependency order inside each effective region. Regions run serially by default, or concurrently up to max_parallel_regions.
Results are captured at task, account, target, and engine scope.

This model is designed for workflows that need consistent execution across multiple AWS organizations while still respecting account boundaries, region-specific service presence, and per-organization execution settings.
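As a rough illustration of what "dependency order" means for a task graph, the sketch below resolves a small graph with Python's standard graphlib module. The task names and the dictionary shape are hypothetical and are not Anvil's YAML schema or its actual resolver.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
task_graph = {
    "inventory": set(),
    "validate": {"inventory"},
    "report": {"validate"},
}

def resolve_order(graph):
    """Return task names so that every task follows its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())

order = resolve_order(task_graph)
print(order)
```

Resolving the order once per target, before any account work starts, would let a cyclic graph fail early rather than partway through a run.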

Flow

flowchart TD
    A["Run command"] --> B["Load YAML"]
    B --> C["Start target pipeline"]

    C --> D["Prepare targets in parallel<br/>bounded by<br/>max parallel targets"]
    D --> E{"Target prepared"}
    E --> F["Auth check"]
    F --> G{"Auth OK?"}

    G -->|No| H["Record auth result<br/>skip execution"]
    G -->|Yes| I["Apply run-time overrides"]
    I --> J["Resolve task graph"]
    J --> K["Build execution context"]
    K --> L["Ready queue"]

    L --> M{"Execution slot open<br/>and org not already active?"}
    M -->|No| N["Wait in ready queue"]
    M -->|Yes| O{"Target type?"}

    O -->|Organization| P1
    O -->|Accounts| Q1

    subgraph LEFT["Organization target"]
        direction TD
        P1["Create base session"]
        P1 --> P2["Read org identity"]
        P2 --> P3["Discover active accounts"]
        P3 --> P4["Discover region statuses"]
        P4 --> P5["Validate configured regions"]
        P5 --> P6["Apply include/exclude filters"]
        P6 --> P7["Build account list"]
    end

    subgraph RIGHT["Explicit accounts target"]
        direction TD
        Q1["Create base session"]
        Q1 --> Q2["Read explicit account list"]
        Q2 --> Q3["Build account list"]
    end

    P7 --> R["Create account worker pool"]
    Q3 --> R

    R --> S["Dispatch accounts in parallel<br/>bounded by per-target max workers"]
    S --> T["Worker executes one account"]

    T --> U{"Management account?"}
    U -->|Yes| V["Reuse worker session<br/>for region"]
    U -->|No| W["Assume role once<br/>for account"]
    W --> X["Create region session<br/>from assumed credentials"]

    V --> C1["Wrap account-region session<br/>with lazy client cache"]
    X --> C1
    C1 --> Y["Run tasks by region<br/>in dependency order"]

    Y --> YA{"More tasks or regions?"}
    YA -->|Yes| Y
    YA -->|No| Z{"Failure with fail-fast?"}

    Z -->|No| AA["Continue account work"]
    Z -->|Yes| AB["Set cancellation signal"]
    AB --> AC["Stop pending account work"]

    AA --> AD["Account result"]
    AC --> AD

    AD --> AE["Target result"]
    AE --> AF["Release org slot if needed"]
    AF --> AG["Record target result<br/>in input order"]

    H --> AH{"More prep or<br/>execution work?"}
    N --> AH
    AG --> AH
    AH -->|Yes| E
    AH -->|No| AI["Build ordered auth results"]
    AI --> AJ["Build ordered target results"]
    AJ --> AK["Compute engine state"]
    AK --> AL["Return engine result"]

Multi-Organization Execution

Anvil supports multiple organizations in a single run. Each target is treated as an independent execution context with its own:

AWS profile
target regions
role name
include or exclude account filters
target-level YAML concurrency through max_parallel_targets
worker concurrency
region concurrency through max_parallel_regions
dry-run behavior
fail-fast setting
task definitions
metadata

This allows one execution to coordinate work across separate AWS environments without forcing them into a shared credential model or shared runtime configuration.

When one YAML contains multiple targets that resolve to the same AWS organization, Anvil reuses organization discovery results during that run. The first target to discover active accounts and region statuses populates a run-local cache keyed by organization ID. Concurrent preparation for the same organization waits for that in-flight discovery instead of issuing duplicate list_accounts and list_regions calls.

Target execution is still serialized per organization later in the pipeline, so two same-organization targets do not execute account work at the same time.
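The run-local discovery cache described above can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical DiscoveryCache class and a caller-supplied discover callable; it is not Anvil's implementation. The first caller for an organization ID performs discovery, and concurrent callers wait on the same in-flight result.

```python
import threading
from concurrent.futures import Future

class DiscoveryCache:
    """Run-local cache: one discovery per organization ID, shared by waiters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._futures = {}  # org_id -> Future holding the discovery result

    def get_or_discover(self, org_id, discover):
        with self._lock:
            future = self._futures.get(org_id)
            owner = future is None
            if owner:
                future = Future()
                self._futures[org_id] = future
        if owner:
            # Only the first caller runs discovery, outside the lock.
            try:
                future.set_result(discover())
            except Exception as exc:
                future.set_exception(exc)
        # Other callers block here until the in-flight discovery finishes.
        return future.result()

calls = []

def fake_discover():
    # Stand-in for the real list_accounts / list_regions work.
    calls.append(1)
    return {"accounts": ["111111111111"], "regions": ["us-east-1"]}

cache = DiscoveryCache()
first = cache.get_or_discover("o-example", fake_discover)
second = cache.get_or_discover("o-example", fake_discover)
```

In this sketch a failed discovery is cached as well, so later targets for the same organization see the same exception. Whether Anvil retries in that case is not stated in this document.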

Multi-Region Execution

Configured regions are part of the execution scope rather than a single global default. During organization startup, Anvil validates configured regions against the regions enabled for that organization and executes only in the configured regions that pass validation (the effective regions).
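A minimal sketch of that validation step, assuming region statuses arrive as a mapping from region name to an opt-in status string such as those reported by the AWS Account API. The function name and the handling of rejected regions are assumptions for illustration.

```python
# Opt-in statuses treated as usable in this sketch.
ENABLED_STATUSES = {"ENABLED", "ENABLED_BY_DEFAULT"}

def effective_regions(configured, region_statuses):
    """Split configured regions into (effective, rejected), keeping config order."""
    effective, rejected = [], []
    for region in configured:
        if region_statuses.get(region) in ENABLED_STATUSES:
            effective.append(region)
        else:
            rejected.append(region)
    return effective, rejected

statuses = {
    "us-east-1": "ENABLED_BY_DEFAULT",
    "eu-west-1": "ENABLED_BY_DEFAULT",
    "af-south-1": "DISABLED",
}
effective, rejected = effective_regions(
    ["us-east-1", "af-south-1", "eu-west-1"], statuses
)
```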

Task execution occurs per account and per region, and task results include the region they ran in. This makes region-specific inventory, validation, enforcement, and reporting workflows easier to reason about and audit from structured output.

By default, regions execute serially within each account. A target can set max_parallel_regions from 1 through 4 to run multiple regions for the same account concurrently while preserving task dependency order inside each region.

Use parallel regions for workloads where each region has enough independent work to benefit from overlap, such as long paginated inventory, deep regional checks, slow service-specific scans, or multiple regional tasks that call different AWS services.

For lightweight describe/list tasks across many accounts, region parallelism can increase AWS API pressure enough that each regional call slows down. In those cases, leave max_parallel_regions at 1 and rely first on account-level concurrency.

Region scheduling is intentionally strict. Anvil only starts up to max_parallel_regions regions at a time for one account. If a non-optional task fails in one region, regions that have not started are left unstarted, while already-running regions stop cooperatively before their next task. Even when regions finish out of order, task results are returned in configured region order and task order.
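The scheduling rules above can be approximated with a thread pool and a shared stop flag. This is a simplified sketch under stated assumptions: run_regions and run_task are hypothetical names, every task is treated as non-optional, and a task reports success as a boolean.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_regions(regions, tasks, run_task, max_parallel_regions=1):
    """Run ordered tasks per region; return results in region, then task order."""
    stop = threading.Event()

    def run_region(region):
        results = []
        for task in tasks:
            if stop.is_set():
                break  # unstarted regions do nothing; running ones stop here
            ok = run_task(region, task)
            results.append((region, task, ok))
            if not ok:
                stop.set()
                break
        return results

    with ThreadPoolExecutor(max_workers=max_parallel_regions) as pool:
        # map() yields results in input order even if regions finish out of order.
        per_region = list(pool.map(run_region, regions))
    return [row for rows in per_region for row in rows]

demo = run_regions(
    ["us-east-1", "eu-west-1", "ap-south-1"],
    ["inventory", "validate"],
    run_task=lambda region, task: region != "eu-west-1",
)
```

With the default of one region at a time, the failure in eu-west-1 leaves ap-south-1 unstarted, so demo contains only the two us-east-1 results and the failed eu-west-1 task.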

Account Selection

After discovering active accounts in an organization, Anvil applies optional include or exclude filters to determine the final execution set.

If an include or exclude list references unknown account IDs, Anvil warns but continues with valid discovered accounts that remain. This helps catch stale configuration without turning harmless selection drift into a hard failure.
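The warn-and-continue behavior could look roughly like this. The function name, logger name, and argument shapes are hypothetical; the sketch only illustrates filtering discovered accounts while warning about IDs that were never discovered.

```python
import logging

log = logging.getLogger("anvil.example")  # hypothetical logger name

def select_accounts(discovered, include=None, exclude=None):
    """Apply include or exclude filters, warning about unknown account IDs."""
    known = set(discovered)
    for label, ids in (("include", include), ("exclude", exclude)):
        unknown = sorted(set(ids or ()) - known)
        if unknown:
            log.warning("%s list references unknown account IDs: %s", label, unknown)
    include_set = set(include or ())
    exclude_set = set(exclude or ())
    # Preserve discovery order so later results stay predictable.
    return [
        account for account in discovered
        if (not include_set or account in include_set)
        and account not in exclude_set
    ]

discovered = ["111111111111", "222222222222", "333333333333"]
included = select_accounts(discovered, include=["111111111111", "999999999999"])
remaining = select_accounts(discovered, exclude=["222222222222"])
```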

Bounded Parallel Account Execution

Accounts execute concurrently within an organization through a bounded worker pool controlled by target configuration. The max_workers setting controls how many account executions may run at the same time for a target.

Account work is submitted to the worker pool up front, and the executor runs up to max_workers accounts at a time. If fail-fast is enabled, Anvil signals cancellation and cancels pending account futures where possible. Accounts already running stop cooperatively when they observe the cancellation signal before starting another task.
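A minimal sketch of this pattern using concurrent.futures, assuming hypothetical run_accounts and run_task names and a boolean task outcome. All accounts are submitted up front, the pool bounds concurrency, and a shared event carries the cancellation signal.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_accounts(accounts, tasks, run_task, max_workers=4, fail_fast=True):
    """Run every account through tasks; return {account: state} in input order."""
    cancel = threading.Event()

    def run_account(account):
        for task in tasks:
            if cancel.is_set():
                return "cancelled"  # observed the signal before the next task
            if not run_task(account, task):
                if fail_fast:
                    cancel.set()
                return "failed"
        return "succeeded"

    states = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the executor runs max_workers at a time.
        futures = {pool.submit(run_account, a): a for a in accounts}
        for future in as_completed(futures):
            account = futures[future]
            if future.cancelled():
                states[account] = "cancelled"
                continue
            states[account] = future.result()
            if fail_fast and states[account] == "failed":
                for pending in futures:
                    pending.cancel()  # no effect on running or finished futures
    return {a: states[a] for a in accounts}

states = run_accounts(
    ["a", "b", "c"], ["t1", "t2"],
    run_task=lambda account, task: account != "b",
    max_workers=1,
)
```

Future.cancel() only succeeds for work that has not started, which is why running accounts also need to check the event themselves.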

When max_parallel_regions is greater than 1, the approximate number of concurrent account-region task streams per target is:

max_workers * max_parallel_regions

Across multiple targets, the rough upper bound is:

max_parallel_targets * max_workers * max_parallel_regions
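As a worked example of these bounds, the helper below multiplies the three settings. The numeric values are made up for illustration and are not recommended defaults.

```python
def stream_upper_bound(max_parallel_targets, max_workers, max_parallel_regions=1):
    """Rough upper bound on concurrent account-region task streams."""
    return max_parallel_targets * max_workers * max_parallel_regions

per_target = stream_upper_bound(1, 8, 2)      # one target, 8 workers, 2 regions
across_targets = stream_upper_bound(3, 8, 2)  # three targets running at once
print(per_target, across_targets)
```

Raising any one of the three settings scales the bound multiplicatively, so a small change to max_parallel_regions can noticeably raise total AWS API pressure.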

Benchmark concurrency changes with the same target count and task mix you plan to run in production.

Fail-Fast and Cancellation

When fail-fast is enabled, the first unsuccessful account result causes Anvil to signal cancellation to the rest of that organization run and cancel pending work where possible.

Cancellation is cooperative rather than forceful. Accounts already in progress run until they observe the shared cancellation signal, then stop before starting another task.

For example, in a run with 50 accounts, 3 regions per account, and 5 tasks per region:

Full run without fail-fast: 50 x 3 x 5 = 750 task runs.
Fail-fast enabled: Anvil signals cancellation across the organization, and running accounts check that signal before starting the next task.

Session and Credential Model

Anvil separates organization-level session creation, worker-session reuse, and member-account role assumption.

Organization-Scoped Session Setup

Each organization creates a base boto3 session for organization-level control-plane work such as account discovery, region validation, and management-account lookup. This base session is not the account execution session; it is the organization-scoped entry point for discovery and orchestration.

Thread-Local Worker Sessions

For worker execution, Anvil uses thread-local boto3 sessions keyed by profile and region. This allows worker threads to reuse appropriately scoped sessions without sharing session objects across threads and without mixing profile or region context between organizations.

Thread-local worker sessions:

prevent profile or region context from being mixed together
avoid recreating the same worker session repeatedly inside the same worker thread
keep threading concerns in the session layer instead of spreading them across organization and account execution code
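One way to picture thread-local sessions keyed by profile and region is the sketch below. ThreadLocalSessions and its factory argument are hypothetical; with boto3 the factory could be something like lambda profile, region: boto3.Session(profile_name=profile, region_name=region). The demo uses a stand-in factory so it runs without AWS credentials.

```python
import threading

class ThreadLocalSessions:
    """Per-thread session cache keyed by (profile, region)."""

    def __init__(self, factory):
        self._factory = factory          # callable(profile, region) -> session
        self._local = threading.local()  # separate attribute storage per thread

    def get(self, profile, region):
        cache = getattr(self._local, "sessions", None)
        if cache is None:
            cache = self._local.sessions = {}
        key = (profile, region)
        if key not in cache:
            cache[key] = self._factory(profile, region)
        return cache[key]

sessions = ThreadLocalSessions(lambda profile, region: object())
main_a = sessions.get("prod", "us-east-1")
main_b = sessions.get("prod", "us-east-1")

from_other_thread = []
worker = threading.Thread(
    target=lambda: from_other_thread.append(sessions.get("prod", "us-east-1"))
)
worker.start()
worker.join()
```

The same thread gets the same session back for a given key, while another thread gets its own object, so no session is shared across threads.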

Member-Account Role Assumption

For member accounts, Anvil assumes the configured role once per account execution and reuses the returned temporary credentials to construct region-scoped sessions for each effective region. This avoids repeating STS role assumption for every region while still giving each region run its own correctly scoped boto3 session.

Before each member-account region starts, Anvil checks whether shared assumed-role credentials are expired or too close to expiration. The safety window starts at five minutes, then expands during the account run based on the longest completed region duration plus a small buffer.

If credentials are inside that safety window, Anvil refreshes them before constructing the region session. Parallel region execution coordinates this refresh with a per-account lock so multiple region workers do not all re-assume the role at the same time.
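The expiry check and locked refresh can be sketched as follows. The five-minute starting window comes from the text above; the 30-second buffer, the class name, and the assume_role callable returning (credentials, expiration) are assumptions made for illustration.

```python
import threading
from datetime import datetime, timedelta, timezone

MIN_WINDOW = timedelta(minutes=5)  # starting safety window
BUFFER = timedelta(seconds=30)     # assumed value for the "small buffer"

def safety_window(longest_region_duration):
    """Window grows with the longest completed region run plus a buffer."""
    return max(MIN_WINDOW, longest_region_duration + BUFFER)

def needs_refresh(expiration, now, longest_region_duration=timedelta(0)):
    return expiration - now <= safety_window(longest_region_duration)

class AccountCredentials:
    """Shared assumed-role credentials for one account, refreshed under a lock."""

    def __init__(self, assume_role):
        self._assume_role = assume_role  # hypothetical: returns (creds, expiration)
        self._lock = threading.Lock()
        self._creds, self._expiration = assume_role()
        self.longest_region_duration = timedelta(0)

    def current(self, now=None):
        now = now or datetime.now(timezone.utc)
        with self._lock:  # only one region worker re-assumes the role
            if needs_refresh(self._expiration, now, self.longest_region_duration):
                self._creds, self._expiration = self._assume_role()
            return self._creds

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
calls = []

def fake_assume_role():
    # First credentials expire in 4 minutes, later ones in an hour.
    calls.append(1)
    lifetime = timedelta(minutes=4 if len(calls) == 1 else 60)
    return {"generation": len(calls)}, now + lifetime

creds = AccountCredentials(fake_assume_role)
refreshed = creds.current(now)
reused = creds.current(now)
```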

Management-Account Execution

Management accounts do not require role assumption. They execute directly with the organization/profile-backed worker session for each region.

Account-Region Client Caching

For task execution, Anvil wraps each account-region session with a small lazy client cache before passing it to tasks.

The cache scope is intentionally narrow: one account, one region, one ordered task stream. If two tasks in the same account-region both call session.client("ec2"), the first call creates the EC2 client and the second call reuses it. If a task calls a different service, or calls the same service with different client arguments, Anvil creates a separate client for that distinct call shape.

Client caching reduces repeated boto3 client construction, service model setup, endpoint setup, and connection pool churn. It does not reduce AWS API calls.
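A lazy client cache of this kind can be sketched as a thin wrapper around any object with a client() method. CachingSession and FakeSession are hypothetical names; the fake stands in for a boto3 session so the example runs without AWS access. The wrapper is not thread-safe, which matches the narrow scope of one ordered task stream.

```python
class CachingSession:
    """Wrap a session-like object and memoize client() by call shape."""

    def __init__(self, session):
        self._session = session
        self._clients = {}

    def client(self, service_name, **kwargs):
        # repr() lets unhashable argument values take part in the cache key.
        key = (service_name, tuple(sorted((k, repr(v)) for k, v in kwargs.items())))
        if key not in self._clients:
            self._clients[key] = self._session.client(service_name, **kwargs)
        return self._clients[key]

class FakeSession:
    """Stand-in that records how many clients were actually constructed."""

    def __init__(self):
        self.created = []

    def client(self, service_name, **kwargs):
        self.created.append((service_name, kwargs))
        return object()

fake = FakeSession()
wrapped = CachingSession(fake)
ec2_first = wrapped.client("ec2")
ec2_second = wrapped.client("ec2")
ec2_custom = wrapped.client("ec2", endpoint_url="https://example.invalid")
s3 = wrapped.client("s3")
```

Four client() calls produce three constructed clients here: the repeated plain "ec2" call is served from the cache, while the different arguments and the different service each get their own client.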

Result Model

Anvil records structured results at four layers:

Task result: includes region-specific task outcome data.
Account result: summarizes task outcomes for one account.
Target result: summarizes the selected accounts for one organization or account group.
Engine result: summarizes the entire multi-target run.

Layered, structured results let reviewers drill down from the engine summary to an individual task and give downstream tooling a predictable shape to process.
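To make the four layers concrete, the sketch below nests them as dataclasses and walks the structure to find failures. All class and field names are hypothetical; Anvil's actual result schema is not shown in this document.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class TaskResult:
    task: str
    region: str
    success: bool

@dataclass
class AccountResult:
    account_id: str
    tasks: list = field(default_factory=list)

@dataclass
class TargetResult:
    name: str
    accounts: list = field(default_factory=list)

@dataclass
class EngineResult:
    targets: list = field(default_factory=list)

def failed_tasks(engine):
    """Yield (target, account, region, task) for every failed task."""
    for target in engine.targets:
        for account in target.accounts:
            for result in account.tasks:
                if not result.success:
                    yield (target.name, account.account_id, result.region, result.task)

engine = EngineResult(targets=[
    TargetResult("org-a", [
        AccountResult("111111111111", [
            TaskResult("inventory", "us-east-1", True),
            TaskResult("inventory", "eu-west-1", False),
        ]),
    ]),
])
failures = list(failed_tasks(engine))
as_json = json.dumps(asdict(engine))  # nested dataclasses become plain dicts
```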