Implementation Plan
This page mirrors bifrost-implementation-plan.md at the repo root, which remains the canonical source.
Table of contents
- 1. Product principles (non-negotiables)
- 2. What we reuse vs what we build
- 3. Architecture (three planes)
- 4. Tech stack + rationale
- 5. Conversion + LLM pipeline (the core loop)
- 6. Risk model (deterministic, explainable)
- 7. Validation / equivalence (the credibility feature)
- 8. Build sequence (milestones)
- 9. Driving this in Claude Code
- 10. Open decisions (resolve before/early in M2)
Working codename: Bifrost (the bridge between worlds).
One-line thesis: An orchestration + intelligence layer that sits on top of GitHub’s existing
migration CLIs (gh actions-importer, GEI/ado2gh) and turns one-pipeline-at-a-time, syntactic,
CLI-only conversion into a portfolio-scale, semantically-reviewed, human-approved,
fully-documented migration from Azure DevOps to GitHub Actions — with a pluggable multi-model LLM
layer that can run fully air-gapped.
We do not rebuild repo migration or YAML translation. GitHub already does ~90% of the syntactic conversion. Bifrost owns the other 10%, the review/approval workflow, the portfolio orchestration, semantic-equivalence validation, and the audit trail — the parts Microsoft is structurally disincentivised to build well.
1. Product principles (non-negotiables)
- Review-first, not autonomous. The MVP recommends and explains; it never silently rewrites production CI. Auto-commit is opt-in and gated behind approval + validation.
- Air-gap capable. Pipeline YAML leaks infra topology, secret names, and internal hostnames. A customer must be able to run Bifrost with a local model (Ollama / llama.cpp) so no pipeline definition ever leaves their network. This is a headline selling point.
- Platform-agnostic by design. The Actions Importer already converts Jenkins, GitLab, Bamboo, Bitbucket, CircleCI, Travis. Bifrost’s source adapter is an interface; ADO is just the first implementation.
- Attestation-native. Every decision (who approved what, what changed, why, validation result) is recorded as a signed attestation. Compliance/audit export is a first-class feature.
- Wrap, don’t fork. We shell out to gh actions-importer and gh/ado2gh. We track their versions and parse their outputs. We never reimplement their conversion logic.
2. What we reuse vs what we build
| Capability | Owner | Bifrost’s role |
|---|---|---|
| Repo migration (history, branches, metadata) | GEI / ado2gh | Orchestrate + track only |
| Near-zero-downtime repo sync | ELM (MS, preview) | Out of scope v1; integrate later |
| Pipeline syntactic conversion (~90%) | gh actions-importer | Wrap audit/forecast/dry-run/migrate |
| Audit footprint report | gh actions-importer audit | Parse + aggregate to portfolio view |
| The unconverted ~10% | nobody | LLM-augmented gap fill + explanation |
| Semantic-equivalence validation | nobody | Trigger + diff against ADO baseline |
| Portfolio-scale orchestration | nobody | Job state machine + dashboard |
| Review / approve / edit UI | nobody | React portal w/ side-by-side diff |
| Migration documentation + attestation | nobody | Auto-generated runbook + signed record |
3. Architecture (three planes)
┌──────────────────────────────────────────────────────────────────────┐
│                          PORTAL (React/TS)                           │
│ portfolio heatmap · side-by-side diff · approve/edit · runbook view  │
└───────────────▲──────────────────────────────────────────────────────┘
                │ REST/SSE
┌───────────────┴──────────────────────────────────────────────────────┐
│                 CONTROL PLANE (Rust / axum + tokio)                  │
│ job state machine (PG) · conversion orchestrator · risk model ·      │
│ attestation + audit log                                              │
│                         │ LLM provider trait                         │
│ Anthropic (Claude) │ Gemini │ Copilot/llama.cpp │ Ollama (air-gap)   │
└───────────────▲──────────────────────────────────────────────────────┘
                │ shell-out (Docker)                  │ HTTP
┌───────────────┴────────────────┐  ┌─────────────────┴────────────────┐
│ INGESTION ADAPTERS             │  │ EXTERNAL                         │
│ gh actions-importer (Docker)   │  │ ADO REST API · GitHub API · GEI  │
│ source adapter trait (ADO →…)  │  │                                  │
└────────────────────────────────┘  └──────────────────────────────────┘
Components
- Source adapter trait: discover(), enumerate_pipelines(), fetch_definition(), fetch_service_connections(), fetch_variable_groups(), task_inventory(). First impl: AzureDevOpsAdapter (ADO REST API). Distinguish classic/designer pipelines from YAML pipelines; classic ones are the painful tail.
- Importer wrapper: runs the official Docker image; parses audit_summary.md (Successful / Partially successful / Unsupported counts, Manual-tasks list, Unsupported-steps list, actions allowlist) and per-pipeline dry-run YAML + logs.
- Conversion orchestrator: for each pipeline, run dry-run → diff Importer output against source → identify gaps → dispatch gaps to LLM → assemble augmented workflow + rationale + risk.
- Risk model: deterministic scoring (see §6) → Green / Amber / Red.
- Attestation + audit log: append-only record of every state transition and human action.
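As a rough illustration of the source adapter component, the sketch below shows two of the listed methods on a trait, plus an in-memory stand-in for AzureDevOpsAdapter. Everything beyond the method names enumerate_pipelines and fetch_definition is an assumption made for the example: PipelineRef, PipelineKind, InMemoryAdapter, partition_by_kind, the String error type, and the synchronous signatures.

```rust
/// Hypothetical pipeline kinds; the plan only says classic and YAML must be told apart.
#[derive(Debug, Clone, PartialEq)]
pub enum PipelineKind {
    Yaml,
    Classic, // designer pipelines: the painful tail
}

/// Hypothetical handle for one source pipeline.
#[derive(Debug, Clone, PartialEq)]
pub struct PipelineRef {
    pub id: u32,
    pub name: String,
    pub kind: PipelineKind,
}

/// Subset of the adapter surface named in the plan. Signatures are assumptions.
pub trait SourceAdapter {
    fn enumerate_pipelines(&self) -> Result<Vec<PipelineRef>, String>;
    fn fetch_definition(&self, pipeline: &PipelineRef) -> Result<String, String>;
}

/// In-memory stand-in so the sketch runs without calling the ADO REST API.
pub struct InMemoryAdapter {
    pub pipelines: Vec<(PipelineRef, String)>,
}

impl SourceAdapter for InMemoryAdapter {
    fn enumerate_pipelines(&self) -> Result<Vec<PipelineRef>, String> {
        Ok(self.pipelines.iter().map(|(p, _)| p.clone()).collect())
    }

    fn fetch_definition(&self, pipeline: &PipelineRef) -> Result<String, String> {
        self.pipelines
            .iter()
            .find(|(p, _)| p.id == pipeline.id)
            .map(|(_, definition)| definition.clone())
            .ok_or_else(|| format!("pipeline {} not found", pipeline.id))
    }
}

/// Split the portfolio into (YAML, classic) so classic pipelines can be flagged early.
pub fn partition_by_kind(
    adapter: &dyn SourceAdapter,
) -> Result<(Vec<PipelineRef>, Vec<PipelineRef>), String> {
    let all = adapter.enumerate_pipelines()?;
    Ok(all.into_iter().partition(|p| p.kind == PipelineKind::Yaml))
}
```

Keeping the orchestrator dependent only on the trait is what would let a later Jenkins or GitLab adapter slot in without touching the conversion loop.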
4. Tech stack + rationale
- Control plane: Rust (axum, tokio, sqlx). Strong for a long-running orchestrator shelling out to containers and fanning out concurrent LLM calls. SSE for live job progress to the portal.
- Persistence: Postgres (multi-tenant/server), SQLite (single-tenant/local/air-gap). sqlx over both.
- Ingestion: official gh actions-importer Docker image + ado2gh/gh invoked as subprocesses. Pin and record tool versions per job.
- LLM abstraction: LlmProvider trait with impls for Anthropic, Google Gemini, GitHub Copilot/Models, and an OpenAI-compatible local impl (Ollama / llama.cpp). Structured JSON output, prompt templates versioned in-repo.
- Portal: React + TypeScript + Vite + Tailwind, Monaco editor for the three-pane diff (ADO YAML | generated Actions YAML | LLM rationale).
- Auth: GitHub App (installation tokens, least-privilege) + Entra ID OIDC for portal SSO.
- Packaging: Docker Compose (self-host v1) → Helm chart later.
- Licence: MIT. Consider open-core: OSS engine, commercial compliance/attestation + multi-tenant control plane.
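A minimal sketch of how the LlmProvider abstraction could enforce air-gap mode. Only the trait name comes from the plan; the methods name, is_local and complete, the StubProvider type, and the enabled_providers helper are hypothetical. The trait is synchronous here to keep the example std-only; a tokio-based control plane would more likely make complete async.

```rust
/// Provider abstraction. Method names and signatures are assumptions.
pub trait LlmProvider {
    fn name(&self) -> &str;
    /// True when the model runs inside the customer's network (e.g. Ollama / llama.cpp).
    fn is_local(&self) -> bool;
    /// Send a prompt and return the raw model reply.
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stub used in place of real Anthropic / Gemini / Ollama clients.
pub struct StubProvider {
    pub name: &'static str,
    pub local: bool,
}

impl LlmProvider for StubProvider {
    fn name(&self) -> &str {
        self.name
    }

    fn is_local(&self) -> bool {
        self.local
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        // Echo the prompt so the sketch has observable behaviour without a network call.
        Ok(format!("[{}] {}", self.name, prompt))
    }
}

/// In air-gap mode only local providers stay enabled; otherwise all are kept.
pub fn enabled_providers(
    all: Vec<Box<dyn LlmProvider>>,
    air_gap: bool,
) -> Vec<Box<dyn LlmProvider>> {
    all.into_iter()
        .filter(|p| !air_gap || p.is_local())
        .collect()
}
```

Filtering at registration time, rather than at call time, means a misrouted request in air-gap mode has no hosted provider to reach.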
5. Conversion + LLM pipeline (the core loop)
For each pipeline:
- dry-run via Importer → baseline GHA YAML + conversion log.
- Parse log for unsupported steps, partial constructs, manual tasks (secrets, service connections, self-hosted agents, environments/approval gates).
- For each gap, build a grounded LLM request:
  - Input: source task/snippet (ADO) + Importer’s converted YAML + the specific failure from the log + repo context (languages, detected build tools).
  - Output (structured JSON): { proposed_yaml, rationale, risk_flags[], verify_steps[], confidence }.
- The model fills only the gap, working from the Importer’s diff, and never converts from scratch. This keeps it grounded, cheap, and auditable.
- Model routing policy: bulk/cheap classification + drafting → local model (Ollama) or Haiku-class; hard semantic reasoning + documentation → Claude/Gemini frontier. In air-gap mode, everything routes local and frontier providers are disabled by config.
- Assemble augmented workflow; attach rationale + risk; persist as a proposal awaiting review.
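The grounded request and the routing policy above could look roughly like the following. The four GapRequest fields mirror the inputs listed in the loop; the prompt wording, the TaskKind and Route enums, and the bulk_on_local switch are assumptions for illustration, not part of the plan.

```rust
/// Inputs for one gap, mirroring the list above. Field names are illustrative.
pub struct GapRequest {
    pub source_snippet: String,    // the ADO task or snippet
    pub importer_yaml: String,     // what the Importer produced for it
    pub failure: String,           // the specific failure from the conversion log
    pub repo_context: Vec<String>, // languages, detected build tools
}

/// Build a prompt that carries only the gap, never the whole pipeline.
/// The wording is a placeholder for a versioned in-repo template.
pub fn build_prompt(req: &GapRequest) -> String {
    format!(
        "Fill only the gap described below; do not convert the pipeline from scratch.\n\
         ## ADO source\n{}\n## Importer output\n{}\n## Importer failure\n{}\n## Repo context\n{}\n\
         Reply as JSON with keys: proposed_yaml, rationale, risk_flags, verify_steps, confidence.",
        req.source_snippet,
        req.importer_yaml,
        req.failure,
        req.repo_context.join(", ")
    )
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskKind {
    Classification,
    Drafting,
    SemanticReasoning,
    Documentation,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Route {
    Local,      // Ollama / llama.cpp
    HaikuClass, // cheap hosted model
    Frontier,   // Claude / Gemini frontier
}

/// Hypothetical config: bulk_on_local picks between the two cheap options.
pub struct RoutingConfig {
    pub air_gap: bool,
    pub bulk_on_local: bool,
}

/// Bulk work goes cheap, hard reasoning goes frontier, air-gap forces local.
pub fn route(task: TaskKind, cfg: &RoutingConfig) -> Route {
    if cfg.air_gap {
        return Route::Local;
    }
    match task {
        TaskKind::Classification | TaskKind::Drafting => {
            if cfg.bulk_on_local { Route::Local } else { Route::HaikuClass }
        }
        TaskKind::SemanticReasoning | TaskKind::Documentation => Route::Frontier,
    }
}
```

Because routing is a pure function of task kind and config, the chosen route can be recorded per gap alongside the proposal for later audit.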
6. Risk model (deterministic, explainable)
Score is computed from factors, not from the LLM (the LLM explains; it does not score):
- % of steps the Importer could not convert
- presence of secrets / variable groups (→ repo/org secrets to provision)
- service connections (→ OIDC federation to GitHub required)
- self-hosted agent pools (→ runner strategy decision)
- deployment/approval gates (→ Environments + required reviewers)
- matrix/parallelism semantics differences
- custom or marketplace tasks with no GHA equivalent
- artifact-passing semantics (publish/download → actions/upload-artifact / actions/download-artifact)
- complex conditional expressions / template expansion
Weighted sum → Green (mechanical, low risk) / Amber (needs human verification) / Red (manual rework / architectural decision required). Score + factor breakdown shown in the portal so reviewers see why.
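One way the weighted sum and banding might be expressed is sketched below. The factor list follows the bullets above, but every weight, the cap on custom tasks, and both thresholds are placeholder values chosen for the example; the plan does not specify them.

```rust
/// Factors from the list above. Field names are illustrative.
#[derive(Debug, Clone, Default)]
pub struct RiskFactors {
    pub unconverted_step_ratio: f64, // share of steps the Importer could not convert, 0.0..=1.0
    pub has_secrets_or_variable_groups: bool,
    pub has_service_connections: bool,
    pub uses_self_hosted_agents: bool,
    pub has_approval_gates: bool,
    pub has_matrix_or_parallelism: bool,
    pub unmapped_custom_tasks: u32,
    pub passes_artifacts: bool,
    pub has_complex_expressions: bool,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskBand {
    Green,
    Amber,
    Red,
}

/// One line of the factor breakdown shown to reviewers.
#[derive(Debug, Clone, PartialEq)]
pub struct Contribution {
    pub factor: &'static str,
    pub points: f64,
}

/// Deterministic weighted sum. All weights are placeholders, not tuned values.
pub fn score(f: &RiskFactors) -> (f64, Vec<Contribution>) {
    let flag = |on: bool, weight: f64| if on { weight } else { 0.0 };
    let candidates = vec![
        Contribution { factor: "unconverted steps", points: 40.0 * f.unconverted_step_ratio.clamp(0.0, 1.0) },
        Contribution { factor: "secrets / variable groups", points: flag(f.has_secrets_or_variable_groups, 5.0) },
        Contribution { factor: "service connections", points: flag(f.has_service_connections, 10.0) },
        Contribution { factor: "self-hosted agent pools", points: flag(f.uses_self_hosted_agents, 10.0) },
        Contribution { factor: "deployment / approval gates", points: flag(f.has_approval_gates, 10.0) },
        Contribution { factor: "matrix / parallelism", points: flag(f.has_matrix_or_parallelism, 5.0) },
        // Capped at three tasks so one factor cannot dominate the score.
        Contribution { factor: "custom tasks without GHA equivalent", points: 8.0 * f64::from(f.unmapped_custom_tasks.min(3)) },
        Contribution { factor: "artifact passing", points: flag(f.passes_artifacts, 5.0) },
        Contribution { factor: "complex expressions / templates", points: flag(f.has_complex_expressions, 10.0) },
    ];
    // Keep only factors that actually contributed, so the breakdown stays readable.
    let breakdown: Vec<Contribution> = candidates.into_iter().filter(|c| c.points > 0.0).collect();
    let total: f64 = breakdown.iter().map(|c| c.points).sum();
    (total, breakdown)
}

/// Placeholder thresholds: below 15 is Green, below 40 is Amber, otherwise Red.
pub fn band(total: f64) -> RiskBand {
    if total < 15.0 {
        RiskBand::Green
    } else if total < 40.0 {
        RiskBand::Amber
    } else {
        RiskBand::Red
    }
}
```

Returning the breakdown next to the total is what allows the portal to show why a pipeline landed in its band without consulting the LLM.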
7. Validation / equivalence (the credibility feature)
Full proof of equivalence is impossible; we do smoke parity:
- Push the converted workflow to a sandbox branch/repo.
- Trigger via workflow_dispatch; capture status, jobs, produced artifacts, key outputs.
- Compare against the last successful ADO run for the same pipeline (status, artifact set, declared outputs).
- Emit a parity report + attestation. Reviewer sees pass/divergence before approving commit.
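The comparison step could be sketched as a pure function over two run summaries. RunSummary, ParityReport, and the rule that extra artifacts do not fail parity are assumptions for this example; capturing the summaries from ADO and from the workflow_dispatch run is out of scope here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// What is captured from a run: status, artifact names, declared outputs.
#[derive(Debug, Clone, PartialEq)]
pub struct RunSummary {
    pub succeeded: bool,
    pub artifacts: BTreeSet<String>,
    pub outputs: BTreeMap<String, String>,
}

/// Hypothetical parity report shape.
#[derive(Debug, PartialEq)]
pub struct ParityReport {
    pub status_matches: bool,
    pub missing_artifacts: Vec<String>, // in the ADO baseline, absent from the GHA run
    pub extra_artifacts: Vec<String>,   // produced only by the GHA run
    pub differing_outputs: Vec<String>, // baseline outputs missing or changed in the GHA run
}

impl ParityReport {
    /// Assumption: extra artifacts are reported but do not fail parity.
    pub fn passed(&self) -> bool {
        self.status_matches && self.missing_artifacts.is_empty() && self.differing_outputs.is_empty()
    }
}

/// Compare the GHA sandbox run against the last successful ADO run.
pub fn compare(baseline: &RunSummary, candidate: &RunSummary) -> ParityReport {
    let missing_artifacts = baseline.artifacts.difference(&candidate.artifacts).cloned().collect();
    let extra_artifacts = candidate.artifacts.difference(&baseline.artifacts).cloned().collect();
    let differing_outputs = baseline
        .outputs
        .iter()
        .filter(|(key, value)| candidate.outputs.get(*key) != Some(*value))
        .map(|(key, _)| key.clone())
        .collect();
    ParityReport {
        status_matches: baseline.succeeded == candidate.succeeded,
        missing_artifacts,
        extra_artifacts,
        differing_outputs,
    }
}
```

Ordered collections keep the report stable across runs, which matters if the report is later hashed or signed as part of an attestation.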
8. Build sequence (milestones)
- M0: Foundations. Repo, monorepo layout, CI, licence, devcontainer, CLAUDE.md, docs site.
- M1: Audit MVP. ADO adapter + Importer audit wrapper + portfolio heatmap (CLI + minimal read-only portal). This alone demos the whole thesis.
- M2: Conversion + LLM. dry-run wrapping, gap detection, LLM provider trait + ≥2 impls (Anthropic + Ollama), grounded gap-fill, risk model.
- M3: Review portal. Three-pane diff, approve/edit, proposal lifecycle.
- M4: Commit + PR. migrate/push, manual-task checklists (secrets/runners/envs), PR open.
- M5: Validation. Sandbox trigger + parity report.
- M6: Compliance + deploy. Attestation export, GitHub App + Entra OIDC, multi-tenant, Helm.
Recommended first cut to ship/demo: M0 → M1 → the read-only recommendation slice of M2 (LLM explanation + risk score, no auto-commit). Buildable in weeks on top of existing tools.
9. Driving this in Claude Code
- Drop CLAUDE.md and this plan at the repo root.
- Run ./seed-issues.sh (needs gh authenticated) to create milestones, labels, epics, issues.
- Work one epic at a time, milestone order. For each issue: let Claude Code read the issue + relevant plan section, implement, open a PR referencing the issue, and tick the epic checklist.
- Keep gh actions-importer and ado2gh versions pinned in a .tool-versions-style file so conversions are reproducible and attestable.
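For the pinned-versions file, a small parser like the one below could feed the per-job tool versions into the attestation record. It assumes one "<tool> <version>" pair per line with # comments; the key names gh-actions-importer and ado2gh and the helper names parse_tool_versions and missing_pins are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parse "<tool> <version>" lines; blank lines and # comments are ignored.
pub fn parse_tool_versions(text: &str) -> Result<BTreeMap<String, String>, String> {
    let mut pins = BTreeMap::new();
    for (idx, raw) in text.lines().enumerate() {
        // Drop anything after a # and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next(), parts.next()) {
            (Some(tool), Some(version), None) => {
                pins.insert(tool.to_string(), version.to_string());
            }
            _ => return Err(format!("line {}: expected `<tool> <version>`", idx + 1)),
        }
    }
    Ok(pins)
}

/// Return the required tools that have no pin, so a job can refuse to start.
pub fn missing_pins(pins: &BTreeMap<String, String>, required: &[&str]) -> Vec<String> {
    required
        .iter()
        .filter(|tool| !pins.contains_key(**tool))
        .map(|tool| (*tool).to_string())
        .collect()
}
```

A job would call parse_tool_versions once at start, fail fast if missing_pins is non-empty, and store the resulting map with the job record.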
10. Open decisions (resolve before/early in M2)
- Open-core boundary: what’s MIT vs commercial? (Suggest: engine OSS; attestation + multi-tenant control plane commercial.)
- Copilot as a provider: via GitHub Models API, or treat “Copilot” as a positioning term and route to Claude/Gemini under the hood? (Confirm API access + ToS.)
- Classic pipelines: how far do we support designer/classic ADO pipelines vs YAML-only in v1? (Suggest YAML-first; classic = Amber/Red, flagged for manual.)
- Hosted vs self-host first: self-host/air-gap is the differentiator — lead with it.
- Defensibility vs Microsoft: assume MS bolts Copilot onto the Importer; double down on air-gap, multi-model, validation, attestation, and multi-platform sources.