The term “dark factory” comes from manufacturing. Since 2001, FANUC has operated a factory near Mt. Fuji where robots build other robots, running unsupervised for up to 30 days at a time. The factory is “dark” because no humans are present and robots don’t need light.

In software, the dark factory concept is different. It doesn’t mean zero human involvement; it means minimal human interaction with the code itself. Humans supervise the specs, guardrails, and outcomes, not each line of code. Engineers shift from writing and reviewing code to defining what should be built, how quality is measured, and when to intervene. This is an aspirational concept, and getting there is iterative.

From coding to orchestrating

Most teams today have AI writing code while humans review it line by line. The hardest transition is moving beyond that — replacing ad-hoc human review with structured, repeatable verification that you actually trust. Dan Shapiro’s five-level framework describes this progression well.

What makes it work

The dark factory isn’t a single tool or practice. It’s a set of capabilities that compound:

Declarative workflows over imperative prompts. When the process is a version-controlled graph rather than a chat transcript, you can review, iterate, and share it like any other source file. The workflow itself becomes the specification of how work gets done.

Deterministic verification over human review. Test suites, linters, type checkers, and LLM-as-judge evaluations replace line-by-line code review. Failures route back to fix loops automatically. Humans define the criteria; the system enforces them.

Multi-model ensembles over single-model dependence. Using different models for implementation and verification breaks the circularity problem, where the builder and inspector share the same blind spots. Cross-critique with fresh eyes catches what self-review misses.

Checkpointed execution over black-box runs. Git commits after every stage create an audit trail. When something goes wrong, you can inspect, revert, or fork from any point without having watched the run live.

Observability over black-box automation. Durable event streams, checkpoints, conclusions, and stage outputs make each run inspectable after the fact. Workflows get better when teams can see what happened and adjust the graph.
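To make the idea of a workflow graph with an automatic fix loop more concrete, here is a minimal Python sketch. It is an illustration only, not the API of any particular tool: the `Stage`, `Run`, and `execute` names, the `"done"` and `"escalate"` terminal nodes, and the stand-in stage functions are all hypothetical. A real system would call agents, run test suites, and commit to Git where this sketch just appends to an in-memory event list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
TERMINAL = ("done", "escalate")  # hypothetical terminal node names


@dataclass
class Stage:
    name: str
    run: Callable[[State], bool]  # returns True when the stage succeeds
    on_pass: str                  # next node when run() returns True
    on_fail: str                  # next node when run() returns False


@dataclass
class Run:
    state: State = field(default_factory=dict)
    # One (stage name, passed) entry per executed stage; a stand-in for
    # the per-stage checkpoints and event stream described above.
    events: List[Tuple[str, bool]] = field(default_factory=list)


def execute(graph: Dict[str, Stage], start: str, max_steps: int = 20) -> Run:
    """Walk the graph from `start` until a terminal node or the step budget."""
    run = Run()
    current = start
    steps = 0
    while current not in TERMINAL:
        if steps >= max_steps:
            current = "escalate"  # budget exhausted: hand off to a human
            break
        stage = graph[current]
        passed = stage.run(run.state)
        run.events.append((stage.name, passed))
        current = stage.on_pass if passed else stage.on_fail
        steps += 1
    run.state["outcome"] = current
    return run


def implement(state: State) -> bool:
    # Stand-in for an agent writing code.
    state["attempts"] = state.get("attempts", 0) + 1
    return True


def verify(state: State) -> bool:
    # Stand-in for tests, linters, and type checks; passes on attempt two.
    return state["attempts"] >= 2


def fix(state: State) -> bool:
    # Stand-in for an agent repairing the failure reported by verify.
    state["attempts"] += 1
    return True


# The graph is plain data, so it can be versioned and reviewed like source.
GRAPH = {
    "implement": Stage("implement", implement, on_pass="verify", on_fail="escalate"),
    "verify": Stage("verify", verify, on_pass="done", on_fail="fix"),
    "fix": Stage("fix", fix, on_pass="verify", on_fail="escalate"),
}

run = execute(GRAPH, "implement")
for name, passed in run.events:
    print(name, "passed" if passed else "failed")
print("outcome:", run.state["outcome"])
```

In this sketch the failed verification routes to `fix` and back to `verify` without anyone watching, and the recorded events show afterwards what happened. The `max_steps` budget is one possible way to decide when to intervene: if the loop does not converge, the run ends in `escalate` instead of retrying forever.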

The human role in a dark factory

The dark factory doesn’t eliminate engineering judgment. It redirects it:
Before | After
Writing code | Defining workflows and prompts
Reviewing diffs | Defining verification criteria
Debugging test failures | Designing fix loops
Watching agent sessions | Inspecting run traces
Manual quality checks | Tuning goal gates and evals
The goal is to spend your time on the parts that require human judgment — what to build, how to verify it, and when something doesn’t look right — while the factory handles the rest.

Further reading

Workflows

Learn how workflow graphs orchestrate agents, commands, and human gates.

Human-in-the-Loop

Control where and how humans intervene in workflows.

Quality Verification

Build verification into your workflows.

Observability

Inspect event streams, logs, and exported run state.