LegoFactory

Towards Fully Agentic Workflows for Code Intelligence

LegoFactory is a fully agentic workflow for building software-engineering (SWE) data end to end. It mines task instances from real GitHub pull requests, rolls out agent trajectories on them, and then fine-tunes and evaluates the resulting model. Its block-wise design lets you extend or swap any stage, or hand it off to another agent, without disturbing the rest. You extend the pipeline instead of rewriting it.

Live SWE data, collected automatically.

Continuously mines real GitHub pull requests into executable, validated tasks, so the dataset keeps growing on its own with no hand-labeling.
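The page does not say how a mined pull request is validated. One common convention is the fail-to-pass check popularized by SWE-bench, and it is only an assumption here. A task is kept if at least one test fails on the pre-PR code and passes once the PR's patch is applied, and every test that already passed keeps passing. The minimal sketch below uses hypothetical names (`Candidate`, `fail_to_pass`, `pass_to_pass`, `is_valid_task`). It takes recorded test outcomes as dictionaries instead of running a real test suite.

```python
from dataclasses import dataclass


# Hypothetical record for one mined PR: test outcomes (True = passed) recorded
# on the base commit and again after applying the PR's patch.
@dataclass
class Candidate:
    pr_id: str
    before: dict[str, bool]  # test name -> passed on the base commit
    after: dict[str, bool]   # test name -> passed with the PR patch applied


def fail_to_pass(c: Candidate) -> list[str]:
    """Tests the PR fixes: failing (or absent) before the patch, passing after."""
    return sorted(t for t, ok in c.after.items() if ok and not c.before.get(t, False))


def pass_to_pass(c: Candidate) -> list[str]:
    """Tests that passed before the patch and therefore must keep passing."""
    return sorted(t for t, ok in c.before.items() if ok)


def is_valid_task(c: Candidate) -> bool:
    # Assumed rule: keep the PR only if it fixes at least one test
    # and breaks none of the tests that already passed.
    no_regressions = all(c.after.get(t, False) for t in pass_to_pass(c))
    return bool(fail_to_pass(c)) and no_regressions


if __name__ == "__main__":
    good = Candidate("pr-1", before={"test_a": False, "test_b": True},
                     after={"test_a": True, "test_b": True})
    regressing = Candidate("pr-2", before={"test_a": False, "test_b": True},
                           after={"test_a": True, "test_b": False})
    print(is_valid_task(good), is_valid_task(regressing))  # True False
```

A PR that passes this check comes with its own executable oracle, which fits "executable, validated tasks": the fail-to-pass tests define success for an agent, and the pass-to-pass tests guard against regressions.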

One workflow, from raw PRs to a measured model.

Instance collection, trajectory rollout, supervised fine-tuning (SFT), and evaluation run end to end as a single agentic loop. The model you train is scored on the same task family it learned from.
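The page does not say how evaluation instances are kept apart from training instances. The sketch below assumes a simple instance-level holdout. Each instance ID is hashed into a fixed bucket, so an instance's assignment never changes as new pull requests are mined into the pool. The names `split_of` and `split`, the ID format `owner/repo#N`, and the 10% eval fraction are all illustrative assumptions.

```python
import hashlib


def split_of(instance_id: str, eval_fraction: float = 0.1) -> str:
    """Assign an instance to 'train' or 'eval' from a hash of its ID.

    The assignment depends only on the ID, so it stays fixed as
    newly mined instances are appended to the pool.
    """
    digest = hashlib.sha256(instance_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "eval" if bucket < eval_fraction else "train"


def split(instance_ids: list[str], eval_fraction: float = 0.1) -> tuple[list[str], list[str]]:
    """Partition IDs into disjoint (train, held_out) lists."""
    train, held_out = [], []
    for iid in instance_ids:
        (held_out if split_of(iid, eval_fraction) == "eval" else train).append(iid)
    return train, held_out


if __name__ == "__main__":
    pool = [f"owner/repo#{n}" for n in range(1, 201)]
    train, held_out = split(pool)
    # Growing the pool never moves an existing instance between splits.
    grown = pool + [f"owner/repo#{n}" for n in range(201, 301)]
    train2, held_out2 = split(grown)
    assert set(train) <= set(train2) and set(held_out) <= set(held_out2)
    assert not set(train) & set(held_out)
    print(len(train), len(held_out))
```

Hashing the repository name instead of the full instance ID would hold out whole repositories. That is a stricter check, but the scores would still come from the same task family.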

Built from blocks — extend, don’t rewrite.

Every stage is one config, one contract, one command, wired together by name and handed off agent-to-agent. Swap or add a stage without disturbing the rest.
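One way to read "one config, one contract, one command, wired together by name" is as a registry. Each stage declares the named artifacts it consumes and produces, and the run order is derived from those names. The sketch below assumes that reading. `Stage`, `resolve_order`, the command strings, and the artifact names are hypothetical, not LegoFactory's actual API. Ordering uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Stage:
    name: str
    command: str                     # hypothetical command that runs the stage
    consumes: tuple[str, ...] = ()   # contract: artifact names required as input
    produces: tuple[str, ...] = ()   # contract: artifact names this stage emits
    config: dict = field(default_factory=dict)


def resolve_order(stages: list[Stage]) -> list[str]:
    """Order stages so every consumed artifact is produced upstream."""
    producer: dict[str, str] = {}
    for s in stages:
        for art in s.produces:
            if art in producer:
                raise ValueError(f"artifact {art!r} produced by both {producer[art]!r} and {s.name!r}")
            producer[art] = s.name
    # Map each stage to the set of stages it depends on (its predecessors).
    graph: dict[str, set[str]] = {}
    for s in stages:
        missing = [a for a in s.consumes if a not in producer]
        if missing:
            raise ValueError(f"stage {s.name!r} needs unproduced artifacts {missing}")
        graph[s.name] = {producer[a] for a in s.consumes}
    # static_order() yields predecessors first and raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    # Listed out of order on purpose: wiring comes from names, not position.
    pipeline = [
        Stage("evaluate", "run-eval", consumes=("model", "eval_tasks")),
        Stage("collect", "mine-prs", produces=("train_tasks", "eval_tasks")),
        Stage("sft", "train", consumes=("trajectories",), produces=("model",)),
        Stage("rollout", "run-agent", consumes=("train_tasks",), produces=("trajectories",)),
    ]
    print(resolve_order(pipeline))  # ['collect', 'rollout', 'sft', 'evaluate']
```

Stages connect only through artifact names, so swapping a stage means registering a replacement that honors the same `consumes`/`produces` contract. An unmet or duplicated artifact fails fast before anything runs.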

▶ Demo

See the workflow run, end to end.

Video demo coming soon.