LegoFactory
Towards Fully Agentic Workflows for Code Intelligence
LegoFactory is a fully agentic workflow that builds software-engineering data end to end: it mines real GitHub pull requests into task instances, rolls out agent trajectories, then fine-tunes and evaluates the resulting model. Its block-wise design lets any stage be extended, swapped, or handed off to another agent without disturbing the rest, so you extend the pipeline instead of rewriting it.
Live SWE data, collected automatically.
LegoFactory continuously mines real GitHub pull requests into executable, validated tasks, so the dataset grows on its own without hand-labeling.
One workflow, from raw PRs to a measured model.
Instance collection, trajectory rollout, supervised fine-tuning (SFT), and evaluation run end to end as a single agentic loop, so the model you train is scored on the same task family it learned from.
Built from blocks: extend, don’t rewrite.
Every stage is one config, one contract, one command, wired together by name and handed off agent-to-agent. Swap or add a stage without disturbing the rest.
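To make the block idea concrete, here is a minimal sketch of stages wired together by name. It is an illustration under stated assumptions, not LegoFactory's actual API: `Block`, `REGISTRY`, `register`, `run_pipeline`, the `requires`/`provides` contract fields, and the toy stage functions are all hypothetical names, and each stage does trivial placeholder work instead of real mining, rollout, or training.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Block:
    """Hypothetical stage: one config, one contract, one entry point."""
    name: str
    requires: List[str]  # contract: keys the stage reads from shared state
    provides: List[str]  # contract: keys the stage must write back
    run: Callable[[State, Dict[str, Any]], State]
    config: Dict[str, Any] = field(default_factory=dict)


REGISTRY: Dict[str, Block] = {}


def register(block: Block) -> None:
    REGISTRY[block.name] = block


def run_pipeline(stage_names: List[str], initial: State) -> State:
    """Run stages in order, checking each contract before and after the call."""
    state = dict(initial)
    for name in stage_names:
        block = REGISTRY[name]
        missing = [k for k in block.requires if k not in state]
        if missing:
            raise KeyError(f"{name}: missing inputs {missing}")
        out = block.run(state, block.config)
        absent = [k for k in block.provides if k not in out]
        if absent:
            raise KeyError(f"{name}: did not produce {absent}")
        state.update(out)
    return state


# Toy stand-ins for the four stages (placeholders, not real implementations).
def collect(state: State, cfg: Dict[str, Any]) -> State:
    # Keep only pull requests that ship tests, as a stand-in for validation.
    return {"instances": [pr for pr in state["prs"] if pr["has_tests"]]}


def rollout(state: State, cfg: Dict[str, Any]) -> State:
    # Produce cfg["samples"] placeholder trajectories per instance.
    return {"trajectories": [(inst["id"], i)
                             for inst in state["instances"]
                             for i in range(cfg["samples"])]}


def sft(state: State, cfg: Dict[str, Any]) -> State:
    return {"model": f"toy-model-{len(state['trajectories'])}-trajectories"}


def evaluate(state: State, cfg: Dict[str, Any]) -> State:
    return {"report": {"model": state["model"], "tasks": len(state["instances"])}}


register(Block("collect", ["prs"], ["instances"], collect))
register(Block("rollout", ["instances"], ["trajectories"], rollout, {"samples": 1}))
register(Block("rollout_x3", ["instances"], ["trajectories"], rollout, {"samples": 3}))
register(Block("sft", ["trajectories"], ["model"], sft))
register(Block("evaluate", ["model", "instances"], ["report"], evaluate))

prs = [
    {"id": "pr-1", "has_tests": True},
    {"id": "pr-2", "has_tests": False},
    {"id": "pr-3", "has_tests": True},
]
base = run_pipeline(["collect", "rollout", "sft", "evaluate"], {"prs": prs})
# Swapping one stage means changing one name; the other stages are untouched.
swapped = run_pipeline(["collect", "rollout_x3", "sft", "evaluate"], {"prs": prs})
print(base["report"])
print(swapped["report"])
```

In this sketch, the contract check is what makes a swap safe: any block that reads and writes the same keys can take another block's place in the list of names.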