Announcing Our $8M Seed Round
Polymath has raised $8M to build the environment layer for training and evaluating autonomous AI agents.
Introducing Horizon-SWE: Evaluating the Performance of AI Agents on End-to-End Software Engineering Workflows
A benchmark for multi-tool, long-horizon software engineering tasks in production-grade systems.
Towards Greater Reliability and Autonomy in Software Engineering Agents
AI coding agents have become remarkably capable at programming, progressing from autocomplete to single-file edits to changes that span an entire repository. However, true software engineering requires more than just writing code.