olmo-eval: An evaluation workbench for the model development loop
Allen AI has released olmo-eval, an open-source evaluation workbench that extends OLMES beyond final-model benchmarking to cover the iterative model development loop. It emphasizes modularity, pairwise checkpoint comparison, and flexible sandboxing.