Writing
Notes on building AI systems, the math underneath, and the human side of engineering work.

Building Reward Signals for LLM Agents
Lessons from designing evaluation frameworks that actually measure what matters, not just what's easy to measure.
How a Ph.D. in pure math turned into a career building AI systems at scale, and why the transition was less linear than it sounds.

Why human evaluation doesn't scale for agentic systems, and how to build automated evaluation that you can actually trust.

✦ More posts coming soon. In the meantime, you can find my thinking scattered across commit messages and design docs.