Max Kaufmann

Last updated Jan 13th, 2025

Email | LinkedIn | GitHub | Resume

Currently Hosted Projects

AI Self Eval
Generates many responses from an LLM to a user's prompt, then has a second LLM grade them. The results are sorted, and the grading is explained. This is useful for generating many ideas or for checking whether the responses are unanimous.
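The generate, grade, and sort loop described above can be sketched roughly as follows. This is a minimal illustration, not the hosted project's code: the names self_eval, GradedResponse, is_unanimous, fake_generate, and fake_grade are all hypothetical, the two "models" are canned stand-ins rather than real LLM calls, and it assumes the grader returns a numeric score plus a short explanation and that results are sorted best-first.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, List, Tuple


@dataclass
class GradedResponse:
    text: str
    score: float
    explanation: str


def self_eval(
    prompt: str,
    generate: Callable[[str], str],
    grade: Callable[[str, str], Tuple[float, str]],
    n: int = 5,
) -> List[GradedResponse]:
    """Sample n responses, grade each one, and return them best-first."""
    graded = []
    for _ in range(n):
        text = generate(prompt)
        score, explanation = grade(prompt, text)
        graded.append(GradedResponse(text, score, explanation))
    # Assumption: "sorted" means highest grade first.
    return sorted(graded, key=lambda g: g.score, reverse=True)


def is_unanimous(results: List[GradedResponse]) -> bool:
    """Crude agreement check: all responses match after normalization."""
    return len({g.text.strip().lower() for g in results}) <= 1


# Stand-ins for the two LLMs (hypothetical; swap in real model calls).
_canned = cycle(["Paris", "Paris, France", "Lyon"])


def fake_generate(prompt: str) -> str:
    return next(_canned)


def fake_grade(prompt: str, response: str) -> Tuple[float, str]:
    if response.startswith("Paris"):
        return 1.0, "Names Paris."
    return 0.0, "Does not name Paris."


results = self_eval("What is the capital of France?", fake_generate, fake_grade, n=3)
for r in results:
    print(f"{r.score:.1f}  {r.text}  ({r.explanation})")
print("unanimous:", is_unanimous(results))
```

Passing the generator and grader in as plain callables keeps the loop independent of any particular model provider. The exact-match unanimity check is deliberately naive; a real check would likely need a fuzzier notion of agreement.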

Reports

Minimum response length for reward models
This report explored how truncating an LLM's responses affects the accuracy of reward models. It found that reward models given only a single sentence of each response could still accurately predict pairwise comparisons from the RewardBench dataset.
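A rough sketch of that kind of measurement: truncate both responses in each pair to their first k sentences, score them, and count how often the preferred response still scores higher. This is an illustration under stated assumptions, not the report's actual code: first_sentences, pairwise_accuracy, toy_reward, and the two example pairs are hypothetical, the sentence splitter is a naive regex, and toy_reward is a word-overlap stand-in for a real reward model.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

Pair = Tuple[str, str, str]  # (prompt, chosen response, rejected response)


def first_sentences(text: str, k: int = 1) -> str:
    """Keep the first k sentences, splitting naively after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(parts[:k])


def pairwise_accuracy(
    pairs: Iterable[Pair],
    reward: Callable[[str, str], float],
    k: Optional[int] = None,
) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one.

    If k is given, both responses are truncated to k sentences first.
    """
    correct = 0
    total = 0
    for prompt, chosen, rejected in pairs:
        if k is not None:
            chosen = first_sentences(chosen, k)
            rejected = first_sentences(rejected, k)
        if reward(prompt, chosen) > reward(prompt, rejected):
            correct += 1
        total += 1
    return correct / total if total else 0.0


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: counts prompt words echoed."""
    prompt_words = set(re.findall(r"\w+", prompt.lower()))
    response_words = set(re.findall(r"\w+", response.lower()))
    return float(len(prompt_words & response_words))


# Made-up example pairs, not taken from any dataset.
pairs = [
    ("capital of France",
     "The capital of France is Paris. It sits on the Seine.",
     "I like cheese. France is nice."),
    ("boiling point of water",
     "Water has a boiling point of 100 C at sea level. Pressure changes it.",
     "Fish swim. Water is wet."),
]

full_acc = pairwise_accuracy(pairs, toy_reward)
one_sentence_acc = pairwise_accuracy(pairs, toy_reward, k=1)
print(f"full: {full_acc:.2f}  one sentence: {one_sentence_acc:.2f}")
```

Comparing the accuracy at k=1 against the untruncated accuracy gives the kind of gap the report is about. A real run would replace toy_reward with an actual reward model and the example pairs with the benchmark's chosen/rejected pairs.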