Max Kaufmann
Last updated Jan 13th, 2025
Currently Hosted Projects
AI Self Eval
Generates many responses from an LLM to a user's prompt, then has a second LLM grade them. The results are sorted by grade, and each grade comes with an explanation. This is useful for generating many ideas or for checking whether the responses agree.
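A minimal sketch of that generate, grade, and sort loop, written in Python for illustration. It is not the project's actual code: `self_eval`, `GradedResponse`, `is_unanimous`, and the `generate` and `grade` callables are hypothetical names, and the two callables stand in for whatever LLM calls the tool really makes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GradedResponse:
    text: str          # one sampled response
    score: float       # grade assigned by the grader model
    explanation: str   # the grader's stated reason for the score


def self_eval(
    prompt: str,
    generate: Callable[[str], str],                 # assumed: wraps the responding LLM
    grade: Callable[[str, str], Tuple[float, str]], # assumed: wraps the grading LLM
    n: int = 5,
) -> List[GradedResponse]:
    """Sample n responses, grade each one, and return them best-first."""
    graded = []
    for _ in range(n):
        text = generate(prompt)
        score, explanation = grade(prompt, text)
        graded.append(GradedResponse(text, score, explanation))
    # sorted() is stable, so tied scores keep their sampling order.
    return sorted(graded, key=lambda g: g.score, reverse=True)


def is_unanimous(results: List[GradedResponse]) -> bool:
    """True when every response is identical after trimming and lowercasing."""
    return len({r.text.strip().lower() for r in results}) <= 1
```

Passing the two models in as plain callables keeps the sketch independent of any particular LLM API. The exact-match check in `is_unanimous` is a deliberately crude stand-in for whatever agreement test the real tool uses.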
Reports
Minimum response length for reward models
This report explored how truncating an LLM's responses affects the accuracy of reward models. It found that reward models given only a single sentence of each response could still accurately predict the pairwise comparisons in the RewardBench dataset.
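A rough sketch of how such a truncation experiment could be set up, written in Python for illustration. It is not the report's actual code. It assumes that truncation keeps the first `k` sentences, that sentences can be split naively on end punctuation, and that each example is a `(prompt, chosen, rejected)` triple. `truncate_to_sentences`, `pairwise_accuracy`, and the `score` callable (a stand-in for a reward model) are hypothetical names.

```python
import re
from typing import Callable, Iterable, Tuple


def truncate_to_sentences(text: str, k: int = 1) -> str:
    """Keep the first k sentences, splitting naively after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:k])


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str, str]],  # (prompt, chosen, rejected)
    score: Callable[[str, str], float],     # assumed: reward model wrapper
    k: int = 1,
) -> float:
    """Fraction of pairs where the truncated chosen response outscores
    the truncated rejected response."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one comparison pair")
    correct = sum(
        score(prompt, truncate_to_sentences(chosen, k))
        > score(prompt, truncate_to_sentences(rejected, k))
        for prompt, chosen, rejected in pairs
    )
    return correct / len(pairs)
```

Sweeping `k` upward and comparing against the accuracy on untruncated responses would show how much of the preference signal the reward model already gets from the opening of each response.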