Elizabeth Donoway

I'm Elizabeth, an Anthropic Fellow working with Jan Leike on quantifying elicitation and learning in LLMs. I combine information-theoretic approaches with interventions on model internals to understand and formalize what it means to elicit latent capabilities in language models vs. teach entirely new skills that are absent following pretraining. I previously did MATS with Marius Hobbhahn, creating evals of long-horizon goal-directedness in LLMs and LLM agents. Prior to that, I developed evals to assess LLMs' physical reasoning skills as part of BIG Bench.

I'm also finishing a physics PhD in Mike DeWeese's group at UC Berkeley, using techniques from statistical mechanics and information geometry to understand how machines learn. I'm grateful to be supported by both NSF GRFP and Ford fellowships. I previously spent a good deal of time trying to find and describe exotic phases of matter in quantum systems.