Research interests

“Don’t work on dumb things!” - Lennart Heim

I’m researching how to reduce AI Risk.

Ongoing projects:

  • LLM reasoning consistency benchmarking
  • How well do Sparse Autoencoders (SAEs) work? (toy sketch below)
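
Since SAEs keep coming up here, a minimal sketch of what I mean by one, assuming a PyTorch setup; the layer sizes and L1 coefficient below are illustrative placeholders, not values from any of these projects.

```python
# Minimal sparse autoencoder (SAE) sketch: reconstruct activations through an
# overcomplete hidden layer with an L1 sparsity penalty. Sizes and the L1
# coefficient are illustrative placeholders, not values from any real run.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps hidden activations non-negative, so the L1 term
        # directly encourages most features to stay at zero.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus sparsity penalty on the feature activations.
    mse = torch.mean((reconstruction - x) ** 2)
    l1 = l1_coeff * features.abs().mean()
    return mse + l1


if __name__ == "__main__":
    sae = SparseAutoencoder()
    x = torch.randn(32, 512)  # stand-in for model activations
    recon, feats = sae(x)
    loss = sae_loss(x, recon, feats)
    loss.backward()
    print(f"loss={loss.item():.4f}, "
          f"active features per example="
          f"{(feats > 0).float().sum(dim=1).mean().item():.1f}")
```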

Also studying:

  • Evaluation for risks from multi-agent interactions
  • Communicating affordances of general AI tools
  • Mitigating over-reliance on AI
  • AI literacy
  • Games that change the games we play

Want to learn more about:

  • Identifying and simulating sociotechnical AI risks
  • Privacy-preserving model evaluation
  • Resilience Engineering
  • Science of LLM capability emergence
  • Contextual bandits (toy sketch below)
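
And since contextual bandits are on the to-learn list, a toy LinUCB loop in NumPy; the arm count, context dimension, and simulated rewards are all made up for illustration.

```python
# Toy LinUCB contextual bandit: per-arm ridge-regression estimates plus an
# upper-confidence exploration bonus. All numbers here (arms, context size,
# rewards) are made up for illustration.
import numpy as np

n_arms, d, alpha = 3, 5, 1.0
A = [np.eye(d) for _ in range(n_arms)]    # per-arm Gram matrices
b = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward-weighted contexts

rng = np.random.default_rng(0)
true_theta = rng.normal(size=(n_arms, d))  # hidden arm parameters (toy)

for t in range(2000):
    x = rng.normal(size=d)  # context for this round
    scores = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]
        # Point estimate + exploration bonus from the confidence ellipsoid.
        scores.append(theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x))
    arm = int(np.argmax(scores))
    reward = true_theta[arm] @ x + rng.normal(scale=0.1)
    A[arm] += np.outer(x, x)
    b[arm] += reward * x

print("learned vs. true parameters for arm 0:")
print(np.linalg.inv(A[0]) @ b[0])
print(true_theta[0])
```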

Previous iterations of research interests
