Research interests
“Don’t work on dumb things!” - Lennart Heim
I’m researching how to reduce AI risk.
Ongoing projects:
- LLM Reasoning consistency benchmarking
- How well do Sparse Autoencoders (SAEs) work?
Also studying:
- Evaluation for risks from multi-agent interactions
- Communicating affordance for general AI tools
- Mitigating Over-reliance
- AI Literacy
- Games that change the games we play
Want to learn more about:
- Identifying and simulating sociotechnical AI risks
- Privacy-preserving model evaluation
- Resilience Engineering
- Science of LLM capability emergence
- Contextual Bandit