Research interests

“Don’t work on dumb things!” - Lennart Heim

I divide my research into two categories: tractable and moonshot.

Tractable projects are the ones I am confident will help reduce AI risk. Moonshot projects are more ambitious and uncertain, but could be quite impactful if they succeed.

Tractable -

Literature review of mechanistic interpretability - theory of change

Analyze the theory of change behind mechanistic interpretability research agendas, and inform governing bodies about how that research can help AI regulation.

Moonshot -

✨ Vibe-based research ✨ + Cyborgism

Forgive the title. When I talk about vibes, I mean our nervous system’s innate ability to pattern-match and intuit conclusions: System 1 thinking (Daniel Kahneman, Thinking, Fast and Slow).

I want to note that I’m not suggesting we determine whether an AI is aligned by talking to it and “feeling out” how trustworthy it is. Rather, I think human intuition is a powerful process that we can harness or enhance. A success story here might look like exploring the neuron semantics of an LLM with a visualizer, noticing strange patterns, and in turn making conjectures about its mechanisms. Then we use rigorous System 2 thinking to verify those conjectures.

Similar ideas appear in Cyborgism:

The object level plan of creating cyborgs for alignment boils down to two main directions: […] 2. Train alignment researchers to use these tools, develop a better intuitive understanding of how GPT behaves, leverage that understanding to exert fine-grained control over the model, and to do important cognitive work while staying grounded to the problem of solving alignment.

Finally, I think it’s also valuable to better understand ✨vibes✨ themselves. There are superficial similarities (or are they that superficial?) between artificial neural networks and the System 1 thinking of our own organic neural networks. We should keep our minds open and be inspired by neuroscience and other disciplines.

“a vibe is a compression scheme is a probabilistic model.” - janus

“To think a lot but all at once, we have to think associatively, self-referentially, vividly, temporally” - Peli Grietzer, A Theory of Vibes

