Research interests 2023 winter
This page archives my research interests from back then, and what happened to them. Archiving them allows me to trace the changes in my thinking and my circumstances, and keeps the Research Interests page focused on what I actively work on or think about. I might pick these back up someday. Or maybe someone better suited can.
Literature review of mechanistic interpretability - theory of change
Analyze the theory of change for mechanistic interpretability research agendas, and inform governing bodies on how it could help with AI regulation.
Result: A full literature review proved to be beyond my ability and time budget. I did think about this a bit; see Quick thoughts on theory of changes for interpretability.
✨ Vibe-based research ✨ + Cyborgism
Forgive the title. When I talk about vibes, I mean our nervous system’s innate ability to pattern-match and intuit conclusions: System 1 thinking (Daniel Kahneman, Thinking, Fast and Slow).
I want to note that I’m not suggesting we determine whether an AI is aligned by talking to it and “feeling out” how trustworthy it is. Rather, I think human intuition is a powerful process that we can harness or enhance. A success story here might look like exploring the neuron semantics of an LLM with a visualizer, noticing strange patterns, and in turn making conjectures about its mechanisms. Then we use rigorous System 2 thinking to verify those conjectures.
Similar ideas appear in Cyborgism:
The object level plan of creating cyborgs for alignment boils down to two main directions: […] 2. Train alignment researchers to use these tools, develop a better intuitive understanding of how GPT behaves, leverage that understanding to exert fine-grained control over the model, and to do important cognitive work while staying grounded to the problem of solving alignment.
Finally, I think it’s also valuable to better understand ✨vibes✨. There are superficial similarities (or are they that superficial?) between artificial neural networks and the System 1 thinking of our own organic neural networks. We should keep our minds open and take inspiration from neuroscience and other disciplines.
“a vibe is a compression scheme is a probabilistic model.” - janus
“To think a lot but all at once, we have to think associatively, self-referentially, vividly, temporally” - Peli Grietzer, A Theory of Vibes
Result: This agenda is far less concrete and more of a ✨vibe✨. I still think there’s something to the parallel between deep learning and our System 1 thinking, and it’s very much worth studying. However, that job is probably better left to a neuroscience researcher. And the “success story” in the original proposal probably would never work, given that 1. there’s no guarantee that the same trained intuitions transfer across different LLMs, and 2. the AI would probably be operating orders of magnitude faster than we do, so it’s unlikely a human could react fast enough to stop a coup, even if they could identify it. Perhaps automation could solve (2).