A community blog devoted to refining the art of rationality

A Mechanistic Interpretability Analysis of Grokking 09/20/22
What do ML researchers think about AI in 2022? 09/20/22
DeepMind alignment team opinions on AGI ruin arguments 09/17/22
(My understanding of) What Everyone in Technical Alignment is Doing and Why 09/13/22
Unifying Bargaining Notions (1/2) 09/10/22
Some conceptual alignment research projects 09/01/22
Survey advice 08/27/22
Toni Kurz and the Insanity of Climbing Mountains 08/23/22
Deliberate Grieving 08/19/22
Language models seem to be much better than humans at next-token prediction 08/16/22
Humans provide an untapped wealth of evidence about alignment 08/14/22
Changing the world through slack & hobbies 08/10/22
«Boundaries», Part 1: a key missing concept from utility theory 08/04/22
ITT-passing and civility are good; "charity" is bad; steelmanning is niche 07/26/22
What should you change in response to an "emergency"? And AI risk 07/23/22
On how various plans miss the hard bits of the alignment challenge 07/20/22
Humans are very reliable agents 07/14/22
Looking back on my alignment PhD 07/09/22