AI Interests

Topics, papers, and tools in AI that I follow or find compelling.

Mechanistic Interpretability

Understanding what computations are actually happening inside neural networks: circuits, features, superposition. Anthropic's work here is the most compelling thread in AI safety research.
Constitutional AI

Anthropic's approach to training helpful, harmless, and honest models through self-critique. The paper is dense, but the core idea is elegant: the model evaluates its own outputs against a set of written principles and revises them accordingly.
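The critique-and-revise idea can be sketched in a few lines. This is my own sketch, not Anthropic's training pipeline: `call_model` is a stub standing in for any text-generation API, and the principle text is a paraphrase, not a quote from the paper.

```python
# Minimal sketch of a constitutional self-critique loop (all names hypothetical).

PRINCIPLES = [
    "Choose the response that is most helpful, honest, and harmless.",
]

def call_model(prompt: str) -> str:
    # Stub: a real implementation would call an actual LLM API here.
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(draft: str, principle: str) -> str:
    # First ask the model to critique its own draft against a principle...
    critique = call_model(
        f"Critique this response against the principle.\n"
        f"Principle: {principle}\nResponse: {draft}"
    )
    # ...then ask it to rewrite the draft to address that critique.
    return call_model(
        f"Rewrite the response to address the critique.\n"
        f"Critique: {critique}\nResponse: {draft}"
    )

def constitutional_pass(prompt: str) -> str:
    draft = call_model(prompt)
    for principle in PRINCIPLES:
        draft = critique_and_revise(draft, principle)
    return draft
```

In the paper, transcripts produced this way become training data (supervised fine-tuning, then RL against an AI preference model); the loop above only shows the inference-time shape of the idea.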
Language Model Evaluations
How do we know if a model is actually better? The gap between benchmark performance and real-world usefulness is wide and interesting. Many evals reward surface patterns, like exact string matches or multiple-choice formats, rather than the capabilities people actually rely on.
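A toy harness makes the brittleness concrete. Everything here is invented for illustration (the eval set, the model stub, the scoring rule); the point is only that exact-match scoring can mark a correct answer wrong.

```python
# Toy exact-match eval harness (hypothetical data and model stub).

EVAL_SET = [
    {"prompt": "What is 2 + 2?", "target": "4"},
    {"prompt": "Capital of France?", "target": "Paris"},
]

def model(prompt: str) -> str:
    # Stub: canned answers standing in for a real model.
    answers = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris",
    }
    return answers.get(prompt, "")

def exact_match_accuracy(dataset) -> float:
    hits = sum(
        1 for ex in dataset
        if model(ex["prompt"]).strip() == ex["target"]
    )
    return hits / len(dataset)
```

Here `exact_match_accuracy(EVAL_SET)` scores 0.5: the first answer is correct but phrased as a sentence, so exact match counts it wrong. That mismatch, scaled up, is one small instance of the benchmark-vs-usefulness gap.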
Prompting as Programming
Prompt engineering is converging on something that looks like software engineering — composition, abstraction, debugging, testing. What does that mean for tooling and interfaces?
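One way to see the convergence: prompts compose like functions. The sketch below is my own illustration (all names hypothetical), treating a system preamble and few-shot examples as layers that wrap a task string, which makes the final prompt a plain value you can unit-test.

```python
# Sketch: prompts as composable functions (all names hypothetical).

def system(role: str):
    # Layer that prepends a system-style preamble.
    def wrap(task: str) -> str:
        return f"{role}\n\n{task}"
    return wrap

def few_shot(examples: list[tuple[str, str]]):
    # Layer that prepends worked Q/A examples.
    def wrap(task: str) -> str:
        shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
        return f"{shots}\n\n{task}"
    return wrap

def compose(*layers):
    # Apply layers outermost-first, like function composition.
    def build(task: str) -> str:
        for layer in reversed(layers):
            task = layer(task)
        return task
    return build

prompt = compose(
    system("You are a careful technical editor."),
    few_shot([("2+2?", "4")]),
)("Q: 3+3?\nA:")
```

Because `prompt` is just a string built by pure functions, debugging is printing it and testing is asserting on it, which is exactly the software-engineering loop.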
Local Models

Running small models locally via llama.cpp, Ollama, and MLX on Apple Silicon. The capability gap with frontier models is large but shrinking faster than expected.
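For Ollama specifically, talking to a local model is a single HTTP call against its default endpoint. A minimal sketch with the standard library, assuming `ollama serve` is running and the model has been pulled (endpoint and field names are from Ollama's REST API as I understand it):

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    # stream=False asks for one complete JSON response instead of chunks.
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The completed text comes back in the "response" field.
        return json.loads(resp.read())["response"]
```

Usage would be something like `generate("llama3", "Why is the sky blue?")` after `ollama pull llama3`; the same pattern works for any model tag Ollama knows about.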