The Cambridge AI Safety Hub would like to invite exceptional computer science students at South Carolina to apply to the upcoming iteration of the Mentorship for Alignment Research Students (MARS), an AI safety fellowship that matches students and early-career researchers with experienced mentors from AI labs, think tanks, and academia. In July we will fly promising students and working professionals to the United Kingdom for a "sprint week," during which they will begin a research project that they'll then carry out remotely through September.
We'll have more than 20 projects spanning multiple disciplines, but a few that we expect to be especially interesting to computer science students are:
• Research with Yossi Gandelsman (Reve) on whether LLMs can predict the layer at which their own neurons appear, detect polysemantic neurons, identify causal connections between two neurons in their own architecture, or anticipate their own attention patterns.
• A project with Lindley Lentati (Cambridge Inference) on reproducible white-box jailbreak monitoring, covering automated attack generation, multi-layer probe aggregation, and streaming token-by-token detection.
• An investigation with Rhea Karty and Jacob Davis (ERA; LASR Labs) of whether steering vectors for traits like confidence and honesty are context-independent or persona-dependent, using LoRA adapters for character-trained models and tracking trait geometry across training checkpoints.
• Work with James Lucassen (Redwood Research) on deferral protocols for AI control: implementing defer-to-trusted in BashArena, developing usefulness monitors, and building methodology to evaluate them.
• Work with Shivam Raval and Luiza Corpaci (Harvard; AMD) on detecting unfaithful formal translations, using Lean-verified equational theories as ground truth and mech-interp methods to locate where translation failures occur.
Applications close on May 3rd. Students can find more information on our program's webpage.
Go Gamecocks!
Justin Dollman
Co-Director @ Cambridge AI Safety Hub