Dear reader,
Thanks for joining us again as our organization evolves! This edition of our (somewhat) monthly newsletter includes a quick update on our team and reflections from our reading group.
A Brief Update from our Organization
My co-founder, Ben, and I recently attended the Alignment Research Engineer Accelerator (ARENA) over September and October, an intense five-week upskilling program at the London Initiative for Safe AI (LISA). I'm very grateful to have been able to attend, as it was an invaluable learning experience. It was such a pleasure to meet the super smart and talented participants, and every day we jointly experienced the strange euphoria of working as hard as we could with great confidence that this was the best thing we could be spending our time on. This community aspect was vital: even though the ARENA materials are available online, having people to chat with about one's work and to problem-solve with made all the difference to the learning experience.
The curriculum was excellent: I was able to dive into the most promising technical research agendas in AI safety, as well as work on my programming fundamentals. To give you a taste of what we got up to, some highlights from each week were: building GPT-2 from scratch, understanding basic induction circuits in transformers, building LLM agents, and implementing a basic RLHF training regime. Lastly, in the project phase, I was finally able to start the vision interpretability work I have been meaning to do for ages! I hope to share an update on that soon.
Aside from this, being exposed to experts in this space gave me a better sense of the key bottlenecks in AI safety, and what stood out was the need for organizational capacity. There is plenty of low-hanging fruit in terms of ideas and research agendas, and there is lots of talent. What seems to be needed most are structures that can absorb capital and talent and make structured progress on existing agendas. If you’re an entrepreneurial type, this space is begging for high-agency people to take action and bring ideas, funding, and people together. I think the bar for technical ability to do this kind of work is low, and there is a lot of room for people with organizational, entrepreneurial, and relational skills.
Reflections from our Reading Group - Can LLMs Reason?
On the technical side of things, we recently discussed GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models in our reading group. The paper has been blowing up, with people citing it as evidence that language models cannot do formal reasoning. It provides some useful pointers for improving standard model evaluations and shows that LLMs do struggle with some aspects of reasoning, but some of its claims may have been overblown.
The researchers aim to show that models fail to generalise. They take the Grade School Math 8K (GSM8K) dataset, which consists of basic math word problems, and create new synthetic versions of the questions to test the extent to which models may simply have been trained on the dataset itself (data contamination). These synthetic versions are made by altering the numbers or scenarios while keeping the underlying problems the same.
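To make the templating idea concrete, here is a rough sketch of how such a variant generator might look. This is our own toy illustration in Python, not the authors' code; the template, names, and number ranges are all made up.

```python
import random

# Toy illustration of the GSM-Symbolic idea (not the paper's actual templates):
# a word problem becomes a template with symbolic slots, and fresh variants are
# generated by sampling new names and numbers. The ground-truth answer is
# recomputed from the sampled values, so memorization of GSM8K doesn't help.

TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "How many apples does {name} have in total?")

def sample_variant(rng: random.Random):
    name = rng.choice(["Sophie", "Liam", "Ava"])
    x, y = rng.randint(2, 50), rng.randint(2, 50)
    question = TEMPLATE.format(name=name, x=x, y=y)
    answer = x + y  # answer tracks the sampled numbers
    return question, answer

rng = random.Random(0)
print(sample_variant(rng))
```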
They find that for many smaller models, data contamination is evident: performance drops sharply on the new synthetic benchmark. But the strongest version of that conclusion is undermined by GPT-4o, which does only 0.3% worse on the new benchmark and scores about 95% on both the old and new versions. This suggests the issue is more one of scale than a fundamental inability of LLMs to generalise. The researchers then use the templates they developed for GSM-Symbolic to create GSM-NoOp, where they add information or extra calculations that don’t affect the final result. This catastrophically drops model performance, reducing it by up to 65% for the Phi-3 models, with even o1-preview suffering a 17.5% drop. It's an interesting result, and it provides evidence that there are some aspects of reasoning, like ignoring irrelevant information, that LLMs particularly struggle with.
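Here is a similarly rough sketch of the NoOp idea, again a toy of our own making with a made-up distractor clause rather than the paper's actual data.

```python
import random

# Toy illustration of the GSM-NoOp idea: take a solvable question and insert a
# clause that sounds relevant but has no bearing on the answer. A model that
# truly reasons should ignore it.

BASE_QUESTION = ("Liam picks 14 apples on Monday and 23 apples on Tuesday. "
                 "How many apples does Liam have in total?")

DISTRACTORS = [
    "Five of the apples are a bit smaller than average.",
    "Liam also owns three oranges, which he keeps separately.",
]

def add_no_op(question: str, rng: random.Random) -> str:
    # Insert the irrelevant clause just before the final question sentence.
    stem, final_question = question.rsplit(". ", 1)
    return f"{stem}. {rng.choice(DISTRACTORS)} {final_question}"

rng = random.Random(0)
print(add_no_op(BASE_QUESTION, rng))  # correct answer is still 14 + 23 = 37
```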
So while GSM-Symbolic and GSM-NoOp provide some interesting insight into the nature of LLM reasoning, it would be too strong to conclude that LLMs can’t reason. Reasoning seems to improve with scale, and perhaps the ability to inhibit irrelevant information does too. We look forward to diving deeper into these debates in future reading groups!
What’s Coming up Next?
We are excited to be growing our community and our organization. Currently, we are looking for people to join our team in field-building roles. This means helping us run events and engage our community. If you’re interested in such a position, please email me at leo@aisafetyct.com.
We are also hosting a reading group and dinner this upcoming Wednesday, the 13th of November. The dinner will be catered, and you can apply to attend here. Lastly, we have our retreat coming up in two weeks! We’re excited to say that we have gathered some extremely talented folks to join us, both as participants and as speakers.
I look forward to updating you all with more news in the future!