AI has seen exciting advances recently. If progress continues at a rapid rate, AI is positioned to be an impactful technology in the coming decades, potentially even giving rise to systems that can outperform humans in most or all intellectual domains (though one cannot make predictions about AI progress with confidence). Such advances could greatly improve the world, resolving long-standing scientific challenges and bringing about unprecedented economic prosperity; however, they also pose risks. For example, advanced AI could be weaponized, concentrate power, cause overdependence, or optimize for objectives that don’t align with human wellbeing.
We think that technical ML safety research can reduce these hazards, especially those that could arise from the unintended behavior of AI systems. For example, future systems trained to optimize for open-ended objectives may learn to seek power, potentially getting out of control and doing large amounts of damage or, in the most extreme case, disempowering humanity completely. An example of a current research direction that may reduce this risk is to train text agents to be aware of and averse to power. Though these risks are speculative, we think caution is warranted, and it is worth investing in research that could help to mitigate them.
Since this competition is motivated by high-consequence risks from advanced AI, we’ve listed background resources on them below. Note that discussions about AI risk in the ML community are nascent, so these resources should not be taken to be definitive. Researchers have widely differing views about what issues should be prioritized and how severe they are; however, these resources roughly represent the concerns that are most salient to the judges and are helpful for understanding how submissions will be evaluated.
- Short talk describing x-risk from power-seeking AI
- Full report examining x-risk from power-seeking AI
- An explanation of how a dozen empirical ML research areas relate to AI risk
The SafeBench Competition
SafeBench is a project of the Center for AI Safety, a field-building and research non-profit that focuses on reducing high-consequence risks from AI systems. To view more competitions like this one, visit safe.ai/competitions. For ML Safety resources and news, visit mlsafety.org.
For questions, concerns, and feedback, please contact email@example.com.