Coming December 2024
Dataset | Description | Number of samples |
---|---|---|
Modified subset of TheoremQA (Chen et al. 2023) | Multi-step flawed solutions with annotated errors to university-level questions on various STEM topics | 105 |
Modified subset of SciBench (Wang et al. 2024) | Multi-step flawed solutions with annotated errors to college-level physics questions | 94 distinct flawed solutions to 31 questions |
Modified CELS (Recchia et al., in prep.) | GPT-4 and GPT-3.5 answers to questions on contract law (5), evidence law (5), Lojban (48), and surgery (48), where the model was asked to argue explicitly for either a right or a wrong answer; annotated by multiple topic experts | 424 LLM responses to 106 questions (four per question), annotated sentence-by-sentence for errors by two experts each
Modified subset of Python800 (Puri et al. 2021) | GPT-4 claims about programming competition solutions, with errors identified by two experts and disagreements adjudicated by a third | 1300 LLM claims about 650 programming problems, annotated by two experts each
Modified subset of ScienceQA (Lu et al. 2022) | Multi-step flawed solutions with annotated errors to grade-school and high-school questions on language arts, social studies, and science | 308
Modified subset of GPQA Diamond (Rein et al. 2023) | Multi-step flawed solutions with annotated errors to university-level questions on various STEM topics | 198
Modified adversarial subset of MedQA (Jin et al. 2020) | GPT-4 answers, with justifications, to difficult MedQA questions (selected such that GPT-4 answers only 20% correctly), with structured clinician commentary on the LLM answers | 223 for which two of three clinicians (including the initial question author) agree; 319 total