The 5th Workshop on Computational Linguistics for the Political and Social Sciences (CPSS), co-located with KONVENS 2025 in Hildesheim, Germany – September 2025
We are excited to announce that Paul Röttger and Qixiang Fang have accepted our invitation to give keynote talks at CPSS!
Large language models (LLMs) are helping millions of users to learn and write about a wide range of issues. In doing so, LLMs may expose users to new ideas and perspectives, or reinforce existing knowledge and user opinions. This creates concerns about political bias in LLMs, and about how such bias might influence LLM users and society. In my talk, I will discuss why measuring political bias in LLMs is difficult, and why we should be skeptical of most evidence so far. I will then present our approach to building a more meaningful evaluation dataset, called IssueBench, for measuring biases in how LLMs write about political issues, and describe the steps we took to make IssueBench realistic and robust. Finally, I will outline our results from testing state-of-the-art LLMs with IssueBench, including clear evidence of issue bias, striking similarities in biases across models, and strong alignment with Democrat over Republican voter positions on a subset of issues.
Paul is a postdoctoral researcher in the MilaNLP Lab at Bocconi University, working on evaluating and improving the alignment and safety of large language models, as well as measuring their societal impacts. For his recent work in this area, he won an Outstanding Paper Award at ACL and a Best Paper Award at NeurIPS D&B. Before coming to Milan, Paul completed his PhD at the University of Oxford, where he worked on LLMs for hate speech detection. During his PhD, Paul also co-founded a start-up building AI for content moderation, which was acquired by a larger online safety company in 2023.
The growing use of NLP in political and social science offers exciting opportunities, but also raises concerns about validity, reproducibility, and bias. In this talk, I discuss how psychometric principles can strengthen the evaluation of text-based measures, from construct validity in embeddings to benchmarking large language models with the PATCH framework. I also highlight challenges in reproducibility, showing how common flaws in human evaluation undermine trust in findings, and I reflect on the ethical risks of emerging applications such as personality inference. Building on these insights, I propose best practices to align NLP with social science standards. By integrating psychometric rigor with computational methods, we can make NLP a more reliable tool for understanding political and social phenomena.
Qixiang Fang is a postdoctoral researcher at Utrecht University and a senior member of the ODISSEI SoDa Team. He advises social science and humanities researchers in the Netherlands on integrating NLP and other computational methods into their work. He organizes a recurring workshop on using large language models for data collection and on addressing measurement error in LLM-generated labels for downstream modeling. He obtained his PhD at Utrecht University, where his research explored how measurement theory can strengthen the application of NLP in the social sciences and humanities.