Workshop 2: Using LLMs for content generation and evaluation in assessment development
Andrew Runge, Yena Park & Yigal Attali
Participants will gain an understanding of how LLMs work beyond the ChatGPT interface. In hands-on activities, they will modify and extend Python code that interacts with GPT programmatically to generate and evaluate test content, including passages and items (keys and distractors).
Intended learning outcomes
- The ability to adapt the code to generate expository and argumentative passages with desired attributes
- The ability to filter passages appropriately using self-drafted content review guidelines
- The ability to generate candidates for options to be used in multiple-choice questions
- Understanding of the limitations of using LLMs for content generation
- Understanding of different methods to evaluate generated key and distractor candidates
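As one illustration of evaluating key and distractor candidates, the sketch below screens out distractors whose wording overlaps heavily with the key, since such candidates may also be defensible as correct. It is not taken from the workshop notebook: the function names, the choice of Jaccard token overlap as the metric, and the 0.6 threshold are all assumptions made for this example.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def screen_distractors(key: str, candidates: List[str],
                       max_overlap: float = 0.6) -> List[str]:
    """Keep only candidates whose overlap with the key is at most max_overlap.

    The threshold is a hypothetical default, not a recommended value.
    """
    return [c for c in candidates if jaccard(key, c) <= max_overlap]


if __name__ == "__main__":
    key = "The city banned cars downtown"
    candidates = [
        "The city banned cars downtown today",  # near-duplicate of the key
        "Farmers prefer wet seasons",           # unrelated wording
    ]
    print(screen_distractors(key, candidates))
```

Lexical overlap is only a rough proxy. A distractor can share few words with the key and still be correct, so a screen like this would complement human review rather than replace it.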
Content of the workshop
- Passage generation based on different criteria (e.g., genre, level of lexical/syntactic complexity)
- Passage evaluation using generic and custom filters
- Item generation of main point and inference questions
- Item evaluation based on NLP-driven metrics
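The passage generation and filtering steps listed above might look roughly like the following sketch. It only builds the prompt text and applies filters to a passage string; the call to GPT itself is left out. `PassageSpec`, `build_passage_prompt`, `within_length`, `avoids_terms`, and `review` are hypothetical names invented for this example, and the prompt wording is an assumption rather than the workshop's actual prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PassageSpec:
    """Desired attributes of a passage (all fields are illustrative)."""
    genre: str       # e.g. "expository" or "argumentative"
    complexity: str  # e.g. "low", "medium", "high"
    topic: str


def build_passage_prompt(spec: PassageSpec) -> str:
    """Turn a spec into a prompt string that could be sent to an LLM."""
    return (
        f"Write a short {spec.genre} passage about {spec.topic}. "
        f"Use {spec.complexity} lexical and syntactic complexity."
    )


def within_length(passage: str, min_words: int, max_words: int) -> bool:
    """Generic filter: the word count must fall inside the given range."""
    return min_words <= len(passage.split()) <= max_words


def avoids_terms(banned: List[str]) -> Callable[[str], bool]:
    """Custom filter factory: reject passages containing any banned term."""
    lowered = [term.lower() for term in banned]
    return lambda passage: not any(t in passage.lower() for t in lowered)


def review(passage: str, filters: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Return the names of the filters that the passage fails."""
    return [name for name, check in filters.items() if not check(passage)]


if __name__ == "__main__":
    spec = PassageSpec(genre="expository", complexity="low",
                       topic="urban beekeeping")
    print(build_passage_prompt(spec))

    filters = {
        "length": lambda p: within_length(p, 3, 10),
        "no_brands": avoids_terms(["Duolingo"]),
    }
    print(review("Bees are kept on rooftops.", filters))
```

Keeping each guideline as a named filter makes it easy to see which rule rejected a passage, and adding a new guideline means adding one more entry to the dictionary.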
Engagement methods
We will provide a Jupyter notebook that uses Python for multi-step prompting with GPT. Participants can run each code block with a click, make minimal changes to the existing code by following a pre-established pattern, and see the outcomes of their changes. No prior coding experience is required.
Participant background
- Ability to recognize patterns
- Ability to follow existing patterns
Pre-workshop activities
None

Andrew Runge is an AI Research Engineer at Duolingo. He works in test development for the Duolingo English Test, where he specializes in automatic item generation. He holds a master’s in Language Technologies from Carnegie Mellon University.
Yena Park is an assessment scientist at Duolingo, working in test development for the Duolingo English Test. She holds a PhD in second language studies specializing in language assessment.
Yigal Attali is an assessment scientist at Duolingo, specializing in automatic item generation (AIG) and automatic scoring. He holds a PhD in Cognitive Psychology from the Hebrew University of Jerusalem.