Workshop 2: Using LLMs for content generation and evaluation in assessment development

Andrew Runge, Yena Park & Yigal Attali

Participants will gain an understanding of how LLMs work beyond the ChatGPT interface and engage in hands-on activities in which they tweak and extend Python code that interacts with GPT programmatically to generate and evaluate test content, including passages and items (keys and distractors).
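As a rough illustration of what interacting with GPT programmatically can look like (not the workshop's own notebook code), the sketch below requests a passage with specified attributes through the OpenAI Python SDK. The model name, prompt wording, and the generate_passage helper are illustrative assumptions.

```python
# A minimal sketch (not the workshop notebook): generating a reading passage
# with specified attributes via the OpenAI Python SDK. Assumes OPENAI_API_KEY
# is set in the environment; the model name and prompt are illustrative only.
from openai import OpenAI

client = OpenAI()

def generate_passage(genre: str, level: str, topic: str, n_words: int = 250) -> str:
    prompt = (
        f"Write a {n_words}-word reading passage in the {genre} genre "
        f"on the topic '{topic}'. Target a {level} reading level, with "
        f"correspondingly simple or complex vocabulary and sentence structure."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,              # some variety across generated passages
    )
    return response.choices[0].message.content

print(generate_passage("expository", "upper-intermediate", "urban beekeeping"))
```

Because the request is made in code rather than through the chat interface, attributes such as genre and level become parameters that can be varied systematically across many generations.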

Intended learning outcomes

  • The ability to adapt the code to generate expository and argumentative passages with desired attributes
  • The ability to filter passages appropriately using self-drafted content review guidelines (one such filter is sketched after this list)
  • The ability to generate candidate answer options for multiple-choice questions
  • Understanding of the limitations of using LLMs for content generation
  • Understanding of different methods to evaluate generated key and distractor candidates
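As a hedged sketch of the kind of guideline-based filtering referred to above, the function below asks the model to judge a passage against a short set of self-drafted review guidelines and returns a pass/fail decision. The guidelines, model name, prompt format, and the passes_review helper are assumptions for illustration, not the workshop's materials.

```python
# A sketch of passage filtering against self-drafted content review guidelines,
# using an LLM-as-judge prompt. Guidelines, model name, and the PASS/FAIL
# output format are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

REVIEW_GUIDELINES = """\
1. The passage is self-contained and factually plausible.
2. The passage avoids sensitive or potentially biased content.
3. The passage stays on a single, clearly identifiable main topic.
"""

def passes_review(passage: str) -> bool:
    prompt = (
        "Review the passage below against each guideline. "
        "Answer with a single word, PASS or FAIL, on the last line.\n\n"
        f"Guidelines:\n{REVIEW_GUIDELINES}\nPassage:\n{passage}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                # keep judgments as stable as possible
    )
    verdict = response.choices[0].message.content.strip().splitlines()[-1]
    return verdict.upper().startswith("PASS")
```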

Content of the workshop

  • Passage generation based on different criteria (e.g., genre, level of lexical/syntactic complexity)
  • Passage evaluation using generic and custom filters
  • Item generation for main point and inference questions
  • Item evaluation based on NLP-driven metrics (one possible metric is sketched after this list)
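One example of an NLP-driven metric that could be used for item evaluation (not necessarily one covered in the workshop) is the embedding similarity between each distractor and the key: distractors that are near-paraphrases of the key, or entirely unrelated to it, are likely to need review. The sketch below computes this with the sentence-transformers library; the model name and the example thresholds are assumptions.

```python
# One possible NLP-driven metric for evaluating answer options: cosine
# similarity between each distractor and the key, computed with
# sentence-transformers embeddings. Model name and thresholds are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

def key_similarity(key: str, distractors: list[str]) -> list[float]:
    key_emb = model.encode(key, convert_to_tensor=True)
    dist_emb = model.encode(distractors, convert_to_tensor=True)
    return util.cos_sim(key_emb, dist_emb)[0].tolist()

scores = key_similarity(
    "The author argues that cities should fund public transit.",
    ["The author argues that cities should ban cars.",
     "The author describes the history of the bicycle.",
     "The author argues that transit funding should be cut."],
)
print(scores)  # e.g., flag distractors with similarity above 0.9 or below 0.2
```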

Engagement methods

We will provide a Jupyter notebook that implements multi-step prompting with GPT in Python. Workshop participants can click on code blocks to execute the commands, make minimal changes to the existing code that follow a pre-established pattern (no prior coding experience required), and see the outcomes of their changes.
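For a sense of what such a minimal-change cell might look like, the sketch below chains generation and filtering in a single notebook cell, assuming the hypothetical generate_passage and passes_review helpers sketched earlier; participants would edit only the quoted settings at the top.

```python
# A sketch of a fill-in-the-blank notebook cell (assuming the hypothetical
# generate_passage and passes_review helpers sketched above): generate several
# candidate passages, keep only those that pass the content review filter.
GENRE = "argumentative"        # <-- try "expository"
LEVEL = "upper-intermediate"   # <-- try another proficiency level
TOPIC = "school uniforms"      # <-- try any topic
N_CANDIDATES = 3

accepted = []
for _ in range(N_CANDIDATES):
    passage = generate_passage(GENRE, LEVEL, TOPIC)
    if passes_review(passage):
        accepted.append(passage)

print(f"Kept {len(accepted)} of {N_CANDIDATES} candidate passages.")
```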

Participant background

  • Ability to recognize patterns
  • Ability to follow existing patterns

Pre-workshop activities

None
