Journal of Educational Data Mining
Large language models (LLMs) are flexible, personalizable, and widely available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, their flexibility creates risks: inaccuracies, harmful content, and non-curricular material. Ethically deploying LLM-backed ITSs requires designing safeguards that ensure positive experiences for students. We describe the design of a conversational system integrated into an ITS that uses safety guardrails and retrieval-augmented generation to support middle-grade math learning. We evaluated this system using red-teaming, offline analyses, an in-classroom usability test, and a field deployment. We present empirical data from more than 8,000 student conversations designed to encourage a growth mindset, finding that the GPT-3.5 LLM rarely generates inappropriate messages and that retrieval-augmented generation improves response quality. The student interaction behaviors we observe suggest that designers should treat student inputs as a content moderation problem, and that researchers should focus on subtle forms of bad content and on creating metrics and evaluation processes. Code and data are available at https://www.github.com/DigitalHarborFoundation/chatbot-safety and https://www.github.com/DigitalHarborFoundation/rag-for-math-qa.