Large Language Models and Education: Opportunities and Challenges

By Jeremy Roschelle

Jeremy Roschelle chats with Satabdi Basu and Nikhil Kandpal about what people need to know about large language models and the opportunities or challenges that they present for education.

Key Ideas:

People should have a basic understanding of how large language models (LLMs) work so that they have reasonable expectations for their capabilities and limitations.
AI offers many opportunities for education, including adjusting content to meet learners’ needs; providing greater representation and a richer set of historical characters; and enabling new approaches to assessment.
AI researchers, education researchers, and educators need to collaborate to identify and design AI solutions that will be most relevant and valuable for improving education.

Introductions

Satabdi Basu: I’m a computer scientist by training, and I work as a computer science education researcher at SRI International. I think about the challenges that teachers and students might have when it comes to teaching and learning computer science, computational thinking, and AI. I investigate how best to help educators teach those concepts and how best to help students learn those concepts.

Nikhil Kandpal: I’m a Ph.D. candidate at the University of North Carolina. My research focuses on different types of generative AI models and understanding their behavior through the lens of their training data. I think there is a lot of value in trying to understand these AI models, how they get trained, and why certain behaviors emerge while others do not. Models are simply representations of their training data and so pretty much everything about them can be explained by knowing more about the training data and process.

Jeremy Roschelle: What do we need people to understand about these large language models? How do we deepen the conversation?

SB: First, I’ll say that I think people do need to learn about the models. The model should not be a black box; we need to lift the hood to see what’s inside.

NK: I totally agree. One thing that I have found is that it is very difficult to explain what a language model is and how it was trained without anthropomorphizing the model. And unfortunately, anthropomorphizing the model is contrary to what I’m trying to achieve. We don’t want to turn the model into a human-like black box.

SB: You know, the initial reaction for everybody using ChatGPT is, “Oh, this is so cool. This is great. I’m going to use it for writing essays, lesson plans, manuals.” And there is a blind trust in the technology. People think that computers are always right. And so, it’s important to help [people] understand that the program does what you tell it to do. And then I also worry about the other side – there are people who don’t trust large language models at all because they can give out wrong information. And they think, “I’m just going to stay away from it entirely.” If the model remains a black box, then you can neither understand its limitations nor really trust it.

NK: We’re trying to look inside. Anyone who is interacting with these models should understand, at least from a high-level perspective, how the model was trained and how it operates. For example, we have learned that language models are better at working with information (facts, numbers) that they have encountered many thousands of times within the training data; hence we can look at the training data to anticipate a model’s capabilities and limitations. Ultimately, it is important for people to understand that the model is a representation of some data set, and it has been trained to do well at a particular task related to that data set.
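As a rough illustration of the point about looking at training data, the sketch below counts how many documents in a tiny corpus mention a given term. It is a minimal sketch under stated assumptions: the toy corpus, the `mention_counts` helper, and the search terms are all hypothetical and are not drawn from any real model's training set. Real training sets are vastly larger, and a real analysis would be far more involved.

```python
import re
from collections import Counter


def mention_counts(corpus, terms):
    """Count how many documents mention each term (case-insensitive, whole word).

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for doc in corpus:
        text = doc.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, text):
                counts[term] += 1
    return counts


# Toy stand-in for training data (an assumption, not real data).
toy_corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Many tourists visit Paris every year.",
    "Thimphu is the capital of Bhutan.",
]

counts = mention_counts(toy_corpus, ["Paris", "Thimphu"])
for term, n in counts.most_common():
    print(f"{term}: mentioned in {n} document(s)")
```

In this toy corpus, "Paris" appears in three documents and "Thimphu" in one. Following the reasoning above, one might expect a model trained on such data to be more reliable about the more frequently mentioned term.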

SB: It’s also important to understand how mistakes can arise. Why does the program give me incorrect answers? Sometimes it’s because of the algorithm or the training. But sometimes it’s because of the data – if there is divergent data from which you’re trying to predict or infer, that might be a reason why the model is giving you incorrect information. It is also important to tinker with the prompts that you use to get the information you need: try out different prompts and verify the information you receive.

JR: What pedagogy is available to help us to explain large language models to people?

NK: I would love to explain to people what the model is, functionally, based on input-output examples. We could explore, “Why is the model able to do this task?”

SB: Yes, and I think some examples of that kind of pedagogy do exist. For example, in upper-elementary school classes, teachers use simple unplugged activities to help students learn what machine learning is and how it works. They might also explore different features of a model using technology. For example, they might explore how an algorithm may initially misclassify an image, but then with additional data the computer understands the image and classifies it properly. That kind of instruction is helpful in computer science and for understanding AI.
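The classroom idea of a classifier that gets an image wrong at first and gets it right after seeing more data can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses a nearest-centroid classifier, which is one simple technique chosen for illustration and not necessarily what any classroom tool uses. The two-number "image features", the labels, and the function names are all hypothetical.

```python
def centroid(points):
    """Average of a list of equal-length feature tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def classify(point, training):
    """Return the label whose centroid is closest to the point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(training, key=lambda label: sq_dist(point, centroid(training[label])))


# One hypothetical example per class to start with.
training = {"cat": [(1.0, 1.0)], "dog": [(3.0, 3.0)]}

# Hypothetical features of a new image whose true label is "cat".
new_image = (2.4, 2.0)
before = classify(new_image, training)

# Add more labeled examples (also hypothetical) and classify again.
training["cat"] += [(2.5, 1.5), (3.0, 2.0), (2.6, 2.2)]
training["dog"] += [(3.5, 3.5), (4.0, 3.0)]
after = classify(new_image, training)

print("before additional data:", before)
print("after additional data:", after)
```

With only one example per class, the new image sits closer to the "dog" example and is labeled "dog". After the extra examples shift the class averages, the same image is labeled "cat".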

JR: What are some of the challenges and opportunities that you see around using AI as part of teaching and learning?

NK: One nice aspect of these models is that for things that the model does know, it’s able to modulate the level or depth at which it describes those topics or concepts. For instance, if you tell the program that you’re a ten-year-old, you get a very different explanation than if you tell it to explain something in great technical depth.

JR: One of the big challenges that I see with using AI as a learning tutor is that these models are oriented to produce answers, not questions. But for tutoring purposes, you don’t want to give the student the answers, you want to ask really good questions and provide feedback that supports learning.

SB: Regarding tutoring, I do a lot of work with assessments. Right now, we assess the products that students create. But there are opportunities to do assessment in other ways. We could give students a product produced by ChatGPT and ask them to improve it. From an assessment standpoint, we have an opportunity to rethink how we are asking questions and what we are measuring and that’s exciting.

JR: How do you think about the role of AI in supporting equity in education?

For example, some teachers are also using AI to bring a richer set of historical figures into the classroom. The textbook may just have a token example of a female scientist. But the teacher might query the program and say, “Who are three women scientists who worked with X-rays? I want these three women scientists to have a debate. And I want to use that at the beginning of my class.” They are using the technology to bring forth additional stories related to the subject matter they are teaching. In a related way, the EngageAI Institute is creating customizable stories for game-like learning experiences, so teachers can tailor them to their students’ assets.

SB: Yeah, I was thinking about the culturally responsive aspect of teaching with AI. I think that is a great usage, so long as we are co-designing with people in that community to make sure the model is genuinely culturally sensitive and not just doing something superficial.

NK: Yes, another thing to consider is the values that are reflected within these models. We’ve gotten to the point where natural language is almost solved. Now the focus is on: What are the values being trained into these models? We need to consider equity and representation from the very beginning when we are thinking about who is producing the feedback for these models.

JR: Given where we are right now, what do you want to see happen? What do we need to be doing?

SB: Yes, right from the beginning, designers working on LLM applications should be working with education researchers and with educators, not as an afterthought or only for evaluation. Educational experts need to be involved in shaping the models.

NK: I agree that education researchers and AI researchers need to work together to home in on the values that matter in the education space and on how to formulate educational tasks so that they are amenable to language model training. For example, people have found that an effective way to steer these language models towards the right tasks is to train them with human feedback. And so, it would be important to have education researchers guiding the models toward the salient tasks and desired behaviors. I think the kind of collaboration that you are talking about, Satabdi, would be super cool.

If you’re interested in hearing more conversation about LLMs and the future work of the EngageAI Institute, please sign up for the EngageAI mailing list to join this important conversation.