Corpus
A large collection of text used to train language-based AI models.
Help me explain to...
K–5th
A corpus is like a giant bookshelf filled with stories and books. Computers read from it to learn how people talk, write, and ask questions.
6–8th
Think of a corpus as a huge library that AI reads to learn language. It might include books, websites, and even schoolwork, helping AI understand words, grammar, and ideas.
9–12th
A corpus is a massive dataset of written or spoken language used to train natural language processing models. The size and diversity of a corpus strongly influence the accuracy and fairness of AI language tools.
Expeditions
K–5th
Create a classroom “mini-corpus” by collecting student-written sentences about a topic. Use them to see how often certain words appear.
6–8th
Ask students to gather text samples from different genres (news, fiction, instructions). Discuss how each one teaches something different to AI.
9–12th
Have students explore the impact of biased or limited corpora by analyzing sample text datasets. Ask how the composition of a corpus affects AI outcomes.
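For classes that want to try the word-counting step from the K–5th mini-corpus activity on a computer, the following is a minimal Python sketch. The sentences in `mini_corpus` and the helper name `count_words` are made up for illustration; swap in the sentences your students actually wrote. It treats any run of letters or apostrophes as a word, which is a simplifying assumption rather than how production AI systems split text.

```python
import re
from collections import Counter

# Hypothetical student sentences; replace these with your class's own.
mini_corpus = [
    "My dog likes to run in the park.",
    "The park has a big slide.",
    "I run with my dog every day.",
]

def count_words(sentences):
    """Return a Counter mapping each lowercase word to how often it appears."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase the sentence and keep only runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", sentence.lower())
        counts.update(words)
    return counts

word_counts = count_words(mini_corpus)

# Show the five most frequent words in the mini-corpus.
for word, count in word_counts.most_common(5):
    print(word, count)
```

Older students can run the same function on text from different genres and compare which words dominate each one, which ties into the discussion of how a corpus's composition shapes what an AI learns.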
