Fixed wording

aethrvmn 2024-10-04 14:40:50 +02:00
parent c9a82e3c27
commit c6ec8aca36

@@ -1,7 +1,6 @@
 ---
 title: Lyceum
 type: docs
-bookToC: false
 ---
 # Welcome to Lyceum
@@ -10,8 +9,6 @@ bookToC: false
 Lyceum is a Reinforcement Learning (RL) playground designed for natural language processing (NLP) experimentation. Just as Gymnasium provides environments for RL agents to train in, Lyceum is the place where RL agents learn. Imagine a school where RL agents enroll in classes to master subjects like language, math, and even philosophy, and then unwind at after-school clubs like chess, where they can test their decision-making skills.
-Welcome to Lyceum!
 ## Why Lyceum?
 ***
 Modern NLP solutions like GPTs and BERTs have made great strides in language processing and generation; however, they come with serious limitations. Even though an LLM can describe or generate a game of chess, and even justify the moves made, it is unable to *play* one. Why? Because there is no underlying mechanism for decision-making or reward incentives during training. Transformers rely on static token distributions without real-time feedback, limiting their capacity to *actively* learn.
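
To make the contrast concrete, here is a minimal sketch of the feedback loop that Gymnasium provides and that static transformer training lacks, using the standard `gymnasium` API. The `CartPole-v1` environment and the random policy are illustrative stand-ins, not part of Lyceum.

```python
# A minimal sketch of a Gymnasium-style feedback loop, assuming the
# standard gymnasium API. CartPole-v1 is a stock Gymnasium environment
# used purely for illustration; it is not a Lyceum environment.
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # A trained agent would choose an action from its policy here;
    # we sample randomly just to show the loop structure.
    action = env.action_space.sample()

    # The environment returns a reward on every step: the real-time
    # feedback signal that static token-distribution training lacks.
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()
```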