<!DOCTYPE html>
<html lang="en" dir="ltr">
<head><script src="/livereload.js?mindelay=10&v=2&port=1313&path=livereload" data-no-instant defer></script>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
< meta name = "description" content = "Reinforcement LearningTheory and Implementation in a Custom Environment # you can find the thesis here and the code here
Abstract # Reinforcement Learning (RL) is a subcategory of Machine Learning that consis- tently surpasses human performance and demonstrates superhuman understand- ing in various environments and datasets. Its applications span from master- ing games like Go and Chess to optimizing real-world operations in robotics, fi- nance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple do- mains.">
< meta name = "theme-color" media = "(prefers-color-scheme: light)" content = "#ffffff" >
< meta name = "theme-color" media = "(prefers-color-scheme: dark)" content = "#343a40" >
< meta name = "color-scheme" content = "light dark" > < meta property = "og:url" content = "http://localhost:1313/theses/master-thesis/" >
< meta property = "og:site_name" content = "aethrvmn" >
< meta property = "og:title" content = "masters thesis" >
< meta property = "og:description" content = "Reinforcement LearningTheory and Implementation in a Custom Environment # you can find the thesis here and the code here
Abstract # Reinforcement Learning (RL) is a subcategory of Machine Learning that consis- tently surpasses human performance and demonstrates superhuman understand- ing in various environments and datasets. Its applications span from master- ing games like Go and Chess to optimizing real-world operations in robotics, fi- nance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple do- mains.">
< meta property = "og:locale" content = "en" >
< meta property = "og:type" content = "article" >
< meta property = "article:section" content = "theses" >
< meta property = "article:modified_time" content = "2024-11-10T01:34:22+01:00" >
< title > masters thesis | aethrvmn< / title >
< link rel = "manifest" href = "/manifest.json" >
< link rel = "icon" href = "/favicon.ico" >
< link rel = "canonical" href = "http://localhost:1313/theses/master-thesis/" >
< link href = "https://fonts.googleapis.com/css2?family=GFS+Didot&display=swap" rel = "stylesheet" type = "text/css" >
< link rel = "stylesheet" href = "/book.min.513c13c916552a34ea7ca41aa85ef450a4baa1f891fc5dae71c60e6026983163.css" integrity = "sha256-UTwTyRZVKjTqfKQaqF70UKS6ofiR/F2uccYOYCaYMWM=" crossorigin = "anonymous" >
<script defer src="/sw.min.6f6f90fcb8eb1c49ec389838e6b801d0de19430b8e516902f8d75c3c8bd98739.js" integrity="sha256-b2+Q/LjrHEnsOJg45rgB0N4ZQwuOUWkC+NdcPIvZhzk=" crossorigin="anonymous"></script>
<!--
Made with Book Theme
https://github.com/alex-shpak/hugo-book
-->
</head>
<body dir="ltr">
<input type="checkbox" class="hidden toggle" id="menu-control" />
<input type="checkbox" class="hidden toggle" id="toc-control" />
<main class="container flex">
<aside class="book-menu">
<div class="book-menu-content">
<nav>
<h2 class="book-brand">
<a class="flex align-center" href="/"><img src="/logo.png" alt="Logo" /><span>aethrvmn</span>
</a>
</h2>
<ul>
<li>
<a href="/pdf/cv.pdf" target="_blank" rel="noopener">
cv
</a>
</li>
</ul>
<ul>
<li>
<a href="/theses/" class="">theses</a>
<ul>
<li>
<a href="/theses/master-thesis/" class="active">master's thesis</a>
</li>
<li>
<a href="/theses/bachelor-thesis/" class="">bachelor thesis</a>
</li>
</ul>
</li>
<li>
<a href="/nimphs/" class="">nimphs</a>
<ul>
</ul>
</li>
<li>
<a href="/misc/" class="">misc</a>
<ul>
</ul>
</li>
<li>
<a href="/setup/" class="">setup</a>
<ul>
</ul>
</li>
<li>
<a href="/license/" class="">license</a>
<ul>
</ul>
</li>
</ul>
< div style = "bottom:0; position:fixed;" >
< h3 > < u > contact< / u > < / h3 >
< ul >
< li >
< a href = "mailto:aethrvmn@apotheke.earth" target = "_blank" rel = "noopener" >
mail
< / a >
< / li >
< li >
< a href = "https://t.me/aethrvmn" target = "_blank" rel = "noopener" >
t.me/aethrvmn
< / a >
< / li >
< li >
< a href = "https://sigmoid.social/@aethrvmn" target = "_blank" rel = "noopener" >
@aethrvmn@sigmoid.social
< / a >
< / li >
< / ul >
< / div >
< / nav >
<script>(function(){var e=document.querySelector("aside .book-menu-content");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script>
</div>
</aside>
<div class="book-page">
<header class="book-header">
<div class="flex align-center justify-between">
<label for="menu-control">
<img src="/svg/menu.svg" class="book-icon" alt="Menu" />
</label>
<label for="toc-control">
</label>
</div>
</header>
< article class = "markdown book-article" > < h1 id = "reinforcement-learningbrtheory-and-implementation-in-a-custom-environment" >
Reinforcement Learning< br / > Theory and Implementation in a Custom Environment
< a class = "anchor" href = "#reinforcement-learningbrtheory-and-implementation-in-a-custom-environment" > #< / a >
< / h1 >
< hr >
< p > you can find the thesis
< a href = "/pdf/mthesis.pdf" target = "_blank" rel = "me" style = "color:#AC9C6D" > here< / a > and the code
< a href = "https://github.com/aethrvmn/GodotPneumaRL" target = "_blank" rel = "me" style = "color:#AC9C6D" > here< / a > < / p >
< h2 id = "abstract" >
Abstract
< a class = "anchor" href = "#abstract" > #< / a >
< / h2 >
<p>Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.</p>
<p>In this thesis, we present some core concepts of Reinforcement Learning.</p>
<p>First, we introduce the mathematical foundation of Reinforcement Learning (RL) through Markov Decision Processes (MDPs), which provide a formal framework for modeling decision-making problems where outcomes are partly random and partly under the control of a decision-maker, involving state transitions influenced by actions. Then, we give an overview of the two main branches of Reinforcement Learning: value-based methods, which focus on estimating the value of states or state-action pairs, and policy-based methods, which directly optimize the policy that dictates the agent’s actions.</p>
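<p>To make the value-based side concrete, the Bellman optimality backup that these methods rest on can be sketched as a short value-iteration loop. The two-state MDP below is invented purely for illustration; its transition and reward numbers do not come from the thesis or from Pneuma.</p>

```python
# Illustrative value iteration on a toy two-state MDP (numbers invented for the example).
import numpy as np

gamma = 0.9  # discount factor

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the converged values
```

<p>Policy-based methods skip the intermediate value table and adjust the policy's parameters directly, which is the route PPO takes below.</p>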
<p>We focus on Proximal Policy Optimization (PPO), which is the de facto baseline algorithm in modern RL literature due to its robustness and ease of implementation, and discuss its potential advantages, such as improved sample efficiency and stability, as well as its disadvantages, including sensitivity to hyper-parameters and computational overhead. We emphasize the importance of fine-tuning PPO to achieve optimal performance.</p>
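<p>For concreteness, the clipped surrogate loss at the heart of PPO can be written in a few lines. This is a generic sketch of the objective from the original OpenAI formulation, assuming log-probabilities and advantages have already been computed elsewhere; it is not the exact code used in this thesis.</p>

```python
# Sketch of PPO's clipped surrogate loss; inputs are assumed precomputed.
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] bounds the size of each policy update
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum; we return its negation for minimization
    return -np.minimum(unclipped, clipped).mean()
```

<p>The pessimistic minimum is what gives PPO its stability: an update is never rewarded for moving the ratio further than the clip range allows, which is also why the clip parameter eps is one of the hyper-parameters the text notes PPO is sensitive to.</p>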
<p>We demonstrate the application of these concepts within Pneuma, a custom-made environment specifically designed for this thesis. Pneuma aims to become a research base for independent Multi-Agent Reinforcement Learning (MARL), where multiple agents learn and interact within the same environment. We outline the requirements such environments must meet to support MARL effectively and detail the modifications we made to the baseline PPO method, as presented by OpenAI, to facilitate agent convergence at the single-agent level.</p>
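<p>The shape such a MARL environment tends to take can be sketched as a simultaneous-step loop, where every agent submits an action and receives its own observation and reward each tick. The toy class and its method names below are hypothetical stand-ins chosen for illustration; they are not Pneuma's actual API.</p>

```python
# Toy stand-in for a multi-agent environment loop (not Pneuma's real interface).
import random

class ToyMultiAgentEnv:
    """All agents act simultaneously; each gets its own observation and reward."""

    def __init__(self, agent_ids):
        self.agent_ids = list(agent_ids)

    def reset(self):
        # One observation per agent, keyed by agent id
        return {aid: 0.0 for aid in self.agent_ids}

    def step(self, actions):
        # Placeholder dynamics: random observations, reward 1.0 for action 1
        obs = {aid: random.random() for aid in self.agent_ids}
        rewards = {aid: float(actions[aid] == 1) for aid in self.agent_ids}
        done = False
        return obs, rewards, done

env = ToyMultiAgentEnv(["a0", "a1"])
obs = env.reset()
for _ in range(10):
    actions = {aid: random.choice([0, 1]) for aid in env.agent_ids}
    obs, rewards, done = env.step(actions)
```

<p>The per-agent dictionaries are the key requirement: each learner sees only its own stream of observations and rewards, which is what lets independent single-agent PPO instances train side by side in a shared world.</p>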
<p>Finally, we discuss potential future enhancements to the Pneuma environment to increase its complexity and realism, aiming to create a more RPG-like setting, optimal for training agents in complex, multi-objective, and multi-step tasks.</p>
</article>
< footer class = "book-footer" >
< div class = "flex flex-wrap justify-between" >
< div class = "info-container" >
< div class = "commit-info" >
< span > Page last edited on 10/11/2024< / span >
< br / >
< span >
title: moved theses to own page
< / span >
< br / >
< span >
2024-11-10 13:58:15 +00:00
commit: < a href = "https://git.apotheke.earth/aethrvmn/home/commit/b092deebed5fa48727eeb2fdaa819b3c575fdb53" target = "_blank" style = "color: #AC9C6D" > b092dee< / a >
2024-11-10 01:36:00 +00:00
< / span >
< br / >
< span >
author: aethrvmn
< / span >
< br / >
< span >
< aethrvmn@apotheke.earth>
< / div >
< / div >
<script>(function(){function e(e){const t=window.getSelection(),n=document.createRange();n.selectNodeContents(e),t.removeAllRanges(),t.addRange(n)}document.querySelectorAll("pre code").forEach(t=>{t.addEventListener("click",function(){if(window.getSelection().toString())return;e(t.parentElement),navigator.clipboard&&navigator.clipboard.writeText(t.parentElement.textContent)})})})()</script>
</footer>
<div class="book-comments">
</div>
<label for="menu-control" class="hidden book-menu-overlay"></label>
</div>
</main>
</body>
</html>