<!DOCTYPE html>
<html lang="en" dir="ltr">
<head><script src="/livereload.js?mindelay=10&v=2&port=1313&path=livereload" data-no-instant defer></script>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Reinforcement Learning: Theory and Implementation in a Custom Environment. Abstract: Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.">
<meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff">
<meta name="theme-color" media="(prefers-color-scheme: dark)" content="#343a40">
<meta name="color-scheme" content="light dark"><meta property="og:url" content="http://localhost:1313/theses/master-thesis/">
<meta property="og:site_name" content="aethrvmn">
<meta property="og:title" content="masters thesis">
<meta property="og:description" content="Reinforcement Learning: Theory and Implementation in a Custom Environment. Abstract: Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.">
<meta property="og:locale" content="en">
<meta property="og:type" content="article">
<meta property="article:section" content="theses">
<meta property="article:modified_time" content="2024-11-10T01:34:22+01:00">
<title>masters thesis | aethrvmn</title>
<link rel="manifest" href="/manifest.json">
<link rel="icon" href="/favicon.ico" >
<link rel="canonical" href="http://localhost:1313/theses/master-thesis/">
<link href="https://fonts.googleapis.com/css2?family=GFS+Didot&display=swap" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="/book.min.513c13c916552a34ea7ca41aa85ef450a4baa1f891fc5dae71c60e6026983163.css" integrity="sha256-UTwTyRZVKjTqfKQaqF70UKS6ofiR/F2uccYOYCaYMWM=" crossorigin="anonymous">
<script defer src="/sw.min.6f6f90fcb8eb1c49ec389838e6b801d0de19430b8e516902f8d75c3c8bd98739.js" integrity="sha256-b2+Q/LjrHEnsOJg45rgB0N4ZQwuOUWkC+NdcPIvZhzk=" crossorigin="anonymous"></script>
<!--
Made with Book Theme
https://github.com/alex-shpak/hugo-book
-->
</head>
<body dir="ltr">
<input type="checkbox" class="hidden toggle" id="menu-control" />
<input type="checkbox" class="hidden toggle" id="toc-control" />
<main class="container flex">
<aside class="book-menu">
<div class="book-menu-content">
<nav>
<h2 class="book-brand">
<a class="flex align-center" href="/"><img src="/logo.png" alt="Logo" /><span>aethrvmn</span>
</a>
</h2>
<ul>
<li>
<a href="/pdf/cv.pdf" target="_blank" rel="noopener">
cv
</a>
</li>
</ul>
<ul>
<li>
<a href="/theses/" class="">theses</a>
<ul>
<li>
<a href="/theses/master-thesis/" class="active">masters thesis</a>
</li>
<li>
<a href="/theses/bachelor-thesis/" class="">bachelor thesis</a>
</li>
</ul>
</li>
<li>
<a href="/nimphs/" class="">nimphs</a>
<ul>
</ul>
</li>
<li>
<a href="/misc/" class="">misc</a>
<ul>
</ul>
</li>
<li>
<a href="/setup/" class="">setup</a>
<ul>
</ul>
</li>
<li>
<a href="/license/" class="">license</a>
<ul>
</ul>
</li>
</ul>
<div style="bottom:0; position:fixed;">
<h3><u>contact</u></h3>
<ul>
<li>
<a href="mailto:aethrvmn@apotheke.earth" target="_blank" rel="noopener">
mail
</a>
</li>
<li>
<a href="https://t.me/aethrvmn" target="_blank" rel="noopener">
t.me/aethrvmn
</a>
</li>
<li>
<a href="https://sigmoid.social/@aethrvmn" target="_blank" rel="noopener">
@aethrvmn@sigmoid.social
</a>
</li>
</ul>
</div>
</nav>
<script>(function(){var e=document.querySelector("aside .book-menu-content");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script>
</div>
</aside>
<div class="book-page">
<header class="book-header">
<div class="flex align-center justify-between">
<label for="menu-control">
<img src="/svg/menu.svg" class="book-icon" alt="Menu" />
</label>
<label for="toc-control">
</label>
</div>
</header>
<article class="markdown book-article"><h1 id="reinforcement-learningbrtheory-and-implementation-in-a-custom-environment">
Reinforcement Learning<br/>Theory and Implementation in a Custom Environment
<a class="anchor" href="#reinforcement-learningbrtheory-and-implementation-in-a-custom-environment">#</a>
</h1>
<hr>
<p>you can find the thesis
<a href="/pdf/mthesis.pdf" target="_blank" rel="me" style="color:#AC9C6D">here</a> and the code
<a href="https://github.com/aethrvmn/GodotPneumaRL" target="_blank" rel="me" style="color:#AC9C6D">here</a></p>
<h2 id="abstract">
Abstract
<a class="anchor" href="#abstract">#</a>
</h2>
<p>Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently
surpasses human performance and demonstrates superhuman understanding
in various environments and datasets. Its applications span from mastering
games like Go and Chess to optimizing real-world operations in robotics,
finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic
and complex scenarios highlight their transformative potential across multiple
domains.</p>
<p>In this thesis, we present some core concepts of Reinforcement Learning.</p>
<p>First, we introduce the mathematical foundation of Reinforcement Learning
(RL) through Markov Decision Processes (MDPs), which provide a formal
framework for modeling decision-making problems where outcomes are partly random
and partly under the control of a decision-maker, involving state transitions
influenced by actions. Then, we give an overview of the two main branches of
Reinforcement Learning: value-based methods, which focus on estimating the value of
states or state-action pairs, and policy-based methods, which directly optimize the
policy that dictates the agent’s actions.</p>
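<p>As a rough illustration of the value-based branch, the sketch below runs value iteration on a hypothetical two-state MDP. The states, actions, transition probabilities, rewards, and discount factor are all invented for this example and are not taken from the thesis or from Pneuma.</p>

```python
# Hypothetical two-state MDP, invented for illustration (not from the thesis).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(sweeps=500):
    """Repeatedly apply the Bellman optimality backup to every state."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
    return V


def greedy_policy(V):
    """Pick, in each state, the action with the highest estimated value."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
```

<p>In this toy problem the greedy policy moves from state 0 to state 1 and then stays there. A policy-based method would instead adjust the parameters of the policy directly, without first estimating these values.</p>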
<p>We focus on Proximal Policy Optimization (PPO), which is the de facto baseline
algorithm in modern RL literature due to its robustness and ease of implementation,
and discuss its potential advantages, such as improved sample efficiency and
stability, as well as its disadvantages, including sensitivity to hyper-parameters
and computational overhead. We emphasize the importance of fine-tuning PPO to
achieve optimal performance.</p>
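<p>To make the PPO discussion concrete, here is a minimal sketch of the standard clipped surrogate objective from the original PPO formulation, written in plain Python. The function name, its arguments, and the default clip range of 0.2 are assumptions made for this illustration; the sketch does not include the modifications developed in the thesis.</p>

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO-clip objective over a batch (a quantity to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    All names here are illustrative, not taken from the thesis code.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

<p>Clipping the probability ratio to [1 - eps, 1 + eps] removes the incentive to move the new policy far from the old one in a single update, which is where the stability mentioned above comes from. The clip range is one of the hyper-parameters that typically needs tuning.</p>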
<p>We demonstrate the application of these concepts within Pneuma, a custom-made
environment specifically designed for this thesis. Pneuma aims to become
a research base for independent Multi-Agent Reinforcement Learning (MARL),
where multiple agents learn and interact within the same environment. We outline
the requirements for such environments to support MARL effectively and detail
the modifications we made to the baseline PPO method, as presented by OpenAI,
to facilitate agent convergence at the single-agent level.</p>
<p>Finally, we discuss the potential for future enhancements to the Pneuma
environment to increase its complexity and realism, aiming to create a more RPG-like
setting, optimal for training agents in complex, multi-objective, and multi-step
tasks.</p>
</article>
<footer class="book-footer">
<div class="flex flex-wrap justify-between">
<div class="info-container">
<div class="commit-info">
<span>Page last edited on 10/11/2024</span>
<br/>
<span>
title: moved theses to own page
</span>
<br/>
<span>
commit: <a href="https://git.apotheke.earth/aethrvmn/home/commit/b092deebed5fa48727eeb2fdaa819b3c575fdb53" target="_blank" style="color: #AC9C6D">b092dee</a>
</span>
<br/>
<span>
author: aethrvmn
</span>
<br/>
<span>
&lt;aethrvmn@apotheke.earth&gt;
</span>
</div>
</div>
</div>
<script>(function(){function e(e){const t=window.getSelection(),n=document.createRange();n.selectNodeContents(e),t.removeAllRanges(),t.addRange(n)}document.querySelectorAll("pre code").forEach(t=>{t.addEventListener("click",function(){if(window.getSelection().toString())return;e(t.parentElement),navigator.clipboard&&navigator.clipboard.writeText(t.parentElement.textContent)})})})()</script>
</footer>
<div class="book-comments">
</div>
<label for="menu-control" class="hidden book-menu-overlay"></label>
</div>
</main>
</body>
</html>