removed theses and nimphs

aethrvmn 2024-11-10 01:36:11 +01:00
parent 8f1c0d2e63
commit ff123fd07e
5 changed files with 9 additions and 514 deletions


@@ -1,22 +1,14 @@
 ---
-title: "misc"
-weight: 30
+title: "misc projects"
 ---
-# misc projects
----
-this is the place for my miscellaneous projects. Here you'll find a collection of my work: from research and theses to explorations in mathematics, physics, and computer science.
-### contents
-#### theses
-- [master-thesis](/pdf/mthesis.pdf): reinforcement learning theory and implementation in a custom environment.
-- [bachelor-thesis](/pdf/bthesis.pdf): random af spin-1/2 heisenberg model and the sdrg method.
-#### personal projects
-- <a>nyrids</a> is a collection of closed-source nlp models I am working on, either for fun (nimertes), research (melite), or work (panope).
-- <a>melite</a>: solo research project into neurosymbolic ai using transformers and reinforcement learning methods.
-- <a>nimertes</a>: making a transformer from scratch in pytorch and then nim, a systems language.
-- <a>panope</a>: a foundational gpt based on the [nanogpt](https://github.com/aethrvmn/nanoGPT) architecture.
-#### university projects
-- [super mario network](https://github.com/aethrvmn/supermarioddqn): implementing an agent to autonomously play super mario bros.
-- [human-diffusion](human-diffusion): a simulation of the Out of Africa (OOA) event in prehistory.
-- [black-scholes](black-scholes): exploring the black-scholes equation.
+this is the place for my miscellaneous projects. here you'll find a collection of different things that piqued my interest.
+- [super-mario-ddqn](https://github.com/aethrvmn/supermarioddqn): implementing an agent to autonomously play super mario bros.
+- [ising-model-tax-evasion](https://github.com/aethrvmn/Ising-Tax-Evasion): simulating the ising model from physics as a means of estimating the probability of tax evasion in a given population.
+- [human-diffusion](https://github.com/aethrvmn/human-diffusion): a simulation of the Out of Africa (OOA) event in prehistory.
+- [black-scholes](https://github.com/aethrvmn/black-scholes-model): exploring the black-scholes equation.
 
 feel free to explore and delve into the details.


@@ -1,17 +0,0 @@
---
type: "page"
showTableOfContents: true
---
# The One-Dimensional Heisenberg Model, RG Methods and Numerical Simulation of the SDRG Process
You can find the thesis [here](/pdfs/bthesis.pdf) and the code [here](https://github.com/aethrvmn/1d-RandAFHeisenberg-SDRG).
## Abstract
The Strong Disorder Renormalisation Group (SDRG) method, first introduced by Dasgupta, Ma and Hu, and later greatly expanded by Fisher, yields asymptotically exact results for distributions in which the disorder grows without limit at large scales; Fisher also calculated limiting values as well as scaling factors for random spin chains.
These results were the first of many yielded by the intense research that followed, first in random quantum systems and later in classically disordered systems as well. The Real Space RG methods used previously treated the whole space as homogeneous, allowing the grouping of spins into super-spins; although this homogeneity is physically justifiable in systems free of randomness, it comes into question in the presence of disorder. The SDRG method renormalises space in a non-homogeneous way, so it can better handle local disorder.
More specifically, the XX chain, presented by Fisher, can be used to obtain exact results for the behaviour of phases dominated by randomness, as well as the critical behaviour near the various zero-temperature phase transitions that occur. Studying the properties of antiferromagnetic Heisenberg spin-1/2 chains with random bonds, we analyse the low-energy behaviour by decimating the strongest bond and replacing it with a new effective bond between its nearest neighbours. Repeating the procedure, the distribution becomes extremely broad, improving the accuracy of the approximation.
The structure of the thesis is as follows. First we introduce the Heisenberg model and its relation to the Ising and Free Fermion models, solve it exactly for the ferromagnetic case using the Bethe Ansatz, and introduce the Block RG method for the antiferromagnetic case. Afterwards we present the Strong Disorder RG method, using a modernised version of Fisher's process to solve the random AF XX chain. Finally, we present the methods we created to simulate the process.
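The decimation step described above reduces to a simple rule: find the strongest bond, lock its two spins into a singlet, and couple their neighbours through a much weaker effective bond. As a rough illustration (not the thesis code), a minimal sweep using the standard Ma-Dasgupta-Hu rule J' = J_left * J_right / (2 * J_max) for the spin-1/2 chain might look like this:
```python
import numpy as np

def sdrg_sweep(couplings, min_bonds=2):
    """Repeatedly decimate the strongest bond of an open random AF
    spin-1/2 Heisenberg chain until only `min_bonds` bonds remain."""
    J = list(couplings)
    while len(J) > min_bonds:
        k = int(np.argmax(J))                   # strongest bond J_max
        left = J[k-1] if k > 0 else None        # neighbouring bonds
        right = J[k+1] if k < len(J) - 1 else None
        if left is not None and right is not None:
            # Ma-Dasgupta-Hu rule: effective bond between the two neighbours
            J_eff = left * right / (2 * J[k])
            J[k-1:k+2] = [J_eff]                # replace three bonds by one
        elif left is not None:                  # bond at the right edge
            J[k-1:k+1] = []
        else:                                   # bond at the left edge
            J[k:k+2] = []
    return np.array(J)

# example: a chain with a broad distribution of initial bonds
rng = np.random.default_rng(0)
J0 = rng.uniform(size=1000)**4
print(sdrg_sweep(J0))
```
Iterating this rule drives the coupling distribution ever broader, which is why the approximation becomes asymptotically exact.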


@@ -1,216 +0,0 @@
---
title: Black-Scholes Model
---
You can find the repository [here](https://github.com/aethrvmn/Black-Scholes-Model)
To start off, we install all the required modules. (The cell below is commented out; if one or more modules are missing, uncomment it and run it once.)
```python
# pip install -r "requirements.txt"
```
Note that scipy.stats is not imported here, even though it is needed inside the class that we import.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from BlackScholes import Model
```
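The Model class itself lives in the repository's BlackScholes module and is not reproduced in this notebook. Below is a minimal sketch of what it plausibly looks like, assuming the textbook Black-Scholes formulas and reusing the attribute names (stock_p, ex_p, int_rate, time_to_exp, vol, callval, putval) that appear later in this walkthrough; the real implementation may differ.
```python
# Sketch only: the real class is defined in BlackScholes.py in the repository.
import numpy as np
from scipy.stats import norm

class Model:
    def __init__(self, stock_p, ex_p, int_rate, time_to_exp, vol):
        self.stock_p = np.asarray(stock_p, dtype=float)    # underlying price S
        self.ex_p = np.asarray(ex_p, dtype=float)          # exercise (strike) price K
        self.int_rate = np.asarray(int_rate, dtype=float)  # risk-free rate r
        self.time_to_exp = time_to_exp                     # time to expiration (days)
        self.vol = vol                                     # volatility sigma

    def _d1_d2(self):
        T = self.time_to_exp / 365                         # assumption: convert days to years
        d1 = (np.log(self.stock_p / self.ex_p)
              + (self.int_rate + 0.5 * self.vol**2) * T) / (self.vol * np.sqrt(T))
        d2 = d1 - self.vol * np.sqrt(T)
        return d1, d2, T

    def call(self):
        # C = S N(d1) - K exp(-rT) N(d2)
        d1, d2, T = self._d1_d2()
        self.callval = (self.stock_p * norm.cdf(d1)
                        - self.ex_p * np.exp(-self.int_rate * T) * norm.cdf(d2))
        return self.callval

    def put(self):
        # P = K exp(-rT) N(-d2) - S N(-d1)
        d1, d2, T = self._d1_d2()
        self.putval = (self.ex_p * np.exp(-self.int_rate * T) * norm.cdf(-d2)
                       - self.stock_p * norm.cdf(-d1))
        return self.putval
```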
### Section 1
#### Testing
To check that the class functions work, and to explain the process, we create a Brownian motion to simulate a possible stock price over four years (1460 days) and calculate all the needed variables.
```python
def brownian(time_range, mean=0, sd=1):
    time = np.linspace(0, 1, time_range)
    path = np.zeros(time_range)
    for i in np.arange(1, time_range):
        path[i] = path[i-1] + np.random.normal(mean*time[i], sd*time[i])
    return time, path
```
where time runs from 0 to 1, because we can always rescale the dates of the hypothetical stock to fit those values with an appropriate transformation. Below we see the graph of the motion.
```python
time_range = 1460 #in days
walk = brownian(time_range)
time = walk[0]
stock_price = walk[1] + np.abs(min(walk[1])) +1
plt.plot(time, stock_price)
plt.show()
```
Next, we define the exercise price; here we choose a "smooth" deviation from the original stock price, simply setting it 10% above the stock price. The interest rate is drawn uniformly from an arbitrary, probably unrealistic range between 0.5% and 1%, the time to expiration is set to 30 days, and the volatility to 0.23.
```python
variance = np.random.uniform(0.005, 0.01)
exercise_price = stock_price + 0.1*stock_price           # strike 10% above the stock price
interest = np.random.uniform(0.005, 0.01, time_range)    # daily rate between 0.5% and 1%
expiration_date = 30  # days
volatility = 0.23
```
Finally, we can initialize the Model class, using the above as inputs and executing the call and put functions.
```python
option = Model(stock_price, exercise_price, interest, expiration_date, volatility)
option.call()
option.put()
```
```python
plt.plot(stock_price)
plt.plot(option.callval)
plt.plot(option.putval)
plt.legend(['Stock Price', 'Call Price', 'Put Price'])
plt.savefig('example.png')
plt.show()
```
Since the behaviour is as expected, and since we want to reuse the variable names, we clear everything and move to Section 2. (We could skip this step, since the variables would simply be overwritten, but it is easier while working to clear everything: there were many instances where we unknowingly reused variables we had not replaced and were puzzled that the plots were not changing.)
```python
del option, stock_price, exercise_price, interest, variance, volatility, walk, time_range, time, expiration_date
```
### Section 2
#### Application using real-world data
Having checked that the idea works, we begin by opening the .xlsx file we wish to use and defining our parameters before initializing our class.
```python
df = pd.read_excel('ZNGA.xlsx', usecols=['Date', 'High (in $)', 'Low (in $)', 'Close/Last', 'Difference of High and Low'])
high = df['High (in $)'].array
low = df['Low (in $)'].array
close = df['Close/Last'].array
date = df.Date.array
interest = np.random.uniform(0.005,0.01, len(close))
volatility = (np.average(high - low, weights = close))
expiration_date = 60
```
Because we want to check what happens for an exercise price both higher and lower than the original stock price (say ±10%), we create a vector containing those two values and initialize both instances with a simple for loop.
```python
exercise_price = [close-(0.1*close), close+(0.1*close)]
option = []
for change in exercise_price:
    option.append(Model(close, change, interest, expiration_date, volatility))
```
Now we call the call() and put() functions for each of the instances to generate the data, which we have plotted below.
```python
for model in option:
    model.call()
    model.put()

    plt.plot(date, close)
    plt.plot(date, model.callval)
    plt.plot(date, model.putval)
    plt.legend(['Stock Price', 'Call Price', 'Put Price'])
    plt.xlabel('time')
    plt.ylabel('price')
    if model.ex_p[0] < model.stock_p[0]:
        plt.title('Behaviour with a lower exercise price than stock price')
        plt.savefig('lower.png')
    else:
        plt.title('Behaviour with a higher exercise price than stock price')
        plt.savefig('higher.png')
    plt.show()
```
Finally, we use pandas to export all of the information we have gathered into an Excel file. The model generates a data frame of option prices, which can be written to an .xlsx file and passed on to a non-Python user for further interpretation and decision making.
```python
data = []
for i in np.arange(len(exercise_price)):
    sheet = {'Date': date,
             'Stock Price': option[i].stock_p,
             'Exercise Price': option[i].ex_p,
             'Interest': option[i].int_rate,
             'Time to Expiration': option[i].time_to_exp,
             'Volatility': option[i].vol,
             'Call Price': option[i].callval,
             'Put Price': option[i].putval}
    data.append(pd.DataFrame(sheet, index=None))

with pd.ExcelWriter('assignment.xlsx') as writer:
    data[0].to_excel(writer, sheet_name='Sheet 1', header=True, index=False)
    data[1].to_excel(writer, sheet_name='Sheet 2', header=True, index=False)
```


@@ -1,243 +0,0 @@
---
title: Human Diffusion
---
You can find the repository [here](https://github.com/aethrvmn/Human-Diffusion)
## A Q-Learning Approach to the Human Migration Out of Africa
We start by importing the proper modules.
These are
- NumPy
- MatPlotLib
- Pandas
- PIL (Pillow), an image handler
- tqdm (pronounced ta-qa-dum), from the Arabic taqadum (تقدّم) meaning *progress*, a simple progress bar that lets us estimate the time for each task
```python
#pip install -r requirements.txt
```
```python
from earth import Earth
```
### Generating the Map
We initialise the picture that we want to use and convert it into pixel values, so that we have a pure black-and-white image of the earth to work with.
```python
stage = Earth()
```
The following for loop checks each individual pixel and then converts it to black or white. The threshold was found by running the loop many times and picking a number that looked good enough.
```python
stage.black_and_white('earth.jpg', 'newPixels.csv', 'pure-bw-earth.jpg')
```
We then generate the new picture and save it before we convert it into an array.
```python
stage.generate_image('pure-bw-earth.jpg')
```
We are now ready to create the map we will need.
```python
stage.plot('map.jpg')
```
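The Earth helper itself is defined in earth.py in the repository and is not shown in this notebook. The following is a rough sketch of what the three calls above might do, written purely as an assumption (PIL for the image handling, a numpy array for the map); non-zero entries of `.map` are the cells that get penalised in the reward map further down.
```python
# Sketch only: the real class lives in earth.py in the repository.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

class Earth:
    def black_and_white(self, infile, csvfile, outfile, threshold=100):
        # Threshold every pixel of the source picture to pure black or white
        pixels = np.array(Image.open(infile).convert('L'))
        bw = np.where(pixels > threshold, 255, 0).astype(np.uint8)
        pd.DataFrame(bw).to_csv(csvfile, index=False)
        Image.fromarray(bw).save(outfile)

    def generate_image(self, infile):
        # Load the thresholded picture and keep it as a 2D array
        img = np.array(Image.open(infile).convert('L'))
        self.map = (img > 0).astype(int)      # assumption: non-zero marks water
        self.height, self.width = self.map.shape

    def plot(self, outfile):
        # Save the array as the map the Q-learning agent will move on
        plt.figure(figsize=(10, 10))
        plt.imshow(self.map, cmap='gray')
        plt.xticks([]); plt.yticks([])
        plt.savefig(outfile)
```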
Now that we have our map ready, we can move on to the implementation of the algorithm.
### Application of the Q-Learning Algorithm
We import the necessary libraries
```python
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
np.random.seed(1)
```
and define the actions that the agent is able to perform
```python
actions = ['west', 'east', 'north', 'south']
#coded to 0, 1, 2, 3
```
Then we can generate the Q-table, which stores the estimated value of each action at every position, initialised with random values.
```python
q_values = np.random.uniform(-1, 1, size=(stage.height,stage.width, len(actions)))
```
Afterwards, we define the functions that we will use: one to generate our starting position, one for the agent to take an action either randomly or by checking the Q-table, and one to define the result of the action taken.
```python
def starting_area(column, row):
    # pick a random starting cell inside the given ranges
    col = np.random.randint(column[0], column[1])
    row = np.random.randint(row[0], row[1])
    return col, row

def next_action(current_height, current_width, epsilon):
    # exploit the Q-table with probability epsilon, otherwise explore randomly
    if np.random.random() < epsilon:
        move = np.argmax(q_values[current_height, current_width])
    else:
        move = np.random.randint(4)
    return move

def next_location(height, width, action):
    new_width = width
    new_height = height
    if actions[action] == 'west' and width > -1:
        new_width = width - 1
    if actions[action] == 'east' and width < stage.width - 1:
        new_width = width + 1
    if actions[action] == 'north' and height > 1:
        new_height = height - 1
    if actions[action] == 'south' and height < stage.height:
        new_height = height + 1
    return new_height, new_width
```
Now we are ready to run the algorithm for the number of episodes we need
```python
reward_map = np.zeros(shape=(stage.height, stage.width))
reward_map[np.where(stage.map > 0)] = -10

# penalise the borders of the map
reward_map[:10, :] = -15
reward_map[610:, :] = -15
reward_map[:, 720:] = -15
reward_map[:, :10] = -15

# Arabian bridge
# Gulf of Aden
reward_map[350:388, 250:282] = 0
# Hormuz
reward_map[300:340, 290:315] = 0

# Indonesian bridge
# Sumatra
reward_map[417:433, 485:495] = 0
# Java
reward_map[450:455, 495:505] = 0
# Brunei
reward_map[430:465, 525:530] = 0
# New Guinea
reward_map[460:465, 525:645] = 0
# Australia
reward_map[460:505, 525:605] = 0

# Bering Strait
reward_map[30:60, 580:610] = 50
# Australia
reward_map[510:540, 580:610] = 50

real_map = np.ones(shape=(stage.height, stage.width))*10
real_map[np.where(stage.map > 0)] = -10

timeline = np.arange(0, 5000)
episodes = np.arange(0, 200000)
reward_per_episode = np.zeros(len(episodes))
lifetime = np.zeros(len(episodes))
ims = []

for episode in tqdm(episodes):
    epsilon = 0.7
    discount_factor = 0.3
    learning_rate = 1
    rewards = np.zeros(len(timeline))

    if episode >= 195000:  # this is the way we destabilise the system to get more natural motion
        # India
        reward_map[390, 388] = 20
        # New Guinea Papua
        reward_map[455, 650] = 20
        # Brunei
        reward_map[425, 540] = 20
        # Australia
        reward_map[510:540, 580:610] = 50

    old_height, old_width = 400, 230
    height, width = starting_area([old_height-5, old_height+5], [old_width-5, old_width+5])

    for year in timeline:
        try:
            action = next_action(height, width, epsilon)
            old_height, old_width = height, width
            height, width = next_location(height, width, action)

            reward = reward_map[height, width]
            rewards[year] = reward

            # Q-learning update
            old_q_value = q_values[old_height, old_width, action]
            temporal_difference = reward + (discount_factor*np.max(q_values[height, width])) - old_q_value
            new_q_value = old_q_value + (learning_rate * temporal_difference)
            q_values[old_height, old_width, action] = new_q_value

            # once a positive reward is collected, zero it and mark the cell as visited
            if reward_map[old_height, old_width] > 0:
                reward_map[old_height, old_width] = 0
                real_map[old_height, old_width] = 5
        except IndexError:
            break

        if year == timeline[-1]:
            lifetime[episode] = year

        if reward_map[old_height, old_width] <= -10 and reward_map[height, width] <= -10:
            lifetime[episode] = year
            break

    reward_per_episode[episode] = np.mean(rewards)

    # re-seed the large rewards once part of their region has been collected
    if reward_map[510:540, 580:610].all() == 0:
        # Australia
        reward_map[510:540, 580:610] = 50
    if reward_map[30:60, 580:610].all() == 0:
        # Bering Strait
        reward_map[30:60, 580:610] = 50

plt.figure(figsize=(10, 10))
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.xticks([])
plt.yticks([])
plt.imshow(real_map, cmap='ocean_r')
plt.show()
```


@@ -1,21 +0,0 @@
---
type: "page"
showTableOfContents: true
---
# Reinforcement Learning: Theory and Implementation in a Custom Environment
You can find the thesis [here](/pdfs/mthesis.pdf) and the code [here](https://github.com/aethrvmn/GodotPneumaRL)
## Abstract
Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.
In this thesis, we present some core concepts of Reinforcement Learning.
First, we introduce the mathematical foundation of Reinforcement Learning (RL) through the Multi-Armed Bandit (MAB) problem, which serves as a simplified model for decision-making problems without state transitions, focusing solely on the trade-off between exploration and exploitation. We then extend the discussion to the more complex Markov Decision Processes (MDPs), which provide a formal framework for modeling decision-making problems where outcomes are partly random and partly under the control of a decision-maker, involving state transitions influenced by actions. Finally, we give an overview of the two main branches of Reinforcement Learning: value-based methods, which focus on estimating the value of states or state-action pairs, and policy-based methods, which directly optimize the policy that dictates the agent's actions.
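As a concrete illustration of the exploration-exploitation trade-off mentioned above, an ε-greedy agent on a ten-armed bandit can be written in a few lines (a generic sketch, not code from the thesis):
```python
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.normal(size=10)      # hidden mean reward of each arm
estimates = np.zeros(10)              # running value estimates Q(a)
counts = np.zeros(10)
epsilon = 0.1                         # probability of exploring

for step in range(10_000):
    if rng.random() < epsilon:
        arm = int(rng.integers(10))            # explore: random arm
    else:
        arm = int(np.argmax(estimates))        # exploit: best arm so far
    reward = rng.normal(true_means[arm], 1.0)  # noisy reward
    counts[arm] += 1
    # incremental sample-average update of the estimate
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(int(np.argmax(estimates)), int(np.argmax(true_means)))  # usually agree after enough pulls
```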
We focus on Proximal Policy Optimization (PPO), which is the *de facto* baseline algorithm in modern RL literature due to its robustness and ease of implementation. We discuss its potential advantages, such as improved sample efficiency and stability, as well as its disadvantages, including sensitivity to hyper-parameters and computational overhead. We emphasize the importance of fine-tuning PPO to achieve optimal performance.
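The piece of PPO that everything else hangs on is the clipped surrogate objective, L_CLIP = E[min(r_t A_t, clip(r_t, 1-ε, 1+ε) A_t)], with r_t the new-to-old policy probability ratio. A minimal NumPy sketch of that loss (a generic illustration of the standard formula, not the thesis implementation):
```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective, returned as a loss (negated mean)."""
    ratio = np.exp(logp_new - logp_old)                 # probability ratio r_t
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -np.mean(surrogate)

# toy usage with made-up numbers
logp_old = np.log(np.array([0.20, 0.50, 0.30]))
logp_new = np.log(np.array([0.25, 0.45, 0.30]))
adv = np.array([1.0, -0.5, 0.2])
print(ppo_clip_loss(logp_new, logp_old, adv))
```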
We demonstrate the application of these concepts within *Pneuma*, a custom-made environment specifically designed for this thesis. *Pneuma* aims to become a research base for independent Multi-Agent Reinforcement Learning (MARL), where multiple agents learn and interact within the same environment. We outline the requirements for such environments to support MARL effectively and detail the modifications we made to the baseline PPO method, as presented by OpenAI, to facilitate agent convergence at the single-agent level.
Finally, we discuss the potential for future enhancements to the *Pneuma* environment to increase its complexity and realism, aiming to create a more RPG-like setting, optimal for training agents in complex, multi-objective, and multi-step tasks.