removed theses and nimphs

aethrvmn 2024-11-10 01:36:11 +01:00
parent 8f1c0d2e63
commit ff123fd07e
5 changed files with 9 additions and 514 deletions


@@ -1,22 +1,14 @@
 ---
-title: "misc"
-weight: 30
+title: "misc projects"
 ---
-# misc projects
----
-this is the place for my miscellaneous projects. Here you'll find a collection of my work: from research and theses to explorations in mathematics, physics, and computer science.
-### contents
-#### theses
-- [master-thesis](/pdf/mthesis.pdf): reinforcement learning theory and implementation in a custom environment.
-- [bachelor-thesis](/pdf/bthesis.pdf): random af spin-1/2 heisenberg model and the sdrg method.
-#### personal projects
-- <a>nyrids</a> is a collection of closed-source nlp models I am working on, either for fun (nimertes), research (melite), or work (panope).
-- <a>melite</a>: solo research project into neurosymbolic ai using transformers and reinforcement learning methods.
-- <a>nimertes</a>: making a transformer from scratch in pytorch and then nim, a systems language.
-- <a>panope</a>: a foundational gpt based on the [nanogpt](https://github.com/aethrvmn/nanoGPT) architecture.
-#### university projects
-- [super mario network](https://github.com/aethrvmn/supermarioddqn): implementing an agent to autonomously play super mario bros.
-- [human-diffusion](human-diffusion): a simulation of the Out of Africa (OOA) event in prehistory.
-- [black-scholes](black-scholes): exploring the black-scholes equation.
+this is the place for my miscellaneous projects. here you'll find a collection of different things that piqued my interest.
+- [super-mario-ddqn](https://github.com/aethrvmn/supermarioddqn): implementing an agent to autonomously play super mario bros.
+- [ising-model-tax-evasion](https://github.com/aethrvmn/Ising-Tax-Evasion): simulating the ising model from physics as a means of estimating the probability of tax evasion in a given population.
+- [human-diffusion](https://github.com/aethrvmn/human-diffusion): a simulation of the Out of Africa (OOA) event in prehistory.
+- [black-scholes](https://github.com/aethrvmn/black-scholes-model): exploring the black-scholes equation.
 
 feel free to explore and delve into the details.


@@ -1,17 +0,0 @@
---
type: "page"
showTableOfContents: true
---
# The One-Dimensional Heisenberg Model, RG Methods and Numerical Simulation of the SDRG Process
You can find the thesis [here](/pdfs/bthesis.pdf) and the code [here](https://github.com/aethrvmn/1d-RandAFHeisenberg-SDRG).
## Abstract
The Strong Disorder Renormalisation Group (SDRG) method, first introduced by Dasgupta, Ma and Hu, and later greatly expanded by Fisher, yields asymptotically exact results for distributions in which the disorder grows without limit at large scales; Fisher also calculated limiting values as well as scaling factors for random spin chains.
These results were the first of many yielded by the intense research that followed, first in random quantum systems and later in classically disordered systems as well. The Real Space RG methods used previously treated the whole space as homogeneous, allowing the grouping of spins into super-spins; although this homogeneity is physically justifiable in systems free of randomness, it comes into question in the presence of disorder. The SDRG method renormalises space in a non-homogeneous way, so it can better handle local disorder.
More specifically, the XX chain, presented by Fisher, can be used to obtain exact results for the behaviour of phases dominated by randomness, as well as the critical behaviour near the various zero-temperature phase transitions that occur. Studying the properties of antiferromagnetic Heisenberg spin-1/2 chains with random bonds, we analyse the low-energy behaviour by decimating the strongest bond and replacing it with a new effective bond between its nearest neighbours. Repeating the procedure, the distribution becomes extremely broad, improving the accuracy of the approximation.
The structure of the thesis is as follows. First we introduce the Heisenberg model and its relation to the Ising and Free Fermion models, solve it exactly for the ferromagnetic case using the Bethe Ansatz, and introduce the Block RG method for the antiferromagnetic case. Afterwards we present the Strong Disorder RG method, using a modernised version of Fisher's process to solve the random AF XX chain. Finally, we present the methods we created to simulate the process.
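The decimation step described above reduces to a simple rule: find the strongest bond, lock its two spins into a singlet, and couple their neighbours through a much weaker effective bond. As a rough illustration (not the thesis code), a minimal sweep using the standard Ma-Dasgupta-Hu rule J' = J_left * J_right / (2 * J_max) for the spin-1/2 chain might look like this:
```python
import numpy as np

def sdrg_sweep(couplings, min_bonds=2):
    """Repeatedly decimate the strongest bond of an open random AF
    spin-1/2 Heisenberg chain until only `min_bonds` bonds remain."""
    J = list(couplings)
    while len(J) > min_bonds:
        k = int(np.argmax(J))                   # strongest bond J_max
        left = J[k-1] if k > 0 else None        # neighbouring bonds
        right = J[k+1] if k < len(J) - 1 else None
        if left is not None and right is not None:
            # Ma-Dasgupta-Hu rule: effective bond between the two neighbours
            J_eff = left * right / (2 * J[k])
            J[k-1:k+2] = [J_eff]                # replace three bonds by one
        elif left is not None:                  # bond at the right edge
            J[k-1:k+1] = []
        else:                                   # bond at the left edge
            J[k:k+2] = []
    return np.array(J)

# example: a chain with a broad distribution of initial bonds
rng = np.random.default_rng(0)
J0 = rng.uniform(size=1000)**4
print(sdrg_sweep(J0))
```
Iterating this rule drives the coupling distribution ever broader, which is why the approximation becomes asymptotically exact.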


@@ -1,216 +0,0 @@
---
title: Black-Scholes Model
---
You can find the repository [here](https://github.com/aethrvmn/Black-Scholes-Model)
To start off, we install all the required modules. (The cell below is commented out; if one or more modules are missing, uncomment it and run it once.)
```python
# pip install -r "requirements.txt"
```
Note that scipy.stats is not imported here, even though it is needed inside the class that we import.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from BlackScholes import Model
```
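The Model class itself lives in the repository's BlackScholes module and is not reproduced in this notebook. Below is a minimal sketch of what it plausibly looks like, assuming the textbook Black-Scholes formulas and reusing the attribute names (stock_p, ex_p, int_rate, time_to_exp, vol, callval, putval) that appear later in this walkthrough; the real implementation may differ.
```python
# Sketch only: the real class is defined in BlackScholes.py in the repository.
import numpy as np
from scipy.stats import norm

class Model:
    def __init__(self, stock_p, ex_p, int_rate, time_to_exp, vol):
        self.stock_p = np.asarray(stock_p, dtype=float)    # underlying price S
        self.ex_p = np.asarray(ex_p, dtype=float)          # exercise (strike) price K
        self.int_rate = np.asarray(int_rate, dtype=float)  # risk-free rate r
        self.time_to_exp = time_to_exp                     # time to expiration (days)
        self.vol = vol                                     # volatility sigma

    def _d1_d2(self):
        T = self.time_to_exp / 365                         # assumption: convert days to years
        d1 = (np.log(self.stock_p / self.ex_p)
              + (self.int_rate + 0.5 * self.vol**2) * T) / (self.vol * np.sqrt(T))
        d2 = d1 - self.vol * np.sqrt(T)
        return d1, d2, T

    def call(self):
        # C = S N(d1) - K exp(-rT) N(d2)
        d1, d2, T = self._d1_d2()
        self.callval = (self.stock_p * norm.cdf(d1)
                        - self.ex_p * np.exp(-self.int_rate * T) * norm.cdf(d2))
        return self.callval

    def put(self):
        # P = K exp(-rT) N(-d2) - S N(-d1)
        d1, d2, T = self._d1_d2()
        self.putval = (self.ex_p * np.exp(-self.int_rate * T) * norm.cdf(-d2)
                       - self.stock_p * norm.cdf(-d1))
        return self.putval
```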
### Section 1
#### Testing
To check that the class functions work, and to explain the process, we create a Brownian motion to simulate a possible stock price over four years (1460 days) and calculate all the needed variables.
```python
def brownian(time_range, mean=0, sd=1):
    time = np.linspace(0, 1, time_range)
    path = np.zeros(time_range)
    for i in np.arange(1, time_range):
        path[i] = path[i-1] + np.random.normal(mean*time[i], sd*time[i])
    return time, path
```
where time runs from 0 to 1, because we can always rescale the dates of the hypothetical stock to fit those values with an appropriate transformation. Below we see the graph of the motion.
```python
time_range = 1460 #in days
walk = brownian(time_range)
time = walk[0]
stock_price = walk[1] + np.abs(min(walk[1])) +1
plt.plot(time, stock_price)
plt.show()
```
Next, we define the exercise price; here we choose a "smooth" deviation from the original stock price, simply setting it 10% above the stock price. The interest rate is drawn uniformly from an arbitrary, probably unrealistic range between 0.5% and 1%, the time to expiration is set to 30 days, and the volatility to 0.23.
```python
variance = np.random.uniform(0.005, 0.01)
exercise_price = stock_price + 0.1*stock_price           # strike 10% above the stock price
interest = np.random.uniform(0.005, 0.01, time_range)    # daily rate between 0.5% and 1%
expiration_date = 30  # days
volatility = 0.23
```
Finally, we can initialize the Model class, using the above as inputs and executing the call and put functions.
```python
option = Model(stock_price, exercise_price, interest, expiration_date, volatility)
option.call()
option.put()
```
```python
plt.plot(stock_price)
plt.plot(option.callval)
plt.plot(option.putval)
plt.legend(['Stock Price', 'Call Price', 'Put Price'])
plt.savefig('example.png')
plt.show()
```
Since the behaviour is as expected, and since we want to reuse the variable names, we clear everything and move to Section 2. (We could skip this step, since the variables would simply be overwritten, but it is easier while working to clear everything: there were many instances where we unknowingly reused variables we had not replaced and were puzzled that the plots were not changing.)
```python
del option, stock_price, exercise_price, interest, variance, volatility, walk, time_range, time, expiration_date
```
### Section 2
#### Application using real-world data
Having checked that the idea works, we begin by opening the .xlsx file we wish to use and defining our parameters before initializing our class.
```python
df = pd.read_excel('ZNGA.xlsx', usecols=['Date', 'High (in $)', 'Low (in $)', 'Close/Last', 'Difference of High and Low'])
high = df['High (in $)'].array
low = df['Low (in $)'].array
close = df['Close/Last'].array
date = df.Date.array
interest = np.random.uniform(0.005,0.01, len(close))
volatility = (np.average(high - low, weights = close))
expiration_date = 60
```
Because we want to check what happens for an exercise price both higher and lower than the original stock price (say ±10%), we create a vector containing those two values and initialize both instances with a simple for loop.
```python
exercise_price = [close-(0.1*close), close+(0.1*close)]
option = []
for change in exercise_price:
    option.append(Model(close, change, interest, expiration_date, volatility))
```
Now we call the call() and put() functions for each of the instances to generate the data, which we have plotted below.
```python
for model in option:
    model.call()
    model.put()

    plt.plot(date, close)
    plt.plot(date, model.callval)
    plt.plot(date, model.putval)
    plt.legend(['Stock Price', 'Call Price', 'Put Price'])
    plt.xlabel('time')
    plt.ylabel('price')
    if model.ex_p[0] < model.stock_p[0]:
        plt.title('Behaviour with a lower exercise price than stock price')
        plt.savefig('lower.png')
    else:
        plt.title('Behaviour with a higher exercise price than stock price')
        plt.savefig('higher.png')
    plt.show()
```
Finally, we use pandas to export all of the information we have gathered into an Excel file. The model generates a data frame of option prices, which can be written to an .xlsx file and passed on to a non-Python user for further interpretation and decision making.
```python
data = []
for i in np.arange(len(exercise_price)):
    sheet = {'Date': date,
             'Stock Price': option[i].stock_p,
             'Exercise Price': option[i].ex_p,
             'Interest': option[i].int_rate,
             'Time to Expiration': option[i].time_to_exp,
             'Volatility': option[i].vol,
             'Call Price': option[i].callval,
             'Put Price': option[i].putval}
    data.append(pd.DataFrame(sheet, index=None))

with pd.ExcelWriter('assignment.xlsx') as writer:
    data[0].to_excel(writer, sheet_name='Sheet 1', header=True, index=False)
    data[1].to_excel(writer, sheet_name='Sheet 2', header=True, index=False)
```


@@ -1,243 +0,0 @@
---
title: Human Diffusion
---
You can find the repository [here](https://github.com/aethrvmn/Human-Diffusion)
## A Q-Learning Approach to the Human Migration Out of Africa
We start by importing the proper modules.
These are
- NumPy
- MatPlotLib
- Pandas
- PIL (Pillow), an image handler
- tqdm (pronounced ta-qa-dum), from the Arabic taqadum (تقدّم) meaning *progress*, a simple progress bar that lets us estimate the time for each task
```python
#pip install -r requirements.txt
```
```python
from earth import Earth
```
### Generating the Map
We initialise the picture that we want to use and convert it into pixel values, so that we have a pure black-and-white image of the earth to work with.
```python
stage = Earth()
```
The following for loop checks each individual pixel and then converts it to black or white. The threshold was found by running the loop many times and picking a number that looked good enough.
```python
stage.black_and_white('earth.jpg', 'newPixels.csv', 'pure-bw-earth.jpg')
```
We then generate the new picture and save it before we convert it into an array.
```python
stage.generate_image('pure-bw-earth.jpg')
```
We are now ready to create the map we will need.
```python
stage.plot('map.jpg')
```
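The Earth helper itself is defined in earth.py in the repository and is not shown in this notebook. The following is a rough sketch of what the three calls above might do, written purely as an assumption (PIL for the image handling, a numpy array for the map); non-zero entries of `.map` are the cells that get penalised in the reward map further down.
```python
# Sketch only: the real class lives in earth.py in the repository.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

class Earth:
    def black_and_white(self, infile, csvfile, outfile, threshold=100):
        # Threshold every pixel of the source picture to pure black or white
        pixels = np.array(Image.open(infile).convert('L'))
        bw = np.where(pixels > threshold, 255, 0).astype(np.uint8)
        pd.DataFrame(bw).to_csv(csvfile, index=False)
        Image.fromarray(bw).save(outfile)

    def generate_image(self, infile):
        # Load the thresholded picture and keep it as a 2D array
        img = np.array(Image.open(infile).convert('L'))
        self.map = (img > 0).astype(int)      # assumption: non-zero marks water
        self.height, self.width = self.map.shape

    def plot(self, outfile):
        # Save the array as the map the Q-learning agent will move on
        plt.figure(figsize=(10, 10))
        plt.imshow(self.map, cmap='gray')
        plt.xticks([]); plt.yticks([])
        plt.savefig(outfile)
```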
Now that we have our map ready, we can move on to the implementation of the algorithm.
### Application of the Q-Learning Algorithm
We import the necessary libraries
```python
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
np.random.seed(1)
```
and define the actions that the agent is able to perform
```python
actions = ['west', 'east', 'north', 'south']
#coded to 0, 1, 2, 3
```
Then we can generate the Q-table, which stores the estimated value of each action at every position, initialised with random values.
```python
q_values = np.random.uniform(-1, 1, size=(stage.height,stage.width, len(actions)))
```
Afterwards, we define the functions that we will use: one to generate our starting position, one for the agent to take an action either randomly or by checking the Q-table, and one to define the result of the action taken.
```python
def starting_area(column, row):
    # pick a random starting cell inside the given ranges
    col = np.random.randint(column[0], column[1])
    row = np.random.randint(row[0], row[1])
    return col, row

def next_action(current_height, current_width, epsilon):
    # exploit the Q-table with probability epsilon, otherwise explore randomly
    if np.random.random() < epsilon:
        move = np.argmax(q_values[current_height, current_width])
    else:
        move = np.random.randint(4)
    return move

def next_location(height, width, action):
    new_width = width
    new_height = height
    if actions[action] == 'west' and width > -1:
        new_width = width - 1
    if actions[action] == 'east' and width < stage.width - 1:
        new_width = width + 1
    if actions[action] == 'north' and height > 1:
        new_height = height - 1
    if actions[action] == 'south' and height < stage.height:
        new_height = height + 1
    return new_height, new_width
```
Now we are ready to run the algorithm for the number of episodes we need
```python
reward_map = np.zeros(shape=(stage.height, stage.width))
reward_map[np.where(stage.map > 0)] = -10

# penalise the borders of the map
reward_map[:10, :] = -15
reward_map[610:, :] = -15
reward_map[:, 720:] = -15
reward_map[:, :10] = -15

# Arabian bridge
# Gulf of Aden
reward_map[350:388, 250:282] = 0
# Hormuz
reward_map[300:340, 290:315] = 0

# Indonesian bridge
# Sumatra
reward_map[417:433, 485:495] = 0
# Java
reward_map[450:455, 495:505] = 0
# Brunei
reward_map[430:465, 525:530] = 0
# New Guinea
reward_map[460:465, 525:645] = 0
# Australia
reward_map[460:505, 525:605] = 0

# Bering Strait
reward_map[30:60, 580:610] = 50
# Australia
reward_map[510:540, 580:610] = 50

real_map = np.ones(shape=(stage.height, stage.width))*10
real_map[np.where(stage.map > 0)] = -10

timeline = np.arange(0, 5000)
episodes = np.arange(0, 200000)
reward_per_episode = np.zeros(len(episodes))
lifetime = np.zeros(len(episodes))
ims = []

for episode in tqdm(episodes):
    epsilon = 0.7
    discount_factor = 0.3
    learning_rate = 1
    rewards = np.zeros(len(timeline))

    if episode >= 195000:  # this is the way we destabilise the system to get more natural motion
        # India
        reward_map[390, 388] = 20
        # New Guinea Papua
        reward_map[455, 650] = 20
        # Brunei
        reward_map[425, 540] = 20
        # Australia
        reward_map[510:540, 580:610] = 50

    old_height, old_width = 400, 230
    height, width = starting_area([old_height-5, old_height+5], [old_width-5, old_width+5])

    for year in timeline:
        try:
            action = next_action(height, width, epsilon)
            old_height, old_width = height, width
            height, width = next_location(height, width, action)

            reward = reward_map[height, width]
            rewards[year] = reward

            # Q-learning update
            old_q_value = q_values[old_height, old_width, action]
            temporal_difference = reward + (discount_factor*np.max(q_values[height, width])) - old_q_value
            new_q_value = old_q_value + (learning_rate * temporal_difference)
            q_values[old_height, old_width, action] = new_q_value

            # once a positive reward is collected, zero it and mark the cell as visited
            if reward_map[old_height, old_width] > 0:
                reward_map[old_height, old_width] = 0
                real_map[old_height, old_width] = 5
        except IndexError:
            break

        if year == timeline[-1]:
            lifetime[episode] = year

        if reward_map[old_height, old_width] <= -10 and reward_map[height, width] <= -10:
            lifetime[episode] = year
            break

    reward_per_episode[episode] = np.mean(rewards)

    # re-seed the large rewards once part of their region has been collected
    if reward_map[510:540, 580:610].all() == 0:
        # Australia
        reward_map[510:540, 580:610] = 50
    if reward_map[30:60, 580:610].all() == 0:
        # Bering Strait
        reward_map[30:60, 580:610] = 50

plt.figure(figsize=(10, 10))
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.xticks([])
plt.yticks([])
plt.imshow(real_map, cmap='ocean_r')
plt.show()
```


@@ -1,21 +0,0 @@
---
type: "page"
showTableOfContents: true
---
# Reinforcement Learning: Theory and Implementation in a Custom Environment
You can find the thesis [here](/pdfs/mthesis.pdf) and the code [here](https://github.com/aethrvmn/GodotPneumaRL)
## Abstract
Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.
In this thesis, we present some core concepts of Reinforcement Learning.
First, we introduce the mathematical foundation of Reinforcement Learning (RL) through the Multi-Armed Bandit (MAB) problem, which serves as a simplified model for decision-making problems without state transitions, focusing solely on the trade-off between exploration and exploitation. We then extend the discussion to the more complex Markov Decision Processes (MDPs), which provide a formal framework for modeling decision-making problems where outcomes are partly random and partly under the control of a decision-maker, involving state transitions influenced by actions. Finally, we give an overview of the two main branches of Reinforcement Learning: value-based methods, which focus on estimating the value of states or state-action pairs, and policy-based methods, which directly optimize the policy that dictates the agent's actions.
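As a concrete illustration of the exploration-exploitation trade-off mentioned above, an ε-greedy agent on a ten-armed bandit can be written in a few lines (a generic sketch, not code from the thesis):
```python
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.normal(size=10)      # hidden mean reward of each arm
estimates = np.zeros(10)              # running value estimates Q(a)
counts = np.zeros(10)
epsilon = 0.1                         # probability of exploring

for step in range(10_000):
    if rng.random() < epsilon:
        arm = int(rng.integers(10))            # explore: random arm
    else:
        arm = int(np.argmax(estimates))        # exploit: best arm so far
    reward = rng.normal(true_means[arm], 1.0)  # noisy reward
    counts[arm] += 1
    # incremental sample-average update of the estimate
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(int(np.argmax(estimates)), int(np.argmax(true_means)))  # usually agree after enough pulls
```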
We focus on Proximal Policy Optimization (PPO), which is the *de facto* baseline algorithm in modern RL literature due to its robustness and ease of implementation. We discuss its potential advantages, such as improved sample efficiency and stability, as well as its disadvantages, including sensitivity to hyper-parameters and computational overhead. We emphasize the importance of fine-tuning PPO to achieve optimal performance.
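The piece of PPO that everything else hangs on is the clipped surrogate objective, L_CLIP = E[min(r_t A_t, clip(r_t, 1-ε, 1+ε) A_t)], with r_t the new-to-old policy probability ratio. A minimal NumPy sketch of that loss (a generic illustration of the standard formula, not the thesis implementation):
```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective, returned as a loss (negated mean)."""
    ratio = np.exp(logp_new - logp_old)                 # probability ratio r_t
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -np.mean(surrogate)

# toy usage with made-up numbers
logp_old = np.log(np.array([0.20, 0.50, 0.30]))
logp_new = np.log(np.array([0.25, 0.45, 0.30]))
adv = np.array([1.0, -0.5, 0.2])
print(ppo_clip_loss(logp_new, logp_old, adv))
```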
We demonstrate the application of these concepts within *Pneuma*, a custom-made environment specifically designed for this thesis. *Pneuma* aims to become a research base for independent Multi-Agent Reinforcement Learning (MARL), where multiple agents learn and interact within the same environment. We outline the requirements for such environments to support MARL effectively and detail the modifications we made to the baseline PPO method, as presented by OpenAI, to facilitate agent convergence at the single-agent level.
Finally, we discuss the potential for future enhancements to the *Pneuma* environment to increase its complexity and realism, aiming to create a more RPG-like setting, optimal for training agents in complex, multi-objective, and multi-step tasks.