Did you know that nearly 90% of portfolio managers underperform their benchmark indices over a 15-year period? Moreover, according to a Vanguard study, 91% of a portfolio's long-term performance is attributed to its asset allocation? Portfolio optimization is not just an advanced technique, but a crucial necessity for maximizing returns and minimizing risks in the world of investments. Traditional methods, such as Markowitz's Modern Portfolio Theory, have long been the go-to strategies. However, with the rise of complex market dynamics and the need for more robust solutions, innovative approaches are emerging, in this context, genetic algorithms are emerging as a revolutionary tool. Inspired by Darwin's theory of evolution, these algorithms mimic the process of evolution, iteratively selecting and refining potential solutions to find the optimal portfolio composition. By harnessing the principles of natural selection and genetic variation, genetic algorithms offer a versatile and robust approach to portfolio optimization that can adapt to changing market conditions and investor preferences. Leveraging genetic algorithms, investors can explore a vast space of potential portfolios more effectively than ever before. In this post, we will explore how genetic algorithms can be applied to optimize investment portfolios. We will delve into their functioning, the advantages they offer over traditional methods, and provide a practical case study to illustrate their application.
Portfolio optimization is the process of constructing an investment portfolio that maximizes returns while minimizing risk. In essence, it involves selecting the best combination of assets to achieve the desired investment objectives. This process is crucial in investment management as it allows investors to balance their desired level of return with their tolerance for risk.
Traditionally, portfolio optimization has been approached using methods such as Modern Portfolio Theory (MPT), pioneered by Harry Markowitz and explained in the post: Modern Portfolio Theory Applied in Python. MPT emphasizes the importance of diversification and asset allocation in reducing portfolio risk. However, one of its limitations is the assumption of normal distribution of returns, which may not hold true in real-world scenarios, especially during periods of market stress or volatility.
Another traditional method is Mean-Variance Optimization (MVO), a mathematical framework that aims to maximize expected return while minimizing portfolio variance. While MVO provides a systematic approach to portfolio construction, it can be sensitive to input parameters and may lead to suboptimal results if the underlying assumptions are violated.
Despite their usefulness, traditional portfolio optimization methods have their drawbacks, particularly in handling complex investment objectives and incorporating non-linear relationships among assets. This is where innovative approaches like genetic algorithms come into play, offering a more flexible and adaptive approach to portfolio optimization.
In the realm of portfolio optimization, leveraging genetic algorithms offers a promising avenue for achieving optimal investment strategies. A crucial aspect of applying genetic algorithms in this context lies in the representation of portfolios within the algorithmic framework.
Typically, portfolios are represented as arrays or strings of binary digits within the genetic algorithm. Each bit in the string corresponds to a particular asset or investment option. For instance, if we have a portfolio consisting of stocks A, B, and C, we might represent it as a binary string such as 101, where each digit represents whether the corresponding asset is included (1) or not (0) in the portfolio.
This binary representation allows for easy manipulation and evolution of portfolios within the genetic algorithm. During the optimization process, the algorithm can explore different combinations of assets by toggling the bits on or off, simulating the creation of diverse portfolios. This enables the algorithm to search through a vast space of potential solutions efficiently.
Moreover, the binary representation facilitates the implementation of genetic operators such as crossover and mutation. Crossover involves combining traits from two parent portfolios to create offspring portfolios, mimicking the genetic recombination observed in biological evolution. Mutation introduces random changes in the portfolio representation, adding diversity to the population and preventing premature convergence to suboptimal solutions.
The representation of portfolios in the context of genetic algorithms plays a pivotal role in portfolio optimization. By encoding portfolios as binary strings and employing genetic operators, the algorithm can effectively explore and refine investment strategies, ultimately leading to the identification of optimal portfolios tailored to meet specific investment objectives and constraints.
At the heart of this methodology lies the concept of fitness evaluation, where the suitability or "aptitude" of each portfolio is assessed based on various factors, including expected returns and risk.
Fitness evaluation serves as the guiding principle for genetic algorithms in portfolio optimization, as it determines the selection, crossover, and mutation of candidate solutions within the algorithm. The ultimate goal is to identify portfolios that strike an optimal balance between maximizing returns and minimizing risk, tailored to the investor's specific objectives and constraints.
To evaluate the fitness of a portfolio, multiple criteria are typically considered, with expected returns and risk being paramount. Expected returns quantify the anticipated profitability of the portfolio over a given time horizon, taking into account historical performance, asset allocation, and market forecasts. Meanwhile, risk assessment involves measuring the volatility or uncertainty associated with the portfolio's returns, often represented by metrics such as standard deviation or beta.
In the context of genetic algorithms, fitness evaluation involves assigning a numerical score or fitness value to each portfolio based on its performance according to the defined criteria. Portfolios that exhibit higher expected returns while maintaining lower levels of risk are assigned higher fitness scores, indicating their superiority as potential solutions within the optimization process.
However, the evaluation of fitness in portfolio optimization using genetic algorithms is not limited solely to returns and risk. Depending on the investor's preferences and constraints, additional factors such as liquidity, diversification, and transaction costs may also be taken into consideration during fitness evaluation. This comprehensive approach ensures that the resulting portfolios not only maximize returns but also adhere to practical considerations and investment objectives.
The function of fitness provides a systematic framework for assessing the suitability of candidate solutions based on expected returns, risk, and other relevant criteria. By integrating these considerations into the optimization process, genetic algorithms offer a powerful tool for investors seeking to construct well-balanced and efficient portfolios in dynamic market environments.
There are three fundamental operators: selection, crossover, and mutation. These operators work in harmony to iteratively refine and evolve a population of potential solutions until an optimal portfolio is reached. Together, these operators drive the evolutionary process, guiding the algorithm towards optimal or near-optimal solutions to complex investment problems.
The selection operator plays a crucial role in genetic algorithms by determining which individuals from the current population will be chosen for reproduction to form the next generation. Various selection techniques exist, with some of the most common ones being roulette wheel selection, tournament selection, and rank-based selection.
In portfolio optimization, the selection operator typically evaluates the fitness of each candidate portfolio based on predefined criteria, such as expected return, risk, and other constraints. Portfolios with higher fitness scores are more likely to be selected for reproduction, simulating the principle of "survival of the fittest" from Darwinian evolution.
Once the individuals for the next generation have been selected, the crossover operator is applied to generate offspring by combining genetic information from two or more parent portfolios. In portfolio optimization, crossover is analogous to the recombination of genetic material in biological organisms, producing offspring with traits inherited from their parents.
There are several methods for implementing crossover in portfolio optimization, including single-point crossover, multi-point crossover, and uniform crossover. The choice of crossover technique depends on the specific characteristics of the problem and the desired exploration-exploitation trade-off.
While selection and crossover drive the exploration and exploitation of the solution space, the mutation operator introduces stochasticity by randomly altering the genetic makeup of individual portfolios. Mutation helps prevent premature convergence to suboptimal solutions and promotes diversity within the population.
In portfolio optimization, mutation may involve randomly adjusting the weights of assets in a portfolio, adding or removing assets, or introducing other structural changes. By introducing small, random changes to the solutions, mutation enables the algorithm to explore new regions of the search space that may lead to better-performing portfolios.
Let's consider a scenario where an investor wants to optimize their investment portfolio consisting of five assets: stocks AAPL, MSFT, GOOGL, AMZN, and META. The investor seeks to maximize returns while minimizing risk. Each asset has a historical return and risk associated with it, which we'll use to guide our optimization process.
First we will have to import the needed libraries and define the class incharged of creating the optimization through the genetic algorithm. This code will not be explained, feel free to reach me through X (Twitter) or my personal webpage in case you have any doubt.
from enum import Enum
import numpy as np
import pandas as pd
class GeneticPortfolioOptimizer:
class Selection(Enum):
ROULETTE: str = 'ROULETTE'
TOURNAMENT: str = 'TOURNAMENT'
class Crossover(Enum):
SINGLE: str = 'SINGLE_POINT'
UNIFORM: str = 'UNIFORM'
def __init__(self, data:pd.DataFrame, population_size:int=100, mutation_rate:float=0.1) -> None:
'''
Initializes the GeneticPortfolioOptimizer object.
Parameters
----------
data: pandas.DataFrame
Pandas DataFrame with the historic prices for each asset. The asset names will be the column names and there must be only one column per asset.
population_size: int
Population quantity.
mutation_rate: float
Rate for the mutations.
'''
self.population_size: int = population_size
self.mutation_rate: float = mutation_rate
self.num_assets: int = len(tickers)
self.data: pd.DataFrame = data.copy().dropna(axis=1, how='all')
self.tickers: list = self.data.columns
self.population: np.ndarray = self._initialize_population()
def _initialize_population(self) -> np.ndarray:
'''
Create initial population.
'''
population: list = []
for _ in range(self.population_size):
weights: np.ndarray = np.random.rand(self.num_assets)
weights /= np.sum(weights)
population.append(weights)
return np.array(population)
def _fitness_function(self, weights: np.ndarray) -> float:
'''
Calculates the sharpe ratio to use as fitness function.
Parameters
----------
weights: numpy.ndarray
Array with the weight of each asset in the portfolio.
Returns
-------
sharpe: float
Sharpe Ratio of the current portfolio iteration.
'''
portfolio_return: float = np.sum(np.mean(self.data.pct_change() * weights, axis=1))
portfolio_volatility: float = np.sqrt(np.dot(weights.T, np.dot(np.cov(self.data.T), weights)))
return portfolio_return / portfolio_volatility
def _roulette_wheel_selection(self) -> np.ndarray:
'''
Select a sample of the population.
Returns
-------
sample: numpy.ndarray
Array with the sample.
'''
fitness_scores: np.ndarray = np.array([self._fitness_function(individual) for individual in self.population])
probabilities: np.ndarray = fitness_scores / np.sum(fitness_scores)
selected_indices: np.ndarray = np.random.choice(np.arange(self.population_size), size=self.population_size, p=probabilities)
return self.population[selected_indices]
def _tournament_selection(self, k:int=5) -> np.ndarray:
'''
Select a sample of the population.
Returns
-------
sample: numpy.ndarray
Array with the sample.
'''
selected_indices: list = []
for _ in range(self.population_size//2):
tournament_indices: np.ndarray = np.random.choice(np.arange(self.population_size), size=k, replace=False)
tournament_population: np.ndarray = self.population[tournament_indices]
tournament_fitness: np.ndarray = np.array([self._fitness_function(individual) for individual in tournament_population])
selected_indices.append(tournament_indices[np.argmax(tournament_fitness)])
return self.population[selected_indices]
def _single_point_crossover(self, parents:np.ndarray) -> np.ndarray:
'''
Makes the crossover of the sample to generate the offspring.
Returns
-------
crossover: numpy.ndarray
Array with the offspring.
'''
children: list = []
for _ in range(self.population_size - len(parents)):
parent1, parent2 = np.random.choice(parents, size=2, replace=False)
crossover_point: int = np.random.randint(1, self.num_assets - 1)
child: np.ndarray = np.concatenate((parent1[:crossover_point], parent2[crossover_point:]))
children.append(child)
return np.array(children)
def _uniform_crossover(self, parents:np.ndarray, p:float=0.5) -> np.ndarray:
'''
Makes the crossover of the sample to generate the offspring.
Returns
-------
crossover: numpy.ndarray
Array with the offspring.
'''
children: list = []
for _ in range(self.population_size - len(parents)):
parent1, parent2 = np.random.choice(len(parents), size=2, replace=False)
mask: np.ndarray = np.random.choice([True, False], size=self.num_assets, p=[p, 1-p])
child: np.ndarray = np.where(mask, parents[parent1], parents[parent2])
children.append(child)
return np.array(children)
def _mutation(self, children:np.ndarray) -> np.ndarray:
'''
Creates mutations in the offspring.
Returns
-------
mutated: numpy.ndarray
Array with the mutated offspring.
'''
for i in range(len(children)):
if np.random.rand() < self.mutation_rate:
mutation_point: int = np.random.randint(0, self.num_assets)
children[i][mutation_point] = np.random.rand()
children[i] /= np.sum(children[i]) # Normalize weights to sum to 1
return children
def selection(self, method:Selection=Selection.TOURNAMENT) -> np.ndarray:
'''
Executes the correct selection method.
Parameters
----------
method: Selection
Selection method to use.
Returns
-------
sample: numpy.ndarray
Array with the sample.
'''
if method.value == self.Selection.TOURNAMENT.value:
return self._tournament_selection()
elif method.value == self.Selection.ROULETTE.value:
return self._roulette_wheel_selection()
else:
raise ValueError(f'The method selected ({method}) is not a valid selection method. Check the GeneticPortfolioOptimizer.Selection class.')
def crossover(self, parents:np.ndarray, method:Crossover=Crossover.UNIFORM) -> np.ndarray:
'''
Executes the correct crossover method.
Parameters
----------
parents: numpy.ndarray
Array containing the parents from which to generate the offspring.
method: Crossover
Crossover method to use.
Returns
-------
crossover: numpy.ndarray
Array with the offspring.
'''
if method.value == self.Crossover.SINGLE.value:
return self._single_point_crossover(parents=parents)
elif method.value == self.Crossover.UNIFORM.value:
return self._uniform_crossover(parents=parents)
else:
raise ValueError(f'The method selected ({method}) is not a valid crossover method. Check the GeneticPortfolioOptimizer.Crossover class.')
def optimize(self, num_generations:int=100, selection:Selection=Selection.TOURNAMENT,
crossover:Crossover=Crossover.UNIFORM) -> dict:
'''
Iterates over the number of generations for the evolution to take place.
Parameters
----------
num_generations: int
Number of iterations.
selection: Selection
Selection method to use.
crossover: Crossover
Crossover method to use.
Returns
-------
best_weights: dict
Dictionary with the optimized portfolio weights.
'''
for _ in range(num_generations):
selected_parents: np.ndarray = self.selection(method=selection)
offspring: np.ndarray = self.crossover(parents=selected_parents, method=crossover)
mutated_offspring: np.ndarray = self._mutation(offspring)
self.population: np.ndarray = np.vstack((selected_parents, mutated_offspring))
best_weights: np.ndarray = self.population[np.argmax([self._fitness_function(individual) \
for individual in self.population])]
return dict(zip(self.tickers, best_weights))
The next step will be to download the price data for the universe of assets choosed. We wil use Yahoo Finance for this.
import yfinance as yf
tickers: list = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META']
start_date: str = '2020-01-01'
end_date: str = '2024-01-01'
data: pd.DataFrame = yf.download(tickers, start=start_date, end=end_date)['Adj Close']
data.dropna(axis=1, how='all', inplace=True)
This time we won't be just doing a simple simulation. We are going to apply the genetic algorithm from 2021-01-01 all the way to 2023-12-01 using the previous year data and applying a rebalance each month (recalculating the optimal portfolio). Commissions won't be taken into account as they depend to much on the capital invested. The smaller the capital used the higher the proportion between commissions and capital. To learn more about how commissions affect backtestings read my previous blog Importance of Commissions in a Backtest.
Here you can see the code used for the simulation.
prices = data.resample('M').last()
weights_raw: list = []
for date in [d for d in data.resample('MS').first().index.tolist() if d >= pd.Timestamp('2021-01-01')]:
optimizer: GeneticPortfolioOptimizer = GeneticPortfolioOptimizer(
data[(date-dt.timedelta(days=365) < data.index) & (data.index < date)].copy()
)
weights_raw.append(optimizer.optimize())
weights = pd.DataFrame(weights_raw, index=prices.index[-len(weights_raw):])
weights = weights.div(weights.sum(axis=1), axis=0) # Apply a maximum weight for the portfolio of 100%
complete_df: pd.DataFrame = prices[-len(weights):].copy()
for c in complete_df.columns:
complete_df[c] = complete_df[c] / complete_df[c].iloc[0] - 1
complete_df['Portfolio'] = ((prices/prices.shift(1) - 1) * weights).dropna().sum(axis=1).cumsum()
In the next figure we compare the gross return of the portfolio with the individual return of each asset over the same period of time. This would not be enough to determine if we should implement this type of portfolio optimization but looks quite promising.
Optimizing portfolios using genetic algorithms provides a robust and flexible approach to achieving better returns while managing risk. By simulating the process of natural selection, these algorithms iteratively improve portfolio compositions, adapting to changing market conditions and asset behaviors. The GeneticPortfolioOptimizer
class we developed exemplifies how to harness the power of genetic algorithms in Python, leveraging real financial data for practical application.
In our practical example, we demonstrated how to use historical stock prices from Yahoo Finance to build and optimize a portfolio over time. The genetic algorithm adjusted the weights of the assets in the portfolio, maximizing the return-to-volatility ratio.
This method's adaptability and heuristic nature makes it particularly well-suited for financial markets, where conditions and correlations between assets are constantly evolving. By experimenting with different selection, crossover, and mutation strategies, investors can tailor the optimization process to their specific needs and preferences.
Ultimately, the genetic algorithm-based approach offers a dynamic and powerful tool for portfolio management, blending computational intelligence with financial acumen. As markets continue to grow more complex, such innovative techniques will play an increasingly vital role in helping investors achieve their financial goals.