Ever wondered how concepts from high school math class—like matrices, vectors, and eigenvalues—can actually be used to understand the real world? More specifically, how these tools can reveal hidden patterns in something as chaotic as the stock market? In this blog, we’ll explore how linear algebra powers the analysis of stock return relationships using eigen decomposition of the covariance matrix, helping us discover underlying patterns in how stocks move together.
Suppose we have a matrix A that represents a transformation (like our covariance matrix). If we apply it to a vector x, we usually get a new vector pointing in a different direction. However, there are special vectors—called eigenvectors—that don't change direction when transformed by A. They only get stretched or compressed. Formally:

A x = λ x

Here:
- A is the transformation matrix (for us, the covariance matrix),
- x is an eigenvector of A, and
- λ (lambda) is the eigenvalue associated with x.

This means: vector x comes out scaled by λ, but pointing in the same direction. These special vectors (eigenvectors) and scalars (eigenvalues) reveal intrinsic structure in data transformations.
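To make this concrete, here is a minimal NumPy sketch (illustrative, not part of the stock analysis below) that verifies A x = λ x for a small symmetric matrix:

import numpy as np

# A small symmetric matrix standing in for a covariance matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is designed for symmetric matrices and returns real eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Check that A @ x equals lambda * x for the first eigenpair
x = eigenvectors[:, 0]
lam = eigenvalues[0]
print(np.allclose(A @ x, lam * x))  # True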
Now, what is a covariance matrix? For a set of variables (here, the daily returns of our stocks), the covariance matrix is a square, symmetric matrix whose entry (i, j) is the covariance between variable i and variable j, and whose diagonal entries are the variances of the individual variables.
By performing eigen decomposition on this matrix, we can uncover the main directions in which the data varies—i.e., the dominant patterns of co-movement in our stock returns.
Suppose you’re analyzing the daily returns of 5 different stocks: Stock_A to Stock_E. Each of these stocks moves up and down based on a variety of factors—company news, economic indicators, global events, etc. The challenge is: can we uncover any underlying structure in these movements?
We first generate a synthetic dataset representing daily closing prices of five stocks. Then, we compute their daily log returns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
n_days = 100
n_stocks = 5
stock_names = ['Stock_A', 'Stock_B', 'Stock_C', 'Stock_D', 'Stock_E']
# Generate random daily returns with correlation
base_returns = np.random.normal(0, 0.01, size=(n_days, 1))
noise = np.random.normal(0, 0.005, size=(n_days, n_stocks))
returns = base_returns @ np.ones((1, n_stocks)) + noise
prices = 100 * np.exp(np.cumsum(returns, axis=0))
# Create a DataFrame
df_prices = pd.DataFrame(prices, columns=stock_names)
# Show the head
df_prices.head()
# Compute daily log returns
df_returns = np.log(df_prices / df_prices.shift(1)).dropna()
A daily return measures the percentage change in a stock’s price from one day to the next, but instead of simple returns, we often use log returns—calculated as the natural logarithm of the ratio between consecutive prices. Log returns are preferred in quantitative finance because they are time-additive (meaning you can sum them over multiple periods to get total returns), tend to be more normally distributed, and help stabilize variance, all of which make them more suitable for statistical modeling and analysis.
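As a quick sanity check of the time-additivity property, here is a minimal sketch (reusing df_prices and df_returns from the code above) confirming that the sum of daily log returns equals the log return over the whole period:

# Sum of daily log returns over the sample...
summed = df_returns['Stock_A'].sum()

# ...equals the log return from the first to the last price
total = np.log(df_prices['Stock_A'].iloc[-1] / df_prices['Stock_A'].iloc[0])

print(np.isclose(summed, total))  # True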
The covariance matrix captures how each stock’s returns move in relation to every other stock. Its diagonal elements represent the variance of each stock (how much each fluctuates), and the off-diagonal elements show covariances (how pairs of stocks move together).
cov_matrix = df_returns.cov()
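A quick way to see this structure (a sketch reusing cov_matrix from above) is to print the matrix and confirm that its diagonal matches the per-stock variances:

print(cov_matrix.round(6))

# The diagonal of the covariance matrix equals each stock's variance
print(np.allclose(np.diag(cov_matrix), df_returns.var()))  # True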
To uncover the main patterns of movement in this data, we perform eigen decomposition on the covariance matrix:
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
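One practical note: np.linalg.eig does not guarantee any particular ordering of the eigenvalues. Since a covariance matrix is symmetric, np.linalg.eigh is the more robust choice, and we can sort the results in descending order so that PC1 always corresponds to the largest eigenvalue (a sketch):

# eigh is designed for symmetric matrices and returns real eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort in descending order of variance explained
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]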
This eigen decomposition of the covariance matrix is the mathematical core of Principal Component Analysis (PCA), a powerful technique used to reduce dimensionality and uncover key patterns in data. The eigenvalues indicate how much variance each principal component explains, while the eigenvectors define the directions in the original variable space along which variance is maximized.
Let's inspect the results:
print("Eigenvalues:", eigenvalues)
print("Eigenvectors (each column is a principal direction):")
print(pd.DataFrame(eigenvectors, columns=["PC1", "PC2", "PC3", "PC4", "PC5"], index=df_returns.columns))
Now let’s see how much of the total variance is explained by each eigenvector:
total_variance = sum(eigenvalues)
explained_variance_ratio = eigenvalues / total_variance
print(explained_variance_ratio)
You might get something like:
[0.8302, 0.0501, 0.0447, 0.0391, 0.0358]
PC1 captures the vast majority (83%) of the structure in your stock returns. This strongly suggests a single dominant pattern—likely a market-wide movement where all stocks rise or fall together.
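We can check this interpretation directly (a sketch, assuming the sorted decomposition from earlier): if PC1 is a market-wide factor, its loadings should all share the same sign and have similar magnitudes:

pc1 = eigenvectors[:, 0]
print(pd.Series(pc1, index=df_returns.columns))

# All loadings sharing one sign means all stocks move together along PC1
print(np.all(np.sign(pc1) == np.sign(pc1[0])))  # True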
PC2 through PC5 capture smaller, residual movement patterns, likely due to:
- stock-specific (idiosyncratic) fluctuations, which in our synthetic data come from the independent noise term,
- sector- or industry-level effects that, in real data, move only subsets of stocks, and
- plain random noise.
So instead of analyzing 5-dimensional stock behavior, you can explain about 83% of the movement using just 1 component (PC1).
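To see this at a glance, a simple scree plot of the explained variance ratio (a sketch using the matplotlib import from earlier) makes the dominance of PC1 obvious:

plt.figure(figsize=(6, 4))
plt.bar(range(1, len(explained_variance_ratio) + 1), explained_variance_ratio)
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.title('Scree plot of stock return components')
plt.show()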
Bonus Insight: The eigenvectors of the covariance matrix are the principal components used in Principal Component Analysis (PCA). Each principal component is a linear combination of the original stocks, weighted by the eigenvector coefficients. These components define new axes that capture the maximum variance in the data, helping to reveal the most important patterns.
We can use this decomposition to reduce dimensionality and simplify analysis. For example (see the sketch after this list):
- project the returns onto PC1 to get a single "market factor" time series that summarizes most of the joint movement,
- approximate the full 5-stock return matrix using only the top one or two components, or
- study the loadings of the smaller components to see what remains after the market factor is removed.
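Here is a minimal sketch of the first idea, projecting the (mean-centered) returns onto PC1 to obtain a one-dimensional market-factor series:

# Center the returns, then project onto the first principal direction
centered = df_returns - df_returns.mean()
market_factor = centered.values @ eigenvectors[:, 0]

# One value per day, summarizing roughly 83% of the joint variance
print(pd.Series(market_factor, index=df_returns.index).head())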
Understanding stock market behavior through the lens of linear algebra not only demystifies complex financial data but also empowers analysts and investors to identify key patterns driving market movements. By leveraging eigenvectors and covariance analysis, we can reduce noisy, high-dimensional stock returns into a few meaningful components that capture the bulk of market dynamics. This approach lays a strong foundation for more advanced techniques in portfolio management, risk assessment, and predictive modeling—proving that math truly is a powerful tool in unlocking the hidden stories behind market fluctuations.