Rule 1: Never lose money.

Rule 2: Never forget rule 1.

- Warren Buffett

If only it were that easy. At some point you will lose money; even Mr. Buffett has. Understanding how much money is at risk in your portfolio over a specified timeframe can help minimize those losses, or at least help you understand them. Further, the ability to construct portfolios that match investor risk appetite, and to understand how the different financial instruments in a portfolio correlate under normal conditions, is extremely useful to investors. Value at Risk (“VaR”) is the tool which accomplishes this. In this blog Vital Data Science discusses VaR and provides R code which calculates VaR for any portfolio.

**What is Value at Risk?**

Given a probability (commonly 95%) and a timeframe (commonly one day or two weeks), VaR measures the maximum loss a portfolio is likely to experience. For example, a VaR of $1 million at a 95% probability means there is a 95% probability that the current portfolio holdings will not lose more than $1 million over the specified time horizon, under normal market conditions.

The ‘under normal market conditions’ qualifier is critical. Because the correlation of asset returns is used in the calculation of VaR, in stressed markets where all asset returns become closely correlated VaR is not an accurate measure of possible losses. There are two ways to address this shortcoming. If the investor needs to quantify the losses possible during periods of extreme market stress, the investor can find such an event in history, calculate asset returns, standard deviations, and correlations over that period, and use those in the VaR formula. Alternatively, since extreme market volatility events often occur in clusters and are usually preceded by market distortions (i.e. extreme valuations not supported by trade flows or fundamentals), the investor can simply understand VaR’s shortcomings, acknowledge them, and adjust their market behaviour accordingly.

The general formula for VaR is:

VaR = Standard Deviation Multiple * Portfolio Standard Deviation * Square Root of Time Period

*Standard Deviation Multiple*

The multiple of the standard deviation of portfolio returns at the required probability/confidence level. For example, given a required probability of 95%, and assuming a normal distribution, the multiple is 1.645 (commonly rounded to 1.65).
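Assuming a normal distribution, these multiples can be read straight from R's standard normal quantile function:

```r
#One-tailed standard normal quantiles give the multiple for a chosen probability
qnorm(0.95) #~1.645, commonly rounded to 1.65
qnorm(0.99) #~2.326
```

For a t or Cauchy distribution the multiple comes from qt() or qcauchy() instead, which is what the code later in this post does.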

*Portfolio Standard Deviation*

This is the dollar value portfolio standard deviation. In its most general form the equation takes the form of:

Portfolio Standard Deviation = sqrt( sum over all i, j of w(i) * w(j) * sd(i) * sd(j) * corr(i,j) )

Where,

w(i) is the portfolio weight of asset i, sd(i) is the standard deviation of asset i's returns, and corr(i,j) is the correlation between the returns of assets i and j.

For a portfolio with two assets, portfolio standard deviation can be rewritten as:

Portfolio Standard Deviation = sqrt( w(1)^2 * sd(1)^2 + w(2)^2 * sd(2)^2 + 2 * w(1) * w(2) * sd(1) * sd(2) * corr(1,2) )
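To make the two-asset form concrete, here is a small worked example in R with hypothetical inputs (a 60/40 portfolio, return standard deviations of 20% and 10%, and a correlation of 0.3):

```r
w1 <- 0.6;  w2 <- 0.4   #portfolio weights (hypothetical)
s1 <- 0.20; s2 <- 0.10  #standard deviations of returns (hypothetical)
rho <- 0.3              #correlation between the two assets (hypothetical)
port_sd <- sqrt(w1^2 * s1^2 + w2^2 * s2^2 + 2 * w1 * w2 * s1 * s2 * rho)
port_sd #~0.137, versus a 0.16 weighted average of the two sds
```

The portfolio standard deviation of ~13.7% is lower than the 16% weighted average of the individual standard deviations - that gap is the diversification benefit.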

*Square Root of Time Period*

The time period is simply the number of trading days over which you would like to calculate VaR. Therefore, for a two-week period, the square root of the time period is the square root of 10 (two weeks contain ten trading days).
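Putting the three pieces together, a minimal sketch of the full VaR formula with hypothetical inputs (95% probability, a daily dollar standard deviation of $8,600, and a two-week horizon):

```r
sd_multiple <- 1.645 #95% probability, assuming a normal distribution
daily_sd <- 8600     #daily dollar portfolio standard deviation (hypothetical)
days <- 10           #two weeks of trading days
VaR <- sd_multiple * daily_sd * sqrt(days)
VaR #~$44,700
```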

**Uses of VaR**

VaR is a simple and quantifiable technique that illustrates the impact of diversification. By adding long/short positions in various assets and determining their statistical properties and how the assets interact with each other, an investor can design portfolios that target specific risk/reward profiles.

Often investors will set a VaR target which a portfolio is not to exceed. In this case VaR would be measured daily, and if calculated VaR exceeds allowable VaR, the trader would need to hedge, liquidate some positions, or take other actions to reduce risk.
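A minimal sketch of such a daily limit check (all names and numbers hypothetical):

```r
var_limit <- 1.5e6        #maximum allowable VaR set by the investor (hypothetical)
calculated_var <- 1.62e6  #today's calculated VaR (hypothetical)
if(calculated_var > var_limit){
  message("VaR limit breached: hedge, liquidate positions, or otherwise reduce risk")
}
```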

VaR has two major shortcomings.

1. Past asset behaviour is not always indicative of future asset behaviour.

This can happen for various reasons, ranging from systematic factors to those specific to the asset. For example, a new rule, market structure, or competitive dynamic may change the way investors hedge or hold assets, thereby changing previous asset correlations and return standard deviations. One example of this type of scenario is the increased use of index ETFs as a hedging tool and/or a replacement for cash in portfolios; this change in investor behaviour, driven by market structure and competitive dynamics, has changed how those instruments behave. Another example is a firm whose business model matures. Where before the company may have faced little competition, with share price returns having low correlation to the overall market, once it operates in a mature industry it will compete on price, earn lower unit returns, and see its economic fortunes become more correlated with the general business cycle.

2. During periods of extreme market volatility asset correlations tend to significantly increase, thereby reducing the benefit of diversification.

The best example of this point is during a financial crisis where all investors are heading for the exit doors at the same time. When this occurs, all asset prices tend to fall irrespective of historical correlations and few true hedges exist.

VaR is no silver bullet - nothing is. Investors must be vigilant with their portfolios in order to understand when and why historical correlations and standard deviations of returns are not applicable. Investors must also be able to forecast periods when markets have a higher chance of experiencing larger-than-normal losses and adjust portfolios accordingly. The risk in doing so is that by adjusting for some event which does not occur, the investor will miss out on a large run-up in prices. This is a legitimate risk and is the downside of protecting a portfolio. However, portfolios can be protected cheaply and without exiting positions through the use of options and other derivatives. As well, pure alpha strategies can be employed to minimize systematic risk to the portfolio.

For Vital Data Science views on the market and information on pure alpha strategies please sign up to receive our research **here**.

**The code**

We will now present our R code for calculating VaR. Two inputs are necessary: a portfolio file which resembles Figure 1, and a pricing file which resembles Figure 2. In our code we do not assume that a standard normal distribution accurately represents the portfolio's return characteristics; we fit Cauchy, normal, and t distributions to the portfolio returns, then select the most appropriate.

**Note:** *this code is being provided for educational and informational purposes only. It is being provided for free. We offer absolutely no guarantee or warranty as to the accuracy, correctness, usefulness, or results of this code. It is up to the user to verify the code and ensure correct results. Use at your own risk.*

**Figure 1:** Sample portfolio for VaR code calculation

**Figure 2:** Sample pricing file for VaR code calculation

The code:

#Load required R libraries - MASS for fitdistr, quantmod for Delt
library(MASS)
library(quantmod)

###############################################
##################DEFINE FUNCTIONS#############
###############################################
#No functions to define

############################################
##############IMPORTING DATA################
############################################
Data_dir <- c("path")
setwd(Data_dir)
port_1 <- read.csv(file = "port_file.csv", header = TRUE, sep = ",", strip.white = TRUE)
NYSE_raw <- read.csv(file = "price_file.csv", header = TRUE, sep = ",", strip.white = TRUE)
stock_names <- colnames(NYSE_raw)

######################################################################
###########GET NECESSARY STOCK DATA, CLEAN UP, CALCULATE VOLATILITY###
######################################################################
tickers <- as.character(paste(port_1$Ticker))
colNames <- as.character(paste(port_1$Ticker, port_1$Long.or.Short, sep = " "))
multiplier <- as.numeric(port_1$Long.or.Short)
units <- as.numeric(port_1$Units)

###Get prices
stock_prices <- data.frame(matrix(ncol = (length(tickers) + 1), nrow = nrow(NYSE_raw)))
colnames(stock_prices) <- c("dates", colNames)
stock_prices[,1] <- NYSE_raw[,1]
for(i in 1:length(tickers)){
  col_num <- match(tickers[i], stock_names)
  stock_prices[,(i + 1)] <- NYSE_raw[,col_num]}

###Find the first row where prices exist for every asset
start_val <- vector()
n <- 1
for(i in 2:ncol(stock_prices)){
  start_val[n] <- min(which(stock_prices[,i] != stock_prices[1,i]))
  n <- n + 1}
start_val <- max(start_val)
stock_prices <- stock_prices[start_val:nrow(stock_prices),]

current_prices <- vector(mode = "numeric", length = length(tickers))
for(i in 1:length(current_prices)){
  current_prices[i] <- stock_prices[nrow(stock_prices),(i + 1)]}
current_value <- current_prices * units
port_value <- sum(current_value)
weights <- current_value/port_value

###Get percentage returns
stock_percentages <- data.frame(matrix(nrow = nrow(stock_prices), ncol = (ncol(stock_prices) - 1)))
stock_percentages[1,] <- 0
for(i in 1:ncol(stock_percentages)){
  stock_percentages[,i] <- multiplier[i] * Delt(stock_prices[,(i + 1)])}
stock_percentages[is.na(stock_percentages)] <- 0
colnames(stock_percentages) <- colNames

###Covariance and correlation matrices
stocks_cov <- cov(stock_percentages) #Not needed below, kept for reference
stocks_corr <- cor(stock_percentages)
stocks_sd <- vector()
for(i in 1:ncol(stock_percentages)){
  stocks_sd[i] <- sd(stock_percentages[,i])}
stocks_sd_yr <- stocks_sd * 252^(0.5) #Annualized, for reference

###port_1 sd loop
###Dollar portfolio variance: sum over all asset pairs of V(i) * V(j) * sd(i) * sd(j) * corr(i,j)
###Daily standard deviations are used so that the result can be scaled by sqrt(days) below
n <- ((length(tickers))*(length(tickers) + 1))/2
port_1_var <- vector(mode = "numeric", length = n)
k <- 1
w <- 1
for(i in 1:length(tickers)){
  for(j in w:length(tickers)){
    if(i == j){
      port_1_var[k] <- current_value[i] * current_value[j] * stocks_sd[i] * stocks_sd[j]
      k <- k + 1}
    else{
      port_1_var[k] <- 2 * current_value[i] * current_value[j] * stocks_sd[i] * stocks_sd[j] * stocks_corr[i,j]
      k <- k + 1}}
  w <- w + 1}
port_var_tot <- sum(port_1_var)
port_sd <- (port_var_tot)^(1/2)

######################################################################
#################FIT DISTRIBUTION TO STOCK DATA, GET CI###############
######################################################################
###Calculate portfolio return
port_return <- vector(mode = "numeric", length = nrow(stock_percentages))
weighted_stock_percent <- data.frame(matrix(nrow = nrow(stock_percentages), ncol = ncol(stock_percentages)))
for(i in 1:ncol(stock_percentages)){
  weighted_stock_percent[,i] <- stock_percentages[,i] * weights[i]}
for(i in 1:nrow(weighted_stock_percent)){
  port_return[i] <- sum(weighted_stock_percent[i,])}

###Standardize portfolio return
port_return_standard <- (port_return - mean(port_return))/sd(port_return)

###Get probability distribution location and scale factors
###Note that this may be an iterative process depending on goodness-of-fit results
p_dist <- c("cauchy", "normal", "t")
port_dist <- list()
cauchy_start <- list(location = 0, scale = .02) #Tuning parameters, adjust as necessary
t_start <- list(m = 0, s = .22) #Tuning parameters, adjust as necessary
port_dist[[1]] <- fitdistr(port_return_standard, densfun = p_dist[1], start = cauchy_start)
port_dist[[2]] <- fitdistr(port_return_standard, densfun = p_dist[2])
port_dist[[3]] <- fitdistr(port_return_standard, densfun = p_dist[3], start = t_start, df = 5) #Tuning parameters, adjust as necessary

###Goodness of fit tests
##Bin sample data
port_return_standard_cut <- cut(port_return_standard, breaks = 11)
port_freq <- table(port_return_standard_cut)
port_freq_prob <- port_freq/sum(port_freq)
port_freq_prob_x <- seq(min(port_return_standard), max(port_return_standard), (max(port_return_standard) - min(port_return_standard))/10)
port_quantile <- seq(min(port_return_standard), max(port_return_standard), (max(port_return_standard) - min(port_return_standard))/1000)

##Build distributions
#Cauchy
cauchy_loc <- port_dist[[1]]$estimate[1]
cauchy_scale <- port_dist[[1]]$estimate[2]
port_cauchy <- dcauchy(port_quantile, location = cauchy_loc, scale = cauchy_scale)

#Normal
norm_mean <- port_dist[[2]]$estimate[1]
norm_sd <- port_dist[[2]]$estimate[2]
port_norm <- dnorm(port_quantile, mean = norm_mean, sd = norm_sd)

#t
t_df <- 5
t_mean <- port_dist[[3]]$estimate[1]
t_sd <- port_dist[[3]]$estimate[2]
t_ncp <- t_mean/(t_sd/(length(port_return_standard))^(0.5))
port_t <- dt(port_quantile, df = t_df, ncp = t_ncp)

#These can only be calculated if the vector lengths are the same
#cauchy_fit <- mean(abs(port_freq_prob - port_cauchy))
#normal_fit <- mean(abs(port_freq_prob - port_norm))
#t_fit <- mean(abs(port_freq_prob - port_t))

#Graphical check (quartz() is macOS only; use windows() or x11() on other platforms)
quartz()
plot(port_freq_prob_x, port_freq_prob, type = "l", ylim = c(0, max(max(port_cauchy), max(port_norm))), col = "orange")
lines(port_quantile, port_cauchy, type = "l", col = "blue")
lines(port_quantile, port_norm, type = "l", col = "red")
lines(port_quantile, port_t, type = "l", col = "green")

###Pick the appropriate distribution
#Use the visual check as well as min(cauchy_fit, normal_fit, t_fit) to determine the most appropriate distribution

###Take the appropriate quantile - in this case we're using 'cauchy'
confidence <- 0.05 #Tuning parameter, adjust as necessary
port_CI <- abs(qcauchy(confidence, location = cauchy_loc, scale = cauchy_scale))

####################################################
###################CALCULATE VAR####################
####################################################
port_period <- 1 #Unit is days; tuning parameter, adjust as necessary
VAR <- port_CI * sqrt(port_period) * port_sd
VAR_percent <- VAR/port_value