Bayes’ Theorem: Google uses it to steer driverless cars, and during WWII it helped solve Churchill’s greatest problem, German U-boats devastating Allied navies. Let’s see how we can use it to increase our chances of successful trades.

The idea behind Bayes’ Theorem is simple: use an event that has already occurred (B) to gain further insight into the future event you are interested in (A). Mathematically, P(A|B) = P(B|A) P(A) / P(B), where P(A|B) is the probability of A given that B has occurred.

What we are interested in is the probability that going long or short a financial instrument after receiving a buy or sell signal from our model will be followed by price appreciation or depreciation, respectively. In our case, the event that has already occurred is the buy or sell signal derived from fundamental or technical data. In other words, if our model is giving a buy or sell signal, what is the probability that the trade will be successful?
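To make the question concrete, here is a minimal sketch of Bayes’ Theorem applied to a single buy signal. The function name and all three input probabilities are hypothetical values chosen for illustration; they are not taken from our model.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using Bayes' Theorem.

    prior_a          -- P(A), e.g. the chance the price rises
    p_b_given_a      -- P(B | A), chance of a buy signal before a rise
    p_b_given_not_a  -- P(B | not A), chance of a buy signal before a fall
    """
    # P(B) by the law of total probability
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if evidence == 0:
        raise ValueError("P(B) is zero: the signal never occurs")
    return p_b_given_a * prior_a / evidence


# Hypothetical inputs, for illustration only
p_up = 0.5              # assumed prior chance of a price rise
p_buy_given_up = 0.6    # assumed rate of buy signals ahead of rises
p_buy_given_down = 0.3  # assumed rate of buy signals ahead of falls

p_up_given_buy = posterior(p_up, p_buy_given_up, p_buy_given_down)
print(f"P(price rises | buy signal) = {p_up_given_buy:.3f}")
```

With these made-up inputs, a buy signal that fires twice as often before rises as before falls lifts the chance of a rise from one half to two thirds.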

Bayesian Networks are used to apply Bayes’ Theorem to data sets. A Bayesian Network is represented graphically by a series of nodes connected by arcs. Arcs represent the relationship between parent and child nodes, where a child node depends on the state of its parent nodes. A simple Bayesian Network is depicted below.

For the case we are interested in, parent nodes are statistics derived from fundamental and/or technical data, and the child node is the price movement of the financial instrument we are modeling. Once the problem is graphically represented, an algorithm is run to estimate the probability distribution of the child node for each combination of parent states. It is possible to use either discrete or continuous data for parents or children. Further, the network can consist of much more complex parent-child relationships than depicted above, and the analyst need not specify a complicated network by hand; algorithms such as the grow-shrink algorithm can be used to learn the network structure from the data.

Once the network and probability distributions are determined, the combination of statistics that yields the highest chance of success is programmed into a trading algorithm. After careful back testing and stress testing, we are ready to implement.

Confused? Here’s an example.

Example. The Vital Data Science team used the above approach to implement a trading algorithm for GE equity. Our network, depicted below, consists of two technical/fundamental statistics as parent nodes and GE equity price movement as the child node. Each node has three discrete levels (buy, hold and sell), coded so that 1’s represent buy signals, -1’s represent sell signals, and 0’s represent hold signals.

The problem we are solving is to determine the probability distribution of the child node when both parents are indicating buy or both are indicating sell. Our hypothesis is that the buy combination of signals in the parents will yield the highest probability of a successful long in the child node, and the sell combination in the parents will yield the highest probability of a successful short in the child node.
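For a fixed two-parent network like this one, the child’s distribution can be estimated simply by counting outcomes for each combination of parent levels. The sketch below does that with the 1 / 0 / -1 coding. The function name and the observations are hypothetical and made up for illustration; this is plain frequency counting, not a structure-learning algorithm and not necessarily the estimator we used.

```python
from collections import Counter, defaultdict


def conditional_table(rows):
    """Estimate P(child | parent1, parent2) by counting.

    rows is an iterable of (parent1, parent2, child) tuples, each value
    coded as 1 (buy), 0 (hold) or -1 (sell).
    Returns {(parent1, parent2): {child_level: probability}}.
    """
    counts = defaultdict(Counter)
    for p1, p2, child in rows:
        counts[(p1, p2)][child] += 1

    table = {}
    for parents, child_counts in counts.items():
        total = sum(child_counts.values())
        table[parents] = {level: n / total for level, n in child_counts.items()}
    return table


# Hypothetical observations, for illustration only
observations = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, -1),
    (-1, -1, -1), (-1, -1, -1), (-1, -1, 1),
    (1, -1, 0), (0, 0, 1), (0, 0, -1),
]

cpt = conditional_table(observations)
p_up_given_both_buy = cpt[(1, 1)].get(1, 0.0)
p_down_given_both_sell = cpt[(-1, -1)].get(-1, 0.0)
print(f"P(up | both buy)    = {p_up_given_both_buy:.2f}")
print(f"P(down | both sell) = {p_down_given_both_sell:.2f}")
```

With real data each combination needs enough observations for the counts to be meaningful; a combination seen only a handful of times gives an unreliable estimate.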

The chart below shows the probability distribution for each distinct combination of levels in our model. The plots we are interested in are circled in green for buy and in red for sell.


The plot tells us we’ve created a strong model. When both statistics give buy signals there is an 80% chance of positive future price movement, and when both statistics give sell signals there is a 70% chance of negative future price movement. Overall, positive price movement occurs 54% of the time and negative price movement 46% of the time, so our model is predicting significantly better than guessing.
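One simple way to put a number on “better than guessing” is to divide each conditional probability by its base rate, a ratio often called lift. The sketch below uses the four percentages quoted above; the function name is ours, and lift is just one convenient yardstick, not a measure reported by the model itself.

```python
def lift(conditional, base_rate):
    """Ratio of a conditional probability to its unconditional base rate."""
    if base_rate <= 0:
        raise ValueError("base rate must be positive")
    return conditional / base_rate


# Percentages quoted in the text
p_up_given_buy, p_up = 0.80, 0.54
p_down_given_sell, p_down = 0.70, 0.46

buy_lift = lift(p_up_given_buy, p_up)
sell_lift = lift(p_down_given_sell, p_down)
print(f"buy lift  = {buy_lift:.2f}")
print(f"sell lift = {sell_lift:.2f}")
```

Both ratios come out at roughly 1.5, meaning that acting on the paired signals is about one and a half times as likely to be right as betting on the base rate alone.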

We back tested the approach, and the chart below shows the periods where we would have jumped into and out of GE equity. Note that the model is sending buy signals now; however, as we’ve not yet received a sell signal, we cannot assess the result of the latest round of buying/selling.
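A back test of this kind can be sketched in a few lines. The version below is a simplified illustration, not our production back test: it assumes a long-only strategy (a buy enters the position, a sell exits to cash, a hold keeps the current state), ignores dividends and transaction costs, and runs on made-up prices and signals.

```python
def backtest(prices, signals):
    """Return (strategy_return, buy_and_hold_return) as fractions.

    prices[i]  -- closing price on day i
    signals[i] -- 1 (buy), 0 (hold) or -1 (sell), known at the close of day i
    Assumption: long-only, no costs, trades filled at the closing price.
    """
    if len(prices) != len(signals) or len(prices) < 2:
        raise ValueError("need at least two prices and one signal per price")

    in_market = False
    growth = 1.0
    for i in range(len(prices) - 1):
        if signals[i] == 1:
            in_market = True
        elif signals[i] == -1:
            in_market = False
        # Earn the next day's price change only while holding the position
        if in_market:
            growth *= prices[i + 1] / prices[i]

    buy_and_hold = prices[-1] / prices[0] - 1.0
    return growth - 1.0, buy_and_hold


# Hypothetical prices and signals, for illustration only
prices = [100.0, 90.0, 99.0, 110.0, 99.0, 108.9]
signals = [-1, 1, 0, -1, 1, 0]

strategy_return, hold_return = backtest(prices, signals)
print(f"strategy:     {strategy_return:.1%}")
print(f"buy and hold: {hold_return:.1%}")
```

In this toy series the signals happen to sidestep both price drops, so the strategy comfortably beats buy and hold; real signals will be wrong some of the time, which is exactly what a back test is meant to reveal.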

Results. How did we do? Fantastic! Between January 1, 2010 and February 2012 our strategy returned 90% vs. a buy and hold return of 20%, excluding dividends. The obvious flaw in this strategy is that we did not participate in the significant run-up in share price between February 2012 and the present. However, even without participating in the latest run-up, our model returned 12% above a buy and hold strategy.

The model skipped the majority of the period from 2012 to the present because we based one of the statistics on annual values, so the model was slow to adjust to the run-up in price. The simple solution is to shorten the duration of that statistic, for example by resetting it quarterly. That said, one has to balance the frequency of trades with the ability to predict them correctly; in this case we decided to cut our trading frequency in order to achieve a higher chance of success.


Any material provided in this blog is for general information use only. You should not act based solely upon the materials provided herein. Vital Data Science Inc. advises you to obtain professional advice before making investment decisions. Your use of these materials is entirely at your own risk. In no event shall Vital Data Science Inc., its officers, directors or employees be liable for any loss, costs or damages whatsoever.