Measuring the “Consistency” of a Strategy

25May10

Warning: extreme geekiness ahead…

In my last post I talked about the Consistency Metric, an objective measure of how consistently a strategy has performed and, in my opinion, a good indicator of a strategy’s likelihood of continuing to perform out-of-sample.

In this post I want to flesh out my ideas re: Consistency, including a sample Excel file to help readers put it to use in their own trading.


[logarithmically-scaled]

Consider the graph above of a portfolio trading an RSI(2) strategy (red) vs a DV(2) strategy (grey), from 2000 to present. See the last post for strategy rules.

Just eyeballing the chart, which would you say has been the more consistent over the last 10+ years?

Grey gained a bit more in the end, but clearly red was more consistent. Most of grey’s gains came during a few months in 2008/09, but leaving that brief period aside, the red line has been the better, smoother play.

Comparing these two specific strategies is a simple matter of eyeballing a chart, but what if we had many to consider (such as in our last post re: the best 2-day indicator)? How do we capture consistency statistically?

The Consistency Metric

Note: my thoughts on how best to calculate Consistency evolved (again), and these results are different from (and I think better than) those in my previous post.

Calculating the Consistency Metric:

1. Create a “volatility-neutral” monthly equity curve (VNEC). Note the big bump in both strategies in 2008/09. That bump wasn’t necessarily related to either strategy’s efficacy; rather, it was mostly a result of a more volatile market. Here we adjust for that.

The VNEC for the two strategies would look as follows:


[logarithmically-scaled]

2. Apply an exponential regression to the VNEC. The regression is exponential rather than linear to compensate for compounding returns.

3. Calculate the average absolute % deviation between the VNEC and exponential regression.

4. Divide the annualized return of the VNEC by the average absolute deviation. Voila…the result equals the Consistency Metric.
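
For readers who prefer code to spreadsheets, here is a minimal Python sketch of the four steps above. It is not the workbook’s exact logic. Two things in particular are assumptions: the volatility adjustment (each monthly return is scaled by the ratio of the long-run daily std. dev. to that month’s daily std. dev.) and the annualized return (compound growth from the first to the last point of the VNEC). The function names are made up for illustration.

```python
import math


def vol_neutral_curve(equity, sd_long, sd_month):
    """Step 1 (assumed form): rescale each monthly return by sd_long / sd_month.

    equity, sd_long and sd_month are month-end lists of equal length.
    The first entry of the two std. dev. lists is not used.
    """
    vnec = [1.0]
    for i in range(1, len(equity)):
        ret = equity[i] / equity[i - 1] - 1.0
        adj = ret * sd_long[i] / sd_month[i]
        vnec.append(vnec[-1] * (1.0 + adj))
    return vnec


def exp_regression(curve):
    """Step 2: least-squares line through ln(curve) vs time, mapped back with exp."""
    n = len(curve)
    t = list(range(n))
    y = [math.log(v) for v in curve]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [math.exp(intercept + slope * ti) for ti in t]


def consistency_metric(curve, periods_per_year=12):
    """Steps 3 and 4: annualized return divided by average absolute % deviation."""
    fit = exp_regression(curve)
    avg_dev = sum(abs(c / f - 1.0) for c, f in zip(curve, fit)) / len(curve)
    years = (len(curve) - 1) / periods_per_year
    ann_ret = (curve[-1] / curve[0]) ** (1.0 / years) - 1.0
    if avg_dev == 0:
        return math.inf  # curve sits exactly on the regression
    return ann_ret / avg_dev
```

Typical use would be `consistency_metric(vol_neutral_curve(equity, sd_long, sd_month))`. The sketch does not guard against an adjusted monthly loss of 100% or more, which would break the logarithm in step 2.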

Now would be a good time to share the Excel workbook. Right click to save as.

Instructions: Grey shaded cells require user input, and white cells are calculated automatically. Month-end values for each portfolio are entered in columns B and C, the 5-year and 1-month daily std. dev. of VFINX (the benchmark) in columns D and E, and the number of trading days in the month in column F. Consistency Metric results are calculated automatically in columns S to U.

For the RSI(2) and DV(2) strategies above, the Consistency Metric equals 2.52 and 1.31 respectively, indicating the RSI(2) strategy was much more consistent over the sampled period.

Last Thoughts

Readers know I’m not a fan of creating new named indicators (the “Marketsci Super Duper Indicator” or whatever). The vast majority of these newfangled thingamajigs are just iterations of something else, and I find the whole thing to be mostly masturbatory.

Having said that, I haven’t seen anything like the Consistency Metric before, and I think in this case a newfangled thingamajig might be warranted.

Expect to see the Consistency Metric in all future strategy tests at the MarketSci Blog. This (like everything I do) is a work in progress, and reader input is always appreciated.

Happy Trading,
ms




29 Responses to “Measuring the “Consistency” of a Strategy”

  1. That is an interesting way of looking at things, nice work Michael! I can see uses for this measure in a round of testing we are currently doing. http://etfhq.com/blog/2010/05/25/best-technical-indicators/

    Cheers
    Derry

  2. 2 Emilio Lizardo

    Pretty good, but you should Studentize your residuals.

    • 3 MarketSci

      RE to Emilio: and what pray tell would the benefit of studentizing the residual be in this particular application? michael

  3. 4 sjev

    What’s wrong with the Sharpe ratio? To measure the consistency over a longer period you could use monthly intervals instead of daily.

    • 5 MarketSci

      RE to sjev: nothing’s wrong with the Sharpe Ratio. It’s one of many metrics I consider in my own trading. But it’s very different than the concept of Consistency.

      For example, both of the lines in the graph above have a very similar Sharpe (almost identical if I remember correctly). Sharpe has no view of “time” or how an equity curve progresses over the sample. It’s just a single lump sum measurement.

      You could do a rolling Sharpe with some sort of measurement of variability over the sample and that would be similar to the idea in this post.

      michael

  4. 6 steve

    masturbatory? no response necessary.

  5. I like the VNEC idea and thanks for sharing the spreadsheet.

    Consistency (and its implicit cousin: robustness – isn’t that what we are really after?) is something I have been thinking about, and I think there are 3 areas it can apply to:
    - time (what you are doing here)
    - markets (strategy would be doing similarly on other markets)
    - parameters (slight variation in parameters should not yield hugely different results)

    However, I am starting to wonder if time consistency really is a sign of robustness: a strat could consistently but profitably exhibit strong volatility, whereas other strats could have a much more linear equity curve but break down in the next few months/years, etc.

    robustness is not an easy topic!..

    • 8 MarketSci

      RE to Jez: good point, but we could say that about any metric right (Sharpe, Sortino, Ulcer Index, etc.)?

      Given the choice today b/t a strategy that has been very consistent over the last X number of years versus one that has been inconsistent over that same X years (all other things like volatility-adj’ed return being equal), I would choose the more consistent. To your point, it’s not guaranteed indicative of future performance, but it’s definitely something. michael

  6. Brilliant. Posts like this are why this is my favorite blog. Always new ideas to cogitate on.

    Everybody go write nice things about the MarketSci blog on Investimonials:
    http://www.investimonials.com/blogs/reviews-marketsci-blog.aspx

    You can also review Michael’s for-pay strategies (YK, Scotty, etc):
    http://www.investimonials.com/blogs/reviews-marketsci–market-timing-strategies.aspx

    Michael – Investimonials can get you some free marketing. I requested that the pages above be added, and I put the first reviews in place. If you were to link to them, I bet you’d get a ton of positive reviews from your followers. Just a thought.

    I also created pages for some of your competition, who I shall not name here :)

  7. 10 Don PG

    I like simple calculations. I think Sharpe is simpler, but you say in response to an earlier comment that you don’t think Sharpe ratios discriminate enough between similar strategies, and that it doesn’t show change over time. Two comments:

    1. Can’t you see the change over time just by plotting? Your consistency metric in the spreadsheet (thanks for including a link!) has no time element. No different than Sharpe.

    2. Can you achieve the same sort of strategy ranking using a variant of the Sharpe ratio? Here’s what I’ve been using. Instead of std deviation in the denominator to measure the variation in monthly returns, I use the difference between a large return and a small return. The Excel formula I might use is =median(b1:b127)/(percentile(b1:b127,.95)-percentile(b1:b127,.05)). Column B would contain monthly returns for this formula to produce the desired metric. Bigger is better, just as with Sharpe ratios. In your spreadsheet, Strat #1 produces a .128 ratio where Strat #2 produces a .063 ratio. Thus Strat #1 has the higher “risk adjusted” return, e.g. more consistent. Strat #1 ranks higher even if I use different percentiles; you might prefer P50/(P95-P5) or P50/(P80-P20). I think my variant of Sharpe ratios deals better with fat-tailed return distributions.
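
The ratio in the comment above can be sketched in Python as follows. The helper names are hypothetical, and the `percentile` helper is intended to mirror Excel’s PERCENTILE function (linear interpolation between ranks), so small differences from the spreadsheet are possible.

```python
import statistics


def percentile(values, p):
    """Percentile with linear interpolation between sorted ranks (0 <= p <= 1)."""
    s = sorted(values)
    rank = p * (len(s) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def spread_ratio(returns, upper=0.95, lower=0.05):
    """Median return divided by the spread between an upper and a lower percentile."""
    spread = percentile(returns, upper) - percentile(returns, lower)
    return statistics.median(returns) / spread
```

Calling `spread_ratio(monthly_returns, 0.80, 0.20)` gives the P50/(P80-P20) variant mentioned in the comment.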

    • 11 MarketSci

      RE to Don PG: thanks for the comment…replies:

      RE point #1: it does include a time element because of the expon regression, and the expectation that the strategy hug that regression as it progresses. Sharpe (or Sortino, etc.) is just a lump sum measurement. Now a rolling Sharpe ratio I could get behind and is, as I mentioned in a previous comment, similar in spirit to the Consistency Metric.

      RE point #2: I think that approach has merit (and I’m not knocking it) but (a) no sense of time, and (b) like the “lump sum” Sharpe Ratio, very subject to changes in market volatility (in other words, why overly penalize for high returns during high vol. periods and low returns during low vol periods when those high/low returns have nothing to do with the efficacy of the strategy?)

      michael

  8. 12 Lee

    Just wondering if we should also measure consistency from a relative basis, vis a vis the benchmark i.e. do the same for strategy returns less benchmark returns.

    • 13 MarketSci

      RE to Lee: smart comment. I thought about that, but then that unfairly penalizes true absolute return strategies which (by their nature) aren’t meant to “beat the benchmark”, but rather, to be all-weather portfolios. Just my $0.02. michael

    • 14 MarketSci

      RE to Lee: thought a bit more about your comment. I don’t see why you couldn’t do both, a Consistency Metric and a Consistency of Outperformance Metric. I could see both being useful. Again, great comment. michael

  9. 15 sjev

    I’ve got an even simpler idea to measure the consistency of a strategy: a very consistent strategy will have a linear PnL curve (log scale, of course, in case of reinvested gains). The linearity can be measured by the R-squared statistic of the linear fit (squared residuals). So, the R^2 of the linear fit through PnL is a metric for strategy consistency.
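
A rough Python sketch of the R-squared idea in the comment above, assuming a compounding equity curve so that the fit is done on the log of the curve. The function name is made up for illustration.

```python
import math


def r_squared_log_equity(equity):
    """R^2 of a straight-line fit through ln(equity) against time.

    For a one-variable linear fit, R^2 equals the squared correlation
    between time and ln(equity), which is what is computed here.
    """
    n = len(equity)
    t = list(range(n))
    y = [math.log(v) for v in equity]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)
```

Note that a steadily declining curve also scores close to 1 here, since the statistic only measures straightness. A perfectly flat curve makes `syy` zero and is not handled by this sketch.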

    • 16 MarketSci

      RE to sjev: that’s exactly how I did it originally in this series:

      http://marketsci.wordpress.com/2010/03/22/roundup-the-best-2-day-indicator/

      There are a number of problems with that approach:

      1. The equity curve is subject to changes in market volatility which will change the consistency metric even though the efficacy of the strategy didn’t change (discussed in this post). You could change the equity curve to a VNEC like I discuss in this post, but you’d still be left with issues #2 and #3 below.

      2. Due to the nature of how r-sq is calc’ed, you tend not to get much of a range of results (in this application). Everything (that’s remotely worth trading) hugs around 0.80 to 0.99 or so. This is more aesthetics than anything, but you want a metric that shows enough variance to be memorable.

      3. More important than #2 (but less than #1) that approach doesn’t put consistency in context of returns. In my approach, consistency = return (VNEC) / avg. deviation (much like Sharpe = ann. ret. / std. dev.) meaning you’re measuring deviation or volatility or whatever relative to returns the strategy generated. A simple r-sq doesn’t tell us anything beyond how straight the eq. curve is, even if that straight is straight sideways.

      I hope this response doesn’t seem like I’m shooting down all of your ideas. I walked down the exact same logic path as you are (Sharpe then r-sq of expon. reg.), but after giving it continued thought moved to where I’m at now.

      michael

  10. 17 Paolo

    I’m starting to get the big picture about this but still wondering how it would be different/superior versus your SOT reliability metric and other statistical filters like t-stat (for instance the adaptive time machine by DV).

    • 18 MarketSci

      RE to Paolo: good question. The two biggest differences are that both of those (along with Sharpe, Sortino, etc.): (a) don’t have any “sense of time” or any sense of how the eq. curve progresses over time – they’re simply lump sum measurements, and (b) they aren’t volatility-neutral intra-sample (in my case I’m adjusting for vol monthly). There are many other dissimilarities, but those are the two biggest.

      Given the number of questions similar to the above, it looks like I need to put together a blog post with a more detailed explanation.

      michael

  11. 19 Dennis

    Great post on consistency.

    A simple question on your spreadsheet. How is VFINX Daily SD 5year and 1month computed?

    Thanks

    • 20 MarketSci

      RE to Dennis: good question. I’ll answer it in terms of how you would do it in Excel.

      Start with a column of dates and a column of div-adj’ed VFINX prices (sorted oldest to newest)

      Use the LN function [ex. LN(B2/B1)] to get the daily % log change for all dates.

      For the 5-year daily SD I used a rolling 1260 days [ex. STDEV(C2:C1261)]

      For the 1-month daily SD you could just use a rolling 21 days, but I actually used the true calendar month. Explaining how to do this is a little complicated, so if it’s beyond your Excel know how, I would just use a 21 day period.

      Hope that helps!

      michael
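
The Excel steps in the reply above translate to Python roughly as follows. This is a sketch using only the standard library; the function names are made up, and it uses a fixed-length window (1260 or 21 days) rather than true calendar months.

```python
import math
import statistics


def log_returns(prices):
    """Daily log changes, like LN(B2/B1) filled down. Prices sorted oldest to newest."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_stdev(values, window):
    """Sample std. dev. over each trailing window, like STDEV over a sliding range.

    Entries are None until a full window of data is available.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(values[i + 1 - window:i + 1]))
    return out
```

With a list of dividend-adjusted closes called `prices`, `rolling_stdev(log_returns(prices), 1260)` would approximate the 5-year series and `rolling_stdev(log_returns(prices), 21)` the 1-month series.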

  12. 21 John

    Michael, I have been playing around with this concept for some time and have factored into my process the capture rate of vertical feet. This helps me evaluate the effectiveness of an indicator based on the opportunity in that market. Volatility and tradable opportunity are not always the same.

    A simple version is to calculate the monthly absolute returns possible for SPX. The vertical feet, or maximum possible return, in October of 2008 was 137.42% compared to 21.72% possible in July of 2009. If indicator X’s average monthly capture rate is 15%, it should have made 20.61% in October of 2008 and 3.26% in July of 2009, or something close to those numbers. I find that measuring an indicator’s capture rate of V-Feet for a given index is a more helpful gauge, or at least a more intuitive measure, of consistency. Since you are about 100X smarter than me, I am sure you can show me/us a more clever way to incorporate it in your new consistency indicator.

    Taking a regression line of this calculation and the deviation from it would probably work quite well.

    As always thanks for what you do!

    • 22 MarketSci

      RE to John: I am smarter than no one. I take that back. My dog chases her tail an hour a day. I’m convinced I’m smarter than her.

      I like what you’re doing with the capture rate. In a way it’s similar to the idea of the VNEC in this post in that you’re measuring what the strategy did relative to what the market was offering.

      I think if you then took an average of those capture rates (in this case it would be linear and not exponential) and then measured the average deviation from that average, that would be similar to the avg % deviation in this post.

      Lastly, if you took the average capture rate and divided by the average deviation, that would be similar in spirit to the Consistency Metric. Different, but similar idea. I like it.

      P.S. you mention giving an “intuitive feel” and I agree that’s important. For me it’s looking at the VNEC (the second graph in the post above). Even though the original performance graph looks flat until the big 2008/09 bump, if we adjust for market volatility, we see that both strategies have actually been incredibly flat and consistent throughout.

      Thanks for the insightful comment John!

      michael

  13. 23 Ying

    This is really interesting stuff. Thanks for the post AND the sheet! Thinking about your approach I have a few comments/questions.

    It seems to me the interesting part of this approach is the first step, i.e. the vol neutralisation. Once you get to the vol adjusted portfolio (columns L and M), isn’t the rest very similar to just calculating the Sharpe? Regressing the equity curve around an exponential line is the same (in principle) as regressing returns on a horizontal line. If I calc the Sharpe for the numbers in col L and M I get 1.39 vs 1.15, showing strat #1 is better.

    I noticed you used absolute deviation (norm 1) in col P and Q, instead of the norm 2 we usually do in calculating vol. Is there any particular reason for this? Anyway surely we can modify the Sharpe to use norm 1 as well.

    The vol neutralisation is interesting and useful and I need to think a bit more about it.

    Thanks for the excellent blog.

    • 24 MarketSci

      RE to Ying: thanks for the well thought comment.

      In short, the Sharpe is different b/c it doesn’t account for the sequence of events (i.e. “time”).

      Consider two equity curves of “small” and “big” months AFTER being volatility-adjusted (i.e. columns L & M). Assume equity curve A is made of 1 small mo, followed by 1 big mo, followed by 1 small mo, etc. Curve B is made of all small mo’s at the beginning of the sample followed by all big mo’s at the end.

      Sharpe (or any other lump sum measurement) will see those as the same. But the regression introduces the sequence of events into the equation (important to determine “consistency”) and A will be seen as being much more consistent than B (which it is).

      I didn’t follow your next paragraph RE: norm 1 vs 2. Can you expand?

      michael
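
The small-month/big-month example in the reply above can be checked with a few lines of Python. This is only an illustration: the 1% and 3% monthly returns are made-up numbers, the Sharpe-like ratio ignores the risk-free rate, and the helper names are hypothetical.

```python
import math
import statistics

SMALL, BIG = 0.01, 0.03  # made-up monthly returns, already volatility-adjusted

returns_a = [SMALL, BIG] * 12           # curve A: small, big, small, big, ...
returns_b = [SMALL] * 12 + [BIG] * 12   # curve B: all small months, then all big


def equity_curve(returns):
    """Compound a list of periodic returns into an equity curve starting at 1."""
    curve = [1.0]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def sharpe_like(returns):
    """Mean return over sample std. dev.; the order of the returns does not matter."""
    return statistics.mean(returns) / statistics.stdev(returns)


def avg_abs_deviation(curve):
    """Average absolute % deviation from a least-squares exponential fit."""
    n = len(curve)
    t = list(range(n))
    y = [math.log(v) for v in curve]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    fit = [math.exp(intercept + slope * ti) for ti in t]
    return sum(abs(c / f - 1.0) for c, f in zip(curve, fit)) / n


print("Sharpe-like A, B:", sharpe_like(returns_a), sharpe_like(returns_b))
print("Avg abs deviation A:", avg_abs_deviation(equity_curve(returns_a)))
print("Avg abs deviation B:", avg_abs_deviation(equity_curve(returns_b)))
```

Both curves end at the same value and share the same mean and std. dev. of monthly returns, so the lump sum ratio cannot tell them apart, while the deviation from the regression is larger for curve B.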

      • 25 Ying

        Ah, indeed, your approach captures the sequence of events nicely. There’re many possible paths for your account to reach a given return and realising a given volatility, but the path that tracks the growth curve most closely is the most consistent. thanks for the explanation!

        BTW my other paragraph is just minor point cos in col P and Q the absolute difference is calculated instead of the normal squared differences. won’t make difference for the purpose of this comparison.

    • 26 MarketSci

      RE to Ying: got you now…good question.

      I used the “straight” deviation rather than deviation^2 b/c I wanted to be “linear” in my treatment of deviations from the equity curve. For example, take two data points: one has deviated 5% and the other 10%. Unsquared deviation sees the second as twice as “bad” as the first. Deviation^2 sees the second as four-times as “bad”. For the purpose of calculating consistency, the former makes more sense to me than the latter.

      michael

  14. 27 Josh

    Hi Michael,

    Interesting approach, but one drawback of your metric is that comparing systems that incorporate a volatility model (that adjust leverage based on predicted volatility) would skew the conclusions. Also, this doesn’t allow comparing apples to oranges (different strategy types) like a sharpe or sortino. I can see this being useful for comparing indicators, but how, for example, do you compare a breakout system which thrives on volatility with an option writing strategy or your own brand of more consistent short term mean reversion? By adjusting for market volatility you overweight the consistent small profits of selling options and underweight the large infrequent losses such a strategy incurs during volatile markets. The trendfollowing/breakout strategy’s return stream gets unfairly adjusted, because just at the time it makes the bulk of its profits is when its returns are adjusted downward for current market volatility.
    Of course I don’t mean to attack your creation! You just got me thinking quite a bit about how to compare different strategies.

    Josh

  15. Michael:

    I was applying the consistency measure to the S&P 500 index itself for the period from the start of 2005 to last month, which is an overall down 5.75 years. Instead of using a monthly aggregate, I calculate VNEC and absolute range on a daily basis by looking back 20 trading days. While the Sharpe ratio for the period is negative, the consistency measure is positive, ~ +0.55.

    Related to this observation, I can imagine a situation where most of the gains are in periods of low volatility and losses are in periods of high volatility, sort of contrary to the equity curve situation shown in your original post on RSI(2) and DV(2). Since smaller gains are magnified and larger losses are scaled down in VNEC, I am wondering if the consistency measure can have a bias favoring aggressive strategies in periods that include a short tail-risk sub-period like 2008-2009 for the S&P?

    Thanks for your comments for a late thought on the topic.

    George
    @ Flexible Plan Investments, Ltd.


