Via Bob Bronson, we get this very interesting way to think about the potential universe of market returns:

In addition to the almost universally improper use of the correlation function that we have presented before (our Correlation Puzzle is available on request), Alpha-Beta, Efficient Frontier, Black Scholes, VaR, stochastic modeling, and exotic derivatives from Modern Portfolio Theory and Post-Modern Portfolio Theory also variously depend on security returns being “normally” distributed, but it is demonstrable that they are not for any capital market over any time scale.

Don’t confuse the distribution of returns on a log scale with the distribution on an arithmetic scale – see the chart comparisons further below. A common mistake, especially on internet blogs, is failing to understand the huge difference between applying logarithms and applying percentages. A good example is the systemic drift with index funds and ETFs, especially the high (beta) multiple inverse ones.
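One common reading of the drift mentioned above can be sketched with a two-day toy path. Everything here is hypothetical: the +10% / back-to-start index path, the -2x multiple, and the assumption that the fund delivers that multiple of each day's percentage return. It is an illustration of why percentages do not add across periods while log returns do, not a model of any actual fund.

```python
import math

def compound(returns):
    """Chain simple (percentage) returns into one total return."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value - 1.0

# Hypothetical two-day path: the index gains 10%, then falls back to its start.
index_returns = [0.10, -1.0 / 11.0]

# Assumption: the fund delivers -2x the index's simple return each day.
leverage = -2.0
fund_returns = [leverage * r for r in index_returns]

index_total = compound(index_returns)
fund_total = compound(fund_returns)

# Log returns add across periods; percentage returns do not.
log_sum = sum(math.log1p(r) for r in index_returns)
pct_sum = sum(index_returns)

print(f"index total return: {index_total:.4%}")
print(f"fund total return:  {fund_total:.4%}")
print(f"sum of log returns: {log_sum:.6f}")
print(f"sum of pct returns: {pct_sum:.6f}")
```

In this toy path the index ends where it started, yet the -2x fund finishes about 5.45% down, and the two percentage returns sum to a small positive number even though the index is flat.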

Check out The Capitalism Distribution by Blackstar:

http://bit.ly/H0w0K

I’d really appreciate some color on the “logarithm and percentages” issue.

I assume the black is an “expected” normal distribution. The blue are “actual” results of calendar year returns? I’m missing the logarithmic link here.

Also, there’s a pretty limited number of data points (130) when limited to calendar years.

Does the data appear to be more normally distributed when monthly returns are used? What about rolling 12-month data points? After all, I know of nobody who only invests capital on the first day of January.

Thanks in advance for intelligent answers.

What is missing from this graph is any kind of error bar. How significant are the deviations from normality? Let us assume normality. Looking solely at the peak bin, we have x-obs = 8, x-exp = 18.5, n = 130, p = 18.5/130. Thus the z-score is (x-obs – x-exp) / sqrt(n*p*(1-p)) = -2.63, barely significant at the 5% level. Of course there are two depressed central results, and if you combine the two bins, you get a significance level of 0.06%, so there does appear to be something going on here.
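The single-bin calculation in this comment can be reproduced with a short sketch. The inputs (8 observed, 18.5 expected, n = 130) are the commenter's readings from the chart; the function names are hypothetical, and the two-sided tail probability from the normal approximation is an addition here, not something the comment computed.

```python
import math

def bin_z_score(observed, expected, n):
    """Z-score of one histogram bin under a binomial model with p = expected / n."""
    p = expected / n
    sd = math.sqrt(n * p * (1.0 - p))
    return (observed - expected) / sd

def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values read off the chart by the commenter.
z = bin_z_score(observed=8, expected=18.5, n=130)
p_value = two_sided_p(z)

print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Note that this treats the bin as if it had been chosen in advance. Picking the most striking bin after looking at the chart makes the nominal p-value look stronger than it is.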

I expect a fat tail, and I am not surprised at the skew to the left, but this bimodal thing is really new. It doesn’t exist for daily prices, and yet for these yearly changes, doesn’t seem to be a fluke. And it plays hob with all theories I’ve seen.

And where can I get 130 years of Dow history?

@saunderscc, I think the logarithms come in when the data is first processed, and the data graphed is log(Dow at year close) – log(Dow at prev. year close). Evidently natural logarithms, so for small amounts the log change is roughly the percentage change. So when you see a bin labelled “15%” that’s not technically correct — it actually means 0.15 log units, which is really a 16% change.
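The conversion described here is just the exponential and its inverse. A minimal sketch, assuming natural logarithms as the comment does (the function names are hypothetical):

```python
import math

def log_to_pct(log_change):
    """Convert a natural-log change into the equivalent percentage change."""
    return math.expm1(log_change)      # exp(x) - 1

def pct_to_log(pct_change):
    """Convert a percentage change into the equivalent natural-log change."""
    return math.log1p(pct_change)      # ln(1 + r)

for log_change in (0.01, 0.15, -0.15, 0.50, -0.50):
    print(f"log change {log_change:+.2f} -> {log_to_pct(log_change):+.2%}")
```

The two scales agree closely for small moves and diverge for large ones: a bin labelled 0.15 in log units corresponds to a gain of roughly 16.2%, while -0.15 corresponds to a loss of roughly 13.9%.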

@anewc2 – Thank you for throwing the BS flag.

Something is fishy with the presentation of the data. I too would like the underlying data. I can only use the blue data as presented (which has already been binned into a histogram).

0) There appears to be at least one error in the data plot of the real data. How is the blue data point at 21% histogram (x-axis) plotted at 17.5 observations? The blue has to be whole numbers.

1) The ‘Normal curve’ is centered at the 15% histogram, but the average of the blue data is 9.8%. Why has the normal curve been shifted to the right?

2) The ‘Normal curve’ presented uses a much lower std deviation than the standard deviation implied by the blue data. This makes the tails on each side for the black curve much ‘thinner’ than they really should be. How was the std dev for the black curve determined?

Once you correct for what seem to be errors in plotting the normal curve, the ‘bimodality’ is much less impressive. Think of a much fatter normal curve, with a peak of around 13 at 9%, going down to around 10 at -10% and 28%. You can re-run anewc2’s significance test with the proper x-exp of ~13, and the deviations seen will not be significant.
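How much the choice of standard deviation changes the expected bin heights can be checked with a rough sketch. The 130 observations and the 9.8% mean come from the comments above; the 6-point bin width is inferred from the bin labels (9%, 15%, 21%), and the two standard deviations (17 and 24 percentage points) are hypothetical values picked only to contrast a thin curve with a fat one.

```python
from statistics import NormalDist

def expected_count(center, width, mean, sd, n):
    """Expected number of observations in one bin under a normal distribution."""
    dist = NormalDist(mu=mean, sigma=sd)
    lo = center - width / 2.0
    hi = center + width / 2.0
    return n * (dist.cdf(hi) - dist.cdf(lo))

N_YEARS = 130      # from the comments
MEAN = 9.8         # percent, from the comment above
BIN_WIDTH = 6.0    # percent; assumption inferred from the bin labels

# Hypothetical standard deviations, for comparison only.
narrow = expected_count(9.0, BIN_WIDTH, MEAN, sd=17.0, n=N_YEARS)
wide = expected_count(9.0, BIN_WIDTH, MEAN, sd=24.0, n=N_YEARS)

print(f"expected count in the 9% bin, sd=17: {narrow:.1f}")
print(f"expected count in the 9% bin, sd=24: {wide:.1f}")
```

Under these assumptions the wider curve puts roughly 13 observations in the central bin, in line with the figure suggested above, while the narrower one puts noticeably more there.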

~~~

BR: Bob Responds: See this: [image]

These statistical arguments really require more careful language. You need a hypothesis (a clearly stated “null hypothesis”) such as “these data describe a normal distribution” (or the opposite, “…do not describe…”). Then you must assign a “confidence level”, e.g. 90% certainty, 67%, 95%, whatever (there really is no 100% certainty). You use the Shapiro-Wilk normality test. (This can be searched for very easily.) I might go and plug the data in and see how much certainty can be ascribed to the hypothesis that these data describe a normal curve. Note: it will be above 0% certainty.
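The procedure described here (state a null hypothesis of normality, pick a significance level, compute a test statistic) can be sketched in a few lines. To stay within the standard library, the sketch swaps in the Jarque-Bera test, which is a different and simpler normality test than the one named above. The data are synthetic, since the underlying Dow series is not available in this thread, and the chi-square p-value is only an asymptotic approximation that is rough at 130 observations.

```python
import math
import random

def jarque_bera(data):
    """Jarque-Bera statistic and asymptotic p-value (chi-square, 2 d.o.f.)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Synthetic stand-ins for 130 yearly log returns (NOT real Dow data).
rng = random.Random(0)
normal_like = [rng.gauss(0.08, 0.18) for _ in range(130)]
fat_tailed = [rng.gauss(0.08, 0.60 if rng.random() < 0.10 else 0.15)
              for _ in range(130)]

ALPHA = 0.05  # significance level chosen before looking at the result
for name, sample in (("normal-like", normal_like), ("fat-tailed", fat_tailed)):
    jb, p = jarque_bera(sample)
    verdict = "reject normality" if p < ALPHA else "cannot reject normality"
    print(f"{name}: JB = {jb:.2f}, p = {p:.4f} -> {verdict}")
```

A p-value above the chosen level does not prove the data are normal; it only means this test found no evidence against normality at that level.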

I am not a big fan of a lot of these supposed stat gurus like Taleb. Anybody can see that the data are not 100% perfect at describing a normal/Gaussian curve. (There is also the central limit theorem, that given enough data, all distributions tend to approach the normal curve!) Anyway, the normal/Gaussian is a useful construct, if you keep all the assumptions in mind (just like any model).

Dow Jones Industrials daily prices are at http://finance.yahoo.com/q/hp?s=^DJI&a=10&b=1&c=1928&d=08&e=21&f=2009&g=d. But only since October 1, 1928. That’s when it expanded to 30 stocks. Before that it was 20, or less, going back to 1896, when it was 12 stocks. Before that, back to 1884, Charles Dow averaged 9 railroads and 2 industrial companies. (This is all from http://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average#History.) But for 130 calendar years, ending no later than 2008, you need to start in 1878. So I would really like to see how this series was constructed.

In the data since 1928, I see no evidence of bimodality. So until someone can defend its existence, I’m going to call it an artifact of inconsistent data.

CountingSheep, thanks for deconstructing the Normal curve. I hadn’t even looked at that.

As for normality, yes, for a rigorous result you have to go through the full process as wkevinw says. I haven’t done this for this data, but eyeballing it, no, it doesn’t look normal. Give me a little more time to play with this and I can come up with a graph. But by now, non-normality seems well established as a general principle.

The Central Limit Theorem says that if you have a bunch of independent random variables, and add them up, then the distribution of the sum approaches the normal distribution, more so as you add more variables. But it only works if the variables you are averaging have a finite standard deviation. (Approximately. There are some distributions with infinite std dev that also approach a normal. This gets real esoteric.) You also need independence. Normally (so to speak) people have lots of different opinions which are independent. But when the financial system starts to collapse, everybody gets the same idea and their decisions are no longer independent.
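The role of independence can be illustrated with a small simulation. This is a toy sketch with uniform random numbers, not market data; the sample sizes and seed are arbitrary choices. It compares the excess kurtosis (zero for a normal distribution, about -1.2 for a uniform one) of a single draw, of a sum of 30 independent draws, and of a "sum" in which all 30 terms are the same draw, i.e. perfectly dependent.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: 0 for a normal, about -1.2 for a uniform."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(42)
N_SAMPLES = 20000   # arbitrary
N_TERMS = 30        # arbitrary

# One uniform draw per sample: clearly not normal.
single = [rng.random() for _ in range(N_SAMPLES)]

# Sum of 30 independent uniform draws: the Central Limit Theorem applies.
independent_sums = [sum(rng.random() for _ in range(N_TERMS))
                    for _ in range(N_SAMPLES)]

# All 30 terms are the same draw (perfect dependence): no convergence.
dependent_sums = [N_TERMS * rng.random() for _ in range(N_SAMPLES)]

print(f"single draw:      {excess_kurtosis(single):+.2f}")
print(f"independent sums: {excess_kurtosis(independent_sums):+.2f}")
print(f"dependent sums:   {excess_kurtosis(dependent_sums):+.2f}")
```

The independent sums come out close to zero, while the perfectly dependent sums stay as non-normal as a single draw, no matter how many terms are added.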

If the standard deviations are infinite, you can substitute Levy-stable distributions for the normal distribution, and rescue the Central Limit Theorem. But when people’s decisions are no longer independent, statistical theory really doesn’t know what to do. There are times when things are “normal” and times when everything just goes off the rails. Maybe this is what Mandelbrot’s multifractal theory tries to address, but it’s hard for me to tell because his prose is as dense as his math. See his _Fractals and Scaling in Finance_, 1997, ISBN 0-387-98363-5.

WordPress mangled my Yahoo finance link, which should extend up to “…&g=d”. Copy and paste the whole thing — if you want the data. There’s a download link on that page. It’s a 1.1 MB file, more than 20,000 trading days.