A few days ago, I got the opportunity to speak with my old teacher, Professor Kalamkar, again after a good 17 years or so (that makes me feel really old!). A Professor of Statistics, he also happened to belong to the same school as I did, albeit years apart!

Although I had been keeping myself updated with the goings-on on campus via friends, websites etc., it was especially satisfying to catch up with him and get to know, first hand, how the University had progressed by leaps and bounds. I felt happy and immensely proud.

Learning about his new blog (it's here, for those interested in Statistics), I enthusiastically started reading his articles in chronological order, only to find how much I had distanced myself from the subject I had majored in!

I did find a post that I could comment on, and as of now it awaits moderation. After submitting the comment, though, I started thinking about how much of the subject I could remember (and demonstrate) from my college days if anyone asked me that very minute.

I set about thinking of ‘Probability’ (which incidentally was the topic I had read and commented on). How would I demonstrate to anyone how to calculate a probability, say for the roll of a dice, the most basic/standard experiment in Statistics?

Now we know that there is an equal chance of rolling any number from 1 to 6 on a single dice, i.e. 1/6 or 0.17 (0.16666…, but we’ll round it to 2 decimals). But “how” do we demonstrate that the chances are 1/6, or that the probability of rolling a particular number, say 2, is 0.17?

During my college days, I remember being told something along the lines of, “if you roll a (balanced) dice, say, 6 times, one of the outcomes would be the number 2”.

Fair enough. But this could not be proven. What if all 6 rolls resulted in a 2? What if I rolled 12 times and all 12 resulted in a 2? Would I succeed if I did 18 rolls? 20? There was only one way to find out: experiment.

The closest thing to experimenting with a real dice was doing it virtually, using my favourite tool, Excel. I had decided to roll the dice enough times. How many is enough? I considered 5000 first, then 7500, and finally settled on a nice round 10000.

The best way to roll a dice in Excel is to use the function “RANDBETWEEN(1,6)” and then copy it into 10,000 cells, which took me all of 10 seconds!

Next, list the possible outcomes, i.e. the numbers 1 to 6, and count the number of times each occurred. This count is also called the ‘frequency’ (done using a simple “COUNTIF” function). Finally, divide each frequency by 10000 (i.e. the total number of rolls) to give us the estimated probability.

In my case, this was

| Number | 1 | 2 | 3 | 4 | 5 | 6 | Totals |
|---|---|---|---|---|---|---|---|
| Outcomes | 1728 | 1687 | 1655 | 1667 | 1660 | 1603 | 10000 |
| Probability | 0.1728 | 0.1687 | 0.1655 | 0.1667 | 0.1660 | 0.1603 | 1 |

Bingo! Experiment successful!

On top of this, because both the 10000 cells of actual outcomes and the summary table contain formulae, a simple F9 on the spreadsheet refreshes and recalculates these random numbers, conducting another 10000 rolls in a fraction of a second. The actual outcomes change each time, but the probabilities stay close to the 0.16 to 0.17 range, which supports the result of the experiment!
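For anyone who would rather try this outside Excel, the same experiment can be sketched in a few lines of Python. This is only an illustration, not what I actually ran: the function name `roll_dice` and the optional `seed` parameter are my own inventions, with `random.randint(1, 6)` standing in for “RANDBETWEEN(1,6)” and `Counter` standing in for “COUNTIF”.

```python
import random
from collections import Counter


def roll_dice(n_rolls, seed=None):
    """Roll a fair six-sided dice n_rolls times and return the
    relative frequency (estimated probability) of each face."""
    rng = random.Random(seed)  # seed is optional; None gives fresh rolls each run
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]  # like RANDBETWEEN(1,6)
    counts = Counter(rolls)  # like COUNTIF for each face
    return {face: counts[face] / n_rolls for face in range(1, 7)}


if __name__ == "__main__":
    probabilities = roll_dice(10_000)
    for face, p in probabilities.items():
        print(face, round(p, 4))
```

Running the script again without a seed plays the same role as pressing F9: the individual numbers change, but each estimate should stay in the neighbourhood of 1/6.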

Now I wanted to scale this up and experiment with two dice. I also remembered that such a random experiment would have a pattern to it when I plot a simple bar chart of the number of outcomes: roughly the shape of a bell curve. (Strictly speaking, the total of two dice gives a triangular shape, which gets closer to the bell curve as more dice are added.) This would allow me to demonstrate the idea behind a simple probability density function (or the normal distribution).

For the first part, I just changed the actual outcome formula to “RANDBETWEEN(2, 12)”, since the minimum total rolled would be 1 on the first dice + 1 on the other, and the maximum would be 6 + 6. The ‘frequency distribution’ for this experiment now changed to:

| Number | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Outcomes | 933 | 894 | 906 | 920 | 942 | 903 | 905 | 901 | 919 | 883 | 894 | 10000 |
| Probability | 0.0933 | 0.0894 | 0.0906 | 0.0920 | 0.0942 | 0.0903 | 0.0905 | 0.0901 | 0.0919 | 0.0883 | 0.0894 | 1.0000 |

For the second part, I had to select the actual outcomes and plot a simple bar chart, but this came out all wrong.

The graph did not represent a bell-shaped curve at all. This can’t be right, I said to myself. I thought for a bit, went for a coffee, and while having it I remembered that the rolls of the two dice are independent of each other, i.e. the roll of one dice does not affect the outcome of the other. A single “RANDBETWEEN(2, 12)” makes every total equally likely, whereas with two separate dice there are six ways to make a 7 but only one way to make a 2. So, all I had to do was change the formula to “RANDBETWEEN(1,6)+RANDBETWEEN(1,6)”. This changed the outcomes to:

| Number | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Outcomes | 269 | 520 | 854 | 1109 | 1368 | 1663 | 1423 | 1132 | 823 | 574 | 265 | 10000 |

and the graph looked familiar:
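The difference between the two formulas can also be sketched in Python, for readers without a spreadsheet to hand. Again, this is a rough illustration under my own assumptions rather than what I ran: `simulate` and `exact_two_dice` are hypothetical helper names, and the row of `#` characters is only a crude stand-in for the bar chart. The exact probabilities come from counting the ways two dice can make each total out of 36 combinations.

```python
import random
from collections import Counter


def simulate(draw, n_rolls=10_000):
    """Call draw() n_rolls times and return the counts for totals 2..12."""
    counts = Counter(draw() for _ in range(n_rolls))
    return [counts[total] for total in range(2, 13)]


def exact_two_dice():
    """Exact probability of each total 2..12 for two fair dice:
    (number of ways to make the total) / 36."""
    return [(6 - abs(total - 7)) / 36 for total in range(2, 13)]


rng = random.Random()

# One random number from 2 to 12: every total is equally likely (the mistake).
flat = simulate(lambda: rng.randint(2, 12))

# Two independent dice added together: middle totals are more likely.
summed = simulate(lambda: rng.randint(1, 6) + rng.randint(1, 6))

print("total  flat  summed  exact")
for total, f, s, p in zip(range(2, 13), flat, summed, exact_two_dice()):
    bar = "#" * (s // 50)  # one '#' per 50 outcomes, as a crude bar chart
    print(f"{total:>5} {f:>5} {s:>7}  {p:.4f}  {bar}")
```

The first column of counts should come out roughly level, while the second should peak at 7 and taper off towards 2 and 12, much like the table above.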

It was very rewarding to recollect these simple, first lessons of Statistics; indeed, I recalled charting some of these on graph paper during the practical sessions, although the sample sizes were significantly smaller!

Drawing parallels with testing, we are learning constantly: during the course of our employment, in training sessions, conferences and coaching sessions, while discussing things with peers, development team members and business analysts, whilst reading articles and blogs on related and unrelated topics, and when interacting with our friends, families and children, among many other sources.

It is becoming increasingly difficult, though not completely unmanageable, to store little nuggets of useful information and tips ‘n tricks of the trade, and to recollect them when you need them most.

A single, unrelated conversation sparked my curiosity about how much we remember of what we learn over the years, and about fetching information that lies dormant in some remote corner of the brain, all of which can help us succeed in our craft.

I regularly try to reflect on what new things I have learnt; now, recalling what I learnt ages ago goes on that list too!

Thanks for taking the time to read this. All comments, feedback and critiques welcome!