I’ve started, so I’ll finish

In true Mastermind fashion, I’ve got to say “I’ve started, so I’ll finish” this post, rather than do a wrap-up of 2011…

My fascination with data must have started when I was about 9 or 10, when I helped my father unearth a discrepancy in the bank journal. It was one of those huge leather-bound ledgers with daily entries.

A friend and I had caught a bus to my Dad’s bank to go Diwali shopping after bank hours. I remember that key staff, about 5-6 of them, including my Dad, his boss and others, were still trying to resolve a discrepancy of about Rs 1500 for a particularly high-profile customer.

With our constant requests to take us Diwali shopping (mainly to get fireworks) falling on deaf ears, I said, “Let me have a go, I’ll find it for you.”

By the time I had said this, everyone was pretty tired and wanted to give up. My Dad’s boss, who also happened to be my class teacher’s husband, ordered tea and biscuits and asked the staff to relax. He and my Dad then explained to me what I should be looking for: about 7-8 entries that totalled Rs 1500 for a specific customer.

I asked for a pen and paper, noted down the amount and the customer name, and set myself the mission of finding *all* the entries for that customer. I thought that if I presented them with everything, they could surely work it out for themselves. The appeal of this seemingly simple activity waned at the sight of the ledger.

My Dad then came to my rescue, suggested I look at the last few months, and gave me a specific date (the date they knew the customer had deposited a large amount). I used that as my starting point, going through each entry and asking what each item meant. I think I asked about 3-4 types of entries before I got the hang of how they made the entries in the ledger.

After about 25-30 minutes, I had identified that some entries had been clubbed together, whilst others had been made on what seemed like incorrect dates. I even saw an entry for 29th Feb, which struck me as odd, as February had only 28 days that year and I had been at a birthday party that day.

I excitedly called my Dad’s boss over and showed him what I had found. He was impressed, and following the ‘trend’ I had noticed, they went on to uncover all the discrepancies. I just happened to overhear, “That is not the main cashier’s initials against these entries, they are someone else’s.”

This incident made a lasting impression on me. Being able to contribute to something that benefited so many was not just satisfying but rewarding too. I got an ice cream and an extra Rs 5 from my Dad’s boss to spend on fireworks! Probably my first pay, come to think of it, and worse still, all of it went up in smoke, literally in this case!

The data fascination continued from then on: from maintaining records, data and scoring sheets for cricket teams, to studying Statistics, and eventually ending up, rather serendipitously, in Data Warehouse testing!

Data warehouse testing requires a keen eye for detail, especially when you need to investigate a data discrepancy in hundreds of millions of rows spread across a multitude of tables that span different databases, schemas, applications or data warehouse ‘layers’.

It is quite easy to get lost, frustrated and even confused when you are trying to find the discrepancy ‘needle’ in the data ‘haystack’. Over the years, especially during my data warehouse testing days, I devised my own way of succeeding at this search. It was only recently, on the Rapid Software Testing course, that I learnt that it was the ‘focussing/defocussing’ strategy I had been applying!

Very briefly, defocussing allows us to find a pattern that violates the test pattern and encourages us to look at, and investigate, multiple factors at a time. Focussing, on the other hand, allows us to zoom in on one factor at a time, which helps establish the cause of the discrepancy. Well, at least, this is an interpretation that works for me!
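To make that a little more concrete, here is a minimal Python sketch of one way the idea could be applied to a source-to-target reconciliation. It is an illustration under assumptions, not a prescribed method: the row layout (customer, load date, source system, amount), the sample data and the helpers `diff_by`, `defocus` and `focus` are all hypothetical. Defocussing compares source and target totals across every factor at once; focussing then pulls out the individual rows behind the one group that doesn't reconcile.

```python
from collections import Counter, defaultdict

# Hypothetical row layout: (customer, load_date, source_system, amount)
Row = tuple[str, str, str, int]
FACTORS = {"customer": 0, "load_date": 1, "source_system": 2}
AMOUNT = 3


def totals_by(rows: list[Row], index: int) -> dict[str, int]:
    """Sum the amount column for each distinct value of the column at `index`."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[AMOUNT]
    return totals


def diff_by(source: list[Row], target: list[Row], index: int) -> dict[str, int]:
    """Source total minus target total per group, keeping only groups that differ."""
    src, tgt = totals_by(source, index), totals_by(target, index)
    diffs = {}
    for key in sorted(set(src) | set(tgt)):
        delta = src.get(key, 0) - tgt.get(key, 0)
        if delta != 0:
            diffs[key] = delta
    return diffs


def defocus(source: list[Row], target: list[Row]) -> dict[str, dict[str, int]]:
    """Defocus: look at several factors at once to see where totals stop agreeing."""
    return {name: diff_by(source, target, idx) for name, idx in FACTORS.items()}


def focus(source: list[Row], target: list[Row], index: int, value: str) -> list[Row]:
    """Focus: zoom in on one factor value and list source rows missing from the target."""
    src = Counter(r for r in source if r[index] == value)
    tgt = Counter(r for r in target if r[index] == value)
    return sorted((src - tgt).elements())  # multiset difference keeps duplicates honest


# Hypothetical sample data: one source row never made it into the target.
source = [
    ("C001", "2011-02-27", "branch", 500),
    ("C001", "2011-02-28", "branch", 1000),
    ("C002", "2011-02-28", "online", 200),
]
target = [
    ("C001", "2011-02-27", "branch", 500),
    ("C002", "2011-02-28", "online", 200),
]

print(defocus(source, target))
print(focus(source, target, FACTORS["load_date"], "2011-02-28"))
```

In this made-up data every factor points at the same missing 1000, so defocussing alone narrows the search to customer C001 on 2011-02-28, and `focus` then returns the single source row that never arrived in the target. On real volumes, the same grouped comparisons would more likely be run as SQL `GROUP BY` queries than in Python.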

The best analogy for this is a pivot table in MS Excel, which lets you look at summary values for the categories you are analysing the data by. When you spot something that bothers you, you can just double-click the number to see the source rows behind it.
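The pivot-and-double-click behaviour can be mimicked in a few lines of Python. This is only an illustrative sketch: the ledger rows, field names and the `pivot` / `drill_through` helpers are hypothetical, and it sums a single value column rather than reproducing everything an Excel pivot table can do.

```python
from collections import defaultdict

# Hypothetical ledger rows; in practice these would come from a table or extract.
ledger = [
    {"customer": "X", "entry_type": "deposit", "amount": 1000},
    {"customer": "X", "entry_type": "deposit", "amount": 500},
    {"customer": "X", "entry_type": "withdrawal", "amount": 300},
    {"customer": "Y", "entry_type": "deposit", "amount": 200},
]


def pivot(rows, row_field, col_field, value_field):
    """Summarise: total value_field for every (row_field, col_field) cell."""
    cells = defaultdict(int)
    for row in rows:
        cells[(row[row_field], row[col_field])] += row[value_field]
    return dict(cells)


def drill_through(rows, row_field, col_field, cell):
    """Detail: return the source rows behind one summary cell (the 'double-click')."""
    row_value, col_value = cell
    return [r for r in rows if r[row_field] == row_value and r[col_field] == col_value]


summary = pivot(ledger, "customer", "entry_type", "amount")
for cell, total in sorted(summary.items()):
    print(cell, total)

# Something bothers you about X's deposits? Drill through to the entries behind it.
for row in drill_through(ledger, "customer", "entry_type", ("X", "deposit")):
    print(row)
```

The point of the design is that each summary cell keeps the keys it was grouped by, so any total can always be traced back to the rows that produced it.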

The same goes for data warehouse testing. It would be easy to zoom in on a discrepancy (a ‘drill down’ in data warehousing terms) using an OLAP cube built on top of the data marts within the data warehouse, and then use any associated ‘drill through’ reports to see the underlying data. For some reason, though, there is always an unwillingness to provide the OLAP cube before the ETL process has been tested. That is something I intend to write about in the new year.

In conclusion, this was indeed a ramble about something I started writing only a couple of days ago. I know of at least one (other) follower who reads this blog and critiques it enough to encourage me to do better!

As usual, comments, critiques, suggestions welcome!

Thanks for reading and good luck for the new year!

2 Comments

  1. Really nice story, enjoyed reading it, hope you write more on this in the New Year

  1. 1+2+3≠2+2+2 « ramblings on testing…