Another way to test? Visualise!

One of the challenges in testing a data warehouse is pin-pointing the source of discrepancy (I prefer calling an inconsistency in the expected and actual value as a ‘discrepancy’ rather than a bug, more on that some other time…).

Generally emotions run high in many teams when someone whispers (or cries out) “bug” amidst a team of developers.

Personally, I like them; Hey, I’ve been one. I especially admire their tenacity – specifically in the data warehousing domain – to carefully extract, ruthlessly junk, elegantly transform and delivered the polished data into the datamart. Its’ a tough job being a data warehouse developer.

So, I treat them with great respect. I engage with them and spend time debating why some ‘discrepancy’ should or should not be classified as a ‘bug’. But for this debate, I prepare rather rigorously; run the test at least twice to ensure I’m not the one making a fool of myself – check the logic, SQLs and most importantly, any rounding I may require to do. I then produce a worked example, following each data item from source to target (via all layers it traverses, if feasible), showing them why it was not transformed/calculated as expected.

If they agree with my finding, it is logged as a bug, else I get to know of an undocumented business rule that requires me to got back to the BA for confirmation and subsequently to my SQL scripts, make change and repeat the process all over again.

This process helps me understand their code behaviour, and in turn, helps them know the source of the problem and therefore allows them to deliver a fix quickly.

I see this as a win-win situation; a rather good symbiotic relationship!

Today was such a day; found myself running a script, and there wasnt much of a discrepancy as such but I did observe a rather strange pattern to the data. Because I had been looking at a date range with counts against each month, I noticed the data was split across two dates within a month – some being reported against start of the month, while others against the end of the month.

It would have taken me a good half a day to individually go through each possible combination of the columns (dimensions) to check for the one causing it; There had to be a better, quicker way to identify.

My default way of quickly identifying issues is to do this visually by getting data into Excel, analyse them using pivot tables, conditional formatting etc to quickly identify the problem areas. This time it was just not good enough adopting this technique.

I’d noticed that the data appeared twice a month, increasing gradually over a period of time so I emailed the developers about this and ask if they knew of any special rule that required us to have different reporting dates each month and left for lunch.

By the time I returned back from lunch, the developers had dug around and replied back they’d noticed it before but hadn’t known of any rule or why this was the case;  I’d noticed the pattern earlier so thought it would be interesting to see how the pattern looked when charted. This is my way of defocusing, I play with data in Excel a lot.

I separated the two sets of data – one with counts for the start of the month, other set that reported for end of the month.

I charted the data and saw this (ok, this isnt exactly what it was, but an extremely good representation of it using similar date ranges and google spreadsheets!):

This was almost a ‘Eureka’ moment for me; Almost because I wasnt 100% sure I’d found it. I emailed this back to the developer and asked, isnt this the month when the system was migrated onto the new platform? Immediately, the reply asked me to check the system the two data set represented. It wasnt surprising that they indeed were from two separate systems, with the old system one suddenly ending at March 2010.

I’d spent about 15 minutes and got the root cause without running different SQL scripts to get to the bottom of this discrepancy. Was glad that there was no bug, but we learnt something and it now formed a part of our ‘knowledgebase’!

Visualisations are all around us; we are being bombarded constantly with them in our daily lives – and they are fantastic tools too; a picture is worth and all that!

Testing doesn’t necessarily have to be technical, it can be visual too. I happen to work in an area where data is of great importance *before* it is visualised; but there’s no restriction on visualising the data any way possible to ensure it is visualisable! Afterall, every little helps!

Finally, some interesting reads on the visual aids for testers I’ve found are:

  1. blink tests that James Bach demonstrated during the Rapid Software Testing course and his article on it located here;
  2. another article by Ajay Balamurugudas on it – it’s here.
  3. an excellent presentation by David Evans at the TestBash in Cambridge last month; covered brilliantly by  Markus Gärtner here.

I have and continue to use visualisation a lot when testing; I’d be very interested to know what your experience has been.

Thanks for reading. All feedback, critique and comments welcome!