1+2+3≠2+2+2

I love numbers; I have been good at remembering numbers, be it times table, telephone numbers, even bug numbers when I was in charge of a team that supported a significant number of clients from Australia to the US.

I still remember the name of the scripts (and screens) of an ERP product  I used to work on. The modules ACP for Accounts Payble, ACR for Receivable, SLS for Sales, PUR for Purchase, GLD for General Ledger and each one had progressive screens numbering from 1101, 1102 etc. The most dreadful and feared of them was ACR1112 (I’m sure people who’ve worked on BAAN/MK might recognise them – from over a decade ago!) that played the most important role in currency conversion when we implemented EURO for many clients.  I even remember the rounding problems while handling 4-6 digit currency exchange rates that created currency differences that led to mismatches in many GL batches being posted! We decided to get around the differences first by multiplying by 100, doing the calculations and then dividing by 100 when saving/posting it!

It is important to remember small things like these in our jobs. They go a long way in helping us deal with slightest of discrepancies we might notice.  I started noticing, or rather, got interested in finding out discrepancy since a very young age (related post here) and continue to do so even today; so much so that it almost embarks almost upon being an obsession, i.e. to have the numbers to be perfect.

This may be my number bias (might consult James Lyndsay), but I use that to my advantage; and I think most testers do.

In my current job, I’ve been working on reconciling numbers based on how the business try to see the data. The only difference is that with hundreds of thousands of transactions that run into millions of pounds, the business user is not really concerned with a discrepancy of a few pounds;  I am.

The Analysts keep telling me that if the numbers are ‘close’, it is acceptable, which I can see a point for them, but the tester in me just does not agree with not matching the numbers or matching just about.

It is essential, especially in the data warehouse testing, to test for the correctness of the data to the lowest granularity possible/available. As they say, “Look after your pennies, and pounds will look after themselves”.

My argument (also) is, If I have tools that allow me to compare vast amount of numbers (data) to the desired precision in an ‘acceptable’ timescale, should we not use it to ensure we have the precise result? Yes, there is a certain degrees of freedom one can take – simplest being rounding a 6 decimal number to 3 decimal places (or two if the amount is monetary). As a student of Statistics I’ve always known that every number is important.

In the world of finance, these decimals are ever so more important. Rounding errors across hundreds of thousands of transactions gives you potential discrepancy of millions of pounds! The more complex the rules, the worse it can get.

In my current work, I have to split an amount received into different components. Each component is then allocated a percentage share based on the type of transaction. Each share goes in a ‘pot’, which in turn may have further sub-pots that are dependent on eligible number of days.

On the face of it, the rule might look simple (and it has been made simple for obvious reasons), but if the expectation is to allocate say a hypothetical figure of £100*, then the allocation could be shown as:

£10 – split as £6, £4**

£20 – split as £14 and £6

£30 – split as £15 and £15

£40 – split as £10, £10, £10, £10

I would expect to see £6, £4, £14, £6, £15, £15, £10, £10, £10 and £10 and not 10 transactions of £10 each.

Multiply this by thousands of transactions generated daily over a number of years, each resulting in unknown number of splits and add to this the complexity of having the fields storing the values up to 6 (or even 8) decimals place, multiple currencies that have fluctuating exchange rates, with the dreaded rounding errors a big possibility and you are in hours of head scratching/pulling exercise (wonder if I’ve lost my hair that way?) while you script, debug, test, fix, debug, test your code before the script can be useful in testing it on a large scale.

It is a software development life cycle on its own!

Its only when your script is being run for real that you get a chance to fine-tune to the exceptions that the production data invariably throws.

Some of the scripts I have put in have been unchanged for quite some time (albeit they underwent at least half a dozen iterations), but as the time goes by, there is always some data that will slip past; some that will force me to tweak the script; there’s always an exception to the business rules that you’re never made aware of…

Whilst this may not (necessarily) point at the shortcomings of the scripting, they definitely give an insight into the data quality issues – at times highlighting the shortcomings of the application itself; at times user behaviour  (on how they use the system and/or enter data) ; thus contributing – continually – in lessons learnt activity.

It is only when you see at least (and I say this very very conservatively) a fair few runs over a period of time that you would see every transaction, its splits and the numbers that can easily add up to the higher level of analysis that gives you a feel of how small numbers end up skewing the numbers to a ridiculous proportions.

I remember one of my first tests ended up sending shivers in my spine when I found a difference of approximately £1.2 million between expected and actual values! This happened because for a start I’d rounded a field to incorrect number of decimals and more importantly, because of the incorrect splits! Even when you know how important it is, it was definitely a lesson learnt -

When testing a data warehouse, 1+2+3≠2+2+2!

If you’ve managed to reach this far, thanks for reading; I’d appreciate all your comments, feedback, critiques.  Thanks!

* Actual amounts are never so rounded
** Neither are the actual splits when you consider percentages and division operations on a monetary amount.
About these ads
Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 157 other followers

%d bloggers like this: