There is that embarrassing moment when the CFO calls you into their office after a senior management meeting and tells you that the CEO just gave them a grilling over “unreliable numbers”. Why? Because your reports show eight regions when everyone knows there are six.
A seemingly innocent typographical error may go unnoticed in the operational system but may have a big impact in the reporting system. Here are a couple of simple examples:
- Example 1 - Customer Address: In Australia there are 6 states and 2 territories. It is therefore common for companies to analyse sales, profitability and other data by region (state or territory). In theory this should be a simple exercise; however, after a senior member of the organisation prepares a quick pivot table or ad-hoc report (using any reporting tool), they soon discover that their results may return a dozen or more groupings. So how does this situation come about? Simple! The computer cannot easily tell that Queensland, QLD and “Qeensland” are meant to be the same thing.
- Example 2 - Unique Customer Attributes: Sometimes companies need to uniquely identify certain entities such as customers. Assuming that the system forces all customers to have a national security number or a driver’s license number, can you assume that the numbers are unique (just because the users say they should be)?
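Both problems are easy to surface by profiling the data before anyone builds a report on it. The sketch below (Python, with made-up sample rows — not from any real system) counts the distinct spellings in a "state" column and flags duplicate licence numbers:

```python
from collections import Counter

# Hypothetical sample rows, as they might come out of an operational system.
customers = [
    {"name": "Acme Pty Ltd", "state": "Queensland", "licence": "A123"},
    {"name": "Bruce Smith",  "state": "QLD",        "licence": "B456"},
    {"name": "Jan Jones",    "state": "Qeensland",  "licence": "A123"},  # typo + reused number
    {"name": "Li Wei",       "state": "NSW",        "licence": "C789"},
]

# Example 1: three spellings of one state inflate the number of "regions".
state_counts = Counter(c["state"] for c in customers)
print(sorted(state_counts))  # Queensland, QLD and Qeensland all count separately

# Example 2: a column the users *say* is unique may not be.
licence_counts = Counter(c["licence"] for c in customers)
duplicates = [num for num, n in licence_counts.items() if n > 1]
print(duplicates)  # ['A123']
```

Any reporting tool grouping on this data would show four "regions" where the business expects two, which is exactly how the pivot-table surprise above happens.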
So, how do you solve this problem? The traditional approach of building many validation rules into the ETL is time consuming and flawed, because a programmer has to be involved every time a change is required. The problems with the traditional approach are:
- The business owner of the data does not have the tools or capability to correct the issues in the data warehouse.
- The ETL programmer who does have the tools and the capability does not have the best understanding of the business data to accurately and consistently make corrections to operational data (also they often don't have the authority to make such changes).
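A concrete sketch makes the gap clearer. In a hand-coded ETL the correction logic typically lives in the code itself (the mapping below is illustrative, not from any real system), so every newly discovered misspelling means another change request to the programmer:

```python
# Corrections hard-coded into the ETL: only a programmer can add the next one.
STATE_FIXES = {
    "QLD": "Queensland",
    "Qeensland": "Queensland",
}

def clean_state(value: str) -> str:
    """Apply the hard-coded corrections; anything unrecognised passes through."""
    return STATE_FIXES.get(value, value)

print(clean_state("Qeensland"))   # Queensland
print(clean_state("Qeenslandd"))  # a brand-new typo slips straight through
```

The business owner who spots "Qeenslandd" in a report cannot fix it themselves, and the programmer who can fix it is not the one who knows which value is correct.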
This is where SQL Server Data Quality Services (DQS) comes in. Here is what I like about it:
- It is easy to learn. The tool set is intuitive and wizard driven. Of the full SQL Server toolset it is perhaps the one that needs the least technical knowledge, so almost anyone can pick it up.
- The focus is on knowing your data rather than technically driving the tool. More time is spent on analysing and understanding the data than learning the tool. It is designed with end users in mind.
- It comes with SSIS components that make it easy to automate and incorporate into existing ETL processes.
- It is included in the SQL Server license, so no additional cost justification is required.
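The key idea behind the tool is that the business owner maintains the knowledge base of domain values, with each known synonym mapped to a "leading value", and the ETL simply applies it. A minimal imitation of that separation (the CSV format here is my own invention — in DQS the mapping lives in the knowledge base, not a file) looks like this:

```python
import csv
import io

# A business-owned mapping: synonym -> leading value. Imagine this file is
# maintained by the data owner, not the ETL programmer.
mapping_file = io.StringIO(
    "synonym,leading_value\n"
    "QLD,Queensland\n"
    "Qeensland,Queensland\n"
    "N.S.W.,New South Wales\n"
)
leading = {row["synonym"]: row["leading_value"] for row in csv.DictReader(mapping_file)}

def standardise(value: str) -> str:
    """Replace a known synonym with its leading value; pass unknowns through
    so they can be reviewed and added to the mapping by the business owner."""
    return leading.get(value, value)

print(standardise("QLD"))       # Queensland
print(standardise("Victoria"))  # Victoria - already a leading value
```

Because the mapping is data rather than code, adding the next misspelling is a data-maintenance task for the business, not a release for IT — which is the separation of duties DQS is built around.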
There are a few things that can be improved but they do seem minor in the scheme of things:
- If you are using a 64-bit version of Microsoft Office then you can only use the Office 97–2003 Excel file format (.xls) and not the 2010 or 2013 (.xlsx) format. This seemed a bit odd to me given that I was using SQL Server 2012 SP1 and the shiny new Office 2013.
- As far as I can see, you can only connect to databases on the same instance as the Data Quality Services databases. I might need to investigate this more, so if anyone else has found a way of doing it please let me know.
- With the Data Quality Project you cannot go back to previous steps after you move to the export screen. The back button is there but it is disabled. Having said that, creating a new project is easy so you can always do that.
Overall, I am happy with the tool and the benefits far outweigh the aspects that I think need improving. In fact, I would go as far as to say that the items I have listed as possible improvements are all trivial.
Anyway, the real test is to see if I can get the business to actually take ownership of cleaning their data. It sounds good in principle, but the proof is in the pudding. I will let you know how I get on with it.
Need a consultant to help you identify and define your data quality needs? Call us on 1300-799-321.
www.techlogix.com.au