Outsourcing Data Cleansing: Top 5 Best Practices
It may seem like a scary proposition to send your data outside your organization to be improved. Who do you send it to? What do you provide them? How much is it going to cost? How do you know if the data is correct? How can you make it successful? At Convergence Data we have learned to bridge the gap between outsourced data cleansing operations and US-based companies and make it a successful experience. Here are some basic practices we have learned along the way.
Data Outsourcing Companies – India is a popular option given skill set and pricing
- Define the Data Model with the End Requirements in Mind: One of the first critical steps is to define the requirements that need to be met before starting any work. This will prevent unnecessary rework down the road. As part of defining the requirements, you must agree on a data model that describes the data to collect and its format. The data model must be approved before any work begins, and it cannot change once the project starts. It’s also important to distinguish key attributes from other values in your data model, because the key attributes have to be complete and correct. Finally, the source of the data needs to be agreed on. If it’s customer documents, those documents need to be approved and delivered before the project starts.
- Plan How to Validate Results: It is essential to have a data validation plan in place before you receive the cleansed data. Tools that analyze the quality and completeness of a data delivery before sign-off are always helpful, but at a minimum some sort of validation process is needed. Optimally, your data cleansing partner should complete the initial validation to minimize human error and provide reporting that shows the validation steps taken. Items to validate include ensuring that the data is in the correct format, the part is classified in the proper node of the tree, the units of measure are properly identified, and the attributes are driven by the properties of the data model. Data analytics tools will streamline validation, and quite often they let you spot inconsistencies that don’t make sense. They give you another view into the data, a more graphical one.
- Break Work into Batches and Sample Data Sets: To help manage the delivery process, we have learned to work in samples and batches. Samples of data should be provided to make the work more manageable and to confirm accuracy early on, which avoids significant rework. You will need to make sure you have the resources available to review these samples in a timely manner; otherwise things can really slow down. We also recommend that the final data be delivered in monthly batches rather than in one big final delivery. This will streamline the final data load into the system of choice.
- Regular Project Communication is Key: It is critical that a communication process is defined and followed throughout the project lifecycle. A clear project team and project owner must be defined, and the project team must agree on deliverables, schedules, data model approvals, source documentation, etc. Weekly project meetings are imperative to review these areas, resolve any open issues, and make sure everyone is on the same page.
- Find a Good Outsource Partner: What are the characteristics of a good data outsourcing partner? You will want one that is focused on delivering quality and is willing to follow a process agreed to by your project team. Be sure to keep an open mind, since the proposed process may be different from the way you work today. Workflow software is a big part of a more streamlined process that everyone can easily follow, and the software should accommodate the way outsource partners work most efficiently. You will want a partner that is willing to warrant their work: if there are problems in the data they deliver, they need to be willing to address them in a timely manner. We recommend finding out why an issue occurred and ensuring that mitigation is in place to prevent a recurrence. You will also want to review a partner’s security policy and practices to make sure your data is handled in a secure manner. If possible, we recommend visiting the facility first to confirm this.
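To make the first two practices more concrete, here is a minimal Python sketch of an agreed data model with key attributes flagged, plus a check that a sample or batch conforms to it. Everything in it is an assumption made for illustration: the attribute names, the part number format, the category nodes, and the unit-of-measure list are hypothetical and would come from your own approved data model.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    key: bool = False                    # key attributes must be complete and correct
    pattern: Optional[str] = None        # regex the whole value must match, if set
    allowed: Optional[frozenset] = None  # closed list of valid values, if set


# Hypothetical data model for a parts catalog (illustrative values only).
DATA_MODEL = [
    Attribute("part_number", key=True, pattern=r"[A-Z]{2}-\d{4}"),
    Attribute("category", key=True,
              allowed=frozenset({"Fasteners/Bolts", "Fasteners/Nuts"})),
    Attribute("length", pattern=r"\d+(\.\d+)?"),
    Attribute("length_uom", allowed=frozenset({"mm", "in"})),
]


def validate_record(record, model=DATA_MODEL):
    """Return a list of readable issues for one record (a dict of strings)."""
    issues = []
    for attr in model:
        value = (record.get(attr.name) or "").strip()
        if not value:
            # Empty values are only a problem for key attributes.
            if attr.key:
                issues.append(f"{attr.name}: missing key attribute")
            continue
        if attr.pattern and not re.fullmatch(attr.pattern, value):
            issues.append(f"{attr.name}: bad format {value!r}")
        if attr.allowed is not None and value not in attr.allowed:
            issues.append(f"{attr.name}: unexpected value {value!r}")
    # Attributes that are not part of the approved data model.
    for name in sorted(set(record) - {a.name for a in model}):
        issues.append(f"{name}: not in data model")
    return issues


def validate_batch(records, model=DATA_MODEL):
    """Map row index to issues, only for rows that have at least one issue."""
    report = {}
    for index, record in enumerate(records):
        issues = validate_record(record, model)
        if issues:
            report[index] = issues
    return report
```

Running `validate_batch` on an early sample, and again on each monthly batch, gives both sides the same written record of what was checked and what failed.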
Conclusions: Data outsourcing can save a lot of time and money if it’s done following these basic principles. It needs to be a partnership between your outsourcing partners and your company. If you are unsure or can’t decide who is the best partner, provide the candidates with a sample data set and compare the results before committing to a longer-term project. Most companies would be willing to do this work for free to win your business.