2 Approaches to Cleansing Data

By Ryan Donovan | Aug 19, 2015 10:10:00 AM

1. So you realize your data is a mess

Do you have good data? Can your engineers and designers quickly find parts to complete their design? Is your part data complete enough to determine if duplicates exist? Are your descriptions consistent and complete enough to sufficiently describe the form and functional specifications of the part? If you answered no, then perhaps it is time to cleanse your data.

After realizing you need to cleanse your data, the question becomes how to clean the data.

There are two general approaches to consider. You can cleanse the data in an all-out-effort or in an as-you-go approach. Both have advantages and disadvantages to consider.

2. As-you-go

The as-you-go process entails adding steps to existing processes to review part data to ensure it meets data quality requirements. If the data does not meet the quality requirements, process steps to normalize, cleanse, and enhance the data need to be added.
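As a rough illustration of such a review step, the sketch below checks a part record against a list of required fields and routes it either onward or to cleansing. The field names and the `route_part` helper are hypothetical and only stand in for whatever quality requirements your organization defines.

```python
# Hypothetical required fields; replace with your own data quality requirements.
REQUIRED_FIELDS = ("part_number", "description", "category")


def missing_fields(part: dict) -> list:
    """Return the required fields that are absent or blank in a part record."""
    gaps = []
    for field in REQUIRED_FIELDS:
        value = part.get(field)
        if value is None or not str(value).strip():
            gaps.append(field)
    return gaps


def route_part(part: dict) -> tuple:
    """Send a part to cleansing if it fails the check, otherwise let it pass."""
    gaps = missing_fields(part)
    if gaps:
        return ("cleanse", gaps)
    return ("pass", [])


if __name__ == "__main__":
    incomplete = {"part_number": "P-100", "description": "  "}
    print(route_part(incomplete))
```

Because the check runs only on parts moving through an existing process, it touches active parts and nothing else.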

Advantages to an as-you-go process include:

  1. There is no need to prioritize parts to cleanse, since parts being processed in the system are the priority.
  2. Out-of-date components are not processed by the system so no resources are used to cleanse them.
  3. The upfront budget is less than an all-out-effort budget, making it an easier sell to management.

An as-you-go effort does require a budget, since additional steps need to be added to the current product development process and resources are required to complete the new processes. Software to process the part data is often a required expense. The as-you-go budget is distributed across a longer period and, depending on the implementation, the cost can be distributed across several departments.

Additional steps required by as-you-go include:

  1. A data quality check to make sure the data is complete and good
  2. A data cleansing process to normalize and enrich the data
  3. A New Part Introduction (NPI) process to review requests for new components, ensure the data is complete, and make sure redundant or duplicate components are not added to the system.
  4. If duplicates already exist in the system, a process to identify duplicates and choose preferred parts.
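The normalization and duplicate checks in the steps above might look something like the following minimal sketch. It assumes duplicates can be found by comparing normalized descriptions, which is a simplification. The abbreviation table, record layout, and function names are hypothetical.

```python
import re

# Hypothetical abbreviation table used to normalize descriptions.
ABBREVIATIONS = {"RES": "RESISTOR", "CAP": "CAPACITOR", "SCR": "SCREW"}


def normalize_description(text: str) -> str:
    """Uppercase, split on whitespace and punctuation, and expand abbreviations."""
    tokens = re.split(r"[\s,;]+", text.strip().upper())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens if token)


def find_duplicates(existing_parts: list, new_part: dict) -> list:
    """Return part numbers whose normalized description matches the new part."""
    key = normalize_description(new_part["description"])
    return [
        part["part_number"]
        for part in existing_parts
        if normalize_description(part["description"]) == key
    ]


if __name__ == "__main__":
    catalog = [
        {"part_number": "P-100", "description": "Resistor 10k ohm"},
        {"part_number": "P-200", "description": "Cap 1uF"},
    ]
    request = {"description": "res, 10k  ohm"}
    print(find_duplicates(catalog, request))
```

An NPI review could run a check like this before a new part number is issued. Real part data usually needs attribute-level comparison as well as description matching.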

As data quality improves and fewer parts require cleansing, the implemented steps can evolve, but should remain in place to ensure ongoing governance of the data.

3. Project – All-at-once

An all-out-effort cleanses all part data in an organized project, effectively creating a separate project to cleanse part data. Since it is organized as a separate project, it typically requires a dedicated team, resources, and software tools and consequently has its own budget.

Since all of the data is to be cleansed, it is a common practice to prioritize the part data to be cleansed. Typically, the data is cleansed in groups and each group receives a priority based on the company’s requirements.

Grouping data allows for economies of scale and is more efficient at processing large amounts of part data. Grouping large amounts of data facilitates working with 3rd parties to cleanse and enrich data, which can speed up the process and be cost effective.
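To make the grouping and prioritization concrete, here is a small sketch that buckets parts by group and orders the buckets by a company-assigned priority. The `group` field and the priority numbers are assumptions for illustration. How groups are defined and ranked depends on the company's requirements.

```python
from collections import defaultdict


def build_batches(parts: list, priority_by_group: dict) -> list:
    """Group parts and order the groups by priority (lower number goes first).

    Groups without an assigned priority are placed last, in alphabetical order.
    """
    groups = defaultdict(list)
    for part in parts:
        groups[part["group"]].append(part["part_number"])
    ordered = sorted(
        groups,
        key=lambda name: (priority_by_group.get(name, float("inf")), name),
    )
    return [(name, groups[name]) for name in ordered]


if __name__ == "__main__":
    parts = [
        {"part_number": "P-1", "group": "Fasteners"},
        {"part_number": "P-2", "group": "Resistors"},
        {"part_number": "P-3", "group": "Fasteners"},
        {"part_number": "P-4", "group": "Gaskets"},
    ]
    # Hypothetical priorities: resistors first, then fasteners.
    for name, batch in build_batches(parts, {"Resistors": 1, "Fasteners": 2}):
        print(name, batch)
```

Each batch can then be cleansed in-house or handed to a third party as a unit.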

At least initially, the processes to cleanse, ensure quality, and identify duplicates are contained within the project and do not affect company processes.

As the cleansed data is introduced to the company, the company should create data governance processes to ensure the data is used and maintained. An NPI process is required to make sure new part data is of acceptable quality and that duplicate components are not created.

To take full advantage of the cleansed data, users need access to the data and also need to be educated about how to use it. They need to understand he

4. Change

In both cases, the organization must understand the need for clean data and commit to it. Both cases require changes in processes, which will affect people's responsibilities and will inevitably frustrate people leery of change. To avoid backlash, management needs to be entirely supportive of the project and engage regularly with project members and users to overcome any setbacks and maintain a positive attitude.

As the data is provided to the company, users need to be given tools to access the data and encouraged to use it. Users need to be educated about the benefits of using the data, how to access it, and the governance processes that maintain data quality.

5. Deciding

Which data cleansing approach is best? Consider some of the differences listed in the table.

              
| As-you-go | All-at-once |
| --- | --- |
| Initially requires a lower budget, which can sometimes be distributed across different departments. This makes it an easier sell to management. | Requires a detailed budget, typically higher than the initial budget for as-you-go. Management needs to be sold and will likely need a thorough ROI analysis, which can be difficult to determine. |
| Cleanses and enriches data active in the system, so priority is set automatically, and non-active parts are not cleansed. | Effective when there are large groups of data, so economies of scale can be leveraged. Typically good for large companies with many divisions and companies which have grown through acquisition. |
| Requires additional steps be added to product development processes, but these steps should be incorporated to ensure ongoing quality data. | Additional processes are not initially required, but should be added once users start to use the data. Fewer people require changes to their processes at the beginning, but when changes are added they can benefit immediately from having access to good data. |
| A fit when there are no planned major changes to IT systems. | When migrating to new IT systems, it makes sense to have all the data cleansed and enriched so change and user training can be incorporated with the migration. |





Topics: Cleansing Data, Classification, Duplicate Parts
