Data Cleaning Part 2 – Data Health

25th September 2016

In part 1 of Data Cleaning we looked at deceased and goneaway records; in part 2 we’ll be looking at customer suppression files, data de-duplication, word processing and PAF correction.

Customer Suppressions

If you are mailing people who are customers of your business you will likely also have a list of people who have requested not to be contacted; this list is known as a stop file.

Data processing bureaus use specialised matching software, similar to that used for deceased and goneaway records, to check whether an individual’s details appear on a stop file; any records that do appear are removed from the mailing file.

Just as with deceased and goneaway matching, you can specify the match level to run against the stop file, though depending on the mailing pack it might be best to use the widest match level so you can be as sure as possible that you won’t accidentally mail somebody who has refused contact.
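To illustrate how a match level changes what gets suppressed, here is a minimal Python sketch. The level names ("individual", "family", "household"), the field names and the `suppress` function are all assumptions made for this example; bureau software will use its own levels and far more sophisticated matching.

```python
def normalise(value):
    """Lower-case a value and keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


# Hypothetical match levels, from tightest to widest.
MATCH_KEYS = {
    "individual": ("forename", "surname", "address1", "postcode"),
    "family": ("surname", "address1", "postcode"),
    "household": ("address1", "postcode"),
}


def match_key(record, level):
    """Build the comparison key for a record at the given match level."""
    return tuple(normalise(record.get(field, "")) for field in MATCH_KEYS[level])


def suppress(mailing, stop_file, level="household"):
    """Split the mailing list into (kept, removed) using the stop file."""
    stop_keys = {match_key(record, level) for record in stop_file}
    kept, removed = [], []
    for record in mailing:
        if match_key(record, level) in stop_keys:
            removed.append(record)
        else:
            kept.append(record)
    return kept, removed
```

With this sketch, a stop file entry for one person at an address removes only that person at the "individual" level, but removes everyone at the same address at the "household" level, which is why the widest level is the safest choice when avoiding unwanted contact matters most.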

Data De-duplication

Data de-duplication is the process of removing duplicate records from your mailing list so you don’t waste money mailing Mr J Johnson twice or even three times.

You may end up with duplicates in your mailing data if you have combined data from more than one data source, or you have manually captured data onto your database or file multiple times.

As with stop file matching, you can choose from several levels of de-duplication depending on your mailing criteria.
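As a rough illustration of de-duplication levels, the Python sketch below keeps the first record seen for each key. The `dedupe` function, the two level names and the `name` and `postcode` fields are assumptions for this example, and using the postcode alone to stand in for the address is a deliberate simplification.

```python
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def name_key(full_name):
    """Reduce a name such as 'Mr J Johnson' to (initial, surname)."""
    words = [w.strip(".,").lower() for w in full_name.split()]
    words = [w for w in words if w and w not in TITLES]
    if not words:
        return ("", "")
    initial = words[0][0] if len(words) > 1 else ""
    return (initial, words[-1])


def postcode_key(postcode):
    """Remove spaces and upper-case the postcode."""
    return "".join(postcode.split()).upper()


def dedupe(records, level="individual"):
    """Keep the first record for each key; level is 'individual' or 'family'."""
    seen = set()
    unique = []
    for record in records:
        initial, surname = name_key(record["name"])
        postcode = postcode_key(record["postcode"])
        if level == "individual":
            key = (initial, surname, postcode)
        else:  # "family": ignore the initial, match on surname only
            key = (surname, postcode)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

At the "individual" level "Mr J Johnson" and "John Johnson" at the same postcode collapse into one record while "Mrs S Johnson" is kept; at the "family" level all three collapse into one.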

For an in-depth look at de-duplication see Data De-duplication and Fuzzy Matching.

Word Processing

Depending on the source of the mailing data, the names and addresses themselves may contain words which mark a record as unsuitable for mailing. These words may identify an individual as deceased, or indicate that they have moved house; there may also be particular names or words which identify a mailing record as a hoax.

These words may include: “Deceased”, “Passed Away”, “Dead”, “Goneaway”, “Moved House”, or “Not at this address”; with hoax words including such names as “Donald Duck”, “Prince Charles”, or other more vulgar words.

Data processing bureaus use custom lists of words to identify records such as these and can remove them from your mailing data file.
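A word list check of this kind can be sketched in a few lines of Python. The phrases below are the examples given above; the grouping into reasons, the `flag_record` function and the record layout are assumptions for illustration, not a description of any bureau's actual list.

```python
import re

# Example phrases taken from the text above, grouped by reason.
FLAG_PHRASES = {
    "deceased": ["deceased", "passed away", "dead"],
    "goneaway": ["goneaway", "moved house", "not at this address"],
    "hoax": ["donald duck", "prince charles"],
}


def _compile(phrases):
    """Build one case-insensitive, whole-word pattern for a list of phrases."""
    parts = [r"\s+".join(re.escape(word) for word in p.split()) for p in phrases]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


PATTERNS = {reason: _compile(phrases) for reason, phrases in FLAG_PHRASES.items()}


def flag_record(record):
    """Return the first reason whose phrases appear in any field, else None."""
    for reason, pattern in PATTERNS.items():
        if any(pattern.search(str(value)) for value in record.values()):
            return reason
    return None
```

Matching whole words stops a surname like "Deadman" from being flagged, but a simple list like this would still catch a genuine address such as "Prince Charles Avenue", so flagged records are probably best reviewed rather than deleted blindly.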

PAF Correction

PAF stands for Postal Address File and it is a collection of valid UK postal addresses that was created and is maintained by Royal Mail.

Mailing data can be corrected using PAF correction software. Various parts of the original address, such as the postcode, town, or premise, are used to try to find a match in the PAF collection; if a matching address is found, the original address can be replaced with the PAF valid version, resulting in a cleaner data file.
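The lookup-and-replace idea can be sketched in Python as follows. This is only a toy: `SAMPLE_PAF` is an invented, in-memory stand-in with a made-up address, and matching on postcode plus premise number is an assumption; real PAF correction software works against the licensed Royal Mail data and uses much richer matching.

```python
import re

# Invented stand-in for PAF, keyed by (postcode without spaces, premise number).
SAMPLE_PAF = {
    ("ZZ11AA", "1"): {
        "address1": "1 High Street",
        "town": "Anytown",
        "postcode": "ZZ1 1AA",
    },
}


def lookup_key(record):
    """Build a (postcode, premise) key from the parts of the original address."""
    postcode = "".join(record.get("postcode", "").split()).upper()
    match = re.match(r"\s*(\d+\w?)", record.get("address1", ""))
    premise = match.group(1).upper() if match else ""
    return (postcode, premise)


def paf_correct(record, paf=SAMPLE_PAF):
    """Return (record, matched); matched records take the stand-in PAF address."""
    found = paf.get(lookup_key(record))
    if found is None:
        return dict(record), False
    corrected = dict(record)
    corrected.update(found)  # overwrite only the address fields
    return corrected, True
```

A record with the address "1 high st", town "anytwn" and postcode "zz11aa" would come back with the tidy stand-in address and its name untouched, while an address with no match is returned unchanged so it can be reviewed separately.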

For an in-depth look at PAF see What is PAF and how does it work?

Data Healthchecking

Some companies, including DMP, offer data health reports; to create a data health report, your data file is run through the various data cleaning services and the results are supplied in a report so you can decide whether you need to purchase any of the data cleaning services.
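In spirit, a health report is a set of counts: how many records each cleaning check would affect. The Python sketch below shows that idea only; the `health_report` function and the shape of the `checks` argument are assumptions for illustration and do not describe how any particular company builds its reports.

```python
from collections import Counter


def health_report(records, checks):
    """Count how many records each check flags.

    checks maps a label to a function that takes a record and returns
    True when the record would be affected by that cleaning service.
    """
    counts = Counter()
    for record in records:
        for label, check in checks.items():
            if check(record):
                counts[label] += 1
    total = len(records)
    return {
        label: {
            "count": counts[label],
            "percent": round(100 * counts[label] / total, 1) if total else 0.0,
        }
        for label in checks
    }
```

Each check could be as simple as "postcode is missing" or as involved as a stop file or de-duplication match; the report just shows the size of each problem so you can decide which services are worth paying for.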


Data cleaning is an important part of sending direct mail. Mailing unnecessary records costs you more money than simply removing them from your data, and may cost you time if you have to deal with complaints from individuals who have requested not to be mailed. If you are planning a direct mail campaign, remember to request a data healthcheck and see how your data could be improved for mailing.