Data cleaning

Data cleaning is a continuous process that requires corrective actions throughout the data lifecycle. Data cleaning is the process of detecting and correcting corrupt or inaccurate records from a dataset. Data cleaning involves identifying, replacing, modifying, or deleting incomplete, incorrect, inaccurate, inconsistent, irrelevant, and improperly formatted data. Typically, the process involves updating, correcting, standardising, and de-duplicating records to create a single view of the data, even if they are stored in multiple disparate systems.

- CASRAI Dictionary

The most important thing to realise about data cleaning is that it is not just a one-time activity. Cleaning can (and should!) occur at every stage of the research data lifecycle.
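
To make the definition above more concrete, here is a minimal sketch of three of the steps it names: standardising, correcting, and de-duplicating records. The records, field names, and the `COUNTRY_ALIASES` mapping are hypothetical and chosen only for illustration; real cleaning rules depend on the dataset and should be documented alongside it.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"name": "  Ada Lovelace ", "email": "ADA@Example.org", "country": "UK"},
    {"name": "Ada Lovelace", "email": "ada@example.org", "country": "United Kingdom"},
    {"name": "Alan Turing", "email": "", "country": "uk"},
]

# Assumed mapping used to standardise country labels.
COUNTRY_ALIASES = {"uk": "United Kingdom", "united kingdom": "United Kingdom"}


def clean_record(record):
    """Return a standardised copy of one record (the input is not modified)."""
    name = " ".join(record["name"].split())          # collapse stray whitespace
    email = record["email"].strip().lower() or None  # empty string becomes missing
    country_key = record["country"].strip().lower()
    country = COUNTRY_ALIASES.get(country_key, record["country"].strip())
    return {"name": name, "email": email, "country": country}


def clean_records(rows):
    """Standardise every record, then drop exact duplicates, keeping the first."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = clean_record(row)
        key = (fixed["name"], fixed["email"], fixed["country"])
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


cleaned = clean_records(records)
for row in cleaned:
    print(row)
```

Because the functions return new records rather than editing the originals, the same routine can be re-run whenever new data arrives, which fits the point that cleaning is not a one-time activity.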
