Title
A Technique For The Quantitative Measure Of Data Cleanliness
Keywords
Data cleanliness; Data quality; Dirty data
Abstract
With the amount of data that is collected, viewed, processed, and stored today, techniques for analyzing the accuracy of data are extremely important. Since we cannot improve what we cannot measure, a tangible quantitative measure of data quality is a necessity. This paper focuses on a data-cleanliness algorithm, which makes use of the 'Levenshtein distance', to measure the data quality of a criminal records database. Actual law enforcement name records were used for this research. The results help us determine the extent of dirtiness in the data and highlight the different types of dirty data. We then show how measuring data quality not only helps in setting up guidelines for the data clean-up process, but can also be used as a metric for cross-comparing like databases. © 2008 IEEE.
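The abstract names the Levenshtein distance but does not spell out the cleanliness algorithm itself, so the following is only a minimal sketch of how an edit-distance check over name records might look, not the authors' method. The helper names (`near_duplicate_pairs`, `dirtiness_ratio`), the `max_distance` threshold, the upper-casing step, and the sample names are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance
    (insertions, deletions, substitutions each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def near_duplicate_pairs(names, max_distance=1):
    """Hypothetical helper: pairs of distinct normalized names that lie
    within max_distance edits of each other (possible misspellings)."""
    unique = sorted({n.strip().upper() for n in names})
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            if levenshtein(first, second) <= max_distance:
                pairs.append((first, second))
    return pairs


def dirtiness_ratio(names, max_distance=1):
    """Hypothetical metric (an assumption, not taken from the paper):
    fraction of unique names that have at least one near-duplicate."""
    unique = {n.strip().upper() for n in names}
    if not unique:
        return 0.0
    flagged = set()
    for first, second in near_duplicate_pairs(names, max_distance):
        flagged.update((first, second))
    return len(flagged) / len(unique)


# Made-up sample records, for illustration only.
sample = ["Smith", "Smyth", "smith ", "Jones"]
print(near_duplicate_pairs(sample))
print(dirtiness_ratio(sample))
```

The pairwise comparison is quadratic in the number of unique names, which is fine for a small illustration; a real records database would likely need blocking or indexing before comparing names.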
Publication Date
1-1-2008
Publication Title
2008 IEEE International Conference on Cybernetics and Intelligent Systems, CIS 2008
Pages
1258-1263
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/ICCIS.2008.4670930
Copyright Status
Unknown
Scopus ID
57749169543 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/57749169543
STARS Citation
Wakchaure, Abhijit; Eaglin, Ronald; and Motlagh, Bahman, "A Technique For The Quantitative Measure Of Data Cleanliness" (2008). Scopus Export 2000s. 10952.
https://stars.library.ucf.edu/scopus2000/10952