Title

A Technique For The Quantitative Measure Of Data Cleanliness

Keywords

Data cleanliness; Data quality; Dirty data

Abstract

With the amount of data that is collected, viewed, processed, and stored today, techniques for analyzing the accuracy of data are extremely important. Since we cannot improve what we cannot measure, a tangible quantitative measure of data quality is a necessity. This paper focuses on a data-cleanliness algorithm, which makes use of the Levenshtein distance, to measure the data quality of a criminal records database. Actual law enforcement name records were used for this research. The results help us determine the extent of dirtiness in the data and also highlight the different types of dirty data. We then show how measuring data quality not only helps in setting guidelines for the data clean-up process, but can also be used as a metric for cross-comparing like databases. © 2008 IEEE.
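The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the Levenshtein distance it builds on, together with one plausible way such a distance might be used to flag suspicious name records. The function `near_duplicates`, its `max_distance` threshold, and the upper-casing and whitespace-stripping step are illustrative assumptions, not the paper's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def near_duplicates(names, max_distance=1):
    """Hypothetical helper: return pairs of distinct names that lie within
    max_distance edits of each other after simple normalization."""
    unique = sorted({n.strip().upper() for n in names})
    pairs = []
    for i, x in enumerate(unique):
        for y in unique[i + 1:]:
            if levenshtein(x, y) <= max_distance:
                pairs.append((x, y))
    return pairs
```

Under these assumptions, `near_duplicates(["Smith", "Smyth", "Jones"])` would pair the two spellings that differ by one character, which is the kind of candidate dirty record a cleanliness measure might count. The pairwise comparison is quadratic in the number of unique names, so a real records database would likely need blocking or indexing first.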

Publication Date

1-1-2008

Publication Title

2008 IEEE International Conference on Cybernetics and Intelligent Systems, CIS 2008

Pages

1258-1263

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/ICCIS.2008.4670930

Scopus ID

57749169543 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/57749169543
