Data Quality
Data quality is a multi-dimensional concept that covers accuracy, completeness, consistency, timeliness, uniqueness, and validity. What counts as high-quality data is highly contextual: it depends on the needs of the organization.
- Accuracy: the data correctly represents the real world.
- Completeness: all required fields and records are present.
- Consistency: the same data agrees across all systems and databases.
- Timeliness: the data is recent enough that the delay between its generation and its use does not affect accuracy.
- Uniqueness: each record appears only once, with no duplicates.
- Validity: the data follows the correct syntax and format.
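Several of these dimensions can be measured directly on a dataset. The sketch below is a minimal illustration, not a standard implementation: the sample records, the field names, the email pattern, and the 30-day freshness threshold are all assumptions made for the example. Accuracy and consistency are left out because they need an external reference or a second system to compare against.

```python
import re
from datetime import date

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2024, 1, 12)},
    {"id": 2, "email": "bob@example", "updated": date(2023, 6, 1)},
    {"id": 4, "email": "cy@example.com", "updated": date(2024, 1, 11)},
]

# Deliberately simple format check (an assumption, not a full email validator).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)


def uniqueness(rows, field):
    """Share of distinct values in a field that is meant to be a key."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of non-missing values that match the expected format."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)


report = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(email)": validity(records, "email", EMAIL_RE),
    "timeliness(updated)": timeliness(records, "updated", date(2024, 1, 15), 30),
}
for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Each function returns a score between 0 and 1, so the thresholds that count as "good enough" can be set per organization, in line with the point that quality is contextual.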
Fitness for Purpose: choose data that fits your task; data that does not match the task will be useless for training.
The aim should be to create enterprise-wide standards and governance. One example is a quality data metric (QDI): the percentage of records that are accurate and available across all systems of the enterprise.
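The QDI definition above does not spell out a formula, so the following is only one possible reading, sketched under stated assumptions: a record counts as "accurate and available" when it is present in every system and equal to a trusted reference value. The function name `quality_data_index`, the reference dataset, and the system names are hypothetical.

```python
def quality_data_index(reference, systems):
    """Percentage of reference records found, unchanged, in every system.

    reference: dict mapping record key -> trusted value (assumed to exist).
    systems:   dict mapping system name -> dict of record key -> stored value.
    """
    if not reference:
        return 0.0
    good = sum(
        1
        for key, expected in reference.items()
        # A missing key yields None, which never equals the expected value.
        if all(system.get(key) == expected for system in systems.values())
    )
    return 100.0 * good / len(reference)


# Hypothetical data: four reference records checked against two systems.
reference = {
    1: "ana@example.com",
    2: "bob@example.com",
    3: "cy@example.com",
    4: "di@example.com",
}
systems = {
    "crm": {
        1: "ana@example.com",
        2: "bob@example.com",
        3: "cy@example.com",
        4: "di@example.com",
    },
    "billing": {
        1: "ana@example.com",
        2: "bob@old.example.com",  # stale value: not accurate
        3: "cy@example.com",
        # record 4 is missing: not available
    },
}

qdi = quality_data_index(reference, systems)
print(f"QDI: {qdi:.1f}%")
```

In this example records 1 and 3 pass in both systems, while record 2 is stale and record 4 is missing in one system, so the sketch reports a QDI of 50%.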