🧵 Poor data quality rarely announces itself loudly.

Are you safe, or can you spot some warning signs in our guide? 👇

https://hedda.io/when-data-turns-against-you-spotting-and-handling-early-signs-of-data-quality-issues/

#DataQuality #DigitalTransformation #DataStrategy

@heddaio I'd be interested to know whether your platform supports organisations in assessing their data accuracy, i.e. generating a statistically significant sample of data to check against the entities it represents, and then recording the outcomes of those checks.
Over the years, I have found this question to be a good one for testing out data profiling tool vendors.
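For readers wondering what a "statistically significant sample" looks like in practice, here is a minimal sketch, not tied to any vendor's tooling, using Cochran's formula with a finite population correction; the confidence level, margin of error, and worst-case proportion are illustrative assumptions.

```python
import math

def accuracy_check_sample_size(population: int, z: float = 1.96,
                               margin: float = 0.05, p: float = 0.5) -> int:
    """Number of rows to manually verify against the real-world entity.

    z      -- z-score for the desired confidence level (1.96 ~ 95%)
    margin -- acceptable margin of error (0.05 = +/-5 percentage points)
    p      -- assumed error proportion; 0.5 is the conservative worst case
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # Cochran's formula
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)

# e.g. a 10,000-row customer table needs ~370 rows checked by hand
print(accuracy_check_sample_size(10_000))
```

Note how the required sample plateaus around 385 rows for very large tables, which is what makes manual accuracy spot-checks feasible at all.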
@jschwa1 As we do not interact with the data sources themselves, we do not provide data sampling. In that case, in Spark (Databricks, Fabric) for example, you would sample before passing the dataframe to HEDDA.IO, e.g. `df.sample(0.1)` to load roughly 10% of the data.
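As a rough illustration of that pattern, here in pandas rather than Spark, with a hypothetical `validate_with_hedda` function standing in for the actual HEDDA.IO call (which is not shown in this thread):

```python
import pandas as pd

def validate_with_hedda(df: pd.DataFrame) -> int:
    """Hypothetical placeholder for handing a dataframe to HEDDA.IO."""
    return len(df)  # pretend we validated this many rows

# toy dataframe standing in for a large source table
df = pd.DataFrame({"customer_id": range(100), "country": ["DE"] * 100})

# draw ~10% before validation; mirrors Spark's df.sample(0.1)
sample = df.sample(frac=0.1, random_state=42)
rows_checked = validate_with_hedda(sample)
```

The sampling happens on the caller's side; only the reduced dataframe ever reaches the validation step.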
@heddaio Data quality is typically understood through six 'dimensions': Validity, Completeness, Uniqueness, Consistency, Accuracy and Timeliness. The first four can be assessed programmatically, but the last two cannot.
Just because data exists, is valid and is plausible does not mean it is correct! The data about me could state that I have a full head of hair; analysing the data in isolation is unlikely to spot a problem, but checking against my profile image shows the data is inaccurate.
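To make the distinction concrete, here is a toy sketch of the four programmatically checkable dimensions; the field names and rules are invented for illustration only.

```python
import re

records = [
    {"id": 1, "email": "a@example.com", "dob": "1980-01-01"},
    {"id": 2, "email": "not-an-email",  "dob": "1975-06-15"},
    {"id": 2, "email": "b@example.com", "dob": None},
]

# Validity: does each email match a (simplistic) pattern?
valid = [r for r in records
         if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"])]

# Completeness: are all fields populated?
complete = [r for r in records if all(v is not None for v in r.values())]

# Uniqueness: are ids distinct?
ids = [r["id"] for r in records]
unique_ids = len(ids) == len(set(ids))

# Consistency: do all populated dates share one format (ISO here)?
consistent = all(r["dob"] is None or re.fullmatch(r"\d{4}-\d{2}-\d{2}", r["dob"])
                 for r in records)

# Accuracy and timeliness, by contrast, need an external reference
# (e.g. the profile photo above) and cannot be derived from the data alone.
```

Each of the four checks runs against the data in isolation, which is exactly why accuracy, needing a comparison with the real-world entity, falls outside this list.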
@heddaio The quality of historic transaction data is also a challenge: data recording a historic business transaction could have been correct against the data requirements of its time, but without an independent record you cannot assess its accuracy. And if the data requirements have since changed, what does that mean when assessing the quality of the data about that transaction?
@jschwa1 Timeliness is generally covered as well, including staleness and freshness detection, and it is also possible to branch rules based on data age. With SRP (Single Row Processing), correctness is ensured as close as possible to the point of data creation. By not only validating but also cleansing data, and through our trigger architecture, we can significantly improve quality across all connected systems and maintain a high standard throughout.