How to Properly Test Your Data Models
Where and what to test in your tables and columns

Testing is one of the most important practices in the engineering world. In software engineering, testing prevents bugs from being carried through to production. In data engineering, testing ensures data is captured and moved around properly. In analytics engineering, it ensures your data is high-quality and ready to be used by business teams.
Testing data models catches problems in your data before they manifest in downstream models. Without testing, issues can go undetected for days, weeks, or even months. I’ve run into scenarios where no data was being collected on the backend of a website because of expired tokens, leading to two weeks’ worth of lost data. I’ve also had tables go out of date after a schema or data type change at the source, leaving stakeholders working with stale data. Testing is one of the few ways to ensure you are being proactive about your data rather than reactive.
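To make those two failure modes concrete, here is a minimal sketch in plain Python of the kind of checks that would have flagged them early: a freshness check for data that has stopped arriving, and a schema check for a column whose type changed at the source. The function names, column names, and 24-hour threshold are all hypothetical and chosen only for illustration. In practice you would likely express these checks in whatever testing tool your stack already uses.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest_loaded_at, now, max_age=timedelta(hours=24)):
    """Return True if the newest row arrived within max_age of now."""
    return now - latest_loaded_at <= max_age


def schema_drift(expected, actual):
    """Return {column: (expected_type, actual_type)} for every expected
    column that is missing from actual or has a different type."""
    return {
        col: (exp_type, actual.get(col))
        for col, exp_type in expected.items()
        if actual.get(col) != exp_type
    }


# Hypothetical scenario: the last event landed two weeks ago.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
last_event = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_fresh(last_event, now))  # False

# Hypothetical scenario: order_id changed type at the source.
expected = {"order_id": "integer", "ordered_at": "timestamp"}
actual = {"order_id": "varchar", "ordered_at": "timestamp"}
print(schema_drift(expected, actual))  # {'order_id': ('integer', 'varchar')}
```

Run on a schedule, a failing freshness check would surface the expired-token problem within a day instead of two weeks, and a non-empty drift result would flag the type change before stakeholders query the table.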
As a key data governance practice, testing can be used to uphold the standards put in place to control access to data and mask customer PII. This helps keep the data in your data warehouse secure and accurate.
Testing at the source and model levels is key to covering all of your bases in terms of data quality. It’s also…