This talk is a story, with examples in Python and PySpark, about testing ETL pipelines efficiently. I won’t try to convince you that you need unit tests or automated tests; that’s up to you. But if you already have unit tests for your ETL pipelines, or want them, it helps to make sure you aren’t testing more than you need to.
I’ll describe how a practical, non-pyramid-shaped heuristic helps me efficiently cover edge cases and catch unexpected bugs by ensuring I test only the code needed for the feature I’m building.