Every system fails eventually, and your Python app is no different. But while failure is inevitable, downtime isn't. Retries are one of the key reliability tools in distributed systems, so let's look at how they can do more harm than good, and at how to retry safely and easily in Python, using the stamina and httpx packages as examples: both with a decorator and with a smart loop + context manager combo!
🔗 LINKS
► https://stamina.hynek.me/
► https://tenacity.readthedocs.io/
► https://www.structlog.org/
► https://en.wikipedia.org/wiki/Thundering_herd_problem
► https://en.wikipedia.org/wiki/Denial-of-service_attack#Distributed_DoS
► https://www.bbc.com/news/business-64652835
► https://www.w3.org/Provider/Style/URI
► https://sre.google/books/
► https://how.complexsystems.fail
► https://en.wikipedia.org/wiki/Three_Mile_Island_accident
► https://longform.asmartbear.com/scale-rare/
► https://www.youtube.com/watch?v=YVuqeXyvOUc
🤓 ME ELSEWHERE
🏡: https://hynek.me/
🐘: https://mastodon.social/@hynek/
🦋: https://bsky.app/profile/hynek.me
🅇: https://twitter.com/hynek
🧵: https://www.threads.net/@the_hynek
✉️ Newsletter: https://buttondown.email/hynek
❤️ Support my work: https://hynek.me/say-thanks/
🙏 CREDITS
Outro Music: @RPLKTR / https://rplktr.com/
Other music and material licensed from Envato.
📖 CHAPTERS
00:00 Failures are a fact of life.
01:03 Who's talking?
01:54 Why retries?
02:46 Dangers of retries.
03:23 Backoffs
04:38 Timeouts
05:00 Jitters ☕️
05:20 A brief glimpse at advanced problems (feat. elementary school math)
06:44 Python: Tenacity
08:02 Introducing stamina
08:43 stamina: decorator
10:35 stamina: loop + context manager
11:03 stamina: RetryingCaller / AsyncRetryingCaller
12:02 stamina: testing helpers
13:04 stamina: instrumentation
14:29 Summary & takeaways
15:18 Thank you & goodbye! ❤️