In this talk, Daniel Lenton will explore how the AI deployment stack has become increasingly fragmented in recent years, with new serving technologies, compression techniques, compiler workflows, and hardware all entering the scene at unprecedented speed. The result is incompatible tools, excessive friction, and lost performance. Daniel will propose a way forward through unification at the level of the machine learning frameworks, and explain the significant runtime improvements that can be achieved simply by converting code between different ML frameworks, each of which has different runtime characteristics on different hardware. He will then discuss a more holistic approach to unification, looking at tools such as MLIR and the role it plays in combining low-level compiler toolchains such as TensorRT, OpenAI Triton, OpenXLA, and OpenVINO, before finally exploring how the AI stack might continue to evolve in the years to come.