Build Apps with Gemini 2.0 API with Spatial Understanding (Code Walkthrough)

Yankee Maharjan 3,317 4 months ago


Gemini 2.0 Flash is a substantial upgrade over Gemini 1.5 that improves many of the model's multimodal capabilities. One of the highlights of the upgrade is spatial understanding: the model can identify and reason about the objects it sees in an image or video. In this tutorial, we use the spatial capabilities of Gemini 2.0 to build a multi-purpose application.

---
🔥 Resources
Code: https://github.com/yankeexe/llm-gemini-2.0-spatial-demo

---
⚡️ Follow me
- GitHub: https://github.com/yankeexe
- LinkedIn: https://www.linkedin.com/in/yankeemaharjan
- Twitter (X): https://x.com/yankexe
- Website: https://yankee.dev

---
🎞️ Chapters
0:00 Intro
0:11 Demo-1
0:51 Demo-2
1:31 Demo-3
1:48 Demo-4
3:12 Code: Application Frontend
8:00 Code: Image Resize
11:02 Code: Calling LLM
16:18 Code: LLM Bounding Box Co-ordinates
18:06 Code: Plot Bounding Box
24:59 Completed App Demo
25:52 Outro
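
As a rough illustration of the "LLM Bounding Box Co-ordinates" and "Plot Bounding Box" chapters, here is a minimal sketch of turning a model-returned box into pixel coordinates for drawing. It is not taken from the linked repository. It assumes the model returns boxes as `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 grid, which is how Gemini's bounding box output is commonly described; the function name `to_pixel_box` and the `scale` parameter are hypothetical.

```python
def to_pixel_box(box_2d, image_width, image_height, scale=1000):
    """Convert a normalized box to pixel coordinates.

    Assumption: box_2d is [ymin, xmin, ymax, xmax] on a 0..scale grid.
    Returns (left, top, right, bottom) in pixels, the order most
    drawing libraries expect for a rectangle.
    """
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return left, top, right, bottom


# Example: a box covering the middle of a 640x480 image.
pixel_box = to_pixel_box([100, 250, 500, 750], image_width=640, image_height=480)
print(pixel_box)
```

The resulting tuple can be passed to a drawing call such as Pillow's `ImageDraw.rectangle`. If the image was resized before being sent to the model, use the dimensions of the image you intend to draw on, since the normalized coordinates are relative to the image rather than tied to a specific resolution.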
