Gemini 2.0 Flash is a substantial upgrade over Gemini 1.5, with greatly improved multimodal capabilities.
One of the highlights of the upgrade is the model's spatial understanding: it can identify and reason about the things it sees in an image or video.
In this tutorial, we will use the spatial capabilities of Gemini 2.0 to build a multi-purpose application.
---
🔥 *Resources*
Code: https://github.com/yankeexe/llm-gemini-2.0-spatial-demo
---
⚡️ *Follow me*
- GitHub: https://github.com/yankeexe
- LinkedIn: https://www.linkedin.com/in/yankeemaharjan
- Twitter (X): https://x.com/yankexe
- Website: https://yankee.dev
---
🎞️ *Chapters*
0:00 Intro
0:11 Demo-1
0:51 Demo-2
1:31 Demo-3
1:48 Demo-4
3:12 Code: Application Frontend
8:00 Code: Image Resize
11:02 Code: Calling LLM
16:18 Code: LLM Bounding Box Co-ordinates
18:06 Code: Plot Bounding Box
24:59 Completed App Demo
25:52 Outro