
Video Frame Interpolation, Video Restoration & Multi-Shot Video Understanding | Multimodal Weekly 77

TwelveLabs · 121 views · 3 weeks ago

In the 77th session of Multimodal Weekly, we had three exciting presentations on video frame interpolation, video restoration, and multi-shot video understanding.

✅ Zujin Guo presented Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for Video Frame Interpolation (a short illustrative sketch follows the links below).
- Connect with Zujin: https://gseancdat.github.io/
- GIMM-VFI: https://gseancdat.github.io/projects/GIMMVFI
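
To give a feel for the core idea, here is a minimal, hypothetical sketch of implicit motion modeling for VFI: a small coordinate network predicts optical flow at an arbitrary continuous timestep t, which can then be used to warp the two input frames. This is not the authors' implementation; the `MotionINR` module and `query_flow` helper are illustrative names, and the real GIMM conditions on motion latents learned from the frame pair.

```python
import torch
import torch.nn as nn

class MotionINR(nn.Module):
    """Illustrative coordinate network: maps a normalized (x, y, t)
    coordinate plus a per-pixel motion latent to a 2-D flow vector,
    so flow can be queried at any continuous timestep."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (u, v) flow at the queried coordinate
        )

    def forward(self, coords, latent):
        # coords: (N, 3) normalized (x, y, t); latent: (N, latent_dim)
        return self.mlp(torch.cat([coords, latent], dim=-1))

def query_flow(model, latent_map, t, H, W):
    """Evaluate the implicit motion field on a full H x W grid at time t."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys, torch.full_like(xs, t)], -1).view(-1, 3)
    latent = latent_map.permute(1, 2, 0).reshape(-1, latent_map.shape[0])
    return model(coords, latent).view(H, W, 2).permute(2, 0, 1)  # (2, H, W)

# Usage: query flow at the midpoint t=0.5 between two frames.
model = MotionINR()
flow_mid = query_flow(model, torch.randn(64, 32, 48), t=0.5, H=32, W=48)
print(flow_mid.shape)  # torch.Size([2, 32, 48])
```

Because the network takes t as a continuous input, the same model can interpolate at any intermediate time, not just the midpoint; that is the appeal of the implicit formulation over fixed-timestep flow estimators.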

✅ Kamran Janjua presented Turtle, a method that learns a truncated causal history model for efficient, high-performing video restoration (a toy sketch of the history idea follows the links below).
- Connect with Kamran: https://kjanjua26.github.io/
- Turtle: https://kjanjua26.github.io/turtle/
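
As a rough intuition for the truncated causal history idea (and only that; this toy class is our assumption, not Turtle's architecture), a restorer can condition each incoming frame on a fixed-size buffer of the k most recently restored frames, so per-frame cost stays bounded on streaming video:

```python
import collections
import torch
import torch.nn as nn

class TruncatedHistoryRestorer(nn.Module):
    """Toy sketch: restore each frame conditioned on a truncated causal
    history, i.e. features of the k most recent restored frames only."""
    def __init__(self, k=4, ch=3, feat=32):
        super().__init__()
        self.k = k
        self.encode = nn.Conv2d(ch, feat, 3, padding=1)
        self.fuse = nn.Conv2d(feat * (k + 1), feat, 1)
        self.decode = nn.Conv2d(feat, ch, 3, padding=1)
        self.history = collections.deque(maxlen=k)  # truncation happens here

    def forward(self, frame):
        f = self.encode(frame)
        hist = list(self.history)
        while len(hist) < self.k:       # cold start: pad with current features
            hist.append(f.detach())
        fused = torch.relu(self.fuse(torch.cat([f] + hist, dim=1)))
        restored = self.decode(fused) + frame  # residual restoration
        self.history.append(self.encode(restored).detach())  # oldest drops out
        return restored

# Usage on a short synthetic stream of noisy frames.
restorer = TruncatedHistoryRestorer()
for _ in range(8):
    out = restorer(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

The deque's `maxlen` enforces the truncation: memory and compute do not grow with video length, which is the efficiency argument behind a truncated rather than full causal history.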

✅ Mingfei Han presented Shot2Story, a new multi-shot video understanding benchmark dataset with detailed shot-level captions, comprehensive video summaries, and question-answering pairs (a hypothetical record layout is sketched after the links below).
- Connect with Mingfei: https://mingfei.info/
- Shot2Story: https://mingfei.info/shot2story/
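
To make the annotation types concrete, here is a hypothetical record layout for one video. The field names are our own illustration, not Shot2Story's actual schema; see the project page above for the real format.

```python
from dataclasses import dataclass, field

@dataclass
class ShotAnnotation:
    start_frame: int
    end_frame: int
    visual_caption: str     # what is visible in the shot
    narration_caption: str  # what is spoken over the shot

@dataclass
class Shot2StoryRecord:
    video_id: str
    shots: list              # list of ShotAnnotation, one per detected shot
    summary: str             # whole-video summary stitched across shots
    qa_pairs: list = field(default_factory=list)  # (question, answer) tuples

# Example record with a single annotated shot.
record = Shot2StoryRecord(
    video_id="example_000",
    shots=[ShotAnnotation(0, 120, "A chef dices onions on a wooden board.",
                          "The host explains the first step of the recipe.")],
    summary="A cooking tutorial that walks through a recipe shot by shot.",
    qa_pairs=[("What is the chef cutting?", "Onions.")],
)
print(len(record.shots), record.qa_pairs[0])
```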

Timestamps:
00:07 Introduction
03:28 Zujin starts
03:47 Video Frame Interpolation
04:05 Method - Preliminary
04:58 Motivation
06:03 Method - GIMM
08:36 Method - GIMM-VFI
09:36 Motion Modeling Evaluation
11:43 Ablation Studies
13:08 Interpolation Evaluation
13:57 Gallery
14:19 Perceptually Enhanced GIMM-VFI and Qualitative Improvement
14:54 Find GIMM-VFI on GitHub
15:15 Q&A with Zujin
19:55 Kamran starts
21:07 Learning, Processing Streaming Videos and Histories
23:00 Truncated Causal History Model
29:24 Is the Causal History Model necessary?
30:40 Two Views of Turtle
32:15 Distinct Features
33:22 Some Selected Results
36:30 Q&A with Kamran
48:30 Mingfei starts
48:45 Video clip - What does the video convey?
49:10 Video clip - Understand the multi-shot video clip
50:40 Shot2Story
52:30 Data distribution of Shot2Story (visual captions, narration captions, and multi-shot video QA pairs)
55:34 Human-involved and rectified text annotations
57:57 Baseline
59:10 Benchmark performance in video shot captioning, multi-shot video summarization, and multi-shot video question answering tasks
01:01:54 Zero-shot VQA with video summaries generated by the model
01:03:40 Q&A with Mingfei

Join the Multimodal Minds community to receive an invite for future webinars: https://discord.gg/CzeUNYr5Bt
