Source code
AI Model: https://github.com/Andreaswt/ai-video-sentiment-model
API SaaS: https://github.com/Andreaswt/ai-video-sentiment-saas
Discord & More: https://andreastrolle.com
Hi 🤙 In this video, you'll learn how to train and deploy a multimodal AI model from scratch using PyTorch. The model will accept a video as its input and predict its sentiment and emotion. When training the model, you'll build features like text, video, and audio encoding, multimodal fusion, and emotion and sentiment classification. After training and deploying the model, you'll build a SaaS around your trained model, where users can run inference on their videos through your API. You'll set up invocation of the deployed model with SageMaker Endpoints and manage users' monthly quotas. The SaaS will be built with technologies such as Next.js, React, Tailwind, and Auth.js, and is based on the T3 Stack. You'll be able to build along with me from start to finish.
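The SaaS calls the deployed model through a SageMaker endpoint. A minimal sketch of that invocation using boto3 — the endpoint name and the JSON payload shape here are hypothetical placeholders, not the actual schema used in the video:

```python
import json

def build_invoke_args(endpoint_name: str, s3_key: str) -> dict:
    """Build the kwargs for sagemaker-runtime's invoke_endpoint.

    The {"video_path": ...} payload shape is an assumption for
    illustration; match it to whatever your inference script expects.
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"video_path": s3_key}),
    }

# Actual call (requires AWS credentials and a live endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     **build_invoke_args("my-sentiment-endpoint", "videos/clip.mp4"))
# result = json.loads(response["Body"].read())
```

Keeping the argument construction in a small pure function makes it easy to unit-test the request shape without touching AWS.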
Excalidraw drawing + Model files (with and without class imbalance fix)
https://drive.google.com/drive/folders/1f5tOlIixDUeYtzzIdctQRb_-qllzAMQd?usp=sharing
Dataset
MELD: https://affective-meld.github.io/
Features
🎥 Video sentiment analysis
📺 Video frame extraction
🎙️ Audio feature extraction
📝 Text embedding with BERT
🔗 Multimodal fusion
📊 Emotion and sentiment classification
🚀 Model training and evaluation
📈 TensorBoard logging
🚀 AWS S3 for video storage
🤖 AWS SageMaker endpoint integration
🔐 User authentication with Auth.js
🔑 API key management
📊 Usage quota tracking
📈 Real-time analysis results
🎨 Modern UI with Tailwind CSS
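Several of these features meet in the fusion step: each modality is encoded separately, projected to a shared size, concatenated, and passed to the emotion and sentiment heads. A minimal PyTorch sketch of that idea — the feature dimensions and layer shapes are illustrative assumptions, not the video's actual architecture; the class counts follow MELD (7 emotions, 3 sentiments):

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Toy late-fusion model: project each modality, concat, classify."""

    def __init__(self, text_dim=768, video_dim=512, audio_dim=128, hidden=256):
        super().__init__()
        # Per-modality projections into a shared hidden size (dims assumed)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.video_proj = nn.Linear(video_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        # Fuse the concatenated modality features
        self.fusion = nn.Sequential(nn.Linear(hidden * 3, hidden), nn.ReLU())
        # Two heads: MELD has 7 emotion and 3 sentiment classes
        self.emotion_head = nn.Linear(hidden, 7)
        self.sentiment_head = nn.Linear(hidden, 3)

    def forward(self, text_feat, video_feat, audio_feat):
        fused = self.fusion(torch.cat([
            torch.relu(self.text_proj(text_feat)),
            torch.relu(self.video_proj(video_feat)),
            torch.relu(self.audio_proj(audio_feat)),
        ], dim=-1))
        return self.emotion_head(fused), self.sentiment_head(fused)
```

The real model in the video adds proper text, video, and audio encoders (e.g. BERT for text) in front of the fusion layer; this sketch only shows how their outputs are combined.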
💲 Costs + How to follow along for free
One full training job run costs ~15 USD. Keeping the deployed endpoint up costs ~1.5 USD per hour of uptime. S3 is really cheap. IAM roles, users, etc. are free.
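To put the endpoint price in perspective, a quick back-of-the-envelope estimate using the rates above (exact AWS pricing varies by instance type and region):

```python
HOURLY_ENDPOINT_COST = 1.50  # ~USD per hour of endpoint uptime (from above)
TRAINING_JOB_COST = 15.00    # ~USD per full training run (from above)

def monthly_endpoint_cost(hours_per_day: float, days: int = 30) -> float:
    """Estimate the monthly endpoint bill for a given daily uptime."""
    return HOURLY_ENDPOINT_COST * hours_per_day * days

# Left running 24/7, the endpoint alone is 1.5 * 24 * 30 = 1080 USD/month —
# far more than a training run — so delete it whenever you're not using it.
always_on = monthly_endpoint_cost(24)
```
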
If you don't want to spend money:
- Don't create the S3 bucket; then you also won't need the EC2 instance for downloading the dataset
- Don't start a training job; instead, download my provided model from Google Drive to play around with locally
- Don't deploy the endpoint; when building the SaaS part, use the dummy data I write in the video before calling the actual endpoint
- You can look into using AWS free-tier instances, so you can play around with AWS
- You can of course still follow the video, learn the concepts, and code along
📖 Chapters
00:00:00 Demo
00:02:14 Project initialization
00:06:38 What we’ll build
00:19:01 Training theory
00:42:42 Fitting
00:47:45 Representing data in ML
00:50:00 Our model
01:13:31 Extracting dataset
01:18:03 Dataset class architecture
01:25:36 Dataset class implementation
02:27:13 Model architecture
02:44:21 Model implementation
03:41:54 Logging with TensorBoard
04:06:25 Counting model parameters
04:12:24 Train script implementation
04:35:51 FFMPEG installation on instance
04:45:15 SageMaker training job creation script
04:50:04 AWS infrastructure
05:11:02 Downloading dataset to S3 with EC2
05:22:40 Creating training jobs
05:35:36 Class weights for class imbalances
05:55:20 Checking TensorBoard logs
06:01:18 Inference script
06:29:00 Local inference
06:33:00 Comparing with state-of-the-art models
06:35:55 Deploying endpoint
06:56:10 IAM user for endpoint invocation
07:00:28 Initializing Next.js project
07:06:19 Auth
07:58:00 Dashboard setup
08:03:17 Database schema
08:14:55 Docs part of dashboard
08:41:30 Endpoint for S3 signed url
08:54:14 Endpoint for inference
09:04:02 Invoke endpoint
09:08:39 API demo in dashboard
09:56:38 Deploying endpoint
09:57:44 End-to-end testing and debugging
10:05:02 Successful E2E example
10:05:32 Fixing up docs
10:08:23 Timeout issue
10:11:06 Closing notes