Build an AI Document (PDF, DOC, XML) Processing Pipeline for RAG | Docling, OCR, Chunking, Images

Venelin Valkov 4,586 views 3 weeks ago


Full-text tutorial with source code (requires MLExpert Pro): https://www.mlexpert.io/v2-bootcamp/document-processing-for-ai

Step-by-step tutorial on building a fully local AI document processing pipeline. Convert PDFs to Markdown, perform OCR, annotate images with VLMs, apply LLM-based semantic chunking, and add context to each chunk. Get your documents ready for RAG and AI models.
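To make the chunking and context steps concrete, here is a minimal sketch in plain Python. It is not the code from the video: it assumes the document has already been converted to Markdown, and it swaps the LLM-based semantic chunking for a simple heading-based split so it runs without any model. The names `Chunk`, `split_by_heading`, `add_context`, and the sample text are all hypothetical.

```python
import re
from dataclasses import dataclass

# Matches Markdown ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    heading: str  # heading the text appeared under ("" if none)
    text: str     # body text of the section


def split_by_heading(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading section.

    Stand-in for LLM semantic chunking (an assumption, not the video's method).
    """
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading starts a new section.
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for heading, lines in sections:
        text = "\n".join(lines).strip()
        if text:  # skip sections with no body text
            chunks.append(Chunk(heading=heading, text=text))
    return chunks


def add_context(chunks: list[Chunk], document_title: str) -> list[str]:
    """Prefix each chunk with the document title and its section heading."""
    return [
        f"Document: {document_title}\nSection: {c.heading or 'n/a'}\n\n{c.text}"
        for c in chunks
    ]


# Hypothetical sample input, not taken from the PDF used in the video.
sample = """# Results
Revenue grew year over year.

## Outlook
Guidance will be shared next quarter.
"""

enriched = add_context(split_by_heading(sample), "Sample report")
for item in enriched:
    print(item)
    print("---")
```

Prefixing each chunk with where it came from is one simple way to keep a chunk understandable when it is retrieved on its own. An LLM-written summary of the surrounding document could take the place of the fixed prefix.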

Docling: https://docling-project.github.io/docling/
Chunking evaluation: https://research.trychroma.com/evaluating-chunking
PDF document: https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2025

AI Bootcamp: https://www.mlexpert.io/
LinkedIn: https://www.linkedin.com/in/venelin-valkov/
Follow me on X: https://twitter.com/venelin_valkov
Discord: https://discord.gg/UaNPxVD6tv
Subscribe: http://bit.ly/venelin-subscribe
GitHub repository: https://github.com/curiousily/AI-Bootcamp

👍 Don't Forget to Like, Comment, and Subscribe for More Tutorials!

00:00 - Welcome
01:01 - Document processing pipeline
02:07 - Full-text tutorial and source code on MLExpert.io
02:41 - Docling
03:53 - PDF document sample
04:38 - Notebook setup
05:45 - PDF to Markdown (OCR, layout analysis, image to text)
08:45 - Visual inspection
11:02 - Image annotations
14:37 - Chunking with Ollama (and Gemma 3)
19:58 - Contextual enrichment (retrieval)
21:50 - Test the pipeline with simple RAG
24:42 - Conclusion
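
The final "simple RAG" test can be approximated without any model. The sketch below is an assumption-heavy stand-in: it ranks chunks by bag-of-words cosine similarity instead of embeddings, and it only builds the prompt that would be sent to a local LLM. The functions `tokenize`, `cosine`, `retrieve`, and `build_prompt`, along with the sample chunks, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the answer to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical enriched chunks, not taken from the PDF used in the video.
sample_chunks = [
    "Section: Results\n\nRevenue grew year over year.",
    "Section: Outlook\n\nGuidance will be shared next quarter.",
]
question = "When will guidance be shared?"

top = retrieve(question, sample_chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping `cosine` over token counts for similarity over embedding vectors would keep the same retrieve-then-prompt shape. The resulting prompt string is what would be passed to a locally served model.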

Join this channel to get access to the perks and support my work:
https://www.youtube.com/channel/UCoW_WzQNJVAjxo4osNAxd_g/join

#rag #ocr #documentprocessing #ollama #chatgpt #python #artificialintelligence

