Open NLP Meetup 15: Web Data Extraction with Trafilatura & Deploying with Hayhooks and Open WebUI

deepset · 210 views · 1 month ago

Sign up for the livestream next week: https://lu.ma/breaking-down-deepseek

đŸŽ™ïž TALKS​
1:10 Optimizing Web Data Extraction for NLP and LLMs with Trafilatura by Adrien Barbaresi

As the demand for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) continues to grow, the need for better training data becomes increasingly critical. This talk introduces Trafilatura, a powerful open-source Python package and command-line tool that streamlines text discovery and extraction, from web crawling to robust, configurable content extraction.
We will discuss how Trafilatura tackles common data quality issues, such as noise or missing metadata, and highlight its key features, including deduplication, content selection and multiple output formats. We will also explore Trafilatura's seamless integration with Haystack and explain how to make the most of existing parameters. Join us to discover how to transform raw HTML into meaningful data to improve model training and fine-tuning.
Trafilatura x Haystack: https://haystack.deepset.ai/integrations/trafilatura
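
For readers who want to try the extraction step before watching, here is a minimal sketch (not taken from the talk) of turning raw HTML into main text with Trafilatura's Python API. It assumes `pip install trafilatura`; `SAMPLE_HTML`, `extract_main_text` and `extract_from_url` are hypothetical names made up for this illustration, and the parameter choices are just one reasonable starting point.

```python
from typing import Optional

# Hypothetical sample page: navigation and footer are the kind of
# boilerplate the extractor is meant to drop.
SAMPLE_HTML = """<html><head><title>Demo page</title></head><body>
<nav>Home | About | Contact</nav>
<article>
<h1>Demo page</h1>
<p>Web pages mix the main text with menus, ads and footers.
Content extraction keeps the article body and discards the rest,
which matters when the output feeds model training or a RAG index.</p>
<p>This second paragraph only exists to give the extractor a little
more running text to work with in this toy example.</p>
</article>
<footer>Copyright notice</footer>
</body></html>"""


def extract_main_text(html: str, output_format: str = "txt") -> Optional[str]:
    """Return the main content of an HTML string, or None if nothing is found."""
    # Imported here so this module still loads when the package is missing.
    import trafilatura  # third-party: pip install trafilatura

    return trafilatura.extract(
        html,
        output_format=output_format,  # e.g. "txt", "json" or "xml"
        include_comments=False,       # skip user comment sections
        include_tables=True,          # keep tabular content
        deduplicate=True,             # drop repeated text segments
    )


def extract_from_url(url: str) -> Optional[str]:
    """Download a page and extract its main text; None on failure."""
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # returns None if the download fails
    if downloaded is None:
        return None
    return extract_main_text(downloaded)


if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

Very short pages may yield `None`, so callers should treat a missing result as a normal case rather than an error.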

31:26 Deploying an LLM Application with Hayhooks and Open WebUI by deepset Team

Deploying LLM applications doesn’t have to be complex. In this talk, we’ll walk through an end-to-end deployment demo using Hayhooks, an open-source project for deploying Haystack pipelines. We’ll spotlight Open WebUI, an intuitive and customizable interface tailored to LLM applications, and discuss its role in improving the user experience. Whether you’re a developer, researcher, or AI enthusiast, you’ll learn practical tools for going from concept to deployment. As a bonus, we’ll cover how concurrent requests and streaming responses can be handled with this approach.
Hayhooks: https://github.com/deepset-ai/hayhooks

Slides:
https://adrien.barbaresi.eu/deepset/
https://docs.google.com/presentation/d/1z3AjAyt6RYmGtrFAE-T_CBb6fCv7zAlpR1yj9mVCIME

#haystack #opensource #opennlp
