Realtime Socket Streaming with Apache Spark | End to End Data Engineering Project

CodeWithYu 20,671 1 year ago

In this video, you will be building a real-time data streaming pipeline with a dataset of 7 million records. We'll utilize a powerful stack of tools and technologies, including TCP/IP Socket, Apache Spark, OpenAI Large Language Model (LLM), Kafka, and Elasticsearch.

MORE FREE COURSES: https://datamasterylab.com

📚 What You'll Learn:
👉 Setting up and configuring TCP/IP for data transmission over Socket.
👉 Streaming Data With Apache Spark from Socket
👉 Realtime Sentiment Analysis with OpenAI LLM (ChatGPT)
👉 Prompt Engineering
👉 Setting up Kafka for real-time data ingestion and distribution.
👉 Using Elasticsearch for efficient data indexing and search capabilities.

✨ Timestamps: ✨
0:00 Introduction
01:10 Creating Spark Master-worker architecture with Docker
10:40 Setting up the TCP/IP Socket Source Stream
23:25 Setting up Apache Spark Stream
42:56 Setting up Kafka Cluster on Confluent Cloud
47:12 Getting Keys for Kafka cluster and Schema Registry
1:12:53 Realtime Sentiment Analysis with OpenAI LLM (ChatGPT)
1:24:10 Setting up Elasticsearch deployment on Elastic Cloud
1:30:50 Realtime Data Indexing on Elasticsearch
1:36:05 Testing and Results
1:41:50 Outro

👦🏻 My LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 Twitter: https://twitter.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu

🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟

🔗 Useful Links and Resources:
✅ Code: https://github.com/airscholar/RealtimeStreamingEngineering
✅ Medium Article: https://medium.com/@yusuf.ganiyu/real-time-streaming-for-sentiment-analysis-with-sockets-spark-openai-kafka-and-elasticsearch-a577b35a7cb9
✅ Customer Reviews Dataset: https://www.yelp.com/dataset/
✅ Confluent Cloud Docs: https://docs.confluent.io/cloud/current/overview.html
✅ Elasticsearch Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Kafka Official Site: https://kafka.apache.org/
✅ Apache Spark Official Site: https://spark.apache.org/

✨ Tags ✨
Data Engineering, Apache Airflow, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming, Real-time Data Streaming, OpenAI LLM, Elasticsearch, Data Processing, Data Analytics, TCP/IP, Streaming Solutions, Data Ingestion, Real-time Analysis, Spark Configuration, OpenAI Integration, Kafka Topics, Elasticsearch Indexing, Data Storage, Stream Processing, Machine Learning Integration

✨ Hashtags ✨
#confluent #DataEngineering #TCP #TCPIP #sockets #socketstreaming #Kafka #ApacheSpark #Docker #ETLPipeline #DataPipeline #DataStreaming #OpenAI #Elasticsearch #RealTimeData #BigData #TechTutorial #StreamingAnalytics #MachineLearning #DataFlow #SparkStreaming #DataScience #AIIntegration #RealTimeAnalytics #StreamingData #realtimestreaming #realtime
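
For a rough feel of the first step in the pipeline (sending records over a TCP/IP socket), here is a minimal sketch using only the Python standard library. It is not the code from the video or the linked repository: the function names `serve_records` and `read_records`, the sample field names, and the one-JSON-object-per-line framing are all assumptions made for illustration. Newline-delimited text is chosen because Spark's built-in socket source reads lines of text.

```python
import json
import socket
import threading


def serve_records(records, host="127.0.0.1", port=0):
    """Start a TCP server that sends each record as one JSON line to one client.

    Hypothetical helper for illustration. Port 0 asks the OS for a free port.
    Returns the server thread and the port that was actually bound.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def run():
        conn, _ = server.accept()  # block until a single client connects
        with conn:
            for record in records:
                line = json.dumps(record) + "\n"  # one record per line
                conn.sendall(line.encode("utf-8"))
        server.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, bound_port


def read_records(host, port):
    """Connect to the server and parse one JSON record per line until it closes."""
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    # Made-up sample rows; the real dataset's schema may differ.
    sample = [
        {"review_id": "r1", "text": "Great food"},
        {"review_id": "r2", "text": "Slow service"},
    ]
    thread, port = serve_records(sample)
    print(read_records("127.0.0.1", port))
    thread.join()
```

In a real pipeline the reader side would be replaced by a streaming consumer such as Spark, and the server would read records from the dataset file instead of an in-memory list.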