PySpark 101: Introduction to Big Data with Spark

Matt Layman 267 3 weeks ago


Unlock PySpark for big data. This beginner-friendly course introduces Apache Spark, a fast and scalable distributed computing framework. The class covers the fundamentals of PySpark, including:

* Apache Spark Overview: Understand the core concepts and benefits of Spark for big data processing.
* PySpark Essentials: Learn about RDDs (Resilient Distributed Datasets) for distributed computation, DataFrames for optimized, structured data handling, and working with data using SQL.
* Machine Learning with MLlib: Explore the basics of MLlib, Spark's scalable machine learning library, for analytics and predictive modeling.

Perfect for beginners in data engineering and analytics, this course will equip you with the foundational skills to process and analyze large datasets efficiently using PySpark.

Presenter: Michael Jadoo

Michael Jadoo is a data scientist at AFS. He has 15 years of experience in data production and data engineering for the federal government using SAS. He is a data science educator with experience in Python, R, data analysis, and machine learning.
