🚀 Databricks & PySpark Full Course | Master Big Data Processing from Scratch

DataToCrunch 681 views 2 months ago

Welcome to the Databricks & PySpark full-course tutorial! If you're looking to become a Data Engineer, Data Analyst, or Big Data Expert, this video is for you. We take a hands-on approach to mastering Databricks using PySpark, covering everything from basics to advanced concepts.

This is your one-stop guide to understanding how PySpark works inside Databricks, with real-world examples and practical demonstrations.

💡 Why Learn Databricks & PySpark?

Fast & Scalable: Apache Spark processes data in memory and can run some workloads up to 100x faster than disk-based Hadoop MapReduce.
Easy to Use: PySpark lets you work with big data through a simple, efficient Python API.
Industry Demand: Databricks and Spark are widely used in AI, ML, and Data Engineering.
Cloud-Ready: Works with AWS, Azure, and Google Cloud.

🔥 What You'll Learn in This Course
📌 Step 1: Loading & Understanding Data
✅ Reading CSV files in Databricks
✅ Checking schema and data types

📌 Step 2: Data Cleaning & Transformation
✅ Renaming columns for better clarity
✅ Converting categorical values into numerical format
✅ Handling NULL values using fillna(), dropna()

📌 Step 3: Working with DataFrames
✅ Using filter(), sort() operations
✅ Performing column operations like withColumn(), alias(), and cast()

📌 Step 4: Advanced PySpark Functions
✅ Using explode(), collect_list(), pivot(), when(), otherwise()
✅ String functions: initcap(), upper(), lower()
✅ Date functions: current_date(), datediff(), date_add(), year(), month()

📌 Step 5: Joins & Data Merging
✅ Inner, Left, Right & Outer Joins
✅ union() & unionByName() for combining datasets

📌 Step 6: Window Functions & Ranking
✅ Using rank(), dense_rank(), and cumulative sums with sum() over a window
✅ Partitioning and ordering data efficiently

📌 Step 7: User Defined Functions (UDFs)
✅ Writing custom functions in PySpark
✅ Applying UDFs to transform and clean data

📌 Step 8: Writing & Saving Data
✅ Writing datasets in CSV, JSON, ORC, and Delta formats
✅ Choosing a save mode: overwrite, append, or raise an error when the target already exists

📌 Timestamps:
00:00:00 - Intro
00:00:28 - A] Agenda
00:00:51 - B] Data Understanding
00:02:05 - C] Compute Creation
00:02:43 - D] Data Ingestion
00:03:14 - E] Folder & Notebook Creation
00:05:04 - F] Data Reading
00:11:39 - G] Data Cleaning & Transformation
00:11:45 - 1. Column Name Rename
00:14:44 - 2. When - Otherwise, Col, Lit
00:20:08 - 3. WithColumn, regexp_replace, Col
00:22:41 - 4. FillNa
00:25:42 - 5. Select, Alias
00:28:25 - 6. Filter
00:30:16 - 7. Sort
00:32:18 - 8. DropDuplicates
00:33:42 - 9. Select, Initcap, Lower, Upper
00:36:19 - 10. DropNa
00:41:37 - 11. FillNa - Another Example
00:43:58 - 12. Drop Column
00:45:01 - 13. Joins - Inner, Left, Right, Outer
00:56:07 - 14. Union & UnionByName
01:01:23 - 15. Date Functions
01:13:33 - 16. Array
01:16:08 - 17. Explode
01:18:44 - 18. Collect_List
01:22:58 - 19. Count
01:24:49 - 20. Pivot
01:30:07 - 21. When-Otherwise
01:33:15 - 22. Window - Rank & Dense Rank
01:40:05 - 23. Cumulative Sum
01:44:51 - 24. User Defined Function
01:54:58 - 25. Data Export with different modes & different formats
02:05:09 - Conclusion

📌 Dataset Used: https://www.kaggle.com/datasets/arashnic/hr-analytics-job-change-of-data-scientists

📌 Relevant Videos:
1. From Data to Business Insights: PySpark on Databricks for Amazon Prime Dataset Analysis 📊🚀 - https://youtu.be/7aZGAf8Luys?si=c7N4eVkX07DYva2_

2. Databricks Journey Begins: Compute, Catalog, Workflows, Data Management, and More! - https://youtu.be/4qreAFJfID4?si=my-Y7qD69SfAS-zP

3. Apache Spark & Databricks: Lazy Evaluation| Fault Tolerance| DAG| Catalyst Optimizer(Theory) - Part 4 - https://youtu.be/12IDOqhsv2w?si=Dfxi2WZZkzUdaRoe

4. Spark & Databricks: RDDs| DataFrames| Datasets| Spark Ecosystem| RDD Operations (Theory) - Part 3 - https://youtu.be/5Ckap52tuHk?si=hbPYC7U4zXHeYx7l

5. Spark & Databricks - Spark Architecture |Memory Management |Application Workflow (Theory) - Part 2 - https://youtu.be/T6CGh-R9C84?si=koVHbkD2Cks9z2w_

6. Introduction to Apache Spark | Databricks (Theory) - Part 1 - https://youtu.be/lbFax1jxSec?si=WNWL7nhon8mJ-Wmf

👍 Like, Share & Subscribe for more Big Data & Analytics tutorials!
🔔 Turn on notifications to stay updated on new videos.

#databricks #databricksforbeginners #spark #apachespark #pyspark #bigdata #bigdataanalytics #dataengineering #dataengineer #machinelearning #datavisualization #python #databricksai
