Big Data Engineer Mock Interview | Questions on Data Skewness | Salting | Out of Memory Error

Sumit Mittal 10,872 11 months ago

To enhance your career as a Cloud Data Engineer, check https://trendytech.in/?src=youtube&sub=mockdec for curated courses developed by me.

Want to master SQL? Learn SQL the right way through the most sought-after course, the SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."

Here is how you can register for the program:
Registration Link (Course Access from India): https://rzp.io/l/SQLINR
Registration Link (Course Access from outside India): https://rzp.io/l/SQLUSD

I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
BIG DATA INTERVIEW SERIES

This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.

Our highly experienced guest interviewer, Chandrali Sarkar (https://www.linkedin.com/in/chandrali-sarkar-4570a1102/), shares invaluable insights and practical guidance drawn from her extensive expertise in the Big Data domain.

Our expert guest interviewee, Soumya Ranjan Parida (https://www.linkedin.com/in/soumya-parida/), has an interesting approach to answering the interview questions on Apache Spark, SQL and Azure Cloud Services.

Links to the free SQL and Python series developed by me are given below:
SQL Playlist - https://www.youtube.com/playlist?list=PLtgiThe4j67rAoPmnCQmcgLS4iIc5ungg
Python Playlist - https://www.youtube.com/playlist?list=PLtgiThe4j67pQSwkaEF9uHXzr8Td9IEpV

Don't miss out: subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!

Social Media Links:
LinkedIn - https://www.linkedin.com/in/bigdatabysumit/
Twitter - https://twitter.com/bigdatasumit
Instagram - https://www.instagram.com/bigdatabysumit/
Student Testimonials - https://trendytech.in/#testimonials

TIMESTAMPS: Questions Discussed
00:35 Introduction
01:40 Explain your project's end-to-end pipeline and overview.
03:17 What is the data source for your project?
03:36 Where does the data get ingested?
04:36 What types of data are being processed?
05:04 How do you capture incremental data in an OLTP environment?
07:52 What is the frequency and volume of the incoming data?
08:28 Which file formats have you worked with?
09:00 What is predicate pushdown?
10:14 What optimizations have you applied in Spark?
10:45 Define broadcast join.
11:10 List some transformations you've used in Spark.
11:27 Explain narrow and wide transformations.
12:03 What is the difference between reduceByKey and groupByKey?
12:56 Have you encountered "out of memory" errors in Spark? How did you resolve them?
14:22 How will salting help in resolving an out of memory error?
14:46 What is data skewness?
15:22 Explain cache and persist in Spark.
16:57 What happens if both memory and disk are full?
17:40 When would you use coalesce and repartition?
18:00 Provide a scenario where coalesce and repartition can be used.
18:38 Does repartition happen at the driver or the executor level?
19:30 What is the difference between the rank, dense rank, and row number functions?
22:06 Describe the internal process of submitting a Spark job.

Music track: Retro by Chill Pulse
Source: https://freetouse.com/music
Background Music for Video (Free)

Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
