

Process Excel files in Azure with Data Factory and Databricks | Tutorial


Excel is one of the most commonly used file formats on the market. The popularity of the tool among business users, business analysts and data engineers is driven by its flexibility, ease of use, powerful integration features and low price. This is why every data engineer should understand the advantages and disadvantages of this format, the variety of internal formats such as XLS, XLSX, XLSB and XLSM, and which tools to use to process those files effectively in the cloud. Today I bring you a quick introduction to building ETL solutions with Excel files in Azure using the Data Factory and Databricks services.

Code samples: https://github.com/MarczakIO/azure4everyone-samples/tree/master/azure-excel-file-processing-with-data-factory-and-databricks

Agenda
00:00 Introduction
00:25 Excel Business Justification
01:22 Excel Challenges
02:20 Supported Services
04:30 Data Factory Introduction
05:35 Demo Setup
07:13 Demo using Data Factory
13:36 Databricks Introduction
14:44 Databricks Setup
18:14 Databricks Demo - Reading Excels
20:55 Databricks Demo - Reading Excels using References
25:56 Databricks Demo - Workbook Metadata
28:05 Databricks Demo - Defining Schema
30:03 Databricks Demo - Defining Schema
32:53 Additional Options

Next steps for you after watching the video
1. Excel format in Data Factory - https://docs.microsoft.com/en-us/azure/data-factory/format-excel
2. Spark Excel by Crealytics documentation - https://github.com/crealytics/spark-excel

### Want to connect?
- Blog https://marczak.io/
- Twitter https://twitter.com/MarczakIO
- Facebook https://www.facebook.com/MarczakIO
- LinkedIn https://www.linkedin.com/in/adam-marczak/
- Site https://azure4everyone.com
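The agenda mentions reading Excel files in Databricks "using references". With the Spark Excel library linked above, a cell reference is passed through the `dataAddress` option. The snippet below is only a rough sketch of that idea and is not the code from the video: the helper names `data_address`, `excel_options` and `read_excel` are hypothetical, and it assumes the library is installed on the cluster and registered under the `com.crealytics.spark.excel` format name.

```python
from typing import Dict, Optional


def data_address(sheet: Optional[str] = None, cell_range: str = "A1") -> str:
    """Build a Spark Excel style data address, e.g. 'Sales'!B3:F35.

    Hypothetical helper. Sheet names that themselves contain a single
    quote are not handled in this sketch.
    """
    if sheet is None:
        # No sheet given: only the cell or range is passed.
        return cell_range
    # Sheet names are wrapped in single quotes and joined with "!".
    return f"'{sheet}'!{cell_range}"


def excel_options(
    sheet: Optional[str] = None,
    cell_range: str = "A1",
    header: bool = True,
    infer_schema: bool = False,
) -> Dict[str, str]:
    """Collect reader options as the lowercase strings Spark expects."""
    return {
        "dataAddress": data_address(sheet, cell_range),
        "header": str(header).lower(),
        "inferSchema": str(infer_schema).lower(),
    }


def read_excel(spark, path: str, **kwargs):
    """Read an Excel range into a DataFrame.

    `spark` is an existing SparkSession (Databricks notebooks provide one).
    Assumes the Spark Excel library is attached to the cluster.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .options(**excel_options(**kwargs))
        .load(path)
    )
```

In a notebook this might be called as `read_excel(spark, "/mnt/demo/sales.xlsx", sheet="Sales", cell_range="B3:F35")`, where the path and sheet name are placeholders. Keeping the address logic in a small pure function makes it easy to check without a running cluster.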
