Web Scraping Databases with Mechanical Soup and SQLite

Python Simplified 93,698 3 years ago

Hi everyone! In this step-by-step tutorial, we will extract a huge table of data from the internet and store it inside an SQLite database! To keep things simple I've chosen a Wikipedia table, but I highly encourage you to apply the same principles to data that updates a bit more frequently (for example, weather forecasts) 😃

If you're curious about my IDE: I'm using Wayscript, which is now available to the general public! You no longer need an invitation; you can simply sign up with the following link: https://app.wayscript.com

⭐ clone complete tutorial code ⭐
https://app.wayscript.com/lairs/517c9eb3-a662-41ec-9fe8-c09b2a7559bc/public

⏰ TIMESTAMPS ⏰
***************************************
00:00 - intro
00:34 - imports and installs
01:42 - web scraping with mechanical soup
02:20 - select HTML table elements
03:47 - extract element attributes
06:11 - find the index value of a list item
07:13 - extract multiple columns of table data
09:44 - organize extracted columns
12:44 - enumerate function
14:02 - dictionary to data frame
14:53 - create SQLite database
15:36 - create SQLite table
16:35 - insert Pandas data frame into SQLite table
17:26 - save data permanently inside database file
18:49 - thanks for watching!

💻 CODE AND IMPORTANT LINKS 💻
***************************************
⭐ URL used in the tutorial: https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions
⭐ complete code repository on GitHub: https://github.com/MariyaSha/WebscrapingDatabases
⭐ install SQLite on Linux: sudo apt install sqlite3
⭐ install SQLite on Windows: download the "Precompiled Binaries for Windows" zip file from the SQLite docs: https://www.sqlite.org/download.html
⭐ install SQLite on Mac or Anaconda: no need to install - you already have it! 😁
⭐ code used in the tutorial:
column_names = ["Founder", "Maintainer", "Initial_Release_Year", "Current_Stable_Version", "Security_Updates", "Release_Date", "System_Distribution_Commitment", "Forked_From", "Target_Audience", "Cost", "Status"]

📽️ RELATED TUTORIALS 📽️
***************************************
🌞 Much Better HTML table Web Scraping with Pandas: https://youtu.be/oF-EMiPZQGA
🌞 SQLite Databases for Beginners: https://youtu.be/Ohj-CqALrwk
🌞 Web Scraping Images with Mechanical Soup: https://youtu.be/drDdb1MBBfI
🌞 Web Scraping Text with Beautiful Soup: https://youtu.be/ySNSY7iiBDY
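
⭐ rough sketch of the "organize columns" and SQLite steps:
The sketch below is not the tutorial's code; it only illustrates how the steps named in the timestamps (organize extracted columns, enumerate, create SQLite table, insert, save) might fit together. It uses only the Python standard library, so the Pandas data frame step is swapped for plain `zip` and `executemany`. The `distributions` and `cells` values are made-up placeholders standing in for scraped table cells, the table name `linux` is an assumption, and the database lives in memory (`":memory:"`) rather than in a file.

```python
import sqlite3

# Column names as listed in the description above.
column_names = ["Founder", "Maintainer", "Initial_Release_Year",
                "Current_Stable_Version", "Security_Updates", "Release_Date",
                "System_Distribution_Commitment", "Forked_From",
                "Target_Audience", "Cost", "Status"]

# Placeholder data (an assumption): a real run would fill these lists
# with text scraped from the Wikipedia table, one flat list of cells
# read row by row.
distributions = ["distro_a", "distro_b", "distro_c"]
cells = [f"{name}:{col}" for name in distributions for col in column_names]

# Organize the flat list into columns: with 11 cells per row, every 11th
# cell starting at position idx belongs to column number idx.
table = {"Distribution": distributions}
for idx, name in enumerate(column_names):
    table[name] = cells[idx::len(column_names)]

# Turn the column-oriented dictionary back into row tuples for insertion.
all_columns = list(table)
rows = list(zip(*table.values()))

# Create the database and table; SQLite accepts untyped columns.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(f"CREATE TABLE linux ({', '.join(all_columns)})")

# Insert every row with one "?" placeholder per column, then commit so
# the data is saved.
placeholders = ", ".join("?" for _ in all_columns)
cursor.executemany(f"INSERT INTO linux VALUES ({placeholders})", rows)
connection.commit()

# Read a few columns back to check what was stored.
stored = cursor.execute(
    "SELECT Distribution, Founder, Status FROM linux"
).fetchall()
connection.close()
```

To keep the data permanently, pass a file name (for example a hypothetical "linux_distro.db") to `sqlite3.connect` instead of `":memory:"`; the `commit()` call is what writes the inserted rows to that file.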
