Realtime Streaming with Unstructured Data Engineering | Get Hired as an Experienced Data Engineer
In this video, you will build a real-time streaming pipeline for unstructured data covering multiple data types (TEXT, IMAGE, VIDEO, CSV, JSON, PDF) across more than 600 datasets.
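The core idea of the pipeline — one extraction path per file type — can be sketched outside Spark with a plain Python dispatch (function and file names here are illustrative; in the video this routing happens inside Spark UDFs):

```python
import csv
import io
import json

def extract_text(filename: str, raw: bytes) -> str:
    """Route a raw payload to a parser based on its extension."""
    if filename.endswith(".json"):
        # Round-trip through json to validate and normalize the payload
        return json.dumps(json.loads(raw.decode("utf-8")))
    if filename.endswith(".csv"):
        rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
        return "\n".join(",".join(row) for row in rows)
    # TEXT and anything else: treat as plain UTF-8
    return raw.decode("utf-8", errors="replace")

print(extract_text("report.csv", b"a,b\n1,2"))
```

In Spark, each branch would become its own UDF registered against the streaming DataFrame.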
MORE DATA ENGINEERING VIDEOS AVAILABLE on datamasterylab.com
Like this video?
- Buy me a coffee: https://www.buymeacoffee.com/yusuf.ganiyu
- Support the channel: https://www.youtube.com/@codewithyu/join
Timestamps:
0:00 Introduction
1:50 System Architecture Overview
4:08 System Architecture Design
13:22 Setting up Spark Streaming for Unstructured Data
21:46 Handling multiple unstructured data types
24:31 Creating data schema
30:35 Creating custom user-defined functions for data extraction
51:14 Parsing and extracting text data
1:40:30 Structuring the results into a dataframe
1:46:15 Reading JSON structured files into the streams
1:49:47 Joining Structured and Unstructured Data Streams
1:52:50 Writing Data to AWS S3 Bucket
2:04:20 Creating AWS Glue Crawler for the data
2:08:25 Verifying the crawler results on Athena
2:11:36 Deploying Spark Streams to Spark Clusters
2:26:31 Verification of Results
2:29:40 Outro
👦🏻 My Linkedin: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 X(Twitter): https://x.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Source Code and Datasets: https://www.buymeacoffee.com/yusuf.ganiyu/source-code-real-time-streaming-pipelines-unstructured-data
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Spark Official Site: https://spark.apache.org/
✅ Confluent Docs: https://docs.confluent.io/home/overview.html
✅ S3 Documentation: https://docs.aws.amazon.com/s3/
✅ AWS IAM Documentation: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html
✨ Tags ✨
Data Engineering, Apache Spark, Unstructured Data, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming
✨ Hashtags ✨
#DataEngineering #ApacheSpark #unstructureddata #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics
Smart City End to End Realtime Data Engineering Project | Get Hired as an AWS Data Engineer
In this video, you will build a Smart City end-to-end real-time data streaming pipeline, covering each phase from data ingestion through processing to storage. We'll utilize tools like IoT devices, Apache Zookeeper, Apache Kafka, Apache Spark, Docker, Python, AWS Cloud, AWS Glue, AWS Athena, AWS IAM, AWS Redshift, and finally Power BI to visualize the data on Redshift.
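A minimal sketch of the kind of IoT vehicle-event generator built in the video (field names are illustrative, not the exact schema used on screen):

```python
import random
import uuid
from datetime import datetime, timezone

def generate_vehicle_event(vehicle_id: str) -> dict:
    """Produce one synthetic vehicle telemetry record."""
    return {
        "id": str(uuid.uuid4()),
        "vehicle_id": vehicle_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "speed_kmh": round(random.uniform(0, 120), 1),
        "direction": random.choice(["N", "NE", "E", "SE", "S", "SW", "W", "NW"]),
        "fuel_level": round(random.uniform(0.0, 1.0), 2),
    }

event = generate_vehicle_event("Vehicle-001")
print(event["vehicle_id"], event["speed_kmh"])
```

Records like this are what the producers serialize and push to Kafka topics for Spark to consume.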
Like this video?
- Buy me a coffee: https://www.buymeacoffee.com/yusuf.ganiyu
- Become a member: https://www.youtube.com/@codewithyu/join
Timestamps:
0:00 Introduction
1:29 System Architecture
7:22 Project Setup
9:00 Docker containers setup and coding
26:17 IoT services producer
38:19 Vehicle information Generator
48:10 GPS Information Generator
50:13 Traffic information Generator
53:13 Weather information Generator
58:35 Emergency Incident Generator
1:03:39 Producing IoT Data to Kafka
1:14:43 AWS S3 setup with policies
1:16:38 AWS IAM Roles and Credentials Management
1:19:14 Apache Spark Realtime Streaming from Kafka
2:01:14 Fixing Schema Issues in Apache Spark Structured Streaming
2:07:31 AWS Glue Crawlers
2:10:23 Working with AWS Athena
2:13:22 Loading Data into Redshift from AWS Glue Data Catalog
2:17:58 Connecting and Querying Redshift DW with DBeaver
2:20:51 Connecting Redshift to AWS Glue Catalog
2:23:34 Fixing IAM Permission issues with Redshift
2:26:05 Outro
👦🏻 My Linkedin: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 X(Twitter): https://x.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Kafka Official Site: https://kafka.apache.org/
✅ Apache Spark Official Site: https://spark.apache.org/
✅ Confluent Docs: https://docs.confluent.io/home/overview.html
✅ S3 Documentation: https://docs.aws.amazon.com/s3/
✅ AWS IAM Documentation: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html
✨ Tags ✨
Data Engineering, Apache Airflow, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming
✨ Hashtags ✨
#confluent #DataEngineering #ApacheAirflow #Kafka #ApacheSpark #Cassandra #PostgreSQL #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics
Kubernetes for Modern Data Engineering: An End to End Guide
Sign up To Datamasterylab.com for more Amazing Data Engineering Contents!
Become a channel member: https://www.youtube.com/@codewithyu/join
In this video, we dive deep into the world of Kubernetes, a powerful tool for managing containerized applications, and explore its applications in the field of data engineering.
📝 What You Will Learn:
✅ Setting Up Kubernetes on Docker: We start from the basics, showing you how to set up Kubernetes on Docker. This section is perfect for those new to Kubernetes or looking to refresh their knowledge.
✅ Mastering kubectl: Learn how to effectively use kubectl to manage your Kubernetes cluster. We'll cover essential commands and tips for smooth navigation.
✅ Deploying the Kubernetes Dashboard: Step-by-step guidance on deploying the Kubernetes Dashboard, a user-friendly interface for managing Kubernetes clusters.
✅ Running Apache Airflow with Helm Charts: Discover how to run Apache Airflow, a leading tool for orchestrating complex computational workflows, on Kubernetes using Helm charts.
Timestamps:
0:00 Introduction
1:05 Setting up Docker on your machine
3:42 Managing Resources for Docker and Kubernetes
4:20 Enabling Kubernetes on Docker Desktop
5:47 Setting up Kubectl and other Kubernetes Management Tools
11:06 Setting up Helm charts
13:08 Working with Kubectl commands
17:25 Setting up Kubernetes Dashboard and Managing the cluster
20:56 Generating and managing Kubernetes dashboard Token and secrets
22:54 Kubernetes Service Accounts
23:48 Kubernetes Role Based Access Control (RBAC)
25:20 Kubernetes Secrets and Tokens
29:10 Kubernetes Dashboard
36:38 Setting up Apache Airflow on Kubernetes with Helm Charts
40:13 Accessing Apache Airflow on Kubernetes
43:08 Reconfiguring Apache Airflow on Kubernetes
47:40 Connecting Apache Airflow DAGs to Kubernetes
1:01:19 Working with Multiple DAGs on Kubernetes
1:18:41 Optimising Airflow DAGs on Kubernetes
1:23:50 Outro
Tags ✨:
Kubernetes, Data Engineering, Apache Airflow, Docker, Helm Charts, Kubernetes Tutorial, Data Processing, Cloud Computing, DevOps, Technology Education, Kubernetes Dashboard, kubectl, Containerization, Workflow Automation, Big Data, Tech Tutorial, Kubernetes Cluster Management, Kubernetes for Beginners, Scalable Data Engineering, IT Infrastructure, Cloud Services, Kubernetes in Data Science, Kubernetes and Airflow, Continuous Integration, Continuous Deployment
Hashtags ✨:
#Kubernetes #DataEngineering #ApacheAirflow #Docker #HelmCharts #DevOps #Tutorial #TechEducation #KubernetesTutorial #DataProcessing #CloudComputing #TechnologyTutorial #Learning #Education
CI/CD for Modern Data Engineering | End to End Data Engineering Project
In this video, we delve deep into the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) within the realm of modern data engineering.
Enroll in my Data Engineering Mastery Course: datamasterylab.com
Timestamps:
0:00 Introduction
2:00 System architecture
4:00 Prerequisites and installations
9:21 Azure Infrastructure as Code Automation with Terraform
13:50 Storage account module with Terraform
32:52 Automating Azure Data Factory with Terraform
1:04:30 Testing and Validation of Results
1:09:14 Outro
Resources:
Github Code: https://github.com/airscholar/cicd_for_data_engineering.git
Terraform installation: https://developer.hashicorp.com/terraform/install
Azure CLI installation: https://learn.microsoft.com/en-us/cli/azure/
Tags:
CI/CD, Data Engineering, Azure, Terraform, DevOps, Cloud Computing, Continuous Integration, Continuous Deployment, Azure DevOps, Infrastructure as Code, Automation,Cloud Services, Azure Cloud, Terraform Automation, Data Pipeline
Hashtags:
#CICD #DataEngineering #Azure #Terraform #DevOps #CloudComputing #ContinuousIntegration #ContinuousDeployment #AzureDevOps #InfrastructureAsCode #Automation #CloudServices #AzureCloud #TerraformAutomation #DataPipeline #DevSecOps
Robust Data Pipelines with Apache Spark, DBT and Azure | End-to-End Data Engineering Project
As decided by the community, here is a teaser for the Apache Spark, Databricks, DBT and Cloud Provider project.
Timestamp:
0:00 Introduction
0:49 System Architecture
3:01 Creating resource groups on Azure
5:02 Setting up the medallion architecture storage account
8:46 Setting up Azure Data Factory
10:18 Azure Key Vault setup for secrets
14:19 Azure database with automatic data population
25:32 Azure Data Factory pipeline orchestration
47:00 Setting up Databricks
49:50 Azure Databricks Secret Scope and Key Vault
54:33 Verifying Databricks - Key Vault - Secret Scope Integration
1:06:00 Azure Data Factory - Databricks Integration
1:21:19 DBT Setup
1:24:15 DBT Configuration with Azure Databricks
1:32:12 DBT Snapshots with Azure Databricks and ADLS Gen2
1:45:06 DBT Datamarts with Azure Databricks and ADLS Gen2
1:55:00 DBT Documentation
1:58:58 Outro
If you find our content valuable, support us by joining our channel membership, where you'll get exclusive access to behind-the-scenes content, Q&A sessions, and much more!
https://www.youtube.com/channel/UCAEOtPgh29aXEt31O17Wfjg/join
💬 Join the Conversation:
We love hearing from you! Share your thoughts, questions, or experiences related to data engineering or this project in the comments below. Don't forget to like, subscribe, and hit the bell icon to stay updated with our latest content.
Tags:
Big Data, Data Engineering, Apache Spark, Databricks, DBT, Azure, Cloud Computing, Data Analytics, ETL, Data Warehouse, Technology, Analytics, Machine Learning, Data Science
Hashtags:
#BigData, #DataEngineering, #ApacheSpark, #Databricks, #DBT, #Azure, #CloudComputing, #DataAnalytics, #ETL, #DataWarehouse, #TechTalk, #MachineLearning, #DataScience, #BigDataAnalytics
🙏 Thank You for Watching!
Remember to subscribe and hit the bell icon for notifications. Stay curious and keep exploring the fascinating world of data engineering!
Realtime Streaming with Apache Flink | End to End Data Engineering Project
In this video, you will be building an end-to-end data engineering project using some of the most powerful technologies in the industry: Apache Flink, Kafka, Elasticsearch, and Docker. We dive deep into the world of real-time data processing and analytics, guiding you through every step of creating a robust, scalable data pipeline.
Timestamp
0:00 Introduction
0:55 The system architecture
08:00 Sales Analytics Data Generation
19:10 Producing Data into Kafka Broker
25:00 Setting up Apache Flink project
32:28 Consuming data from Kafka with Apache Flink
43:30 Starting Apache Flink on Mac
54:25 Writing Kafka Streams to Postgres Database
1:20:00 Aggregating Transactions per Category into Postgres
1:36:00 Aggregating Transactions Per Day into Postgres
1:39:46 Aggregating Transactions Per Month into Postgres
1:51:52 Writing Kafka Streams Data into Elasticsearch
2:05:00 Reindexing Data on Elasticsearch with Timestamp
2:10:52 Creating Streaming Dashboard on Elasticsearch
2:22:46 Realtime Dashboard Results
2:24:14 Recap
2:25:34 Outro
👦🏻 My Linkedin: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 Twitter: https://twitter.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Code: https://github.com/airscholar/FlinkCommerce.git
✅ Medium Article: https://medium.com/@yusuf.ganiyu/realtime-data-engineering-project-with-airflow-kafka-spark-cassandra-and-postgres-804bcd963974
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Kafka Official Site: https://kafka.apache.org/
✅ Apache Flink Official Documentation: https://nightlies.apache.org/flink/flink-docs-stable/
✅ Confluent Docs: https://docs.confluent.io/home/overview.html
✅ Maven Repository: https://mvnrepository.com/
✨ Tags ✨
Big Data Engineering, Apache Flink, Kafka, Elasticsearch, Docker, Data Engineering, Realtime Data Processing, Big Data, Data Pipeline, Streaming Data, Data Analytics, Tech Tutorial, Data Science, Flink Streaming, Kafka Streaming, Elasticsearch Tutorial, Docker Containers, Data Engineering Project, Realtime Analytics, Big Data Technologies, Data Engineering Tutorial, Data Engineering Projects, Data Engineer
✨Hashtags✨
#ApacheFlink, #Kafka, #Elasticsearch, #Docker, #DataEngineering, #RealtimeData, #BigData, #DataPipeline, #TechTutorial, #DataScience, #StreamingData, #Flink, #KafkaStreams, #ElasticsearchTips, #DockerContainers, #DataEngineeringProjects, #RealtimeAnalytics, #BigDataTech, #LearnDataEngineering, #dataengineers
Realtime Change Data Capture Streaming | End to End Data Engineering Project
In this video, we dive deep into the world of Change Data Capture (CDC) and how it can be implemented for real-time data streaming using a powerful tech stack. You will integrate technologies like Docker, Postgres, Debezium, Kafka, Apache Spark, and Slack to create an efficient, responsive data pipeline.
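For reference, a Debezium Postgres source connector is typically registered with Kafka Connect's REST API using a JSON payload along these lines (hostname, credentials, database, and prefix below are placeholders, not the exact values from the video):

```json
{
  "name": "postgres-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "financial_db",
    "topic.prefix": "cdc",
    "plugin.name": "pgoutput",
    "decimal.handling.mode": "string"
  }
}
```

Note `decimal.handling.mode`: switching it to `string` is one common way of handling the decimal-value issue covered around the 41:00 mark.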
Master Data Engineering by enrolling at datamasterylab.com
Timestamps:
0:00 Introduction
0:35 The system architecture
14:40 Getting live data into the Postgres DB
31:00 Connecting to Postgres with Debezium and Kafka from the UI
34:35 Previewing Debezium data on Kafka
37:25 Getting full data from Postgres with Debezium
39:55 Setting up the Debezium connector from the terminal
41:00 Handling decimal values on Debezium
46:00 Getting the user who changed data on Postgres, with timestamps
53:17 Creating a more robust data capture on Postgres
1:03:31 Outro
Don't forget to SUBSCRIBE, LIKE, COMMENT and SHARE for more exciting videos!
🔹 Connect with Us:
Follow us on X (Twitter): https://x.com/DataMasterylab
Follow us on LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107
Tags:
Change Data Capture, CDC, Real-Time Streaming, Docker, Postgres, Debezium, Kafka, Apache Spark, Slack, Data Engineering, Data Pipeline, Tech Tutorial, Software Development, Data Streaming, IT Education, End to End Data Engineering
Hashtags:
#ChangeDataCapture #RealTimeData #Docker #Postgres #Debezium #Kafka #ApacheSpark #Slack #DataEngineering #TechTutorial #SoftwareDevelopment #DataStreaming #ITEducation #bigdataanalytics #EndToEndDataEngineeringProject
Apache Flink For Sales Analytics - End to End Data Engineering
In this video, you will set up an end-to-end data engineering project for sales analytics using Apache Flink, a leading framework for big data processing.
🔍 What You'll Learn:
✅ Apache Flink Basics: Get to grips with the fundamentals of Apache Flink, a powerful open-source stream processing framework.
✅ Data Ingestion and Processing: Learn how to ingest and process sales data from CSV files using Flink's DataSet API.
✅ Complex Data Transformations: Understand how to perform joins, aggregations, and sorting on large datasets.
✅ Custom Output Formats: See how to create custom output formats to write processed data back to the file system.
👨💻 In This Tutorial:
We've developed a real-world example of a Flink application that performs comprehensive sales analysis. The application reads sales and product data, joins these datasets, and computes total sales per category. It then sorts the results and writes them back to a CSV file, showcasing the power and ease of handling big data with Flink.
📝 Key Concepts Covered:
👉 Reading CSV data into Flink
👉 Using POJOs for data representation
👉 Joining datasets on key fields
👉 Aggregating data with map and reduce functions
👉 Sorting data in descending order of sales
👉 Writing custom output formats
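The join-and-aggregate core of the Flink job (the video uses the Java DataSet API) can be mirrored in plain Python to show the logic; the sample data is illustrative:

```python
from collections import defaultdict

# sales: (product_id, amount); products: product_id -> category
sales = [(1, 30.0), (2, 12.5), (1, 7.5), (3, 20.0)]
products = {1: "electronics", 2: "books", 3: "electronics"}

# Join each sale to its product's category, then aggregate per category
totals = defaultdict(float)
for product_id, amount in sales:
    totals[products[product_id]] += amount

# Sort categories by total sales, descending
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

In Flink, the same shape becomes a `join` on the product key followed by a `groupBy`/`reduce` and a sorted sink.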
💡 Perfect For:
👍🏻 Data Engineers and Analysts looking to enhance their big data processing skills.
👍🏻 Beginners in Apache Flink eager to learn through practical examples.
👍🏻 Anyone interested in understanding how sales data can be analyzed and processed in a big data environment.
🔗 Source Code:
Get the full source code of the project here: https://github.com/airscholar/ApacheFlink-SalesAnalytics
📚 Pre-Requisites:
Basic understanding of Java programming and familiarity with concepts of big data and data processing.
🎥 Stay Tuned:
Subscribe to our channel for more tutorials on Apache Flink and other big data technologies. Hit the bell icon to get notified about our latest updates!
👍 Like, Share, and Comment:
Enjoyed this tutorial? Like and share the video with your friends and colleagues. Have questions or suggestions? Drop them in the comments section below!
🔗 Follow Us:
Website: datamasterylab.com
LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107
Twitter: https://twitter.com/datamasterylab
My Twitter: https://twitter.com/YusufOGaniyu
#ApacheFlink #DataEngineering #SalesAnalytics #BigData #FlinkTutorial #DataProcessing #RealTimeAnalytics
AWS EMR (Elastic Map Reduce) For Data Engineers
In this comprehensive guide, we dive into the powerful world of AWS, focusing on how data engineers like you can harness the robust compute capabilities of Amazon Elastic Map Reduce (EMR). Whether you're new to cloud computing or looking to enhance your skills, this video is your one-stop resource for mastering EMR.
What You'll Learn:
🌟 Understanding AWS EMR: Get to grips with the basics of AWS EMR and its role in cloud computing.
💻 Creating a Spark Job: Step-by-step instructions on how to set up and configure your first Spark job in the AWS environment.
🚀 Submitting Jobs to EMR: Learn the ins and outs of job submission and management within the EMR ecosystem.
📊 Visualization Techniques: Discover how to effectively visualize the processing workflow using EMR Spark and Yarn clusters.
🛠 Best Practices: Tips and tricks for optimizing your data processing tasks on AWS EMR.
📈 Real-World Applications: Understand how these skills apply in real-world data engineering scenarios.
🔔 Don't forget to subscribe and hit the bell icon to stay updated on our latest videos in cloud computing, data engineering, and more!
Hashtags:
#AWS #EMR #DataEngineering #CloudComputing #BigData #Spark #YarnCluster #AWSLearning #TechTutorial
Tags:
AWS, Amazon Web Services, EMR, Elastic Map Reduce, Data Engineering, Cloud Computing, Big Data, Spark, Yarn, AWS Tutorial, Data Processing, Cloud Technology, AWS Certification, Data Visualization
👍 Like, Share, and Comment: Your support means a lot! If you found this video helpful, please like, share, and comment below with your thoughts or any questions you might have.
📚 Further Resources: Check out the description below for links to additional resources and related content to further your learning journey.
EMR Documentation: https://docs.aws.amazon.com/emr/
Github: https://github.com/airscholar/EMR-for-data-engineers
Stay Connected:
🌐 Visit our Website: datamasterylab.com
📸 Follow us on Instagram: https://www.instagram.com/airscholar/
🐦 Tweet us on Twitter: https://twitter.com/datamasterylab
🔗 Connect with us on LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107
Timestamps:
0:00 - Introduction
1:26 - Setting up EMR on AWS
13:59 - Setting up Cloud9 on AWS
23:00 - Setting Up a Spark Job with S3
24:30 - Submitting Jobs to EMR
31:08 - Visualizing the Process on EMR Spark and Hadoop UI
35:00 - Submitting to the Cluster with EMR Steps
37:45 - Results
Apache Airflow on Steroids for Data Engineers
In this course, you will create an end-to-end data engineering project combining Apache Airflow, Docker, Spark clusters, Scala, Python, and Java. You will create basic jobs in multiple programming languages, submit them to the Spark cluster for processing, and see live results.
MORE FREE COURSES: https://datamasterylab.com
⏳ Timestamps:
00:00 Introduction
00:57 Creating The Spark Cluster and Airflow on Docker
11:00 Creating Spark Job with Python
28:51 Creating Spark Job with Scala
37:37 Building and Compiling Scala Jobs
43:23 Creating Spark Job with Java
58:51 Building and Compiling Java Jobs
1:06:15 Cluster computation results
✅ Don't forget to LIKE, COMMENT, SHARE and SUBSCRIBE to our channel for more data engineering projects.
🔗 Resource Links:
Github Code: https://github.com/airscholar/SparkingFlow
Java JDK: https://www.oracle.com/uk/java/technologies/downloads/
Scala SBT installation: https://www.scala-sbt.org/download.html
Maven Installation: https://maven.apache.org/install.html
Spark SQL mvn: https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.2.1
📢 Stay connected:
Follow us on Twitter(X): https://twitter.com/datamasterylab
Connect with us on LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107
Like us on Facebook: https://www.facebook.com/datamasterylab/
🏷️ HashTags:
#ApacheAirflowCourse #DataEngineeringWithAirflow #AirflowOnDocker #SparkDataProcessing #ScalaForSpark #JavaDataEngineering #MavenProjects #BigDataAnalytics #WorkflowAutomation #FullCourse #FreeCourse #Educational #dataengineering
👍 If you found this course helpful, please LIKE and SHARE the video, and leave your thoughts in the COMMENTS below.
🔔 For more tutorials and complete courses, make sure to SUBSCRIBE to our channel and hit the bell icon for notifications!
Realtime Socket Streaming | End to End Data Engineering Project
In this video, you will be building a real-time data streaming pipeline with a dataset of 7 million records. We'll utilize a powerful stack of tools and technologies, including TCP/IP Socket, Apache Spark, OpenAI Large Language Model (LLM), Kafka, and Elasticsearch.
📚 What You'll Learn:
👉 Setting up and configuring TCP/IP for data transmission over Socket.
👉 Streaming Data With Apache Spark from Socket
👉 Realtime Sentiment Analysis with OpenAI LLM (ChatGPT)
👉 Prompt Engineering
👉 Setting up Kafka for real-time data ingestion and distribution.
👉 Using Elasticsearch for efficient data indexing and search capabilities.
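Before Spark enters the picture, the socket source itself is plain TCP. A minimal stdlib sketch of a sender and a receiver (the port and the sample payload are arbitrary; the receiver plays the role that `spark.readStream.format("socket")` plays in the video):

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 9099  # arbitrary local port for the demo

# Server: bind and listen first, then hand off accepting to a thread
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind((HOST, PORT))
srv.listen(1)

def stream_one_line():
    conn, _ = srv.accept()
    with conn:
        conn.sendall(b'{"review": "great food"}\n')

t = threading.Thread(target=stream_one_line)
t.start()

# Client side: connect and read one newline-terminated record
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    data = cli.recv(1024)

t.join()
srv.close()
print(data.decode().strip())
```

The real pipeline streams millions of such newline-delimited records over the same kind of socket.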
✨ Timestamps: ✨
0:00 Introduction
01:10 Creating Spark Master-worker architecture with Docker
10:40 Setting up the TCP/IP Socket Source Stream
23:25 Setting up Apache Spark Stream
42:56 Setting up Kafka Cluster on confluent cloud
47:12 Getting Keys for Kafka cluster and Schema Registry
1:12:53 Realtime Sentiment Analysis with OpenAI LLM (ChatGPT)
1:24:10 Setting up Elasticsearch deployment on Elastic cloud
1:30:50 Realtime Data Indexing on Elasticsearch
1:36:05 Testing and Results
1:41:50 Outro
👦🏻 My Linkedin: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 Twitter: https://twitter.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Code: https://github.com/airscholar/E2EDataEngineering
✅ Medium Article: https://medium.com/@yusuf.ganiyu/real-time-streaming-for-sentiment-analysis-with-sockets-spark-openai-kafka-and-elasticsearch-a577b35a7cb9
✅ Customer Reviews Dataset: https://www.yelp.com/dataset/
✅ Confluent Cloud Docs: https://docs.confluent.io/cloud/current/overview.html
✅ Elasticsearch Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Kafka Official Site: https://kafka.apache.org/
✅ Apache Spark Official Site: https://spark.apache.org/
✨ Tags ✨
Data Engineering, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming, Real-time Data Streaming, OpenAI LLM, Elasticsearch, Data Processing, Data Analytics, TCP/IP, Streaming Solutions, Data Ingestion, Real-time Analysis, Spark Configuration, OpenAI Integration, Kafka Topics, Elasticsearch Indexing, Data Storage, Stream Processing, Machine Learning Integration
✨ Hashtags ✨
#confluent #DataEngineering #TCP #TCPIP #sockets #socketstreaming #Kafka #ApacheSpark #Docker #ETLPipeline #DataPipeline #DataStreaming #OpenAI #Elasticsearch #RealTimeData #BigData #TechTutorial #StreamingAnalytics #MachineLearning #DataFlow #SparkStreaming #DataScience #AIIntegration #RealTimeAnalytics #StreamingData #realtimestreaming #realtime
Reddit Data Pipeline | AWS End to End Data Engineering
🚀 In this video, we walk you through the integration of Reddit, Airflow, Celery, Postgres, S3, AWS Glue, Athena, and Redshift to create a seamless ETL process. 📊🔍
What You Will Learn 📝:
🌐 How to extract data from Reddit using its API.
🔄 Setting up and orchestrating ETL processes with Apache Airflow and Celery.
📦 Storing data efficiently in Amazon S3 using Airflow.
🧠 Leveraging AWS Glue for data cataloging and ETL jobs.
📜 Querying and transforming data with Amazon Athena.
🏢 Setting up a Redshift cluster and best practices for loading data into Amazon Redshift for analytics.
⏰ Timestamps:
0:00 Introduction
1:27 Setting up Apache Airflow with Celery Backend and Postgres
9:20 Reddit Data Pipeline with Airflow
41:00 Cleaning and Transforming Reddit Data
50:00 Connecting to AWS from Airflow
1:11:17 AWS Glue data transformation
1:22:13 Querying Data with Athena
1:24:47 Setting up Redshift Data Warehouse
1:27:26 Redshift Data Warehouse Query Tool
1:29:00 Loading Data into Data Warehouse
1:32:25 Charting with Redshift Data Warehouse
🔗 Useful Links:
Reddit API Documentation: https://www.reddit.com/wiki/api/
Apache Airflow Official Site: https://airflow.apache.org/docs/
AWS Glue Documentation: https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html
💬 Let us know in the comments if you have any questions or if there's another topic you'd like us to cover next!
🌟 Don't forget to like, share, and subscribe for more data tutorials! 🌟
DBT and BigQuery Beginners Crash Course in 30 minutes
Join us as we dive deep into the powerful combination of DBT and BigQuery, the game-changers in modern data engineering. Whether you're a beginner or looking to refine your skills, this tutorial has got you covered!
Unlock the full potential of your data and stay ahead in the analytics game!
⚡️ What You'll Learn:
00:00 🌐 Introduction to DBT & BigQuery
01:27 🛠 Setting up DBT and BigQuery from Scratch
08:00 🔗 Linking DBT and BigQuery
16:13 📝 Writing SQL-based Transformations with DBT
21:27 🔄 Converting Tables to Views with DBT
23:00 📊 Seeding data to BigQuery with DBT
25:00 💡 Writing tests with DBT
27:16 📑 Generating Documentation with DBT
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🌟 Hashtags:
#DataEngineering #DBT #BigQuery #AnalyticsTutorial #DataScience2023
🌟 Tags:
DBT, BigQuery, Data Engineering, SQL Transformations, Google Cloud Platform, Data Analytics, Modern Data Stack, Data Warehousing, ETL Process
Python Webscraping for Beginners
In this tutorial, we'll take a deep dive into web scraping and how you can extract valuable information from news websites using Python and the BeautifulSoup library. We'll walk you through each step, from installing the necessary packages and understanding the basics of HTML structure to writing an effective and efficient Python script. By the end of this video, you'll be proficient in scraping data from news websites, enabling you to analyze and use the data for your needs.
We'll cover:
✅ Setting up your Python environment for web scraping
✅ Understanding the structure of a webpage
✅ How to use BeautifulSoup to parse HTML
✅ Writing a Python script to extract data from news websites
✅ Best practices and how to respect a website's robots.txt rules
✅ Exploring ways to clean and use your newly scraped data
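The video uses BeautifulSoup; the same headline-extraction idea can be sketched with only Python's built-in `html.parser` (the HTML snippet and the `headline` class name are made up for the demo):

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect text found inside <h2 class="headline"> tags."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline and data.strip():
            self.headlines.append(data.strip())

html = ('<h2 class="headline">Budget passed</h2>'
        '<p>body text</p>'
        '<h2 class="headline">Rain expected</h2>')
parser = HeadlineParser()
parser.feed(html)
print(parser.headlines)
```

BeautifulSoup wraps this kind of event-driven parsing in a much friendlier API (`soup.find_all("h2", class_="headline")`), which is what the tutorial walks through.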
GitHub Code - https://github.com/airscholar/PunchScraper
Beautifulsoup Documentation - https://www.crummy.com/software/BeautifulSoup/bs4/doc/
#DataScraping, #WebScraping, #DataExtraction, #DataCrawling, #Python, #DataMining, #WebData, #DataAutomation, #DataAnalysis, #DataCollection, #DataProcessing, #DataScience, #DataEngineering, #DataVisualization, #DataInsights, #CodingTutorial, #TechTutorial, #ProgrammingTips, #DataTools, #DataSkills
Mastering Data Fabrication: Enhance Testing & Protect User Privacy with Python
Are you looking to elevate your application testing while prioritizing user privacy? Dive into the world of data fabrication with us! Learn how to craft realistic, yet entirely fictitious data sets that not only enhance your testing but also adhere to data privacy standards.
What You'll Learn:
✅ The importance of using fake data in application testing.
✅ How to generate basic data elements like names and addresses.
✅ Techniques to create advanced data types, including images and paragraphs.
✅ Best practices to ensure compliance with data regulations.
✅ Mastering 'Faker' and other essential tools for data fabrication.
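The Faker library does the heavy lifting in the video; the underlying idea is simple enough to sketch with the stdlib alone (the name pools here are tiny and purely illustrative):

```python
import random

FIRST = ["Ada", "Grace", "Alan", "Edsger"]
LAST = ["Lovelace", "Hopper", "Turing", "Dijkstra"]
STREETS = ["Main St", "Oak Ave", "Hill Rd"]

def fake_person(rng: random.Random) -> dict:
    """One fabricated record: realistic shape, fictitious content."""
    return {
        "name": f"{rng.choice(FIRST)} {rng.choice(LAST)}",
        "address": f"{rng.randint(1, 999)} {rng.choice(STREETS)}",
        "email": f"user{rng.randint(1000, 9999)}@example.com",
    }

rng = random.Random(42)  # seed so test fixtures are reproducible
person = fake_person(rng)
print(person)
```

Faker scales this up with locale-aware providers for names, addresses, images, paragraphs, and more, which is what the tutorial covers.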
Resources:
Github Code: https://github.com/airscholar/FakeDataGen
Hashtags & Tags:
#FakeData, #ApplicationTesting, #DataPrivacy, #FakerTool, #DataFabrication
Tags: Faker, Data Generation, Application Testing, Data Privacy, Data Compliance
Harnessing ChatGPT with Pandas: Revolutionizing Data Analysis with AI
🎉 Ready to revolutionize your Data Analysis with the power of AI? Dive in as we merge the capabilities of Pandas with Artificial Intelligence, unlocking a new realm of data processing and visualization. 🚀
What You'll Learn:
📊 Enhancing Pandas, Python's renowned data manipulation library, with AI.
🧹 Automating data cleaning: Eliminate missing values and dataset errors.
🎨 Crafting smarter, AI-powered visualizations with Pandas.
🔮 Predictive analytics: Forecast trends and make precise predictions for various metrics.
Resources:
📁 Github Code Repository
📘 Deep Dive Medium Article
🤖 PandasAI Documentation
🐼 Pandas Official Documentation
🔑 OpenAI API Keys
Timestamps:
0:00 Introduction
1:11 Dataset Creation
12:53 AI Integration
20:36 Plotting Charts
23:36 Bar Charts
25:45 Multiple Bar Charts
26:40 Pie charts
🌟 Join our data-driven journey, and don't forget to like, subscribe, and share with fellow data aficionados!
Stay updated for more tech insights and tutorials. Happy coding! 💻
Hashtags:
#DataAnalysis #ArtificialIntelligence #Pandas #PredictiveAnalytics #DataVisualization #AIIntegration #PythonCoding
Tags:
Data Analysis, AI, Pandas, Data Cleaning, AI-Powered Visualization, Predictive Analytics, Python, Dataset Creation, AI Integration, Plotting Charts, Bar Charts, Pie Charts
How to remove background from any image [Python + CLI]
👋 Hey there, Pythonistas! Welcome to today's step-by-step tutorial on how to easily remove backgrounds from any image using Python. You don't need to be a programming whiz to follow along—this tutorial is perfect for beginners!
🔗 Resources
Download Code: https://github.com/airscholar/Background-removal.git
Python Download: https://www.python.org/downloads/
Required Libraries: [PIL, Rembg]
📚 What You'll Learn
1️⃣ Essential Python Libraries for Image Manipulation
2️⃣ Reading and Displaying Images with Python
3️⃣ Saving Your Edited Image
📈 Prerequisites
Basic understanding of Python syntax
Installed Python environment (preferably 3.x)
🛠️ Tools Used
- Python 3.x
- PIL (Pillow)
- Rembg
Please LIKE, SHARE AND SUBSCRIBE!
Outline:
0:00 - Introduction
1:40 - Documentation
3:46 - Removing background via Python
9:37 - Removing background via CLI
13:00 - Removing multiple images background
14:02 - Results
15:14 - Outro
How to Create QR Codes with Python
👋 Welcome to today's tutorial on how to create and customize QR codes using Python. This guide is beginner-friendly, so don't worry if you're new to programming!
📚 What You'll Learn:
1️⃣ Installing necessary Python libraries
2️⃣ Generating a Basic QR Code
3️⃣ Customizing your QR Code with Logo
4️⃣ Customizing your QR Code with Background
5️⃣ Saving your QR Code as an image
🔗 Resources:
Download the Code: https://github.com/airscholar/qrcode-creator.git
Python Download: https://www.python.org/downloads/
🛠️ Tools Used:
Python 3.10
qrcode[pil] library
👍 If you found this video helpful, please give it a thumbs up, leave a comment, and share it with your friends!
🔔 Don't forget to subscribe and ring the notification bell to stay updated with our new tutorials.
OUTLINE:
0:00 - Introduction
1:55 - Creating Plain QR Codes
06:10 - Creating QR Codes with Logo
11:45 - Creating QR codes with custom background
15:54 - Outro
Realtime Data Streaming | End To End Data Engineering Project
In this video, you will be building a real-time data streaming pipeline, covering each phase from data ingestion to processing and finally storage. We'll utilize a powerful stack of tools and technologies, including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra—all neatly containerized using Docker.
📚 What You'll Learn:
👉 Setting up a data pipeline with Apache Airflow
👉 Streaming data with Kafka and Kafka Connect
👉 Using Zookeeper for distributed synchronization
👉 Data processing with Apache Spark
👉 Data storage solutions with Cassandra and PostgreSQL
👉 Containerizing your data engineering environment with Docker
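As a taste of the Kafka step, here is a minimal, hypothetical producer sketch with the kafka-python client. The broker address and topic name are placeholders; the project's actual producer lives in the linked repo:

```python
import json

def serialize(record: dict) -> bytes:
    """JSON-encode a record for use as the Kafka value serializer."""
    return json.dumps(record).encode("utf-8")

def stream_one(record: dict) -> None:
    # kafka-python is imported here so serialize() can be reused (and tested)
    # without a broker or the client library installed.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=serialize)
    producer.send("users_created", record)   # hypothetical topic name
    producer.flush()

# stream_one({"id": 1, "name": "Ada"})
```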
✨ Timestamps: ✨
0:00 Introduction
0:53 System architecture
3:47 Getting data from API with Airflow
17:10 Docker Compose for the architecture
26:09 Streaming data into Kafka
44:29 Apache Spark and Cassandra setup
49:33 Streaming data into cassandra
1:27:05 Outro
👦🏻 My Linkedin: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 Twitter: https://twitter.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Code: https://github.com/airscholar/e2e-data-engineering.git
✅ Medium Article: https://medium.com/@yusuf.ganiyu/realtime-data-engineering-project-with-airflow-kafka-spark-cassandra-and-postgres-804bcd963974
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Kafka Official Site: https://kafka.apache.org/
✅ Apache Spark Official Site: https://spark.apache.org/
✅ Apache Airflow Official Site: https://airflow.apache.org/
✅ Cassandra: https://cassandra.apache.org/
✅ Confluent Docs: https://docs.confluent.io/home/overview.html
✨ Tags ✨
Data Engineering, Apache Airflow, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming
✨ Hashtags ✨
#confluent #DataEngineering #ApacheAirflow #Kafka #ApacheSpark #Cassandra #PostgreSQL #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics
Creating Mathematical Formulas with Python
Are you tired of the tedious process of formatting mathematical equations for your academic papers, presentations, or code? Today, I've got something that will change the game for you! 🎉 You will learn how to effortlessly convert your Python functions into beautifully formatted LaTeX equations. 🚀
👉 What You'll Learn:
✅ What is LaTeX and its importance
✅ Why you should use Latexify
✅ Basic to advanced usage with real-life examples
✅ Limitations and much more!
📚 Read More on Medium:
Don't miss out on our in-depth Medium article covering everything you need to know: "Google's Latexify: Crafting Mathematical Formulas the Pythonic Way"
👍 Like What You See?
Don't forget to like, share, and subscribe for more content like this! Your support helps us create more awesome videos!
Timestamps:
0:00 Introduction
0:45 What is LaTeX?
1:36 Why Use Latexify?
2:03 Installation
2:35 Basic Usage
10:46 Advanced Features
15:18 Documentation and review
16:06 Outro
🔗 Links & Resources:
👉 Colab File Used in the video: https://colab.research.google.com/drive/1kqt0-aEEO88rG3VER7HBQAgLKn5Ee7Vd
👉 Latexify GitHub Repo: https://github.com/google/latexify_py
👉 LaTeX Official Website: https://www.latex-project.org/
👉 Medium Article: https://medium.com/@yusuf.ganiyu/googles-latexify-crafting-mathematical-formulas-the-pythonic-way-613fe4ef2600
🏷️ Tags:
Python, LaTeX, Latexify, Mathematics, DataScience, Academia, Research, Coding, Programming, OpenSource
🌟 Hashtags:
#Python #LaTeX #Latexify #Mathematics #DataScience #Academia #Research #Coding #Programming #OpenSource
Realtime Streaming with Kafka and Telegram | End to End Data Engineering Project
In this comprehensive tutorial, you will build an end-to-end data engineering pipeline for real-time YouTube Analytics. Each time there's activity on any video or playlist of your choice, you get an instant notification on Telegram.
📋 What You Will Learn:
✅ How to fetch data from YouTube API using Python
✅ Setting up a Kafka ecosystem using Docker and Confluent containers
✅ Processing and streaming data using ksqlDB
✅ Sending data to external systems with connectors
✅ Real-time notifications on Telegram
✅ Python Advanced Concepts
✅ Google Cloud configuration for the YouTube API
🛠 Technologies Used:
✅ Python
✅ Google Cloud
✅ Docker
✅ Telegram
🔗 Useful Links:
✅ GitHub Repo for this Project: https://github.com/airscholar/YoutubeAnalytics.git
✅ Confluent Official Documentation: https://docs.confluent.io/home/overview.html
✅ YouTube API Documentation: https://developers.google.com/youtube/v3/docs
✅ Kafka-Python Documentation: https://kafka-python.readthedocs.io/en/master/
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🗨 Comments & Questions
Have questions or ran into issues? Drop a comment below and I'll do my best to help you out!
TIMESTAMPS:
0:00 Introduction
2:21 Setting up the system architecture on Docker
17:46 Control Center Demo
23:20 Getting YouTube API Key from Google Cloud
27:21 Fetching Data From YouTube with Python
37:53 Streaming Data to Kafka
45:13 Advanced Python Concept
59:31 Stream Processing with KSQLDB
1:09:50 Setting up Telegram Bot
1:13:08 Connecting to external systems from Kafka (Telegram)
1:27:18 Outro
✨ Tags ✨
Data Engineering, Kafka, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Schema Registry, Control Center, Data Streaming, Google Cloud, Youtube API, Telegram, KSqlDb, Confluent Connect
✨ Hashtags ✨
#confluent #DataEngineering #ApacheAirflow #Kafka #telegram #PostgreSQL #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics #docker #python #java
How to Blur Faces with Python | Face Anonymization
Learn how to anonymize faces in your images by blurring or pixelating them using Python. This tutorial covers the step-by-step process of detecting and blurring faces, ensuring privacy and anonymity in your projects. Whether you're working on a personal project or managing sensitive data, this tutorial will guide you through a reliable way to handle face anonymization.
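The pixelation effect can be sketched with Pillow alone: shrink the face region, then scale it back up with nearest-neighbour resampling. The helper below is an illustrative approximation, not the exact code from the repo; face boxes would come from a detector such as MTCNN.

```python
from PIL import Image

def pixelate(img, factor: int = 10):
    """Pixelate an image by downscaling, then upscaling with nearest-neighbour."""
    w, h = img.size
    small = img.resize((max(1, w // factor), max(1, h // factor)), Image.NEAREST)
    return small.resize((w, h), Image.NEAREST)

# For each detected face box (left, top, right, bottom):
#   img.paste(pixelate(img.crop(box)), box)
```

A larger `factor` gives coarser blocks and stronger anonymization; Gaussian blur is a drop-in alternative for the translucent variant shown in the video.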
📌 Timestamps:
0:00 - Introduction
0:51 - Documentation and Libraries Installations
4:20 - Writing the code
10:20 - Translucent Anonymization on CPU
20:47 - Pixelate Anonymization on CPU
25:05 - Pixelate Anonymization on GPU
29:00 - Comparison of results
34:12 - Outro
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Links:
Code Repository: https://github.com/airscholar/Face-Anonymizer.git
OpenCV Documentation: https://opencv.org/documentation/
MTCNN GitHub Repo: https://github.com/ipazc/mtcnn
🔑 Tags:
#FaceBlurring #FacePixelation #OpenCV #MTCNN #PythonTutorial #ImageProcessing #DataAnonymization #FaceDetection #computervision
Football Data Analytics | Azure End To End Data Engineering Project
In this tutorial, we dive deep into the world of football data engineering. We'll walk through the entire process of extracting football data from Wikipedia using Apache Airflow, storing it in Azure Data Lake, migrating the data with Azure Data Factory, querying with Azure Synapse, and finally visualizing our findings in Tableau.
🔗 Timestamps:
0:00 - Introduction
3:15 - Setting up the Infrastructure
8:45 - Extracting data from Wikipedia
35:00 - Cleaning the Data
47:00 - Transforming the Data
53:00 - Enriching the dataset with Lat and Long
1:01:11 - Writing the cleaned data to file
1:14:40 - Outro
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
📚 Resources & Links:
0. Code: https://github.com/airscholar/FootballDataEngineering.git
1. Apache Airflow Documentation: https://airflow.apache.org/docs/
2. Azure Data Lake Documentation: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
3. Azure Data Factory Documentation: https://learn.microsoft.com/en-us/azure/data-factory/
4. Azure Synapse Documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/overview-what-is
5. Medium Link: https://medium.com/@yusuf.ganiyu/football-data-analytics-from-wikipedia-through-azure-and-apache-airflow-to-tableau-5edeb035cc0b
⚡️Tags ⚡️
Football, Data Engineering, Apache Airflow, Docker, Azure Data Lake, Azure Data Factory, Azure Synapse, Tableau, Data Visualization, Wikipedia Data Extraction, Tutorial
⚡️Hashtags ⚡️
#FootballData #DataEngineering #ApacheAirflow #Azure #Tableau #WikipediaData #DataVisualization #docker
Football Data Analytics | Azure End To End Data Engineering Project - Part 2
In this tutorial, we dive deep into the world of football data engineering. We'll walk through the entire process of extracting football data from Wikipedia using Apache Airflow, storing it in Azure Data Lake, migrating the data with Azure Data Factory, querying with Azure Synapse, and finally visualizing our findings in Tableau.
🔗 Timestamps:
0:00 - Introduction
1:00 - Creating a free Azure account
2:15 - Setting up storage account
7:30 - Pushing Data from Airflow to Azure Data Lake
12:13 - Setting up Data Factory
16:21 - Data Integration with Data Factory
26:26 - Setting up Synapse Analytics
37:33 - Writing Complex Queries with Synapse
58:30 - Outro
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
📚 Resources & Links:
1. Apache Airflow Documentation: https://airflow.apache.org/docs/
2. Azure Data Lake Documentation: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
3. Azure Data Factory Documentation: https://learn.microsoft.com/en-us/azure/data-factory/
4. Azure Synapse Documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/overview-what-is
5. Medium Link: https://medium.com/@yusuf.ganiyu/football-data-analytics-from-wikipedia-through-azure-and-apache-airflow-to-tableau-5edeb035cc0b
⚡️Tags ⚡️
Football, Data Engineering, Apache Airflow, Docker, Azure Data Lake, Azure Data Factory, Azure Synapse, Tableau, Data Visualization, Wikipedia Data Extraction, Tutorial
⚡️Hashtags ⚡️
#FootballData #DataEngineering #ApacheAirflow #Azure #Tableau #WikipediaData #DataVisualization #docker
Football Data Analytics | Azure End To End Data Engineering Project - Part 3
In this tutorial, we dive deep into the world of football data engineering. We'll walk through the entire process of extracting football data from Wikipedia using Apache Airflow, storing it in Azure Data Lake, migrating the data with Azure Data Factory, querying with Azure Synapse, and finally visualizing our findings in Tableau.
🔗 Timestamps:
0:00 - Introduction
1:00 - Creating a free Azure account
2:15 - Setting up storage account
7:30 - Pushing Data from Airflow to Azure Data Lake
12:13 - Setting up Data Factory
16:21 - Data Integration with Data Factory
26:26 - Setting up Synapse Analytics
37:33 - Writing Complex Queries with Synapse
58:30 - Outro
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
📚 Resources & Links:
1. Apache Airflow Documentation: https://airflow.apache.org/docs/
2. Azure Data Lake Documentation: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
3. Azure Data Factory Documentation: https://learn.microsoft.com/en-us/azure/data-factory/
4. Azure Synapse Documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/overview-what-is
5. Medium Link: https://medium.com/@yusuf.ganiyu/football-data-analytics-from-wikipedia-through-azure-and-apache-airflow-to-tableau-5edeb035cc0b
⚡️Tags ⚡️
Football, Data Engineering, Apache Airflow, Docker, Azure Data Lake, Azure Data Factory, Azure Synapse, Tableau, Data Visualization, Wikipedia Data Extraction, Tutorial
⚡️Hashtags ⚡️
#FootballData #DataEngineering #ApacheAirflow #Azure #Tableau #WikipediaData #DataVisualization #docker