Many organizations have migrated their data warehouses to data lake solutions in recent years.
With the convergence of the data warehouse and the data lake, a new data management paradigm has emerged that combines the best of two approaches: the bottom-up approach of big data and the top-down approach of the classic data warehouse.
Talk (PyData: Data Engineering)
Description
In this talk, I will explain the current challenges of a data lake and how we can approach a
modern data architecture with the help of PySpark and table formats such as Hudi, Delta Lake (delta.io), or Iceberg.
We will see how to organize data in a data lake to support real-time processing
and analytics across all varieties of data sets, structured and unstructured, how it provides
the scale needed to support enterprise-wide digital transformation, and how it creates a single source of data
for multiple audiences.
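To make this concrete, here is a minimal PySpark sketch of one common lakehouse pattern: landing raw records in a data lake and upserting them into a curated Delta Lake table so it remains a single source of truth for downstream consumers. The paths, the `vacancy_id` and `country` columns, and the job-vacancy framing are illustrative assumptions, not code from the talk; a similar pattern applies to Hudi or Iceberg with their respective APIs.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Sketch only: assumes the delta-spark package is installed; all paths
# and column names below are hypothetical.
spark = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# Hypothetical source: new records landed as JSON in the raw zone,
# de-duplicated on the business key before loading.
updates = spark.read.json("/lake/raw/vacancies/").dropDuplicates(["vacancy_id"])

# Initial load: materialize the curated Delta table, partitioned for queries.
(updates.write.format("delta")
    .mode("overwrite")
    .partitionBy("country")
    .save("/lake/curated/vacancies"))

# Incremental loads: MERGE folds new or changed records into the table,
# keeping one consistent copy of the data for every audience.
target = DeltaTable.forPath(spark, "/lake/curated/vacancies")
(target.alias("t")
    .merge(updates.alias("u"), "t.vacancy_id = u.vacancy_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```

Because the curated table is transactional, the same files can serve both streaming jobs and batch analytics without the two readers seeing inconsistent states.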
Mauro Pelucchi is a senior data scientist and big data engineer
responsible for the design of the "Real-Time Labour Market Information System on Skill Requirements" for CEDEFOP.
He currently works as Head of Global Data Science @ EMSI Burning-Glass, where he develops innovative models, methods, and deployments of labour market data and other data to meet customer requirements and to prototype new potential solutions. His main tasks involve advanced machine learning modelling, labour market analyses, and the design of big data pipelines to process large datasets of online job vacancies.
In collaboration with the University of Milano-Bicocca, he has taken part in many research projects related to labour market intelligence systems.
He collaborates with the University of Milano-Bicocca as a lecturer in the Master's in Business Intelligence and Big Data Analytics, and with the University of Bergamo as a lecturer in Computer Engineering.