Showing posts with label Data Lake. Show all posts

Wednesday, February 15, 2023

Data Lakehouses and Apache Hudi

Kyle Weller (@KyleJWeller, Head of Product @onehousehq) talks about the latest trends in OSS Data Lakes, Data Warehouses, and the evolution to “Data Lakehouses” with Apache Hudi.

SHOW: 694

CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw

NEW TO CLOUD? CHECK OUT - "CLOUDCAST BASICS"

SHOW SPONSORS:

SHOW NOTES:

Topic 1 - Welcome to the show. Tell us a little bit about your background and where you focus your efforts at Onehouse.

Topic 2 - Your focus is on an emerging open source project, Apache Hudi. Before we dive into the project and technologies, we’re always interested in the background of what drove the creation of new projects. What problems existed before Hudi? 

Topic 3 - Let’s dive into Hudi. Data lakes, Delta Lakes, Lakehouses, Icebergs. What is going on with all these water metaphors?

Topic 4 - Hudi is focused on streaming data lakes. What are some of the things (types of applications) that need a streaming data lake? Where do transactions come into play? Where do data warehouse capabilities come into play?

Topic 5 - Stitching together open source projects and platforms can be complicated. How does the Onehouse platform simplify all of this for either data scientists or platform teams?

Topic 6 - What are some examples of how companies are using Onehouse and Hudi today? 

FEEDBACK?

  • Email: show at the cloudcast dot net
  • Twitter:

Wednesday, July 22, 2020

Introduction to Data Mesh

Zhamak Dehghani (@zhamakd, Portfolio Tech Director @ThoughtWorks) talks about the concepts behind Data Mesh, the challenges and problems of Data Lakes / Data Warehouses, and how Cloud-native principles can be applied to Data. 

SHOW: 459

SHOW SPONSOR LINKS:


CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw

PodCTL Podcast is Back (Enterprise Kubernetes) - http://podctl.com

SHOW NOTES:

Topic 1 - Welcome to the show. We were introduced to you through the O’Reilly events, but you’ve been involved in software development and architecture for quite a while. Tell us a little bit about your background and your focus areas at ThoughtWorks.

Topic 2 - About a year ago, you introduced this new concept called “Data Mesh”. Before we get into that, give us a little bit of background on the problems that previous generations of Data Warehouses or Data Lakes created. 

Topic 3 - Let’s begin to walk through how Data Mesh is different from a Data Lake. We’re not talking about just dumping all the various data sources into one “pool”; there’s a concept of “domains” within this big pool of data. What are the new concepts of source and consumption?

Topic 4 - Explain the concept of how pipelines are tied into Data Mesh and how this allows the creation of new products/features from the Data Mesh.

Topic 5 - You talk about the data being truthful, and then you bring the SRE concept of an SLO into the truthfulness of the data. How might that work?

Topic 6 - Once a Data Mesh is in place, what are the “roles” (or teams) that have specific tasks, and who are the typical consumers of the Data Mesh platform?


FEEDBACK?