Wednesday, September 25, 2019

Dashboards, Metrics and Observability

SHOW: 416

DESCRIPTION: Brian talks with Björn Rabenstein (Engineer at @Grafana) about the intersection of Dashboards, Metrics, Monitoring and Observability.

SHOW SPONSOR LINKS:


CLOUD NEWS OF THE WEEK:

SHOW INTERVIEW LINKS:

SHOW NOTES:

Topic 1 - Welcome to the show. Tell us about your background prior to joining Grafana Labs (worked with Julius Volz at SoundCloud; guest on The Cloudcast in Eps. 263 and 319).

Topic 2 - I saw a tweet the other day that said, “CIO directive to cut contracts because they have 37 monitoring tools and still the reliability is poor...” Your talk at VelocityConf is about the hype around observability and monitoring. What is the state of Ops visibility?

Topic 3 - Let’s start by talking about good hygiene and good practices. What types of things should Ops teams, SREs and even Developers always be doing to have good visibility into their environments?

Topic 4 - What are the big mistakes that companies make, or what anti-patterns are becoming more pervasive? 

Topic 5 - As a builder of tools and an operator of tools, what are some of the things you wish more developers knew, but maybe don’t know to ask about?

FEEDBACK?

Wednesday, September 18, 2019

Chaos Engineering and Team Health

SHOW: 415

DESCRIPTION: Brian talks with Paul Osman (@paulosman, SRE Engineering Manager @UnderArmour) about aligning business value to Chaos Engineering, measuring its impact, and changing team culture to embrace the chaos.

SHOW SPONSOR LINKS:


CLOUD NEWS OF THE WEEK:

SHOW INTERVIEW LINKS:

SHOW NOTES:

Topic 1 - Welcome to the show. Before we get into Chaos Engineering, let’s talk a little bit about your background and some of the things you did prior to joining Under Armour. 

Topic 2 - We’ve talked about Chaos Engineering a few times on the show before. At a company level, what are some of the areas (e.g., Connected Health) where it makes sense for Under Armour to invest in Chaos Engineering and develop expertise in this discipline?

Topic 3 - Walk us through how a team at Under Armour thinks about Chaos Engineering: from the business need, to scheduling it (or not scheduling it), to measuring it, and then to communicating the results back within your team and to management.

Topic 4 - I think many people see Chaos as a periodic event, like a DR test, but in reality it needs to be more of an ongoing activity. How do you connect the dots between this ongoing Chaos and actual problems in your systems, and how and when do you measure problems (or decide what to measure)?

Topic 5 - What is the most difficult part about getting a team’s culture to accept that Chaos is an important part of day-to-day activities, and that “failure” is part of the system?

FEEDBACK?

Wednesday, September 11, 2019

Knative Serverless

SHOW: 414

DESCRIPTION: Brian talks with Sebastien Goasguen (@sebgoa, CTO/Co-Founder at @TriggerMesh) about the evolution of the Knative project.

SHOW SPONSOR LINKS:


CLOUD NEWS OF THE WEEK:

SHOW INTERVIEW LINKS:

SHOW NOTES:

Topic 1 - Welcome back to the show. Almost a year ago, you launched TriggerMesh with Mark Hinkle. How is the business doing? 

Topic 2 - A couple of years ago, you helped create a technology called Kubeless, to do Serverless/FaaS on Kubernetes. And then Knative came along. For people who aren’t familiar with Knative, can you give us a TL;DR on what it is and how it has evolved as a standard for Kubernetes?

Topic 3 - Let’s talk about the different elements of Knative and how each one of them is evolving - Build, Serving and Eventing. 

Topic 4 - Can we talk about the differences between “Serverless” and “Functions-as-a-Service”, especially in the context of different frameworks, and event sources?

Topic 5 - TriggerMesh was very early in delivering Serverless or Functions-as-a-Service via Knative. What are some of the lessons you’ve learned (use-cases, customer preferences, areas of education) over the last year?

Topic 6 - Do you have any insight into some of the things that might be coming next in Knative? 

FEEDBACK?

Wednesday, September 4, 2019

Everything is a Little Bit Broken

SHOW: 413

DESCRIPTION: Brian talks with Heidi Waterhouse (@wiredferret, Developer Advocate @LaunchDarkly) about the challenges of balancing stability and agility, from both a technology perspective and a cultural perspective.

SHOW SPONSOR LINKS:


CLOUD NEWS OF THE WEEK:

SHOW INTERVIEW LINKS:

SHOW NOTES:

Topic 1 - Welcome to the show. We’re going to talk about broken systems today, but before we get into that, let’s talk about your background, and what types of things you work on at LaunchDarkly.

Topic 2 - As we start seeing companies adopt a lot of these new technologies and methods (Agile, DevOps, Microservices, Distributed Systems, Cloud-Native Apps, Continuous Integration, etc.) we’re seeing them go through this interesting transformation of having to think differently about how things should work and how they might break. This is an area that you talk about quite a bit. 

Topic 3 - There is a five-nines mentality and a release-daily-into-production mentality, and they seem at odds with each other. How do we start figuring out how to manage the big space in between those two world views? Or can they be the same?

Topic 4 - By adding error budgets, layered access, and other accommodations for failure, and by designing our systems for function over form or purity, we learn how to add resiliency to our systems: we trust our underlying tools, but mitigate our reliance on their perfect performance.

Topic 5 - You get to talk to a lot of developers and architects. What are some of the best ways you’ve seen them not only grasp these concepts, but also communicate them up their management chains to educate them about the terminology and concepts?


FEEDBACK?