Modern SRE teams operate in increasingly complex environments that span multiple clouds, services, and identity systems. Maintaining reliability depends on understanding what exists across this estate, how it is configured, and how it changes over time. Yet this data is often fragmented across APIs, consoles, and configuration stores that do not speak the same language.
We set out to address this challenge by adopting CloudQuery, a framework that lets teams extract and normalize cloud, SaaS, and infrastructure data into relational tables. By consolidating this data, we built a continuously updated inventory that provides a shared view of our cloud environment. This inventory has become a foundation for reliability engineering, incident response, governance, and product decision-making.
In this talk, we will share how we used CloudQuery to unify asset and configuration data, the architectural and operational decisions we made along the way, and how this visibility now supports not only SRE but also teams across the company. Topics include:
- Why we chose CloudQuery and how we integrated it into our infrastructure
- Building custom plugins to extend coverage and extract the data we care about most
- How this system supports reliability workflows: incident response, on-call investigations, and capacity reviews
- How shared visibility strengthened collaboration between SRE, Product, and GRC teams
This talk is aimed at SREs, platform engineers, and operations teams who want to improve reliability through better visibility and shared data. Attendees will come away with practical lessons on building an internal asset inventory, extending tooling to fit their environment, and using that data to inform technical and organizational decisions.
Session Outline
- Introduction and Motivation (5 minutes)
  - The visibility problem: fragmented data across clouds and services
  - Why asset and configuration data matter for reliability
  - The goals that led us to build a unified inventory
- Selecting and Integrating CloudQuery (7 minutes)
  - Why we chose CloudQuery as the foundation
  - How it runs within our infrastructure and existing pipelines
  - Extending functionality with custom plugins for internal systems
- Using the Inventory for Reliability (8 minutes)
  - How we query the inventory to support drift detection, dependency analysis, and change tracking
  - Connecting inventory data to reliability metrics and operational reviews
- Operationalizing the System (8 minutes)
  - Managing rate limits, scheduling, and data freshness
  - Integrations with monitoring, alerting, and incident response workflows
- Impact and Lessons Learned (7 minutes)
  - How this data transformed our reliability practices
  - How Product and GRC teams now use the same data to inform their work
  - Key lessons and what we would do differently next time
- Q&A (10 minutes)
Audience Take-Aways
- A practical approach to building a unified cloud inventory using extensible data collection tooling.
- Lessons learned from running and scaling an asset inventory pipeline in production.
- How integrating this data into incident response and review processes improves reliability outcomes.
- How shared visibility across SRE, Product, and GRC teams enables faster decisions and stronger operational alignment.



