OpenStack in the Real World

Audience:

A fully integrated OpenStack cluster, especially one utilizing Ironic, touches all parts of an environment. When it works well, it's great! Users can provision instances quickly and manage them without fail. That simple, abstracted interface makes it easy to take for granted the complicated chain of events that has to succeed for each instance to be provisioned.

Hardware failures, network blips, and power outages are only a few of the issues you may encounter in the real world. As complexity stacks up, so do the potential failure points -- and OpenStack generally manages this well. However, as is typical in IT, when the job is done well, nobody notices. OpenStack is a victim of this perception bias: when it works, it stays in the background, but when it fails, it's difficult to ignore.

We'll discuss several ways to mitigate this perception, including:

  • Ensuring monitoring differentiates physical failures from software failures
  • Using capacity planning and availability zones to improve the user experience during failures
  • Marketing successes internally to your non-technical managers and staff
  • Setting reasonable uptime expectations

Time:
Thursday, October 31, 2024 - 11:00