Netdata & Ansible example: ML demo room

Automating Monitoring Configuration for Efficiency and Consistency

We are always trying to lower the barrier to entry when it comes to monitoring and observability. One place where we have consistently seen users struggle is in adopting configuration management tools and practices as their infrastructure grows and becomes more complex.

To that end, we have recently begun publishing our own small example Ansible project, which we use to maintain and manage the servers behind our public Machine Learning Demo room.

This post introduces the project as a fairly simple example of using Ansible with Netdata. Read on to learn more, but more importantly, feel free to explore the repo and see how it all hangs together.

Project Structure

If you are not that familiar with Ansible, the official getting started tutorial is a great place to start. Some of the main concepts (“inventory”, “task”, “playbook”) map fairly directly onto the project structure below.

Like any great tool, Ansible offers a lot of flexibility and many different ways to achieve your goals. For our use case here, managing a handful of somewhat homogeneous servers that make up the ML demo room, the general structure below has worked fine for us so far.

  • host_vars - YAML files holding host-specific variables.
  • playbooks - various playbooks (collections of tasks) for common maintenance and configuration management activities.
  • tasks - YAML files defining low-level tasks, with related tasks grouped in folders (e.g. Netdata tasks live in tasks/netdata).
  • templates - templated files, typically configuration files (all use Jinja2).
  • vars - separate variable files for each system or component, used by templates and tasks.
  • inventory.yaml - a list of all the hosts managed by this Ansible project, as well as one or two global default variables.
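
To make the inventory idea concrete, here is a minimal sketch of what an inventory.yaml of this shape could look like. It is not copied from the repo: the host names, IP addresses, and the choice of `ansible_user` as the global default are all assumptions made for illustration.

```yaml
# inventory.yaml (illustrative sketch only; hosts and values are hypothetical)
all:
  vars:
    # A global default applied to every host unless overridden in host_vars.
    ansible_user: ubuntu
  hosts:
    ml-demo-node-1:
      ansible_host: 203.0.113.10   # placeholder address from a documentation range
    ml-demo-node-2:
      ansible_host: 203.0.113.11   # placeholder address from a documentation range
```

Anything that differs per host would then live in a file named after that host under host_vars (for example host_vars/ml-demo-node-1.yaml in this hypothetical setup), which keeps the inventory itself short.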

That’s pretty much all there is to it. The best way to get started is to look at some examples of tasks (for example, this task to define some ML-based alerts) and playbooks (for example, restart Netdata and check its status before and after), then start simple and adapt them to your needs.
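
As a rough idea of what a "restart Netdata and check its status before and after" playbook can look like, here is a minimal sketch. It is not the playbook from the repo: the file name is hypothetical, and it assumes the hosts run systemd with the agent installed as a service named `netdata`.

```yaml
# playbooks/restart_netdata.yaml (hypothetical file name; a sketch, not the repo's playbook)
- name: Restart Netdata and check its status before and after
  hosts: all
  become: true
  tasks:
    - name: Check Netdata status before restart
      ansible.builtin.command: systemctl status netdata
      register: status_before
      changed_when: false   # a status check never changes the host
      failed_when: false    # keep going even if the service is currently down

    - name: Show status before restart
      ansible.builtin.debug:
        var: status_before.stdout_lines

    - name: Restart Netdata
      ansible.builtin.service:
        name: netdata        # assumed service name
        state: restarted

    - name: Check Netdata status after restart
      ansible.builtin.command: systemctl status netdata
      register: status_after
      changed_when: false   # fails the play if the service did not come back up

    - name: Show status after restart
      ansible.builtin.debug:
        var: status_after.stdout_lines
```

With an inventory file in the project root, a playbook like this would typically be run with `ansible-playbook -i inventory.yaml playbooks/restart_netdata.yaml`.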

Below are some useful links and resources, both for users new to configuration management and for those looking to tackle more advanced tasks and more complex Netdata-specific use cases.

Useful Resources

If you’re interested in configuration management use cases for observability, whether in general or specific to Netdata, come hang out in our Discord or community forums!