Netdata Team

November 30, 2022

Release 1.37.0: Infinite scalability, database tiering, and much more

Continuing to Innovate and Improve with New Release Features

Another release of the Netdata Monitoring solution is here!

We focused on these key areas:

  1. Infinite scalability of the Netdata Ecosystem.
  2. Default database tiering, offering months of data retention for typical Netdata Agent installations with default settings, and years of data retention for dedicated Netdata Parents.
  3. Overview dashboards in Netdata Cloud received a ton of improvements, allowing slicing and dicing of data directly on the UI and overcoming the limitations of web technology when thousands of charts are presented on one page.
  4. Integration with Grafana for custom dashboards, using Netdata Cloud as an infrastructure-wide time-series data source for metrics.
  5. Completely rewritten PostgreSQL monitoring, offering state-of-the-art monitoring of database performance and health, down to the table and index level.

Release v1.37

IMPORTANT NOTICE

This release fixes two security issues, one in streaming authorization and another at the execution of alarm notification commands. All users are advised to update to this version or any later! Credit goes to Stefan Schiller of SonarSource.com for identifying both of them. Thank you, Stefan!


Read more about this release in the following sections!

❗ We’re keeping our codebase healthy by removing features that are end of life. Read the deprecation notices to check if you are affected.

Netdata open-source growth

  • Over 61,000 GitHub stars
  • Almost four million monitored servers
  • Almost 85 million sessions served
  • Rapidly approaching a half million total nodes in Netdata Cloud

Release highlights

Infinite scalability

Scalability is one of the biggest challenges of monitoring solutions. Almost every commercial or open-source solution assumes that metrics should be centralized to a time-series database, which is then queried to provide dashboards and alarms. This centralization, however, has two key problems:

  1. The scalability of the monitoring solution is significantly limited, since growing these central databases quickly becomes difficult, if it is possible at all.
  2. To improve scalability and control the monitoring infrastructure cost, almost all solutions limit granularity (the data collection frequency) and cardinality (the number of metrics monitored).

At Netdata we love high fidelity monitoring. We want granularity to be “per second” as a standard for all metrics, and we want to monitor as many metrics as possible, without limits.

The only way to achieve our goal is by scaling out. Instead of centralizing everything into one huge time-series database, we have many smaller centralization points that can be used seamlessly together, like a giant distributed database. This is what Netdata Cloud does! It connects to all your Netdata Agents and seamlessly aggregates data from all of them to provide infrastructure- and service-level dashboards and alarms.


Netdata Cloud does not collect or store the metrics itself; that is one of its most beautiful and unique qualities. It only needs active connections to the Netdata Agents that hold the metrics. The Netdata Agents store all metrics in their own time-series database (we call it dbengine, and it is embedded in the Netdata Agent).

In this release, we introduce a new way for the Agents to communicate their metadata to the cloud. To minimize the amount of traffic exchanged between Netdata Cloud and Agents, we transfer only a very limited set of metadata. We call this information contexts, and it is pretty much limited to the unique metric names collected, coupled with the actual retention (first and last timestamps) that each Agent has available for query.

At the same time, to overcome the limitations of having hundreds of thousands of Agents concurrently connected to Netdata Cloud, we are now using EMQX as the message broker that connects Netdata Agents to Netdata Cloud. As the community grows, the next step planned is to have such message brokers in five continents, to minimize the round-trip latency for querying Netdata Agents through Netdata Cloud.

We also see Netdata Parents as a key component of our ecosystem. A Netdata Parent is a Netdata Agent that acts as a centralization point for other Netdata Agents. The idea is simple: any Netdata Agent (Child) can delegate all its functions, except data collection, to any other Netdata Agent (Parent), and by doing so, the latter now becomes a Netdata Parent. This means that metrics storage, metrics querying, health monitoring, and machine learning can be handled by the Netdata Parent, on behalf of the Netdata Children that push metrics to it.

This functionality is crucial for our ecosystem for the following reasons:

  1. Some nodes are ephemeral and may vanish at any point in time. But we need their metric data.
  2. Other nodes may be too sensitive to run all the features of a Netdata Agent. On such nodes we need a way to use the absolute minimum of system resources for anything other than the core application the node is hosting. So, on these Netdata Agents we can disable metrics storage, health monitoring, and machine learning, and push all metrics to another Netdata Agent that has resources to spare for these tasks.
  3. High availability of metric data. In our industry, “one = none.” We need at least 2 of everything and this is true for metric data too. Parents allow us to replicate databases, even having different retention on each, thus significantly improving the availability of metrics data.
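The thin-Child pattern described above can be sketched in configuration. This is a hedged illustration, not an exact recipe: option names follow recent Agent documentation and may differ between versions (older Agents used `[global] memory mode` instead of `[db] mode`):

```ini
# netdata.conf on a resource-constrained Child (illustrative sketch)
[db]
    mode = none        # keep no local metrics database; stream everything out
[health]
    enabled = no       # alerting is handled by the Parent
[ml]
    enabled = no       # machine learning is handled by the Parent
```

With stream.conf pointing at a Parent, such a Child spends its resources almost exclusively on data collection.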

In this release we introduce significant improvements to Netdata Parents:

  1. Streaming Compression
    The communication between Netdata Agents is now compressed using LZ4 streaming compression, saving more than 70% of the bandwidth. TLS communication was already implemented and can be combined with compression.
  2. Active-Active Parents Clusters
    A Parent cluster of 2+ nodes can be configured by linking each of the parents to the others. Our configuration can easily take care of the circular dependency this implies. For 2 nodes you configure : A->B and B<-A. For 3 nodes: A->B/C, B->A/C, C->A/B. Once the parents are setup, configure Netdata Agents to push metrics to any of them (for 2 Parent nodes: A/B, for 3 Parent nodes: A/B/C). Each Netdata Agent will send metrics to only one of the configured parents at a time. But any of them. Then the Parent agents will re-stream metrics to each other.
  3. Replication of past data
    Now Parents can request missing data from each other and the origin data collecting Agent. This works seamlessly when two agents connect to each other (both have to be latest version). They exchange information about the retention each has and they automatically fill the gaps of the Parent agent, ensuring no data are lost at the Parents, even if a Parent was offline for some time (the default max replication duration is 1 day, but it can be tuned in stream.conf - and the connecting Agent Child needs to have data for at least that long in order for them to be replicated).
  4. Performance Improvements
    Now Netdata Parents can digest about 700k metric values per second per origin Agent. This is a huge improvement over the previous one of 400k. Also, when establishing a connection, the agents can accept about 2k metadata definitions per second per origin Agent. We moved all metadata management to a separate thread, and now we are experiencing 80k metric definitions per second per origin Agent, making new Agent connections enter the metrics streaming phase almost instantly.

All these improvements establish a huge step forward in providing an infinitely scalable monitoring infrastructure.

Database retention

Many users think of the Netdata Agent as an amazing single-node monitoring solution offering limited, real-time metrics retention. This has changed gradually over the years as we introduced dbengine for storing metrics, and especially with the introduction of database tiering in the previous release, allowing Netdata to downsample metrics and store them for a longer duration.

As of this release, we now enable tiering by default! So, a typical Netdata Agent installation, with default settings, will now have 3 database tiers, offering a retention of about 120 - 150 days, using just 0.5 GB of disk space!
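The default tiering can be inspected and tuned in netdata.conf. A hedged sketch of the relevant section follows (tier resolutions shown as comments; consult the dbengine documentation for the exact option names and sizing knobs):

```ini
[db]
    mode = dbengine
    storage tiers = 3
    # tier 0 stores metrics as collected (per-second, by default)
    # tier 1 stores aggregated points (e.g. per-minute min/avg/max)
    # tier 2 aggregates further (e.g. per-hour), enabling months of retention
```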

This is coupled with another significant achievement. Traditionally, the Agent dashboard showed only currently collected metrics. The Netdata Cloud dashboard, however, should present all the metrics available for the selected time-frame, independently of whether they are currently being collected. This is especially important for highly volatile environments, like Kubernetes, where metrics come and go all the time.

So, in this release, we rewrote the query engine of the Netdata Agent to properly query metrics independently of whether they are currently being collected. In practice, the Agent is now split into two big modules: data collection and querying. These two parts no longer depend on each other, allowing dashboards to query metrics for any time-frame for which data are available.
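The split can be illustrated with a minimal sketch (conceptual Python, not Netdata's code): the query path answers purely from retention, so a metric remains queryable after its collector stops.

```python
class MetricStore:
    """Toy time-series store with decoupled collection and query paths."""

    def __init__(self):
        self.series = {}  # metric name -> list of (timestamp, value) points

    def collect(self, metric, timestamp, value):
        # collection path: appends points while a collector is running
        self.series.setdefault(metric, []).append((timestamp, value))

    def query(self, metric, start, end):
        # query path: answers from retention alone; no collector involved
        return [(t, v) for t, v in self.series.get(metric, [])
                if start <= t <= end]

store = MetricStore()
store.collect("disk.io", 100, 1.5)
store.collect("disk.io", 101, 2.5)
# the collector may be gone by now; the retained data is still queryable
points = store.query("disk.io", 100, 101)  # [(100, 1.5), (101, 2.5)]
```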

This feature of querying past data even for non-collected metrics is available now via Netdata Cloud Overview dashboards.

New and improved system service integration

We have completely rewritten the part of the installer responsible for setting up Netdata as a system service. This includes a number of major improvements over the old code, including the following:

  • Instead of deciding which type of system service to install based on the distribution name and release, we now actively detect which service manager is in use and use that. This provides significantly better behavior on non-systemd systems, many of which were not actually getting the correct service type installed.
  • On FreeBSD systems, we now correctly install the rc.d script for Netdata to /usr/local/etc/rc.d instead of /etc/rc.d.
  • We now correctly enable and disable the agent as a system service for all service managers we officially support. In particular, this means that users who are using a supported service manager should not need to do anything to enable the service.
  • Similarly, we now properly start the agent through the system service manager for all supported service managers.
  • We now have improved support for installing as a system service under WSL, including support for systemd in WSL, and correct fallbacks to LSB or initd style init scripts. This should make using Netdata under WSL much easier.
  • We now support installing service files for Netdata on offline systemd or OpenRC systems. This should greatly simplify installing the agent in containers or as part of setting up a virtual machine template.
  • Numerous minor improvements.

Additionally, this release includes a number of improvements to our OpenRC init script, bringing it more in-line with best practices for OpenRC init scripts, fixing a handful of bugs, and making it easier to run Netdata under OpenRC’s native process supervision.

We plan to continue improving this area in upcoming release cycles as well, including further improvements to our OpenRC support and preliminary support for installing Netdata as a service on systems using Runit.

Plugins function extension

As of this release, plugins can register functions with the agent that can be executed on demand to provide real-time, detailed, and specific chart data. Via streaming, function definitions are now transmitted to a Parent and seamlessly exposed there.
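Conceptually, this works like a registry of named, on-demand callables. The sketch below is illustrative Python, not the actual plugins.d protocol; the function name and parameters are hypothetical.

```python
class FunctionRegistry:
    """Toy model of plugins registering on-demand functions with an agent."""

    def __init__(self):
        self.functions = {}  # function name -> callable returning fresh data

    def register(self, name, fn):
        self.functions[name] = fn

    def call(self, name, **params):
        if name not in self.functions:
            raise KeyError(f"no plugin registered function {name!r}")
        return self.functions[name](**params)  # executed on demand, real time

registry = FunctionRegistry()
# hypothetical plugin exposing per-process detail for one of its charts
registry.register("top-processes", lambda n=3: [f"proc{i}" for i in range(n)])
result = registry.call("top-processes", n=2)  # ['proc0', 'proc1']
```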

Disk based data indexing

Agents now build an optimized disk-based index file, reducing memory requirements by up to 90%. In turn, Agent startup time improved by 1,000% (you read that right; this is not a typo!).

Overview dashboard

The Overview dashboard is the key dashboard of the Netdata ecosystem. We are constantly putting effort into improving this dashboard so that it will eventually be unnecessary to use anything else.

Unlike the Netdata Agent dashboard, the Netdata Cloud Overview dashboard is multi-node, providing infrastructure and service level views of the metrics, seamlessly aggregating and correlating metrics from all Netdata Agents that participate in a war room.

We believe that dashboards should be fully automated and out-of-the-box, providing all the means for slicing and dicing data without learning any query language, without editing chart definitions, and without having a deep understanding of the underlying metrics, so that the monitoring system is fully functional and ready to be used for troubleshooting the moment it is installed.


Moving towards this goal, in this release we introduce the following improvements:

  1. A complete rewrite of the underlying core of the dashboard now offers huge performance improvements on dashboards with thousands of charts. Before this work, when the dashboard had thousands of charts, several seconds were required to jump from the top of the dashboard to the bottom. Now it is instant.
  2. We went through all the data collection plugins and metrics and added labels to all of them, allowing the default charts on the Overview dashboard to pivot the charts, slicing and dicing the data according to these labels. For example, network interface charts can be pivoted by device name or interface type, while at the same time being filtered by any of the labels, dimensions, instances, or nodes.
  3. We have started working on new summary tiles to present the sections of the dashboard in a more dynamic manner. This work has just started, and we expect to introduce a lot of new changes heading into the next release.
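The label-based slicing described above amounts to grouping metric instances by a label value. A minimal sketch (illustrative Python; the metric and labels are hypothetical):

```python
from collections import defaultdict

# one metric with several instances, each carrying labels
instances = [
    (10.0, {"device": "eth0",  "type": "physical"}),
    (20.0, {"device": "eth1",  "type": "physical"}),
    (5.0,  {"device": "veth1", "type": "virtual"}),
]

def pivot(instances, label_key):
    """Aggregate instance values into one dimension per label value."""
    groups = defaultdict(float)
    for value, labels in instances:
        groups[labels[label_key]] += value
    return dict(groups)

by_type = pivot(instances, "type")  # {'physical': 30.0, 'virtual': 5.0}
```

Pivoting by "device" instead would yield one dimension per interface, which is exactly the kind of regrouping the Overview charts offer without any query language.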

Single node dashboard improvement

The Single Node view dashboard now uses the same engine as the Overview.

With this, you get a more consistent experience, but also:

  • The ability to run metric correlations across many nodes in your infrastructure.
  • All the grouping and filtering functions of the overview.
  • Reduced memory usage on the agent, as the old endpoints get deprecated.

We are working to bring similar improvements to the local Agent dashboard. In the meantime, it will look different from the Single Node view on Netdata Cloud, where we use composite charts instead of a separate chart per instance.


Netdata data source plugin for Grafana

This initial release of the Netdata data source plugin aims to maximize the troubleshooting capabilities of Netdata in Grafana, making them more widely available. It combines Netdata’s powerful collector engine with Grafana’s amazing visualization capabilities!



We expect that the Open-Source community will take a lot of value from this plugin, so we don’t plan on stopping here. We want to keep improving this plugin! We already have some enhancements on our backlog, including the following plans:

  • Enabling variable functionality
  • Allowing filtering with multiple key-value combinations
  • Providing sample templates for certain use-cases, e.g. monitoring PostgreSQL

We would love to get you involved in this project! If you have ideas on things you would like to see or you just want to share a cool dashboard you have setup, you’re more than welcome to contribute.

Check out our blog post and YouTube video on this new plugin to see how it can work best for you.

New Unseen node state

To provide better visibility into the different causes for why a node is Offline, we broke this status into two separate statuses, so that you can now distinguish cases where a node never connected to Netdata Cloud successfully.

The following list presents the current node statuses and their meanings:

  • Live: The node is actively collecting and streaming metrics to Cloud.
  • Stale: The node is currently offline and not streaming metrics to Cloud; it can show historical data from a parent node.
  • Offline: The node is currently offline, not streaming metrics to Cloud, and not available on any parent node.
  • Unseen: The node has never connected to Cloud; it is claimed, but no successful connection was ever established.

There are different reasons why a node can’t connect; the most common causes fall into one of the following three categories:

  • The claiming process of the kickstart script was unsuccessful
  • Claiming on an older, deprecated version of the Agent
  • Network issues while connecting to the Cloud

For some guidelines on how to solve these issues, check our docs here.

Blogposts & Demo space use-case rooms

To better showcase the capabilities and improvements of Netdata, we have made multiple rooms available in our Demo space, so you can experience the power and simplicity of Netdata with live infrastructure monitoring.

PostgreSQL monitoring

Netdata’s new PostgreSQL collector offers a fully revamped, comprehensive PostgreSQL monitoring experience. 100+ PostgreSQL metrics are collected and visualized across 60+ composite charts. Netdata now collects metrics at per-database, per-table, and per-index granularity (besides the metrics that are global to the entire DB cluster) and lets users explore which table or index has a specific problem, such as a high cache miss ratio, a low rows-fetched ratio (indicative of missing indexes), or bloat that is eating up valuable space. The new collector also includes built-in alerts for several problem scenarios that a user is likely to run into on a PostgreSQL cluster. For more information, read our docs or our blog for a deep dive into PostgreSQL and why these metrics matter.


Redis monitoring

Netdata’s Redis collector was updated to include new metrics crucial for database performance monitoring such as latency and new built-in alerts. For the full list of Redis metrics now available, read our docs or our blog for a deeper dive into Redis monitoring.


Cassandra monitoring

Netdata now monitors Cassandra, and comes with 25+ charts for all key Cassandra metrics. The collected metrics include throughput, latency, cache (key cache + row cache), disk usage and compaction, as well as JVM runtime metrics such as garbage collection. Any potential errors and exceptions that occur on your Cassandra cluster are also monitored. For more information read our docs or our blog.


Tech debt and Infrastructure improvements

To further improve Netdata Cloud and your user experience, multiple points around tech debt and infrastructure improvements have been completed. To name some of the key achievements:

  • A huge improvement has been made to our Overview tab on Netdata Cloud; we improved the performance of navigating the Table of Contents (TOC) and of the charts in the viewport, contributing to a much better UX
  • The repos that support our front end have all been upgraded to Node 16, putting us on the Active Long Term Support (LTS) version
  • We’ve replaced our MQTT broker VerneMQ with EMQX, which brings much more stability to the product.

Internal improvements

Asynchronous storing of metadata

We have improved the speed of chart creation by 70x. In lab tests creating 30,000 charts with 10 dimensions each, we achieved a chart creation rate of 7,000 charts/second (vs. 100 charts/second prior).

Per-host alert processing

Alert processing for a host (e.g. a child connected to a parent) is now done separately for each host. Time-consuming health-related initialization functions are deferred as needed and parallelized to improve performance.

Dictionary code improvements

Code improvements have been made to the use of dictionaries, better managing the life cycle of objects (creation, usage, and destruction using reference counters) and reducing the explicit locking needed to access resources.

Acknowledgments

We would like to thank the dedicated, talented contributors who make up this amazing community. The time and expertise you volunteer are essential to our success. We thank you and look forward to continuing to grow together and build a remarkable product.

  • @HG00 for improving RabbitMQ collector readme.
  • @KickerTom for improving Makefiles.
  • @MAH69IK for adding an option to retry on telegram API limit error.
  • @Pulseeey for adding CloudLinux OS detection during installation and update.
  • @candrews for improving netdata.service.
  • @uplime for fixing a typo in netdata-installer.sh.
  • @vobruba-martin for adding TCP socket connection support and the state path modification.
  • @yasharne for adding ProxySQL collector.

Contributions

Collectors

⚙️ Enhancing our collectors to collect all the data you need.

New collectors


Improvements

🐞 Improving our collectors one bug fix at a time.

  • Allow statsd tags to modify chart metadata on the fly (stats.d.plugin) (#14014, @ktsaou)
  • Add Cassandra icon to dashboard info (go.d/cassandra) (#13975, @ilyam8)
  • Add ping dashboard info and alarms (go.d/ping) (#13916, @ilyam8)
  • Add WMI Process dashboard info (go.d/wmi) (#13910, @thiagoftsm)
  • Add processes dashboard info (go.d/wmi) (#13910, @thiagoftsm)
  • Add TCP dashboard description (go.d/wmi) (#13878, @thiagoftsm)
  • Add Cassandra dashboard description (go.d/cassandra) (#13835, @thiagoftsm)
  • Respect NETDATA_INTERNALS_MONITORING (python.d.plugin) (#13793, @ilyam8)
  • Add ZFS hit rate charts (proc.plugin) (#13757, @vlvkobal)
  • Add alarms filtering via config (python.d/alarms) (#13701, @andrewm4894)
  • Add ProxySQL dashboard info (go.d/proxysql) (#13669, @ilyam8)
  • Update PostgreSQL dashboard info (go.d/postgres) (#13661, @ilyam8)
  • Add _collect_job label (job name) to charts (python.d.plugin) (#13648, @ilyam8)
  • Re-add chrome to the webbrowser group (apps.plugin) (#13642, @Ferroin)
  • Add labels to charts (tc.plugin) (#13634, @ktsaou)
  • Improve the gui and email app groups and improve GUI coverage (apps.plugin) (#13631, @Ferroin)
  • Update Postgres “connections” dashboard info (go.d/postgres) (#13619, @ilyam8)
  • Assorted updates for apps_groups.conf (apps.plugin) (#13618, @Ferroin)
  • Add spiceproxy to proxmox group (apps.plugin) (#13615, @ilyam8)
  • Improve coverage of Linux kernel threads (apps.plugin) (#13612, @Ferroin)
  • Improve dashboard info for WAL and checkpoints (go.d/postgres) (#13607, @shyamvalsan)
  • Update logind dashboard info (go.d/logind) (#13597, @ilyam8)
  • Add collecting power state (python.d/nvidia_smi) (#13580, @ilyam8)
  • Improve PostgreSQL dashboard info (go.d/postgres) (#13573, @shyamvalsan)
  • Add apt group to apps_groups.conf (apps.plguin) (#13571, @andrewm4894)
  • Add more monitoring tools to apps_groups.conf (apps.plugin) (#13566, @andrewm4894)
  • Add docker dashboard info (go.d/docker) (#13547, @ilyam8)
  • Add discovering chips, and features at runtime (python.d/sensors) (#13545, @ilyam8)
  • Add summary dashboard for PostgreSQL (go.d/postgres) (#13534, @shyamvalsan)
  • Add jupyter to apps_groups.conf (apps.plugin) (#13533, @andrewm4894)
  • Improve performance and add co-re support for more modules (ebpf.plugin) (#13530, @thiagoftsm)
  • Use LVM UUIDs in chart ids for logical volumes (proc.plugin) (#13525, @vlvkobal)
  • Reduce CPU and memory usage (ebpf.plugin) (#13397, @thiagoftsm)
  • Add ‘domain’ label to charts (go.d/whoisquery) (#1002, @ilyam8)
  • Add ‘source’ label to charts (go.d/x509check) (#1001, @ilyam8)
  • Add ‘host’ label to charts (go.d/portcheck) (#1000, @ilyam8)
  • Add ‘url’ label to charts (go.d/httpcheck) (#999, @ilyam8)
  • Remove pipeline instance from family and add it as a chart label (go.d/logstash) (#998, @ilyam8)
  • Add http cache io/iops metrics (go.d/nginxplus) (#997, @ilyam8)
  • Add resolver metrics (go.d/nginxplus) (#996, @ilyam8)
  • Add MSSQL metrics (go.d/wmi) (#991, @thiagoftsm)
  • Add IIS data collection job (go.d/web_log) (#977, @thiagoftsm)
  • Add IIS metrics (go.d/wmi) (#972, @thiagoftsm)
  • Add services metrics (go.d/wmi) (#961, @thiagoftsm)
  • Resolve ‘hostname’ in job name (go.d.plugin) (#959, @ilyam8)
  • Add processes metrics (go.d/wmi) (#953, @thiagoftsm)
  • Resolve ‘hostname’ in URL (go.d.plugin) (#941, @ilyam8)
  • Add TCP metrics (go.d/wmi) (#938, @thiagoftsm)
  • Add collection of Table_open_cache_overflows (go.d/mysql) (#936, @ilyam8)
  • Allow to set a list of record types in config (go.d/dns_query) (#912, @ilyam8)
  • Create a chart per server instead of a dimension per server (go.d/dns_query) (#911, @ilyam8)
  • Respect NETDATA_INTERNALS_MONITORING env variable (go.d.plugin) (#908, @ilyam8)
  • Add query status chart (go.d/dns_query) (#903, @ilyam8)
  • Add collection of agent metrics (go.d/consul) (#900, @ilyam8)
  • Create a chart per health check (go.d/consul) (#899, @ilyam8)
  • Add collection of master link status (go.d/redis) (#856, @ilyam8)
  • Add collection of master slave link metrics (go.d/redis) (#851, @ilyam8)
  • Add collection of time elapsed since last RDB save (go.d/redis) (#850, @ilyam8)
  • Add ping latency chart (go.d/redis) (#849, @ilyam8)
  • Check for ‘connect’ privilege before querying database size (go.d/postgres) (#845, @ilyam8)
  • Allow to set data collection job labels in config (go.d.plugin) (#840, @ilyam8)
  • Improve histogram buckets dimensions (go.d/postgres) (#833, @ilyam8)
  • Add acquired locks utilization chart (go.d/postgres) (#831, @ilyam8)
  • Add _collect_job label (job name) to charts (go.d.plugin) (#814, @ilyam8)
  • Add TCP socket connection support and the state path modification (go.d/phpfpm) (#805, @vobruba-martin)
  • Create a dimension for every unit state (go.d/systemdunits) (#795, @ilyam8)
  • Improve Galera state and status charts (#779, @ilyam8)
  • Add discovering dhcp-ranges at runtime (go.d/dnsmasq_dhcp) (#778, @ilyam8)
  • Add collecting image and volume stats (go.d/docker) (#777, @ilyam8)
  • Add Percona MySQL compatibility (go.d/mysql) (#776, @ilyam8)
  • Add collection of additional user statistics metrics (#775, @ilyam8)

Bug fixes

  • Fix eBPF crashes on exit (ebpf.plugin) (#14012, @thiagoftsm)
  • Fix not working on Oracle linux (ebpf.plugin) (#13935, @thiagoftsm)
  • Fix retry logic when reading network interfaces speed (proc.plugin) (#13893, @ilyam8)
  • Fix systemd chart update (ebpf.plugin) (#13884, @thiagoftsm)
  • Fix handling qemu-1- prefix when extracting virsh domain (#13866, @ilyam8)
  • Fix collection of carrier, duplex, and speed metrics when network interface is down (proc.plugin) (#13850, @vlvkobal)
  • Fix various issues (ebpf.plugin) (#13624, @thiagoftsm)
  • Fix apps plugin users charts description (apps.plugin) (#13621, @ilyam8)
  • Fix chart id length check (cgroups.plugin) (#13601, @ilyam8)
  • Fix not respecting update_every for polling (python.d/nvidia_smi) (#13579, @ilyam8)
  • Fix containers name resolution when Docker is a snap package (cgroups.plugin) (#13523, @ilyam8)
  • Fix handling string and float values (go.d/nvme) (#993, @ilyam8)
  • Fix handling ExpirationDate with space (go.d/whoisquery) (#974, @ilyam8)
  • Fix query queryable databases (go.d/postgres) (#960, @ilyam8)
  • Fix not respecting headers config option (go.d/pihole) (#942, @ilyam8)
  • Fix dns_queries_percentage metric calculation (go.d/pihole) (#922, @ilyam8)
  • Fix data collection when auth.bind query is not supported (go.d/dnsmasq) (#902, @ilyam8)
  • Fix data collection when too many db tables and indexes (go.d/postgres) (#857, @ilyam8)
  • Fix creation of bloat charts if no bloat metrics collected (go.d/postgres) (#846, @ilyam8)
  • Fix unregistering connStr at runtime (go.d/postgres) (#843, @ilyam8)
  • Fix bloat size percentage calculation (go.d/postgres) (#841, @ilyam8)
  • Fix charts when binary log and MyISAM are disabled (go.d/mysql) (#763, @ilyam8)
  • Fix data collection jobs cleanup on exit (go.d.plugin) (#758, @ilyam8)
  • Fix handling the case when no images are found (go.d/docker) (#739, @ilyam8)

Other

  • Don’t let slow disk plugin thread delay shutdown (#14044, @MrZammler)
  • Remove nginx_plus collector (python.d.plugin) (#13995, @ilyam8)
  • Enable collecting ECC memory errors by default (#13970, @ilyam8)
  • Make Statsd dictionaries multi-threaded (#13938, @ktsaou)
  • Remove NFS readahead histogram (proc.plugin) (#13819, @vlvkobal)
  • Merge netstat, snmp, and snmp6 modules (proc.plugin) (#13806, @vlvkobal)
  • Rename dockerd job on lock registration (python.d/dockerd) (#13537, @ilyam8)
  • Remove python.d/* announced in v1.36.0 deprecation notice (python.d.plugin) (#13503, @ilyam8)
  • Remove blocklist file existence state chart (go.d/pihole) (#914, @ilyam8)
  • Remove instance-specific information from chart families (go.d/portcheck) (#790, @ilyam8)
  • Remove spaces in “HTTP Response Time” chart dimensions (go.d/httpcheck) (#788, @ilyam8)

Documentation

📄 Keeping our documentation healthy together with our awesome community.

Updates


Health

Engine

Notifications

  • Add an option to retry on telegram API limit error (#13119, @MAH69IK)
  • Set default curl connection timeout if not set (#13529, @ilyam8)

Alarms

  • Use ‘host’ label in alerts info (health.d/ping.conf) (#13955, @ilyam8)
  • Remove pihole_blocklist_gravity_file_existence_state (health.d/pihole.conf) (#13826, @ilyam8)
  • Fix the systemd_mount_unit_failed_state alarm name (health.d/systemdunits.conf) (#13796, @tkatsoulas)
  • Add 1m delay for tcp reset alarms (health.d/tcp_resets.conf) (#13761, @ilyam8)
  • Add new Redis alarms (health.d/redis.conf) (#13715, @ilyam8)
  • Fix inconsistent alert class names (#13699, @ralphm)
  • Disable Postgres last vacuum/analyze alarms (health.d/postgres.conf) (#13698, @ilyam8)
  • Add node level AR based example (health.d/ml.conf) (#13684, @andrewm4894)
  • Add Postgres alarms (health.d/postgres.conf) (#13671, @ilyam8)
  • Adjust systemdunits alarms (health.d/systemdunits.conf) (#13623, @ilyam8)
  • Add Postgres total connection utilization alarm (health.d/postgres.conf) (#13620, @ilyam8)
  • Adjust mysql_galera_cluster_size_max_2m lookup to make time in warn/crit predictable (health.d/mysql.conf) (#13563, @ilyam8)

Packaging / Installation

Changes

  • Fix writing to stdout if static update is successful (#14058, @ilyam8)
  • Update go.d.plugin to v0.45.0 (#14052, @ilyam8)
  • Provide improved messaging in the kickstart script for existing installs managed by the system package manager (#13947, @Ferroin)
  • Add CAP_NET_RAW to go.d.plugin (#13909, @ilyam8)
  • Record installation command in telemetry events (#13892, @Ferroin)
  • Overhaul generation of distinct-ids for install telemetry events (#13891, @Ferroin)
  • Prompt users about updates/claiming on unknown install types (#13890, @Ferroin)
  • Fix duplicate error code in kickstart.sh (#13887, @Ferroin)
  • Properly guard commands when installing services for offline service managers (#13848, @Ferroin)
  • Fix service installation on FreeBSD. (#13842, @Ferroin)
  • Improve error and warning messages in the kickstart script (#13825, @Ferroin)
  • Properly propagate errors from installer/updater to kickstart script (#13802, @Ferroin)
  • Fix runtime directory ownership when installed as non-root user (#13797, @Ferroin)
  • Stop pulling in netcat as a mandatory dependency (#13787, @Ferroin)
  • Add Ubuntu 22.10 to supported distros, CI, and package builds (#13785, @Ferroin)
  • Allow netdata installer to install and run netdata as any user (#13780, @ktsaou)
  • Update libbpf to v1.0.1 (#13778, @thiagoftsm)
  • Further improvements to the new service installation code (#13774, @Ferroin)
  • Use /bin/sh instead of ls to detect glibc (#13758, @MrZammler)
  • Add CloudLinux OS detection to the updater script (#13752, @Pulseeey)
  • Add CloudLinux OS detection to kickstart (#13750, @Pulseeey)
  • Fix handling of temporary directories in kickstart code. (#13744, @Ferroin)
  • Fix a typo in netdata-installer.sh (#13514, @uplime)
  • Add CAP_NET_ADMIN for go.d.plugin (#13507, @ilyam8)
  • Update PIDFile in netdata.service to avoid systemd legacy path warning (#13504, @candrews)
  • Overhaul handling of installation of Netdata as a system service. (#13451, @Ferroin)
  • Fix existing install detection for FreeBSD and macOS (#13243, @Ferroin)
  • Assorted cleanup in the OpenRC init script (#13115, @Ferroin)

Other Notable Changes

⚙️ Greasing the gears to smooth your experience with Netdata.

Improvements

  • Add replication of metrics (gaps filling) during streaming (#13873, @vkalintiris)
  • Remove anomaly rates chart (#13763, @vkalintiris)
  • Add disabling netdata monitoring section of the dashboard (#13788, @ktsaou)
  • Add host labels for ephemerality and nodes with unstable connections (#13784, @underhood)
  • Allow netdata plugins to expose functions for querying more information about specific charts (#13720, @ktsaou)
  • Improve Health engine performance by adding a thread per host (#13712, @MrZammler)
  • Improve streaming performance by 25% on the child (#13708, @ktsaou)
  • Improve agent shutdown time (#13649, @stelfrag)
  • Add disabling Cloud functionality via NETDATA_DISABLE_CLOUD environment variable (#13106, @ilyam8)
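On installs managed by systemd, the Cloud-disabling environment variable from the last item above could be applied with a drop-in override. This is a hedged sketch: the variable name comes from the changelog entry, but the drop-in path and restart steps are assumptions to verify against your install.

```ini
# /etc/systemd/system/netdata.service.d/disable-cloud.conf
# Assumed systemd drop-in; after creating it, run:
#   sudo systemctl daemon-reload && sudo systemctl restart netdata
[Service]
Environment=NETDATA_DISABLE_CLOUD=1
```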

Bug Fixes

🐞 Increasing Netdata’s reliability, one bug fix at a time.


Code organization

Changes


Deprecation and product notices

Forthcoming deprecation notice

The following items will be removed in our next minor release (v1.38.0):

Patch releases (if any) will not be affected.

| Component | Type | Will be replaced by |
| --- | --- | --- |
| python.d/dockerd | collector | go.d/docker |
| python.d/logind | collector | go.d/logind |
| python.d/mongodb | collector | go.d/mongodb |
| fping | collector | go.d/ping |

All the deprecated components will be moved to the netdata/community repository.
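If you currently rely on fping, the replacement go.d/ping collector is configured with a simple job definition. This is a hedged sketch following the usual go.d job layout; the file path, job name, and hosts are illustrative, so verify the available options against the collector's documentation.

```yaml
# /etc/netdata/go.d/ping.conf (assumed default go.d config location)
jobs:
  - name: example
    hosts:
      - 192.0.2.1
      - example.com
```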

Deprecated in this release

In accordance with our previous deprecation notice, the following items have been removed in this release:

| Component | Type | Replaced by |
| --- | --- | --- |
| python.d/postgres | collector | go.d/postgres |
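For installs migrating from python.d/postgres, the go.d/postgres collector takes a connection string. A minimal sketch, assuming the usual go.d job layout; the file path, user, and password below are placeholders to adapt locally.

```yaml
# /etc/netdata/go.d/postgres.conf (assumed default go.d config location)
jobs:
  - name: local
    # Placeholder credentials; use a read-only monitoring user.
    dsn: 'postgres://netdata:password@127.0.0.1:5432/postgres'
```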

Notable changes and suggested actions

Kickstart unrecognized option error

In an effort to improve our kickstart script even more, documented here and here, the next major release will introduce a change: passing an unrecognized option to the kickstart script will produce an error, rather than being silently passed through to the installer code.
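The behavior change can be illustrated with a small sketch. This is not the actual kickstart code, and the option names here are examples only; it shows the strict-parsing pattern where anything unrecognized fails fast instead of falling through.

```shell
#!/bin/sh
# Illustrative sketch of strict option parsing: unknown options now
# produce an error instead of being passed through to the installer.
parse_opts() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --stable-channel) channel="stable" ;;     # example option
      --non-interactive) interactive=0 ;;       # example option
      *)
        echo "ERROR: unrecognized option '$1'" >&2
        return 1
        ;;
    esac
    shift
  done
}

parse_opts --stable-channel && echo "accepted"
parse_opts --bogus-flag 2>/dev/null || echo "rejected"
```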

New documentation structure

In the coming weeks, we will be introducing a new structure to Netdata Learn. Part of this effort includes having healthy redirects, instructions, and landing pages to minimize confusion and lost bookmarks, but users may still encounter broken links or errors when loading moved or deleted pages. If you encounter such a problem, feel free to submit a GitHub issue, or reach out to us on Discord or the Community forum with questions or ideas on how our docs can best serve you.

External plugin packaging (Possible action required)

In a forthcoming release, many external plugins will be moved into their own native packages, giving you finer control over which plugins are installed, preserving bandwidth when updating, and avoiding some potentially undesirable dependencies. As a result, at some point in the lead-up to the next minor release, the following plugins will no longer be installed by default on systems using native packages, and users relying on any of them on an existing install will need to install the corresponding packages manually to continue using them:

  • nfacct
  • ioping
  • slabinfo
  • perf
  • charts.d

Note: Static builds and locally built installs are unaffected. Netdata will provide more details once the changes go live.
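Once the split packages are published, installing one of them on a native-package system might look like the following. These package names are hypothetical; the final names have not been announced, so check your distribution's repository when the change lands.

```shell
# Hypothetical package names on a Debian/Ubuntu native-package install:
sudo apt-get install netdata-plugin-ioping netdata-plugin-perf
```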

Netdata Release Meetup

Join the Netdata team on the 1st of December, at 5PM UTC, for the Netdata Release Meetup, which will be held on the Netdata Discord.

Together we’ll cover:

  • Release Highlights
  • Acknowledgements
  • Q&A with the community

RSVP now - we look forward to meeting you.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in Netdata, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps engineers, SREs and other troubleshooters. More than 1300 engineers are already using it!