We recently added a more detailed anomaly rate chart to Netdata that breaks out the overall node anomaly rate by type. This lets you more easily see which parts of your infrastructure might be experiencing an uptick in anomalies when the overall node anomaly rate increases.
type is generally the prefix of the chart id in Netdata and controls where charts live within the menu on the overview page. For example, the mem.available chart has a type of mem, which in part determines why it lives under the "Memory" section of the menu. You can read a bit more about type, chart ids, and chart specifications in general in the docs on Netdata Learn.
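To make the prefix idea concrete, here is a minimal sketch that pulls the type out of a chart id by taking everything before the first dot. The helper name chart_type and the sample chart ids other than mem.available are illustrative assumptions, and since type is only generally the prefix, this rule may not hold for every chart.

```python
def chart_type(chart_id: str) -> str:
    """Return the part of a chart id before the first dot.

    Assumption: the type is the chart id prefix, which is generally
    (but not necessarily always) the case.
    """
    return chart_id.split(".", 1)[0]


# Illustrative chart ids; only mem.available is taken from the text above.
for chart_id in ["mem.available", "system.cpu", "disk.sda"]:
    print(chart_id, "->", chart_type(chart_id))
```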
Anomaly rates by type therefore give you a quick feel for which "physical" parts of your infrastructure are behind an increase in the overall node anomaly rate.
Anomaly rate by type
Since every metric in Netdata also has a corresponding anomaly-bit, it's easy to calculate anomaly rates on any subset of metrics across your infrastructure. That is what we are doing here: we simply calculate the anomaly rate over all metrics that have a given type and then collect that as a "synthetic" or "derived" metric.
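As a rough illustration of that calculation, the sketch below groups anomaly bits by type and reports the percentage of flagged bits in each group. It is not Netdata's internal implementation: the function name, the input shape (a dict keyed by chart id and dimension), the sample data, and the choice to pool all bits in a window are assumptions made for the example.

```python
from collections import defaultdict


def anomaly_rate_by_type(anomaly_bits):
    """Return {type: anomaly rate in percent}.

    anomaly_bits maps (chart_id, dimension) -> list of 0/1 anomaly bits
    collected over some window (hypothetical input shape).
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for (chart_id, _dimension), bits in anomaly_bits.items():
        chart_type = chart_id.split(".", 1)[0]  # assume type is the id prefix
        flagged[chart_type] += sum(bits)
        total[chart_type] += len(bits)
    # Skip types with no samples to avoid dividing by zero.
    return {t: 100.0 * flagged[t] / total[t] for t in total if total[t]}


# Made-up anomaly bits for four metrics over four samples each.
sample_bits = {
    ("mem.available", "avail"): [0, 0, 0, 0],
    ("net_packets.eth0", "received"): [0, 1, 1, 0],
    ("net_packets.eth0", "sent"): [0, 0, 0, 0],
    ("disk.sda", "reads"): [0, 0, 1, 0],
}
print(anomaly_rate_by_type(sample_bits))
```

The same pooling over every metric, regardless of type, would give the overall node anomaly rate, which is why the per-type rates can be read as a breakdown of it.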
Example 1 - spike in
Here is an example of how you might find this useful. In the screenshot below, we can see that within the highlighted area the overall node anomaly rate has suddenly increased to around 3%. You can clearly see this in the anomaly_detection.anomaly_rate chart. This tells us that something is going on, but not much more than that.
Now, with the addition of the anomaly_detection.type_anomaly_rate chart just below, we can see that this spike is mostly made up of spikes within the network-related types.
If we quickly navigate to the network-related parts of the dashboard, sure enough, we can see some anomalies in the net_packets charts, where increases in received traffic have been flagged as anomalous.
Example 2 - some
Similarly, in the period following our initial anomaly, I was doing some troubleshooting on my disks, so we also see a little spike in the node anomaly rate. This time, though, the anomaly_detection.type_anomaly_rate chart suggests it's something related to various disk-related types.
And, sure enough, looking at some of my disk charts, I can see the activity that has been flagged as anomalous.
Both of the examples above illustrate how you can use the anomaly_detection.type_anomaly_rate chart to quickly see what might be "underneath" the overall node anomaly rate when it increases.
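If you would rather do this kind of triage in a script than by eye, a sketch along the following lines could rank types by their average anomaly rate over a window. It assumes you have already fetched the chart's data as JSON and that it has a "labels" list (starting with "time") and a "data" list of rows in the same column order. That shape, the function name top_types, and the dimension names in the sample are assumptions for illustration, so check them against what your agent actually returns.

```python
def top_types(response, n=3):
    """Return the n types with the highest mean anomaly rate.

    response is assumed to look like:
      {"labels": ["time", "<type>", ...], "data": [[timestamp, rate, ...], ...]}
    """
    labels = response["labels"][1:]  # drop the leading "time" column
    rows = response["data"]
    if not rows:
        return []
    means = {
        # Treat missing (None) values as 0 so gaps do not break the mean.
        label: sum((row[i + 1] or 0.0) for row in rows) / len(rows)
        for i, label in enumerate(labels)
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up response for a short highlighted window.
sample_response = {
    "labels": ["time", "net", "disk", "mem"],
    "data": [
        [1, 4.0, 0.5, 0.0],
        [2, 6.0, 1.5, 0.0],
    ],
}
for type_name, rate in top_types(sample_response, n=2):
    print(f"{type_name}: {rate:.1f}%")
```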
Next steps - user anomaly rates
This is our first step (done in this PR) in providing lower-level anomaly rates below the summary node anomaly rate. We are planning on adding anomaly rates by user soon so that you can easily see which applications and users are experiencing an uptick in anomalies. You can follow along in this GitHub issue, which is part of our wider "Machine Learning Roadmap" project, also publicly available on GitHub.
Of course, you can always use the Anomaly Advisor to get a more detailed, bottom-up view of all anomalies in a highlighted window, or the "AR%" button to surface the highest chart anomaly rates within each menu section. This chart is just another quick way to view things that might come in useful.
Try it for yourself in our "Machine Learning" demo room here.