Find out how to effectively and easily monitor and troubleshoot Chrony using Netdata
What is Chrony?
Chrony is an open-source, low-level utility for managing the system clock. It can be used to maintain the accuracy of the computer's clock across a network, or even in the absence of an internet connection. Chrony is designed to be more accurate and resilient than traditional utilities such as ntpd, and can adjust the system clock even in the presence of large time offsets and network outages. Chrony also offers features such as automatic time synchronization, access control lists, and logging.
Monitoring Chrony with Netdata
To monitor Chrony with Netdata, you need both Chrony and Netdata installed on your system.
Netdata auto-discovers hundreds of services, and for those that aren't discovered, you can use manual discovery with a one-line configuration. For more information on configuring Netdata for Chrony monitoring, please read the collector documentation.
You should now see the Chrony section on the Overview tab in Netdata Cloud already populated with charts about all the metrics you care about.
Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.
What Chrony metrics are important to monitor?
- The stratum indicates the distance (hops) to the computer with the reference clock. The higher the stratum number, the more timing accuracy and stability degrade.
- Any error in the system clock is corrected by slightly speeding up or slowing down the system clock until the error has been removed and then returning to the system clock’s normal speed. A consequence of this is that there will be a period when the system clock (as read by other programs) will be different from chronyd's estimate of the current true time (which it reports to NTP clients when it is operating as a server). The reported value is the difference due to this effect.
- The total of the network path delays to the stratum-1 computer from which the computer is ultimately synchronised.
- The total dispersion accumulated through all the computers back to the stratum-1 computer from which the computer is ultimately synchronised. Dispersion is due to system clock resolution, statistical measurement variations, etc.
- The estimated local offset on the last clock update. A positive value indicates the local time (as previously estimated true time) was ahead of the time sources.
- The root mean square (RMS) offset of the system clock from true time. Large offsets may indicate a problem with the clock or network synchronization.
- The frequency is the rate by which the system’s clock would be wrong if chronyd was not correcting it. It is expressed in ppm (parts per million). For example, a value of 1 ppm would mean that when the system’s clock thinks it has advanced 1 second, it has actually advanced by 1.000001 seconds relative to true time.
- The residual frequency for the currently selected reference source. This reflects any difference between what the measurements from the reference source indicate the frequency should be and the frequency currently being used. The reason this is not always zero is that a smoothing procedure is applied to the frequency.
- The estimated error bound on the frequency.
- The interval between clock updates. Shorter intervals may improve accuracy but may also increase network load.
- The time elapsed since the last measurement from the reference source was processed.
- The current leap status of the source. Statuses include the following:
  - Normal - indicates the normal status (no leap second).
  - InsertSecond - indicates that a leap second will be inserted at the end of the month.
  - DeleteSecond - indicates that a leap second will be deleted at the end of the month.
  - Unsynchronised - the server has not synchronized properly with the NTP server.
- The number of servers and peers that are online and offline. The following explains the status options:
  - **Online** - the server or peer is currently online (i.e. assumed by chronyd to be reachable).
  - **Offline** - the server or peer is currently offline (i.e. assumed by chronyd to be unreachable and no measurements from it will be attempted).
  - **BurstOnline** - a burst command has been initiated for the server or peer and is being performed. After the burst is complete the server or peer will be returned to the online state.
  - **BurstOffline** - a burst command has been initiated for the server or peer and is being performed. After the burst is complete the server or peer will be returned to the offline state.
  - **Unresolved** - the name of the server or peer was not resolved to an address yet.
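
To make a few of the metrics above more concrete, here is a minimal Python sketch that reads text in the layout printed by `chronyc tracking` and works out what a given frequency error would amount to over a day if chronyd were not correcting it. The sample text and its values are illustrative only, not measurements from a real system, and the helper names (`parse_tracking`, `leading_number`, `uncorrected_drift_seconds`) are hypothetical names chosen for this example. This is not how the Netdata collector gathers its data.

```python
import re

# Illustrative sample in the layout printed by `chronyc tracking`.
# The values are placeholders for this sketch, not real measurements.
SAMPLE_TRACKING = """\
Reference ID    : CB00710F (foo.example.net)
Stratum         : 3
Ref time (UTC)  : Fri Jan 27 09:49:17 2017
System time     : 0.000006523 seconds slow of NTP time
Last offset     : -0.000006747 seconds
RMS offset      : 0.000035822 seconds
Frequency       : 3.225 ppm slow
Residual freq   : -0.000 ppm
Skew            : 0.129 ppm
Root delay      : 0.013639022 seconds
Root dispersion : 0.001100737 seconds
Update interval : 64.2 seconds
Leap status     : Normal
"""


def parse_tracking(text):
    """Return {field name: raw value string} for each 'name : value' line."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps in the value survive.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def leading_number(value):
    """Extract the first numeric token from a value such as '3.225 ppm slow'."""
    match = re.match(r"[-+]?\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no leading number in {value!r}")
    return float(match.group())


def uncorrected_drift_seconds(frequency_ppm, elapsed_seconds):
    """Clock error accumulated over elapsed_seconds at frequency_ppm parts per million."""
    return frequency_ppm * 1e-6 * elapsed_seconds


fields = parse_tracking(SAMPLE_TRACKING)
stratum = int(fields["Stratum"])
rms_offset = leading_number(fields["RMS offset"])
# Only the magnitude is used here; the trailing 'slow'/'fast' gives the direction.
frequency_ppm = leading_number(fields["Frequency"])
drift_per_day = uncorrected_drift_seconds(frequency_ppm, 24 * 60 * 60)

print(f"stratum={stratum} rms_offset={rms_offset:.9f}s drift/day={drift_per_day:.4f}s")
```

The drift calculation is just the ppm definition applied over a longer window: a few parts per million sounds negligible, but over 86,400 seconds it adds up to a noticeable fraction of a second, which is why the frequency and skew charts are worth watching alongside the offsets.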
Troubleshooting Chrony with Netdata
Netdata has built-in alerts to reduce the monitoring burden for you.
If you would like to update the alert thresholds for any of these alerts or want to create your own alert for another metric – please follow the instructions here.
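
Before tuning thresholds, it can help to think about what a threshold alert actually checks. The following Python sketch shows the basic idea under loudly stated assumptions: the metric names, the limits, and the `breached` helper are all hypothetical and chosen for illustration. They are not Netdata's default alert definitions, and Netdata alerts are configured in Netdata itself rather than in Python.

```python
# Hypothetical limits for illustration only; these are NOT Netdata's defaults.
LIMITS = {
    "stratum": 10,
    "rms_offset_seconds": 0.1,
    "root_dispersion_seconds": 0.5,
}


def breached(metrics, limits=LIMITS):
    """Return the sorted names of metrics whose absolute value exceeds its limit."""
    return sorted(
        name
        for name, limit in limits.items()
        if name in metrics and abs(metrics[name]) > limit
    )


# Two made-up readings: one well within the limits, one drifting badly.
healthy = {"stratum": 3, "rms_offset_seconds": 0.000036, "root_dispersion_seconds": 0.0011}
drifting = {"stratum": 12, "rms_offset_seconds": 0.25, "root_dispersion_seconds": 0.0011}

print(breached(healthy))
print(breached(drifting))
```

Comparing absolute values matters for offsets, which can be positive or negative depending on whether the local clock is ahead of or behind its sources.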
By default you will receive email notifications whenever an alert is triggered – if you would not like to receive these notifications you can turn them off from your profile settings.
Anomaly Advisor lets you quickly identify if the system you are monitoring has any anomalies and allows you to drill down into which metrics are behaving anomalously.
To learn more about how to use Anomaly Advisor to troubleshoot Chrony, check out the documentation or visit the anomalies tab in the demo space to play with it right now.
Metric Correlations lets you quickly find metrics and charts related to a particular window of interest that you want to explore further. By displaying the standard Netdata dashboard, filtered to show only charts that are relevant to the window of interest, you can get to the root cause sooner.
Let us hear from you
If you haven’t already, sign up now for a free Netdata account!
We’d love to hear from you. If you have any questions, complaints, or feedback, please reach out to us on Discord or GitHub.