The Challenge of Metrics Visualization
In general, time-series data is displayed as a line graph, which makes sense: it's the most direct visual translation of the data. The problem I keep running into is not with the graphs or the data, but with expectations. In my work, the biggest consumers of graphs are engineering teams, and engineers love data; they want all of it. That's a good thing, but it leads to misunderstandings when they start looking at their graphs.
A visualization is, by its very nature, an abstraction of the raw data: it translates the data into a form the human brain can process more intuitively. The raw data is great for some types of analysis; sort it by value and you'll quickly find which metric is highest or lowest. But show the average person a huge table of metrics and numbers, and their brain won't be able to pick out the patterns and trends in it. That's where graphs come in. Translate that pile of numbers into a graph, and your brain will quickly spot the trends and evaluate the shape of the data. To be effective, the graph must be an accurate representation of the data: scale matters, as does being able to trust that the data behind the graph hasn't been manipulated in a way that fundamentally changes its values. The visualization isn't meant to give you access to the individual values of tens of thousands of data points on a line; it's meant to show you what that data looks like and how it trends over time.
The challenge in visualizing a large amount of metric data, then, is that the limiting factor is the display itself. The Dell 3007WFP 30" monitor has a maximum horizontal resolution of 2,560 pixels, and Apple's 27" Thunderbolt Display has the same. If you really splurge on something like Dell's 32" Ultra HD monitor, you can bump that up to 3,840 pixels. Those are at the upper end of what you can expect from a display. Say you want to get the most out of that investment, so you put a dashboard on the Ultra HD monitor and set it up with three columns of graphs. Allowing for a minimal amount of space between the graphs, each graph gets somewhere between 1,000 and 1,280 pixels of horizontal space. Now pull in a week's worth of data from your server, sampled at 1-minute intervals, and you're dealing with 7 × 24 × 60 = 10,080 individual data points: roughly 8 to 10 times what even the Ultra HD screen can display on a graph in that three-column layout. Even if you go full-screen with a single graph, you still have more than twice as many data points as you have pixels to draw them in. Sampling more often than once a minute only makes the problem worse. So how do you show that much data in the limited space you have? Downsampling.
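The pixel budget above is simple arithmetic, sketched here so you can plug in your own screen width, column count, and sampling interval. It assumes each column gets an equal share of the screen with no gutters or axis labels subtracted, so the per-graph width is an upper bound and the points-per-pixel ratio a lower bound. The function names are illustrative only.

```python
# Rough pixel budget for the dashboard scenario described above.
# Assumption: each column gets an equal share of the screen width, with no
# gutters or axis labels subtracted, so the per-graph width is an upper bound.

def points_in_range(days: int, interval_seconds: int) -> int:
    """Number of samples collected over `days` at a fixed sampling interval."""
    return days * 24 * 60 * 60 // interval_seconds


def points_per_pixel(points: int, screen_px: int, columns: int = 1) -> float:
    """How many samples compete for each horizontal pixel of one graph."""
    graph_px = screen_px // columns
    return points / graph_px


week = points_in_range(days=7, interval_seconds=60)
print(week)                                      # 10080
print(points_per_pixel(week, 3840, columns=3))   # 7.875 (three-column Ultra HD dashboard)
print(points_per_pixel(week, 3840))              # 2.625 (one full-screen graph)
print(points_per_pixel(points_in_range(7, 10), 3840, columns=3))  # 47.25 (10-second sampling)
```

Even in the best case, several samples have to share every pixel column, and the ratio grows linearly as the sampling interval shrinks.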
Dealing with Downsampling
Make no mistake: even if you feed the full, raw data stream into a graph, you're getting a downsampled representation. It simply isn't possible to draw 10,000+ horizontal data points in 2,560 pixels, so the visualization is effectively downsampled by that limitation. Try adding some interactivity to your graph, like highlighting an individual data point on the line, and things get messy fast. The solution is to downsample the data to a level appropriate for the available display area, but care must be taken not to change the character of the data in the process; otherwise the visualization can become misleading.
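To make "messy" concrete, here is a rough sketch of what a hover handler has to do with raw data: map the mouse's pixel column back to a time, then find the nearest sample. It assumes a simple linear x-scale and evenly spaced 1-minute samples; `pixel_to_time`, `nearest_index`, and `hover_index` are hypothetical helpers, not D3 or StatusWolf APIs.

```python
from bisect import bisect_left


def pixel_to_time(px: float, t_min: float, t_max: float, width_px: int) -> float:
    """Invert a linear x-scale: pixel column -> timestamp."""
    return t_min + (px / width_px) * (t_max - t_min)


def nearest_index(xs: list[float], x: float) -> int:
    """Index of the sample whose timestamp is closest to x (xs sorted ascending)."""
    i = bisect_left(xs, x)
    if i == 0:
        return 0
    if i == len(xs):
        return len(xs) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if xs[i] - x < x - xs[i - 1] else i - 1


def hover_index(xs: list[float], mouse_px: int, width_px: int) -> int:
    """Sample highlighted when the mouse sits at pixel column `mouse_px`."""
    return nearest_index(xs, pixel_to_time(mouse_px, xs[0], xs[-1], width_px))


# One week of 1-minute samples on a 1,280-pixel-wide graph.
timestamps = [i * 60 for i in range(7 * 24 * 60)]
left = hover_index(timestamps, 100, 1280)
right = hover_index(timestamps, 101, 1280)
print(left, right)  # 787 795
```

Moving the mouse a single pixel jumps from sample 787 to sample 795, so at whole-pixel mouse positions the seven samples in between can never be highlighted at all.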
Most downsampling of time-series data is done with one of a few common methods: sum, average, maximum value, or minimum value. Each divides the data into a number of buckets, usually based on a time interval, and performs the chosen calculation on each bucket, as you would expect:
- Sum: Returns the sum of all data values in the bucket.
- Average: Returns the average of all data values in the bucket.
- Maximum Value: Checks each value in the bucket and returns the largest single value.
- Minimum Value: Checks each value in the bucket and returns the smallest single value.
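The four bucket methods above can be sketched in a few lines. This is a minimal illustration, not StatusWolf's code: it assumes integer Unix timestamps in seconds and stamps each output point with the start of its bucket; `bucket_downsample` and `AGGREGATORS` are hypothetical names.

```python
from statistics import fmean
from typing import Callable, Sequence

Point = tuple[int, float]  # (unix timestamp in seconds, value)

# The four traditional bucket aggregations.
AGGREGATORS: dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "avg": fmean,
    "max": max,
    "min": min,
}


def bucket_downsample(points: Sequence[Point], interval: int, method: str) -> list[Point]:
    """Group points into fixed time buckets of `interval` seconds and reduce
    each bucket to a single value, stamped with the bucket's start time.
    Assumes integer timestamps."""
    aggregate = AGGREGATORS[method]
    buckets: dict[int, list[float]] = {}
    for t, v in points:
        buckets.setdefault(t - t % interval, []).append(v)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Ten 1-minute samples, downsampled into 5-minute buckets.
samples = [(i * 60, v) for i, v in enumerate([1, 5, 2, 9, 3, 4, 4, 8, 1, 2])]
for method in AGGREGATORS:
    print(method, bucket_downsample(samples, 300, method))
# sum [(0, 20), (300, 19)]
# avg [(0, 4.0), (300, 3.8)]
# max [(0, 9), (300, 8)]
# min [(0, 1), (300, 1)]
```

The Sum buckets report values around 20 even though no raw sample exceeds 9, which is how Sum downsampling changes the y-axis scale, and Average flattens the 9 and 8 peaks down to 4.0 and 3.8.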
In any of these cases, the visual shape of the line can be altered, and with Sum downsampling in particular, the scale of the y-axis changes as well. I've been unhappy with these accepted downsampling methods for some time, but I hadn't been able to find a better solution. Until now: enter Largest-Triangle-Three-Buckets downsampling (which I'm going to abbreviate as LTTB from here on out).
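LTTB keeps the first and last points, splits the rest into equal-count buckets, and from each bucket picks the point that forms the largest triangle with the previously selected point and the average of the next bucket. The following is a minimal Python port of that procedure as described in Sveinn Steinarsson's 2013 thesis; it is not StatusWolf's code (which runs in the browser) and assumes points are `(x, y)` tuples sorted by x.

```python
from typing import Sequence

Point = tuple[float, float]  # (x, y), sorted by x


def lttb(points: Sequence[Point], threshold: int) -> list[Point]:
    """Largest-Triangle-Three-Buckets: reduce `points` to `threshold` points."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)  # nothing to reduce, or too few buckets for triangles

    sampled = [points[0]]               # always keep the first point
    every = (n - 2) / (threshold - 2)   # bucket size, excluding first and last points
    a = 0                               # index of the previously selected point

    for i in range(threshold - 2):
        # Average of the *next* bucket: the third vertex of each triangle.
        avg_start = int((i + 1) * every) + 1
        avg_end = min(int((i + 2) * every) + 1, n)
        nxt = points[avg_start:avg_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Pick the point in the *current* bucket forming the largest triangle
        # with the previously selected point and the next bucket's average.
        ax, ay = points[a]
        best, best_area = -1, -1.0
        for j in range(int(i * every) + 1, int((i + 1) * every) + 1):
            x, y = points[j]
            area = abs((ax - avg_x) * (y - ay) - (ax - x) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area

        sampled.append(points[best])
        a = best

    sampled.append(points[-1])          # always keep the last point
    return sampled


# 1,000 flat points with a single spike at x = 500.
data = [(float(x), 100.0 if x == 500 else 0.0) for x in range(1000)]
reduced = lttb(data, 50)
print(len(reduced), (500.0, 100.0) in reduced)  # 50 True
```

The spike survives at full height. For comparison, averaging the same 1,000 points into 50 buckets of 20 would report the spike's bucket as 100 / 20 = 5, a twentieth of its real height.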
Practical LTTB Application in StatusWolf
The current release of StatusWolf uses the traditional downsampling methods described above, leaving the choice of method and time interval to the user.
In the next release, that will be replaced with a much simpler choice:
All previous downsampling code in StatusWolf has been removed and replaced with a function that implements LTTB. Besides giving the line greater visual fidelity, this change also lets StatusWolf resample the data live whenever the size of the graph changes, e.g. when maximizing a graph, changing the number of dashboard columns, or resizing the browser window.
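Live resampling mostly comes down to keeping the raw series around and re-running the downsampler whenever the graph's width changes. The sketch below assumes that design; `ResampledSeries`, `points_per_px`, and the `stride_sample` stand-in are hypothetical names, not StatusWolf's code, and in practice you would pass an LTTB function as the downsampler.

```python
from typing import Callable, Sequence

Point = tuple[float, float]
Downsampler = Callable[[Sequence[Point], int], list[Point]]


class ResampledSeries:
    """Holds the full raw series and re-downsamples it for a given graph width.

    `points_per_px` is a hypothetical tuning knob (output points allowed per
    horizontal pixel); it is not a StatusWolf setting.
    """

    def __init__(self, raw: Sequence[Point], downsample: Downsampler,
                 points_per_px: float = 1.0) -> None:
        self.raw = list(raw)  # never modified by this class
        self._downsample = downsample
        self._points_per_px = points_per_px
        self._cache: dict[int, list[Point]] = {}

    def for_width(self, width_px: int) -> list[Point]:
        """Points to draw for a graph element `width_px` pixels wide."""
        threshold = max(3, int(width_px * self._points_per_px))
        if threshold not in self._cache:  # reuse results when toggling between sizes
            self._cache[threshold] = self._downsample(self.raw, threshold)
        return self._cache[threshold]


def stride_sample(points: Sequence[Point], threshold: int) -> list[Point]:
    """Stand-in downsampler (evenly spaced picks) so the sketch runs on its own."""
    if threshold >= len(points) or threshold < 2:
        return list(points)
    step = (len(points) - 1) / (threshold - 1)
    return [points[round(k * step)] for k in range(threshold)]


week = [(float(i * 60), float(i % 97)) for i in range(10_080)]
series = ResampledSeries(week, stride_sample)
print(len(series.for_width(1280)))    # 1280: one column of a three-column dashboard
print(len(series.for_width(3840)))    # 3840: the same graph after maximizing it
print(len(series.for_width(20_000)))  # 10080: wider than the data, raw series returned
```

Because the raw series is never overwritten, a resize costs one downsampling pass rather than another round trip to the server.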
To demonstrate how LTTB preserves the visual integrity of the graph, here is a week's worth of data graphed with no downsampling applied, and then with LTTB applied based on the size of the graph element:
One week's worth of data (321,579 data points), with no downsampling applied.
The same data set with LTTB downsampling applied based on the size of the graph element, now displayed with 300 data points.
The downsampled version is also more responsive in the browser. Even though the graph line in either case is a single SVG element added to the DOM, D3 still tracks the X and Y values of every point for interactivity. Graphing 300,000+ points doesn't crash the browser (as adding a separate DOM element for each point would), but it definitely makes the browser unhappy and degrades the experience for the end user.
Keeping the Raw Data Available
Moving downsampling into the browser, where it runs live, has also enabled StatusWolf to make the entire raw data set available to users who want to download it and work with it further.