Retrieving downsampled periods efficiently

michaelr524 · September 30, 2018, 1:43pm

Hi Everybody,

I’m designing a system for quick and efficient exploration (visualisation) of large amounts of financial time series.
I’ll briefly describe my naive design below and would really like to hear comments on how it can be improved (performance, flexibility, …) from knowledgeable Influx users and people who have done similar work before.

Data is downsampled, with a separate measurement for each time period:
raw - raw data points with nanosecond resolution
seconds - 1 point per second
minutes - 1 point per minute
hours - 1 point per hour
days - 1 point per day
weeks - 1 point per week
months - 1 point per month
years - 1 point per year

Example query:
On a 600-point chart canvas, display the points for the following period: 2018-01-16 12:12:12 - 2018-09-16 12:12:12

Naive algorithm pseudo code:

for each period in [years, months, weeks, .., raw]

	points_count = select count(1) from period
	
	if points_count >= 600 or period.is_raw()
		points = select fields from period where time between start and end
		aggregated = aggregate(points, 600)
		return aggregated
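
The aggregate(points, 600) step is not defined above. One possible reading, as a minimal sketch only: split the time range into equal-width buckets and average the values in each. The (timestamp, value) tuple layout, the numeric timestamps, and the choice of a mean are all assumptions made for illustration; financial data may be better served by open/high/low/close or min/max per bucket.

```python
def aggregate(points, n):
    """Reduce time-sorted (timestamp, value) pairs to at most n averaged points.

    Assumes numeric timestamps (e.g. epoch seconds); hypothetical helper.
    """
    if len(points) <= n:
        return list(points)

    t0, t1 = points[0][0], points[-1][0]
    width = (t1 - t0) / n
    if width == 0:
        # All points share one timestamp: collapse to a single average.
        return [(t0, sum(v for _, v in points) / len(points))]

    buckets = {}
    for t, v in points:
        # Clamp so the last timestamp falls into the final bucket.
        idx = min(int((t - t0) / width), n - 1)
        buckets.setdefault(idx, []).append(v)

    # Emit one point per non-empty bucket, placed at the bucket midpoint.
    return [
        (t0 + (idx + 0.5) * width, sum(vals) / len(vals))
        for idx, vals in sorted(buckets.items())
    ]
```

Empty buckets are skipped rather than filled, so the result can contain fewer than n points when the data has gaps.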

Is this a worthy solution?
Any ideas how to improve?

Sincerely,
Michael

mhfrantz · September 30, 2018, 4:01pm

I assume you will have to filter the points_count query by time as well:

points_count = select count(1) from period where time between start and end

The downside to this approach is that you will potentially run several queries before figuring out which period has adequate resolution. For example, if you are doing a very small time range, and you end up querying from raw, you will have executed eight points_count queries before you get to raw.

Instead, you can perform static analysis on the time range to determine which period you should use. Rather than a points_count query, you can calculate how many years, months, weeks, etc. are in the time range using date/time math and then use that value to choose the period.
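
A rough sketch of that date/time math, assuming the measurement names from the original post and approximating a month as 30 days and a year as 365 days. The function name choose_period and the min_points parameter are made up for illustration:

```python
from datetime import datetime, timedelta

# Approximate length of one point in each downsampled measurement,
# coarsest first. Month and year lengths are approximations.
PERIODS = [
    ("years", timedelta(days=365)),
    ("months", timedelta(days=30)),
    ("weeks", timedelta(weeks=1)),
    ("days", timedelta(days=1)),
    ("hours", timedelta(hours=1)),
    ("minutes", timedelta(minutes=1)),
    ("seconds", timedelta(seconds=1)),
]


def choose_period(start, end, min_points=600):
    """Pick the coarsest measurement expected to hold at least min_points
    points in [start, end], without querying the database."""
    span = end - start
    for name, step in PERIODS:
        # Whole number of steps that fit in the requested range.
        if span // step >= min_points:
            return name
    return "raw"


start = datetime(2018, 1, 16, 12, 12, 12)
end = datetime(2018, 9, 16, 12, 12, 12)
print(choose_period(start, end))  # hours
```

For the example range in the first post (243 days), days gives only 243 points, so the first measurement with at least 600 points is hours (5832 points). This assumes each measurement actually has a point for every step; if the series has gaps (weekends, market closures), the estimate will be too high and you may want a larger min_points.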

BTW, this use case is related to this feature request: "Intelligent Rollups and Querying of Aggregated Data", issue #7198 in influxdata/influxdb on GitHub.

michaelr524 · September 30, 2018, 4:19pm

Thanks for thinking through this!

The code linked from #7198 does exactly the static analysis that you’ve described: influxgraph/utils.py at commit 7be6d2aa7bf7e7c516c25216a024ca1026c1c2ed in InfluxGraph/influxgraph on GitHub.
