Retrieving downsampled periods efficiently

Hi Everybody,

I’m designing a system for quick and efficient exploration (visualisation) of large amounts of financial time series.
I’ll describe shortly my naive design below and would really like to hear comments on how it can be improved (performance, flexibility,…) from knowledgable Influx users and people who have done similar work before.

Data is downsampled. A separate measure for each time period:
raw - raw data points with nanosecond resolution
seconds - 1 point per second
minutes - 1 point per minute
hours - 1 point per hours
days - 1 point per day
weeks - 1 point per week
months - 1 point per month
years - 1 point per year

Example query:
On a 600 points chart canvas display the points for the following period: 2018-01-16 12:12:12 - 2018-09-16 12:12:12

Naive algorithm pseudo code:

for each period in [years, months, weeks, .., raw]

	points_count = select count(1) from period
	
	if points_count >= 600 or period.is_raw()
		points = select fields from period where time between start and end
		aggregated = aggregate(points, 600)
		return aggregated

Is this a worthy solution?
Any ideas how to improve?

Sincerely,
Michael

Assume you will have to filter the points_count query by time as well:

points_count = select count(1) from period where time between start and end

The downside to this approach is that you will potentially run several queries before figuring out which period has adequate resolution. For example, if you are doing a very small time range, and you end up querying from raw, you will have executed eight points_count queries before you get to raw.

Instead, you can perform static analysis on the time range to determine which period you should use. Rather than a points_count query, you can calculate how many years, months, weeks, etc. are in the time range using date/time math and then use that value to choose the period.

BTW, this use case is related to this feature request: Intelligent Rollups and Querying of Aggregated Data · Issue #7198 · influxdata/influxdb · GitHub

Thanks for thinking through this!

In this link from #7198 they do exactly the static analysis that you’ve described: influxgraph/utils.py at 7be6d2aa7bf7e7c516c25216a024ca1026c1c2ed · InfluxGraph/influxgraph · GitHub