Timeseries overlay for regression testing (or relative timeseries)

sebito91 · June 15, 2017, 4:17pm

The current implementation of InfluxDB is excellent at handling timeseries data, provided it’s consistently moving forward. Yes, you can write in the past, but the point is that you query for data in a specific time range to visualize what happens over that range.

In the case of regression tests, you may want to overlay the results from a variety of iterations over the same time range. The problem today is that every iteration would have to cover exactly the same time slice for the runs to overlay properly.

The workaround for this at the moment is to artificially start each run at a known zero point (say timestamp == 0000000000000000000), and then overlays would work, in theory. But this isn’t necessarily a clean or effective method of overlaying information. The questions then become:
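A minimal sketch of that zero-start workaround, done client-side before writing. It assumes each run arrives as (epoch-nanosecond timestamp, value) pairs; the helper name `rebase_to_zero` is hypothetical, and InfluxDB does not do this for you.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in nanoseconds, value)


def rebase_to_zero(samples: List[Sample]) -> List[Sample]:
    """Shift a run so its earliest sample sits at timestamp 0.

    Hypothetical client-side helper; relative order of samples is preserved.
    """
    if not samples:
        return []
    t0 = min(ts for ts, _ in samples)
    return [(ts - t0, val) for ts, val in samples]


run = [(1497543420000000000, 12.5), (1497543421000000000, 13.0)]
print(rebase_to_zero(run))  # [(0, 12.5), (1000000000, 13.0)]
```

Using `min` rather than the first element means the sketch still works if samples are not sorted.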

Can we look to add functionality in the future that would provide relative timestamping, or shifting, similar to what tools like Splunk do? This would allow any set of runs to be graphed on top of each other to show quick diffs.
Is writing at an artificial timestamp of zero the best way to do this with current toolset?

This conversation starter (hopefully) is intended to discuss the tools within InfluxDB itself. We can definitely do this by pulling the data into pandas with influxdb-python, as an example, but is there a way we can do this natively?
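Outside the database, the overlay is essentially an outer join on relative time. The plain-Python sketch below stands in for the pandas route without depending on pandas or influxdb-python; it assumes each run has already been reduced to (offset from start, value) pairs, and the `overlay` name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Dict[str, Optional[float]]]


def overlay(runs: Dict[str, List[Tuple[int, float]]]) -> List[Row]:
    """Outer-join several runs on their relative offset.

    Returns one row per offset seen in any run, with None where a run has
    no sample at that offset.
    """
    by_run = {name: dict(samples) for name, samples in runs.items()}
    offsets = sorted({off for samples in by_run.values() for off in samples})
    return [(off, {name: by_run[name].get(off) for name in by_run}) for off in offsets]


rows = overlay({"run1": [(0, 10.0), (5, 11.0)], "run2": [(0, 9.5), (10, 12.0)]})
for off, values in rows:
    print(off, values)
# 0 {'run1': 10.0, 'run2': 9.5}
# 5 {'run1': 11.0, 'run2': None}
# 10 {'run1': None, 'run2': 12.0}
```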

pauldix · June 16, 2017, 5:45pm

I think the only way to do this natively for the moment is to use timestamps starting at zero. What I’m thinking here is something like:

test,iteration=1 val=233 0

That gives you a separate series for every iteration, and you’d get the data back with:

SELECT val FROM test GROUP BY iteration

However, that could have performance problems when you have many shards and the DB must quickly query all of them to pull back the data. You’d have to do some testing to see.
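A small sketch of producing those writes from a test harness. It assumes offsets are already relative to the run start and expressed in nanoseconds (the default write precision), and it reuses the `test` measurement, `iteration` tag and `val` field from the example above; the helper name is hypothetical and it does no escaping of spaces or commas in names or tag values.

```python
from typing import Iterable, List, Tuple


def to_line_protocol(measurement: str, iteration: int,
                     samples: Iterable[Tuple[int, float]]) -> List[str]:
    """Format one iteration as line protocol, tagged so each iteration is its own series.

    `samples` are (offset_ns, value) pairs where the run starts at offset 0.
    Values are written as plain floats; an integer field would need an `i` suffix.
    """
    return [f"{measurement},iteration={iteration} val={value} {offset_ns}"
            for offset_ns, value in samples]


lines = to_line_protocol("test", 1, [(0, 233.0), (1_000_000_000, 241.5)])
print("\n".join(lines))
# test,iteration=1 val=233.0 0
# test,iteration=1 val=241.5 1000000000
```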

I’ve seen this request for a few different use cases, so it certainly makes sense to add something like this. The new query language will have functions to transform series, such as interpolating or normalizing the timestamps.

Does every iteration always have the same number of data points in it? Can you show an example schema of what the data looks like? That’ll help me design for the future.

sebito91 · June 17, 2017, 2:33am

Thanks for the reply @pauldix!

I’m thinking of cases like regression testing for new hardware components, or new firmware on a baseline machine. In most cases samples are taken over time at (mostly) regular intervals; they don’t need to line up exactly, since I’d expect them to be standard time series measurements, just over different ranges. Ideally we wouldn’t use something like timeshift; that’s kinda wonky. Starting from time=0 seems the most logical step for now, but it would be cool to think about this as a possible additional feature for v2.0.
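Because samples are only "mostly" regular, relative offsets from different runs rarely match exactly. One common way to make them comparable, shown here as a hedged sketch rather than anything InfluxDB does on relative time today, is to average each run into fixed-width buckets of time since its start, much like `GROUP BY time()` with `mean()` does on absolute time. The `bucket` name and the seconds-based offsets are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def bucket(samples: List[Tuple[float, float]], interval: float) -> Dict[int, float]:
    """Average samples into fixed-width buckets of relative time.

    Samples are (seconds_since_run_start, value). Each sample lands in bucket
    floor(offset / interval), so jittered runs share bucket indices.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for offset, value in samples:
        buckets[int(offset // interval)].append(value)
    return {idx: mean(vals) for idx, vals in sorted(buckets.items())}


print(bucket([(0.0, 1.0), (0.9, 3.0), (2.1, 5.0)], interval=1.0))
# {0: 2.0, 2: 5.0}
```

Empty buckets are simply absent rather than filled, which leaves gap handling to whatever draws the overlay.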

One open-source tool that’s somewhat interesting for this is the Phoronix Test Suite, or even simpler things like standardized Linux perf tests over time. Not only would we be able to correlate test data, but we could overlay standard Telegraf stats like CPU or memory, or anything else that helps tell the story.

This is purely a pony feature, so it’s definitely not required for the next phase, but it would be cool if we could incorporate it.

eostermueller · July 3, 2017, 2:06pm

@sebito91 does this request describe what you’re asking for?

github.com/influxdata/influxdb

Support lag variables

Opened by pauldix on 18 Dec 2013 (02:56 PM UTC) · closed 18 Mar 2016 (10:35 PM UTC) · labels: area/functions, kind/feature-request

We should support bringing in lag variables so people can calculate changes and things like that. For example, you have the following data:
```
time, value
1, 6
2, 7
3, 4
```
And you run this query: `select value, lag(value, 1) from some_series`. You would get:
```
time, value, value_lag_1
1, 6, null
2, 7, 6
3, 4, 7
```
So then you could do a query like `select value - lag(value, 1) as change from some_series`:
```
time, change
1, 6
2, 1
3, -3
```
The argument to the lag function tells it how many lagging points you want to include. For example, if you do `select value, lag(value, 2)` you'd get:
```
time, value, value_lag_1, value_lag_2
1, 6, null, null
2, 7, 6, null
3, 4, 7, 6
```
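To make the proposed lag semantics concrete, here is a plain-Python sketch over the same three values. It is not InfluxQL and not how any InfluxDB version implements it; `lag(values, n)` returns only the column shifted back by n points (the `value_lag_n` column above), and `change` is a hypothetical helper that reports None where a point has no predecessor rather than substituting a value.

```python
from typing import List, Optional, Sequence


def lag(values: Sequence[float], n: int) -> List[Optional[float]]:
    """Return the series shifted back by n points, padded with None at the start."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return list(values)
    return [None] * min(n, len(values)) + list(values[:-n])


def change(values: Sequence[float], n: int = 1) -> List[Optional[float]]:
    """Difference between each point and the point n steps earlier (None if absent)."""
    return [None if prev is None else cur - prev
            for cur, prev in zip(values, lag(values, n))]


values = [6, 7, 4]
print(lag(values, 1))   # [None, 6, 7]
print(lag(values, 2))   # [None, None, 6]
print(change(values))   # [None, 1, -3]
```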

@pauldix wrote:

Does every iteration always have the same number of data points in it?

Every iteration will be roughly the same duration, about 10 minutes for us.
I can imagine that some tests would be a few seconds shorter or longer; shutdown time for a load test might vary slightly from one JMeter test to the next.

We plan on using this functionality to push JMeter load test data to influxdb:
http://jmeter.apache.org/usermanual/realtime-results.html#influxdb_configuration

We will run nightly tests, about 10 minutes each. Then we’d like to compare the results of recent load tests.
Or, we might compare selected load tests from the last 5 years of data to see whether the performance of our web applications (as seen through JMeter data) has improved or degraded.

Hesitantly, we might use this as a temporary solution:

If we’re comparing data (say response time) from 4 different tests, we’ll create one query for each of the 4 tests as described here:

github.com/maxsivanov/influxdb-timeshift-proxy

a question

Opened by eostermueller on 26 May 2017 (07:53 PM UTC) · closed 29 May 2017 (07:30 AM UTC)

Hello, say we run four different 1-hour tests over the course of a week, collecting data using this: http://jmeter.apache.org/usermanual/realtime-results.html#influxdb ...and we'd like to see response time from all four tests compared to each other on a single graph, kind of like this: https://github.com/grafana/grafana/issues/171#issuecomment-113533494 Could you suggest what kind of query we'd use, perhaps with your "shift_x_seconds" syntax? Thanks for this interesting project, --Erik

…and then we’ll configure all four queries to render in a single grafana graph.
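Whatever tool applies it, overlaying the four runs comes down to shifting each one by the gap between its start and a reference start. The sketch below only computes those per-test shifts; it assumes the latest run is the reference and that start times are known, uses made-up dates for illustration, and does not reproduce the proxy's own query syntax. The `shifts_to_align` name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict


def shifts_to_align(starts: Dict[str, datetime]) -> Dict[str, int]:
    """Seconds to add to each test so its start coincides with the latest test's start."""
    latest = max(starts.values())
    return {name: int((latest - start).total_seconds()) for name, start in starts.items()}


# Illustrative start times only (not real test data).
starts = {
    "test_a": datetime(2017, 6, 26, 1, 0, tzinfo=timezone.utc),
    "test_b": datetime(2017, 6, 28, 1, 0, tzinfo=timezone.utc),
    "test_c": datetime(2017, 7, 1, 1, 30, tzinfo=timezone.utc),
    "test_d": datetime(2017, 6, 30, 13, 0, tzinfo=timezone.utc),
}
print(shifts_to_align(starts))
# {'test_a': 433800, 'test_b': 261000, 'test_c': 0, 'test_d': 45000}
```

Each value would then parameterize one shifted query per test, so all four series start at the same point on the shared graph.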

You could use influxdb-timeshift-proxy to evaluate options for implementing a native influxdb feature.

–Erik

Salamanca · July 5, 2018, 4:34pm

When will the timeshift function be officially available?

The timeshift roadmap is key to deciding whether to implement my business solution in Graphite or InfluxDB.

Regards,
Jesús
