Timeseries overlay for regression testing (or relative timeseries)

The current implementation of InfluxDB is excellent at handling timeseries data, provided time is consistently moving forward. Yes, you can write points in the past, but the point here is that you query a specific time range to visualize what happened over that range.

In the case of regression tests, you may want to overlay the results from a variety of iterations over the same time range. The problem is that, currently, every iteration would have to cover exactly the same time slice for the runs to overlay properly.

The workaround at the moment is to artificially start each run at a known zero point (say timestamp == 0000000000000000000), and then overlays would work, in theory. But this isn’t necessarily a clean or effective method of overlaying information. The questions then become:

  1. Can we look to add functionality in the future that would provide relative timestamping, or shifting, similar to what tools like Splunk do? This would allow any set of runs to be graphed on top of each other to show quick diffs.
  2. Is writing at an artificial timestamp of zero the best way to do this with the current toolset?

This conversation starter (hopefully) is intended to discuss the tools within InfluxDB itself. We can definitely do this by pulling the data into pandas with influxdb-python, as an example, but is there a way we could do this natively?

I think the only way to do this natively for the moment is to use timestamps starting at zero. What I’m thinking here is something like:

test,iteration=1 val=233 0

That gives you a separate series for every iteration, and you’d get the data through:

SELECT val from test GROUP BY iteration

However, that could have performance problems when you have many shards and the DB must quickly query all of them to pull back the data. You’d have to do some testing to see.
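A minimal sketch of the zero-based workaround, assuming each run's samples are available as (epoch-nanosecond timestamp, value) pairs before they are written. The function name `to_relative_lines` and the sample data are hypothetical; only the measurement `test`, the tag `iteration`, and the field `val` come from the example above.

```python
def to_relative_lines(measurement, iteration, samples):
    """Rebase one run so its first sample lands on timestamp 0.

    samples: iterable of (timestamp_ns, value) pairs for a single run.
    Returns line-protocol strings with nanosecond timestamps.
    """
    samples = sorted(samples)
    if not samples:
        return []
    start = samples[0][0]
    return [
        f"{measurement},iteration={iteration} val={value} {ts - start}"
        for ts, value in samples
    ]


# Hypothetical run recorded at an arbitrary wall-clock time, 10 s apart.
run = [
    (1500000000000000000, 233),
    (1500000010000000000, 240),
    (1500000020000000000, 238),
]
lines = to_relative_lines("test", 1, run)
for line in lines:
    print(line)
```

Every run written this way starts at timestamp 0, so the runs share one time range and differ only in the `iteration` tag.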

I’ve seen this request for a few different use cases, so it certainly makes sense to add something like this. The new query language will have functions to transform series, such as interpolation or normalizing the timestamps.

Does every iteration always have the same number of data points in it? Can you show an example schema of what the data looks like? That’ll help me design for the future :slight_smile:

Thanks for the reply @pauldix!

I’m thinking of cases like regression testing for new hardware components, or new firmware on a baseline machine. In most cases samples are taken over time at (mostly) regular intervals; they don’t need to line up exactly, as I’d expect them to be standard timeseries measurements just taken over different ranges. Ideally we wouldn’t use something like timeshift; that’s kinda wonky. Starting from time=0 seems the most logical step for now, but it would be cool to think about this as a possible additional feature for v2.0.

One open-source tool that’s somewhat interesting for this is the Phoronix Test Suite, or even simpler things like standardized Linux perf tests over time. Not only would we be able to correlate test data, but we could overlay standard Telegraf stats like CPU or mem, or anything else, to help tell the story.

This is purely a pony feature, so definitely not required for the next phase…but would be cool if we could incorporate it :smiley:

@sebito91 does this request describe what you’re asking for?

@pauldix wrote:

Does every iteration always have the same number of data points in it?

Every iteration will be roughly the same duration, about 10 minutes for us.
I can imagine that some tests would be a few seconds shorter or longer. Shutdown time for a load test might vary slightly from one JMeter run to the next.
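Because runs may differ in length by a few seconds, one way to compare them is to bucket each run by its offset from its own start and line up the buckets. This is a client-side sketch, not an InfluxDB feature; `align_runs`, the 10-second bucket width, and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def align_runs(runs, bucket_s=10):
    """Align runs on time elapsed since each run's own first sample.

    runs: dict mapping run name -> list of (timestamp_s, value) pairs.
    Returns {bucket_start_s: {run_name: mean value in that bucket}}.
    Buckets a shorter run never reaches simply lack that run's key.
    """
    table = defaultdict(dict)
    for name, samples in runs.items():
        if not samples:
            continue
        start = min(ts for ts, _ in samples)
        buckets = defaultdict(list)
        for ts, value in samples:
            # Floor the elapsed time to the start of its bucket.
            offset = int(ts - start) // bucket_s * bucket_s
            buckets[offset].append(value)
        for offset, values in buckets.items():
            table[offset][name] = mean(values)
    return dict(sorted(table.items()))


# Two hypothetical nightly runs; the second one is a little longer.
runs = {
    "night1": [(1000, 120.0), (1005, 130.0), (1012, 110.0)],
    "night2": [(5000, 100.0), (5011, 140.0), (5023, 90.0)],
}
aligned = align_runs(runs)
for offset, row in aligned.items():
    print(offset, row)
```

Averaging within a bucket also absorbs the small sampling jitter between runs, at the cost of some time resolution.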

We plan on using this functionality to push JMeter load test data to influxdb:

We will run nightly tests, about 10 minutes each. Then we’d like to compare the results of recent load tests.
Or, we might compare selected load tests from the last 5 years of data to see whether the performance of our web applications (as seen through JMeter data) has improved or degraded.

Hesitantly, we might use this as a temporary solution:

If we’re comparing data (say response time) from 4 different tests, we’ll create one query for each of the 4 tests as described here:

…and then we’ll configure all four queries to render in a single Grafana graph.

You could use influxdb-timeshift-proxy to evaluate options for implementing a native influxdb feature.


When will the timeshift function be officially available?

The timeshift roadmap is key to deciding whether to implement my business solution in Graphite or InfluxDB.