Flux - data aggregation in local time zone

I am working with meteorological data and need to produce local statistics. For example, determining the low and high temperature within the local day. I can use flux to do this for a single local day, but I have not found the flux solution for processing months of data in a single operation.

It is easy to process months of data by aggregating the data in UTC time, but how can I offset this default aggregation to my local day?
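For a single local day I can just offset the range bounds by hand. A sketch of what I mean (the bucket, measurement, and field names are placeholders, and UTC-5 stands in for my standard-time offset):

```
// Local day 2021-03-01 in EST (UTC-5) expressed as UTC range bounds:
// local midnight-to-midnight is 05:00Z to 05:00Z the next day.
from(bucket: "weather")
  |> range(start: 2021-03-01T05:00:00Z, stop: 2021-03-02T05:00:00Z)
  |> filter(fn: (r) => r._measurement == "temperature" and r._field == "degF")
  |> max()
```

That works for one day, but I can't see how to repeat it across months of data in one query.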

I have read countless postings and the docs and I cannot find a clear solution for this seemingly simple problem.

Thanks!

Everything is in UTC time. That is normal everywhere, whether it is databases, languages, or even system times. When you see local time, that is just an adjustment made in the UI or API. MySQL, Java, PHP, Linux system calls, etc. all use UTC unless you specify a timezone.

Speaking from experience, you really do want to work in UTC until you need to present it to users. Communication between systems is much, much easier. The time at which something occurred is always the same; only your perspective of what day and time of day it was in your part of the Earth changes.

Many web projects now base their local time on the browser's reported timezone. That means the information is always UTC in the database, and a JS library like moment is usually used to adjust those values in the web browser. So your database queries should be geo-constraining, I am guessing, and not worried about timezone so much. Leave that to the UI layer.

I totally agree with you. I have no problem working with UTC and have for years. Many meteorological products are provided in UTC times.

My challenge is not to escape using UTC, but rather knowing how to run daily queries (local time) with flux to gather specific statistics. In short, UTC times need to be offset.

Consider this from NOAA for one metric; daily maximum temperature: “Highest temperature in °F during the 24hrs from midnight to midnight standard time.” This definition is quite typical and the basis for several climatological measures.

So, working solely in UTC is fine, but given the definition above I can't simply aggregate daily data over UTC days (midnight to midnight UTC). That part I can do; the resulting metrics just do not conform to the NOAA definition.

As I mentioned, I can achieve my goal for a single day, but I cannot figure out how to get flux to shift UTC for the daily aggregates when I try to process months of data (data acquired every minute) on the local day to produce daily statistics. This is where I could really use some help.

Thanks!

OK, I think I know what you mean. I see the documentation has it planned, but it isn't implemented in Flux yet; they recommend you use InfluxQL for the moment. Honestly, I would do this programmatically myself until it is ready. I would make it a task in Spring Boot or a job in Kubernetes that saves the calculation each night, running once a day per target. You know the offset to use for standard time, and the query will be the same for every target. A small metadata database for all of the tasks that need to run nightly would be required. Then roll up the daily calculations into monthly queries whenever needed.
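Whatever runner you use, the nightly rollup itself has the same shape. In Flux-task form it might look roughly like this (the bucket names and the -5h offset are placeholders; it assumes the task fires shortly after local midnight so that -24h covers one local day):

```
// Sketch: nightly task that stores yesterday's local-day max.
// The 30m task offset just waits for late-arriving points.
option task = {name: "daily-high-est", every: 24h, offset: 30m}

from(bucket: "weather")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "temperature" and r._field == "degF")
  |> timeShift(duration: -5h)
  |> aggregateWindow(every: 24h, fn: max)
  |> timeShift(duration: 5h)
  |> to(bucket: "weather_daily")
```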

This link https://docs.influxdata.com/influxdb/v2.0/reference/flux/language/options/ takes you to this link
https://github.com/influxdata/flux/issues/406

So it looks like you are stuck putting some glue code together outside of Flux.

Thanks for replying back. Given that the flux language can almost do the entire job very efficiently, I was hoping to avoid writing code to pull out data to calculate the statistics and then insert it back into InfluxDB. This is not a difficult chore.

The lack of this very simple time functionality in Flux discourages the use of InfluxDB for a much broader class of IoT projects that must operate on, or provide metrics in, local/standard time. Even I am beginning to wonder if I should start looking at other solutions if I have to write code for such simple tasks. This is not about being lazy; it is a practical consideration of overall ROI given the current state of platform maturity. It is a truly disappointing thought, given that the InfluxDB platform has so incredibly much to offer (including the power of the Flux language).

I see that this issue has been around since 2018 and there are numerous postings for people having similar time-related challenges. Can someone please provide some insight as to where a fix for this problem is in terms of priority or even a timeline?

Thanks!

Can’t you just run the task with the offset computed in the Flux script before doing the query? It makes it less readable, but it isn’t a complete show-stopper, right? You would need to tailor the query per timezone anyway, right? A list of variables to run through, possibly. I am still getting up to speed on Flux myself.

If there was a way to iterate what worked for a single day, I would have used that. Iterating over a block of code does not appear to be possible.

I looked at the various flux functions that could be used to offset time and ran quite a number of tests using each. The only way I found to reliably adjust UTC to local time was to use the timeShift function. Then I leveraged the aggregateWindow function with a 24-hour duration. This approach appears to work.

The downside is that the timeShift duration is fixed; it is not tied automatically to the local time zone's rules. A simple built-in function to adjust for time zone is missing, and it would make this adjustment incredibly straightforward. It also makes deploying sensors across multiple time zones a bit more cumbersome.
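A sketch of the approach described above (the bucket, measurement, and field names are placeholders, and -5h stands in for the local standard-time offset):

```
// Shift UTC timestamps into local standard time (assumed UTC-5),
// so 24h windows align with local midnights, then shift the
// aggregated results back to true UTC.
from(bucket: "weather")
  |> range(start: 2021-01-01T05:00:00Z, stop: 2021-04-01T05:00:00Z)
  |> filter(fn: (r) => r._measurement == "temperature" and r._field == "degF")
  |> timeShift(duration: -5h)
  |> aggregateWindow(every: 24h, fn: max)
  |> timeShift(duration: 5h)
```

The range bounds start at 05:00Z so that the first shifted window begins exactly at a local midnight.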

@dcaho, I fully agree. I was just recently trying to get some time-based daily / monthly stats in local time, and:

  • InfluxQL has no monthly / yearly windows
  • Flux has no local time

So neither of them seems complete enough for time-based analysis (unless I am missing something), which looks very surprising to me considering that we are talking about a time series database.

I ended up doing the same, using the timeShift function before windowing, but this is not really accurate, especially since historical query results will change with every DST change.

@MichalR has observed the same issues I am finding.

After considerable digging into the details since my last posting, I find that Flux does not accurately do the job. The ONLY way I have been able to obtain accurate, consistent, and predictable results is to leverage the API from my own code (back to @kramik1’s original suggestion).

The handling of time within Flux has a lot of issues. What I have seen:

  • Windows (including aggregateWindow) cover their full duration regardless of what you might exclude. For example, I wanted to skip the first 6 days of a month within a window duration of 1 month. This is not possible; the window duration is absolute and overrides other filters.

  • The times in InfluxDB do not match the epoch times submitted with the data points; there is always a difference of at least a few seconds. For example, if you submit the epoch for 2021-01-01T23:59:59 UTC, what is logged in InfluxDB is 2021-01-02T00:00:00 UTC. Unless I back off several seconds, the logged time always rolls over to the next day. This should be precise.

@dcaho, thanks for this eye-opening info on the issues with time windows in Flux. Everything works fine up to 1h, and beyond that it is effectively impossible to use aggregateWindow. I have only worked with InfluxDB 2.0 for two months and love the approach to time series with the Flux language, but this is very annoying.

However, I have today decided to address the issue by adding a "local_day" field with text values (e.g. "2021-05-07") for time series that can meaningfully be aggregated above the hour level, and then using … |> group(columns: ["local_day"]) |> sum()/count()/mean()/… instead of aggregateWindow. The time series are sampled in seconds, but will be pre-aggregated to the hour before this step.
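Roughly what that step looks like when the "local_day" value is derived inside Flux instead of at write time (the names are placeholders, -5h stands in for my standard-time offset, and the substring keeps the YYYY-MM-DD part of the shifted RFC3339 timestamp):

```
import "strings"

// Derive a "local_day" string per record by shifting the timestamp
// into local standard time (assumed UTC-5) and keeping YYYY-MM-DD,
// then group and aggregate on that string instead of aggregateWindow.
from(bucket: "weather")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "temperature" and r._field == "degF")
  |> map(fn: (r) => ({r with local_day:
      strings.substring(
        v: string(v: time(v: int(v: r._time) - int(v: 5h))),
        start: 0,
        end: 10)}))
  |> group(columns: ["local_day"])
  |> mean()
```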

Besides, this only solves it for my own local timezone. Is there another approach that is more flexible? I have not found a function in Flux that can convert UTC to another timezone (why not?); that would be the simplest workaround.