I need to extract from an InfluxDB 3 Enterprise table with
17 rows
40 columns
four of which are tags
the datetime of the youngest and oldest rows. My expectation was that this would be extremely fast, since InfluxDB is optimized for time series. On the contrary, I'm running into real trouble: on my PC it takes more than four minutes and a lot of memory. One of the queries I am trying in a Jupyter Notebook is the following:
Your data size is not an issue at all. You can modify the query a bit by adding a WHERE clause, such as:
query = f'''
SELECT FIRST(bid), LAST(bid)
FROM "{measurement}"
WHERE time >= now() - 30d
'''
If your main use case is "give me the latest row(s) very fast," also consider configuring a Last Values Cache on that table, which will speed up that query further.
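For readers following along, a Last Values Cache in InfluxDB 3 Core/Enterprise can be created from the influxdb3 CLI. The sketch below is my reading of the CLI; the database name, table name, and cache name are placeholders, so check the exact flags against your server's `influxdb3 create last_cache --help` output:

```shell
# Create a Last Values Cache on the table (mydb / mytable / my_cache
# are placeholders -- substitute your own names).
influxdb3 create last_cache \
  --database mydb \
  --table mytable \
  my_cache

# It can then be queried via the last_cache() table function, e.g.:
#   SELECT * FROM last_cache('mytable', 'my_cache')
```

Note this only accelerates the "latest values" side of the problem, not the oldest-row lookup.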
This may fix the last datetime, provided (but not guaranteed) that the last datetime in the DB is close to now(); but what about the first datetime of the entire table? We have no clue how to limit the search range, I guess.
Claude/Anthropic and ChatGPT gave me a suggestion that seems to work, but I have not found it mentioned much in any documentation, which made me a bit uncomfortable; I copy the workaround below.
Your query looks good: system.parquet_files is a valid, metadata-based way to get approximate global start/end times very quickly. However, there can be recent data in the memory/WAL buffer that has not been written to a Parquet file yet, so bear that in mind. Yes, using AI is great, but also ask it to explain the answer in more detail, especially the SQL query, so you know what is going on. I would also advise reading our documentation for further clarity, as that is the most up-to-date resource.
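To make the thread self-contained, here is a sketch of the kind of metadata query being discussed. The column names `table_name`, `min_time`, and `max_time` are my assumption about the system.parquet_files schema, so verify them against your InfluxDB 3 version before relying on them:

```python
# Sketch: approximate first/last timestamps from Parquet file metadata.
# Assumes system.parquet_files exposes table_name, min_time, max_time;
# check the actual schema on your server.
measurement = "mytable"  # placeholder table name

query = f"""
SELECT MIN(min_time) AS first_ts,
       MAX(max_time) AS last_ts
FROM system.parquet_files
WHERE table_name = '{measurement}'
"""

print(query)
```

This only scans file metadata, not the data itself, which is why it returns in milliseconds even on millions of rows; the trade-off, as noted above, is that it misses anything still sitting in the WAL buffer.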
Thank you @suyash. Do you think there is a more robust and still fast way to get the first datetime when we really don't know how to narrow the search range? Secondly, is there a way to tell whether there is pending recent data not yet persisted, and if so, a way to force InfluxDB to write it and update system.parquet_files?
Thank you.
PS: in my original post it was 17 million rows, not just 17!
I have a problem using that strategy of querying system.parquet_files: it seems the Parquet metadata is not updated at every write, so that, for instance, if I start writing to a new table and query system.parquet_files ten minutes later, the system does not find the files. Is there a way to force an update of system.parquet_files? Alternatively, how can I set up and use the Last Values Cache to retrieve the min and max datetime in the table quickly?
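Pending a definitive answer on forcing persistence, one defensive pattern for the "last datetime" side is to take the newer of two cheap answers: the Parquet-metadata maximum (fast, but may lag while data sits in the WAL buffer) and a range-bounded `MAX(time)` over a short recent window (which does see buffered writes). Both functions below are hypothetical stubs with hard-coded return values standing in for real queries:

```python
from datetime import datetime, timezone

def last_from_parquet_metadata():
    """Stub for: SELECT MAX(max_time) FROM system.parquet_files
    WHERE table_name = '...'.  May lag while data is only in the WAL."""
    return datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

def last_from_recent_window():
    """Stub for: SELECT MAX(time) FROM t WHERE time >= now() - 1h.
    Cheap because range-bounded, and it sees buffered data too.
    Returns None when nothing was written inside the window."""
    return datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)

def robust_last_timestamp():
    """Take the newer of the two answers, so recent writes that have
    not yet been persisted to Parquet are not missed."""
    candidates = [t for t in (last_from_parquet_metadata(),
                              last_from_recent_window()) if t is not None]
    return max(candidates) if candidates else None
```

With the stub values above, the recent-window answer wins because it is five minutes newer; in production, whichever source is fresher for your table wins, which is the point of combining them.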