I’m creating a time series database to track the number of image pulls in our private docker repository.
I was able to import the logs from a PostgreSQL DB to InfluxDB2 using
InfluxDBClient python library.
The schema is very simple. Below is basically how my data point looks like:
p = Point("image_pull") p.tag("image_tag", image_tag) p.time(pull_time) p.field(cnt, 1) # Always 1 and only 1 field
The data was successfully populated. I want to query for how many docker images were pulled for the past 2 days. I tried the following query (place in a file called query.txt):
from (bucket: "test_repos") |> range(start: -2d) |> window(every: 1d) |> filter(fn: (r) => r._measurement == "image_pull" and r._field == "cnt" ) |> sum() |> limit(n:5)
I queried by running:
influx query --file query.txt
What I don’t understand is:
- Why the
influxreturn so many lines despite of the
range(start: -2d)and the
- I was still seeing 1 for
_value:intfield. I’m expecting it to be the total number of image pulls for day.
I’m expecting a result similar to the SQL statement below
SELECT date(timestamp_field), count(*) FROM logs GROUP BY date(timestamp_field)
Thank you in advance