Hello:
I’m creating a time series database to track the number of image pulls in our private Docker repository.
I was able to import the logs from a PostgreSQL DB into InfluxDB 2 using the InfluxDBClient Python library.
The schema is very simple. Below is roughly what a data point looks like:
from influxdb_client import Point

p = Point("image_pull")
p.tag("image_tag", image_tag)
p.time(pull_time)
p.field("cnt", 1)  # Always 1; this is the only field
The data was populated successfully. I want to query how many Docker images were pulled per day over the past 2 days. I tried the following query (placed in a file called query.txt):
from(bucket: "test_repos")
  |> range(start: -2d)
  |> window(every: 1d)
  |> filter(fn: (r) =>
    r._measurement == "image_pull" and
    r._field == "cnt"
  )
  |> sum()
  |> limit(n: 5)
I queried by running:
influx query --file query.txt
What I don’t understand is:
- Why does influx return so many lines despite the range(start: -2d) and limit() statements?
- Why am I still seeing 1 for the _value:int field? I’m expecting it to be the total number of image pulls for each day.
I’m expecting a result similar to what the SQL statement below would return:
SELECT
date(timestamp_field), count(*)
FROM
logs
GROUP BY
date(timestamp_field)
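To make the expected result concrete, here is a minimal sketch in plain Python of the per-day count I am after. It does not talk to InfluxDB at all; the sample pulls and the helper name pulls_per_day are made up for illustration, and I am assuming days are cut at UTC midnight.

```python
from collections import Counter
from datetime import datetime, timezone


def pulls_per_day(pull_times):
    """Count pulls per UTC calendar day, like GROUP BY date(timestamp_field)."""
    counts = Counter(t.astimezone(timezone.utc).date() for t in pull_times)
    return dict(sorted(counts.items()))


# Hypothetical sample data: one (image_tag, pull_time) pair per logged pull.
sample_pulls = [
    ("nginx:1.25", datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)),
    ("nginx:1.25", datetime(2024, 1, 1, 17, 5, tzinfo=timezone.utc)),
    ("redis:7", datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)),
    ("redis:7", datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc)),
]

# The image tag is ignored on purpose: I want one total per day across all tags.
daily = pulls_per_day(t for _, t in sample_pulls)
for day, count in daily.items():
    print(day, count)
```

In other words, I want one row per day with the total across all image tags, not one row per tag.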
Thank you in advance.