Hi all,
I have 5,300,000 data points, all in the same series, to be loaded into InfluxDB. The data points are split into files of 5,000 lines each. I use a bash script that submits one file via curl every 30 seconds. It takes quite a few hours to run (!), but every request returns an HTTP 204 response.
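For reference, here is a minimal sketch of that loading loop in Python rather than bash. It is a hypothetical stand-in, not Stephen's actual script. It assumes an InfluxDB 1.x server at `http://localhost:8086` and a database named `mydb` (both placeholders), with files already in line protocol and timestamps in seconds. At one 5,000-line file every 30 seconds, 5,300,000 points means 1,060 requests, or roughly 8.8 hours of pauses alone, which fits "quite a few hours".

```python
import time
import urllib.parse
import urllib.request


def write_url(host: str, db: str, precision: str = "s") -> str:
    # InfluxDB 1.x HTTP write endpoint; db and precision are query parameters
    query = urllib.parse.urlencode({"db": db, "precision": precision})
    return f"{host.rstrip('/')}/write?{query}"


def batches(lines, size=5000):
    # Group non-blank line-protocol lines into batches of at most `size`
    batch = []
    for line in lines:
        if line.strip():
            batch.append(line.rstrip("\n"))
            if len(batch) == size:
                yield batch
                batch = []
    if batch:
        yield batch


def load_file(path, url, pause=0.0):
    # POST each batch as one request; urlopen raises HTTPError on 4xx/5xx
    with open(path, encoding="utf-8") as f:
        for batch in batches(f):
            body = "\n".join(batch).encode("utf-8")
            req = urllib.request.Request(url, data=body, method="POST")
            with urllib.request.urlopen(req) as resp:
                if resp.status != 204:
                    raise RuntimeError(f"unexpected status {resp.status}")
            time.sleep(pause)
```

Usage would look like `load_file("digital_0001.lp", write_url("http://localhost:8086", "mydb"), pause=30)`, where the file name is hypothetical. A 204 means the server accepted the write. It does not, by itself, prove that every line became a distinct new point.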
When I go into the influx CLI and run
SELECT COUNT(*) FROM "digital"
I get a count of 654,363, NOT 5,300,000.
The database has been created with a duration of 36500d as the data I have is old (2011-2017). I am using a precision of ‘s’.
I am running Influxdb OSS 1.7.4 in a docker container with a persisted volume for /var/lib/influxdb.
Can anyone give me any idea where I should look for the problem or any ideas how to fix this?
Thanks,
Stephen
UPDATE:
I have looked at the influxdb logs and there are no lines that look like errors.
What’s your shard duration? For this kind of upload, it’s worth noting that the default shard duration for an RP of that length (https://docs.influxdata.com/flux/v0.x/stdlib/universe/top/) is 7 days. That means uploading data spanning ~7 years would open ~350 shard groups, which is probably too many.
For this upload, I’d suggest setting a shard duration of up to 7 years (probably not less than 1 year) before uploading.
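To put numbers on that, here is a rough sketch that counts how many shard groups a 2011 to 2017 time range would touch for a given shard duration. The assumption is that shard groups start at whole multiples of the duration. I believe InfluxDB 1.x does this with Go's `time.Truncate`, which counts from 0001-01-01 UTC, but treat that anchor as an assumption. A different anchor would shift the count by at most one. In 1.x, the duration itself is set with `SHARD DURATION` on `CREATE RETENTION POLICY` / `ALTER RETENTION POLICY`, and an `ALTER` only affects shard groups created afterwards, so it needs to be in place before the upload.

```python
from datetime import datetime, timedelta, timezone

# Assumed alignment anchor: Go's zero time, 0001-01-01 00:00:00 UTC
ANCHOR = datetime(1, 1, 1, tzinfo=timezone.utc)


def shard_groups_touched(start: datetime, end: datetime, shard_duration: timedelta) -> int:
    # Index of the shard group a timestamp falls into = whole durations since ANCHOR
    first = (start - ANCHOR) // shard_duration
    last = (end - ANCHOR) // shard_duration
    return last - first + 1


start = datetime(2011, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

for label, duration in [("7d", timedelta(days=7)),
                        ("52w", timedelta(weeks=52)),
                        ("2555d (~7y)", timedelta(days=2555))]:
    print(label, shard_groups_touched(start, end, duration))
```

With the default 7-day duration this range touches 366 shard groups. With 52 weeks it touches 8, and with about 7 years (2555 days) it touches 2.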
Well, Sam, that fixed the issue almost perfectly. I now load most of the data (most means 99.99999%); a couple of hundred data points out of the 32 million get skipped. I have no idea why, as there does not seem to be any pattern in the ones skipped.
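One possible cause worth ruling out is duplicate points. InfluxDB treats points with the same measurement, tag set and timestamp as the same point, so a later write silently overwrites the earlier one's field values instead of adding a new point. With precision `s`, two readings for the same series within the same second end up with identical timestamps. The sketch below counts such collisions in the input files. It assumes every line has an explicit timestamp and that measurement and tag names contain no escaped spaces or commas. It is a rough check, not a full line-protocol parser.

```python
from collections import Counter


def point_key(line: str):
    # Timestamp is the last space-separated token; series key is everything
    # before the first space (assumes no escaped spaces in the series key)
    rest, timestamp = line.rsplit(" ", 1)
    series_key = rest.split(" ", 1)[0]
    measurement, *tags = series_key.split(",")
    # InfluxDB sorts tags, so tag order in the file does not matter
    return measurement, tuple(sorted(tags)), timestamp


def count_overwritten(lines) -> int:
    # Number of lines that would overwrite an earlier point with the same key
    keys = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keys[point_key(line)] += 1
    return sum(n - 1 for n in keys.values() if n > 1)
```

Running this over all the input files (for example `count_overwritten(open(path))` per file, or over the concatenation of all files, since duplicates can cross file boundaries) and comparing the result with the couple of hundred missing points would show whether overwrites explain the gap.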