I’d also suggest taking a look at this FAQ entry regarding backfilling sparse data.
For backfilling data, there are a couple of things that need to be adjusted depending on the shape of your data.
- Range of time - If you are backfilling years of data, you will most likely need to increase the `shard duration` on your retention policy, as the default of `1w` will end up creating lots of shards. If you do not plan on deleting the data, the larger the duration the better.
- Density - If you have sparse data, for example stock ticker data with 1 value per day for years, you will also need to increase your `shard duration` to avoid creating lots of small, sparse shards.
- Cache Config - Each shard has a cache of recently written points. By default, the cache is snapshotted to disk after the shard goes cold, that is, after it has received no writes for `10m`. When backfilling, you frequently end up writing to a lot of shards in a short period of time if the default `shard duration` is used. It’s recommended to lower your `cache-snapshot-write-cold-duration` to `10s` during the backfill so that each shard is snapshotted more quickly once you move on to the next.
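
To get a feel for how much the shard duration matters, here is a rough back-of-the-envelope sketch. The function name and the five-year date range are made up for illustration, and the calculation ignores how InfluxDB aligns shard group boundaries, so treat the results as approximate counts rather than exact ones.

```python
import math
from datetime import datetime, timedelta


def estimate_shard_groups(start, end, shard_duration):
    """Approximate number of shard groups a backfill over [start, end) touches.

    Hypothetical helper: it simply divides the time span by the shard
    duration and rounds up. Real shard groups are aligned to fixed
    boundaries, so the actual count can be off by one.
    """
    span = end - start
    return math.ceil(span / shard_duration)


# Assumed example: backfilling five years of data.
start = datetime(2015, 1, 1)
end = datetime(2020, 1, 1)

# Default 1w shard duration versus a 52w shard duration.
print(estimate_shard_groups(start, end, timedelta(weeks=1)))   # 261
print(estimate_shard_groups(start, end, timedelta(weeks=52)))  # 6
```

With one point per day, each of those 261 weekly shards would hold only about 7 values per series, which is the "lots of small, sparse shards" case described above.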