Migration from CSV to InfluxDB using influxdb-python

#1

System info: InfluxDB version 1.6
Operating system: CentOS Linux release 7.3
CSV file (37GB) and InfluxDB server reside on a 16 core, 32GB RAM VM

Steps to reproduce:

  1. My schema consists of 50 measurements with 1000 different tag values (so 50K unique series); each point's fields contain 1 float, 3 ints, and 1 timestamp value. The timestamps are in microsecond (‘u’) precision. I have followed all the recommendations for migrating historical data: the retention policy for the DB is 3 years, the shard duration is 52w, and I have changed cache-snapshot-write-cold-duration to 10s.
  2. I use the influxdb-python client to read points from the CSV and call the write_points API with the default JSON format, sending 6000 points per request body.
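For reference, the write flow described in step 2 can be sketched roughly as below. The CSV column names and the tag/field names are hypothetical placeholders, since the actual schema is not shown:

```python
import csv
from itertools import islice

BATCH_SIZE = 6000  # points per write_points call, as in step 2

def row_to_point(row):
    """Map one CSV row to the JSON point format influxdb-python expects.
    Column, tag, and field names are hypothetical placeholders."""
    return {
        "measurement": row["measurement"],
        "tags": {"node": row["node"]},
        "fields": {
            "value": float(row["value"]),      # the 1 float field
            "count": int(row["count"]),        # one of the 3 int fields
            "event_ts": int(row["event_ts"]),  # the timestamp field value
        },
        "time": int(row["time_us"]),  # microsecond epoch timestamp
    }

def batches(reader, size=BATCH_SIZE):
    """Yield lists of up to `size` points from a csv.DictReader."""
    while True:
        chunk = [row_to_point(r) for r in islice(reader, size)]
        if not chunk:
            return
        yield chunk

# The actual write loop (requires `pip install influxdb`):
# from influxdb import InfluxDBClient
# client = InfluxDBClient(host="localhost", port=8086,
#                         database="asup_metrics")
# with open("historical.csv") as f:
#     for batch in batches(csv.DictReader(f)):
#         client.write_points(batch, time_precision="u",
#                             retention_policy="historical_data")
```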

Expected behavior: The migration of 37GB of data from a CSV file should be reasonably fast.

Actual behavior: The script has been running for 18 hours and the migration is still going. The physical location where InfluxDB stores its data shows that only about 4GB has been written so far. I know there need not be a correlation between the size of the CSV file and the size of the data on disk in the InfluxDB server, but I have no other way to gauge the progress.

[EDIT] The physical location of InfluxDB looks like this after 19 hours (asup_metrics being the DB name and historical_data the retention policy name):
700K ./_internal/monitor/9
856K ./_internal/monitor/10
640K ./_internal/monitor/7
8.6M ./_internal/monitor/11
684K ./_internal/monitor/8
3.6M ./_internal/monitor/103
552K ./_internal/monitor/5
600K ./_internal/monitor/6
17M ./_internal/monitor
28K ./_internal/_series/01
32K ./_internal/_series/05
28K ./_internal/_series/06
24K ./_internal/_series/02
28K ./_internal/_series/00
24K ./_internal/_series/07
28K ./_internal/_series/03
28K ./_internal/_series/04
224K ./_internal/_series
17M ./_internal
58M ./asup_metrics/historical_data/130
59M ./asup_metrics/historical_data/131
4.4G ./asup_metrics/historical_data/129
4.6G ./asup_metrics/historical_data
312K ./asup_metrics/_series/01
312K ./asup_metrics/_series/05
312K ./asup_metrics/_series/06
312K ./asup_metrics/_series/02
316K ./asup_metrics/_series/00
312K ./asup_metrics/_series/07
316K ./asup_metrics/_series/03
308K ./asup_metrics/_series/04
2.5M ./asup_metrics/_series
4.6G ./asup_metrics
4.6G .

Could anyone kindly point out where I might be going wrong?

#2

The size of the resulting InfluxDB database could be much smaller than your CSV file. The CSV file is all ASCII characters, whereas InfluxDB stores the datetime and data values in much more compact formats. So you might actually be much further along than you think.
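As a rough illustration of why the on-disk size can be so much smaller: a float64 value occupies a fixed 8 bytes in binary form, while its ASCII representation in a CSV is often twice that, and InfluxDB's storage engine compresses the binary values further still. The sample value below is arbitrary:

```python
import struct

value = 1499999999.123456  # an arbitrary sample value
ascii_bytes = len(str(value).encode())        # bytes the CSV needs
binary_bytes = len(struct.pack("<d", value))  # bytes a raw float64 needs

# binary_bytes is always 8; the ASCII form is considerably longer
print(ascii_bytes, binary_bytes)
```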

Are both the CSV file and the InfluxDB resident on the same server? If not, and you have network delays on your reads and writes, that will slow things way down.

#3

Both the InfluxDB instance and the CSV file are on the same server. Is there any way I can gauge the progress of the migration? 17 hours for a 37GB file seems far too long. My ultimate aim is to transfer 1.3TB of data, but at this rate I do not think that is feasible.
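One way to gauge progress independently of the on-disk size is to count ingested points and compare against the CSV row count. A minimal sketch, assuming you know the total number of rows; the measurement and field names in the query are placeholders:

```python
# Hypothetical count query (run via the influx CLI or influxdb-python;
# the measurement and field names are placeholders):
#   SELECT COUNT("value") FROM "my_measurement"
#
# from influxdb import InfluxDBClient
# client = InfluxDBClient(database="asup_metrics")
# res = client.query('SELECT COUNT("value") FROM "my_measurement"')

def progress_report(points_written, total_rows, hours_elapsed):
    """Percent complete plus a naive linear ETA (hours remaining)."""
    pct = 100.0 * points_written / total_rows
    rate = points_written / hours_elapsed       # points per hour
    eta = (total_rows - points_written) / rate  # hours remaining
    return pct, eta

# e.g. progress_report(points_so_far, total_csv_rows, 18.0)
```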

#4

Maybe someone else can help. I am very new to InfluxDB, so I only had generalities to contribute.