Hi there, I need to insert hundreds of millions of rows (about 500,000,000) into InfluxDB (1.7.9).
I have five kinds of measurements taken over ten years, sampled every second or every ten seconds, so I need to bulk-import the raw files that I prepared in line protocol format.
I tried the telegraf plugin inputs.csv, which seemed slow, so I tried inputs.influx with line protocol format, which seems better but is still too slow. (In fact, my laptop keeps freezing (close to 100% CPU and memory usage) and I constantly need to reboot it to be able to continue working.)
I am now trying to ingest the data with inputs.tail, which still freezes my laptop; however, I think the data are imported faster.
Can anybody give me some advice on how to speed up the import and make it stable so my machine stops freezing? I would highly appreciate any tips on better practices for importing these hundreds of millions of metrics spread over hundreds of files. Thanks a lot in advance!
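For context, as a fallback I have also been experimenting with posting the prepared line-protocol files directly to the HTTP /write endpoint in fixed-size batches instead of going through telegraf. A minimal sketch of what I mean (the 5000-line batch size, the file path, and db_name are just placeholders from my setup):

```python
import itertools
import urllib.request


def batched_lines(path, batch_size=5000):
    """Yield successive batches of line-protocol lines from a file."""
    with open(path) as f:
        while True:
            batch = list(itertools.islice(f, batch_size))
            if not batch:
                return
            yield batch


def write_batch(batch, db="db_name", host="http://localhost:8086"):
    # POST one batch to the InfluxDB 1.x /write endpoint.
    # db and host are placeholders for my local instance.
    data = "".join(batch).encode("utf-8")
    req = urllib.request.Request(f"{host}/write?db={db}&precision=ns", data=data)
    urllib.request.urlopen(req)


if __name__ == "__main__":
    for batch in batched_lines("../../data/prepared_data/example.dat"):
        write_batch(batch)
```

The idea is just to keep the in-flight batch small and bounded, rather than letting one process read whole files into memory at once.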
Here is my telegraf-input-env.conf:
# Telegraf Configuration
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
interval = "60s"
round_interval = true
metric_batch_size = 5000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "30s"
flush_jitter = "60s"
precision = ""
debug = true
quiet = false
logfile = ""
hostname = ""
omit_hostname = true
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
database = "db_name"
# retention_policy = ""
# write_consistency = "any"
# timeout = "5s"
# username = "telegraf"
# password = "metricsmetricsmetricsmetrics"
# user_agent = "telegraf"
# udp_payload = "512B"
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
# insecure_skip_verify = false
# http_proxy = "http://corporate.proxy:3128"
# http_headers = {"X-Special-Header" = "Special-Value"}
# content_encoding = "identity"
# influx_uint_support = false
###############################################################################
# INPUT PLUGINS #
###############################################################################
# Stream a log file, like the tail -f command
[[inputs.tail]]
## files to tail.
## These accept standard unix glob matching rules, but with the addition of
## ** as a "super asterisk". ie:
## "/var/log/**.log" -> recursively find all .log files in /var/log
## "/var/log/*/*.log" -> find all .log files with a parent dir in /var/log
## "/var/log/apache.log" -> just tail the apache log file
##
## See https://github.com/gobwas/glob for more examples
##
files = ["../../data/prepared_data/*.dat"]
## Read file from beginning. Default false
from_beginning = true
## Whether file is a named pipe
pipe = false
## Method used to watch for file updates. Can be either "inotify" or "poll".
watch_method = "poll"
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "influx"
And here is a sample of what my data look like:
environmental temp=-2.9,air=912.81,prec=459.12,datetime="2011-11-10 00:00:10" 1389347210000000000
environmental prec=0.0,datetime="2011-11-10 00:00:10" 1229347210000000000
environmental temp=-1.29,air=929.8,prec=0.0,datetime="2011-11-10 00:00:20" 1189347220000000000
environmental temp=-0.23,air=219.8,prec=0.0,datetime="2011-11-10 00:00:30" 1489347230000000000
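One thing I started doing while preparing the files is a quick sanity check over each line before importing (straight double quotes around string field values, a 19-digit nanosecond timestamp at the end), since formatting mistakes only show up as silently dropped points. A rough check I use; it is only an approximation of line protocol, not a full parser:

```python
def check_line(line):
    """Rough sanity check for one line of InfluxDB 1.x line protocol.

    Only an approximation: it flags the mistakes I actually hit while
    preparing the files, it is not a full line-protocol parser.
    """
    line = line.rstrip("\n")
    # Curly quotes are invalid in line protocol; string fields need ".
    if "\u201c" in line or "\u201d" in line:
        return "curly quotes: string fields need straight double quotes"
    if line.count('"') % 2 != 0:
        return "unbalanced double quotes"
    # With precision=ns, a current-era timestamp is 19 digits.
    head, _, ts = line.rpartition(" ")
    if not (ts.isdigit() and len(ts) == 19):
        return "timestamp is not 19-digit nanoseconds"
    if "=" not in head:
        return "no fields found"
    return None  # looks plausible
```

Running this over a file before the import at least catches the quote and timestamp problems up front instead of during the write.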