There is one CSV file per device and channel, so each device has up to nine files, one per channel of data (temperature, battery, x, y, z, a, b, c, d).
I ran a test yesterday using multiple instances of [[inputs.tail]]. Along the lines you suggested, I "generalized" the measurement name, giving each [[inputs.tail]] a unique name_override. An example of the config file is below (note: there are multiple [[inputs.tail]] sections, one per channel type; most are omitted to keep this post short).
I have two processors: a regex processor that parses the path, extracts the uid, channel, and gateway, and stores each piece of metadata in a tag; and an enum processor that maps the channel names to descriptive values. Finally, there is a simple output to InfluxDB.
I ran a test with about a dozen CSV files. Nine of them were about 230 MB each (roughly 10 million points each). I was watching memory usage in top and noticed that when Telegraf finished, most of the memory remained in use and was never freed; Telegraf appeared to be holding about 2 GB. When I restarted Telegraf, the memory was released. Could this be a memory leak somewhere?
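To narrow this down, on the next run I plan to enable the internal input plugin so I can see whether the 2 GB is live Go heap or memory the runtime has freed but not yet returned to the OS. A minimal sketch, assuming collect_memstats behaves the way I read the docs:

[[inputs.internal]]
  ## Report Go runtime memstats (heap_alloc_bytes, sys_bytes, num_gc, ...)
  ## alongside per-plugin stats, to help attribute the 2 GB to live heap
  ## vs. memory the runtime is holding after the work is done.
  collect_memstats = true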
I also have a question about the file and tail plugins. Is there a way to tell when Telegraf has finished processing a file? I would like to delete the CSV file once Telegraf has processed the data.
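The closest thing I have come up with in the meantime is a Linux-only check from outside Telegraf: compare the kernel's read offset on Telegraf's open file descriptor (the "pos:" line in /proc/<pid>/fdinfo) with the file size. The CSV path below is just an example, and because the tail plugin reads through a buffer, the offset can run ahead of what has actually been flushed to the output, so treat this as a rough sketch:

#!/usr/bin/env python3
# Rough sketch: ask the kernel how far telegraf has read into a CSV by
# comparing the fd offset in /proc/<pid>/fdinfo with the file size.
# Linux-only; needs to run as the telegraf user or root. The path below
# is an example, and "caught up" means "read", not "written to InfluxDB".
import os
import glob

CSV = "/motescan/scannet03/csv/l0a1b2c3d401.csv_channel_z.csv"  # example path

def telegraf_pids():
    """Yield pids whose command name is 'telegraf'."""
    for comm in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(comm) as f:
                if f.read().strip() == "telegraf":
                    yield int(comm.split("/")[2])
        except OSError:
            pass  # process exited while we were scanning

def fd_offset(pid, path):
    """Return the read offset the process holds on path, or None if not open."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return None
    for fd in fds:
        try:
            if os.readlink(f"{fd_dir}/{fd}") != path:
                continue
            with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                for line in info:
                    if line.startswith("pos:"):
                        return int(line.split()[1])
        except OSError:
            continue
    return None

for pid in telegraf_pids():
    pos = fd_offset(pid, CSV)
    if pos is None:
        continue
    size = os.path.getsize(CSV)
    state = "caught up" if pos >= size else "in progress"
    print(f"pid {pid}: {pos}/{size} bytes read ({state})")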
I considered the http_listener_v2 plugin, but there is a 500 MB default limit on the request body. It is rare, but possible, that a file I send could exceed that size.
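If I ever revisit that route, my understanding is that the limit is configurable; a minimal sketch, assuming max_body_size works the way the docs describe (the address is a placeholder):

[[inputs.http_listener_v2]]
  service_address = ":8080"  ## placeholder port
  ## The body-size cap defaults to "500MB"; raising it trades memory while
  ## the request is buffered for the ability to accept the occasional
  ## oversized CSV.
  max_body_size = "1GB"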
Regarding the regex processor: my regex has three named subgroups. At the moment I use three [[processors.regex.tags]] sections, so the regex runs three times. I have not tried it yet (and I suspect TOML may not even allow a key to repeat within one table), but can you have multiple replacement and result_key parameters in the same regex processor? That would avoid running the regex three times. For example:
[[processors.regex.tags]]
  key = "path"
  pattern = '^/motescan/(?P<gateway>scannet[0-9]+)/csv/l(?P<uid>[0-9A-Fa-f]+)[0-9]+\.csv_channel_(?P<channel>[A-Za-z0-9]+)\.csv$'
  replacement = "${gateway}"
  result_key = "gateway"
  replacement = "${uid}"
  result_key = "uid"
  replacement = "${channel}"
  result_key = "channel"
telegraf.conf
[[inputs.tail]]
  files = ["/motescan/scannet03/csv/l*.csv_channel_z.csv"]
  name_override = "witap.payload.data.acceleration"
  from_beginning = false
  pipe = false
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ["^%{NUMBER:timestamp:ts-epoch},%{NUMBER:z:float}$"]
  [inputs.tail.tags]
    applicationName = "telegraf-test-csv"
    device_type = "mote"
    sensor_type = "acceleration"
    country = "ca"
    city = "edmonton"
    organization = "scanimetrics"
[[processors.regex]]
  namepass = ["witap.payload.data.*"]
  order = 1
  [[processors.regex.tags]]
    key = "path"
    pattern = '^/motescan/(?P<gateway>scannet[0-9]+)/csv/l(?P<uid>[0-9A-Fa-f]+)[0-9]+\.csv_channel_(?P<channel>[A-Za-z0-9]+)\.csv$'
    replacement = "${gateway}"
    result_key = "gateway"
  [[processors.regex.tags]]
    key = "path"
    pattern = '^/motescan/(?P<gateway>scannet[0-9]+)/csv/l(?P<uid>[0-9A-Fa-f]+)[0-9]+\.csv_channel_(?P<channel>[A-Za-z0-9]+)\.csv$'
    replacement = "${uid}"
    result_key = "uid"
  [[processors.regex.tags]]
    key = "path"
    pattern = '^/motescan/(?P<gateway>scannet[0-9]+)/csv/l(?P<uid>[0-9A-Fa-f]+)[0-9]+\.csv_channel_(?P<channel>[A-Za-z0-9]+)\.csv$'
    replacement = "${channel}"
    result_key = "channel"
[[processors.enum]]
  namepass = ["witap.payload.data.*"]
  order = 2
  [[processors.enum.mapping]]
    ## channel arrives as a tag (created by the regex processor above)
    tag = "channel"
    [processors.enum.mapping.value_mappings]
      batt = "battery"
      temp = "temperature"
      x = "acceleration_x"
      y = "acceleration_y"
      z = "acceleration_z"
      a = "strain_a"
      b = "strain_b"
      c = "strain_c"
      d = "strain_d"
[[outputs.influxdb]]
  namepass = ["witap.payload.data.*"]
  urls = ["http://x.x.x.x:8086"] # required
  database = "P100" # required
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
  [outputs.influxdb.tagpass]
    gateway = ["scannet03"]