I am trying to store multiple measurements in one InfluxDB database. The tag keys are common to all measurements; only the field keys differ. The tag keys show up correctly, but the field keys are getting mixed up. The files are in CSV format and have so many columns that I am having difficulty tracking down the error.
The data is pushed from Kinesis to Telegraf, and I am using only one stream. Could the number of columns be the reason the field keys are getting merged?
Should I use a different stream for each format, or use different databases?
A cost-effective solution would be ideal.
@Pratik_Das_Baghel - It’s hard to give specific advice without a sample of your csv, your telegraf configuration, and an example of what you mean by “mixed up” field keys. I would double check your telegraf config to ensure the CSV parsing is happening as you like. Test it with a smaller version of your big csv. You probably want to keep the number of field keys to a few hundred at most in any measurement for practical ease. I don’t know the maximum number of field keys possible but I believe it is over 10k.
It sounds like you might have multiple CSV formats? If so, double check the configuration for each format.
Let us know what you figure out.
urls = ["http://influxdb:8086"]
database = "omfinal2"
measurement = ["type"]
#all other fields defined
data_format = "csv"
csv_header_count = 1
csv_tag_columns = ["ver","node","operatorName","eNBName","PlmnIdentity","cellName","eNodeBId","cellId","type"]
csv_timestamp_column = "datetime"
csv_timestamp_format = "2006-01-02T15:04:05"
Here is my original csv format:
Since I am managing 4 types of file, I append a type column that denotes the file type, and I have merged the date and time columns to get a proper timestamp format. The contents of the new CSV files are syntactically valid.
So, I am using 4 formats: cu_om, cu_plmn_om, cu_rp_om, rp_om.
This is data pushed into kinesis:
I am using processors.converter to convert this ‘type’ column to the measurement name. With 1 file it works fine in Telegraf and InfluxDB, but with more than 1 file the field columns get mixed up: my first file contains some columns that are not present in the others, yet those columns show up as field keys in the other measurements too, where they should not be present.
Here, the columns from Cpuusage up to PlmnIdentity are not present in rp_om. They were present in another file, but they are showing up here as well.
So we are thinking of 2 options: either use 4 streams, or use the tagpass option on the type tag column. But I am confused about how to use it for 4 types of files, so if you have any idea how to use tagpass, that would be great.
I think I’m beginning to understand the problem you are describing. I don’t quite yet understand what you mean by “that my first file contains some columns which are not present in others… but in field keys they are showing in that measurements also, where they should not be present”, but I will think about it.
One thing I did notice is that the CSV parser will let you pick the measurement with the csv_measurement_column option; see https://github.com/influxdata/telegraf/tree/master/plugins/parsers/csv. That might be clearer in your config than using the converter processor.
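A minimal sketch of what that could look like, assuming the input plugin is kinesis_consumer and reusing the column names from the config posted above. The region and stream name are placeholders, not values from this thread, and "type" is dropped from the tag columns on the assumption that it is only needed as the measurement name.

```toml
[[inputs.kinesis_consumer]]
  region = "us-east-1"                  # placeholder, not from this thread
  streamname = "om-stream"              # placeholder, not from this thread
  shard_iterator_type = "TRIM_HORIZON"

  data_format = "csv"
  csv_header_count = 1
  # Take the measurement name from the "type" column of each row,
  # instead of converting a tag with processors.converter.
  csv_measurement_column = "type"
  csv_tag_columns = ["ver","node","operatorName","eNBName","PlmnIdentity","cellName","eNodeBId","cellId"]
  csv_timestamp_column = "datetime"
  csv_timestamp_format = "2006-01-02T15:04:05"
```

This only changes how the measurement is named. Each message still needs its own header row, and all rows in one message are parsed against that one header.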
You’re not using the merge option for CSVs, right? I’ve seen surprising behavior with merge.
My understanding is that each row of a CSV file will become a separate Metric/Point.
Kinesis operates on “messages” - how many CSV rows are in each of your messages? Is the type (rp_om, etc) all the same within each message? The CSV parser will process a single message as a whole CSV. In this case, if you have mixed types within 1 message, I expect that each metric coming out will share the same tag keys and field keys. This could cause the problem you are seeing. (Your kinesis screenshot wasn’t that helpful to me because I’m not familiar with the kinesis web ui.)
I’m not sure how tagpass filtering will help you as it is a global setting.
I could see that 4 streams and 4 telegraf instances could work but I believe you should be able to get this working with just one telegraf. Double check and let me know how you are putting the 4 types into kinesis messages - a sample/example would be helpful.
@philjb. The problem is solved now thank you.
What is your solution? Glad you solved it.
Used 4 different kinesis streams and modified config file accordingly.
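For anyone landing here later, a setup along those lines might look roughly like the sketch below. This is not the actual config from this thread: the stream names and region are invented placeholders, and it assumes every stream carries exactly one of the four file types, with a header row in each message.

```toml
# One kinesis_consumer input per file type, so each type is parsed
# against its own header and field keys cannot leak between types.

[[inputs.kinesis_consumer]]
  region = "us-east-1"                  # placeholder
  streamname = "cu-om-stream"           # hypothetical stream for cu_om files
  shard_iterator_type = "TRIM_HORIZON"
  data_format = "csv"
  csv_header_count = 1
  csv_measurement_column = "type"
  csv_tag_columns = ["ver","node","operatorName","eNBName","PlmnIdentity","cellName","eNodeBId","cellId"]
  csv_timestamp_column = "datetime"
  csv_timestamp_format = "2006-01-02T15:04:05"

[[inputs.kinesis_consumer]]
  region = "us-east-1"                  # placeholder
  streamname = "cu-plmn-om-stream"      # hypothetical stream for cu_plmn_om files
  shard_iterator_type = "TRIM_HORIZON"
  data_format = "csv"
  csv_header_count = 1
  csv_measurement_column = "type"
  csv_tag_columns = ["ver","node","operatorName","eNBName","PlmnIdentity","cellName","eNodeBId","cellId"]
  csv_timestamp_column = "datetime"
  csv_timestamp_format = "2006-01-02T15:04:05"

[[inputs.kinesis_consumer]]
  region = "us-east-1"                  # placeholder
  streamname = "cu-rp-om-stream"        # hypothetical stream for cu_rp_om files
  shard_iterator_type = "TRIM_HORIZON"
  data_format = "csv"
  csv_header_count = 1
  csv_measurement_column = "type"
  csv_tag_columns = ["ver","node","operatorName","eNBName","PlmnIdentity","cellName","eNodeBId","cellId"]
  csv_timestamp_column = "datetime"
  csv_timestamp_format = "2006-01-02T15:04:05"

[[inputs.kinesis_consumer]]
  region = "us-east-1"                  # placeholder
  streamname = "rp-om-stream"           # hypothetical stream for rp_om files
  shard_iterator_type = "TRIM_HORIZON"
  data_format = "csv"
  csv_header_count = 1
  csv_measurement_column = "type"
  csv_tag_columns = ["ver","node","operatorName","eNBName","PlmnIdentity","cellName","eNodeBId","cellId"]
  csv_timestamp_column = "datetime"
  csv_timestamp_format = "2006-01-02T15:04:05"

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "omfinal2"
```

The tag list is repeated in every block because the thread says the tag keys are common to all four types. If a type lacks one of those columns, that block's csv_tag_columns would need adjusting.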
@philjb I have another question, about continuous query syntax. Can you help me with that? The error is: expected field arguments in mean()