InfluxDB: corrupted measurements created

Hello,

I am using Telegraf with InfluxDB.
Everything worked well with InfluxDB 1.2.0 and Telegraf 1.2.
After upgrading to InfluxDB 1.2.1 and Telegraf 1.2.1, I started having problems: many measurements are created with strange names, as if the data received were corrupted.
For instance:

show measurements
�����rface=eth0
-telegrerface=eth0
value=hema_table_size_index_length

I have several hundreds of such measurements in a few days.
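
To pick such names out of a long "show measurements" listing, here is a minimal Python sketch. It assumes that legitimate measurement names in this setup are plain identifiers (letters, digits and underscores, starting with a letter); the function name find_suspicious and that rule are illustrative assumptions, not part of InfluxDB or Telegraf.

```python
import re

# Assumption: real measurement names here look like "cpu" or "nginx_logs".
# Anything containing "=", "-", ".", leading digits or undecodable bytes is
# treated as a likely fragment of a mangled line-protocol write.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def find_suspicious(names):
    """Return the names that do not look like ordinary measurement names."""
    return [name for name in names if not VALID_NAME.fullmatch(name)]


if __name__ == "__main__":
    sample = [
        "cpu",
        "nginx_logs",
        "-telegrerface=eth0",
        "value=hema_table_size_index_length",
    ]
    for name in find_suspicious(sample):
        print(repr(name))
```

The rule is deliberately strict, so it may also flag unusual but valid names; review the list before dropping anything.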

I have since upgraded to InfluxDB 1.2.2 and Telegraf 1.3: same issue.

Is it a known problem?
Any idea?

Many thanks for your help

@LeJav Can you go ahead and open an issue on telegraf with this information?

@jackzampolin: the problem is that I am not sure whether this is a Telegraf problem or an InfluxDB problem.
Do you think it is a Telegraf problem?

@LeJav It sounds more likely to be a Telegraf problem. Can you post your configuration here?

@jackzampolin: I have two servers sending data to InfluxDB. The names of the bad measurements clearly show that the problem exists for both servers. Both were working well before the update.
Here is the config for one of them:

grep -v -e "^[ ]*#" -e "^$" telegraf.config

[global_tags]
[agent]
interval = "20s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "20s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = true
logfile = ""
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://xxxxxxx:8086"]
database = "telegraf" # required
retention_policy = ""
write_consistency = "any"
timeout = "5s"
username = "xxxxxxxxx"
password = "xxxxxxxxxx"
user_agent = "telegraf"
[[inputs.cpu]]
percpu = false
totalcpu = true
collect_cpu_time = true
[[inputs.disk]]
mount_points = ["/rootfs","/rootfs/data"]
ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
[[inputs.diskio]]
devices = ["vda1","vdb1"]
[[inputs.mem]]
[[inputs.swap]]
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
container_names = []
timeout = "10s"
perdevice = true
total = false
tagexclude = ["engine_host", "memory_total", "unit", "container_image", "container_version"]
fielddrop = ["max_usage", "usage", "fail_count", "limit", "total_pgmafault", "cache", "mapped_file", "total_inactive_file", "pgpgout", "rss", "total_mapped_file", "writeback", "unevictable", "pgpgin", "total_unevictable", "pgmajfault", "total_rss_huge", "total_writeback", "total_inactive_anon", "rss_huge", "hierarchical_memory_limit", "total_pgfault", "total_active_file", "active_anon", "total_active_anon", "total_pgpgout", "inactive_anon", "active_file", "pgfault", "inactive_file", "total_pgpgin", "usage_percent", "container_id", "usage_system", "throttling_periods", "throttling_throttled_periods", "throttling_throttled_time", "memory_total"]
[[inputs.http_response]]
address = "https://xxxxxxxxxx"
response_timeout = "10s"
method = "GET"
follow_redirects = false
[[inputs.http_response]]
address = "https://yyyyyyyyyyy"
response_timeout = "10s"
method = "GET"
follow_redirects = false
[[inputs.mysql]]
servers = ["xxxxxxxx:yyyyyyyyy@tcp(zzzzzzzzz:3306)/"]
perf_events_statements_digest_text_limit = 120
perf_events_statements_limit = 250
perf_events_statements_time_limit = 86400
table_schema_databases = ["xxxxx","yyyyyy"]
gather_table_schema = true
gather_process_list = true
gather_user_statistics = false
gather_info_schema_auto_inc = false
gather_innodb_metrics = false
gather_slave_status = false
gather_binary_logs = false
gather_table_io_waits = false
gather_table_lock_waits = false
gather_index_io_waits = false
gather_event_waits = false
gather_file_events_stats = false
gather_perf_events_statements = false
interval_slow = "30m"
namedrop=["info_schema_table_version", "mysql_variables"]
taginclude=["host","schema","table","user"]
fieldpass=["value", "aborted_connects", "busy_time", "bytes_received", "bytes_sent", "connection_errors_accept", "connection_errors_internal", "connection_errors_max_connections", "connection_errors_peer_address", "connection_errors_select", "connection_errors_tcpwrap", "connections", "empty_queries", "flush_commands", "handler_commit", "handler_delete", "handler_read_first", "handler_read_key", "handler_update", "handler_write", "innodb_available_undo_logs", "innodb_buffer_pool_pages_total", "innodb_buffer_pool_read_requests", "innodb_buffer_pool_reads", "innodb_buffer_pool_write_requests", "innodb_data_read", "innodb_data_reads", "innodb_data_writes", "innodb_data_written", "innodb_log_waits", "innodb_log_write_requests", "innodb_log_writes", "innodb_num_open_files", "innodb_num_pages_page_compressed", "innodb_row_lock_current_waits", "innodb_row_lock_time", "innodb_row_lock_time_avg", "innodb_row_lock_time_max", "innodb_row_lock_waits", "innodb_rows_deleted", "innodb_rows_inserted", "innodb_rows_read", "innodb_rows_updated", "max_statement_time_exceeded", "max_used_connections", "memory_used", "open_files", "open_tables", "queries", "rows_read", "slow_launch_threads", "slow_queries", "threads_connected", "threads_running", "threads_altering_table", "threads_executing", "threads_idle", "connections"]
[[inputs.nginx]]
urls = ["http://xxxxxxx/yyyyyyyyyy"]
[[inputs.logparser]]
files = ["/data/telegraf_logs/nginx/nginx.log"]
from_beginning = false
[inputs.logparser.grok]
patterns = ["^%{TIMESTAMP} %{DATA:nginx_host:tag} %{DATA:user_agent:tag} %{DATA:username:tag} %{DATA:method:tag} %{INT:status:tag} %{INT:request_len:int} %{INT:response_len:int} (?:%{NUMBER:gzip_ratio:float}|-) %{NUMBER:req_time:float} (?:%{NUMBER:upstream_time:float}|-) %{DATA:server_name:tag} %{DATA:website:tag}$"]
measurement = "nginx_logs"
custom_patterns = '''
TIMESTAMP [%{HTTPDATE:ts:ts-httpd}]
'''

One more note: for the config I have sent, Telegraf and InfluxDB are running in two Docker containers on the same host, so there should be no network issue between them.

Another comment: when I said that the trouble started with Telegraf 1.2.1, it may in fact already have been 1.3, because I build Telegraf from source.
Could this be related to
#2251: InfluxDB output: use own client for improved through-put and less allocations?

I’m a little at a loss here. Maybe @daniel has some ideas?

I don’t know of any reports like this. Can you verify that the problem remains with the official 1.2.1 package?

I will try it and keep you informed.

I have downgraded Telegraf to 1.2.1 and now everything is OK. No more bad measurements are created.
Note that I built Telegraf from source, running "git checkout -q --detach 1.2.1" before make.

$ telegraf version
Telegraf v1.2.1 (git: HEAD 3b6ffb3)

Do you want me to test again with the latest code from the git repo?

Hello,

I have updated Telegraf to the stable 1.4.4 release.
I used https://dl.influxdata.com/telegraf/releases/telegraf_1.4.4-1_amd64.deb
I have exactly the same issue: a lot of bad measurements have been created.
Here is an extract:

show measurements
name: measurements
name


-net-prerface=eth0
-prd1
-telegraf-prd1
0000000erface=eth0
0000000hema_table_size_data_length
0000000hema_table_size_index_length
0641740erface=eth0
0677680erface=eth0
1
1057846erface=eth0
1058698t=prod1.azuneed.com
1062954erface=eth0
1181000erface=eth0
13857.9erface=eth0
1510633erface=eth0
1510671erface=eth0
4008000erface=eth0
4360000erface=eth0
5106416erface=eth0

Has nobody else run into this issue?

InfluxDB is the stable 1.3.5 release.

Thanks for your help!

Can you open an issue for this on the Telegraf GitHub? Please also mention in the issue how long it takes, starting from an empty database, for these corrupt measurements to appear.

I will submit an issue.
I was able to investigate further, and I have found that the problem is related to nginx.
I access InfluxDB through an nginx reverse proxy.
Each time a measurement with a bad name is created, the nginx log shows a 400 status code for the request.
I will try to find out why nginx is returning the 400.
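
To line up the 400 responses with the times the bad measurements appear, a rough Python sketch like the following could scan the nginx access log. It assumes the default "combined" log format and that Telegraf posts to the InfluxDB 1.x /write endpoint through the proxy; the function name summarize_write_errors is hypothetical and the regular expression would need adjusting for a custom log_format.

```python
import re

# Assumption: nginx "combined" format, where the request line is quoted and
# immediately followed by the status code, e.g.
#   "POST /write?db=telegraf HTTP/1.1" 400 ...
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def summarize_write_errors(log_lines):
    """Return (rejected_write_lines, unparsed_count) for an access log.

    rejected_write_lines: lines for requests to /write answered with 400.
    unparsed_count: non-blank lines that did not match the assumed format
    (nginx may log a malformed request line in a different shape).
    """
    rejected = []
    unparsed = 0
    for line in log_lines:
        if not line.strip():
            continue
        match = REQUEST.search(line)
        if match is None:
            unparsed += 1
            continue
        if match.group("path").startswith("/write") and match.group("status") == "400":
            rejected.append(line.rstrip("\n"))
    return rejected, unparsed


if __name__ == "__main__":
    # The log path below is a placeholder, not the one used on my servers.
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as fh:
        bad, skipped = summarize_write_errors(fh)
    print(len(bad), "rejected writes;", skipped, "lines not parsed")
    for entry in bad:
        print(entry)
```

The timestamps of the rejected lines can then be compared with the time at which each bad measurement first shows up.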

Please take a look at this issue which I thought was fixed in 1.4.2. Maybe we need to reopen the issue?

Hi Daniel,

Many thanks for the suggestion: it is exactly the same issue.
I have left a comment on https://github.com/influxdata/telegraf/issues/2854
I will try the “content-encoding=gzip” parameter.
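
For reference, here is a small Python sketch of what a gzip-encoded write looks like at the HTTP level: the line protocol body is compressed and the request carries a Content-Encoding: gzip header. The function name gzip_write_request and the sample points are made up for illustration, and the exact spelling of the option in telegraf.conf should be checked against the influxdb output plugin README. Whether this avoids the bad measurements is only what I plan to test, not something established here.

```python
import gzip


def gzip_write_request(points):
    """Build headers and a gzip-compressed body for a line protocol write.

    points: list of line protocol strings, one point per entry.
    Returns (headers, body); nothing is sent over the network.
    """
    payload = "\n".join(points).encode("utf-8")
    body = gzip.compress(payload)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),
    }
    return headers, body


if __name__ == "__main__":
    headers, body = gzip_write_request([
        "cpu,host=prd1 usage_idle=99.5",
        "net,host=prd1,interface=eth0 bytes_recv=1058698i",
    ])
    print(headers)
    # Round trip: the receiver sees the original text after decompression.
    print(gzip.decompress(body).decode("utf-8"))
```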