Influxdb corrupted measurements created

LeJav · April 3, 2017, 4:51pm

Hello,

I am using Telegraf with InfluxDB.
Everything worked well with InfluxDB 1.2.0 and Telegraf 1.2.
After upgrading to InfluxDB 1.2.1 and Telegraf 1.2.1, I started to have a lot of trouble: many measurements are created with strange names, as if the received data were corrupted.
For instance:

show measurements
Ã��rface=eth0
-telegrerface=eth0
value=hema_table_size_index_length
…

I got several hundred such measurements within a few days.

I have since upgraded InfluxDB (to 1.2.2) and Telegraf (to 1.3): same issue.

Is this a known problem?
Any idea?

Many thanks for your help

jackzampolin · April 3, 2017, 6:22pm

@LeJav Can you go ahead and open an issue on telegraf with this information?

LeJav · April 3, 2017, 9:46pm

@jackzampolin: the problem is that I am not sure whether this is a Telegraf problem or an InfluxDB problem.
Do you think it is a Telegraf problem?

jackzampolin · April 3, 2017, 10:48pm

@LeJav This sounds more likely to be a Telegraf problem. Can you post your configuration here?

LeJav · April 4, 2017, 5:50am

@jackzampolin: I have 2 servers sending data to InfluxDB. The names of the measurements that get created clearly show that the problem exists on both servers. Both were working well before the update.
Here is the config for one of them:

grep -v -e "^[1]*#" -e "^$" telegraf.config

[global_tags]
[agent]
interval = "20s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "20s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = true
logfile = ""
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://xxxxxxx:8086"]
database = "telegraf" # required
retention_policy = ""
write_consistency = "any"
timeout = "5s"
username = "xxxxxxxxx"
password = "xxxxxxxxxx"
user_agent = "telegraf"
[[inputs.cpu]]
percpu = false
totalcpu = true
collect_cpu_time = true
[[inputs.disk]]
mount_points = ["/rootfs","/rootfs/data"]
ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
[[inputs.diskio]]
devices = ["vda1","vdb1"]
[[inputs.mem]]
[[inputs.swap]]
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
container_names = []
timeout = "10s"
perdevice = true
total = false
tagexclude = ["engine_host", "memory_total", "unit", "container_image", "container_version"]
fielddrop = ["max_usage", "usage", "fail_count", "limit", "total_pgmafault", "cache", "mapped_file", "total_inactive_file", "pgpgout", "rss", "total_mapped_file", "writeback", "unevictable", "pgpgin", "total_unevictable", "pgmajfault", "total_rss_huge", "total_writeback", "total_inactive_anon", "rss_huge", "hierarchical_memory_limit", "total_pgfault", "total_active_file", "active_anon", "total_active_anon", "total_pgpgout", "inactive_anon", "active_file", "pgfault", "inactive_file", "total_pgpgin", "usage_percent", "container_id", "usage_system", "throttling_periods", "throttling_throttled_periods", "throttling_throttled_time", "memory_total"]
[[inputs.http_response]]
address = "https://xxxxxxxxxx"
response_timeout = "10s"
method = "GET"
follow_redirects = false
[[inputs.http_response]]
address = "https://yyyyyyyyyyy"
response_timeout = "10s"
method = "GET"
follow_redirects = false
[[inputs.mysql]]
servers = ["xxxxxxxx:yyyyyyyyy@tcp(zzzzzzzzz:3306)/"]
perf_events_statements_digest_text_limit = 120
perf_events_statements_limit = 250
perf_events_statements_time_limit = 86400
table_schema_databases = ["xxxxx","yyyyyy"]
gather_table_schema = true
gather_process_list = true
gather_user_statistics = false
gather_info_schema_auto_inc = false
gather_innodb_metrics = false
gather_slave_status = false
gather_binary_logs = false
gather_table_io_waits = false
gather_table_lock_waits = false
gather_index_io_waits = false
gather_event_waits = false
gather_file_events_stats = false
gather_perf_events_statements = false
interval_slow = "30m"
namedrop=["info_schema_table_version", "mysql_variables"]
taginclude=["host","schema","table","user"]
fieldpass=["value", "aborted_connects", "busy_time", "bytes_received", "bytes_sent", "connection_errors_accept", "connection_errors_internal", "connection_errors_max_connections", "connection_errors_peer_address", "connection_errors_select", "connection_errors_tcpwrap", "connections", "empty_queries", "flush_commands", "handler_commit", "handler_delete", "handler_read_first", "handler_read_key", "handler_update", "handler_write", "innodb_available_undo_logs", "innodb_buffer_pool_pages_total", "innodb_buffer_pool_read_requests", "innodb_buffer_pool_reads", "innodb_buffer_pool_write_requests", "innodb_data_read", "innodb_data_reads", "innodb_data_writes", "innodb_data_written", "innodb_log_waits", "innodb_log_write_requests", "innodb_log_writes", "innodb_num_open_files", "innodb_num_pages_page_compressed", "innodb_row_lock_current_waits", "innodb_row_lock_time", "innodb_row_lock_time_avg", "innodb_row_lock_time_max", "innodb_row_lock_waits", "innodb_rows_deleted", "innodb_rows_inserted", "innodb_rows_read", "innodb_rows_updated", "max_statement_time_exceeded", "max_used_connections", "memory_used", "open_files", "open_tables", "queries", "rows_read", "slow_launch_threads", "slow_queries", "threads_connected", "threads_running", "threads_altering_table", "threads_executing", "threads_idle", "connections"]
[[inputs.nginx]]
urls = ["http://xxxxxxx/yyyyyyyyyy"]
[[inputs.logparser]]
files = ["/data/telegraf_logs/nginx/nginx.log"]
from_beginning = false
[inputs.logparser.grok]
patterns = ["^%{TIMESTAMP} %{DATA:nginx_host:tag} %{DATA:user_agent:tag} %{DATA:username:tag} %{DATA:method:tag} %{INT:status:tag} %{INT:request_len:int} %{INT:response_len:int} (?:%{NUMBER:gzip_ratio:float}|-) %{NUMBER:req_time:float} (?:%{NUMBER:upstream_time:float}|-) %{DATA:server_name:tag} %{DATA:website:tag}$"]
measurement = "nginx_logs"
custom_patterns = '''
TIMESTAMP [%{HTTPDATE:ts:ts-httpd}]
'''

LeJav · April 4, 2017, 6:00am

One more note: for the config I have sent, Telegraf and InfluxDB are running in 2 Docker containers on the same host, so there should be no network issue between them.

LeJav · April 4, 2017, 12:48pm

Another comment: when I said that the trouble started with Telegraf 1.2.1, I am not sure that it was not already 1.3, because I build Telegraf from source.
Maybe this could be related to #2251 ("InfluxDB output: use own client for improved through-put and less allocations")?

jackzampolin · April 4, 2017, 4:58pm

I’m a little at a loss here. Maybe @daniel has some ideas?

daniel · April 5, 2017, 12:27am

I don’t know of any reports like this. Can you verify that the problem remains with the official 1.2.1 package?

LeJav · April 5, 2017, 6:09am

I will try and keep you informed

LeJav · April 6, 2017, 7:30am

I have downgraded Telegraf to 1.2.1 and now everything is OK: no more bad measurements are created.
Note that I built Telegraf from source, running "git checkout -q --detach 1.2.1" before make.

$ telegraf version
Telegraf v1.2.1 (git: HEAD 3b6ffb3)

LeJav · April 26, 2017, 10:03am

Do you want me to test again with the latest code from the git repo?

LeJav · November 14, 2017, 7:20pm

Hello,

I have updated Telegraf to the stable 1.4.4 release.
I used https://dl.influxdata.com/telegraf/releases/telegraf_1.4.4-1_amd64.deb
I have exactly the same issue: a lot of bad measurements have been created.
Here is an extract:

show measurements
name: measurements
name
----
-net-prerface=eth0
-prd1
-telegraf-prd1
0000000erface=eth0
0000000hema_table_size_data_length
0000000hema_table_size_index_length
0641740erface=eth0
0677680erface=eth0
1
1057846erface=eth0
1058698t=prod1.azuneed.com
1062954erface=eth0
1181000erface=eth0
13857.9erface=eth0
1510633erface=eth0
1510671erface=eth0
4008000erface=eth0
4360000erface=eth0
5106416erface=eth0
…

Has nobody else run into this issue?

InfluxDB is the stable 1.3.5 release.

Thanks for your help!
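
The corrupted names in the extract above start with a digit or a dash, or contain characters such as "=" and ".", which normal Telegraf measurement names in this setup do not. A minimal Python sketch for picking them out of a `show measurements` listing, assuming that legitimate names consist only of lowercase letters, digits and underscores and start with a letter (the `VALID_NAME` pattern and the `suspicious_measurements` helper are illustrative, not part of InfluxDB or Telegraf):

```python
import re

# Assumption: legitimate measurement names here look like "cpu", "mem",
# "nginx_logs": lowercase letters, digits and underscores, starting with a letter.
VALID_NAME = re.compile(r"[a-z][a-z0-9_]*")


def suspicious_measurements(names):
    """Return the names that do not look like normal Telegraf measurements."""
    return [name for name in names if not VALID_NAME.fullmatch(name)]


# A few names taken from the listing above, plus two normal ones.
sample = [
    "cpu",
    "nginx_logs",
    "-telegraf-prd1",
    "0000000hema_table_size_index_length",
    "1058698t=prod1.azuneed.com",
    "1",
]
for name in suspicious_measurements(sample):
    print(name)
```

A truncated name that happens to contain only letters would slip through this filter, so it is only a first pass.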

daniel · November 15, 2017, 10:22pm

Can you open an issue for this on the Telegraf GitHub? Can you also mention in the issue how long it takes for these corrupt measurements to appear when you start with an empty database?

LeJav · November 16, 2017, 7:13pm

I will submit an issue.
I was able to investigate further, and I have identified that the problem is related to nginx.
I access InfluxDB through an nginx reverse proxy.
Each time a measurement is created with bad data, there is a 400 error code for the request in the nginx log.
I will try to find out why nginx is returning a 400 code.
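
To correlate the bad measurements with rejected requests, the nginx access log can be scanned for writes answered with 400. A rough Python sketch, assuming nginx uses its default "combined" log format and that the writes go to InfluxDB's `/write` endpoint; the `rejected_writes` helper and the sample log lines are made up for illustration:

```python
import re

# Assumption: default nginx "combined" format, e.g.
# 172.17.0.1 - - [16/Nov/2017:19:02:11 +0000] "POST /write?db=telegraf HTTP/2.0" 400 0 "-" "telegraf"
LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def rejected_writes(log_lines):
    """Return the timestamps of /write requests that were answered with 400."""
    hits = []
    for line in log_lines:
        m = LINE.search(line)
        if m and m["path"].startswith("/write") and m["status"] == "400":
            hits.append(m["time"])
    return hits


# Invented sample lines, not taken from the real log.
sample_log = [
    '172.17.0.1 - - [16/Nov/2017:19:02:11 +0000] "POST /write?db=telegraf HTTP/2.0" 400 0 "-" "telegraf"',
    '172.17.0.1 - - [16/Nov/2017:19:02:31 +0000] "POST /write?db=telegraf HTTP/2.0" 204 0 "-" "telegraf"',
]
for timestamp in rejected_writes(sample_log):
    print(timestamp)
```

The timestamps can then be compared with the time at which each bad measurement first appears.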

daniel · November 16, 2017, 11:50pm

Please take a look at this issue, which I thought was fixed in 1.4.2. Maybe we need to reopen it?

LeJav · November 17, 2017, 6:03pm

Hi Daniel,

Many thanks for the suggestion: it is exactly the same issue.
I have left a comment on GitHub issue #2854 in influxdata/telegraf, "Metric corruption when using http2 nginx reverse proxy with influxdb output".
I will try the "content-encoding=gzip" parameter.
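
For context, a hedged sketch of what gzip content encoding means on the wire: the request body is gzip-compressed and sent with a `Content-Encoding: gzip` header. The Python below only builds such a request for InfluxDB's `/write` endpoint and does not send it; the host, database name, sample point and the `build_gzip_write` helper are assumptions for illustration. Because a gzip stream ends with a CRC-32 checksum, a body damaged in transit should fail to decompress instead of being parsed as garbage, although whether that avoids the problem here is exactly what remains to be tested.

```python
import gzip
import urllib.request


def build_gzip_write(url, lines):
    """Build (but do not send) a gzip-compressed POST of line-protocol data."""
    body = gzip.compress("\n".join(lines).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Encoding": "gzip"},
    )


# Hypothetical host and sample point, for illustration only.
req = build_gzip_write(
    "http://localhost:8086/write?db=telegraf",
    ["cpu,host=prd1 usage_idle=97.5"],
)
# urllib stores header names capitalized as "Content-encoding".
print(req.get_header("Content-encoding"), len(req.data), "bytes")
```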
