Hi, I’m running into an issue with my InfluxDB server where I get this output:
[I] 2017-08-03T17:39:58Z Reloaded WAL cache /var/lib/influxdb/wal/simplisafe/autogen/6663 in 1.212740854s engine=tsm1
[I] 2017-08-03T17:39:58Z write failed for shard 6663: [shard 6663] field type conflict service=write
[I] 2017-08-03T17:39:58Z tsm1 WAL starting with 10485760 segment size engine=tsm1 service=wal
[I] 2017-08-03T17:39:58Z tsm1 WAL writing to /var/lib/influxdb/wal/simplisafe/autogen/6663 engine=tsm1 service=wal
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000001744-000000007.tsm (#1) opened in 11.438441ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000002224-000000005.tsm (#2) opened in 33.907891ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000002228-000000003.tsm (#3) opened in 38.188929ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000001744-000000006.tsm (#0) opened in 40.400569ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_08910.wal, size 10488091 engine=tsm1 service=cacheloader
[I] 2017-08-03T17:39:58Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_08911.wal, size 10487624 engine=tsm1 service=cacheloader
[I] 2017-08-03T17:39:59Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_08912.wal, size 105747 engine=tsm1 service=cacheloader
[I] 2017-08-03T17:39:59Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_09725.wal, size 0 engine=tsm1 service=cacheloader
The only thing I’ve figured out to get the server back up and running is to delete the shard from the file system. This works, except this is the second time in the past two days that I’ve had to do this for the same shard. Is there any way for me to repair or inspect the shard, or to prevent whichever write is causing the failure from happening? Any advice would be much appreciated.
Cheers
Here is a crash log. I think there might be some sort of recursion I’m hitting that could be causing this.
Is there anything I can do to make this question more approachable? Am I asking it in the wrong forum?
A field key for a measurement already exists with a defined data type (e.g. float) in the backend InfluxDB
An insert arrives for that measurement & field with a different data type (e.g. int)
This is probably the source of your issue. I ran into this while inserting some data, and we’ve actually had to make changes to the data being inserted to make sure it’s written as the right type the first time, because the very first insert is what determines the field type.
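To make that concrete, here’s a minimal sketch against the 1.x HTTP write endpoint (the database, measurement and field names are just made up, and I’m assuming the default localhost:8086):
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status=2i'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status="armed"'
The first write creates status as an integer field, so the second one, which sends a string for the same field into the same shard, gets rejected with a field type conflict like the one in your log.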
So, if you have a field that sometimes has string data and sometimes has numeric data, you’ll want to make sure you insert it as a string (quotes around it) every time, so that the field isn’t created as a float/int value and a later string insert fails.
Or, if some of the values are floats and others are ints, make the ints floats by adding .0 to them.
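Continuing the made-up example above, the safe versions always send the same type (a quoted string every time, or a float every time):
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status="2"'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status="armed"'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 reading=2.0'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 reading=3.5'
(In line protocol a bare number like 2.0 is treated as a float, while a trailing i, like 2i, marks an integer, so writing 2.0 instead of 2i keeps the field a float.)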
More info: Line Protocol Reference | InfluxDB OSS 1.3 Documentation
Hope this helps.
Thanks for the response @Jeffery_K! I’ve seen the docs you pointed out, and I think I understand the gist of it all. Do you know if it would result in corruption of a shard as well as data loss?
No corruption, but anything in that specific write call will be gone. So just data loss.
I think I might be experiencing two separate issues because your explanation covers the type mismatch log message, but not the content of the attached log file or the fact that I had to delete the shard in question in order to recover.
Oh, I didn’t even see that second question. Yes, my comment was only about the type mismatch.
I don’t know much about the second crash. I wouldn’t think that would be related to the type mismatch.
I don’t know much about the Go programming language (I’ve never written in it), but I agree with your assessment. It seems like you are hitting some kind of recursion. Hopefully someone from Influx sees this and can bring it to their engineers that you have a specific query that, when run, causes the “runtime: goroutine stack exceeds 1000000000-byte limit” error and crashes influxd.
It looks like your query is causing an infinite recursion to occur in the query execution code at /root/go/src/github.com/influxdata/influxdb/influxql/ast.go:4493
What version of InfluxDB was that crash log generated against?
We updated to v1.3.0 today because we learned this was a bug that had been fixed in 1.2.1, but we still get the mismatch error, and our InfluxDB refuses to start until we delete the latest shard. At this point we’ve lost about 7 days of data, so hopefully we can solve it soon. Just to be clear, though, the recursion bug no longer occurs. That’s a plus!
Thanks for your insights