Hi, I’m running into an issue with my InfluxDB server where I get this output:
[I] 2017-08-03T17:39:58Z Reloaded WAL cache /var/lib/influxdb/wal/simplisafe/autogen/6663 in 1.212740854s engine=tsm1
[I] 2017-08-03T17:39:58Z write failed for shard 6663: [shard 6663] field type conflict service=write
[I] 2017-08-03T17:39:58Z tsm1 WAL starting with 10485760 segment size engine=tsm1 service=wal
[I] 2017-08-03T17:39:58Z tsm1 WAL writing to /var/lib/influxdb/wal/simplisafe/autogen/6663 engine=tsm1 service=wal
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000001744-000000007.tsm (#1) opened in 11.438441ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000002224-000000005.tsm (#2) opened in 33.907891ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000002228-000000003.tsm (#3) opened in 38.188929ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z /var/lib/influxdb/data/simplisafe/autogen/6663/000001744-000000006.tsm (#0) opened in 40.400569ms engine=tsm1 service=filestore
[I] 2017-08-03T17:39:58Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_08910.wal, size 10488091 engine=tsm1 service=cacheloader
[I] 2017-08-03T17:39:58Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_08911.wal, size 10487624 engine=tsm1 service=cacheloader
[I] 2017-08-03T17:39:59Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_08912.wal, size 105747 engine=tsm1 service=cacheloader
[I] 2017-08-03T17:39:59Z reading file /var/lib/influxdb/wal/simplisafe/autogen/6663/_09725.wal, size 0 engine=tsm1 service=cacheloader
The only thing I’ve figured out to get the server back up and running is to delete the shard from the file system. This works, except this is the second time in the past two days that I’ve had to do this for the same shard. Is there any way for me to repair or inspect the shard, or to prevent whichever write is causing the failure from happening? Any advice would be much appreciated.
Cheers
Here is a crash log. I think there might be some sort of recursion I’m hitting that could be causing this.
Is there anything I can do to make this question more approachable? Am I asking it in the wrong forum?
A field key for a measurement already exists with a defined data type (e.g. float) in the backend InfluxDB
An insert arrives for that measurement & field with a different data type (e.g. int)
This is probably the source of your issue. I ran into this while inserting some data, and we’ve actually had to make changes to the data being inserted to make sure it’s written as the right type the first time, because the very first insert is what determines the field type.
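To make that concrete, here’s a minimal sketch against the 1.x HTTP write endpoint (the database, measurement and field names are just made up, and I’m assuming the default localhost:8086):
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status=2i'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status="armed"'
The first write creates status as an integer field, so the second one, which sends a string for the same field into the same shard, gets rejected with a field type conflict like the one in your log.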
So, if you have a field that sometimes has string data and sometimes has numeric data, you’ll want to make sure you insert it as a string (quotes around it) every time, so that the field isn’t created as a float/int value and a later string insert fails.
Or, if some of the values are floats and others are ints, make the ints floats by adding .0 to them.
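Continuing the made-up example above, the safe versions always send the same type (a quoted string every time, or a float every time):
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status="2"'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 status="armed"'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 reading=2.0'
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'sensor,device=d1 reading=3.5'
(In line protocol a bare number like 2.0 is treated as a float, while a trailing i, like 2i, marks an integer, so writing 2.0 instead of 2i keeps the field a float.)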
More info: Line Protocol Reference | InfluxDB OSS 1.3 Documentation
Hope this helps.
Thanks for the response @Jeffery_K! I’ve seen the docs you pointed out, and I think I understand the gist of it all. Do you know if it would result in corruption of a shard as well as data loss?
No corruption, but anything in that specific write call will be gone. So just data loss.
I think I might be experiencing two separate issues because your explanation covers the type mismatch log message, but not the content of the attached log file or the fact that I had to delete the shard in question in order to recover.
Oh, I didn’t even see that second question. Yes, my comment was only about the type mismatch.
I don’t know much about the second crash. I wouldn’t think that would be related to the type mismatch.
I don’t know much about the Go programming language (I’ve never written in it), but I agree with your assessment. It seems like you are hitting some kind of recursion. Hopefully someone from Influx sees this and can bring it to their engineers that you have a specific query that, when run, causes the “runtime: goroutine stack exceeds 1000000000-byte limit” error and crashes influxd.
It looks like your query is causing an infinite recursion to occur in the query execution code at /root/go/src/github.com/influxdata/influxdb/influxql/ast.go:4493
What version of InfluxDB was that crash log generated against?
We updated to v1.3.0 today because we learned this was a bug that had been fixed in 1.2.1, but we still get the mismatch error, and our InfluxDB refuses to start until we delete the latest shard. At this point we’ve lost about 7 days of data, so hopefully we can solve it soon. Just to be clear, though, the recursion bug no longer occurs. That’s a plus!
Thanks for your insights