Suspected Deadlock Issue – Need Help

I’ve been running InfluxDB 2.6.0 without any issues, but it suddenly stopped responding.

I am relatively new to Go and still learning about its concurrency and locking mechanisms, but I suspect a potential deadlock in InfluxDB 2.6.0 between the TagValueIterator() and AddSeriesList() functions.

Specifically, I wonder if the following sequence could lead to a deadlock scenario:

  • TagValueIterator() acquires the first RLock() on f.mu.
  • AddSeriesList() then attempts to acquire a write Lock() on f.mu.
  • Meanwhile, tk.TagValueIterator() attempts to acquire a second RLock() on the same mutex, which might cause a deadlock.

If tk.f points to the same LogFile instance, could this sequence result in a deadlock? Go’s sync.RWMutex explicitly prohibits recursive read locking: a pending Lock() waits until all current readers release, and while it is pending, any new RLock() calls also block. So if AddSeriesList() queues its Lock() between the two RLock() calls, the second RLock() can never be granted.
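To convince myself that this is the mechanism, here is a minimal, self-contained sketch of the suspected interleaving (plain sync.RWMutex, not actual InfluxDB code; the sleeps only force the ordering):

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	// Reader: takes a first RLock (like LogFile.TagValueIterator), then
	// re-acquires the read lock (like logTagKey.TagValueIterator).
	go func() {
		mu.RLock()
		time.Sleep(100 * time.Millisecond) // give the writer time to queue up
		mu.RLock()                         // blocks forever behind the pending writer
		fmt.Println("reader: never reached")
		mu.RUnlock()
		mu.RUnlock()
	}()

	time.Sleep(50 * time.Millisecond)
	mu.Lock() // writer (like AddSeriesList): waits for the first RLock to be released
	fmt.Println("writer: never reached")
	mu.Unlock()
}

Running this, the Go runtime aborts with "fatal error: all goroutines are asleep - deadlock!": the queued Lock() prevents the second RLock() from ever being granted, so neither goroutine can make progress.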

Below is the pprof output from when the issue first occurred. In this state the lock is never released, and both deletes and queries hang.

goroutine 106814401 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00015020c?, 0x78?, 0x3?)
        /usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).Lock(0xc023a48620?)
        /usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList(0xc0095a71d0, 0xc000150200, {0xc00863f800?, 0x13, 0x0?}, {0xc00863fb00?, 0x13, 0xc00e37daf8?})
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:545 +0x4a5
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists(0xc037ff10e0, {0xc00863f800, 0x13, 0x20}, {0xc00863fb00, 0x13, 0x20})
       influxdb-2.6.0/tsdb/index/tsi1/partition.go:725 +0x165
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Index).CreateSeriesListIfNotExists.func1()
       influxdb-2.6.0/tsdb/index/tsi1/index.go:680 +0x13e
created by github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Index).CreateSeriesListIfNotExists
       influxdb-2.6.0/tsdb/index/tsi1/index.go:673 +0x1dd
        ~~~

goroutine 106815338 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x4?, 0x40?, 0x2?)
        /usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
        /usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).MeasurementIterator(0xc0095a71d0)
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:784 +0x6b
        ~~~

goroutine 106814631 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x3318308?, 0x38?, 0xc?)
        /usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
        /usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*logTagKey).TagValueIterator(0xc02a1a6fb8)
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:1385 +0x51
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator(0xc0095a71d0?, {0xc04537e640?, 0xa?, 0x158ed72?}, {0xc03be04a20, 0x9, 0x28?})
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:432 +0x185
        ~~~

I appreciate any insights!

Hello @han,
I’m not sure. I think this question might be better suited here:


Hello @han I have a similar issue (see Query engine stuck) and I recover by restarting influxd (I wrote a small daemon for that; a rough sketch follows after the dump below). Before the restart I also take a snapshot of the mutexes (via http://localhost:8086/debug/pprof/mutex?debug=1), and from the trace it looks like both TagValueIterator and AddSeriesList are busy in RUnlock:

43012 2 @ 0x7f029e3e9085 0x7f029f0d9d53 0x7f029f0ca6d8 0x7f029f0ea3ea 0x7f029f0d118b 0x7f029f06a09b 0x7f029f072a65 0x7f029f0a753e 0x7f02a0598d23 0x7f02a0598906 0x7f02a054199b 0x7f02a017c976 0x7f02a017ffe5 0x7f02a017c523 0x7f029f22dec5 0x7f029e3dc781
#       0x7f029e3e9084  sync.(*RWMutex).RUnlock+0x24                                                                            /go/src/sync/rwmutex.go:119
#       0x7f029f0d9d52  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator+0x192                     /root/project/tsdb/index/tsi1/log_file.go:432
#       0x7f029f0ca6d7  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*FileSet).TagValueIterator+0x117                     /root/project/tsdb/index/tsi1/file_set.go:334
#       0x7f029f0ea3e9  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).TagValueIterator+0x89                    /root/project/tsdb/index/tsi1/partition.go:822
#       0x7f029f0d118a  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Index).TagValueIterator+0x10a                       /root/project/tsdb/index/tsi1/index.go:977
#       0x7f029f06a09a  github.com/influxdata/influxdb/v2/tsdb.IndexSet.tagValueIterator+0x11a                                  /root/project/tsdb/index.go:2096
#       0x7f029f072a64  github.com/influxdata/influxdb/v2/tsdb.IndexSet.MeasurementTagKeyValuesByExpr+0x684                     /root/project/tsdb/index.go:2977
#       0x7f029f0a753d  github.com/influxdata/influxdb/v2/tsdb.(*Store).TagValues+0x83d                                         /root/project/tsdb/store.go:2017
#       0x7f02a0598d22  github.com/influxdata/influxdb/v2/v1/services/storage.(*Store).tagValues+0x342                          /root/project/v1/services/storage/store.go:457
#       0x7f02a0598905  github.com/influxdata/influxdb/v2/v1/services/storage.(*Store).TagValues+0x305                          /root/project/v1/services/storage/store.go:413
#       0x7f02a054199a  github.com/influxdata/influxdb/v2/storage/flux.(*tagValuesIterator).Do+0x1da                            /root/project/storage/flux/reader.go:982
#       0x7f02a017c975  github.com/influxdata/influxdb/v2/query/stdlib/influxdata/influxdb.(*Source).processTables+0xb5         /root/project/query/stdlib/influxdata/influxdb/source.go:69
#       0x7f02a017ffe4  github.com/influxdata/influxdb/v2/query/stdlib/influxdata/influxdb.(*readTagValuesSource).run+0x104     /root/project/query/stdlib/influxdata/influxdb/source.go:485
#       0x7f02a017c522  github.com/influxdata/influxdb/v2/query/stdlib/influxdata/influxdb.(*Source).Run+0xa2                   /root/project/query/stdlib/influxdata/influxdb/source.go:50
#       0x7f029f22dec4  github.com/influxdata/flux/execute.(*executionState).do.func2+0x3c4                                     /go/pkg/mod/github.com/influxdata/flux@v0.195.2/execute/executor.go:535
140687752 391 @ 0x7f029f0db256 0x7f029f0db23e 0x7f029f0e9473 0x7f029f0cf679 0x7f029e3dc781
#       0x7f029f0db255  sync.(*RWMutex).RUnlock+0x475                                                                           /go/src/sync/rwmutex.go:119
#       0x7f029f0db23d  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList+0x45d                        /root/project/tsdb/index/tsi1/log_file.go:538
#       0x7f029f0e9472  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists+0x152        /root/project/tsdb/index/tsi1/partition.go:729
#       0x7f029f0cf678  github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Index).CreateSeriesListIfNotExists.func1+0x118      /root/project/tsdb/index/tsi1/index.go:679

AddSeriesList also has several Unlocks active.
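In case it is useful to anyone, the restart daemon mentioned above does roughly the following (a minimal sketch only; the probe endpoint, intervals, file naming, and restart command are simplified placeholders, the real daemon does more checking, and in practice the liveness probe should probably be a real query, since /health may still answer while only the query engine is stuck):

package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"time"
)

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	for {
		time.Sleep(30 * time.Second)

		// Probe the server; if it answers, there is nothing to do.
		if resp, err := client.Get("http://localhost:8086/health"); err == nil {
			resp.Body.Close()
			continue
		}

		// Snapshot the mutex profile before restarting (the pprof endpoints
		// usually still respond even while queries are stuck).
		if resp, err := client.Get("http://localhost:8086/debug/pprof/mutex?debug=1"); err == nil {
			if out, err := os.Create("mutex-" + time.Now().Format("20060102-150405") + ".txt"); err == nil {
				io.Copy(out, resp.Body)
				out.Close()
			}
			resp.Body.Close()
		}

		log.Println("influxd unresponsive, restarting")
		exec.Command("systemctl", "restart", "influxd").Run()
	}
}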

Is there anything particular I should try to check?

This is on InfluxDB 2.7.5 and 2.7.10.

Got a second case. Again, TagValueIterator was in one RUnlock and AddSeriesList was in three Unlocks and one RUnlock. Same signature…

Hello. In my case, the issue occurred when a new series was being inserted while another client called schema.measurementTagValues; once I stopped using schema.measurementTagValues, the issue no longer occurred. I have reported it here: Possible Deadlock in TagValueIterator() and AddSeriesList() · Issue #26164 · influxdata/influxdb · GitHub; please take a look!
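For anyone trying to reproduce it, the conditions boil down to writing brand-new series while another client runs schema.measurementTagValues in a loop. Here is a rough, untested sketch using the official Go client (the bucket, org, token, and measurement names are placeholders):

package main

import (
	"context"
	"fmt"
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	client := influxdb2.NewClient("http://localhost:8086", "my-token")
	defer client.Close()

	// Writer: every point carries a fresh tag value, so each write creates
	// a new series and exercises LogFile.AddSeriesList (per the stack
	// traces earlier in this thread).
	go func() {
		writeAPI := client.WriteAPIBlocking("my-org", "my-bucket")
		for i := 0; ; i++ {
			p := influxdb2.NewPoint("m",
				map[string]string{"host": fmt.Sprintf("host-%d", i)},
				map[string]interface{}{"v": i},
				time.Now())
			writeAPI.WritePoint(context.Background(), p)
		}
	}()

	// Reader: schema.measurementTagValues ends up in LogFile.TagValueIterator.
	queryAPI := client.QueryAPI("my-org")
	flux := `import "influxdata/influxdb/schema"
schema.measurementTagValues(bucket: "my-bucket", measurement: "m", tag: "host")`
	for {
		result, err := queryAPI.Query(context.Background(), flux)
		if err != nil {
			fmt.Println("query error:", err)
			continue
		}
		for result.Next() {
		}
		if result.Err() != nil {
			fmt.Println("result error:", result.Err())
		}
	}
}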
I hope your issue gets resolved as well.

I saw the ticket; it is far more detailed than anything I could produce.

The only contribution I can offer is the mutex dumps taken during the freezes. Please let me know if they can be of any help.