Queries for most-recent records return random null fields


With one Python client (example at the bottom of this post) we’re inserting ~15 fields into InfluxDB at 10-20 Hz, and with another client (Python, JavaScript via HTTP, or curl; the client doesn’t seem to matter) we’re querying the data back at a regular interval (every second or so).

The query results are showing random NULL field values in random records that we know are not NULL, and this only occurs in the most recent record. If you go back a few seconds later and requery that record (using the timestamp) all fields will be non-NULL as expected. Also, if you change the query to return the TWO most recent records (“ORDER BY time DESC LIMIT 2”), then you’ll only see the NULLs in the most-recent record, never the 2nd-most-recent record.
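As a stopgap on the reading side, the behaviour described above suggests requesting the two newest records and skipping any that contain NULLs. Here is a minimal sketch, assuming each record has already been decoded into a dict with None for NULL fields; `newest_complete` is a hypothetical helper and the sample values are illustrative. It only hides the symptom and does not explain it.

```python
def newest_complete(rows, time_key="time"):
    """Return the newest row with no None fields, or None if there is none.

    `rows` is a list of dicts, one per record, in any order.
    """
    for row in sorted(rows, key=lambda r: r[time_key], reverse=True):
        if all(value is not None for value in row.values()):
            return row
    return None


# Illustrative rows: the newest record has NULL fields, the older one is complete.
rows = [
    {"time": 1549362628735095040, "a": None, "b": None, "k": 0},
    {"time": 1549362627625793024, "a": 3281, "b": 3281, "k": 0},
]
print(newest_complete(rows)["time"])
```

Note that a field that is legitimately NULL would also cause a record to be skipped, so this only fits data where every field is always written.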

The number of fields and the frequency of INSERTs seem to make a difference. When I started writing the example Python script below, I began with 3 fields at 10 Hz and could not reproduce the problem. Only once I got up to 15 fields at 100 Hz did I start seeing it, and even then it took ~15 seconds on my machine. In our real application the problem reproduces constantly.

Is it possible that InfluxDB is returning records that are not yet fully constructed?

X-Influxdb-Build: OSS
X-Influxdb-Version: 1.7.2

Shell script to monitor data:

while true
do
    curl -G HOST:PORT/query -u USER:PASS --data-urlencode "db=testdata" --data-urlencode "q=SELECT * FROM test GROUP BY id ORDER BY time DESC LIMIT 1" -H 'Accept: application/csv'
    sleep 1
done

Example Output (note the 2nd and 7th results):

name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362627625793024,3281,3281,2821.7526728879693,3281.2938092214545,3281,NUMBER3281,76,876,9567,0
test,id=ALPHA,1549362627625793024,3281,3281,2821.7526728879693,3281.2938092214545,3281,NUMBER3281,76,876,9567,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362628735095040,3337,3337,2739.900803680287,3337.987289242315,3337,NUMBER3337,75,190,1176,0
test,id=ALPHA,1549362628735095040,,,,,,,,,,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362629847493888,3394,3394,1794.644406422681,3394.137331619533,3394,NUMBER3394,27,7,72,0
test,id=ALPHA,1549362629847493888,3394,3394,1794.644406422681,3394.137331619533,3394,NUMBER3394,27,7,72,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362630926076928,3447,3447,3088.427124800288,3447.1667128580702,3447,NUMBER3447,1,225,3706,0
test,id=ALPHA,1549362630926076928,3447,3447,3088.427124800288,3447.1667128580702,3447,NUMBER3447,1,225,3706,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362632039686912,3502,3502,226.2890415850697,3502.8870562305306,3502,NUMBER3502,32,535,8675,0
test,id=ALPHA,1549362632039686912,3502,3502,226.2890415850697,3502.8870562305306,3502,NUMBER3502,32,535,8675,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362633150441984,3559,3559,14.748148441622998,3559.527007017778,3559,NUMBER3559,72,867,5978,0
test,id=ALPHA,1549362633150441984,3559,3559,14.748148441622998,3559.527007017778,3559,NUMBER3559,72,867,5978,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362634248436992,3614,3614,668.0670182831518,3614.584586236862,3614,NUMBER3614,63,978,6921,0
test,id=ALPHA,1549362634248436992,,,,,,NUMBER3614,63,978,6921,0
name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362635328133120,3668,3668,2697.5671731272537,3668.6300554540594,3668,NUMBER3668,54,249,1678,0
test,id=ALPHA,1549362635328133120,3668,3668,2697.5671731272537,3668.6300554540594,3668,NUMBER3668,54,249,1678,0
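To spot the affected rows without reading the output by eye, the CSV can be scanned for empty cells. This is a rough sketch, assuming the output format shown above (a header line starting with `name` before each result); `rows_with_empty_fields` is a hypothetical helper. A field that really holds an empty string would be flagged as well.

```python
import csv
import io


def rows_with_empty_fields(csv_text):
    """Yield (tags, time, empty_columns) for each data row with empty cells."""
    header = None
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip blank lines
        if record[0] == "name":  # every query response repeats the header
            header = record
            continue
        if header is None:
            continue  # ignore data rows seen before any header
        empty = [col for col, cell in zip(header, record) if cell == ""]
        if empty:
            yield record[1], int(record[2]), empty


# Two rows copied from the example output above.
sample = """name,tags,time,a,b,c,d,e,f,g,h,i,k
test,id=BRAVO,1549362634248436992,3614,3614,668.0670182831518,3614.584586236862,3614,NUMBER3614,63,978,6921,0
test,id=ALPHA,1549362634248436992,,,,,,NUMBER3614,63,978,6921,0
"""
for tags, ts, empty in rows_with_empty_fields(sample):
    print(tags, ts, empty)
```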

Demo script to insert data:

import datetime
import math
import random
import time

from influxdb import InfluxDBClient

# HOST, PORT, USER and PASS are placeholders for your own connection details.
BASETIME = datetime.datetime(1970, 1, 1)

def createInfluxClient(database, measurement):
	client = InfluxDBClient(HOST, PORT, USER, PASS)

	if database not in [db['name'] for db in client.get_list_database()]:
		client.create_database(database)

	client.switch_database(database)

	if measurement in [ms['name'] for ms in client.get_list_measurements()]:
		client.drop_measurement(measurement)
	return client

def getTimestamp():
	# Nanoseconds since the Unix epoch; utcnow() avoids a local-timezone offset
	d = datetime.datetime.utcnow() - BASETIME
	return int(math.floor(d.total_seconds() * 1e+9))

if __name__ == '__main__':
	client = createInfluxClient('testdata', 'test')

	try:
		count = 0
		while True:
			timestamp = getTimestamp()
			fields = {
				"a": count,
				"b": count,
				"c": count * random.uniform(0, 1),
				"d": count + random.uniform(0, 1),
				"e": str(count),
				"f": 'NUMBER' + str(count),
				"g": random.randint(0, 100),
				"h": random.randint(0, 1000),
				"i": random.randint(0, 10000),
				"j": "",
				"k": 0.0,
				"l": 0,
				"m": 'asdf',
				"n": 42,
				"o": 0,
			}
			client.write_points([
				{
					"measurement": "test",
					"time": timestamp,
					"tags": {
						"id": "ALPHA",
					},
					"fields": fields,
				},
				{
					"measurement": "test",
					"time": timestamp,
					"tags": {
						"id": "BRAVO",
					},
					"fields": fields,
				}
			])
			print(count)
			count = count + 1
			time.sleep(0.01)
	except KeyboardInterrupt:
		pass
	finally:
		client.close()