Problem in storing data in InfluxDB

Hello everyone,

I am currently interfacing an ECG sensor with an STM32F Discovery board. I have written a Python script to read the ECG values over serial every 4 ms and store them in an InfluxDB database.

Here is my Python code:
#!/usr/bin/python

from influxdb import InfluxDBClient
import serial
import time

# Global variables
# File used to store data read from EPBM4.
path = '/home/edutech/adc_test'


# Authentication variables to log in to the InfluxDBClient

localhost = '192.12.313.31'		# Host IP address
username = 'xxxx'			# InfluxDB username & password
password = 'xxxx'
port = 8086


# Create a serial port instance
ser = serial.Serial('/dev/ttyUSB0', 9600)


# Connect to InfluxDB running on the host PC or RPi3
# @param host IP address, port number (default 8086), InfluxDB username and password
client = InfluxDBClient(localhost, port, username, password)


# Create a new database to store the ECG data
client.create_database('ecg_value')
client.switch_database('ecg_value')


# Write data points to the 'ecg_value' database
# @param data, {'db': 'db_name'}, expected response code (204), 'line' protocol
while True:
	store_data = ser.readline()		# Read one line of serial data
	client.write(store_data, {'db': 'ecg_value'}, 204, 'line')

I haven't given a timestamp, as it's optional.

I have checked the data points by querying them in Influx, and I think the data is not being stored properly.

Reason: I am transmitting ECG values every 4 ms, so in 1 second I should get 1000/4 = 250 data points, but I am only getting 20-25 points.

Can anyone suggest a solution?

One more thought: if I store the data with timestamps, will Influx store all my data points according to those timestamps?

With the client you are using, I have no idea what it uses for a default timestamp (seconds, ms, ns).

You should add a timestamp into your code to make sure it is saving at the correct time and not rounding.
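For example, a minimal sketch using line protocol with an explicit nanosecond timestamp (the 'ecg' measurement, 'user' tag, and 'rate' field names here are made up for illustration):

import time

# Build one line-protocol point with an explicit nanosecond timestamp,
# so InfluxDB doesn't have to assign one server-side.
timestamp_ns = int(time.time() * 1e9)
point = 'ecg,user=Nikhil rate={} {}'.format(512, timestamp_ns)	# 512 is a placeholder reading
client.write(point, {'db': 'ecg_value', 'precision': 'ns'}, 204, 'line')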

Also, how are you querying the data? Tools like Grafana have limits on the number of points they display. Query the data as JSON or export it as CSV to check how many metrics are actually stored.
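If you have the Python client handy, one quick (hypothetical) way to count what has actually been stored, bypassing any display limits:

# Count stored points directly in the 'ecg_value' database from the script above.
result = client.query('SELECT COUNT(*) FROM /.*/', database='ecg_value')
print(result.raw)	# raw JSON response with per-measurement counts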

Also, all data in InfluxDB is stored with a timestamp. It is a time series database, so that is required. Maybe I am not understanding the question?

Hello,

I am continuously reading ECG values serially (receiving a data point every 4 ms) and storing the data (with timestamps) into InfluxDB.

I have created a while loop (given below) to store the data points:

import datetime

while True:
	raw_data = ser.readline()
	ecg_data = int(raw_data)

	data = [{
		"measurement": "ECG",
		"tags": {"user": "Nikhil"},
		"time": datetime.datetime.strptime(
			datetime.datetime(2017, 6, 9, tzinfo=datetime.timezone.utc).now().isoformat(),
			'%Y-%m-%dT%H:%M:%S.%f').strftime('%Y-%m-%dT%H:%M:%S.%fZ'),
		"fields": {"rate": ecg_data}
	}]

	client.write_points(data, time_precision='u', protocol=u'json')

But when I query my database, only 24-25 data points are stored per second. Does InfluxDB have any maximum limit on how much data it can store in 1 second?

By comparison, when I write the serial data directly to a file, I get approx. 70-75 data points/sec.

Assuming that your client.write_points() calls are inside the loop, based on what you describe I suspect the culprit is simply the time that it takes to carry out the Influx write - in particular if you’re doing it over a network. Remember that since everything is being executed synchronously, your code will wait for the HTTP call triggered by client.write_points() to complete before moving on to take the next reading.

One fairly straightforward way to troubleshoot would be to have your script print out the values (and timestamps!) that are being saved to STDOUT. Try this with and without the client.write_points() call. See what the actual frequency of the readings is in both cases, and whether there’s a difference.
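A rough sketch of what that check could look like (assuming the ser and client objects from your script):

import time

# Count how many readings arrive per second; toggle the write_points()
# line to compare throughput with and without the Influx write.
count = 0
start = time.time()
while True:
	raw_data = ser.readline()
	print(time.time(), raw_data.strip())	# reading plus wall-clock timestamp
	# client.write_points(...)		# comment in/out to compare
	count += 1
	if time.time() - start >= 1.0:
		print('readings in the last second:', count)
		count = 0
		start = time.time()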

If you do need to take readings at 4 ms intervals and write them over a network, you probably won't be able to keep up making individual synchronous HTTP calls for each reading. Consider either:

  • Batching readings within your script, and sending them in batches (e.g. 1,000 readings at a time); see the sketch after this list
  • Using another mechanism to send the readings, such as UDP packets to Telegraf
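Here's a rough sketch of the batching approach, assuming the ser and client objects from your script and the same point format you're already using:

import datetime

BATCH_SIZE = 1000	# readings per HTTP call; tune as needed
batch = []

while True:
	raw_data = ser.readline()
	ecg_data = int(raw_data)
	batch.append({
		"measurement": "ECG",
		"tags": {"user": "Nikhil"},
		"time": datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%fZ'),
		"fields": {"rate": ecg_data}
	})
	if len(batch) >= BATCH_SIZE:
		client.write_points(batch, time_precision='u')	# one HTTP call per batch
		batch = []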

Something else worth considering: you are only storing a single field, tag, and measurement. InfluxDB can only store one point per combination of measurement, tag set, and timestamp. So if you end up storing two values with the same timestamp, the same tags, and the same measurement name, the second will overwrite the first.
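To illustrate with (hypothetical) values:

# Two writes with the same measurement, tags, and timestamp:
# the second one silently replaces the first.
point = [{
	"measurement": "ECG",
	"tags": {"user": "Nikhil"},
	"time": "2017-06-09T12:00:00.000000Z",
	"fields": {"rate": 512}
}]
client.write_points(point)

point[0]["fields"]["rate"] = 600	# same timestamp, new value
client.write_points(point)		# now only rate=600 remains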

Outputting it to a file like @svet said is a good way to troubleshoot.

Also, maybe you should consider batching metrics. Instead of doing a request every 4 ms, you could wait maybe a second and then send the points as a group. This will save overhead on InfluxDB writes.

Oops, looks like I missed that recommendation from @svet. So yes, I agree with that point too.

Without the client.write_points() call the frequency is around 71 samples/sec, and with client.write_points() it drops to 7-10 samples/sec.

I am now sending the data in batches, and 75 data points/sec are getting stored in InfluxDB. Now I am facing an issue on the Grafana side, as I am unable to visualize the data points, possibly due to the timestamps.

I’ve responded to your twin question on the Grafana forum: Problem in Querying the Data from influxdb - #4 by svetb - InfluxDB - Grafana Labs Community Forums

At least part of the problem is that you’re generating timestamps in local time but telling Influx they’re UTC (by appending “Z” at the end).

I don’t quite follow what your timestamp function does, but something like

datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%fZ')

should be sufficient, assuming your device clock is set correctly.

In terms of why you’re still only getting 70-75 readings per second (instead of the 250 you were expecting), I think your serial connection’s 9600 baud rate may also be a bottleneck.

Consider that the link is carrying a maximum of 9600 bits per second gross. Even in a very "frugal" scenario, where the link is literally only transmitting a sequence of 16-bit ints, you'll be lucky to hit 250 samples/second. And since you're using readline() to get the data, I would assume there's some non-negligible overhead on top of that, taking the rate down considerably.
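To make that arithmetic concrete, here's a back-of-the-envelope sketch (assuming standard 8N1 framing, i.e. 10 bits on the wire per byte, and ASCII lines like "1023\r\n"):

baud = 9600
bits_per_byte = 10			# 8 data bits + start bit + stop bit (8N1)
bytes_per_sec = baud / bits_per_byte	# = 960 bytes/s

bytes_per_sample = 6			# e.g. the ASCII line "1023\r\n"
print(bytes_per_sec / bytes_per_sample)	# = 160 samples/s, well under 250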

I don't know how important it is for you to get the full 250 samples/sec, but if so, you'll probably need to make some fairly low-level optimizations in order to hit that.

I have tried with a baud rate of 115200, and it gives me the same rate of 70-75 data points/sec.

In my case, 70-75 data points/sec are fine, as I get a proper pulse waveform in Excel.