Batch load into remote influxDB using Python - continue

RicaRezende · August 11, 2020, 3:56pm

Hello Anais.

I clicked on “Solution” by mistake in the other thread.

Thanks for the example.
And sorry for all the questions, I’m a real newbie in Python and InfluxDB.

I’m almost there.
The Python code seems to work; however, no data was inserted into InfluxDB.

My code:

import base64
import time

import psutil
from influxdb_client import InfluxDBClient, Point

username = 'mon360'
password = base64.b64decode(env.INFLUXDBUSRPWD)
database = 'xxxxxx'
retention_policy = 'autogen'
bucket = f'{database}/{retention_policy}'

client = InfluxDBClient(url='xxxxxxxxxx:8086', token=f'{username}:{password}', org='-')
print('*** Write Points ***')
write_api = client.write_api()

v_cpupct = psutil.cpu_percent()
v_curtime = time.strftime("%Y-%m-%d-T%H:%M:%SZ%z", time.localtime())

point = Point("SRV_HEATH").tag("CUSTOMER", env.CUSTOMER).tag("HOSTNAME", env.HOSTNAME).tag("RESOURCE", "CPU").field("PCT_USED", v_cpupct)
print(point.to_line_protocol())

write_api.write(bucket=bucket, record=point)

So, there are 3 questions here:

1. What is wrong, given that no data is written into InfluxDB even though there is no execution error?
2. Where in the point do I define the time to be inserted?
3. The example looks like a single-point insert. How do I change it to perform a bulk load?

Many thanks.

Anaisdg · August 11, 2020, 9:10pm

No worries! Thanks for your willingness to learn!

Hmm I’m not sure. You can try setting debug=True when you instantiate the client like so:
client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org", debug=True)
If you don’t set a time on the point, a timestamp is created at the time of insert.
SOLUTION 1: THIS BLOG: Write millions of points from csv.
Write Millions of Points From CSV to InfluxDB with the 2.0 Python Client | InfluxData

SOLUTION 2: COMPOSE A DATAFRAME
You can also write an entire DataFrame to InfluxDB. This script creates a DataFrame with 2 rows, a datetime timestamp index, and one tag column “location”.

"""
Write Pandas DataFrame
"""
from datetime import timedelta

import pandas as pd

_now = pd.Timestamp.now('UTC')
_data_frame = pd.DataFrame(data=[["coyote_creek", 1.0], ["coyote_creek", 2.0]],
                           index=[_now, _now + timedelta(hours=1)],
                           columns=["location", "water_level"])

_write_client.write(bucket.name, record=_data_frame, data_frame_measurement_name='h2o_feet',
                    data_frame_tag_columns=['location'])

SOLUTION 3: CSV TO DATAFRAME TO INFLUX:
Here is an example of how to read a csv, convert it to a dataframe, and write the dataframe to InfluxDB.
Imagine your csv zoo_data.csv has the following headers:
Animal_name, count, cost
You want Animal_name to be a tag and the rest to be fields.

You might use the following script to read the csv, convert it to a dataframe, add a timestamp column, set that timestamp column as an index and write the DataFrame to InfluxDBv2.

import pandas as pd
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

mydata = pd.read_csv("~/Downloads/zoo_data.csv")
mydata.head()
print(mydata.size)  # size of DataFrame is 250

# Create an array of regularly spaced timestamps, one per row, and add it to the DataFrame as an index.
t = pd.date_range(start='1/1/2020', end='05/01/2020', periods=len(mydata))
s = pd.Series(t, name='TimeStamp')
mydata.insert(0, 'TimeStamp', s)
mydata = mydata.set_index('TimeStamp')

token = "my-token"  # replace with your API token
bucket = "demoBucket"
org = "hackathonDemoOrg"

client = InfluxDBClient(url="http://localhost:9999", token=token, org=org, debug=False)
write_client = client.write_api(write_options=SYNCHRONOUS)

write_client.write(bucket, record=mydata, data_frame_measurement_name='zoo-data',
                   data_frame_tag_columns=["Animal_name"])

RicaRezende · August 11, 2020, 10:05pm

It worked… Partially.

I was able to load data (and I can see it in the db); however, I’m only able to load 2 records.
When I try to insert a third record (or any number of records above 2), it triggers an error:

ValueError: Shape of passed values is (4, 4), indices imply (2, 4)

My code:

for i in range(2):
    v_disk = psutil.disk_partitions()[i][1]
    v_diskpct = psutil.disk_usage(psutil.disk_partitions()[i][1])[3]
    data_array.append([env.CUSTOMER, env.HOSTNAME, v_disk, v_diskpct])
data_frame = pd.DataFrame(data=data_array,
                          index=[v_curtime, v_curtime],
                          columns=["CUSTOMER", "HOSTNAME", "RESOURCE", "PCT_USED"])
from influxdb_client.client.write_api import SYNCHRONOUS
write_client = client.write_api(write_options=SYNCHRONOUS)
write_client.write('MON360', record=data_frame, data_frame_measurement_name='SRV_HEATH',
                   data_frame_tag_columns=['CUSTOMER', 'HOSTNAME', 'RESOURCE'])

Any clue???

RicaRezende · August 12, 2020, 2:59pm

Hello.

While digging, I noticed that the number of items in the “index” argument must match the number of records I need to insert.
That works for a known number of records, but in the real-life app I’m building the number of records is unknown. How do I deal with that?

I already tried building a loop that creates a variable with all the information and passing that variable to “pd.DataFrame”, but it didn’t work.
I also tried passing a literal time instead of a variable, without success.

Any clue?

Anaisdg · August 1, 2022, 5:51pm

Hello @RicaRezende,
You don’t need to make your timestamp column an index anymore.

github.com/influxdata/influxdb-client-python
Feature Request: make working with pandas dataframe timestamps easier (opened May 16, 2022; closed May 24, 2022)

Currently users have to do a bit of timestamp manipulation in order to write pandas DataFrames. The following features could make working with pandas DataFrames easier:
- The ability to specify the timestamp column/not set the timestamp column as an index
- The ability to specify the timestamp type and automatic conversion (like from ISO or rfc3339 timestamps to datetime objects).

You could look at the length of your df and build your index based off of that. Or you could not store the timestamp in the index, store it in a different column instead (see fix above) and fill the column with a value. Although if you’re writing records with the same timestamp and the same tags/measurements then the last record will just overwrite the rest. Is there not a timestamp for each record? If there isn’t then this doesn’t feel like timeseries data.
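
The first suggestion can be sketched as follows, assuming `pandas` is installed: collect the rows first, then build the index from however many rows there are, so the two lengths always match. The `build_frame` helper and the sample rows are hypothetical stand-ins for the psutil output in the earlier post. A single shared timestamp is safe here only because each row carries a different RESOURCE tag, so no two points share the same tags and time.

```python
from datetime import datetime, timezone

import pandas as pd

COLUMNS = ["CUSTOMER", "HOSTNAME", "RESOURCE", "PCT_USED"]


def build_frame(rows, when):
    """Build a DataFrame whose index length follows the number of rows."""
    # One timestamp per row, however many rows were collected.
    index = pd.DatetimeIndex([when] * len(rows))
    return pd.DataFrame(data=rows, index=index, columns=COLUMNS)


# Hypothetical sample data; in the real app this would come from psutil.
samples = [("/", 41.5), ("/home", 63.0), ("/var", 12.25)]

rows = []
for mount, pct_used in samples:
    rows.append(["acme", "host01", mount, pct_used])

frame = build_frame(rows, datetime.now(timezone.utc))
print(frame)
```

The resulting frame can be passed to `write_client.write(...)` with `data_frame_tag_columns=['CUSTOMER', 'HOSTNAME', 'RESOURCE']` as in the earlier post. With a client version that includes the change referenced above, the `data_frame_timestamp_column` argument should let the timestamp live in an ordinary column instead of the index.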
