Batch load into remote influxDB using Python - continue

Hello Anais.

I clicked "Solution" by mistake in the other thread. :frowning:

Thanks for the example.
And sorry for all the questions, I'm really a newbie in Python and InfluxDB.

I’m almost there. :slight_smile:
The Python code seems to work; however, no data was inserted into InfluxDB.

My code:

import base64
import time

import psutil
from influxdb_client import InfluxDBClient, Point

username = 'mon360'
password = base64.b64decode(env.INFLUXDBUSRPWD)
database = 'xxxxxx'
retention_policy = 'autogen'
bucket = f'{database}/{retention_policy}'

client = InfluxDBClient(url='xxxxxxxxxx:8086', token=f'{username}:{password}', org='-')
print('*** Write Points ***')
write_api = client.write_api()

v_cpupct = psutil.cpu_percent()
v_curtime = time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime())

point = Point("SRV_HEATH").tag("CUSTOMER", env.CUSTOMER).tag("HOSTNAME", env.HOSTNAME).tag("RESOURCE", "CPU").field("PCT_USED", v_cpupct)
print(point.to_line_protocol())

write_api.write(bucket=bucket, record=point)

So, there are 3 questions here:

  1. What is wrong, given that there is no execution error, yet no data is written into InfluxDB?
  2. Where in the point is the timestamp defined?
  3. The example appears to insert a single point. How do I change it to perform a bulk load?

Many thanks.

No worries! Thanks for your willingness to learn!

  1. Hmm I’m not sure. You can try setting debug=True when you instantiate the client like so:
    client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org", debug=True)
  2. If you don't supply a timestamp, one is assigned at the time of insert. You can also set it explicitly on a point with .time().
  3. SOLUTION 1: See the blog post "Write Millions of Points From CSV to InfluxDB with the 2.0 Python Client" on the InfluxData blog.

SOLUTION 2: COMPOSE A DATAFRAME
You can also write an entire DataFrame to InfluxDB. This script creates a DataFrame with 2 rows, a datetime timestamp index, and one tag column “location”.

Write Pandas DataFrame:

from datetime import timedelta

import pandas as pd

now = pd.Timestamp.now('UTC')
data_frame = pd.DataFrame(data=[["coyote_creek", 1.0], ["coyote_creek", 2.0]],
                          index=[now, now + timedelta(hours=1)],
                          columns=["location", "water_level"])

write_client.write(bucket.name, record=data_frame, data_frame_measurement_name='h2o_feet',
                   data_frame_tag_columns=['location'])

SOLUTION 3: CSV TO DATAFRAME TO INFLUX:
Here is an example of how to read a csv, convert it to a dataframe, and write the dataframe to InfluxDB.
Imagine your csv zoo_data.csv has the following headers:
Animal_name, count, cost
You want Animal_name to be a tag and the rest to be fields.

You might use the following script to read the csv, convert it to a dataframe, add a timestamp column, set that timestamp column as an index and write the DataFrame to InfluxDBv2.

import pandas as pd
from influxdb_client import InfluxDBClient, Point, WriteOptions
from influxdb_client.client.write_api import SYNCHRONOUS

mydata = pd.read_csv("~/Downloads/zoo_data.csv")
mydata.head()
print(mydata.size)  # size of DataFrame is 250

# Create an array of regularly spaced timestamps and add it to the DataFrame as an index.
t = pd.date_range(start='1/1/2020', end='05/01/2020', periods=1818)
s = pd.Series(t, name='TimeStamp')
mydata.insert(0, 'TimeStamp', s)
mydata = mydata.set_index('TimeStamp')

token = "$mytoken"  # replace with your token
bucket = "demoBucket"
org = "hackathonDemoOrg"

client = InfluxDBClient(url="http://localhost:9999", token=token, org=org, debug=False)
write_client = client.write_api(write_options=SYNCHRONOUS)

write_client.write(bucket, record=mydata, data_frame_measurement_name='zoo-data',
                   data_frame_tag_columns=["Animal_name"])

It worked… Partially. :slight_smile:

I was able to load data (and I can see it in the db :slight_smile:); however, I can only load 2 records.
Trying to insert a third record (or any number of records above 2) triggers an error:

ValueError: Shape of passed values is (4, 4), indices imply (2, 4)

My code:

for i in range(2):
    v_disk = psutil.disk_partitions()[i][1]
    v_diskpct = psutil.disk_usage(psutil.disk_partitions()[i][1])[3]
    data_array.append([env.CUSTOMER, env.HOSTNAME, v_disk, v_diskpct])

data_frame = pd.DataFrame(data=data_array,
                          index=[v_curtime, v_curtime],
                          columns=["CUSTOMER", "HOSTNAME", "RESOURCE", "PCT_USED"])

from influxdb_client.client.write_api import SYNCHRONOUS
write_client = client.write_api(write_options=SYNCHRONOUS)
write_client.write('MON360', record=data_frame, data_frame_measurement_name='SRV_HEATH',
                   data_frame_tag_columns=['CUSTOMER', 'HOSTNAME', 'RESOURCE'])

Any clue???

Hello.

While digging, I noticed that the number of items in the "index" argument must match the number of records I need to insert.
That said, it works for a known number of records, but in the real-life app I'm building the number of records is unknown. How do I deal with that?

I already tried building a loop to gather all the information in a variable and passing that variable to "pd.DataFrame", but it didn't work.
I also tried passing a literal time instead of a variable, without success. :frowning:

Any clue?

Hello @RicaRezende,
You don’t need to make your timestamp column an index anymore.

You could look at the length of your df and build your index based off of that. Or you could not store the timestamp in the index, store it in a different column instead (see fix above) and fill the column with a value. Although if you’re writing records with the same timestamp and the same tags/measurements then the last record will just overwrite the rest. Is there not a timestamp for each record? If there isn’t then this doesn’t feel like timeseries data.