Uploading using Python (updated with code snippets)

Hey, first-time poster here. I've never used Influx before, but someone asked me to upload some data to a database here.

I generally work in Python's pandas to clean data, and now I'm trying to upload said data. The data is stored in a dataframe named df, with some of the columns shown in the output below. Null elements are set to NaN.

The start of my code is:

import pandas as pd

df = pd.read_csv("data.csv")
df.info()

which reads through the CSVs in a folder and appends them to a single dataframe (the snippet above only shows one file; a sketch of the multi-file read is below the output). df.info() returns:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 110327 entries, 0 to 12335
Data columns (total 29 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   Date_Time             110327 non-null  datetime64[ns]
 1   Status                28610 non-null   object        
 2   RPM                   110267 non-null  float64       
 3   Wind Speed m/s        110267 non-null  float64       
 4   Tension g             110267 non-null  float64       
 5   filename              110327 non-null  object        
 6   Test                  110327 non-null  object        
dtypes: datetime64[ns](1), float64(3), object(3)
memory usage: 25.3+ MB
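
Roughly, the multi-file read looks like this (the folder path is a placeholder, and the filename column handling is just illustrative of what I do):

```python
import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("csv_folder/*.csv")):  # placeholder folder
    frame = pd.read_csv(path, parse_dates=["Date_Time"])
    frame["filename"] = path  # keep track of which file each row came from
    frames.append(frame)

# Appending keeps each file's own 0..N index, hence the
# "110327 entries, 0 to 12335" in the df.info() output above
df = pd.concat(frames)
df.info()
```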

After a bit of reading, some trial and error, and a look through the Git repository for the Python client documentation, I think I've written a script that converts the datetimes via ciso8601, to try to solve the issue of Python's datetime vs the C parser's datetime handling, using the code:

from pytz import UTC
import ciso8601
from influxdb_client.client.write.point import EPOCH

def convert_time(time):
    # Format the pandas Timestamp as an ISO 8601 string, re-parse it with
    # ciso8601, pin it to UTC, and return whole seconds since the Unix epoch
    iso_string = time.strftime('%Y-%m-%dT%H:%M:%S.%f')
    parsed = UTC.localize(ciso8601.parse_datetime(iso_string))
    return int((parsed - EPOCH).total_seconds())

df['Date_Time'] = df['Date_Time'].apply(convert_time)
df.set_index('Date_Time', inplace=True)
df.head()

And df.head() returns a dataframe that looks like:

| Date_Time (index) | Status | RPM | Wind Speed m/s | Tension g | filename | Test |
| --- | --- | --- | --- | --- | --- | --- |
| int (e.g. 1523356476) | string | float | float | float | string | string |

where there may be NaNs in place of null values.

The code for uploading the data to InfluxDB is:

import influxdb_client

from influxdb_client.client.write_api import SYNCHRONOUS
from influxdb_client import InfluxDBClient, WriteOptions

# API Credentials
bucket = "My bucket ID"
org = "My org ID"
token = "My bucket token"

url="https://europe-west1-1.gcp.cloud2.influxdata.com/"

_client = influxdb_client.InfluxDBClient(
    url=url,
    token=token,
    org=org,
    debug=True
)

_write_client = _client.write_api(
    write_options=WriteOptions(
        batch_size=500,
        flush_interval=10_000,
        jitter_interval=2_000,
        retry_interval=5_000,
        max_retries=5,
        max_retry_delay=30_000,
        exponential_base=2
    )
)

_write_client.write(
    bucket=bucket,
    org=org,
    record=df.head(500),
    data_frame_measurement_name='wind-data-df',
    data_frame_tag_columns=['Status', 'filename', 'Test']
)

# Flush any pending batches and release the client
_write_client.close()
_client.close()

print("Done!")

For the upload I used the 'ID' values (not the names) from the website for my bucket and organization, plus the bucket token, and the Google Cloud region in Belgium, I think it was.

I've set the string/object columns as tags (though I'm not entirely sure what that means; I'm guessing it's an easier way to filter the data), Date_Time is set as the index, and everything else should be a float. My current understanding is that 'string' values need to be labelled with .tag() and numerical values with .field().
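
To check my own understanding of tags vs fields, I built a single Point by hand and printed the line protocol it produces (the values here are made up, and I left out 'Wind Speed m/s' to keep the escaping simple):

```python
from influxdb_client import Point, WritePrecision

p = (
    Point("wind-data-df")
    .tag("Status", "OK")        # tags: indexed string metadata, handy for filtering/grouping
    .tag("Test", "A")
    .field("RPM", 1500.5)       # fields: the actual measured values
    .field("Tension g", 980.25)
    .time(1523356476, WritePrecision.S)  # timestamp given in whole seconds
)

print(p.to_line_protocol())
# prints something like:
# wind-data-df,Status=OK,Test=A RPM=1500.5,Tension\ g=980.25 1523356476
```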

In _write_client.write() I've passed df.head(500), which should upload only the top 500 rows instead of the 110,327 rows of my original dataframe. I did attempt to upload all the data at once beforehand.

It all seems to upload fine, but after closing the connection it seems to hit timeout issues, apparently because I've exceeded my limited_write plan limit? (I'm not sure what that is either.)

When I look into my bucket online it is still empty… so what I’d like to know is:

  • Where did the data go?
  • And how can I upload data in a way that shows me it's working, i.e. that I 'understand' how to use the InfluxDB Python/pandas API?

Running the debug gave the following response (reformatted to be easier to read):

send: 
  b'POST /api/v2/write?
     org=My org ID&
     bucket=My bucket ID&
     precision=ns HTTP/1.1\r\n
       Host: europe-west1-1.gcp.cloud2.influxdata.com\r\n
       Accept-Encoding: identity\r\n
       Content-Length: 60216\r\n
       Content-Encoding: identity\r\n
       Content-Type: text/plain\r\n
       Accept: application/json\r\n
       Authorization: Token My bucket token\r\n
       User-Agent: influxdb-client-python/1.16.0\r\n\r\n
  '
send: b'wind-data-df,<Insert my data here>'

reply: 'HTTP/1.1 204 No Content\r\n'
header: Date: Wed, 07 Apr 2021 15:38:15 GMT
header: Connection: keep-alive
header: trace-id: 914c35c96b69d1cf
header: trace-sampled: false
header: Strict-Transport-Security: max-age=15724800; includeSubDomains

Other questions would be:

  • Am I correct in thinking that string/object values should be listed as a tag_column?
  • Should data appear as a table on the Data Explorer dashboard in the browser?

Your timestamps are from April 2018 - is that correct?
That’s what confused me at first too. You have to set the time window in the UI big enough, otherwise you won’t see any(!) data at all, even in the Data Explorer view.
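
You can also check independently of the UI by querying a wide time range back from Python, roughly like this (same url/token/org placeholders as in your post; note that from() in Flux takes the bucket name, or bucketID: if you only have the ID):

```python
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url=url, token=token, org=org)

# Explicit wide range so the April 2018 timestamps are not filtered out
flux = '''
from(bucket: "my-bucket")
  |> range(start: 2018-01-01T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "wind-data-df")
  |> limit(n: 5)
'''

for table in client.query_api().query(flux):
    for record in table.records:
        print(record.get_time(), record.get_field(), record.get_value())

client.close()
```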

I did try setting the date span large, but I'm still not seeing anything; maybe it's something I'm doing wrong in the website UI.

Ok then that was probably not the problem. :wink:
Then the data ingest is probably not working in the first place.
I would switch on the debug option:

InfluxDBClient(url=url, token=token, org=org, debug=True)

Also, I would choose a smaller data set for now.

Edit:
Btw, please don’t post your code as an image here in the forum.
Use the markdown format instead:

```python
put your python code here
```

Franky, I have updated my original post! I also ran with debug=True and posted what it returned.

Maybe the reason is simply that you exceeded the free tier plan limits?
From the documentation:

The InfluxDB Cloud Free Plan allows you to try out all the features and capabilities of the InfluxDB Cloud service, up to the specific limits of the Free Plan. Free Plan accounts are rate limited as follows:

  • Writes: 5MB every 5 minutes
  • Tasks & Queries: 300MB every 5 minutes
  • Storage: 30 days of retention
  • Cardinality: Up to 10,000 series
  • Alerting: 2 alert checks and 2 notification rules
  • Also, you can create up to:
    • 5 dashboards
    • 5 tasks
    • 2 databases to store your time series data

So I'm sending across a dataframe of 250 x 7 entries including the index, i.e. 1,750 values, which should be okay.
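For scale: the debug output above shows Content-Length: 60216, i.e. roughly 60 KB for the whole request, which is nowhere near the 5 MB per 5 minutes write limit.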

The debug output seems to show the data being uploaded, but it also gives the 'HTTP/1.1 204 No Content' reply. I'm guessing this means it has either uploaded the data and found nothing new, or it's not seeing any data at all?

I deleted the bucket, created a fresh one, and uploaded to that; it still shows the 'No Content' reply.

Is there a webinar series or are there online video tutorials on setting up the online database? I'm still in the dark about what I might be doing wrong…

If I remember correctly, the 204 answer is OK.
I would reduce the complexity of the problem; there are too many possible sources of error at once.
For example, forget the pandas dataframe for now and try to write only a single data point (see the sketch below).
Start with a very simple example in Python and, if it works, increase the complexity.
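
Something along these lines, with the synchronous write API (same placeholder credentials as above; the measurement name and values are just examples):

```python
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url=url, token=token, org=org, debug=True)
write_api = client.write_api(write_options=SYNCHRONOUS)

# One hand-made point with an explicit timestamp
point = (
    Point("debug-test")
    .tag("Test", "single-point")
    .field("RPM", 1500.5)
    .time("2018-04-10T12:00:00Z", WritePrecision.NS)
)

write_api.write(bucket=bucket, org=org, record=point)
client.close()
```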

Okay, I've found the data. It's sitting at the 3rd of May 1970, not the 10th of April 2018… so the write is working, but something is wrong with the date.

After correcting my error there, I have managed to upload a sample of my dataset.
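
For anyone finding this later: my exact change isn't shown above, but the idea of the fix is to keep Date_Time as real datetimes instead of pre-converting it to integer seconds, and let the client handle the timestamp conversion. Roughly (reusing the _write_client, bucket and org from the snippet above):

```python
import pandas as pd

# Keep Date_Time as a DatetimeIndex; the client converts it to
# nanosecond-precision timestamps when serializing the dataframe
df = pd.read_csv("data.csv", parse_dates=["Date_Time"])
df.set_index("Date_Time", inplace=True)

_write_client.write(
    bucket=bucket,
    org=org,
    record=df.head(500),
    data_frame_measurement_name='wind-data-df',
    data_frame_tag_columns=['Status', 'filename', 'Test']
)
```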
