Hey, first time poster here - I’ve never used Influx before but someone requested that I upload some data to a database here.
I generally work in Python’s Pandas to clean data and now I’m trying to upload said data. The data is stored in a dataframe named df, with some of the columns shown in the image at the bottom. Some of the elements are null.
The start of my code is:
import pandas as pd

df = pd.read_csv("data.csv")
df.info()
which reads through the CSVs in a folder and appends them to a dataframe. The df.info() output is:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 110327 entries, 0 to 12335
Data columns (total 29 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   Date_Time       110327 non-null  datetime64[ns]
 1   Status          28610 non-null   object
 2   RPM             110267 non-null  float64
 3   Wind Speed m/s  110267 non-null  float64
 4   Tension g       110267 non-null  float64
 5   filename        110327 non-null  object
 6   Test            110327 non-null  object
dtypes: datetime64[ns](1), float64(3), object(3)
memory usage: 25.3+ MB
After a bit of reading, trial and error, and looking at the git repository for the Python documentation, I think I’ve written a script to convert the datetime into the ciso8601 format, to try to solve the mismatch between Python’s datetime and the C interpreter’s datetime, using the code:
from pytz import UTC
import ciso8601
from influxdb_client.client.write.point import EPOCH

def convert_time(time):
    converted_time = int((UTC.localize(ciso8601.parse_datetime(time.strftime('%Y-%m-%dT%H:%M:%S.%f'))) - EPOCH).total_seconds())
    return converted_time

df['Date_Time'] = df['Date_Time'].apply(lambda x: convert_time(x))
df.set_index('Date_Time', inplace=True)
df.head()
df.head() returns a dataframe that looks like:
| Date_Time (index) | Status | RPM | Wind Speed m/s | Tension g | filename | Test |
where there may be NaNs in place of null values.
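As a sanity check on the time conversion above, here is a minimal standard-library sketch of what an "integer since the epoch" looks like in seconds versus nanoseconds. The debug request further down is sent with precision=ns, so the unit of the integers may matter. The `EPOCH` constant here is a local stand-in (not the one imported from the client library), the function names are made up for illustration, and the sample timestamp is hypothetical; it assumes naive datetimes are meant as UTC.

```python
from datetime import datetime, timezone

# Local stand-in for the Unix epoch; NOT the constant from influxdb_client.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(ts: datetime) -> int:
    """Treat a naive datetime as UTC and return whole seconds since the epoch."""
    aware = ts.replace(tzinfo=timezone.utc)
    return int((aware - EPOCH).total_seconds())


def to_epoch_nanoseconds(ts: datetime) -> int:
    """Same conversion, but in nanoseconds, using integer arithmetic only."""
    delta = ts.replace(tzinfo=timezone.utc) - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


# Hypothetical sample timestamp, assumed to be UTC.
sample = datetime(2021, 4, 7, 15, 38, 15)
seconds = to_epoch_seconds(sample)
nanoseconds = to_epoch_nanoseconds(sample)

# What the seconds value would mean if something read it as nanoseconds.
misread = datetime.fromtimestamp(seconds / 1_000_000_000, tz=timezone.utc)

print(seconds)       # seconds since the epoch
print(nanoseconds)   # nanoseconds since the epoch
print(misread)       # the same integer interpreted as nanoseconds
```

If a value in seconds is interpreted as nanoseconds, it lands within the first couple of seconds of 1970, which is worth ruling out when points seem to be accepted but cannot be found in a recent time range.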
The code for uploading the data to InfluxDB is:
import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS
from influxdb_client import InfluxDBClient, WriteOptions

# API credentials
bucket = "My bucket ID"
org = "My org ID"
token = "My bucket token"
url = "https://europe-west1-1.gcp.cloud2.influxdata.com/"

_client = influxdb_client.InfluxDBClient(
    url=url,
    token=token,
    org=org,
    debug=True
)

_write_client = _client.write_api(
    write_options=WriteOptions(
        batch_size=500,
        flush_interval=10_000,
        jitter_interval=2_000,
        retry_interval=5_000,
        max_retries=5,
        max_retry_delay=30_000,
        exponential_base=2
    )
)

_write_client.write(
    bucket=bucket,
    org=org,
    record=df.head(500),
    data_frame_measurement_name='wind-data-df',
    data_frame_tag_columns=['Status', 'filename', 'Test']
)

print("Done!")
For the upload section I used the ‘ID’ values (not the names) from the website for my bucket and organization, along with my token, and used the Google Cloud region in Belgium, I think it was.
I’ve set the object values to tags (though I’m not entirely sure what that means, I’m guessing an easier way to filter data), the Date_Time is set to the index, and everything else should be a float. My current understanding is that ‘String’ values need to be labelled as .tag() and numerical values as fields. In _write_client.write() I’ve set the record to df.head(500), which should only upload the top 500 rows instead of the 110,327 rows from my original dataframe. I did attempt to upload all the data at once beforehand.
It all seems to upload fine but after closing the connection it seems to have timeout issues because I’ve exceeded my limited_write plan limit? (I am not sure what this is either.)
When I look into my bucket online it is still empty… so what I’d like to know is:
- Where did the data go?
- And how can I upload data to show to myself it’s working/I ‘understand’ how to use the InfluxDB Python/Pandas API?
Running with debug enabled gave this response (formatted to be easier to read):
send: b'POST /api/v2/write?
    org=My org ID&
    bucket=My bucket ID&
    precision=ns HTTP/1.1\r\n
    Host: europe-west1-1.gcp.cloud2.influxdata.com\r\n
    Accept-Encoding: identity\r\n
    Content-Length: 60216\r\n
    Content-Encoding: identity\r\n
    Content-Type: text/plain\r\n
    Accept: application/json\r\n
    Authorization: Token My bucket token\r\n
    User-Agent: influxdb-client-python/1.16.0\r\n\r\n
'
send: b'wind-data-df,<Insert my data here>'
reply: 'HTTP/1.1 204 No Content\r\n'
header: Date: Wed, 07 Apr 2021 15:38:15 GMT
header: Connection: keep-alive
header: trace-id: 914c35c96b69d1cf
header: trace-sampled: false
header: Strict-Transport-Security: max-age=15724800; includeSubDomains
Other questions would be:
- Am I correct in thinking that string/object values should be listed as a tag?
- Should data appear as a table on the Data Explorer dashboard in the browser?