Hey, first time poster here - I’ve never used Influx before but someone requested that I upload some data to a database here.
I generally work in Python’s Pandas to clean data and now I’m trying to upload said data. The data is stored in a dataframe named df, with some of the columns shown in the image at the bottom. Null elements are set as NaN.
The start of my code is:
import pandas as pd
df = pd.read_csv("data.csv")
df.info()
which reads through the CSVs in a folder and appends them to a dataframe. df.info() returns:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 110327 entries, 0 to 12335
Data columns (total 29 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   Date_Time       110327 non-null  datetime64[ns]
 1   Status          28610 non-null   object
 2   RPM             110267 non-null  float64
 3   Wind Speed m/s  110267 non-null  float64
 4   Tension g       110267 non-null  float64
 5   filename        110327 non-null  object
 6   Test            110327 non-null  object
dtypes: datetime64[ns](1), float64(3), object(3)
memory usage: 25.3+ MB
After a bit of reading, some trial and error, and a look at the git repository for the Python documentation, I think I’ve written a script to convert the datetime into the ciso8601 format, to try to solve the issue of Python’s datetime vs the C interpreter’s datetime, using the code:
from pytz import UTC
import ciso8601
from influxdb_client.client.write.point import EPOCH
def convert_time(time):
    # Parse the timestamp as UTC and return whole seconds since the Unix epoch.
    converted_time = int((UTC.localize(ciso8601.parse_datetime(time.strftime('%Y-%m-%dT%H:%M:%S.%f'))) - EPOCH).total_seconds())
    return converted_time
df['Date_Time'] = df['Date_Time'].apply(convert_time)
df.set_index('Date_Time', inplace=True)
df.head()
And df.head() returns a dataframe that looks like:
Date_Time (index) | Status | RPM | Wind Speed m/s | Tension g | filename | Test |
---|---|---|---|---|---|---|
int(1523356476) | string | float | float | float | string | string |
where there may be NaNs in place of null values.
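For comparison, the same conversion can be sketched with only the standard library, assuming (as convert_time does) that the naive timestamps are already in UTC. The names to_epoch_seconds and to_epoch_nanoseconds are made up for this illustration and are not part of pandas or influxdb_client.

```python
from datetime import datetime, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _as_utc(ts: datetime) -> datetime:
    """Treat a naive timestamp as UTC (an assumption about the source data)."""
    return ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts


def to_epoch_seconds(ts: datetime) -> int:
    """Whole seconds since the Unix epoch."""
    return int((_as_utc(ts) - EPOCH_UTC).total_seconds())


def to_epoch_nanoseconds(ts: datetime) -> int:
    """Nanoseconds since the Unix epoch, using integer arithmetic only."""
    delta = _as_utc(ts) - EPOCH_UTC
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


sample = datetime(2018, 4, 10, 10, 34, 36)  # matches the index value shown above
print(to_epoch_seconds(sample))      # 1523356476
print(to_epoch_nanoseconds(sample))  # 1523356476000000000
```

The two results differ by a factor of one billion, so it matters which unit the receiving side expects.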
The code for uploading the data to InfluxDB is:
import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS
from influxdb_client import InfluxDBClient, WriteOptions
# API credentials
bucket = "My bucket ID"
org = "My org ID"
token = "My bucket token"
url="https://europe-west1-1.gcp.cloud2.influxdata.com/"
_client = influxdb_client.InfluxDBClient(
    url=url,
    token=token,
    org=org,
    debug=True
)
_write_client = _client.write_api(
    write_options=WriteOptions(
        batch_size=500,
        flush_interval=10_000,
        jitter_interval=2_000,
        retry_interval=5_000,
        max_retries=5,
        max_retry_delay=30_000,
        exponential_base=2
    )
)
_write_client.write(
    bucket=bucket,
    org=org,
    record=df.head(500),
    data_frame_measurement_name='wind-data-df',
    data_frame_tag_columns=['Status', 'filename', 'Test']
)
print("Done!")
For the upload section I used the ‘ID’ values (not the names) from the website to link my bucket, organization, and token, and chose the Google Cloud region in Belgium, I think it was.
I’ve set the String/object values to tags (though I’m not entirely sure what that means; I’m guessing it’s an easier way to filter data), Date_Time is set to the index, and everything else should be a float. My current understanding is that ‘String’ values need to be labelled as .tag() and numerical values as .field().
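To illustrate the tag/field split: the second "send" line in the debug output further down starts with the measurement name followed by a comma, which is how InfluxDB line protocol lays out a point (measurement, then tags, then fields, then a timestamp). Below is a rough sketch of how one row might become such a line. to_line is a hypothetical helper written for this post, not the client's own serializer; its escaping is simplified, and the sample values ("run1.csv", "A", the numbers) are invented.

```python
import math


def _escape(text: str) -> str:
    """Escape commas, equals signs and spaces (simplified line-protocol escaping)."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _is_missing(value) -> bool:
    """True for None and float NaN, which cannot be written as a tag or field."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_line(measurement: str, tags: dict, fields: dict, timestamp: int) -> str:
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    tag_part = "".join(
        f",{_escape(key)}={_escape(str(value))}"
        for key, value in sorted(tags.items())
        if not _is_missing(value)
    )
    field_part = ",".join(
        f"{_escape(key)}={float(value)}"
        for key, value in fields.items()
        if not _is_missing(value)
    )
    if not field_part:
        raise ValueError("a point needs at least one non-missing field")
    # The measurement name is assumed to contain no special characters.
    return f"{measurement}{tag_part} {field_part} {timestamp}"


# Invented sample row: string columns as tags, numeric columns as fields.
row_tags = {"Status": float("nan"), "filename": "run1.csv", "Test": "A"}
row_fields = {"RPM": 1200.0, "Wind Speed m/s": 5.5, "Tension g": float("nan")}
print(to_line("wind-data-df", row_tags, row_fields, 1523356476000000000))
```

In this layout tags are string key/value pairs that are indexed and typically used for filtering and grouping, while fields hold the measured values, so putting the string columns in data_frame_tag_columns is at least consistent with that split.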
In _write_client.write() I’ve set df.head(500), which should upload only the first 500 rows instead of all 110,327 rows from my original dataframe. I did attempt to upload all the data at once beforehand.
It all seems to upload fine but after closing the connection it seems to have timeout issues because I’ve exceeded my limited_write plan limit? (I am not sure what this is either.)
When I look into my bucket online it is still empty… so what I’d like to know is:
- Where did the data go?
- And how can I upload data to show to myself it’s working/I ‘understand’ how to use the InfluxDB Python/Pandas API?
Running with debug=True gave the following response (reformatted to read more easily):
send:
b'POST /api/v2/write?
org=My org ID&
bucket=My bucket ID&
precision=ns HTTP/1.1\r\n
Host: europe-west1-1.gcp.cloud2.influxdata.com\r\n
Accept-Encoding: identity\r\n
Content-Length: 60216\r\n
Content-Encoding: identity\r\n
Content-Type: text/plain\r\n
Accept: application/json\r\n
Authorization: Token My bucket token\r\n
User-Agent: influxdb-client-python/1.16.0\r\n\r\n
'
send: b'wind-data-df,<Insert my data here>'
reply: 'HTTP/1.1 204 No Content\r\n'
header: Date: Wed, 07 Apr 2021 15:38:15 GMT
header: Connection: keep-alive
header: trace-id: 914c35c96b69d1cf
header: trace-sampled: false
header: Strict-Transport-Security: max-age=15724800; includeSubDomains
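One thing that may be worth checking in the request above is precision=ns, given that convert_time produces whole seconds. Here is a small sketch of how the same integer reads under each unit, using only the standard library. Whether the client really sends the integer index unchanged is an assumption here, not something the debug output confirms, and interpret_epoch is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def interpret_epoch(value: int, precision: str) -> datetime:
    """Read an integer epoch offset in the given unit (truncated to microseconds)."""
    per_second = UNITS_PER_SECOND[precision]
    seconds, remainder = divmod(value, per_second)
    microseconds = remainder * 10**6 // per_second
    return EPOCH_UTC + timedelta(seconds=seconds, microseconds=microseconds)


index_value = 1523356476  # the integer index produced by convert_time
print(interpret_epoch(index_value, "s").isoformat())   # 2018-04-10T10:34:36+00:00
print(interpret_epoch(index_value, "ns").isoformat())  # 1970-01-01T00:00:01.523356+00:00
```

If the second reading is what the server sees, the points would carry timestamps from January 1970. That would put them outside a recent time range in the Data Explorer and possibly outside the bucket's retention period, even though the write itself returns 204.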
Other questions would be:
- Am I correct in thinking that string/object values should be listed as a tag_column?
- Should data appear as a table on the Data Explorer dashboard in the browser?