Index problem with Pandas dataframe and writing to the Influx API

Hi,

I have a CSV dataset comprising millions of events and approximately 50 columns. I created a Pandas dataframe from the CSV file. When writing to the API, Influx requires that the timestamp be set as the index. However, two different events sometimes share the same timestamp. When this column is set as the index, the second row with the same timestamp has its timestamp removed from the index (it seems I have no choice here in the pandas library). The Influx API then refuses to accept the data because there is one row with an empty index.
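
A quick way to see how many rows are affected is to flag repeated timestamps before setting the index. This is only a minimal sketch on made-up data; the column names `event_time` and `value` are hypothetical and not taken from the real dataset.

```python
import pandas as pd

# Toy data; "event_time" and "value" are hypothetical column names.
df = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2023-01-01 12:30:45.123",
        "2023-01-01 12:30:45.123",  # same millisecond as the row above
        "2023-01-01 12:30:46.500",
    ]),
    "value": [1, 2, 3],
})

# keep=False marks every member of a duplicate group, not only the repeats.
dup_mask = df["event_time"].duplicated(keep=False)
n_duplicated_rows = int(dup_mask.sum())

# Show the rows that share a timestamp with at least one other row.
print(df[dup_mask])
```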

Given that two events can legitimately have the same timestamp, is there a workaround on the Influx API side?

Many thanks,
Matthew.

Hi all,

For anyone interested, I have found a workaround. I created a new column in the dataframe holding a random integer from 0 to 999. I then converted the timestamp column (the column that had duplicate values) into an epoch value in microseconds and added the random number to it. Because the original timestamps have millisecond resolution, adding a value below 1,000 microseconds changes only the sub-millisecond digits. Finally, I converted the result back into a timestamp.
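
The idea can be condensed into a few lines. This is a minimal sketch on made-up data, not the full processing code; the column names `TimeStamp`, `offset_us` and `TimeStampJittered` and the fixed seed are assumptions made for illustration. Note that two rows can still draw the same random offset by chance, so it seems worth checking uniqueness afterwards.

```python
import numpy as np
import pandas as pd

# Toy data with millisecond resolution; column names are hypothetical.
df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2023-01-01 12:30:45.123",
        "2023-01-01 12:30:45.123",  # duplicate of the row above
        "2023-01-01 12:30:46.500",
    ]),
})

# Seeded only so the sketch is repeatable.
rng = np.random.default_rng(seed=0)

# An offset of 0..999 microseconds changes only the sub-millisecond part.
df["offset_us"] = rng.integers(0, 1000, size=len(df))
df["TimeStampJittered"] = df["TimeStamp"] + pd.to_timedelta(df["offset_us"], unit="us")

# Random offsets can collide, so check before using the column as an index.
still_unique = df["TimeStampJittered"].is_unique
```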

Here is the Python code. The format of the timestamp column is like this: YYYYmmDDHHMMSSsss+/-HHMM where sss is the millisecond part and the second HHMM is the time zone offset, e.g. -0500 for New York, or +0200 for Helsinki.

import datetime as dt
import numpy as np
import pandas as pd

# Assumes df is already loaded and timestamp_column holds the name of the raw timestamp column.
time_format = '%Y%m%d%H%M%S%f'
milli_to_micro_second_converter = 1000
# Split the time zone offset off the timestamp string; the pattern covers both + and - offsets.
df = df.join(df[timestamp_column].str.split(r'[+-]', expand=True).rename(columns={0: 'dateTimestamp', 1: 'timezoneHour'}))
df['TimeStamp'] = pd.to_datetime(df['dateTimestamp'], format=time_format)
# Whole seconds since the Unix epoch, then scaled to milliseconds.
df['DateTimeStampDelta'] = (df['TimeStamp'] - dt.datetime(1970, 1, 1)).dt.total_seconds().astype('int64')
df['DateTimeStampDeltaMsec'] = df['DateTimeStampDelta'] * 1000
# Millisecond part of each timestamp, read directly instead of via a string split.
df['msec_int'] = df['TimeStamp'].dt.microsecond // 1000
df['deltamsec_msec'] = df['DateTimeStampDeltaMsec'] + df['msec_int']
df['deltamsec_musec'] = df['deltamsec_msec'] * milli_to_micro_second_converter
# Random offset of 0 to 999 microseconds, one per row.
df['random'] = np.random.randint(0, 1000, df.shape[0])
df['deltamsec_musec_mod_by_random_int'] = df['random'] + df['deltamsec_musec']
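
The snippet stops before the last step described earlier, converting the microsecond value back into a timestamp. A minimal sketch of how that step might look, assuming the microsecond column is an integer epoch value; the sample values and the names `TimeStampUnique` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical microsecond epoch values, standing in for the column built above.
df = pd.DataFrame({
    "deltamsec_musec_mod_by_random_int": [1672576245123417, 1672576245123902],
    "value": [1, 2],
})

# Interpret the integers as microseconds since 1970-01-01.
df["TimeStampUnique"] = pd.to_datetime(df["deltamsec_musec_mod_by_random_int"], unit="us")

# Use the adjusted timestamp as the index and confirm it has no repeats.
df = df.set_index("TimeStampUnique")
index_is_unique = df.index.is_unique
```

If `index_is_unique` comes back False, two rows drew the same random offset, and redrawing the offsets for those rows is one option.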