Insert XML data into InfluxDB

influxdb
#1

Hello,

I’m working as a freelancer, on my own, and I don’t control the flow or the format of the time series data I receive.
For now it all arrives as XML. I checked Telegraf and there doesn’t seem to be any XML input plugin. Is that something currently being worked on? I really can’t find an easy way to collect these XML files and insert their data into InfluxDB.

I’m currently trying to use Node.js to parse the XML and insert the data into InfluxDB through the HTTP API, but this feels so wrong… Isn’t there a better way? Also, the InfluxDB write API documentation doesn’t say whether it’s best to send 1000 requests of 1 row, 1 request of 1000 rows, or 10 requests of 100 rows. The docs do specify that writing multiple lines at a time is faster than writing line by line, but at some point, if I insert 50K lines with one HTTP POST request, isn’t that going to be a problem?
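To make the batching question concrete, here is a minimal sketch of splitting line-protocol lines into fixed-size request bodies. The function name `chunkLines`, the sample data, and the default batch size of 5000 are assumptions for illustration; the right size depends on the server and the data, so treat it as a starting point to tune rather than a documented limit.

```javascript
// Split an array of line-protocol strings into request bodies of at most
// `size` lines each. The default of 5000 is an assumed starting point.
function chunkLines(lines, size = 5000) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < lines.length; i += size) {
    // One batch becomes one HTTP body: lines separated by "\n".
    batches.push(lines.slice(i, i + size).join('\n'));
  }
  return batches;
}

// Hypothetical data: 50,000 points for a made-up "temperature" measurement.
const lines = Array.from(
  { length: 50000 },
  (_, i) => `temperature,sensor=s1 value=${i} ${1500000000 + i}`
);
const batches = chunkLines(lines);
console.log(batches.length); // prints 10
```

With this shape, 50K lines become ten POST requests instead of one very large request or 50,000 tiny ones.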

By the way, I’m totally new to InfluxData and time series. I have checked the documentation and the web, and so far I haven’t found any solution.

#2

We have an issue open around XML support, can you review the proposal to see if it would work for you?

#3

I don’t think so. When I say I don’t have control over the XML, I mean that I have ZERO control over it.
I can’t even change a header or a tag in my XML. It’s fully locked.

If by “the proposal” you mean using the HTTP JSON API, then yes, it would work; I’m already converting to JSON right now to avoid the pain of XML parsing. The problem is that what worked well in XML is awkward in JSON, and I still have the headers, which I have no idea how to handle.

So basically, it would work, but I would have to produce something like 4 or 5 JSON documents per XML file (because of the tag structure, where some tags are actually different measurements in InfluxDB), drop the XML header information, and so on.
It would be faster to write a series of points directly through the native HTTP API. But thanks for the proposal.

Now, the issue dates from a year ago, and my “boss” actually picked XML because it’s the most standard file type out there. How can it not be supported? Do you realize how many applications use XML?
Don’t get me wrong, you seem to be doing wonderful work and InfluxDB seems just GREAT. I just don’t understand how what a lot of people consider the most standard file format isn’t supported at all.
And, small as my contribution may be, is there anything I can do to help you make it possible?

#4

The proposal I was referring to is the one using xpath described around here and refined a little afterwards. You would essentially write a list of xpath queries for the fields and tags you want to collect. Since it is a “parser” it could be used in any plugin that has the data_format option, not only HTTP plugins. I think this will allow you to parse XML that is using any structure.

is there anything i can do to help you make it possible

The best thing would be to contact sales.

#5

I’ll look into that and come back. Thanks for your answers!

#6

So I contacted sales. They asked whether I had posted here and whether the community had a magic solution for me.
They told me most plugins are community driven and that you’re my best shot.
Also, they don’t get many requests for XML input, so they were quite surprised by my email (they actually told me I was the first to contact them with an XML request). They also suggested that I develop my own plugin in Go to make XML insertion available.

I’m not familiar with XPath at all. I’m new to all of this and I just need a simple and quick way to write data into InfluxDB.
I actually made JSON out of my XML, and I can now retrieve the data with Node.js, but the Node.js client has been outdated since 0.9, as I said in another post.

Do you have a magic solution for inserting data with JS?
I found a small library on GitHub. It’s very, very basic, but it might do the trick after all.

Also, I’m wondering: if by any chance I want to develop a plugin (probably not in Go but in JS), what should I know?
Where do I find all the information I need to make this work?
For example, the library doesn’t allow setting a unit or option for time precision. I have been looking for this for days now. How do you set the time precision properly, and where? Is it in a configuration file or variable? In the line protocol? In the measurement?

Thanks for your answers and leads anyway, friend!

#7

The best way to interface with JS is to use the exec input plugin. If you go this route, I highly suggest having your script output line protocol, as this is the best-supported format. The other formats may not support everything Telegraf can do; for instance, the JSON format doesn’t support setting the time precision.
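As a rough sketch of what such a script could look like, the following Node.js code turns one already-parsed data point into a line of line protocol and prints it to stdout, which is where the exec plugin reads from. The helper names (`toLineProtocol`, `formatField`) and the sample point are hypothetical, the escaping covers only the common cases (commas, spaces, equals signs, quotes), and the timestamp is passed as a string on the assumption that it is in nanoseconds and would lose precision as a JS number.

```javascript
// Escape characters that are special in tag keys, tag values and field keys.
const escapeKey = (s) => String(s).replace(/([,= ])/g, '\\$1');
// Measurement names only need commas and spaces escaped.
const escapeMeasurement = (s) => String(s).replace(/([, ])/g, '\\$1');

function formatField(value) {
  if (typeof value === 'number') return String(value); // stored as a float
  if (typeof value === 'boolean') return value ? 'true' : 'false';
  // Strings are double-quoted; backslashes and double quotes are escaped.
  return '"' + String(value).replace(/(["\\])/g, '\\$1') + '"';
}

function toLineProtocol({ measurement, tags = {}, fields, timestamp }) {
  const fieldKeys = Object.keys(fields || {});
  if (fieldKeys.length === 0) {
    throw new Error('a point needs at least one field');
  }
  // Tags are sorted by key, which is the conventional ordering.
  const tagPart = Object.keys(tags).sort()
    .map((k) => `,${escapeKey(k)}=${escapeKey(tags[k])}`)
    .join('');
  const fieldPart = fieldKeys
    .map((k) => `${escapeKey(k)}=${formatField(fields[k])}`)
    .join(',');
  const tsPart = timestamp === undefined ? '' : ` ${timestamp}`;
  return `${escapeMeasurement(measurement)}${tagPart} ${fieldPart}${tsPart}`;
}

// Hypothetical point, as it might come out of the XML-to-JSON step.
const line = toLineProtocol({
  measurement: 'temperature',
  tags: { site: 'plant a', sensor: 's1' },
  fields: { value: 21.5, status: 'ok' },
  timestamp: '1500000000000000000', // nanoseconds, kept as a string
});
console.log(line);
```

One XML file that maps to several measurements would simply print several such lines, one per point.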

I think node-influx is the most popular client for InfluxDB. If you go this route, you can either send directly to InfluxDB, bypassing Telegraf, or send to a Telegraf http_listener input. If you take the exec plugin route, you can just output in one of the supported data input formats.
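For the "send directly to InfluxDB" option without a client library, a minimal sketch follows. It assumes an InfluxDB 1.x server, whose `/write` endpoint takes the database name and the timestamp precision as query parameters; that `precision` parameter is where the unit of the timestamps in the request body is declared. The function names, host, and database name are placeholders, and `writeBatch` assumes Node 18 or later for the global `fetch`.

```javascript
// Precision values accepted by the InfluxDB 1.x /write endpoint.
const PRECISIONS = new Set(['ns', 'u', 'ms', 's', 'm', 'h']);

// Build the write URL; `precision` declares the unit of the timestamps
// that appear at the end of each line in the request body.
function buildWriteUrl(baseUrl, db, precision = 'ns') {
  if (!PRECISIONS.has(precision)) {
    throw new RangeError(`unsupported precision: ${precision}`);
  }
  const url = new URL('/write', baseUrl);
  url.searchParams.set('db', db);
  url.searchParams.set('precision', precision);
  return url.toString();
}

// POST one batch of line protocol. Defined but not called here, since it
// needs a running server; InfluxDB 1.x answers 204 on a successful write.
async function writeBatch(baseUrl, db, precision, body) {
  const res = await fetch(buildWriteUrl(baseUrl, db, precision), {
    method: 'POST',
    body,
  });
  if (res.status !== 204) {
    throw new Error(`write failed: HTTP ${res.status}`);
  }
}

// Placeholder host and database name.
console.log(buildWriteUrl('http://localhost:8086', 'mydb', 's'));
// prints http://localhost:8086/write?db=mydb&precision=s
```

A call such as `writeBatch('http://localhost:8086', 'mydb', 's', 'temperature value=21.5 1500000000')` would then store the point with a timestamp interpreted in seconds.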

If you eventually decide you want to write a plugin in Go, the method is documented in the contributing guide.

Let me know if you have any more questions.

#8

Thank you again.

I looked for an InfluxDB client and found one in JS that had been outdated since 0.9; then I found a plugin mentioned in a post on the issue tracker.
I can’t believe I didn’t find this one earlier. It will be a great help!