When is a span considered the final span in a trace?

amukamal · June 17, 2019, 7:13pm

I am trying to understand how the Telegraf plug-in works. If multiple applications are sending span events for a given trace, at what point does Telegraf consider the trace complete? Does a span need to identify itself as the final span in a trace? Or is there a time factor (i.e., if no spans arrive within a given time, the trace is complete)?
Am I fundamentally missing something? My web searches turned up nothing on how a trace is ended.

daniel · June 17, 2019, 9:44pm

Are you using the Zipkin plugin? A trace doesn’t have a concept of being complete; it’s just a collection of spans within a time range.
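Since nothing in a trace marks it as finished, any "is this trace done?" decision is a choice made by whoever consumes the spans. Below is a minimal Python sketch of the time-based heuristic the question mentions. It assumes spans arrive as plain dicts with hypothetical `trace_id`, `start_us` and `duration_us` keys, which is not the schema of any particular Telegraf plugin. A trace is reported as idle once no span in it has ended within a chosen timeout, and a late span can still join it afterwards.

```python
from collections import defaultdict


def group_by_trace(spans):
    """Group spans by trace_id. Nothing in the data says a group is finished."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return dict(traces)


def idle_traces(spans, now_us, idle_timeout_us):
    """Return trace_ids whose most recent span ended at least idle_timeout_us ago.

    This is a consumer-side heuristic (an assumption), not something the
    tracing data or Telegraf defines: a late span can still join the trace.
    """
    idle = []
    for trace_id, trace_spans in group_by_trace(spans).items():
        last_end = max(s["start_us"] + s["duration_us"] for s in trace_spans)
        if now_us - last_end >= idle_timeout_us:
            idle.append(trace_id)
    return sorted(idle)


spans = [
    {"trace_id": "t1", "start_us": 100, "duration_us": 50},
    {"trace_id": "t2", "start_us": 800, "duration_us": 100},
]
print(idle_traces(spans, now_us=1000, idle_timeout_us=500))  # ['t1']
```

The timeout only trades latency of reporting against the chance of missing late spans; it does not make the trace "complete" in any sense the data itself records.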

amukamal · June 17, 2019, 11:09pm

Thanks Daniel. I’m not using anything yet (just InfluxDB). I thought an end span was needed for the trace to be summarized, but I guess the trace is just stored as a collection of spans with a unifying trace ID (or rolling up to the same parent span). If that is the case, I wonder whether running analytics on the trace can be performant, for example: show me the 95th percentile latency of a particular trace throughout the day, when that trace occurs 0.5 to 1 million times a day.
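If a trace is only a set of spans sharing a trace ID, its end-to-end latency can be derived as the latest span end minus the earliest span start, and the 95th percentile taken over many such traces. A minimal Python sketch of that arithmetic, assuming hypothetical dict fields (`trace_id`, `start_us`, `duration_us`) and a nearest-rank percentile. Inside InfluxDB the analogous aggregate would be InfluxQL's `PERCENTILE()` over a field that already holds each trace's duration; this sketch does not show that query.

```python
import math


def trace_latencies(spans):
    """End-to-end latency per trace: latest span end minus earliest span start."""
    bounds = {}
    for s in spans:
        start = s["start_us"]
        end = s["start_us"] + s["duration_us"]
        lo, hi = bounds.get(s["trace_id"], (start, end))
        bounds[s["trace_id"]] = (min(lo, start), max(hi, end))
    return {trace_id: hi - lo for trace_id, (lo, hi) in bounds.items()}


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


spans = [
    {"trace_id": "a", "start_us": 0, "duration_us": 100},
    {"trace_id": "a", "start_us": 20, "duration_us": 150},  # second span of trace a ends later
    {"trace_id": "b", "start_us": 500, "duration_us": 40},
]
print(trace_latencies(spans))  # {'a': 170, 'b': 40}
print(percentile(range(1, 101), 95))  # 95
```

Writing each trace's derived duration as its own point, instead of re-deriving it from raw spans at query time, is one way to keep a daily percentile over hundreds of thousands of traces cheap; that is a design choice, not something measured in this thread.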

gianarb · June 18, 2019, 8:16am

Hello! Daniel is right. There is no concept of an “end” in a trace. You could keep adding spans to a trace a year later, as long as it is still stored.
You can create an ‘end’ span if you think tracking it would be useful.
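One way to act on the ‘end’ span suggestion is for the application to emit a zero-duration span carrying a marker tag when it considers the work done, and for consumers to look for it. A minimal Python sketch; the `trace.end` tag name and the dict span layout are hypothetical conventions invented for illustration, not part of Zipkin, Jaeger or Telegraf. Other services may still report spans after the marker, so it signals intent rather than guaranteeing completeness.

```python
END_TAG = "trace.end"  # hypothetical marker tag; not a standard tracing convention


def make_end_span(trace_id, span_id, parent_id, timestamp_us):
    """Build a zero-duration span whose only job is to mark 'done' for this trace."""
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": parent_id,
        "name": "end",
        "start_us": timestamp_us,
        "duration_us": 0,
        "tags": {END_TAG: "true"},
    }


def has_end_marker(spans):
    """True if any span in the trace carries the end marker tag."""
    return any(s.get("tags", {}).get(END_TAG) == "true" for s in spans)


trace = [{"trace_id": "t1", "span_id": "1", "tags": {}}]
print(has_end_marker(trace))  # False
trace.append(make_end_span("t1", "9", "1", 1_000))
print(has_end_marker(trace))  # True
```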

I am wondering if running analytics on the trace can be performant

You need to remember that a trace is just a set of points with extra correlation fields: usually trace_id to correlate spans with each other, and span_id to identify a single span, which is nothing more than a point with a duration. Performing aggregations and calculations on them can be very useful; we do things like “how much time does service-a spend talking to service-b?”
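Because spans are correlated only through trace_id, span_id and a parent reference, a question like “how much time does service-a spend talking to service-b?” comes down to joining each span to its parent. A minimal Python sketch, assuming hypothetical dict fields `trace_id`, `span_id`, `parent_id`, `service` and `duration_us`; it sums child-span durations per (caller, callee) service pair and ignores calls that stay within one service.

```python
from collections import defaultdict


def time_between_services(spans):
    """Sum child-span duration per (parent service, child service) pair."""
    # span_id is only assumed unique within a trace, so key by (trace_id, span_id).
    by_id = {(s["trace_id"], s["span_id"]): s for s in spans}
    totals = defaultdict(int)
    for s in spans:
        parent = by_id.get((s["trace_id"], s.get("parent_id")))
        if parent is None or parent["service"] == s["service"]:
            continue  # root span, missing parent, or an in-service call
        totals[(parent["service"], s["service"])] += s["duration_us"]
    return dict(totals)


spans = [
    {"trace_id": "t1", "span_id": "1", "parent_id": None, "service": "service-a", "duration_us": 100},
    {"trace_id": "t1", "span_id": "2", "parent_id": "1", "service": "service-b", "duration_us": 30},
    {"trace_id": "t1", "span_id": "3", "parent_id": "1", "service": "service-b", "duration_us": 20},
    {"trace_id": "t1", "span_id": "4", "parent_id": "2", "service": "service-b", "duration_us": 10},
]
print(time_between_services(spans))  # {('service-a', 'service-b'): 50}
```

Summing durations counts parallel calls twice; measuring wall-clock time spent waiting on service-b would need the child intervals merged first.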

amukamal · June 18, 2019, 10:29am

This is the use case I am trying to solve:
Let’s say a set of 10 services produces 20-30 unique traces. By unique I mean a distinct combination of spans and the presence of a tag (or tags) within a span. Let’s say there are one million such traces per day, each classifiable as one of these unique traces. How feasible is it to expect a front end to graph the end-to-end latency of a particular (unique) trace, or to set an SLO (real-time alert) on it?
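The “unique trace” classification described above can be expressed as a signature: the set of (service, operation) pairs a trace contains plus the values of whichever tags matter. A minimal Python sketch that buckets end-to-end latencies by that signature, so each bucket could then be graphed or alerted on separately. The `customer_tier` tag, the `span` helper and the span field names are all hypothetical.

```python
from collections import defaultdict


def trace_signature(spans, tag_keys):
    """Hypothetical trace 'type': (service, operation) pairs plus selected tag values."""
    ops = frozenset((s["service"], s["name"]) for s in spans)
    tags = frozenset(
        (k, s["tags"][k]) for s in spans for k in tag_keys if k in s.get("tags", {})
    )
    return ops, tags


def latencies_by_signature(spans, tag_keys=("customer_tier",)):
    """Group spans into traces, then bucket each trace's end-to-end latency by signature."""
    traces = defaultdict(list)
    for s in spans:
        traces[s["trace_id"]].append(s)
    buckets = defaultdict(list)
    for trace_spans in traces.values():
        start = min(s["start_us"] for s in trace_spans)
        end = max(s["start_us"] + s["duration_us"] for s in trace_spans)
        buckets[trace_signature(trace_spans, tag_keys)].append(end - start)
    return dict(buckets)


def span(trace_id, service, name, start, duration, **tags):
    """Hypothetical helper to build a span dict for the example."""
    return {"trace_id": trace_id, "service": service, "name": name,
            "start_us": start, "duration_us": duration, "tags": tags}


spans = [
    span("t1", "web", "checkout", 0, 120, customer_tier="gold"),
    span("t2", "web", "checkout", 0, 90, customer_tier="gold"),
    span("t3", "web", "checkout", 0, 200, customer_tier="free"),
]
for signature, latencies in latencies_by_signature(spans).items():
    print(sorted(signature[1]), latencies)
```

With 20-30 signatures, each bucket maps naturally onto a tag value in InfluxDB, which keeps per-type queries to a simple filter.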

While the information in each span is important, my first-level problem is to understand the end-to-end trace as a composite event / KPI and set SLOs on it.
I appreciate the response. Thank you, Alan

gianarb · June 18, 2019, 12:56pm

I don’t think there is much more to say other than to try it. We use https://github.com/influxdata/jaeger-store to store traces in InfluxDB; our numbers are different, and Jaeger also offers sampling if you would like to exclude a percentage of traces.
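For end-to-end latency, sampling has to keep or drop a whole trace at once; otherwise a trace can lose spans and look shorter than it really was. Deciding on the trace ID achieves that, because every service sees the same ID. A minimal Python sketch of a deterministic, hash-based sampler under that assumption; it illustrates the idea only and is not Jaeger’s sampler code or configuration.

```python
import hashlib


def keep_trace(trace_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of traces, deciding per trace_id.

    Every span of the same trace hashes to the same value, so a trace is
    kept or dropped as a whole, no matter which service sees it.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < rate * 2**64


kept = sum(keep_trace(f"trace-{i}", 0.1) for i in range(10_000))
print(kept / 10_000)
```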

Once you have the points in InfluxDB, you can use the normal features provided by the stack, such as continuous queries, to handle the SLO part.
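For the SLO part, the underlying arithmetic is the share of traces in a time window that finish under a latency threshold, compared with a target. A minimal Python sketch of that check, with a hypothetical 500 ms threshold and 99% target. In InfluxDB a continuous query could precompute the per-window values and whatever raises the alert would apply the comparison; the exact queries are not shown here.

```python
from typing import Optional, Sequence


def slo_compliance(latencies_ms: Sequence[float], threshold_ms: float) -> Optional[float]:
    """Fraction of traces in the window at or under the threshold (None if the window is empty)."""
    if not latencies_ms:
        return None
    good = sum(1 for v in latencies_ms if v <= threshold_ms)
    return good / len(latencies_ms)


def slo_breached(latencies_ms: Sequence[float], threshold_ms: float, target: float) -> bool:
    """True when compliance drops below the target; an empty window is not a breach."""
    compliance = slo_compliance(latencies_ms, threshold_ms)
    return compliance is not None and compliance < target


window = [120, 180, 240, 650]  # end-to-end latencies (ms) of one trace type in one window
print(slo_compliance(window, threshold_ms=500))  # 0.75
print(slo_breached(window, threshold_ms=500, target=0.99))  # True
```

Treating an empty window as “not breached” is a choice; a window with no traces at all may deserve its own alert.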

amukamal · June 18, 2019, 12:57pm

Thanks! We certainly will try…
