Introduction
In OpenTelemetry, spans are the building blocks of distributed tracing. They represent a single operation within a trace and can have various attributes (labels) associated with them. Sometimes, you may need to modify these attributes such as adding extra attribute or even removing them. In this post, we will explore how to modify span attributes in OpenTelemetry.
Motivation
I was recently working on adding OpenTelemetry to a Python ETL application. The application was fetching objects from an S3 bucket, and I wanted to modify the span attributes to include the bucket name in the span. As the application was using the boto3
library, and OpenTelemetry has out of the box instrumentation for botocore
using BotocoreInstrumentor
which adds a few span attributes like cloud.region, rpc.method, rpc.service
but not the bucket name. So I had to modify the span attributes to include the bucket name.
How to add S3 bucket name to span attributes in OpenTelemetry
Let’s say we have the code below to list objects in an S3 bucket using boto3
:
def list_s3_objects(bucket_name: str, max_keys: int = 100) -> List[Dict[str, Any]]:
s3_client = boto3.client('s3')
try:
response = s3_client.list_objects_v2(Bucket=bucket_name, MaxKeys=max_keys)
if 'Contents' in response:
return response['Contents']
else:
return []
... # rest of the code
And let’s say we have already instrumented the boto3
library using OpenTelemetry’s BotocoreInstrumentor
:
from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
BotocoreInstrumentor().instrument()
This will produce a span for the list_objects_v2
operation as shown below:
"spans": [
{
"operationName": "S3.ListObjectsV2",
"spanID": "a13024b2575f7873",
"tags": [
{
"key": "cloud.region",
"type": "string",
"value": "us-west-2"
},
{
"key": "http.status_code",
"type": "int64",
"value": 200
},
{
"key": "rpc.method",
"type": "string",
"value": "ListObjectsV2"
},
{
"key": "rpc.service",
"type": "string",
"value": "S3"
},
{
"key": "rpc.system",
"type": "string",
"value": "aws-api"
},
{
"key": "server.address",
"type": "string",
"value": "s3.us-west-2.amazonaws.com"
}
],
"traceID": "4ba03977861af9541cacc87f559d03be",
}
],
...
As we didn’t create this span manually, we can modify the span using two methods. I used span.name to identify the span, because all S3 spans start with “S3.” like S3.ListObjectsV2, S3.GetObject, etc.
Modifying span attributes using SpanProcessor
To modify the span attributes, we can create a custom SpanProcessor
. This processor will update the span once its created and updates its attributes.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
class CustomSpanProcessor(BatchSpanProcessor):
# Called when span is started
def on_start(self, span, parent_context=None):
if span.name.startswith("S3.") and api_params.get("Bucket"):
span.set_attribute("bucket.name", api_params["Bucket"])
provider = TracerProvider()
otlp_exporter = OTLPSpanExporter()
processor = CustomSpanProcessor(otlp_exporter)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
trace.get_tracer(__name__)
Here, we created a custom CustomSpanProcessor
that inherits from BatchSpanProcessor
. In the on_start
method, we check if the span name starts with “S3.” and then set the bucket.name
attribute.
There’s also a on_end
method that can be used to modify the span after it has finished, but in this case, we only need to modify it when it starts.
Modifying span attributes using request hook
Another way to modify span attributes is by using hooks. There are two types of hooks, request_hook and response_hook. The request hook is called before the request is sent, and the response hook is called after the response is received.
from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
def s3_request_instrument_hook(span, service_name, operation_name, api_params):
if span and span.is_recording() and span.name.startswith("S3.") and api_params.get("Bucket"):
span.set_attribute("bucket.name", api_params["Bucket"])
BotocoreInstrumentor().instrument(request_hook=s3_request_instrument_hook)
is_recording()
checks if the span is active, only active and required spans should be exported to reduce computation and network overhead.
Once we’ve modified the span attributes, we can see that the span now includes the bucket.name
attribute:
"spans": [
{
"operationName": "S3.ListObjectsV2",
"spanID": "a14024b2975f7873",
"tags": [
{
"key": "cloud.region",
"type": "string",
"value": "us-west-2"
},
{
"key": "http.status_code",
"type": "int64",
"value": 200
},
{
"key": "rpc.method",
"type": "string",
"value": "ListObjectsV2"
},
{
"key": "rpc.service",
"type": "string",
"value": "S3"
},
{
"key": "rpc.system",
"type": "string",
"value": "aws-api"
},
{
"key": "server.address",
"type": "string",
"value": "s3.us-west-2.amazonaws.com"
},
{
"key": "bucket.name",
"type": "string",
"value": "my-s3-bucket"
}
],
...
"traceID": "4ba03977861af9541cacc87f559d03be",
}
],
...
References
https://opentelemetry-python-kinvolk.readthedocs.io/en/latest/instrumentation/botocore/botocore.html