Skip to content

Conversation

@mahendrabishnoi2
Copy link
Member

@mahendrabishnoi2 mahendrabishnoi2 commented Sep 17, 2025

Fixes #7004

This PR adds support for experimental otel.sdk.processor.span.processed metric in simple span processor.
Definition of metric at: https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md

Experimental metrics are behind a feature flag: OTEL_GO_X_OBSERVABILITY

Observability Implementation Checklist

Observability Implementation Checklist

Based on the project Observability guidelines, ensure the following are completed:

Environment Variable Activation

  • Observability features are disabled by default
  • Features are activated through the OTEL_GO_X_OBSERVABILITY environment variable
  • Use consistent pattern with x.Observability.Enabled() check 1
  • Follow established experimental feature pattern 23

Encapsulation

  • Instrumentation is encapsulated within a dedicated struct (e.g., Instrumentation)
  • Instrumentation is not mixed into the instrumented component
  • Instrumentation code is in its own file or package if complex/reused
  • Instrumentation setup doesn't bloat the main component code

Initialization

  • Initialization is only done when observability is enabled
  • Setup is explicit and side-effect free
  • Return errors from initialization when appropriate
  • Use the global Meter provider (e.g., otel.GetMeterProvider())
  • Include proper meter configuration with:
    • Instrumentation package name is used for the Meter
    • Instrumentation version (e.g. Version)
    • Schema URL (e.g. SchemaURL)

Performance

  • Little to no overhead when observability is disabled
  • Expensive operations are only executed when observability is enabled
  • When enabled, instrumentation code paths are optimized to reduce allocation/computation overhead

Attribute and Option Allocation Management

  • Use sync.Pool for attribute slices and options with dynamic attributes
  • Pool objects are properly reset before returning to pool
  • Pools are scoped for maximum efficiency while ensuring correctness

Caching

  • Static attribute sets known at compile time are pre-computed and cached
  • Common attribute combinations use lookup tables/maps

Benchmarking

  • Benchmarks provided for all instrumentation code
  • Benchmark scenarios include both enabled and disabled observability
  • Benchmark results show impact on allocs/op, B/op, and ns/op (use b.ReportAllocs() in benchmarks)

Error Handling and Robustness

  • Errors are reported back to caller when possible
  • Partial failures are handled gracefully
  • Use partially initialized components when available
  • Return errors to caller instead of only using otel.Handle()
  • Use otel.Handle() only when component cannot report error to user

Context Propagation

  • Observability measurements receive the context from the function being measured (don't break context propagation by using context.Background())

Semantic Conventions Compliance

  • All metrics follow OpenTelemetry Semantic Conventions
  • Use the otelconv convenience package for metric semantic conventions
  • Component names follow semantic conventions
  • Use package path scope type as stable identifier for non-standard components
  • Component names are stable unique identifiers
  • Use global counter for uniqueness if necessary
  • Component ID counter is resettable for deterministic testing

Testing

  • Use deterministic testing with isolated state
  • Restore previous state after tests (t.Cleanup())
  • Isolate meter provider for testing
  • Use t.Setenv() for environment variable testing
  • Reset component ID counter for deterministic component names
  • Test order doesn't affect results

Benchmarks

> benchstat bmark.result
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
                             │ bmark.result │
                             │    sec/op    │
SSP/SpanProcessed-8            146.7n ± 15%
SSP/SpanProcessedWithError-8   205.1n ±  3%
geomean                        173.5n

                             │ bmark.result │
                             │     B/op     │
SSP/SpanProcessed-8              280.0 ± 0%
SSP/SpanProcessedWithError-8     408.0 ± 0%
geomean                          338.0

                             │ bmark.result │
                             │  allocs/op   │
SSP/SpanProcessed-8              3.000 ± 0%
SSP/SpanProcessedWithError-8     3.000 ± 0%
geomean                          3.000

Footnotes

  1. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/exporters/stdout/stdouttrace/internal/observ/instrumentation.go#L101-L103

  2. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/exporters/stdout/stdouttrace/internal/x/x.go

  3. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/sdk/internal/x/x.go

@codecov
Copy link

codecov bot commented Sep 17, 2025

Codecov Report

❌ Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.3%. Comparing base (2e5fdd1) to head (3a89a59).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sdk/trace/simple_span_processor.go 82.3% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #7374   +/-   ##
=====================================
  Coverage   86.3%   86.3%           
=====================================
  Files        294     295    +1     
  Lines      25987   26054   +67     
=====================================
+ Hits       22427   22495   +68     
+ Misses      3187    3186    -1     
  Partials     373     373           
Files with missing lines Coverage Δ
sdk/trace/internal/observ/simple_span_processor.go 100.0% <100.0%> (ø)
sdk/trace/simple_span_processor.go 80.8% <82.3%> (+5.3%) ⬆️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mahendrabishnoi2 mahendrabishnoi2 marked this pull request as ready for review September 27, 2025 08:04
@MrAlias
Copy link
Contributor

MrAlias commented Oct 8, 2025

@mahendrabishnoi2 based your results:

> benchstat bmark.result
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
                             │ bmark.result │
                             │    sec/op    │
SSP/SpanProcessed-8            146.7n ± 15%
SSP/SpanProcessedWithError-8   205.1n ±  3%
geomean                        173.5n

                             │ bmark.result │
                             │     B/op     │
SSP/SpanProcessed-8              280.0 ± 0%
SSP/SpanProcessedWithError-8     408.0 ± 0%
geomean                          338.0

                             │ bmark.result │
                             │  allocs/op   │
SSP/SpanProcessed-8              3.000 ± 0%
SSP/SpanProcessedWithError-8     3.000 ± 0%
geomean                          3.000

@mahendrabishnoi2
Copy link
Member Author

My bad, I forgot to run the results through benchstat.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. The memory management of attributes during the measurement needs to be optimized though.

flc1125

This comment was marked as outdated.

@mahendrabishnoi2

This comment was marked as outdated.

@mahendrabishnoi2
Copy link
Member Author

Updated benchmarks
0 allocation and 90% decrease in op/s for happy path (without error)

ssp_3.txt
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6945 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.9358 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.7210 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.7109 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6756 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6617 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6618 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6620 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6620 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6700 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9040986	       130.8 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9043536	       132.3 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9094186	       132.2 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9013471	       135.0 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9140390	       136.9 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 8351335	       142.4 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9088309	       133.6 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 8878872	       133.3 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9043981	       137.5 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 8916673	       134.7 ns/op	     232 B/op	       3 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/trace/internal/observ	21.455s
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
                             │    ssp_1.txt    │              ssp_3.txt               │
                             │     sec/op      │    sec/op     vs base                │
SSP/SpanProcessed-8            146.7000n ± 15%   0.6728n ± 7%  -99.54% (p=0.000 n=10)
SSP/SpanProcessedWithError-8      205.1n ±  3%    134.2n ± 2%  -34.59% (p=0.000 n=10)
geomean                           173.5n          9.500n       -94.52%

                             │ ssp_1.txt  │                ssp_3.txt                │
                             │    B/op    │    B/op     vs base                     │
SSP/SpanProcessed-8            280.0 ± 0%     0.0 ± 0%  -100.00% (p=0.000 n=10)
SSP/SpanProcessedWithError-8   408.0 ± 0%   232.0 ± 0%   -43.14% (p=0.000 n=10)
geomean                        338.0                    ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                             │ ssp_1.txt  │                ssp_3.txt                │
                             │ allocs/op  │ allocs/op   vs base                     │
SSP/SpanProcessed-8            3.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
SSP/SpanProcessedWithError-8   3.000 ± 0%   3.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                        3.000                    ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

@dmathieu dmathieu requested a review from flc1125 October 13, 2025 07:34
@MrAlias MrAlias added this to the v1.39.0 milestone Oct 16, 2025
@MrAlias
Copy link
Contributor

MrAlias commented Oct 17, 2025

@mahendrabishnoi2 this looks ready to merge. Please take a look at the conflicts and resolve them. sdk/trace/internal/x/README.md was moved to sdk/internal/x/README.md.

@MrAlias MrAlias merged commit f7d2882 into open-telemetry:main Oct 17, 2025
33 checks passed
@MrAlias MrAlias mentioned this pull request Dec 5, 2025
MrAlias added a commit that referenced this pull request Dec 8, 2025
## Overview

### Added

- Greatly reduce the cost of recording metrics in
`go.opentelemetry.io/otel/sdk/metric` using hashing for map keys.
(#7175)
- Add `WithInstrumentationAttributeSet` option to
`go.opentelemetry.io/otel/log`, `go.opentelemetry.io/otel/metric`, and
`go.opentelemetry.io/otel/trace` packages. This provides a
concurrent-safe and performant alternative to
`WithInstrumentationAttributes` by accepting a pre-constructed
`attribute.Set`. (#7287)
- Add experimental observability for the Prometheus exporter in
`go.opentelemetry.io/otel/exporters/prometheus`. Check the
`go.opentelemetry.io/otel/exporters/prometheus/internal/x` package
documentation for more information. (#7345)
- Add experimental observability metrics in
`go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#7353)
- Add temporality selector functions `DeltaTemporalitySelector`,
`CumulativeTemporalitySelector`, `LowMemoryTemporalitySelector` to
`go.opentelemetry.io/otel/sdk/metric`. (#7434)
- Add experimental observability metrics for simple log processor in
`go.opentelemetry.io/otel/sdk/log`. (#7548)
- Add experimental observability metrics in
`go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`.
(#7459)
- Add experimental observability metrics in
`go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`.
(#7486)
- Add experimental observability metrics for simple span processor in
`go.opentelemetry.io/otel/sdk/trace`. (#7374)
- Add experimental observability metrics in
`go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7512)
- Add experimental observability metrics for manual reader in
`go.opentelemetry.io/otel/sdk/metric`. (#7524)
- Add experimental observability metrics for periodic reader in
`go.opentelemetry.io/otel/sdk/metric`. (#7571)
- Support `OTEL_EXPORTER_OTLP_LOGS_INSECURE` and
`OTEL_EXPORTER_OTLP_INSECURE` environmental variables in
`go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7608)
- Add `Enabled` method to the `Processor` interface in
`go.opentelemetry.io/otel/sdk/log`. All `Processor` implementations now
include an `Enabled` method. (#7639)
- The `go.opentelemetry.io/otel/semconv/v1.38.0` package. The package
contains semantic conventions from the `v1.38.0` version of the
OpenTelemetry Semantic Conventions. See the [migration
documentation](./semconv/v1.38.0/MIGRATION.md) for information on how to
upgrade from `go.opentelemetry.io/otel/semconv/v1.37.0.`(#7648)

### Changed

- `Distinct` in `go.opentelemetry.io/otel/attribute` is no longer
guaranteed to uniquely identify an attribute set. Collisions between
`Distinct` values for different Sets are possible with extremely high
cardinality (billions of series per instrument), but are highly
unlikely. (#7175)
- `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/trace`
synchronously de-duplicates the passed attributes instead of delegating
it to the returned `TracerOption`. (#7266)
- `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/meter`
synchronously de-duplicates the passed attributes instead of delegating
it to the returned `MeterOption`. (#7266)
- `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/log`
synchronously de-duplicates the passed attributes instead of delegating
it to the returned `LoggerOption`. (#7266)
- Rename the `OTEL_GO_X_SELF_OBSERVABILITY` environment variable to
`OTEL_GO_X_OBSERVABILITY` in `go.opentelemetry.io/otel/sdk/trace`,
`go.opentelemetry.io/otel/sdk/log`, and
`go.opentelemetry.io/otel/exporters/stdout/stdouttrace`. (#7302)
- Improve performance of histogram `Record` in
`go.opentelemetry.io/otel/sdk/metric` when min and max are disabled
using `NoMinMax`. (#7306)
- Improve error handling for dropped data during translation by using
`prometheus.NewInvalidMetric` in
`go.opentelemetry.io/otel/exporters/prometheus`. ⚠️ **Breaking Change:**
Previously, these cases were only logged and scrapes succeeded. Now,
when translation would drop data (e.g., invalid label/value), the
exporter emits a `NewInvalidMetric`, and Prometheus scrapes **fail with
HTTP 500** by default. To preserve the prior behavior (scrapes succeed
while errors are logged), configure your Prometheus HTTP handler with:
`promhttp.HandlerOpts{ ErrorHandling: promhttp.ContinueOnError }`.
(#7363)
- Replace fnv hash with xxhash in `go.opentelemetry.io/otel/attribute`
for better performance. (#7371)
- The default `TranslationStrategy` in
`go.opentelemetry.io/exporters/prometheus` is changed from
`otlptranslator.NoUTF8EscapingWithSuffixes` to
`otlptranslator.UnderscoreEscapingWithSuffixes`. (#7421)
- Improve performance of concurrent measurements in
`go.opentelemetry.io/otel/sdk/metric`. (#7427)
- Include W3C TraceFlags (bits 0–7) in the OTLP `Span.Flags` field in
`go.opentelemetry.io/exporters/otlp/otlptrace/otlptracehttp` and
`go.opentelemetry.io/exporters/otlp/otlptrace/otlptracegrpc`. (#7438)
- The `ErrorType` function in `go.opentelemetry.io/otel/semconv/v1.37.0`
now handles custom error types.
If an error implements an `ErrorType() string` method, the return value
of that method will be used as the error type. (#7442)

### Fixed

- Fix `WithInstrumentationAttributes` options in
`go.opentelemetry.io/otel/trace`, `go.opentelemetry.io/otel/metric`, and
`go.opentelemetry.io/otel/log` to properly merge attributes when passed
multiple times instead of replacing them. Attributes with duplicate keys
will use the last value passed. (#7300)
- The equality of `attribute.Set` when using the `Equal` method is not
affected by the user overriding the empty set pointed to by
`attribute.EmptySet` in `go.opentelemetry.io/otel/attribute`. (#7357)
- Return partial OTLP export errors to the caller in
`go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#7372)
- Return partial OTLP export errors to the caller in
`go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7372)
- Return partial OTLP export errors to the caller in
`go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`.
(#7372)
- Return partial OTLP export errors to the caller in
`go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`.
(#7372)
- Return partial OTLP export errors to the caller in
`go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`.
(#7372)
- Return partial OTLP export errors to the caller in
`go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`.
(#7372)
- Fix `AddAttributes`, `SetAttributes`, `SetBody` on `Record` in
`go.opentelemetry.io/otel/sdk/log` to not mutate input. (#7403)
- Do not double record measurements of `RecordSet` methods in
`go.opentelemetry.io/otel/semconv/v1.37.0`. (#7655)
- Do not double record measurements of `RecordSet` methods in
`go.opentelemetry.io/otel/semconv/v1.36.0`. (#7656)

### Removed

- Drop support for [Go 1.23]. (#7274)
- Remove the `FilterProcessor` interface in
`go.opentelemetry.io/otel/sdk/log`. The `Enabled` method has been added
to the `Processor` interface instead. All `Processor` implementations
must now implement the `Enabled` method. Custom processors that do not
filter records can implement `Enabled` to return `true`. (#7639)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trace SDK observability - simple processor metrics

4 participants