Observability

Orisun exposes three operational signals: distributed traces, structured logs, and on-demand profiles.

Tracing

Orisun emits OpenTelemetry traces and exports them over OTLP gRPC. This is the primary signal for understanding request flow and latency across the save, query, and publish paths.

Variable               Default          Description
ORISUN_OTEL_ENABLED    true             Enable OpenTelemetry tracing.
ORISUN_OTEL_ENDPOINT   localhost:4317   OTLP gRPC collector endpoint.

Point ORISUN_OTEL_ENDPOINT at any OTLP-compatible collector (the OpenTelemetry Collector, Tempo, Jaeger with OTLP, Honeycomb, etc.). Traces are tagged with the service name so multiple nodes are distinguishable.
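As a minimal sketch of the configuration described above, the two variables can be set in the environment that starts Orisun. The collector hostname below is a placeholder chosen for illustration, not a value from this documentation; substitute the address of your own collector.

```shell
# Hostname is a placeholder; 4317 is the conventional OTLP gRPC port.
export ORISUN_OTEL_ENABLED=true
export ORISUN_OTEL_ENDPOINT=otel-collector.internal:4317
```

Start Orisun from the same environment so it picks up both values.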

Note: Orisun currently exports traces only and has no built-in metrics endpoint, so there is no Prometheus /metrics endpoint to scrape. To get request rates and latencies, derive them from spans in your tracing backend. For Go runtime behaviour (CPU, heap, goroutines), use pprof.
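Most tracing backends can compute rates and percentiles from spans directly, and that is usually the better option. Where that is not available, the arithmetic can be sketched by hand. The example below assumes you have already exported span durations in milliseconds, one per line, for a fixed time window; the helper names (p95, rate_per_sec, sample) and the sample values are hypothetical and not part of Orisun.

```shell
# Nearest-rank 95th percentile of numbers read from stdin, one per line.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # integer form of ceil(0.95 * NR)
      print v[rank]
    }'
}

# Requests per second: span count divided by the window length in seconds.
rate_per_sec() {
  awk -v window="$1" 'END { printf "%.2f\n", NR / window }'
}

# Ten hypothetical span durations (ms) observed over a 5-second window.
sample() {
  printf '%s\n' 12 15 11 40 13 14 250 16 12 18
}

sample | p95             # prints 250
sample | rate_per_sec 5  # prints 2.00
```

With so few samples the p95 is simply the slowest span; percentiles become meaningful only over larger windows.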

Logging

Logs are structured and leveled. Set the level with ORISUN_LOGGING_LEVEL (DEBUG, INFO, WARN, ERROR; default INFO).

Operationally useful log lines include:

  • effective GOMAXPROCS, GOMEMLIMIT, and GOGC at startup,
  • publisher boundary-lock acquisition and contention in clustered deployments,
  • publisher checkpoint progress and any publish errors.

Use DEBUG to trace authentication, subscription handover, and per-call detail; keep INFO or higher in production.
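When raising the level temporarily, a small guard in the deployment script can catch typos before the process starts. This is a sketch only: set_log_level is a hypothetical helper, and it makes no assumption about how Orisun itself treats an unrecognised value.

```shell
# Accept only the four documented levels; reject anything else.
set_log_level() {
  case "$1" in
    DEBUG|INFO|WARN|ERROR)
      export ORISUN_LOGGING_LEVEL="$1"
      ;;
    *)
      echo "unsupported log level: $1" >&2
      return 1
      ;;
  esac
}

# Temporary setting while investigating; switch back to INFO afterwards.
set_log_level DEBUG
```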

Profiling

The Go pprof server is available for CPU, heap, and goroutine profiling when you need to investigate a performance issue.

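Variable               Default   Description
ORISUN_PPROF_ENABLED   false     Enable the pprof HTTP server.
ORISUN_PPROF_PORT      6060      pprof listen port.

With pprof enabled, fetch a heap profile or a 30-second CPU profile. Quote the second URL so the shell does not treat ? as a glob character:

  go tool pprof http://localhost:6060/debug/pprof/heap
  go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"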

Leave pprof disabled in production unless you are actively profiling, and never expose its port publicly.
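One way to profile a remote node without opening the pprof port is to reach it through an SSH tunnel. The sketch below builds the profile URLs from the documented port default. PPROF_HOST, pprof_url, and the SSH user and host are hypothetical names used for illustration, and the goroutine path is assumed to follow the same standard Go pprof layout as the heap and profile paths.

```shell
# Build a pprof URL for one profile. The port default mirrors ORISUN_PPROF_PORT;
# PPROF_HOST is a hypothetical helper variable, not an Orisun setting.
pprof_url() {
  printf 'http://%s:%s/debug/pprof/%s\n' \
    "${PPROF_HOST:-localhost}" "${ORISUN_PPROF_PORT:-6060}" "$1"
}

# Usage against a node started with ORISUN_PPROF_ENABLED=true.
# Tunnel first so the port stays private (user and host are placeholders):
#   ssh -N -L 6060:localhost:6060 user@orisun-node &
#   go tool pprof "$(pprof_url heap)"
#   go tool pprof "$(pprof_url 'profile?seconds=30')"
#   go tool pprof "$(pprof_url goroutine)"
```

Close the tunnel and disable pprof again once the investigation is finished.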

What to watch

  • Publisher lag: the gap between committed and published positions. Investigate with the Troubleshooting steps.
  • Catch-up vs live: subscribers that repeatedly have to catch up suggest the JetStream retention window is too small for their pace. See Delivery Guarantees.
  • Consistency conflicts: a high ALREADY_EXISTS rate indicates a domain hotspot, not an error. Narrow the consistency context or add an index.
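Publisher lag, as defined in the list above, is a simple difference between two positions. The sketch below shows the arithmetic and a threshold check. The positions and the threshold of 1000 are hypothetical values; how you obtain the committed and published positions depends on your deployment and is not specified here.

```shell
# Lag = committed position minus published position.
publisher_lag() {
  committed="$1"
  published="$2"
  echo $(( committed - published ))
}

# Hypothetical positions, for illustration only.
lag=$(publisher_lag 10500 10320)
echo "publisher lag: $lag positions"   # prints "publisher lag: 180 positions"

# Flag the lag only when it exceeds a threshold you choose.
if [ "$lag" -gt 1000 ]; then
  echo "publisher lag above threshold" >&2
fi
```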