Observability
Orisun exposes three operational signals: distributed traces, structured logs, and on-demand profiles.
Tracing
Orisun emits OpenTelemetry traces and exports them over OTLP gRPC. This is the primary signal for understanding request flow and latency across the save, query, and publish paths.
| Variable | Default | Description |
|---|---|---|
ORISUN_OTEL_ENABLED | true | Enable OpenTelemetry tracing. |
ORISUN_OTEL_ENDPOINT | localhost:4317 | OTLP gRPC collector endpoint. |
Point ORISUN_OTEL_ENDPOINT at any OTLP-compatible collector (the OpenTelemetry Collector, Tempo, Jaeger with OTLP, Honeycomb, etc.). Traces are tagged with the service name so multiple nodes are distinguishable.
Orisun currently exports traces, not a built-in metrics endpoint. For request rates and latencies, derive them from spans in your tracing backend, or scrape the Go runtime via pprof. There is no Prometheus /metrics endpoint to scrape.
Logging
Logs are structured and leveled. Set the level with ORISUN_LOGGING_LEVEL (DEBUG, INFO, WARN, ERROR; default INFO).
Operationally useful log lines include:
- effective
GOMAXPROCS,GOMEMLIMIT, andGOGCat startup, - publisher boundary-lock acquisition and contention in clustered deployments,
- publisher checkpoint progress and any publish errors.
Use DEBUG to trace authentication, subscription handover, and per-call detail; keep INFO or higher in production.
Profiling
The Go pprof server is available for CPU, heap, and goroutine profiling when you need to investigate a performance issue.
| Variable | Default | Description |
|---|---|---|
ORISUN_PPROF_ENABLED | false | Enable the pprof HTTP server. |
ORISUN_PPROF_PORT | 6060 | pprof listen port. |
go tool pprof http://localhost:6060/debug/pprof/heap
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
Leave pprof disabled in production unless you are actively profiling, and never expose its port publicly.
What to watch
- Publisher lag — the gap between committed and published positions. Investigate with the Troubleshooting steps.
- Catch-up vs live — subscribers repeatedly catching up suggest the JetStream retention window is too small for their pace. See Delivery Guarantees.
- Consistency conflicts — a high
ALREADY_EXISTSrate is a domain hotspot, not an error. Narrow the consistency context or add an index.