Details

    • Type: Story
    • Status: Resolved
    • Priority: Medium
    • Resolution: Done
    • Affects versions: None
    • Fix versions: 8.8.3.0
    • Components: None
    • Labels: None
    • Epic Link:
    • Sprint: Sprint 163, Sprint 164
    • Story Points: 8
    • Capitalizable: True

      Description

      Based on the details in the OpenTracing PR that Dimitry submitted, we should add support for OpenTracing in Repose.

      Dimitry has provided a great prototype for this feature, but we will update and/or rewrite it to align with our intentions for the service and our best practices. We want to support OpenTracing as a fundamental part of Repose (like logging, or in the future, the metrics service). That is, OpenTracing data will always be collected, but only reported as configured.

      We will likely have to write a new GlobalTracer implementation to allow the underlying Tracer to be changed at runtime. Let's call it a ReposeTracer rather than a GlobalTracer. The ReposeTracer should be wrapped by the GlobalTracer so that developers can still leverage the GlobalTracer. The GlobalTracer should also be made a Spring bean so that it can be injected. We will also likely have to write a new Interceptor so that we can use our configured header name rather than the header name used by Uber. The header name will be configured in the system model (since the service is not required, but we will always trace).
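The runtime-swappable tracer described above could look roughly like the following. This is a minimal, language-neutral sketch in Python rather than the Repose implementation: `NoopTracer`, `NamedTracer`, `register`, and `start_span` are hypothetical names chosen for illustration, and spans are plain dictionaries instead of real OpenTracing spans.

```python
import threading


class NoopTracer:
    """Stand-in tracer used before any configuration is loaded (hypothetical)."""

    def start_span(self, operation_name):
        return {"operation": operation_name, "reported_by": "noop"}


class NamedTracer:
    """Stand-in for a configured tracer such as a Jaeger client (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def start_span(self, operation_name):
        return {"operation": operation_name, "reported_by": self.name}


class ReposeTracer:
    """Delegating tracer whose underlying tracer can be replaced at runtime."""

    def __init__(self):
        self._lock = threading.Lock()
        self._delegate = NoopTracer()

    def register(self, tracer):
        # Called when the configuration changes; the swap happens under a lock
        # so concurrent requests see either the old or the new tracer.
        with self._lock:
            self._delegate = tracer

    def start_span(self, operation_name):
        # Read the current delegate under the lock, then use it outside the lock.
        with self._lock:
            delegate = self._delegate
        return delegate.start_span(operation_name)


tracer = ReposeTracer()
before = tracer.start_span("handle-request")   # served by the no-op tracer
tracer.register(NamedTracer("jaeger"))         # configuration update arrives
after = tracer.start_span("handle-request")    # served by the new tracer
```

Because callers only ever hold a reference to the wrapper, a global accessor or an injected bean can hand out the same object for the life of the process while the delegate changes underneath it.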

      The header name used for tracking Spans will be configurable. We will default it to whatever the current "standard" header name is. The "standard" has not yet been defined. The header should be on every outbound request from Repose. This can be done by using the Interceptor mentioned above.
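As an illustration of adding the tracing header to every outbound request, here is a small sketch. The default header name `x-example-trace-id` is a placeholder (the description notes that the "standard" name is not yet defined), and the `trace_id:span_id` value format and the `inject_trace_header` function are assumptions made for this example, not the format any particular tracer uses.

```python
# Placeholder default; the real default header name is not defined in this story.
DEFAULT_TRACING_HEADER = "x-example-trace-id"


def inject_trace_header(headers, span_context, header_name=DEFAULT_TRACING_HEADER):
    """Return a copy of the outbound headers with the span context added.

    The "trace_id:span_id" wire format is an assumption for this sketch.
    """
    outbound = dict(headers)  # copy so the caller's headers are not mutated
    outbound[header_name] = "{}:{}".format(
        span_context["trace_id"], span_context["span_id"]
    )
    return outbound


context = {"trace_id": "abc123", "span_id": "def456"}
default_named = inject_trace_header({"Accept": "application/json"}, context)
custom_named = inject_trace_header({}, context, header_name="x-my-trace")
```

An interceptor on the outbound HTTP client would call something like this once per request, reading `header_name` from configuration.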

      Remember, the distributed datastore handles its own traffic and will need to support tracing! The Atom Feed service will also need to start its own spans.

      Background info:
      Old Wiki: Tracing
      Internal Wiki: Application Tracing
      Slack channel: #application-tracing

      Test Acceptance Criteria:

      • When Repose starts up, the user should specify a tracer implementation (e.g. Jaeger, Lightstep, Zipkin, Datadog, etc.) and the tracer-specific configuration necessary to send trace information to the collector
        • For this release, Repose should support Jaeger configuration. The specific test case for this scenario would be that the user starts up Repose with Jaeger configuration to send span data to a Jaeger agent/collector
      • If a request from an upstream service has a trace set, Repose should set the span passed from that trace as the parent span and log a child span in its place.
      • If a request from an upstream service does not have a trace set, Repose should set its span as a root span.
      • All outbound requests from Repose must have trace header information embedded in their requests
      • Repose must support ability to sample requests, including:
        • On/Off (constant sampling)
          • When sampling type is set to const and value set to 0, none of the spans are reported to the collector
          • When sampling type is set to const and value set to 1, all of the spans are reported to the collector
        • Probabilistic sampling
          • When sampling type is set to probabilistic and value set to 0.001, all of the spans are reported to the collector; however, the collector only samples 1 of 1000 traces (VALIDATING COLLECTOR LOGIC IS OUT OF SCOPE)
        • Rate limited sampling
          • When sampling type is set to rate-limited and value set to 1.0, all of the spans are reported to the collector; however, the collector only samples 1 trace per second (VALIDATING COLLECTOR LOGIC IS OUT OF SCOPE)
      • Repose should have a way to dynamically stop sampling
        • User starts Repose and starts sending requests. User updates configuration to turn off tracing. Repose no longer sends spans to the collector.
      • Repose requests must not break if the tracer collector/agent is unavailable
        • User starts Repose and starts sending requests. User kills tracer agent/collector. Repose keeps working as-is (optional log message that no spans are reported due to issue).
      • Repose requests must be able to handle failure scenarios from outbound calls (timeouts, 4xx/5xx response codes)
        • User starts Repose and starts sending requests.
          • Origin sends back a 400 response code. Span is reported with 400 response code.
          • Origin sends back a 500 response code. Span is reported with 500 response code.
          • Origin doesn't send anything back. After 30 seconds, span is reported with 503 response code.
          • Origin sends back a connection reset. Span is reported with 503 response code.
          • Origin sends back a connection refused. Span is reported with 503 response code.
          • Keystone sends back a 400 response code. Span is reported with 400 response code.
          • Keystone sends back a 500 response code. Span is reported with 500 response code.
          • Keystone doesn't send anything back. After 30 seconds, span is reported with 503 response code.
          • Keystone sends back a connection reset. Span is reported with 503 response code.
          • Keystone sends back a connection refused. Span is reported with 503 response code.
      • Repose should support logging of spans
        • User sets the log level to debug or info (exact level still to be decided). Log message shows the span ID reported for every request.
      • Repose should provide the ability to correlate X-Trans-Id information with a span (this lets systems that use X-Trans-Id for logging easily correlate a trace with a specific set of logs)
        • User sends request. Span is reported with a tag with x-request-id set to x-trans-id request GUID.
      • Integration test coverage should not go down with new code
      • Unit test coverage should not go down with new code
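To make the three sampling types in the criteria above concrete, the sketch below shows what each decision rule means: const (0 or 1), probabilistic (a rate such as 0.001), and rate-limited (traces per second). It says nothing about which component applies the decision, and the function names, the token-bucket approach, and the injectable `clock` parameter are assumptions for illustration rather than any tracer's actual API.

```python
import random


def const_sampler(value):
    """value 1 samples every trace; value 0 samples none."""
    return lambda: value == 1


def probabilistic_sampler(rate, rng=None):
    """Sample each trace with probability `rate` (0.001 is about 1 in 1000)."""
    rng = rng or random.Random()
    return lambda: rng.random() < rate


def rate_limited_sampler(traces_per_second, clock):
    """Token bucket that samples roughly `traces_per_second` traces each second.

    `clock` is any zero-argument function returning seconds, for example
    time.monotonic; passing it in keeps the sampler easy to test.
    """
    max_tokens = max(1.0, traces_per_second)
    state = {"tokens": max_tokens, "last": clock()}

    def decide():
        now = clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        refill = (now - state["last"]) * traces_per_second
        state["tokens"] = min(max_tokens, state["tokens"] + refill)
        state["last"] = now
        if state["tokens"] >= 1.0:
            state["tokens"] -= 1.0
            return True
        return False

    return decide
```

With a value of 1.0, the rate-limited sampler accepts one trace, rejects further traces until a second has passed, and then accepts one more, which matches the "1 trace per second" wording in the criteria.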

      Acceptance Criteria:

      • We will make the header configurable, and default to whatever the "standard" header is.
      • The header should be added to every request out of Repose.
      • Repose can collect and report OpenTracing data.
        • Repose should interoperate with other services that are using OpenTracing.
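Interoperating with other traced services comes down to continuing an incoming trace when one is present and starting a new one otherwise, as the test acceptance criteria describe. The following sketch shows that decision. The header name `x-example-trace-id`, the `trace_id:span_id` value format, and the `start_server_span` function are all assumptions for illustration; a real tracer's extract step would replace the manual parsing.

```python
import uuid


def start_server_span(inbound_headers, header_name="x-example-trace-id"):
    """Start a child span if the request carries a trace, else a root span.

    Assumes the header value has the form "trace_id:span_id".
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lookup = {name.lower(): value for name, value in inbound_headers.items()}
    raw = lookup.get(header_name.lower())
    span_id = uuid.uuid4().hex[:16]

    if raw and raw.count(":") == 1:
        trace_id, parent_id = raw.split(":")
        if trace_id and parent_id:
            # Continue the caller's trace: its span becomes our parent.
            return {"trace_id": trace_id, "span_id": span_id, "parent_id": parent_id}

    # No usable trace context: this span is the root of a new trace.
    return {"trace_id": uuid.uuid4().hex, "span_id": span_id, "parent_id": None}


child = start_server_span({"X-Example-Trace-Id": "abc123:def456"})
root = start_server_span({"Accept": "application/json"})
```

Treating a malformed header the same as a missing one keeps a bad value from an upstream service from failing the request.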

              People

              • Assignee:
                adrian.george Adrian George
                Reporter:
                mario.lopez Mario Lopez