Analysis

Overview

Repose instruments its internal operations to allow users visibility into Repose's state and its environment.  Some example metrics are:

  • Information on the return codes coming from Repose and the endpoint services. (E.g., 200s, 300s, 400s & 500s).
  • Information on the processing time for Repose & the endpoint services to process client requests.
  • Customized filter information.  E.g., for the Translation filter, number of translated requests & responses by content type.

This data is exposed in two ways:

  • JMX - Java management extensions, a modular & dynamic method of managing & monitoring applications running on the JVM.
  • Graphite - A scalable real time graphing solution. See http://graphite.wikidot.com/ for more information on Graphite.

Limited remote management actions can also be activated through JMX.  This remote management includes:

  • For the DistDatastore, the local cache can be cleared.
  • The external Client Authentication service can be pinged to ensure that it's reachable.
  • The Rate Limiter filter can reset the rate limit for an individual ID or group. 

More information on these 3 mechanisms is below.

JMX

As mentioned above, JMX provides a modular & dynamic way to monitor & manage applications running on the JVM.  Below we will discuss how to leverage JMX to monitor Repose & to perform limited management operations.

Enabling remote JMX access

By default, any JMX client can access a JVM running on the same machine.  To enable access from a remote machine, add the following to your java command line:

-Dcom.sun.management.jmxremote.port=$JMX_PORT -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=true

  • JMX_PORT is the port through which JMX will accept connections.
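
As a rough illustration of what a remote client does with that port, the sketch below connects to a JVM started with the flags above and lists its MBean domains. It assumes the conventional RMI service URL form used by the JVM's built-in JMX agent; the class name, the helper serviceUrl and the host/port arguments are illustrative and not part of Repose.

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class RemoteJmxConnect {

    /** Builds the conventional RMI service URL for a JVM whose JMX agent listens on jmxPort. */
    static String serviceUrl(String host, int jmxPort) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi";
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: RemoteJmxConnect <host> <jmx-port>");
            return;
        }
        JMXServiceURL url = new JMXServiceURL(serviceUrl(args[0], Integer.parseInt(args[1])));
        // No credentials are passed because the flags above disable authentication.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // The Repose domains should appear here alongside the JVM's own (e.g. java.lang).
            for (String domain : connection.getDomains()) {
                System.out.println(domain);
            }
        }
    }
}
```

Run with the Repose host and the value of $JMX_PORT as arguments.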

Running a JMX Client

JConsole is a JMX client which ships with the JDK.  On OS X, it is installed under the Commands directory within your Java installation.  Documentation for JConsole can be found here:  http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html

Instrumentation

Instrumentation exposed through JMX is displayed within a tree structure.  Selecting an attribute & then its value will display a graph of that value over time.  For example, repose-node1-com.rackspace.papi/ResponseCode/Repose/5XX/Attributes/MeanRate is the rate of 5XX response codes being returned by Repose.
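
The same attributes can be read programmatically. The following is a minimal sketch, assuming only that the Repose MBeans live in a domain ending in com.rackspace.papi and that meters expose a MeanRate attribute, as in the path above. The class and method names are illustrative, and the exact object-name keys Repose uses are not assumed, which is why a domain pattern is queried instead of a single name.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class MeterReader {

    /** Reads one attribute from every MBean matching namePattern; MBeans without it are skipped. */
    static Map<String, Object> readAll(MBeanServerConnection connection,
                                       String namePattern, String attribute) {
        Map<String, Object> values = new TreeMap<>();
        try {
            for (ObjectName name : connection.queryNames(new ObjectName(namePattern), null)) {
                try {
                    values.put(name.getCanonicalName(), connection.getAttribute(name, attribute));
                } catch (JMException e) {
                    // This MBean does not expose the attribute; ignore it.
                }
            }
        } catch (JMException | IOException e) {
            throw new IllegalStateException("JMX query failed for " + namePattern, e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Uses the local platform MBean server; a remote MBeanServerConnection works the same way.
        // Outside a Repose JVM this simply prints nothing.
        MBeanServerConnection connection = ManagementFactory.getPlatformMBeanServer();
        readAll(connection, "*com.rackspace.papi:*", "MeanRate")
                .forEach((name, rate) -> System.out.println(name + " MeanRate=" + rate));
    }
}
```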

Remote Management

In addition to viewing data in the JVM, remote actions can be executed through the JMX client.  For example, when repose-node1-com.rackspace.papi.service.datastore.impl.ehcache/ReposeLocalCache/Operations/removeTokenAndRoles is selected, its 2 parameters are provided in the right panel & the operation is then executed by clicking the removeTokenAndRoles button.
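
An operation can also be invoked from code rather than from the JMX client UI. The sketch below is a generic helper for operations whose parameters are all Strings, such as removeTokenAndRoles(String tenantId, String token). The class and helper names are illustrative. Because the exact ObjectName of the ReposeLocalCache MBean is not given here, the demo in main invokes the JVM's own gc operation as a stand-in; for Repose, pass the object name shown by your JMX client.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class JmxOperations {

    /** JMX identifies operation overloads by parameter type names; here every parameter is a String. */
    static String[] stringSignature(int parameterCount) {
        String[] signature = new String[parameterCount];
        Arrays.fill(signature, String.class.getName());
        return signature;
    }

    /** Invokes an operation taking only String parameters and returns its result (null for void). */
    static Object invoke(MBeanServerConnection connection, String objectName,
                         String operation, String... parameters) {
        try {
            return connection.invoke(new ObjectName(objectName), operation,
                    parameters, stringSignature(parameters.length));
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Failed to invoke " + operation, e);
        }
    }

    public static void main(String[] args) {
        MBeanServerConnection connection = ManagementFactory.getPlatformMBeanServer();
        // Stand-in operation that exists in every JVM; it takes no parameters and returns nothing.
        Object result = invoke(connection, "java.lang:type=Memory", "gc");
        System.out.println("gc returned " + result);
        // For Repose, the call would take the shape:
        //   invoke(connection, <ReposeLocalCache object name>, "removeTokenAndRoles", tenantId, token);
    }
}
```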

Graphite

Graphite allows data to be sent to a server to be displayed or propagated to further applications.  Repose can expose any of its instrumentation through Graphite.  To do so, configure the Graphite server name & port, the period (in seconds) and the path prefix (for organizing the data within Graphite) in metrics.cfg.xml.

See Configuration for more information on the format of metrics.cfg.xml.
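
To make the role of the path prefix concrete, the sketch below formats one data point in Graphite's plaintext protocol ("path value timestamp", one per line, timestamp in epoch seconds) and sends it over TCP. It is an illustration of the wire format only: how Repose itself composes metric paths is not specified here, and the class name, metric name and arguments are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class GraphiteLine {

    /** Formats one data point; the prefix becomes the leading part of the dotted metric path. */
    static String format(String prefix, String metricName, double value, long epochSeconds) {
        return prefix + "." + metricName + " " + value + " " + epochSeconds + "\n";
    }

    /** Sends one already formatted line to a Graphite server's plaintext listener. */
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: GraphiteLine <host> <port> <prefix>");
            return;
        }
        long now = System.currentTimeMillis() / 1000L;
        // Hypothetical metric name and value, for illustration only.
        String line = format(args[2], "ResponseCode.Repose.5XX.MeanRate", 0.25, now);
        send(args[0], Integer.parseInt(args[1]), line);
    }
}
```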

Configuration

The Instrumentation & Remote Management features are configured through the metrics.cfg.xml file.  This file provides the following configuration mechanisms:

  • Configure metrics related options
  • Enumerate zero or more Graphite servers to which to send data.
  • Enumerate instrument-specific properties for an entire repose cluster or for a subset of clusters & nodes.

The elements & attributes of the configuration file are:

  • metrics - The root node containing metrics related attributes.
    • enabled - Whether metrics are reported. If set to true, metrics will be reported. If set to false, metrics will not be reported. Defaults to true.
  • graphite - Contains zero or more server nodes which point to graphite servers. (required)
    • server - A graphite server.
      • host - The Graphite server hostname. (required)
      • port - The port on which the Graphite server accepts data streams. (required)
      • period - The polling period in seconds for the Graphite server. (required)
      • prefix - The path prefix for organizing the data within the Graphite server. (required)
  • properties - (Not yet implemented) Contains zero or more instrument properties for this Repose deployment. (required)
    • property - An individual property value. (optional)
      • name - The name of the property. (required)
      • value - The value of the property. (required)
      • cluster - A space-delimited list of cluster names. (optional)
      • node - A space-delimited list of node names. (optional)

When specifying property values, the following rules apply:

  • All properties have a default value.
  • A property can have multiple property nodes associated with it; each property node provides a value to a subset of the clusters & nodes.
  • If neither a cluster list nor a node list is specified, all nodes & clusters get the value.
  • If a cluster list but no node list is specified, all nodes in the listed clusters get the value.
  • If a node list but no cluster list is specified, the nodes with the given names in all clusters get the value.  This allows deployments with only 1 cluster not to have to specify the cluster id.
  • If both a cluster list & a node list are specified, all nodes which fall under the intersection of the cluster & node lists get the value.
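
The four scoping rules reduce to one check: a missing list matches everything, and a property node applies only when both lists match. The sketch below illustrates that logic and is not Repose's implementation. It assumes both lists are whitespace-delimited, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class PropertyScope {

    /** Splits a whitespace-delimited list; null or blank means the list was not specified. */
    static Set<String> parseList(String list) {
        if (list == null || list.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return new HashSet<>(Arrays.asList(list.trim().split("\\s+")));
    }

    /** Returns true if a property node with the given lists supplies a value to cluster/node. */
    static boolean applies(String clusterList, String nodeList, String cluster, String node) {
        Set<String> clusters = parseList(clusterList);
        Set<String> nodes = parseList(nodeList);
        boolean clusterMatches = clusters.isEmpty() || clusters.contains(cluster);
        boolean nodeMatches = nodes.isEmpty() || nodes.contains(node);
        // Both must hold, which yields the intersection when both lists are given.
        return clusterMatches && nodeMatches;
    }

    public static void main(String[] args) {
        System.out.println(applies(null, null, "c1", "n1"));       // no lists: everything matches
        System.out.println(applies("c1 c2", null, "c2", "n9"));    // cluster list only
        System.out.println(applies(null, "n1", "c7", "n1"));       // node list only
        System.out.println(applies("c1", "n1 n2", "c1", "n3"));    // both lists, node not listed
    }
}
```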

Instrumentation

The following lists the instruments that are provided by Repose, grouped by JMX domain.  Items in italics are future features.

<cluster>-<node>-com.rackspace.papi

Each JMX name is listed with its attributes in the form "attribute - description".

  • ConfigurationService/<file path>
    • valid-update-time - String, timestamp of last successful config file load
    • last-update-attempt-time - String, timestamp of last attempted config file load, might have been unsuccessful
    • is-last-update-valid - Boolean, if last update was valid
    • configuration-xml - String, contents of configuration file
    • Sha1 - String, SHA1 of the contents of the configuration file.

  • ResponseCode/Repose
    • 2XX, 3XX, 4XX, 5XX - Meters (one per status code group), response codes returned from Repose, by status code group
  • ResponseCode/<endpoint id> & "All Endpoints"
    • 2XX, 3XX, 4XX, 5XX - Meters (one per status code group), response codes returned from origin, by status code group

  • ActiveRequests/Repose - Counter, currently active requests
  • ActiveRequests/Origin
    • <endpoint id> & all-endpoints - Counter, currently active requests to origin service
  • Ping/<endpoint id> - HealthCheck to origin service

  • ProcessTime/CompleteClientProcessing
    • time - Meter, tracking the time it takes for Repose to process a request from a client.
    • throughput - Meter, tracking the throughput of Repose requests from clients.
    • top-time - Array[String], the top # of method & URI pairs which take the longest.  Configure with sys-model property com.rackspace.papi.CompleteClientProcessing.TopTime
    • time-by-threshold - Array[String], the list of method & URI pairs which take at least # milliseconds.  Configure with sys-model property com.rackspace.papi.CompleteClientProcessing.TimeThreshold
  • ProcessTime/ReposeProcessingRequest
    • time - Meter, tracking the time it takes for Repose to process a request from a client.
    • throughput - Meter, tracking the throughput of Repose requests from clients.
    • top-time - Array[String], the top # of method & URI pairs which take the longest.  Configure with sys-model property com.rackspace.papi.ReposeProcessingRequest.TopTime
    • time-by-threshold - Array[String], the list of method & URI pairs which take at least # milliseconds.  Configure with sys-model property com.rackspace.papi.ReposeProcessingRequest.TimeThreshold
  • ProcessTime/ReposeProcessingResponse
    • time - Meter, tracking the time it takes for Repose to process a response from the origin.
    • throughput - Meter, tracking the throughput of Repose processing responses from the origin.
    • top-time - Array[String], the top # of method & URI pairs which take the longest.  Configure with sys-model property com.rackspace.papi.ReposeProcessingResponse.TopTime
    • time-by-threshold - Array[String], the list of method & URI pairs which take at least # milliseconds.  Configure with sys-model property com.rackspace.papi.ReposeProcessingResponse.TimeThreshold
  • ProcessTime/<endpoint>ProcessingRequest & "All Endpoints ProcessingRequest"
    • time - Meter, tracking the time it takes for the origin to process a request from Repose.
    • throughput - Meter, tracking the throughput of the origin processing requests from Repose.
    • top-time - Array[String], the top # of method & URI pairs which take the longest.  Configure with sys-model property com.rackspace.papi.OriginProcessingRequest.TopTime
    • time-by-threshold - Array[String], the list of method & URI pairs which take at least # milliseconds.  Configure with sys-model property com.rackspace.papi.OriginProcessingRequest.TimeThreshold

  • RequestTimeout/TimeoutToOrigin (Repose 2.8.1)
    • <endpoint> & all-endpoints - Meter, timeout rate from Repose to origin service
  • FilterContextList
    • filter-info - Table of filters by id containing
      • Name
      • Array[String] - paths to filter config files subscribed by each service.
      • String, filter URI-Regex (or .* if none listed)

<cluster>-<node>-com.rackspace.papi.components

  • ApiValidator/<filter id or name-number in sys-model> (Repose 2.8.1)
    • ACROSS ALL - Meter, all invalid requests
    • <role> - Meter, invalid requests by role
  • ClientAuth/<filter id or name-number in sys-model>
    • top-id-fails - Array[String], the top # of IDs which fail auth.  Configure with sys-model property com.rackspace.papi.components.clientauth.common.TopFailCapacity
    • fails-by-threshold - Array[String], the list of IDs which fail at least # times.  Configure with sys-model property com.rackspace.papi.components.clientauth.common.FailThreshold
    • fail - Meter, fails from auth service
    • Calls - Meter, calls to auth service
    • white-list - Meter, white-list requests
  • ClientAuthorization/<filter id or name-number in sys-model>
    • top-id-blocked - Array[String], the top # of IDs which are blocked.  Configure with sys-model property com.rackspace.papi.components.authz.TopBlockedCapacity
    • blocked-by-threshold - Array[String], the list of IDs which are blocked at least # times.  Configure with sys-model property com.rackspace.papi.components.authz.BlockedThreshold
    • blocked - Meter, requests blocked by auth service
    • Calls - Meter, calls to auth service
    • Ping - HealthCheck, verify service is up
  • DestinationRouter/<filter id or name-number in sys-model> (Repose 2.8.1)
    • ACROSS ALL - Meter, all routed responses
    • <target> - Meter, routed response by target

  • com.rackspace.papi.service.datastore.impl.ehcache.ReposeLocalCache
    • void removeAllCacheData(); - Clears the entire cache.
    • boolean removeGroups(String tenantId, String token); - Removes the user's group info cached by the Client Auth filter.
    • boolean removeLimits(String userId); - Removes all limits for a user.  The user ID depends on how the deployment identifies whom to rate-limit: when rate-limiting with ip-identity it is the requesting IP; when rate-limiting using auth it is the user's tenant ID.
    • boolean removeTokenAndRoles(String tenantId, String token); - Removes the user information cached from the Client Auth filter's validate token call.

  • HeaderNormalization/<filter id or name-number in sys-model> (Repose 2.8.2)
    • ACROSS ALL - Meter, all normalizations
    • <uri-regex-method> - Meter, normalizations by uri-regex & method.
  • RateLimiting/<filter id or name-number in sys-model>
    • ACROSS ALL - Meter, all rate limited requests
    • <group> - Table of URI & Meter of rate limited requests.
    • reset-limit-id( id ) - Admin function, reset rate-limit by ID
    • reset-limit-group( group ) - Reset rate-limit by group.
    • rate-limit-object-size( id ) - Get number of items in Datastore for given id.
  • ResponseMessaging/<filter id or name-number in sys-model>
    • ACROSS ALL - Meter, all generated messages
    • <code-media-type> - Meter, generated messages by code & media-type
    • time - Meter, time for response generation
    • time-<code-media-type> - Meter, time for response generation by code-media-type
  • TranslationRequest/<filter id or name-number in sys-model>
    • ACROSS ALL - Meter, all request translations
    • <content-type-accept> - Meter, request translation by content-type & accept type
    • time - Meter, time for translation generation
    • time-<code-media-type> - Meter, time for translation generation by code-media-type
  • TranslationResponse/<filter id or name-number in sys-model>
    • ACROSS ALL - Meter, all response translations
    • <code-content-type-accept> - Meter, response translation by status code, content type & accept type
    • time - Meter, time for translation generation
    • time-<code-content-type-accept> - Meter, time for translation generation by code-media-type
  • UriNormalization/<filter id or name-number in sys-model> (Repose 2.8.2)
    • ACROSS ALL - Meter, all normalizations
    • <uri-regex-method> or <extension> - Meter, URI normalizations by uri-regex & method or extension
  • Versioning/<filter id or name-number in sys-model> (Repose 2.8.2)
    • <id> - Meter, versioning by versioning id

  • Compression/<filter id or name-number in sys-model>
    • Meter, all content-encodings successfully decompressed (requests)
    • Meter, all content-encodings unsuccessfully decompressed (requests)
    • Meter, all accept-encodings successfully compressed (response)
    • Meter, all accept-encodings unsuccessfully compressed (response)
    • Meter, user agents served
  • Log4j/<logger> - yammer metrics log4j

java.lang.management

  • MemoryMXBean
    • getHeapMemoryUsage
    • getNonHeapMemoryUsage
  • RuntimeMXBean
    • getStartTime
  • GarbageCollectorMXBean
    • getCollectionCount
    • getCollectionTime
  • OperatingSystemMXBean
    • getSystemLoadAverage
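
These are the JVM's standard platform MXBeans, so the values can be inspected in any Java process. The sketch below reads each of the getters listed above in-process through ManagementFactory; the class name and the map keys are illustrative. Over a JMX connection the same beans appear under the java.lang domain.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmStats {

    /** Collects the values of the platform MXBean getters into an ordered map. */
    static Map<String, Object> snapshot() {
        Map<String, Object> stats = new LinkedHashMap<>();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        stats.put("heapUsedBytes", memory.getHeapMemoryUsage().getUsed());
        stats.put("nonHeapUsedBytes", memory.getNonHeapMemoryUsage().getUsed());
        stats.put("startTimeMillis", ManagementFactory.getRuntimeMXBean().getStartTime());
        // There is one GarbageCollectorMXBean per collector in the running JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
            stats.put("gc." + gc.getName() + ".timeMillis", gc.getCollectionTime());
        }
        // Negative when the platform cannot report a load average.
        stats.put("systemLoadAverage",
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return stats;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```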

 

Remote Management

The following lists the remote management features which are provided by Repose, grouped by JMX domain.  Items in italics are future features.

<cluster>-<node>-com.rackspace.papi

Each JMX name is listed with its operations in the form "operation - description".

  • Ping/<endpoint id> - HealthCheck to origin service

<cluster>-<node>-com.rackspace.papi.components

  • ClientAuthorization/<filter id or name-number in sys-model>
    • Ping - HealthCheck, verify service is up
  • com.rackspace.papi.service.datastore.impl.ehcache.ReposeLocalCache
    • void removeAllCacheData(); - Clears the entire cache.
    • boolean removeGroups(String tenantId, String token); - Removes the user's group info cached by the Client Auth filter.
    • boolean removeLimits(String userId); - Removes all limits for a user.  The user ID depends on how the deployment identifies whom to rate-limit: when rate-limiting with ip-identity it is the requesting IP; when rate-limiting using auth it is the user's tenant ID.
    • boolean removeTokenAndRoles(String tenantId, String token); - Removes the user information cached from the Client Auth filter's validate token call.
  • RateLimiting/<filter id or name-number in sys-model>
    • reset-limit-id( id ) - Admin function, reset rate-limit by ID
    • reset-limit-group( group ) - Reset rate-limit by group.