freno serves requests via HTTP. Requests and responses are short enough that HTTP does not incur substantial overhead. freno listens on the port set by "ListenPort" in the configuration.
Client/automated requests should use HEAD requests, and manual/human requests may use GET requests. Both variations return the same HTTP status codes.
The check request is the one important question freno must answer: "may this app write to this datastore?"
For example, in /check/archive/mysql/main1 the archive app wishes to write to the main1 MySQL cluster.
freno answers by choosing an appropriate HTTP status code, as follows:
- 200 (OK): Application may write to data store.
- 404 (Not Found): Unknown metric name.
- 417 (Expectation Failed): Requesting application is explicitly forbidden to write.
- 429 (Too Many Requests): Do not write. A normal state indicating the store's state does not meet expected threshold.
- 500 (Internal Server Error): Internal error. Do not write.
Notes:
- Clients should only proceed to write on status code 200.
- 404 (Not Found) can be seen when metric name is incorrect, undefined, or if the server is not the leader or was just promoted and didn't get the chance to collect data yet.
- 417 (Expectation Failed) results from a user/admin telling freno to reject requests from certain apps.
- 429 (Too Many Requests) is just a normal "do not write" response, and is a frequent response if the store is busy.
- 500 (Internal Server Error) can happen if the node just started, or otherwise freno met an unexpected error. Try a GET (more informative) request or search the logs.
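The status-code rules above can be sketched as a small client. This is a minimal illustration using only the Python standard library, not an official freno client; the names CHECK_STATUS_MEANING, may_write and check_status are hypothetical.

```python
import urllib.error
import urllib.request

# Status codes documented above and what they mean for a client.
CHECK_STATUS_MEANING = {
    200: "OK: application may write",
    404: "Not Found: unknown metric name",
    417: "Expectation Failed: app is explicitly forbidden to write",
    429: "Too Many Requests: store does not meet its threshold, do not write",
    500: "Internal Server Error: do not write",
}


def may_write(status_code: int) -> bool:
    """Only 200 permits a write; every other code means hold off."""
    return status_code == 200


def check_status(base_url: str, app: str, store_type: str, store_name: str,
                 timeout: float = 1.0) -> int:
    """Issue a HEAD /check request and return the HTTP status code."""
    url = f"{base_url}/check/{app}/{store_type}/{store_name}"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the answer.
        return err.code
```

A caller might write `may_write(check_status("http://my.freno.service:9777", "archive", "mysql", "main1"))`. Connection failures surface as `urllib.error.URLError`; treating those as "do not write" is a reasonable choice for the caller, though that policy is an assumption of this sketch.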
freno supports the following:
- /check/<app>/<store-type>/<store-name>: the most important request: may app write to a backend store?
  - <app> can be any name, does not need to be pre-defined
  - mysql is the only supported <store-type> at this time
  - <store-name> must be defined in the configuration file
  - Example: /check/archive/mysql/main1
- /throttle-app/<app-name>/ttl/<ttlMinutes>/ratio/<ratio>: refuse partial/complete access to an app for a limited amount of time. Examples:
  - /throttle-app/archive/ttl/30/ratio/1: completely refuse /check/archive/* requests for a duration of 30 minutes
  - /throttle-app/archive/ttl/30/ratio/0.9: mostly refuse /check/archive/* requests for a duration of 30 minutes. On average (random dice roll), 9 out of 10 requests (i.e. 90%) will be denied, and one approved.
  - /throttle-app/archive/ttl/30/ratio/0.5: refuse 50% of /check/archive/* requests for a duration of 30 minutes
- /throttle-app/<app-name>/ttl/<ttlMinutes>:
  - If app is already throttled, modify TTL portion only, without changing the ratio.
  - If app is not already throttled, fully throttle for a duration of 1 hour (ratio is implicitly 1).
- /throttle-app/<app-name>/ratio/<ratio>:
  - If app is already throttled, modify ratio portion only, without changing the TTL.
  - If app is not already throttled, throttle with given ratio, for a duration of 1 hour.
- /throttle-app/<app-name>: refuse access to an app for 1 hour.
  Same as calling /throttle-app/<app-name>/ttl/60/ratio/1. Provided as convenience endpoint.
- /throttle-app can take a query parameter store_name to throttle the app only on one store (i.e. MySQL cluster). For example /throttle-app/archive?store_name=mycluster refuses /check/archive/mysql/mycluster requests for 1 hour.
- /unthrottle-app/archive will re-allow the archive app to get valid response from /check/archive/* requests.
  Throttling will of course still consider cluster status, which is never overridden.
- /throttled-apps: list currently throttled apps.
- /recent-apps/<lastMinutes>: list app/host pairs that have issued /check requests to freno in the past given minutes. Example:
  - /recent-apps/30 shows which apps from which hosts have issued check requests in the past 30 minutes
- /recent-apps: no time limit; freno keeps up to 24h of check requests.
- /lb-check: returns HTTP 200. Indicates the node is alive.
- /leader-check: returns HTTP 200 when the node is the raft leader, or 404 otherwise.
- /hostname: node host name
- /check-read/<app>/<store-type>/<store-name>/<threshold>: a specialized check to see whether current value is lower than given threshold.

  As an example, consider /check-read/archive/mysql/main1/2.5. This checks whether the current mysql/main1 store's value is smaller than or equal to 2.5. The store's configured threshold value is ignored and not tested in this check.

  This read-check should not be used to approve writes. Writes should only be approved by using the /check request.

  However, this check is known to be useful in at least one common scenario: monitoring a MySQL cluster based on replication lag. In such a case, we may have write requests followed by read requests, and we may happen to know the elapsed time between write and read. As an example, say 2.5s have passed between the write and the read. The check /check-read/archive/mysql/main1/2.5 confirms or denies that relevant replicas are up-to-date for the 2.5s elapsed time. We can therefore read from the replicas and safely expect to find the data we wrote 2.5s ago on the master.
- /check-if-exists/<app>/<store-type>/<store-name>: like /check, but if the metric is unknown (e.g. <store-name> not in freno's configuration), return 200 OK. This is useful for hybrid systems where some metrics need to be strictly controlled, and some not. freno would probe the important stores, and still can serve requests for all stores.
- /check-read-if-exists/<app>/<store-type>/<store-name>/<threshold>: like /check-read, but if the metric is unknown (e.g. <store-name> not in freno's configuration), return 200 OK. This is useful for hybrid systems where some metrics need to be strictly controlled, and some not. freno would probe the important stores, and still can serve requests for all stores.
- /skip-host/<hostname>/ttl/<ttl-minutes>: skip host when aggregating metrics for specified number of minutes. If host is already skipped, update the TTL.
- /skip-host/<hostname>: same as /skip-host/<hostname>/ttl/60
- /recover-host/<hostname>: recover a previously skipped host.
- /skipped-hosts: list currently skipped hosts
- /help: show all supported request paths
- /config/memcache: show the memcache configuration used, so freno clients can use it to implement more efficient read strategies.
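The request paths listed above are plain strings, so an admin script or client can assemble them mechanically. The following is a minimal Python sketch under stated assumptions: the helper names throttle_app_path and check_read_path are hypothetical and not part of freno, and combining store_name with the ttl/ratio forms is an assumption (the text only shows it on the bare /throttle-app/<app-name> form).

```python
from typing import Optional
from urllib.parse import quote, urlencode


def throttle_app_path(app: str, ttl_minutes: Optional[int] = None,
                      ratio: Optional[float] = None,
                      store_name: Optional[str] = None) -> str:
    """Build a /throttle-app path from the optional ttl, ratio and store_name."""
    path = f"/throttle-app/{quote(app, safe='')}"
    if ttl_minutes is not None:
        path += f"/ttl/{ttl_minutes}"
    if ratio is not None:
        # ':g' drops a trailing '.0', so a ratio of 1.0 is rendered as '1'.
        path += f"/ratio/{ratio:g}"
    if store_name is not None:
        # Assumption: the query parameter may follow any /throttle-app form.
        path += "?" + urlencode({"store_name": store_name})
    return path


def check_read_path(app: str, store_type: str, store_name: str,
                    elapsed_seconds: float) -> str:
    """Build a /check-read path whose threshold is the time elapsed since the write."""
    return (f"/check-read/{quote(app, safe='')}/{store_type}/"
            f"{quote(store_name, safe='')}/{elapsed_seconds:g}")


print(throttle_app_path("archive", ttl_minutes=30, ratio=0.9))
print(throttle_app_path("archive", store_name="mycluster"))
print(check_read_path("archive", "mysql", "main1", 2.5))
```

In the read-after-write scenario described above, a client would pass the time elapsed since its write as elapsed_seconds and read from replicas only if the resulting request returns 200.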
GET and HEAD requests respond with the same status codes, but GET requests also compute and return additional data. Automated requests should not be interested in this data; the status code is what should guide the clients. Humans issuing manual requests, however, may benefit from the extra information supplied by the GET request.
For example, a GET request for http://my.freno.service:9777/check/archive/mysql/main1 may yield:

{
  "StatusCode": 200,
  "Message": "",
  "Value": 0.430933,
  "Threshold": 1
}

Extra info such as the threshold or actual replication lag value is irrelevant for automated requests, which should just know whether they're allowed to proceed or not. For humans this is beneficial input.
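For a human-facing tool that does want the extra data, the GET body can be decoded with the standard json module. A minimal sketch, assuming the body always carries the four fields shown in the example above; the CheckResult class and parse_check_body function are hypothetical names, not part of freno.

```python
import json
from dataclasses import dataclass


@dataclass
class CheckResult:
    status_code: int
    message: str
    value: float
    threshold: float


def parse_check_body(body: str) -> CheckResult:
    """Decode the JSON body of a GET /check response into a CheckResult."""
    data = json.loads(body)
    return CheckResult(
        status_code=data["StatusCode"],
        message=data.get("Message", ""),
        value=data["Value"],
        threshold=data["Threshold"],
    )


# The sample body from the example above.
sample = """
{
  "StatusCode": 200,
  "Message": "",
  "Value": 0.430933,
  "Threshold": 1
}
"""

result = parse_check_body(sample)
print(f"status={result.status_code} value={result.value} threshold={result.threshold}")
```

Automated clients would skip this step and act on the HTTP status code alone.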