rate limiting

rate limiters are used to control the rate of traffic sent or received on the network
- viable if users can change the pace at which they hit the API endpoints without affecting the outcome
- not viable if spacing out requests is not an option (eg. real-time events)
during incidents (eg. if a service is operating slower than usual), there is a need to drop low-priority requests to make sure more critical ones get through (load shedding)
- a load shedder makes decisions based on the whole state of the system rather than the user making the request
- load shedders help in emergencies by keeping the core part of the business working while the rest is on fire

request rate limiter: restricts each user to N requests per second
possible enhancement: the ability to briefly burst above the cap for sudden spikes in usage during real-time events (eg. flash sales)
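
a minimal sketch of the per-user cap, assuming a sliding window of accepted timestamps kept in process memory. the class name, the `window` parameter and the injected `now` clock are illustrative assumptions, not something from the source. this strict version has no burst allowance.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` accepted requests per user in any `window` seconds."""

    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        # user_id -> timestamps of requests that were accepted
        self._hits = defaultdict(deque)

    def allow(self, user_id, now):
        hits = self._hits[user_id]
        # Forget accepted requests that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the cap: the caller should reject the request
        hits.append(now)
        return True
```

in real use `now` would come from `time.monotonic()`; passing it in keeps the sketch easy to test.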

concurrent requests limiter: instead of “you can use the API N times a second”, we say “you can only have N API requests in progress at the same time” (the key word is simultaneous)
used to manage resource-intensive endpoints
- also reduces retry-induced load
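
a rough sketch of capping in-flight requests per user, assuming a single process with threads. `ConcurrencyLimiter`, `TooManyInFlight` and `slot` are hypothetical names; a multi-host setup would need a shared counter instead of a local dict.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class TooManyInFlight(Exception):
    """Raised when a user already has the maximum number of requests in progress."""


class ConcurrencyLimiter:
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self._in_flight = defaultdict(int)  # user_id -> requests currently running
        self._lock = threading.Lock()

    @contextmanager
    def slot(self, user_id):
        # Claim a slot before the handler runs, or refuse immediately.
        with self._lock:
            if self._in_flight[user_id] >= self.max_in_flight:
                raise TooManyInFlight(user_id)
            self._in_flight[user_id] += 1
        try:
            yield
        finally:
            # Always release the slot, even if the handler raised.
            with self._lock:
                self._in_flight[user_id] -= 1
```

a handler would wrap its work in `with limiter.slot(user_id): ...` and turn `TooManyInFlight` into an error response.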

fleet usage load shedder: divide traffic into two types: critical API methods (eg. creating charges) and non-critical methods (eg. listing charges)
- Stripe uses a Redis cluster to count the number of requests of each type
reserve a portion of infrastructure for critical requests (eg. 20%)
- any non-critical request beyond the remaining 80% allocation is rejected
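
a small sketch of the 80/20 split, assuming the fleet's capacity per counting window is a known number and using in-memory counters as a stand-in for the Redis counts. `FleetUsageShedder`, `capacity` and `reserved_percent` are made-up names for illustration.

```python
class FleetUsageShedder:
    def __init__(self, capacity, reserved_percent=20):
        # capacity: requests the fleet can serve per window (assumed known)
        self.capacity = capacity
        # Integer arithmetic keeps the cap exact (no float rounding).
        self.non_critical_cap = capacity * (100 - reserved_percent) // 100
        self.counts = {"critical": 0, "non_critical": 0}

    def admit(self, is_critical):
        if is_critical:
            # Critical traffic is never rejected by this shedder.
            self.counts["critical"] += 1
            return True
        if self.counts["non_critical"] >= self.non_critical_cap:
            return False  # non-critical traffic has used up its allocation
        self.counts["non_critical"] += 1
        return True

    def reset_window(self):
        # Call at the start of each counting window.
        self.counts = {"critical": 0, "non_critical": 0}
```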

worker utilization load shedder: most API services use a set of workers to independently respond to incoming requests in parallel
- last line of defense - if workers are backed up with requests, shed lower-priority traffic
- Stripe divides traffic into 4 categories: (1) critical methods, (2) POSTs, (3) GETs, and (4) test mode traffic
number of workers with available capacity is tracked at all times - if a box is too busy to handle the request volume, it will slowly start shedding less-critical requests
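
a sketch of shedding by category as workers fill up. the threshold numbers in `SHED_AT` are invented for illustration (the source gives none); the only thing taken from the notes is the priority order, with test mode traffic shed first and critical methods never shed.

```python
# Utilization (busy / total workers) at which each category starts being shed.
# Hypothetical values; "critical" is deliberately absent, so it is never shed.
SHED_AT = {"test_mode": 0.80, "get": 0.90, "post": 0.95}


def should_shed(category, busy_workers, total_workers):
    """Return True if a request of this category should be dropped right now."""
    utilization = busy_workers / total_workers
    threshold = SHED_AT.get(category)
    return threshold is not None and utilization >= threshold
```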

it's important to shed and bring back traffic SLOWLY; otherwise we end up flapping between states
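
one way to picture "slowly": move the shed percentage in small steps on every health check instead of jumping between 0% and 100%. a sketch under assumed names (`GradualShedder`, `step_percent`); the step size is arbitrary.

```python
import random


class GradualShedder:
    def __init__(self, step_percent=5):
        self.step_percent = step_percent
        self.shed_percent = 0  # share of low-priority traffic currently dropped

    def update(self, overloaded):
        # Nudge the shed percentage one step per health check, in either direction.
        if overloaded:
            self.shed_percent = min(100, self.shed_percent + self.step_percent)
        else:
            self.shed_percent = max(0, self.shed_percent - self.step_percent)

    def should_shed(self, roll=None):
        # roll is a number in [0, 100); it is drawn at random unless supplied.
        if roll is None:
            roll = random.randrange(100)
        return roll < self.shed_percent
```

with a 5% step, three overloaded checks in a row shed 15% of low-priority traffic, and one healthy check brings that back to 10%.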

Stripe uses the token bucket algorithm in practice
- every user has a bucket, and every time they make a request we remove a token from that bucket; tokens are slowly dripped back in, and a request that finds the bucket empty is rejected
- implemented using Redis
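
a minimal in-memory sketch of a token bucket, standing in for the Redis-backed version mentioned above (this is not Stripe's code). `rate`, `capacity`, `allow` and the explicit `now` argument are assumptions made for the example. a capacity larger than the refill rate is what allows short bursts.

```python
class TokenBucket:
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.tokens = float(capacity)
        self.updated = now

    def take(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket is empty: reject the request


buckets = {}  # user_id -> TokenBucket


def allow(user_id, now, rate=5, capacity=10):
    if user_id not in buckets:
        buckets[user_id] = TokenBucket(rate, capacity, now)
    return buckets[user_id].take(now)
```

with these defaults a user can burst 10 requests at once, then sustain 5 per second.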
there are some important considerations when implementing rate limiters
- hook the rate limiters into your middleware stack safely
  - if there are bugs in the rate limiting code (or if Redis goes down), requests should not be affected: catch exceptions at all levels so the limiter fails open!!!
- return clear errors to users (eg. HTTP 429 Too Many Requests, 503 Service Unavailable)
- implement a safeguard (kill switch) so that an admin can disable the rate limiters
  - possibly use feature flags to enable / disable rate limiters
  - set up alerts and metrics to understand how often they trigger
- “dark launch” each rate limiter to watch the traffic they block
  - evaluate whether each decision to block was correct and tune the thresholds
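
the considerations above can be sketched as one wrapper around any limiter check. `guarded_check`, `enabled` and `dark_launch` are hypothetical names; in practice the two flags would come from a feature-flag system, and the log lines would feed metrics and alerts.

```python
import logging

log = logging.getLogger("rate_limiter")


def guarded_check(limiter_check, request, *, enabled=True, dark_launch=False):
    """Return True if the request may proceed."""
    if not enabled:
        return True  # kill switch: limiter turned off by an admin
    try:
        allowed = limiter_check(request)
    except Exception:
        # Fail open: a broken limiter (or an unreachable store) must not
        # take down the API.
        log.exception("rate limiter failed; letting request through")
        return True
    if not allowed and dark_launch:
        # Dark launch: record what would have been blocked, but do not block.
        log.info("dark launch: would have blocked %r", request)
        return True
    return allowed
```

when this returns False, the caller would respond with 429 (user over their limit) or 503 (load shedding).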
