Throttling Web API Calls

April 1, 2011
Sign outside Wallington, England (CC-BY by anemoneprojectors)
Quick Summary
Many APIs limit the number of calls per second. TimeSpanSemaphore can help throttle pending calls so that you do not exceed the service limits.

From Amazon to Zillow, there are thousands of sites that provide access to data via an API. At FilterPlay we use lots of e-commerce APIs to retrieve product data and update prices used in our comparison engine. Our back-end system updates millions of items every day, and fortunately many of these API calls and updates can be parallelized. Most APIs enforce usage limits to help ensure the service remains available and responsive, so it's important to throttle your API calls to stay within the limits defined by each service.

Using a Semaphore to Limit Concurrency

Most programming languages provide synchronization primitives to control access to a resource shared by multiple threads. For example, the lock keyword in C# restricts execution of a block of code to a single thread at any one moment in time. A semaphore generalizes this by letting a fixed number of threads access a resource concurrently. However, most web APIs also include a time window. For example, the BestBuy e-commerce API specifies that developers may only make 5 calls every second. Since a web request can finish in under a second, it's not enough to limit the number of concurrent calls with a semaphore. The following example illustrates a semaphore that is set to allow only 5 concurrent workers. We'll create 6 worker threads which each perform 300 ms of “work” after entering the semaphore:

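using System;
using System.Threading;
static void DoWork(int taskId)
{
    DateTime started = DateTime.Now;
    Thread.Sleep(300);  // simulate work
    Console.WriteLine("Task {0} started {1:ss.fff}, completed {2:ss.fff}", taskId, started, DateTime.Now);
}

static void StandardSemaphoreTest()
{
    using (SemaphoreSlim pool = new SemaphoreSlim(5))  // at most 5 workers inside at once
    {
        for (int i = 1; i <= 6; i++)  // 6 workers, each holding a slot for 300 ms
            new Thread(taskId => { pool.Wait(); try { DoWork((int)taskId); } finally { pool.Release(); } }).Start(i);
        Thread.Sleep(2000); // give all the threads a chance to finish
    }
}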


Task 1 started 51.229, completed 51.540
Task 2 started 51.229, completed 51.540
Task 3 started 51.258, completed 51.558
Task 4 started 51.258, completed 51.558
Task 5 started 51.260, completed 51.560
Task 6 started 51.540, completed 51.840

Note that Task 6 starts immediately after Task 1 completes and exits the semaphore. The simulated work only takes 300 ms, so all six workers finish in well under a second, exceeding our limit of 5 calls per second.

One solution would be to sleep for a second after every request. However, blocking a worker after it's done using the resource isn't a good idea. In our simple example that's not obvious, because the thread exits right after performing its work on the shared resource. In a real scenario, though, you'll call a web API to obtain some data and then process the results. It's important that you don't do that post-processing while holding a lock, and we also shouldn't delay it just to ensure that a subsequent caller doesn't exceed our limit. The solution is to couple a semaphore with a time span that must elapse before the next caller can acquire a lease on the resource. I created a TimeSpanSemaphore class which internally uses a queue of time stamps to remember when previous workers finished.
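The idea can be sketched in a few dozen lines. This is a minimal sketch under my own assumptions, not the TimeSpanSemaphore source from the repository; RunThrottled and the other names are hypothetical. The queue is pre-filled with one time stamp per slot. Each caller takes a semaphore slot, dequeues the oldest completion time, and waits until the time span has passed since that moment before running its action; when the action finishes (or throws), the current time goes back into the queue before the slot is released.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch (not the TimeSpanSemaphore source): at most `maxCount` actions may
// start per `window`, where the window is measured from an earlier action's *completion*.
const int maxCount = 5;
TimeSpan window = TimeSpan.FromSeconds(1);

var pool = new SemaphoreSlim(maxCount);          // caps how many actions run at once
var finished = new Queue<DateTime>();            // completion time stamp for each free slot
for (int i = 0; i < maxCount; i++)
    finished.Enqueue(DateTime.MinValue);         // never-used slots may start immediately

void RunThrottled(Action action, CancellationToken token)
{
    pool.Wait(token);                            // throws OperationCanceledException if canceled while queued
    try
    {
        DateTime oldest;
        lock (finished) oldest = finished.Dequeue();          // slot that finished longest ago
        TimeSpan delay = oldest + window - DateTime.UtcNow;   // time left in that slot's window
        if (delay > TimeSpan.Zero && token.WaitHandle.WaitOne(delay))
            token.ThrowIfCancellationRequested();             // canceled while waiting out the window
        action();
    }
    finally
    {
        // Record when this slot became free *before* releasing it,
        // even if action() threw or the wait was canceled.
        lock (finished) finished.Enqueue(DateTime.UtcNow);
        pool.Release();
    }
}

// Demo mirroring the post: six workers, each doing 300 ms of "work".
var starts = new List<(int Id, DateTime Start)>();
var threads = new List<Thread>();
for (int i = 1; i <= 6; i++)
{
    int id = i;
    var t = new Thread(() => RunThrottled(() =>
    {
        lock (starts) starts.Add((id, DateTime.UtcNow));
        Thread.Sleep(300);                       // simulate the API call
    }, CancellationToken.None));
    threads.Add(t);
    t.Start();
}
threads.ForEach(th => th.Join());

foreach (var (taskId, startedAt) in starts)
    Console.WriteLine($"Task {taskId} started {startedAt:ss.fff}");
```

Every completion stamp is consumed by exactly one later call, and that call cannot start until the time span has elapsed since the stamp, so no more than five calls can start within any one-second window, even when requests finish quickly. With six 300 ms workers, the sixth call should start roughly 1.3 seconds after the first.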

Don’t Forget the Transit Time

It's important to explain why we need to track time stamps from the moment each action completes. My initial implementation simply reset a lock pool after each time period had elapsed. That may work perfectly for some throttling scenarios, but for web APIs we have to remember that we're trying to obey a limit that is enforced on a remote server. BestBuy, Twitter, and Amazon don't care how many requests per second you send; they can only observe how many requests per second they receive from your application. Because the time it takes a request to reach the remote server varies, you can violate the limits if you only track when requests are sent. Here's an example:

Time (sec) Event
0.000 Requests 1-5 are sent to the server
0.700 The requests all arrive at the server
0.800 All requests return with data
1.000 1 second has elapsed so request 6 is sent to the server
1.100 Request 6 arrives much faster than the first 5 requests

The remote server sees that 6 requests arrived between 0.700 and 1.100 when only 400 ms have elapsed, violating the API limits.
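Replaying the timeline makes the problem concrete. The sketch below counts the largest number of arrivals the server sees inside any one-second window. It uses the arrival times from the table, and it assumes (hypothetically) that request 6 still needs 100 ms to reach the server if it is instead held until one second after the responses return, that is, sent at 1.8 s.

```csharp
using System;
using System.Linq;

// Timeline from the table above (seconds). Requests 1-5 reach the server at 0.700
// and their responses are back at 0.800; request 6 needs only 100 ms to arrive.
double[] firstFive = { 0.7, 0.7, 0.7, 0.7, 0.7 };
const double responsesBack = 0.8;
const double request6Transit = 0.1;

// Most arrivals the server sees in any one-second window [t, t + 1).
// Checking windows that begin at an arrival is enough to find the maximum.
int PeakPerSecond(double[] arrivals) =>
    arrivals.Max(t => arrivals.Count(a => a >= t && a < t + 1.0));

// Send-based: request 6 leaves 1 s after requests 1-5 were *sent* (at 0.000).
double sendBased = 0.0 + 1.0 + request6Transit;                 // arrives at 1.1
// Completion-based (hypothetical): request 6 leaves 1 s after the responses *returned*.
double completionBased = responsesBack + 1.0 + request6Transit;  // arrives at about 1.9

int sendPeak = PeakPerSecond(firstFive.Append(sendBased).ToArray());
int completionPeak = PeakPerSecond(firstFive.Append(completionBased).ToArray());
Console.WriteLine($"send-based peak: {sendPeak}, completion-based peak: {completionPeak}");
// prints: send-based peak: 6, completion-based peak: 5
```

Throttling on send time lets six requests land inside one second; measuring the window from when the responses come back keeps the server-side peak at five.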

Using the TimeSpanSemaphore

Instead of exposing the Wait() and Release() methods publicly, our TimeSpanSemaphore class provides a Run method which accepts an Action delegate and supports cancellation tokens. We also ensure that the lock is released if an exception occurs. Here is our previous example using the TimeSpanSemaphore class instead of the standard semaphore:

using (TimeSpanSemaphore throttle = new TimeSpanSemaphore(5, TimeSpan.FromSeconds(1)))  // 5 calls per second
{
    for (int i = 1; i <= 6; i++)
        new Thread(taskId => throttle.Run(() => DoWork((int)taskId), CancellationToken.None)).Start(i);
    Thread.Sleep(2000); // give all the threads a chance to finish
}


Task 2 started 53.276, completed 53.576
Task 1 started 53.276, completed 53.576
Task 3 started 53.276, completed 53.576
Task 4 started 53.278, completed 53.579
Task 5 started 53.279, completed 53.579
Task 6 started 54.598, completed 54.898

You can see that, as in the first example, the first 5 requests all start and complete around the same time. However, Task 6 waits a full second after the first task completes before starting. In theory we wouldn't have to wait a full second if we knew how the request lifetime was spent (travel time to and from the server plus server processing). However, we don't know exactly when the remote server will decide to count a request. It's safer to assume that the entire lifetime of the previous request was spent travelling to the server and that the next request might arrive instantly. This means you can't use the API at full capacity, but it's better to err on the side of caution than to exceed the limits. The effect is larger when the time span is short relative to how long each request takes. If you really need every last API call, you could adjust the count and/or time span to account for this delay.
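The cost of this caution can be estimated. Assuming every request takes the same time d and work is always waiting, a slot that starts a request at time s cannot start another until s + d + T (d for the request itself, then the full time span T after it completes). With n slots the sustained rate is therefore at most:

```latex
% n = calls allowed per time span, T = time span, d = duration of one request
r_{\max} = \frac{n}{T + d}
         = \frac{5}{1\,\mathrm{s} + 0.3\,\mathrm{s}}
         \approx 3.85\ \text{calls/s}
\qquad \text{compared with the nominal } \frac{n}{T} = 5\ \text{calls/s}
```

The output above fits this estimate: Task 6 started 1.322 s after Task 1 (54.598 versus 53.276), close to d + T = 1.3 s.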

One tip for those making many concurrent API calls: by default, .NET only allows 2 concurrent connections per host name. You can raise this limit by setting ServicePointManager.DefaultConnectionLimit when you initialize your program (it only needs to be set once) or in the .config file.
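Raising the limit takes one assignment at startup. The value 10 below is an arbitrary example, not a recommendation. ServicePointManager governs the .NET Framework HttpWebRequest/WebClient stack this post was written against; on .NET Core 2.1 and later, HttpClient is configured through its handler instead, where SocketsHttpHandler.MaxConnectionsPerServer (which defaults to int.MaxValue) plays that role. The snippet assumes a modern .NET SDK, since SocketsHttpHandler does not exist on .NET Framework.

```csharp
using System;
using System.Net;
using System.Net.Http;

// .NET Framework (HttpWebRequest / WebClient): set once at startup, before the first request.
// 10 is an arbitrary example; match it to the concurrency your throttle allows.
ServicePointManager.DefaultConnectionLimit = 10;

// .NET Core 2.1+ / .NET 5+: HttpClient is configured through its handler instead.
var handler = new SocketsHttpHandler { MaxConnectionsPerServer = 10 };
using var client = new HttpClient(handler);

Console.WriteLine($"ServicePointManager: {ServicePointManager.DefaultConnectionLimit}, handler: {handler.MaxConnectionsPerServer}");
```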

Show Me the Code

The source code for the TimeSpanSemaphore class is available in my GitHub repository JFLibrary. Over time I’m planning to add more utility code that I frequently use. I’d love to hear your feedback, bug reports or a different solution you’ve used to rate limit API calls.


  1. Appreciate this post. Was looking for a better way to throttle outgoing API calls…

    This points me in the right direction.

  2. I’m a little late to the party, but great article nonetheless!

Leave a Reply