The need to integrate systems via APIs is an everyday reality of many applications. However, APIs can be expensive in multiple senses: calls may cost real money, they add latency, and each one is another opportunity for failure.
Let's see what we can do to mitigate these problems.
Good monitoring solutions have become essential with the rise of distributed (read: microservices) architectures. Any potential problem, including expensive APIs, can only be detected and efficiently solved if we have accurate data about what really happens in our systems. We need to know which APIs are the most expensive in our systems and exactly how they impact our costs, latencies and failure rates. We need numbers to compare.
Make sure to collect these values from production: development and test environments have a different load profile and are often connected to mock APIs instead of the real ones.
Standards like OpenTelemetry exist, and there are many monitoring products on the market. Your application should emit metrics relevant to its API usage.
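As a minimal sketch, assuming the OpenTelemetry Java SDK is already configured to export somewhere, a client wrapper could record a call counter and a latency histogram per endpoint (the metric and attribute names are illustrative):

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.metrics.Meter;
import java.util.function.Supplier;

public class ApiMetrics {

    private final Meter meter = GlobalOpenTelemetry.getMeter("my-app");

    private final LongCounter calls = meter.counterBuilder("api.client.calls")
            .setDescription("Outgoing API calls")
            .build();

    private final LongHistogram duration = meter.histogramBuilder("api.client.duration")
            .ofLongs()
            .setUnit("ms")
            .build();

    // Wraps an outgoing call and records how many calls we make, to which
    // endpoint, with which outcome, and how long each one takes.
    public <T> T timed(String endpoint, Supplier<T> call) {
        Attributes attrs = Attributes.of(AttributeKey.stringKey("endpoint"), endpoint);
        long start = System.nanoTime();
        try {
            T result = call.get();
            calls.add(1, attrs.toBuilder().put("outcome", "success").build());
            return result;
        } catch (RuntimeException e) {
            calls.add(1, attrs.toBuilder().put("outcome", "error").build());
            throw e;
        } finally {
            duration.record((System.nanoTime() - start) / 1_000_000, attrs);
        }
    }
}
```

With data like this you can chart call volumes, error rates and latency percentiles per endpoint, which is exactly the "numbers to compare" mentioned above.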
When software systems are designed, it is common that integrations do not get as much attention as they deserve. Developers, after watching a quick tutorial, often rely on frameworks to do all the work. The result is that internal objects containing all possible attributes of a business entity are propagated verbatim everywhere, including automatically mapped DTOs and API request/response schemas. Besides the poor efficiency, this can cause security issues.
One should not forget to ask: "Do we really need to update all the related data of an entity within each use case that refers to it? Does accessing a data object used in many different modules of our code involve a potentially expensive API call?" In many cases a slimmer version of a DTO or a narrower operation scope can avoid unnecessary expensive API calls.
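As a purely hypothetical illustration, compare a full entity with a slim DTO for a listing use case:

```java
// Placeholder types for the sketch.
record Address(String street, String city) {}
record CreditScore(int value) {}

// The full internal entity: every attribute of the business object,
// including one that can only be populated via an expensive external API.
record Customer(Long id, String name, String email, String taxId,
                Address billingAddress, Address shippingAddress,
                CreditScore creditScore) {}

// A slim DTO for a use case that only lists customers. It can be
// populated from local data alone; nothing in it ever requires the
// expensive creditScore lookup.
record CustomerSummary(Long id, String name) {}
```

If the framework maps the full Customer everywhere by default, every list view silently pays for the credit score lookup; the slim DTO makes that cost impossible by construction.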
Calling an expensive API may be totally OK if it contributes to the satisfaction of your paying customers. But it is surprisingly common for an application's own endpoints to be left unsecured. It may be a good idea to make sure that only users who are authenticated as your customers and authorised by having a certain role can access functionality involving expensive API calls.
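As a sketch, assuming Spring Security with method security enabled (via @EnableMethodSecurity) and a hypothetical QuoteApi client:

```java
import java.math.BigDecimal;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;

@Service
public class QuoteService {

    public record Quote(String productId, BigDecimal price) {}

    public interface QuoteApi { Quote call(String productId); } // hypothetical client

    private final QuoteApi expensiveQuoteApi;

    public QuoteService(QuoteApi expensiveQuoteApi) {
        this.expensiveQuoteApi = expensiveQuoteApi;
    }

    // Only users authenticated with the CUSTOMER role can trigger
    // the functionality that calls the expensive external API.
    @PreAuthorize("hasRole('CUSTOMER')")
    public Quote fetchQuote(String productId) {
        return expensiveQuoteApi.call(productId);
    }
}
```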
Calling an API is an I/O-heavy operation. Only building the request and parsing the response need the CPU; waiting for the I/O takes thousands, up to a million, times longer. Letting the CPU thread do some useful work during the wait can therefore increase efficiency greatly. You can use reactive programming or, in case you are using Java 21, the virtual threads feature can do it for you.
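For illustration, a sketch with Java 21 virtual threads (enrichCustomer stands in for any blocking call to the expensive API):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Enricher {

    // Hypothetical blocking call to the expensive API.
    void enrichCustomer(String customerId) { /* blocking HTTP call */ }

    public void enrichAll(List<String> customerIds) {
        // Each task runs on its own virtual thread. While a task blocks
        // on I/O, its carrier platform thread is released to run others.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            customerIds.forEach(id -> executor.submit(() -> enrichCustomer(id)));
        } // close() implicitly waits for all submitted tasks to finish
    }
}
```

Thousands of such tasks can block concurrently without exhausting a platform thread pool.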
Another situation related to threads and concurrency is when you need to make multiple API calls whose inputs do not depend on each other. A typical example is asking for quotes from multiple providers and returning the best price to the client as quickly as possible. Making the calls in parallel will shorten the overall response time. To make the code more readable and ensure that errors are handled and threads are cleaned up properly, there is the concept of structured concurrency.
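A sketch of the quotes example using StructuredTaskScope, a preview feature in Java 21 (it needs --enable-preview); QuoteProvider is a hypothetical interface:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.concurrent.StructuredTaskScope;

public class BestQuoteService {

    // Hypothetical provider of a price quote via an expensive API.
    interface QuoteProvider {
        BigDecimal quote(String productId) throws Exception;
    }

    BigDecimal bestQuote(String productId, List<QuoteProvider> providers)
            throws Exception {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            // Fork one subtask per provider; they all run concurrently.
            var subtasks = providers.stream()
                    .map(p -> scope.fork(() -> p.quote(productId)))
                    .toList();
            scope.join().throwIfFailed(); // wait for all, surface the first error
            // All subtasks succeeded: pick the lowest price.
            return subtasks.stream()
                    .map(StructuredTaskScope.Subtask::get)
                    .min(BigDecimal::compareTo)
                    .orElseThrow();
        } // leaving the scope guarantees no subtask thread outlives it
    }
}
```

If any provider fails, ShutdownOnFailure cancels the remaining subtasks and propagates the error, so no thread is leaked and no failure is silently swallowed.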
Setting up caching correctly can be hard; however, it is one of the strongest weapons against excessive use of expensive APIs. If you know that from the business perspective it is OK to display a value that may be 2 minutes old, you can serve the value from a cache instead of the real API for those 2 minutes.
If you have done the hard work of analysing what can be cached and for how long, frameworks like Spring make the implementation very easy: you just decorate a method with the @Cacheable annotation. Be careful about calling the cacheable method from within the same (bean) class: Spring applies caching through a proxy, so such a self-invocation bypasses the cache entirely. The abstraction lets you configure various cache implementations, either using your JVM memory or a distributed cache like Redis.
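A minimal sketch of the 2-minute example above, assuming Caffeine as the in-JVM cache implementation (the cache and bean names are illustrative):

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import java.math.BigDecimal;
import java.time.Duration;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching
class CacheConfig {

    @Bean
    CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager("prices");
        // Entries expire after 2 minutes, matching the business decision.
        manager.setCaffeine(Caffeine.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(2)));
        return manager;
    }
}

@Service
class PriceService {

    interface PriceApi { BigDecimal fetch(String productId); } // hypothetical client

    private final PriceApi expensivePriceApi;

    PriceService(PriceApi expensivePriceApi) {
        this.expensivePriceApi = expensivePriceApi;
    }

    // The expensive API is called only on a cache miss; for the next
    // 2 minutes the cached value is returned instead.
    @Cacheable("prices")
    public BigDecimal currentPrice(String productId) {
        return expensivePriceApi.fetch(productId);
    }
}
```

Swapping the CacheManager bean for a Redis-backed one changes where the entries live without touching the @Cacheable method.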
Every API call involves a lot of overhead besides the actual processing and formatting of the request and response payloads: establishing connections, protocol handshakes, headers and envelopes, sometimes even the cold start of a container...
If you are lucky, the provider of the expensive API realises this too and offers a way to get multiple results (records) from a single API request. The request may accept a list of IDs, a time range or some other flexible criteria matching multiple results.
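For illustration, a hypothetical bulk endpoint that accepts a comma-separated list of IDs, called with the JDK's built-in HTTP client:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class BulkFetcher {

    private final HttpClient client = HttpClient.newHttpClient();

    // One request for many records instead of one request per record.
    public String fetchItems(List<String> ids) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/items?ids=" + String.join(",", ids)))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

One round trip now carries all the fixed overhead that would otherwise be paid once per record.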
Natural fits for using bulk operations are:
One problem with bulk operations, though: if a bulk operation's input is a list of IDs, it would be nice to put the resulting data items into a cache individually, each identified by its own ID, so that later requests for a single item can be served from the cache.
Unfortunately this is beyond the scope of the @Cacheable abstraction mentioned above. There have been some suggestions to cover this use case, but the additional complexity is not worth it. I was able to implement a satisfactory solution in just one simple class, using dependency injection of a Spring Cache instance. The implementation consists of the following steps:

1. Look up each requested ID in the cache and collect the hits.
2. Gather the IDs that were not found.
3. Fetch the missing items with a single bulk API call.
4. Put each fetched item into the cache under its own ID.
5. Merge the cached and freshly fetched items and return the result.
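A minimal sketch of such a class, assuming String IDs and a bulk call abstracted as a function (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import org.springframework.cache.Cache;

public class BulkCachingLoader<T> {

    private final Cache cache;
    private final Class<T> type;

    public BulkCachingLoader(Cache cache, Class<T> type) {
        this.cache = cache;
        this.type = type;
    }

    public Map<String, T> load(List<String> ids,
                               Function<List<String>, Map<String, T>> bulkLoader) {
        Map<String, T> result = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();

        // Steps 1 & 2: serve hits from the cache, collect the misses.
        for (String id : ids) {
            T cached = cache.get(id, type);
            if (cached != null) {
                result.put(id, cached);
            } else {
                missing.add(id);
            }
        }

        // Step 3: fetch everything missing with one bulk API call.
        if (!missing.isEmpty()) {
            Map<String, T> fetched = bulkLoader.apply(missing);
            // Step 4: cache each fetched item individually under its own ID.
            fetched.forEach((id, item) -> {
                cache.put(id, item);
                result.put(id, item);
            });
        }
        // Step 5: cached and fresh items are merged in one map.
        return result;
    }
}
```

The Cache instance comes in via dependency injection (for example from a CacheManager), so the same class works unchanged with an in-memory or a Redis-backed cache.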