The need to integrate systems via APIs is an everyday reality of many applications. However, APIs can be expensive in multiple senses: calls may cost real money, they add latency, and each one is another opportunity for failure.
Let's see what we can do to mitigate these problems.
Good monitoring solutions have become essential with the rise of distributed (read: microservices) architectures. Any potential problem, including expensive APIs, can only be detected and efficiently solved if we have accurate data about what really happens in our systems. We need to know which APIs are the most expensive in our systems and exactly how they impact our costs, latencies and failure rates. We need numbers to compare.
Make sure to collect these values from production: development and test environments have a different load profile and are often connected to mock APIs instead of the real ones.
Standards like OpenTelemetry exist, and there are many monitoring products on the market. Your application should emit metrics relevant to its API usage.
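As a minimal sketch, assuming the OpenTelemetry Java SDK is already configured to export somewhere, a client wrapper could record a call counter and a latency histogram per endpoint (the metric and attribute names are illustrative):

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.metrics.Meter;
import java.util.function.Supplier;

public class ApiMetrics {

    private final Meter meter = GlobalOpenTelemetry.getMeter("my-app");

    private final LongCounter calls = meter.counterBuilder("api.client.calls")
            .setDescription("Outgoing API calls")
            .build();

    private final LongHistogram duration = meter.histogramBuilder("api.client.duration")
            .ofLongs()
            .setUnit("ms")
            .build();

    // Wraps an outgoing call and records how many calls we make, to which
    // endpoint, with which outcome, and how long each one takes.
    public <T> T timed(String endpoint, Supplier<T> call) {
        Attributes attrs = Attributes.of(AttributeKey.stringKey("endpoint"), endpoint);
        long start = System.nanoTime();
        try {
            T result = call.get();
            calls.add(1, attrs.toBuilder().put("outcome", "success").build());
            return result;
        } catch (RuntimeException e) {
            calls.add(1, attrs.toBuilder().put("outcome", "error").build());
            throw e;
        } finally {
            duration.record((System.nanoTime() - start) / 1_000_000, attrs);
        }
    }
}
```

With data like this you can chart call volumes, error rates and latency percentiles per endpoint, which is exactly the "numbers to compare" mentioned above.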
When software systems are designed, it is common that integrations do not get as much attention as they deserve. Developers, after watching a quick tutorial, often rely on frameworks to do all the work. The result is that internal objects containing all possible attributes of a business entity are propagated verbatim everywhere, including automatically mapped DTOs and API request/response schemas. Besides the poor efficiency, this can cause security issues.
One should not forget to ask: "Do we really need to update all the related data of an entity within each use case that refers to it? Does accessing a data object used in many different modules of our code involve a potentially expensive API call?" In many cases a slimmer version of a DTO or a narrower operation scope can avoid unnecessary expensive API calls.
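As a purely hypothetical illustration, compare a full entity with a slim DTO for a listing use case:

```java
// Placeholder types for the sketch.
record Address(String street, String city) {}
record CreditScore(int value) {}

// The full internal entity: every attribute of the business object,
// including one that can only be populated via an expensive external API.
record Customer(Long id, String name, String email, String taxId,
                Address billingAddress, Address shippingAddress,
                CreditScore creditScore) {}

// A slim DTO for a use case that only lists customers. It can be
// populated from local data alone; nothing in it ever requires the
// expensive creditScore lookup.
record CustomerSummary(Long id, String name) {}
```

If the framework maps the full Customer everywhere by default, every list view silently pays for the credit score lookup; the slim DTO makes that cost impossible by construction.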
Calling an expensive API may be totally OK if it contributes to the satisfaction of your paying customers. But it is surprisingly common for an application's own endpoints to be left unsecured. It may be a good idea to make sure that only users who are authenticated as your customers and authorised by having a certain role can access functionality involving expensive API calls.
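As a sketch, assuming Spring Security with method security enabled (via @EnableMethodSecurity) and a hypothetical QuoteApi client:

```java
import java.math.BigDecimal;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;

@Service
public class QuoteService {

    public record Quote(String productId, BigDecimal price) {}

    public interface QuoteApi { Quote call(String productId); } // hypothetical client

    private final QuoteApi expensiveQuoteApi;

    public QuoteService(QuoteApi expensiveQuoteApi) {
        this.expensiveQuoteApi = expensiveQuoteApi;
    }

    // Only users authenticated with the CUSTOMER role can trigger
    // the functionality that calls the expensive external API.
    @PreAuthorize("hasRole('CUSTOMER')")
    public Quote fetchQuote(String productId) {
        return expensiveQuoteApi.call(productId);
    }
}
```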
Calling an API is an I/O-heavy operation. Only building the request and parsing the response need the CPU; waiting for the I/O takes thousands, up to a million, times longer. Letting the CPU thread do some useful work during the wait can therefore increase efficiency greatly. You can use reactive programming or, in case you are using Java 21, the virtual threads feature can do it for you.
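For illustration, a sketch with Java 21 virtual threads (enrichCustomer stands in for any blocking call to the expensive API):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Enricher {

    // Hypothetical blocking call to the expensive API.
    void enrichCustomer(String customerId) { /* blocking HTTP call */ }

    public void enrichAll(List<String> customerIds) {
        // Each task runs on its own virtual thread. While a task blocks
        // on I/O, its carrier platform thread is released to run others.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            customerIds.forEach(id -> executor.submit(() -> enrichCustomer(id)));
        } // close() implicitly waits for all submitted tasks to finish
    }
}
```

Thousands of such tasks can block concurrently without exhausting a platform thread pool.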
Another situation related to threads and concurrency is when you need to make multiple API calls whose inputs do not depend on each other. A typical example is asking for quotes from multiple providers and returning the best price to the client as quickly as possible. Making the calls in parallel will shorten the overall response time. To make the code more readable and ensure that errors are handled and threads are cleaned up properly, there is the concept of structured concurrency.
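A sketch of the quotes example using StructuredTaskScope, a preview feature in Java 21 (it needs --enable-preview); QuoteProvider is a hypothetical interface:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.concurrent.StructuredTaskScope;

public class BestQuoteService {

    // Hypothetical provider of a price quote via an expensive API.
    interface QuoteProvider {
        BigDecimal quote(String productId) throws Exception;
    }

    BigDecimal bestQuote(String productId, List<QuoteProvider> providers)
            throws Exception {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            // Fork one subtask per provider; they all run concurrently.
            var subtasks = providers.stream()
                    .map(p -> scope.fork(() -> p.quote(productId)))
                    .toList();
            scope.join().throwIfFailed(); // wait for all, surface the first error
            // All subtasks succeeded: pick the lowest price.
            return subtasks.stream()
                    .map(StructuredTaskScope.Subtask::get)
                    .min(BigDecimal::compareTo)
                    .orElseThrow();
        } // leaving the scope guarantees no subtask thread outlives it
    }
}
```

If any provider fails, ShutdownOnFailure cancels the remaining subtasks and propagates the error, so no thread is leaked and no failure is silently swallowed.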
Setting up caching correctly can be hard; however, it is one of the strongest weapons against excessive use of expensive APIs. If you know that from the business perspective it is OK to display a value that may be 2 minutes old, you can serve the value from a cache instead of the real API for those 2 minutes.
If you have done the hard work of analysing what can be cached and for how long, frameworks like Spring make the implementation very easy: you just decorate a method with the @Cacheable annotation. Be careful about calling the cacheable method from within the same (bean) class: Spring applies caching through a proxy, so such a self-invocation bypasses the cache entirely. The abstraction lets you configure various cache implementations, either using your JVM memory or a distributed cache like Redis.
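A minimal sketch of the 2-minute example above, assuming Caffeine as the in-JVM cache implementation (the cache and bean names are illustrative):

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import java.math.BigDecimal;
import java.time.Duration;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching
class CacheConfig {

    @Bean
    CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager("prices");
        // Entries expire after 2 minutes, matching the business decision.
        manager.setCaffeine(Caffeine.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(2)));
        return manager;
    }
}

@Service
class PriceService {

    interface PriceApi { BigDecimal fetch(String productId); } // hypothetical client

    private final PriceApi expensivePriceApi;

    PriceService(PriceApi expensivePriceApi) {
        this.expensivePriceApi = expensivePriceApi;
    }

    // The expensive API is called only on a cache miss; for the next
    // 2 minutes the cached value is returned instead.
    @Cacheable("prices")
    public BigDecimal currentPrice(String productId) {
        return expensivePriceApi.fetch(productId);
    }
}
```

Swapping the CacheManager bean for a Redis-backed one changes where the entries live without touching the @Cacheable method.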
Every API call involves a lot of overhead besides the actual processing and formatting of the request and response payloads: establishing connections, protocol handshakes, headers and envelopes, sometimes even the cold start of a container...
If you are lucky, the provider of the expensive API realises this too and offers a way to get multiple results (records) from a single API request. The request may accept a list of IDs, a time range or some other flexible criteria matching multiple results.
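For illustration, a hypothetical bulk endpoint that accepts a comma-separated list of IDs, called with the JDK's built-in HTTP client:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class BulkFetcher {

    private final HttpClient client = HttpClient.newHttpClient();

    // One request for many records instead of one request per record.
    public String fetchItems(List<String> ids) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/items?ids=" + String.join(",", ids)))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

One round trip now carries all the fixed overhead that would otherwise be paid once per record.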
Natural fits for using bulk operations are:
One problem with bulk operations, though: if a bulk operation's input is a list of IDs, it would be nice to put the resulting data items into a cache individually, each identified by its own ID, so that later requests for a single item can be served from the cache.
Unfortunately this is beyond the scope of the @Cacheable abstraction mentioned above. There have been some suggestions to cover this use case, but the additional complexity is not worth it. I was able to implement a satisfactory solution in just one simple class, using dependency injection of a Spring Cache instance. The implementation consists of the following steps:

1. Look up each requested ID in the cache and collect the hits.
2. Gather the IDs that were not found.
3. Fetch the missing items with a single bulk API call.
4. Put each fetched item into the cache under its own ID.
5. Merge the cached and freshly fetched items and return the result.
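A minimal sketch of such a class, assuming String IDs and a bulk call abstracted as a function (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import org.springframework.cache.Cache;

public class BulkCachingLoader<T> {

    private final Cache cache;
    private final Class<T> type;

    public BulkCachingLoader(Cache cache, Class<T> type) {
        this.cache = cache;
        this.type = type;
    }

    public Map<String, T> load(List<String> ids,
                               Function<List<String>, Map<String, T>> bulkLoader) {
        Map<String, T> result = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();

        // Steps 1 & 2: serve hits from the cache, collect the misses.
        for (String id : ids) {
            T cached = cache.get(id, type);
            if (cached != null) {
                result.put(id, cached);
            } else {
                missing.add(id);
            }
        }

        // Step 3: fetch everything missing with one bulk API call.
        if (!missing.isEmpty()) {
            Map<String, T> fetched = bulkLoader.apply(missing);
            // Step 4: cache each fetched item individually under its own ID.
            fetched.forEach((id, item) -> {
                cache.put(id, item);
                result.put(id, item);
            });
        }
        // Step 5: cached and fresh items are merged in one map.
        return result;
    }
}
```

The Cache instance comes in via dependency injection (for example from a CacheManager), so the same class works unchanged with an in-memory or a Redis-backed cache.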