
I’ve worked in payment systems for a long time, and one common theme is that they make a lot of 3rd party API calls. For example, there are numerous payment networks and banks, fraud and risk checks, onboarding and KYC/KYB (Know Your Customer/Business) services, and more. And then there are the inbound calls as well, such as webhooks and callbacks from integration partners, e-commerce stores, and other outside interactions.

Some of these calls will be to external companies which provide APIs, and some will be to other services/teams within the same organization, but outside the current scope of the software system.

In all cases, managing 3rd party API calls can be tricky. There’s the initial integration, where the API requests/responses might not be what you expect or what the docs say. Then you go to production and find out the behavior there doesn’t quite match the sandbox/test environment. Even once it’s all working, APIs change over time, and what worked yesterday might not work tomorrow. And then there are weird edge cases where you get back a response for one call out of a million which blows up your processing and handling.

To help with these issues, I like when systems store all external requests.

Storage

The first step is storing all inbound and outbound API calls. People will often use logs for this, but I think it’s far more valuable to put them in the database instead. This makes them easier to query, easier to aggregate, and easier to join against other data.

You can use one table for both inbound and outbound or separate them depending on preferences, but generally it’s useful to store most of the available information, such as:

  • URL of the request
  • Datetime of the request
  • Request body
  • Response body
  • Response status/code (e.g. 200, 500)
  • Total time spent on the request
  • Request headers
  • Response headers
  • Metadata

For metadata, I like to use a JSON column and add in any metadata that links this request to an entity in the system. For example, it could include user_id, order_id, request_id, etc. Because it’s a JSON column, you can include more than one identifier, and even other complex nested information.
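As a rough sketch of what such a table could look like, here is one possible layout using Python's built-in sqlite3 module. The table name, column names, and example URL are my own illustrative assumptions, and SQLite stores the JSON fields as text; a production database would likely use its native JSON and timestamp types.

```python
import json
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# chosen for illustration, mirroring the list of fields above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS external_requests (
    id               INTEGER PRIMARY KEY,
    direction        TEXT NOT NULL,   -- 'inbound' or 'outbound'
    url              TEXT NOT NULL,
    requested_at     TEXT NOT NULL,   -- ISO 8601 timestamp in UTC
    request_headers  TEXT,            -- JSON object
    request_body     TEXT,
    response_status  INTEGER,         -- e.g. 200, 500
    response_headers TEXT,            -- JSON object
    response_body    TEXT,
    duration_ms      INTEGER,         -- total time spent on the request
    metadata         TEXT             -- JSON: user_id, order_id, request_id, ...
)
"""


def create_store(path=":memory:"):
    """Open a database and make sure the request table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


conn = create_store()
conn.execute(
    "INSERT INTO external_requests "
    "(direction, url, requested_at, request_body, metadata) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        "outbound",
        "https://api.example.com/charges",  # placeholder URL
        "2024-01-01T12:00:00+00:00",
        json.dumps({"amount": 1000}),
        json.dumps({"order_id": 42, "user_id": 7}),  # more than one link
    ),
)
conn.commit()
```

A single table with a direction column is only one of the two options mentioned above; separate inbound and outbound tables would work just as well.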

Storing the total time spent on each request provides several benefits. First, it makes it easy to do some basic analysis of the API calls (e.g. what’s the average time for this call, or the average time when we provide certain fields but not others?). It also helps with debugging: if you see a certain error code after exactly 5 seconds every time, you can infer there’s some 3rd party timeout.

Response headers often include extra debugging information, such as request or trace IDs, version numbers, etc. When asking a 3rd party for support, it’s common to provide these values so they can find your requests in their own logs.

Once you start storing this information, it will often be immediately useful, even in development. And you can use it in integration tests (e.g. ensuring you pass a certain field with a certain value when making the API call). But the real power comes from debugging in production, such as finding all of the API calls associated with a failed payment to see what went wrong using simple SQL queries.
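To make the "simple SQL queries" point concrete, here is a small sketch of two such queries. It assumes a pared-down, hypothetical external_requests table with a JSON metadata column, and a SQLite build that includes the JSON functions (json_extract); other databases have their own JSON operators.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Pared-down, hypothetical table: only the columns these queries need.
conn.execute(
    """CREATE TABLE external_requests (
        id INTEGER PRIMARY KEY, url TEXT, response_status INTEGER,
        duration_ms INTEGER, metadata TEXT)"""
)

# Made-up sample rows for illustration only.
sample = [
    ("https://api.example.com/fraud-check", 200, 120, {"order_id": 42}),
    ("https://api.example.com/charges", 500, 5000, {"order_id": 42}),
    ("https://api.example.com/charges", 200, 300, {"order_id": 43}),
]
conn.executemany(
    "INSERT INTO external_requests (url, response_status, duration_ms, metadata) "
    "VALUES (?, ?, ?, ?)",
    [(url, status, ms, json.dumps(meta)) for url, status, ms, meta in sample],
)


def calls_for_order(conn, order_id):
    """All recorded calls linked to one order, oldest first."""
    return conn.execute(
        "SELECT url, response_status, duration_ms FROM external_requests "
        "WHERE json_extract(metadata, '$.order_id') = ? ORDER BY id",
        (order_id,),
    ).fetchall()


def average_duration_by_url(conn):
    """Average time spent per endpoint, in milliseconds."""
    return conn.execute(
        "SELECT url, AVG(duration_ms) FROM external_requests "
        "GROUP BY url ORDER BY url"
    ).fetchall()
```

For a failed payment on order 42, calls_for_order(conn, 42) returns the fraud check followed by the failing charge, which is usually enough to see where things went wrong.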

Implementation

Rather than try to write code for every API call, it’s often better to hook into the request/response lifecycle in one place and instrument all calls. The way you do this depends on the language and libraries, but these hooks are generally called interceptors.

Where possible, I like to record the request fields before the outbound call is started (e.g. request body, request headers) and then go back and update the row to store the response fields once the call is completed. There are several advantages over a single write at the end of the request cycle:

  • You can see and query requests in flight, which can be particularly useful when requests are taking a long time or hanging
  • Sometimes the call never finishes, and you want some record in the database instead of nothing (e.g. the process crashes, the server reboots, or even the response handling code blows up in an unexpected way before it writes the response row)
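Here is a minimal sketch of the write-first, update-later approach. It is not tied to any particular HTTP library: the recorded_call wrapper, the send callable, and the table layout are all hypothetical names of my own, and in a real system this logic would live inside whatever interceptor mechanism your client library provides.

```python
import json
import sqlite3
import time
from datetime import datetime, timezone


def open_store():
    """Create an in-memory store with a hypothetical request table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE external_requests (
            id INTEGER PRIMARY KEY, url TEXT, requested_at TEXT,
            request_headers TEXT, request_body TEXT,
            response_status INTEGER, response_headers TEXT,
            response_body TEXT, duration_ms INTEGER, metadata TEXT)"""
    )
    return conn


def recorded_call(conn, send, url, body, headers=None, metadata=None):
    """Record the request, perform it, then record the response.

    `send` is an assumed callable that performs the actual HTTP call and
    returns (status, response_headers, response_body).
    """
    headers = headers or {}
    # 1. Write the request fields before the call starts.
    cur = conn.execute(
        "INSERT INTO external_requests "
        "(url, requested_at, request_headers, request_body, metadata) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            url,
            datetime.now(timezone.utc).isoformat(),
            json.dumps(headers),
            body,
            json.dumps(metadata or {}),
        ),
    )
    conn.commit()  # committed now, so in-flight requests can be queried
    row_id = cur.lastrowid

    # 2. Make the call. If send() raises or the process dies here, the row
    #    from step 1 remains, with its response columns still NULL.
    started = time.monotonic()
    status, resp_headers, resp_body = send(url, body, headers)
    duration_ms = int((time.monotonic() - started) * 1000)

    # 3. Go back and fill in the response fields on the same row.
    conn.execute(
        "UPDATE external_requests SET response_status = ?, "
        "response_headers = ?, response_body = ?, duration_ms = ? "
        "WHERE id = ?",
        (status, json.dumps(resp_headers), resp_body, duration_ms, row_id),
    )
    conn.commit()
    return status, resp_headers, resp_body
```

With this shape, requests that are in flight or that never finished can be found with a query like WHERE response_status IS NULL.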

Security/Privacy

There are obviously security and privacy considerations when recording external requests. One basic approach is to filter out the parts of the bodies and headers that you don’t want stored, such as authentication headers, PII (Personally Identifiable Information), and other sensitive information. I like replacing this information with something like [REDACTED] rather than just removing it so it’s clear that a value was present.

Some filtering can be global (redact all authentication headers, such as Authorization) and some can be request specific (for this request, redact password).
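A small sketch of how global and request-specific filtering might fit together. The function names and the set of globally redacted headers are assumptions for illustration, and the body filter only handles JSON; anything else passes through unchanged here, which a real system would need to treat more carefully.

```python
import json

REDACTED = "[REDACTED]"

# Assumed global list: headers that are redacted on every request.
GLOBAL_HEADERS = {"authorization", "cookie", "set-cookie"}


def redact_headers(headers, extra=()):
    """Redact global headers plus any request-specific ones in `extra`."""
    blocked = GLOBAL_HEADERS | {name.lower() for name in extra}
    return {
        name: (REDACTED if name.lower() in blocked else value)
        for name, value in headers.items()
    }


def redact_json_body(body, fields):
    """Replace the values of the named keys anywhere in a JSON body."""

    def walk(node):
        if isinstance(node, dict):
            return {
                key: (REDACTED if key in fields else walk(value))
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    try:
        return json.dumps(walk(json.loads(body)))
    except (TypeError, ValueError):
        # Not JSON: returned unchanged in this sketch.
        return body
```

For example, redact_json_body(body, {"password"}) keeps the password key in the stored body but replaces its value, so it stays clear that a value was present.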

I also recommend adding a lifecycle to these tables. For example, delete or redact all data after 2 weeks. That way, if something does creep in unexpectedly, it won’t last very long. Most of the debugging value is in recent data. And pruning old data also keeps the table size in check.
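The lifecycle can be as simple as a scheduled job that deletes old rows. A minimal sketch, assuming a hypothetical external_requests table whose requested_at column always holds UTC ISO 8601 strings (so plain string comparison orders them correctly):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # the "2 weeks" from the text above


def prune_old_requests(conn, now=None):
    """Delete rows older than the retention window; return how many."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - RETENTION).isoformat()
    # Relies on requested_at being stored as UTC ISO 8601 text.
    cur = conn.execute(
        "DELETE FROM external_requests WHERE requested_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount
```

If you would rather keep the rows for aggregate statistics, the same job could instead overwrite the bodies and headers with [REDACTED] and leave the status and timing columns in place.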
