Logged Exceptions contain sensitive data #2764

tomsoliver · 2023-07-07T13:13:18Z

Most exceptions generated by this package inherit from HTTPError. This class holds the associated request as a property. When an exception is thrown, its details are logged. With structured logging, the exception is serialized along with its properties, so the request object is also written to the log, leading to the following log structure:

{
  "message": "...",
  "timestamp": "....",
  "exception": {
    "request": {...}
  }
}

The request object contains the request body and headers, both of which can hold sensitive data that poses a security or compliance risk: for example, authorization or api-key headers, or personal information, card details, or internal proprietary data in the request body.
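To make the leak concrete, here is a minimal sketch that uses only the standard library. `FakeRequest`, `FakeHTTPError` and `naive_serialize` are hypothetical stand-ins invented for illustration; they are not httpx classes, and the serializer is only an assumption about how a structured logger might walk an exception's attributes.

```python
import json


class FakeRequest:
    """Hypothetical stand-in for a request object; not the httpx class."""

    def __init__(self, method, url, headers, body):
        self.method = method
        self.url = url
        self.headers = headers
        self.body = body


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an exception that keeps its request."""

    def __init__(self, message, request):
        super().__init__(message)
        self.request = request


def naive_serialize(value):
    """Recursively turn an object into JSON-friendly data via its attributes."""
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    if isinstance(value, dict):
        return {str(k): naive_serialize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [naive_serialize(v) for v in value]
    # Assumption: the logger dumps every instance attribute it finds.
    return {k: naive_serialize(v) for k, v in vars(value).items()}


request = FakeRequest(
    "POST",
    "https://example.com/pay",
    {"authorization": "Bearer secret-token"},
    "card=4111111111111111",
)
error = FakeHTTPError("request failed", request)

# The header value and the body both end up in the serialized log entry.
log_entry = json.dumps({"message": str(error), "exception": naive_serialize(error)})
print(log_entry)
```

Whether a real logger behaves like this depends on how it discovers attributes, so treat the output as an illustration of the risk rather than a reproduction.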

Generally we can't program for all exception types of all external packages in every application, so we have default exception handling at the root of our applications that logs exceptions in a structured way. Would it be possible to clean up these exceptions, even though I know it could be a breaking change? One suggestion: we could expose the request through a method instead of a property. The property could then become request details that are safe to log.
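A rough sketch of that suggestion, purely for discussion. `SaferHTTPError`, `get_request`, `request_details` and `public_attributes` are hypothetical names, not httpx API, and the sketch assumes a serializer that only logs public, non-underscore attributes.

```python
class SaferHTTPError(Exception):
    """Hypothetical exception shape; not how httpx is implemented."""

    def __init__(self, message, request):
        super().__init__(message)
        # The full request stays reachable, but only through a method.
        self._request = request
        # Only details assumed safe to log are exposed as a public attribute.
        self.request_details = {
            "method": request["method"],
            "url": request["url"],
        }

    def get_request(self):
        """Return the full request for debugging."""
        return self._request


def public_attributes(exc):
    """Assumed serializer behaviour: log public instance attributes only."""
    return {k: v for k, v in vars(exc).items() if not k.startswith("_")}


request = {
    "method": "GET",
    "url": "https://example.com/items",
    "headers": {"authorization": "Bearer secret-token"},
}
error = SaferHTTPError("request failed", request)

logged = public_attributes(error)
print(logged)
```

A serializer that also walks private attributes would still see `_request`, so this only helps under the stated assumption. URLs can carry secrets too, so even the "safe" details may need trimming.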

I'd also be happy to help implement any changes.

lovelydinosaur · 2023-07-10T08:43:50Z

Maintainer

Hey @tomsoliver thanks for raising this. I've nudged it over into a discussion as our starting point for design decisions.

Let's work through it... 💪

Could we start with the simplest possible example code so we've got a concrete case to work from? (I find that helps keep things focused)

0 replies

trp-thomas-oliver · 2023-07-10T09:29:11Z

Hi @tomchristie,

Thanks for getting back to me. Imagine an API call:

import my_structured_logger as logger
import httpx

async def call_dependency(request, call_next):
    try:
        async with httpx.AsyncClient(...) as client:
            return await client.get(....)
    except Exception as ex:
        logger.error("an error occurred", exc_info=True)

The structured logging will create a structured log like the following:

{
  "message": "...",
  "timestamp": "....",
  "exception": {
      ...
  }
}

However, when the exception inherits from httpx.HTTPError, it includes the request object as a property. Structured logging by default generally logs the properties of objects, so the HTTP error will be serialized into something like:

{
  "message": "HTTPError occurred",
  "traceback": "....",
  "request": {
    "text": "Possible sensitive request body",
    "headers": [ "possibly sensitive header"... ]
  }
}

We follow this structured logging pattern because we can't keep track of every exception type for every dependency we follow. We want to make sure when exceptions happen, we have the correct details available, and structured exceptions within structured logs are very useful ways to do that. Is there a way we could remove or obfuscate sensitive information from exceptions thrown by httpx? Perhaps exposing it via a method from the exceptions as opposed to a property would mean it is still available for debugging but not exposed for serialization?
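Until something changes in the library, one application-side option is an allowlist at the point of serialization. The sketch below is duck-typed, so it does not need to know the exception type in advance. `summarize_exception` and `strip_url` are hypothetical helpers, and the choice to keep only the method and a trimmed URL is an assumption about what counts as safe.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit, urlunsplit


def strip_url(url):
    """Drop userinfo, query string and fragment, which may carry secrets."""
    parts = urlsplit(str(url))
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def summarize_exception(exc):
    """Build a log-safe summary from an allowlist of fields."""
    summary = {"type": type(exc).__name__, "message": str(exc)}
    try:
        # Attribute access on third-party exceptions may itself raise.
        request = exc.request
    except Exception:
        request = None
    if request is not None:
        # Headers and body are deliberately left out.
        summary["request"] = {
            "method": str(getattr(request, "method", "")),
            "url": strip_url(getattr(request, "url", "")),
        }
    return summary


# Stand-in exception carrying a request-like object (not a real httpx error).
exc = Exception("boom")
exc.request = SimpleNamespace(
    method="GET",
    url="https://user:pw@example.com/pay?token=abc",
    headers={"authorization": "Bearer secret-token"},
)
print(summarize_exception(exc))
```

This keeps the full request on the exception for interactive debugging while the log only receives the allowlisted fields.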

3 replies

lovelydinosaur Jul 10, 2023
Maintainer

Okay here's a simpler example...

import my_structured_logger as logger
import httpx

try:
    httpx.get("https://ejsfbksjdfkjdsnfkjnsdfkjnkjiuhokjh")
except Exception as exc:
    logger.error("an error occurred", exc_info=True)

It's somewhat ambiguous what a "structured logger" means here, so if there's a specific real-world case that we can work with then that would be valuable?...

lovelydinosaur Jul 11, 2023
Maintainer

What this kind of question usually comes down to is... "can we ensure that instance __repr__ cases are sensible & careful about what they display".

>>> import httpx
>>>
>>> h = httpx.Headers({"Authorization": "a"})
>>> h
Headers({'authorization': '[secure]'})
>>>
>>> u = httpx.URL("https://username:password@example.com/")
>>> u
URL('https://username:[secure]@example.com/')
>>>
>>> r = httpx.Response(200, content=b"abc")
>>> r
<Response [200 OK]>
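For readers who want to see how a careful `__repr__` of this kind can be written, here is a standalone sketch. `MaskedHeaders` and its `SENSITIVE` set are hypothetical and are not taken from the httpx source; they only mimic the masking shown in the session above.

```python
class MaskedHeaders:
    """Hypothetical header container whose repr hides sensitive values."""

    # Assumed list of header names to mask; extend it as needed.
    SENSITIVE = {"authorization", "proxy-authorization"}

    def __init__(self, headers):
        # Store names lowercased so lookups are case-insensitive.
        self._headers = {k.lower(): v for k, v in headers.items()}

    def __getitem__(self, name):
        # Real values stay available to code that asks for them explicitly.
        return self._headers[name.lower()]

    def __repr__(self):
        shown = {
            k: "[secure]" if k in self.SENSITIVE else v
            for k, v in self._headers.items()
        }
        return f"{type(self).__name__}({shown!r})"


h = MaskedHeaders({"Authorization": "a", "Accept": "*/*"})
print(repr(h))
```

This only protects loggers that call `repr()` on the object. A logger that walks instance attributes would bypass it.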

zanieb Jul 11, 2023
Collaborator

What does your structured logger library use to determine which object "properties" should be included in the log?
