========
Scrubber
========

.. contents::
   :local:


Quickstart
==========

1. Create a :py:class:`fillmore.scrubber.Scrubber` instance.
2. Pass it a list of :py:class:`fillmore.scrubber.Rule` instances specifying
   what to scrub and how to scrub it.

For example, let's say you want to remove the ``Auth-Token`` and
``X-Forwarded-For`` headers, the ``code`` and ``state`` cookies, and any
frame-local variables named ``username``.

Example:

.. [[[cog
   import cog
   cog.outl("\n.. code-block:: python\n")
   with open("examples/scrubber/webapp_scrubber.py", "r") as fp:
       for line in fp:
           cog.out(f"   {line}")
   cog.outl("")
   ]]]

.. code-block:: python

   # examples/scrubber/webapp_scrubber.py

   from fillmore.scrubber import Scrubber, Rule, build_scrub_cookies


   # Create a Scrubber
   scrubber = Scrubber(
       rules=[
           Rule(
               path="request",
               keys=["cookies"],
               # build_scrub_cookies builds a scrub function that handles the
               # different possible shapes of the value, scrubs the specified
               # bits, and returns the same shape
               scrub=build_scrub_cookies(params=["code", "state"]),
           ),
           Rule(
               path="request.headers",
               keys=["Auth-Token", "X-Forwarded-For"],
               # You can specify scrub functions as functions or Python dotted
               # paths
               scrub="scrub",
           ),
           Rule(
               path="exception.values.[].stacktrace.frames.[].vars",
               keys=["username"],
               scrub="scrub",
           ),
       ],
   )

.. [[[end]]]

Things to know about scrubbing:

1. The Scrubber can take any number of rules.
2. Rules are executed in order.
3. If a rule specifies data that doesn't exist in the Sentry event, the rule
   isn't run.
4. Anything scrubbed by Fillmore scrub functions has the value
   ``[Scrubbed]``. You can distinguish this from values scrubbed by
   sentry-sdk or the Sentry server, which use ``[Filtered]``.

.. Note::

   If an error occurs while traversing the Sentry event to find the data to
   scrub, or while running a scrub function, Fillmore logs the exception to
   the ``fillmore.scrubber`` logger. Make sure to set up the ``fillmore``
   logger and set its level to ``logging.ERROR`` when configuring Python
   logging.


How do I know what data to scrub?
=================================

Sentry maintains documentation on the event payload as well as a schema.


Capturing Sentry event payloads
-------------------------------

You can point your application at a "fake Sentry" server such as Kent, which
captures Sentry events so you can see exactly what data is being sent and
where in the payload it appears.


Sentry event schema
-------------------

The schema for Sentry events is here:
https://github.com/getsentry/sentry-data-schemas/blob/main/relay/event.schema.json

You can validate Sentry event data against it.


Sentry interface docs
---------------------

Here are some interesting sections of the Sentry event:


Breadcrumbs interface
~~~~~~~~~~~~~~~~~~~~~

https://develop.sentry.dev/sdk/event-payloads/breadcrumbs/

Breadcrumbs are added by Sentry integrations, which record interesting things
that happened before the Sentry event. To cut down on breadcrumbs, it's best
not to include the relevant integrations.

Fillmore lets you scrub breadcrumbs when a Sentry event is being sent, but you
might want to scrub breadcrumbs as they're captured by using a
``before_breadcrumb`` function.

https://docs.sentry.io/platforms/python/configuration/options/#before-breadcrumb

Breadcrumbs tend to be free-form, so Fillmore doesn't have a good scrubber for
them: it scrubs either the whole value or none of it. You'll either want to
write your own scrub function that does what you need, or write a
``before_breadcrumb`` function that fixes the breadcrumbs as they're captured.


Contexts interface
~~~~~~~~~~~~~~~~~~

https://develop.sentry.dev/sdk/event-payloads/contexts/

This provides additional data about the environment the error happened in:
device, operating system, browser, GPU, etc.
If one of the integrations you're using fills in some state context, check it
for data that should be scrubbed.


Exception interface
~~~~~~~~~~~~~~~~~~~

Exception data: https://develop.sentry.dev/sdk/event-payloads/exception/

Stack trace data: https://develop.sentry.dev/sdk/event-payloads/stacktrace/

When Sentry captures an unhandled exception, the exception information goes
in this interface. It can have multiple stack traces, each of which consists
of a stack of frames and related information.

If your application handles sensitive data that can't go to a Sentry server,
make sure to turn off frame-local variables (in sentry-sdk 2.x, this option is
named ``include_local_variables``)::

   with_locals=False

Otherwise, each frame can include variable names and values, and it's really
hard to scrub them effectively.


Requests interface (for webapps)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

https://develop.sentry.dev/sdk/event-payloads/request/

Some things to know:

1. Different web frameworks capture the query string and cookies
   differently, and both can end up in multiple parts of the event.

   cookies
      This is stored in ``request.cookies`` as a string, a list of
      ``(name, value)`` tuples, or a dict.

      It can also show up in ``request.headers.Cookie`` as a string.

      Depending on the integrations used, if you specify::

         send_default_pii=False

      then the cookie data may be an **empty string** regardless of whether
      there is cookie data or not.

   query string
      This is stored in ``request.query_string`` as a string, a list of
      ``(name, value)`` tuples, or a dict.

      It can also show up as part of the ``request.url`` string and in the
      repr of request objects in the stack frames' local variables.

2. Request data is in ``request.data`` and may contain anything being
   submitted or uploaded.

   If users are submitting forms or uploading sensitive data, consider
   setting the following (named ``max_request_body_size`` in sentry-sdk
   2.x)::

      request_bodies="never"

   which keeps the request data out of the Sentry event.

   If you want to scrub it instead, you'll need to handle the fact that it
   could be bytes or a structured format, depending on the integrations you
   have installed.

3. Request headers can include tokens, session information, and information
   about your infrastructure.

   If you set::

      send_default_pii=False

   then many of these headers are not added to the Sentry event. See the
   documentation (and possibly the code) for the integrations you're using.


How do I debug scrubbing problems?
==================================

If the scrubbing code raises exceptions, Fillmore logs them to the
``fillmore.scrubber`` logger. Make sure to set up Python logging and set its
parent, the ``fillmore`` logger, to ``logging.ERROR``:

.. [[[cog
   import cog
   cog.outl("\n.. code-block::\n")
   with open("examples/scrubber/fillmore_logging.py", "r") as fp:
       for line in fp:
           cog.out(f"   {line}")
   cog.outl("")
   ]]]

.. code-block::

   # examples/scrubber/fillmore_logging.py

   import logging

   logging.getLogger("fillmore").setLevel(logging.ERROR)

.. [[[end]]]


How does it work?
=================

The Python sentry-sdk generates Sentry events. Before sending an event, it
passes the event to the function specified as the ``before_send`` handler when
initializing Sentry. The ``before_send`` handler takes the Sentry event and a
hint as arguments.

The Fillmore Scrubber runs a series of scrub rules on the event, producing an
event with the specified data scrubbed. The sentry-sdk then sends this
scrubbed event to the Sentry server.

.. seealso::

   Filtering in sentry-sdk docs:
   https://docs.sentry.io/platforms/python/configuration/filtering/

   Scrubbing data in sentry-sdk docs:
   https://docs.sentry.io/platforms/python/data-management/sensitive-data/
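
To illustrate the ``before_send`` mechanism described above, here is a
simplified, hypothetical sketch of a scrubbing handler in plain Python. It is
not Fillmore's implementation and doesn't use Fillmore's API: ``SimpleRule``,
``make_before_send``, and ``_find_dicts`` are illustrative names invented for
this sketch. It assumes that ``[]`` in a path means "every item in this list",
as in the Quickstart rule paths. Like Fillmore, it runs rules in order, skips
rules whose path doesn't exist in the event, and replaces scrubbed values
with ``[Scrubbed]``.

.. code-block:: python

   # A simplified, hypothetical before_send-style scrubber. This is NOT
   # Fillmore's implementation; SimpleRule, make_before_send, and _find_dicts
   # are illustrative names invented for this sketch.
   from dataclasses import dataclass
   from typing import Any, Callable, Dict, List, Optional

   SCRUBBED = "[Scrubbed]"

   Event = Dict[str, Any]
   Handler = Callable[[Event, Dict[str, Any]], Optional[Event]]


   @dataclass
   class SimpleRule:
       path: str  # dotted path; "[]" is assumed to mean "every list item"
       keys: List[str]  # keys to replace in each dict found at the path


   def _find_dicts(node: Any, parts: List[str]) -> List[dict]:
       """Return every dict reachable by following the path parts."""
       if not parts:
           return [node] if isinstance(node, dict) else []
       head, rest = parts[0], parts[1:]
       if head == "[]":
           if not isinstance(node, list):
               return []
           found: List[dict] = []
           for item in node:
               found.extend(_find_dicts(item, rest))
           return found
       if isinstance(node, dict) and head in node:
           return _find_dicts(node[head], rest)
       # The path doesn't exist in this event, so the rule does nothing.
       return []


   def make_before_send(rules: List[SimpleRule]) -> Handler:
       def before_send(event: Event, hint: Dict[str, Any]) -> Optional[Event]:
           # hint is unused in this sketch. Rules run in order, and the
           # event is modified in place.
           for rule in rules:
               for target in _find_dicts(event, rule.path.split(".")):
                   for key in rule.keys:
                       if key in target:
                           target[key] = SCRUBBED
           # Returning the event lets sentry-sdk send it.
           return event

       return before_send


   # Usage with a hand-written, minimal event (real events have many more
   # fields).
   before_send = make_before_send([
       SimpleRule(path="request.headers", keys=["Auth-Token"]),
       SimpleRule(
           path="exception.values.[].stacktrace.frames.[].vars",
           keys=["password"],
       ),
   ])

   event = {
       "request": {"headers": {"Auth-Token": "abc123", "Host": "example.com"}},
       "exception": {
           "values": [
               {"stacktrace": {"frames": [{"vars": {"password": "'hunter2'", "n": "1"}}]}}
           ]
       },
   }
   scrubbed = before_send(event, {})
   print(scrubbed["request"]["headers"])
   # {'Auth-Token': '[Scrubbed]', 'Host': 'example.com'}

In an application, a handler like this would be passed to
``sentry_sdk.init(before_send=...)``. Returning ``None`` instead of the event
tells sentry-sdk to drop the event entirely. Fillmore's rules go further than
this sketch: they take configurable scrub functions and, as with
``build_scrub_cookies``, can handle the different shapes a value may have.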