Scrubber
Quickstart
Create a
fillmore.scrubber.Scrubber
instance.Pass it a set of
fillmore.scrubber.Rule
instances specifying things to scrub and how to scrub them.
For example, lets say you wanted to remove the Auth-Token
and
X-Forwarded-For
headers, code
and state
cookies, and also any
frame-local variables with the name password
.
Example:
from fillmore.scrubber import Scrubber, Rule, build_scrub_cookies
# Create a Scrubber
scrubber = Scrubber(
rules=[
Rule(
path="request",
keys=["cookies"],
# build_scrub_cookies builds a scrub function that handles the
# different possible shapes of the value, scrubs the specified
# bits, and returns the same shape
scrub=build_scrub_cookies(params=["code", "state"]),
),
Rule(
path="request.headers",
keys=["Auth-Token", "X-Forwarded-For"],
# You can specify scrub functions as functions or Python dotted
# paths
scrub="scrub",
),
Rule(
path="exception.values.[].stacktrace.frames.[].vars",
keys=["username"],
scrub="scrub",
),
],
)
Things to know about scrubbing:
The Scrubber can take any number of rules.
Rules are executed in order.
If the rule specifies data that doesn’t exist in the Sentry event, then the rule won’t be run.
Anything scrubbed by Fillmore scrub functions has the value
[Scrubbed]
. You can distinguish this from things scrubbed by sentry_sdk or Sentry server which use[Filtered]
.
Note
If traversing the Sentry event for the data to be scrubbed or the scrub rule
kicks up an error, Fillmore will log the exception to the
fillmore.scrubber
logger. Make sure to set up the fillmore
logger
and set the level to logging.ERROR
when setting up Python logging.
How do I know what data to scrub?
Sentry maintains documentation on the event payload as well as a schema.
Capturing Sentry event payloads
You can set your application up to send data to a “fake sentry” like Kent and capture Sentry events to know exactly what data is getting sent and where in the payload it is.
Sentry event schema
The schema for Sentry events is here:
https://github.com/getsentry/sentry-data-schemas/blob/main/relay/event.schema.json
You can validate Sentry event data using that.
Sentry interface docs
Here are some interesting sections of the Sentry event:
Contexts interface
https://develop.sentry.dev/sdk/event-payloads/contexts/
This provides additional data about the environment the error happened in. Device, operating system, browser, gpu, etc.
If one of the integrations you’re using fills in some state context, that might be something to look into for scrubbing.
Exception interface
Exception data:
https://develop.sentry.dev/sdk/event-payloads/exception/
Stack trace data:
https://develop.sentry.dev/sdk/event-payloads/stacktrace/
When Sentry captures unhandled exceptions, the exception information goes in this interface. It can have multiple stacktraces each of which consists of a stack of frames and related information.
If your application handles sensitive data that can’t go to a Sentry server, then you should make sure to shut off frame-local vars:
with_locals=False
Otherwise, each frame can include variable names and values and it’s really hard to scrub that effectively.
Requests interface (for webapps)
https://develop.sentry.dev/sdk/event-payloads/request/
Some things to know:
Different web frameworks capture the query string and cookies differently plus those two things can end up in multiple parts of the event.
- cookies
This is stored in
request.cookies
as a string, a list of(name, value)
tuples, or a dict.It can also show up in
request.headers.Cookie
as a string.Depending on the integrations used, if you specify:
send_default_pii=False
then the cookie data may be an empty string regardless of whether there is cookie data or not.
- query string
This is stored in
request.query_string
as a string, a list of(name, value)
tuples, or a dict.It can also show up as a string in the
request.url
field value and in the repr of request objects in the stacktrace frames local-vars.
Request data is in
request.data
and may contain anything being submitted or uploaded.If users are submitting forms or uploading sensitive data, you might want to consider setting:
request_bodies="never"
which will prevent the request data from being in the Sentry event.
If you want to scrub it, you’ll need to handle the fact that it could be bytes or a structured format depending on the integrations you have installed.
Request headers can include tokens, session information, and also information about your infrastructure.
If you set:
send_default_pii=False
then many of these headers are not added to the Sentry event. See the documentation (and possibly the code) for the integrations you’re using.
How do I debug Scrubbing problems?
If the scrubbing code is kicking up exceptions, then Fillmore will log
exceptions to the fillmore
logger. Make sure to set up Python logging
and set the fillmore
logger to logging.ERROR
:
import logging
logging.getLogger("fillmore").setLevel(logging.ERROR)
How does it work?
The Python sentry-sdk generates Sentry events. Before sending the events, it
passes the event to the function specified as the before_send
handler
when initializing Sentry.
The before_send
handler takes the Sentry event and a hint as arguments.
The Fillmore Scrubber runs a series of Scrub Rules on the event producing an event with specified data scrubbed.
The sentry-sdk then sends this scrubbed event to the Sentry server.
See also
- Filtering in sentry-sdk docs:
https://docs.sentry.io/platforms/python/configuration/filtering/
- Scrubbing data in sentry-sdk docs:
https://docs.sentry.io/platforms/python/data-management/sensitive-data/