DataCater

Filtering out records

Pipelines can apply transforms and filters to records.
If you specify a filter without combining it with a transform, records that do not pass the filter are dropped and are not published to the sink stream.
If you combine a filter with a transform, the transform is only applied to those records that pass the filter.
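The two cases above can be modeled with a small Python sketch. This is not DataCater's implementation: `apply_step`, `filter_fn`, and `transform_fn` are hypothetical names, the functions operate on whole records rather than on single fields, and the sketch assumes that a record failing the filter passes through unchanged when a transform is present.

```python
from typing import Callable, Optional


def apply_step(
    records: list,
    filter_fn: Optional[Callable[[dict], bool]] = None,
    transform_fn: Optional[Callable[[dict], dict]] = None,
) -> list:
    """Hypothetical model of a pipeline step with an optional filter and transform."""
    output = []
    for record in records:
        # A step without a filter lets every record pass.
        passes = filter_fn is None or filter_fn(record)
        if transform_fn is None:
            # Filter only: records that fail are not published.
            if passes:
                output.append(record)
        else:
            # Filter plus transform: only passing records are transformed.
            # Assumption: failing records are forwarded unchanged.
            output.append(transform_fn(record) if passes else record)
    return output


records = [{"n": 1}, {"n": 2}, {"n": 3}]
is_even = lambda r: r["n"] % 2 == 0
double = lambda r: {"n": r["n"] * 2}

filter_only = apply_step(records, filter_fn=is_even)
filter_and_transform = apply_step(records, filter_fn=is_even, transform_fn=double)
```

With these sample records, `filter_only` keeps just the even record, while `filter_and_transform` keeps all three records and doubles only the even one.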
Let us have a look at the following spec of a pipeline that applies a custom Python filter to the field author, which holds JSON objects:
spec:
  steps:
    - kind: Field
      fields:
        author:
          filter:
            key: user-defined-filter
            config:
              code: |-
                def filter(field, record: dict) -> bool:
                  return field["email"].endswith("@apache.org")
The Python filter checks whether the value of the key email in the JSON object ends with the suffix @apache.org. Records that do not pass this filter are filtered out.
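To see which records such a filter keeps, the function can be tried locally on a few sample records before it goes into a pipeline spec. The following is a minimal sketch: the sample records and the list comprehension are illustrative assumptions and do not reflect how DataCater invokes the function internally.

```python
# Same function as in the spec above. The name shadows Python's built-in
# filter(), which is harmless in this small script.
def filter(field, record: dict) -> bool:
    return field["email"].endswith("@apache.org")


# Hypothetical sample records; the field author holds a JSON object.
records = [
    {"title": "A", "author": {"name": "Ada", "email": "ada@apache.org"}},
    {"title": "B", "author": {"name": "Bob", "email": "bob@example.com"}},
]

# Pass the value of the author field plus the whole record, mirroring the
# (field, record) signature of the function.
kept = [r for r in records if filter(r["author"], r)]
kept_titles = [r["title"] for r in kept]
print(kept_titles)
```

Note that the function raises a `KeyError` if an author object has no email key, so input data with optional keys may call for an explicit check.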

Previewing filters

By default, the Pipeline Designer does not show records that are filtered out:
[Figure: Applying a custom Python function to filter out records.]
If you are interested in viewing records that do not pass the filters, you can open the Preview Settings and set the option Filtered out records to Show. In this mode, the Pipeline Designer shows records that are filtered out by the previewed step in gray:
[Figure: The Pipeline Designer allows you to show records that are filtered out by the previewed step.]