Filtering out records
Pipelines can apply transforms and filters to records.
If you specify a filter on its own, without a transform, records that do not pass the filter are dropped and are not published to the sink stream.
If you combine a filter with a transform, the transform is applied only to the records that pass the filter.
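The filter-only rule can be illustrated with a minimal plain-Python sketch that runs outside any pipeline runtime. The names `apply_filter` and `Record` and the "status" field are hypothetical. The sketch models only the case where records failing the filter never reach the sink. It deliberately leaves out the filter-plus-transform case, because the text does not say what happens to records that fail the filter there.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical alias: a record is a plain dict of field name -> value.
Record = dict


def apply_filter(
    records: Iterable[Record], predicate: Callable[[Record], bool]
) -> Iterator[Record]:
    """Yield only the records for which predicate returns True.

    Records that fail the predicate are skipped, mirroring how a filter
    without a transform keeps them out of the sink stream.
    """
    for record in records:
        if predicate(record):
            yield record


# Usage: keep only records whose hypothetical "status" field is "ok".
records = [{"status": "ok"}, {"status": "error"}, {"status": "ok"}]
sink = list(apply_filter(records, lambda r: r.get("status") == "ok"))
print(sink)  # [{'status': 'ok'}, {'status': 'ok'}]
```

Because `apply_filter` is a generator, records are evaluated one at a time, which is closer to how a streaming pipeline processes them than building a filtered list up front.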
Consider the following spec of a pipeline that applies a custom Python filter to the field author, which holds JSON objects:
- kind: Field
def filter(field, record: dict) -> bool:
The Python filter checks if the key … @apache.org. Records that do not pass this filter are filtered out.
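The key name in the filter description above is missing, so the following is only a hedged sketch of what a body for the `filter(field, record: dict) -> bool` signature from the spec could look like. It assumes that `field` receives the author value as an already parsed JSON object (a dict), and it uses a hypothetical key "email" as a stand-in for the key the original spec checks. The function keeps the name `filter` to match the spec's signature, which shadows Python's built-in `filter` within this module.

```python
def filter(field, record: dict) -> bool:
    """Return True if the author's hypothetical "email" key ends with "@apache.org".

    Assumptions: `field` is the parsed JSON object stored in the "author"
    field, and "email" stands in for the key checked by the original spec.
    `record` is accepted to match the spec's signature but is not used here.
    """
    if not isinstance(field, dict):
        # Not a JSON object: treat the record as not passing the filter.
        return False
    email = field.get("email")
    return isinstance(email, str) and email.endswith("@apache.org")


# Usage with made-up author objects:
print(filter({"name": "Ada", "email": "ada@apache.org"}, {}))   # True
print(filter({"name": "Bob", "email": "bob@example.com"}, {}))  # False
```

Returning False for missing keys or non-string values means malformed author objects are filtered out instead of raising an error inside the pipeline.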
By default, the Pipeline Designer does not show records that are filtered out:
Applying a custom Python function to filter out records.
To view records that do not pass the filters, open the Preview Settings and set the option "Filtered out records" to "Show". In this mode, the Pipeline Designer shows records that are filtered out by the previewed step in gray:
The Pipeline Designer allows you to show records that are filtered out by the previewed step.