Writing custom transforms
You can extend the list of pre-built transforms and add your own custom transform to DataCater.
At startup time, DataCater loads all transform definitions from the directory configured in `datacater.transforms.path`. You can extend the set of available transforms by adding a new folder to this directory. DataCater expects two files in this folder: `transform.py` and `spec.yml`.

Note that, at the moment, you need to build your own Docker images for `datacater/datacater` and `datacater/python-runner` after adding new transforms.
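As a rough illustration of the expected layout, the following sketch lists the subfolders of a transforms directory that contain both required files. It is not DataCater's actual loader. The helper name `find_transform_dirs` is hypothetical, and the path you pass in stands for whatever directory `datacater.transforms.path` points to.

```python
from pathlib import Path

# The two files DataCater expects in each transform folder.
REQUIRED_FILES = ("transform.py", "spec.yml")


def find_transform_dirs(transforms_path):
    """Return the names of subfolders that contain both transform.py and spec.yml.

    Hypothetical pre-flight check to run before building the Docker images;
    it does not reproduce how DataCater itself loads transforms.
    """
    root = Path(transforms_path)
    found = []
    # Sort for a stable, predictable order of results.
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if all((folder / name).is_file() for name in REQUIRED_FILES):
            found.append(folder.name)
    return found
```

Folders that are missing one of the two files are simply skipped, which makes it easy to spot a transform that would not be picked up.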
The file `transform.py` must provide a Python function `transform` that takes three parameters, `value`, `row`, and `config`, and returns the transformed value:

- `value` is the value of the attribute that the transform is applied to. The data type of `value` depends on the attribute.
- `row` is a Python dict that provides access to all other attributes of the record. You can address these attributes by their name.
- `config` is a Python dict that provides access to the configuration of the transform.
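To illustrate how `row` and `config` work together, here is a hypothetical transform (not one shipped with DataCater) that appends another attribute of the same record to `value`. The config keys `attribute` and `separator` are assumptions made for this example; a real transform would describe its options under `config` in its `spec.yml`.

```python
def transform(value, row, config):
    """Hypothetical transform: join this attribute with another attribute of the record.

    Assumes config["attribute"] names the other attribute and config["separator"]
    optionally sets the separator (default: a single space).
    """
    # Look up the other attribute by name in the record.
    other = row.get(config["attribute"])
    if value is None or other is None:
        # Leave the value unchanged if either side is missing.
        return value
    separator = config.get("separator", " ")
    return f"{value}{separator}{other}"
```

For a record with `first_name` set to `Jane` and `last_name` set to `Doe`, applying this transform to `first_name` with `{"attribute": "last_name"}` yields `Jane Doe`.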
The following listing shows the `transform.py` of the transform `tokenize`:

```python
def transform(value, row, config):
    # Split the value on the configured token; default to a single space.
    token = config.get("token", " ")
    return value.split(token)
```
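Because `transform` is a plain Python function, you can exercise it outside DataCater before rebuilding the Docker images. The sketch below repeats the `tokenize` logic and calls it with hand-made `row` and `config` dicts; the sample record is made up for illustration.

```python
def transform(value, row, config):
    # Same logic as the tokenize listing: split on the configured token.
    token = config.get("token", " ")
    return value.split(token)


if __name__ == "__main__":
    row = {"tags": "kafka,python,etl"}  # made-up sample record
    print(transform(row["tags"], row, {"token": ","}))  # ['kafka', 'python', 'etl']
    print(transform("hello world", {}, {}))  # ['hello', 'world']
```

The second call passes an empty `config`, so the transform falls back to splitting on a single space.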
The file `spec.yml` provides documentation for the transform. It supports the following options:

- `name` is the name (or label) of the transform.
- `key` is an internal, unique identifier of the transform.
- `description` describes the transform.
- `license` specifies the license of the transform.
- `website` links to the website or repository of the transform.
- `author` provides information about the author of the transform. It is an object with the keys `name` and `email`.
- `labels` can be used to attach additional information to the transform.
- `config` describes the configuration options available for the transform.
- `version` defines the version of the transform.
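If you want to check a spec before adding it, a small lint over the parsed YAML (a plain dict) can flag missing options. This is a hypothetical helper, not part of DataCater, and it assumes you want every documented option present, which may be stricter than what DataCater itself requires.

```python
# Options documented for spec.yml, in the order listed above.
DOCUMENTED_KEYS = (
    "name", "key", "description", "license", "website",
    "author", "labels", "config", "version",
)


def missing_spec_keys(spec):
    """Return documented spec.yml options that are absent from a parsed spec dict."""
    missing = [key for key in DOCUMENTED_KEYS if key not in spec]
    author = spec.get("author")
    if isinstance(author, dict):
        # author is documented as an object with the keys name and email.
        missing += [f"author.{k}" for k in ("name", "email") if k not in author]
    return missing
```

An empty result means every documented option is present; anything else lists what to add.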
The following listing shows the `spec.yml` of the transform `tokenize`:

```yaml
---
name: Tokenize
key: tokenize
description: Tokenize attribute using given token
license: BSL
website: https://github.com/DataCater/datacater/transforms/tokenize
author:
  email: [email protected]
  name: DataCater GmbH
labels:
  input-types: string
config:
  - name: token
    label: Token
    type: text
version: 1.0.0
```