
Writing custom transforms

You can extend the list of pre-built transforms by adding your own custom transforms to DataCater.
At startup time, DataCater loads all transform definitions from the directory configured in datacater.transforms.path.
To add a transform, create a new folder in this directory. DataCater expects two files in this folder: transform.py and spec.yml.
Note that, at the moment, you need to build your own Docker images for datacater/datacater and datacater/python-runner after adding new transforms.
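
Since a missing file only shows up after the images have been rebuilt, it can help to check the folder layout first. The following is a minimal sketch, not part of DataCater: the helper name find_incomplete_transforms is an assumption made for illustration, and the only rule it encodes is the two-file requirement stated above.

```python
from pathlib import Path

# File names that DataCater expects in every transform folder (per this page).
REQUIRED_FILES = ("transform.py", "spec.yml")


def find_incomplete_transforms(transforms_dir):
    """Map each transform folder to the required files it is missing.

    Hypothetical helper for illustration; DataCater does not ship it.
    """
    missing = {}
    for folder in sorted(Path(transforms_dir).iterdir()):
        if not folder.is_dir():
            continue  # Ignore stray files next to the transform folders.
        absent = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
        if absent:
            missing[folder.name] = absent
    return missing
```

Calling find_incomplete_transforms with the directory configured in datacater.transforms.path returns an empty dict when every folder contains both files.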

transform.py

The file transform.py must provide a Python function transform that takes three parameters, value, row, and config:
  • value is the value of the attribute that the transform is applied to. The data type of value depends on the attribute.
  • row is a Python dict and provides access to all other attributes of the record. You can address these attributes by their name.
  • config is a Python dict and provides access to the configuration of the transform.
Please see the following code listing for the transform.py of the transform tokenize:
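def transform(value, row, config):
    # Use the configured token as the delimiter; default to a single space.
    token = config.get("token", " ")
    # Split the attribute value into a list of tokens.
    return value.split(token)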
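
Because transform is a plain Python function, it can be called directly with sample data before building the images. The sketch below shows a second, hypothetical transform that uses row and config together. The config keys attribute and separator, and the sample attribute names, are assumptions made for illustration and are not part of DataCater.

```python
def transform(value, row, config):
    """Append another attribute of the same record to the current value.

    Hypothetical example: the config keys "attribute" and "separator"
    are assumptions for illustration, not part of DataCater.
    """
    separator = config.get("separator", " ")
    # Look up the other attribute by name; None if it is absent or unset.
    other = row.get(config.get("attribute"))
    # Leave the value untouched if either side is missing.
    if value is None or other is None:
        return value
    return f"{value}{separator}{other}"


# Call the function directly with a sample record to check its behavior.
record = {"first_name": "Ada", "last_name": "Lovelace"}
result = transform(record["first_name"], record, {"attribute": "last_name"})
print(result)  # Ada Lovelace
```

Reading settings with config.get and a default, as the tokenize transform does, keeps the function usable when an option has not been configured.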

spec.yml

The file spec.yml documents the transform. It supports the following options:
  • name is the name (or label) of the transform.
  • key is an internal, unique identifier of the transform.
  • description describes the transform.
  • license specifies the license of the transform.
  • website links to the website or repository of the transform.
  • author provides information about the author of the transform. It is an object and consists of the keys name and email.
  • labels can be used to attach information to the transform.
  • config provides information about the available configuration for the transform.
  • version defines the version of the transform.
Please see the following code listing for the spec.yml of the transform tokenize:
---
name: Tokenize
key: tokenize
description: Tokenize attribute using given token
license: BSL
website: https://github.com/DataCater/datacater/transforms/tokenize
author:
  name: DataCater GmbH
labels:
  input-types: string
config:
  - name: token
    label: Token
    type: text
version: 1.0.0
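
A spec can be sanity-checked against the options listed above once it has been parsed into a Python dict, for example with a YAML library. The following is a minimal sketch: the helper check_spec is hypothetical, and treating name and email as the only allowed author keys is an assumption drawn from the option descriptions above, not a documented rule.

```python
# Top-level options documented for spec.yml on this page.
DOCUMENTED_KEYS = {
    "name", "key", "description", "license", "website",
    "author", "labels", "config", "version",
}


def check_spec(spec):
    """Return a list of human-readable problems found in a parsed spec.

    Hypothetical helper for illustration; DataCater does not ship it.
    """
    problems = []
    for key in sorted(set(spec) - DOCUMENTED_KEYS):
        problems.append(f"unknown option: {key}")
    author = spec.get("author", {})
    if not isinstance(author, dict):
        problems.append("author must be an object")
    else:
        for key in sorted(set(author) - {"name", "email"}):
            problems.append(f"unknown author key: {key}")
    return problems


# The tokenize spec, written as the dict a YAML parser would produce.
tokenize_spec = {
    "name": "Tokenize",
    "key": "tokenize",
    "description": "Tokenize attribute using given token",
    "license": "BSL",
    "website": "https://github.com/DataCater/datacater/transforms/tokenize",
    "author": {"name": "DataCater GmbH"},
    "labels": {"input-types": "string"},
    "config": [{"name": "token", "label": "Token", "type": "text"}],
    "version": "1.0.0",
}
print(check_spec(tokenize_spec))  # []
```

A misspelled option such as descripton would be reported as an unknown option.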