Configuration of DataCater

The container image datacater/datacater implements DataCater's control plane and exposes a RESTful API. It is configured through environment variables, so you can supply values from Kubernetes ConfigMaps and Secrets.
The following sections provide the available configuration options, including a description, their default values if applicable, and information on whether they are mandatory.
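As a rough illustration of supplying these environment variables from plain values, ConfigMaps, and Secrets, the following Python sketch builds the env list of a container spec as plain dictionaries. The helper names, the container name, the ConfigMap name datacater-config, and its key are hypothetical; only the valueFrom / configMapKeyRef / secretKeyRef layout comes from the Kubernetes container spec.

```python
import json


def plain(name, value):
    """Environment variable with a literal value (Kubernetes expects a string)."""
    return {"name": name, "value": str(value)}


def from_config_map(name, config_map, key):
    """Environment variable whose value is read from a ConfigMap key."""
    return {
        "name": name,
        "valueFrom": {"configMapKeyRef": {"name": config_map, "key": key}},
    }


def from_secret(name, secret, key):
    """Environment variable whose value is read from a Secret key."""
    return {
        "name": name,
        "valueFrom": {"secretKeyRef": {"name": secret, "key": key}},
    }


# Hypothetical container spec for the control plane image.
container = {
    "name": "datacater",
    "image": "datacater/datacater",
    "env": [
        plain("DATACATER_DEPLOYMENT_REPLICAS", 2),
        # "datacater-config" and "deployment-namespace" are made-up names.
        from_config_map(
            "DATACATER_DEPLOYMENT_NAMESPACE",
            "datacater-config",
            "deployment-namespace",
        ),
    ],
}

if __name__ == "__main__":
    # Emit the fragment as JSON, which is also valid YAML.
    print(json.dumps(container, indent=2))
```

The from_secret helper is included for completeness; it has the same shape as from_config_map and would be used for any value you prefer to keep in a Secret.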

Deployments

The following configuration options are related to Deployments, which are internally powered by Kubernetes Deployments.

DATACATER_DEPLOYMENT_IMAGE

Container image name and version to use (default: datacater/pipeline:2023.1).

DATACATER_DEPLOYMENT_NAMESPACE

Kubernetes Namespace in which Deployments are created (default: default).

DATACATER_DEPLOYMENT_READY_SECONDS

Minimum ready seconds of the underlying Kubernetes Deployment (default: 2).

DATACATER_DEPLOYMENT_PULL_POLICY

Kubernetes pull policy of the underlying Kubernetes Deployment (default: IfNotPresent).

DATACATER_DEPLOYMENT_REPLICAS

Number of replicas of the underlying Kubernetes Deployment (default: 1). This corresponds to the number of deployment instances and lets you parallelize the processing of your data.

DATACATER_DEPLOYMENT_HEALTH_PATH

Path of the health endpoint of deployments (default: /q/health).

DATACATER_DEPLOYMENT_METRICS_PATH

Path of the metrics endpoint of deployments (default: /q/metrics).

DATACATER_DEPLOYMENT_STATS_TIMEOUT

Timeout in milliseconds for requests to deployment statistics (default: 10000).

DATACATER_DEPLOYMENT_RESOURCES_REQUESTS_MEMORY

Memory request of the underlying Kubernetes Deployment (default: 300Mi).

DATACATER_DEPLOYMENT_RESOURCES_LIMITS_MEMORY

Memory limit of the underlying Kubernetes Deployment (default: 800Mi).

DATACATER_DEPLOYMENT_RESOURCES_REQUESTS_CPU

CPU request of the underlying Kubernetes Deployment (default: 0.1).

DATACATER_DEPLOYMENT_RESOURCES_LIMITS_CPU

CPU limit of the underlying Kubernetes Deployment (default: not set).
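The defaults listed above can be collected in a small lookup table. The following Python sketch is not DataCater's implementation; it only shows how the documented defaults combine with overrides from the environment, and how the resource settings might map onto a Kubernetes-style resources block when no CPU limit is set. The function names are hypothetical.

```python
import os

# Defaults as documented in this section; None means "not set".
DEPLOYMENT_DEFAULTS = {
    "DATACATER_DEPLOYMENT_IMAGE": "datacater/pipeline:2023.1",
    "DATACATER_DEPLOYMENT_NAMESPACE": "default",
    "DATACATER_DEPLOYMENT_READY_SECONDS": "2",
    "DATACATER_DEPLOYMENT_PULL_POLICY": "IfNotPresent",
    "DATACATER_DEPLOYMENT_REPLICAS": "1",
    "DATACATER_DEPLOYMENT_HEALTH_PATH": "/q/health",
    "DATACATER_DEPLOYMENT_METRICS_PATH": "/q/metrics",
    "DATACATER_DEPLOYMENT_STATS_TIMEOUT": "10000",
    "DATACATER_DEPLOYMENT_RESOURCES_REQUESTS_MEMORY": "300Mi",
    "DATACATER_DEPLOYMENT_RESOURCES_LIMITS_MEMORY": "800Mi",
    "DATACATER_DEPLOYMENT_RESOURCES_REQUESTS_CPU": "0.1",
    "DATACATER_DEPLOYMENT_RESOURCES_LIMITS_CPU": None,
}


def resolve_deployment_settings(env=None):
    """Return each option's value from env, falling back to the documented default."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEPLOYMENT_DEFAULTS.items()}


def resources_block(settings):
    """Build a Kubernetes-style resources dict, omitting the CPU limit if unset."""
    limits = {"memory": settings["DATACATER_DEPLOYMENT_RESOURCES_LIMITS_MEMORY"]}
    cpu_limit = settings["DATACATER_DEPLOYMENT_RESOURCES_LIMITS_CPU"]
    if cpu_limit:
        limits["cpu"] = cpu_limit
    return {
        "requests": {
            "memory": settings["DATACATER_DEPLOYMENT_RESOURCES_REQUESTS_MEMORY"],
            "cpu": settings["DATACATER_DEPLOYMENT_RESOURCES_REQUESTS_CPU"],
        },
        "limits": limits,
    }


if __name__ == "__main__":
    # Example: override the replica count and set a CPU limit.
    overrides = {
        "DATACATER_DEPLOYMENT_REPLICAS": "3",
        "DATACATER_DEPLOYMENT_RESOURCES_LIMITS_CPU": "1",
    }
    settings = resolve_deployment_settings(overrides)
    print(settings["DATACATER_DEPLOYMENT_REPLICAS"])
    print(resources_block(settings))
```

Passing an explicit dictionary instead of reading os.environ keeps the sketch easy to try without touching the process environment.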

Python Runner

The following configuration options are related to DataCater's Python Runner, which is used for previewing and evaluating pipelines.

DATACATER_PYTHONRUNNER_IMAGE_NAMESPACE

Name of the Kubernetes Namespace that holds the Python Runner pool and service (default: default).

DATACATER_PYTHONRUNNER_SERVICENAME

Name of the Kubernetes Service that makes the Python Runner pool accessible (default: pythonrunner).

DATACATER_PYTHONRUNNER_PREVIEW_TIMEOUT

Timeout in milliseconds for single requests to the Python Runner pool when previewing pipelines (default: 10000).

DATACATER_PYTHONRUNNER_IMAGE_NAME

Name of the container image (default: datacater/python-runner).

DATACATER_PYTHONRUNNER_IMAGE_VERSION

Version/Tag of the container image (default: 2023.1).
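To make the relationship between these options concrete, the Python sketch below derives an image reference, a cluster-internal service host name, and a timeout in seconds from them. It rests on assumptions: that image name and version are joined as name:version, and that the service is addressed through the usual Kubernetes DNS form service.namespace.svc.cluster.local with the default cluster domain. The function names are hypothetical.

```python
def runner_image(env):
    """Join image name and version as name:version (assumed convention)."""
    name = env.get("DATACATER_PYTHONRUNNER_IMAGE_NAME", "datacater/python-runner")
    version = env.get("DATACATER_PYTHONRUNNER_IMAGE_VERSION", "2023.1")
    return f"{name}:{version}"


def runner_service_host(env, cluster_domain="cluster.local"):
    """Cluster-internal DNS name of the Python Runner service (assumed addressing)."""
    service = env.get("DATACATER_PYTHONRUNNER_SERVICENAME", "pythonrunner")
    namespace = env.get("DATACATER_PYTHONRUNNER_IMAGE_NAMESPACE", "default")
    return f"{service}.{namespace}.svc.{cluster_domain}"


def preview_timeout_seconds(env):
    """Convert the documented millisecond timeout into seconds."""
    millis = int(env.get("DATACATER_PYTHONRUNNER_PREVIEW_TIMEOUT", "10000"))
    return millis / 1000


if __name__ == "__main__":
    env = {}  # no overrides, so the documented defaults apply
    print(runner_image(env))             # datacater/python-runner:2023.1
    print(runner_service_host(env))      # pythonrunner.default.svc.cluster.local
    print(preview_timeout_seconds(env))  # 10.0
```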

Streams

The following configuration options are related to Streams, which are backed by Apache Kafka topics.

DATACATER_KAFKA_DEFAULT_NUM_PARTITIONS

DataCater uses this default value for the number of partitions when creating an Apache Kafka topic for a Stream that does not specify a number of partitions (default: 3).

DATACATER_KAFKA_DEFAULT_REPLICATION_FACTOR

DataCater uses this default value for the replication factor when creating an Apache Kafka topic for a Stream that does not specify a replication factor (default: 1).

DATACATER_KAFKA_API_TIMEOUT_MS

Timeout in milliseconds for all requests to Apache Kafka brokers (default: 5000).
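The fallback behaviour described for partitions and replication factor can be sketched as follows. The shape of the stream specification (a dictionary with the keys num_partitions and replication_factor) and the function name are hypothetical and chosen only for illustration; the variable names and defaults are the ones documented above.

```python
def topic_settings(stream_spec, env):
    """Pick topic settings from the stream spec, else from the configured defaults."""
    default_partitions = int(env.get("DATACATER_KAFKA_DEFAULT_NUM_PARTITIONS", "3"))
    default_replication = int(env.get("DATACATER_KAFKA_DEFAULT_REPLICATION_FACTOR", "1"))
    return {
        # Values given by the stream take precedence over the defaults.
        "num_partitions": stream_spec.get("num_partitions", default_partitions),
        "replication_factor": stream_spec.get("replication_factor", default_replication),
    }


if __name__ == "__main__":
    env = {"DATACATER_KAFKA_DEFAULT_REPLICATION_FACTOR": "3"}
    # Stream without explicit settings: both values come from the defaults.
    print(topic_settings({}, env))
    # Stream that sets its own partition count: only replication falls back.
    print(topic_settings({"num_partitions": 12}, env))
```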

Transforms

The following configuration options are related to Filters and Transforms.

DATACATER_FILTERS_PATH

Path to the local directory holding all filters. By default, filters are located at /datacater/filters in the container image.

DATACATER_TRANSFORMS_PATH

Path to the local directory holding all transforms. By default, transforms are located at /datacater/transforms in the container image.
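A minimal Python sketch of resolving the two directories, assuming the documented in-image locations are used whenever a variable is absent. The function name and the override path /opt/custom/filters are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_paths(env):
    """Return the filter and transform directories, using the documented defaults."""
    return {
        "filters": PurePosixPath(env.get("DATACATER_FILTERS_PATH", "/datacater/filters")),
        "transforms": PurePosixPath(
            env.get("DATACATER_TRANSFORMS_PATH", "/datacater/transforms")
        ),
    }


if __name__ == "__main__":
    # Override only the filters directory; transforms keeps its default.
    paths = resolve_paths({"DATACATER_FILTERS_PATH": "/opt/custom/filters"})
    print(paths["filters"])
    print(paths["transforms"])
```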