DataCater

Google Cloud BigQuery

Use change data capture to stream data from Google Cloud BigQuery tables to any data sink and transform it on the way.

Requirements

Please make sure that you have created a service account in Google Cloud that has been granted the predefined IAM role BigQuery Data Viewer on the level of the dataset and the predefined IAM roles BigQuery Job User and BigQuery Resource Viewer on the level of the project.

Configuration

This source connector supports the following configuration options:
Google Cloud credentials (JSON)
The content of the JSON-based credentials file provided by Google Cloud for the service account. The service account must have been granted the predefined IAM roles BigQuery Data Viewer (dataset level), BigQuery Job User (project level), and BigQuery Resource Viewer (project level).
Service account e-mail address
The e-mail address of the service account. We try to automatically extract the e-mail address from the provided Google Cloud credentials.
Project name
The name of the BigQuery project. We try to automatically extract the name of the BigQuery project from the provided Google Cloud credentials.
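The automatic extraction mentioned above can be pictured with a minimal Python sketch. It assumes the standard layout of a Google Cloud service account key file, which stores the e-mail address under `client_email` and the project under `project_id`. The function name and the sample values are hypothetical, and this is not DataCater's actual implementation.

```python
import json


def extract_service_account_info(credentials_json: str) -> dict:
    """Return the e-mail address and project found in a key file's content."""
    info = json.loads(credentials_json)
    if info.get("type") != "service_account":
        raise ValueError("expected credentials of type 'service_account'")
    return {
        # Either value is None if the key file does not contain the field.
        "email": info.get("client_email"),
        "project": info.get("project_id"),
    }


# Shortened, made-up key file; real key files contain more fields.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "datacater@my-project.iam.gserviceaccount.com",
})
details = extract_service_account_info(sample)
```

If either field is missing from the credentials, the corresponding option has to be filled in manually.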
Dataset name
The name of the BigQuery dataset.
Table name
The name of the BigQuery table (or view). You may retrieve the list of tables (and views) available in the given BigQuery project and dataset by clicking on Fetch table names.
Change Data Capture mode
You may choose one of the following modes for change data capture:
  • BULK: Repeatedly load all data from the BigQuery table. This mode performs no change data capture.
  • INCREMENTING: Use the primary key column, specified using the Primary key column configuration option, to repeatedly extract new records. This mode extracts only INSERTs and skips UPDATEs and DELETEs.
  • TIMESTAMP: Use the timestamp column, specified using the Timestamp column configuration option, to repeatedly extract new and updated records. This mode extracts only INSERTs and UPDATEs and skips DELETEs.
  • TIMESTAMP/INCREMENTING: Use the primary key column and the timestamp column, specified using the Primary key column and Timestamp column configuration options, to repeatedly extract new and updated records. This mode extracts only INSERTs and UPDATEs and skips DELETEs.
Primary key column
BigQuery does not natively support the concept of primary keys. Specifying a column that uniquely identifies a row in BigQuery allows DataCater to detect new records. Please make sure that this column is never NULL.
Timestamp column
DataCater can use a timestamp column, which stores the time of the most recent update of a record, to detect record updates. Specifying the timestamp column is required when using TIMESTAMP or TIMESTAMP/INCREMENTING as the Change Data Capture mode.
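To make the difference between the four Change Data Capture modes concrete, the following Python sketch builds one polling query per mode. It follows a common pattern for query-based change data capture and is an assumption for illustration only. The queries DataCater actually issues are not documented here, and the function name and the `@last_pk` and `@last_ts` query parameters (the highest values seen in the previous run) are hypothetical.

```python
def build_cdc_query(table: str, mode: str, pk_column: str = "", ts_column: str = "") -> str:
    """Build an illustrative polling query for one of the four modes."""
    needs_pk = mode in ("INCREMENTING", "TIMESTAMP/INCREMENTING")
    needs_ts = mode in ("TIMESTAMP", "TIMESTAMP/INCREMENTING")
    if mode != "BULK" and not (needs_pk or needs_ts):
        raise ValueError(f"unknown mode: {mode}")
    if needs_pk and not pk_column:
        raise ValueError(f"{mode} requires a primary key column")
    if needs_ts and not ts_column:
        raise ValueError(f"{mode} requires a timestamp column")

    base = f"SELECT * FROM `{table}`"
    if mode == "BULK":
        # No filter: every run reloads the whole table.
        return base
    if mode == "INCREMENTING":
        # Only rows with a larger key than the last one seen (INSERTs).
        return f"{base} WHERE {pk_column} > @last_pk ORDER BY {pk_column}"
    if mode == "TIMESTAMP":
        # Rows created or updated since the last run (INSERTs and UPDATEs).
        return f"{base} WHERE {ts_column} > @last_ts ORDER BY {ts_column}"
    # TIMESTAMP/INCREMENTING: the key breaks ties between equal timestamps.
    return (
        f"{base} WHERE {ts_column} > @last_ts "
        f"OR ({ts_column} = @last_ts AND {pk_column} > @last_pk) "
        f"ORDER BY {ts_column}, {pk_column}"
    )
```

None of these queries can observe a row that has been removed from the table, which is why every mode skips DELETEs.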
Sync interval
The interval in milliseconds between synchronizations of the BigQuery table and DataCater (default: 60000).
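The effect of the sync interval can be sketched as a simple polling loop in Python. This is an assumption about how such an interval is typically applied, not a description of DataCater's scheduler. The names `run_sync_loop`, `sync_once`, and `max_runs` are hypothetical, and the `sleep` parameter exists only so that the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def run_sync_loop(
    sync_once: Callable[[], None],
    interval_ms: int = 60000,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call sync_once repeatedly, pausing interval_ms between runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        sync_once()
        runs += 1
        # The option is given in milliseconds; time.sleep expects seconds.
        sleep(interval_ms / 1000.0)
    return runs


# Two demonstration runs with a no-op sleep, so the example returns immediately.
completed = run_sync_loop(lambda: None, interval_ms=60000, max_runs=2, sleep=lambda s: None)
```

A shorter interval reduces the delay before changes reach the data sink but issues more BigQuery queries in the same period.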