Google Cloud Storage
Use change data capture to stream flat files from a Google Cloud Storage folder to any data sink and transform them on the way.
At startup, the connector extracts data from all matching files in the given folder. After this initial sync, it watches the folder for new or updated files and syncs only the relevant changes.
Please make sure that you have created a service account in Google Cloud that is assigned the predefined IAM role Storage Admin (or the permissions storage.objects.list) at the project level.
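Before pasting the credentials into the connector, it can help to sanity-check the key file downloaded for the service account. The following is a minimal sketch, assuming the usual layout of a Google Cloud service account key (a JSON object with the fields type, project_id, client_email, and private_key). The function name inspect_service_account_key and the sample values are hypothetical. The sketch only inspects the file locally; it does not verify which IAM roles are assigned to the account.

```python
import json

# Fields that a service account key file normally contains (assumption:
# the standard key layout provided by Google Cloud).
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def inspect_service_account_key(raw_json: str) -> dict:
    """Parse a key file and return the values worth double-checking."""
    key = json.loads(raw_json)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError("credentials are missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError(f"expected a service account key, got type {key['type']!r}")
    return {"project_id": key["project_id"], "client_email": key["client_email"]}


# Hypothetical, heavily shortened key used for illustration only.
example_key = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "reader@my-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
})

info = inspect_service_account_key(example_key)
print(info["project_id"])
print(info["client_email"])
```

The project_id value read here is the kind of information from which a project name can be derived automatically; the client_email value identifies the account whose role assignment should be checked in the Google Cloud console.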
This source connector supports the following configuration options:
Google Cloud credentials (JSON)
The content of the JSON-based credentials file provided by Google Cloud for the service account.
The name of the GCP project. We try to automatically extract the name of the GCP project from the provided Google Cloud credentials.
The name of the GCS bucket from which we shall extract files.
File name filter
Regular expression applied to files from the GCS bucket. Only files with a name matching the regular expression will be extracted. Default value: .* (matches all file names).
The format of the extracted files. At the moment, this connector only supports CSV files.
CSV delimiter value
Only available for the file type CSV. The character that delimits different columns (default:
CSV quote character
Only available for the file format CSV. Character used for quotes (default:
CSV quote escape character
Only available for the file format CSV. Character used for escaping quotes (default:
CSV line separator
Only available for the file format CSV. String used for separating multiple lines (default:
CSV comment character
Only available for the file format CSV. Character used for comments. It must appear at the beginning of a line (default:
Generate attribute names from CSV header row
Only available for the file type CSV. Whether to use the first row of the CSV file for extracting attribute names or not. If this option is set to false, DataCater will generate attribute names based on the index of the attribute, and name them
Primary key column
Name of the attribute that uniquely identifies records, similar to a primary key in a database system.
Sync interval (s)
The interval in seconds between synchronizations of the GCS bucket and DataCater (default: 120). When synchronizing, DataCater consumes only those files from the GCS bucket that have not yet been processed by the pipeline, which makes it possible to implement change data capture to some degree.
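The interplay of the file name filter and the sync interval can be illustrated with a small simulation. This is a sketch of the general idea, not DataCater's implementation: the function select_new_files, the in-memory listing, and the version markers are hypothetical, and the sketch assumes that the regular expression must match the whole file name and that an updated file can be recognized by a changed version marker (for example a last-modified timestamp).

```python
import re


def select_new_files(listing, name_filter, processed):
    """Return the names of files that match the filter and are new or updated.

    listing:   mapping of file name -> version marker for the current bucket state
    processed: mapping of file name -> version marker seen in earlier sync runs;
               it is updated in place for every selected file
    """
    pattern = re.compile(name_filter)
    selected = []
    for name, version in sorted(listing.items()):
        if not pattern.fullmatch(name):
            continue  # file name does not match the filter
        if processed.get(name) == version:
            continue  # already processed in an earlier sync run
        selected.append(name)
        processed[name] = version
    return selected


processed = {}
csv_only = r".*\.csv"

# Initial sync: every matching file is picked up.
first = select_new_files({"a.csv": 1, "b.txt": 1}, csv_only, processed)

# Next sync run: only the newly added file is picked up.
second = select_new_files({"a.csv": 1, "b.txt": 1, "c.csv": 1}, csv_only, processed)

# Later sync run: a.csv was updated, so it is picked up again.
third = select_new_files({"a.csv": 2, "b.txt": 1, "c.csv": 1}, csv_only, processed)

print(first, second, third)
```

In a real deployment, each call would correspond to one sync run, with runs spaced by the configured sync interval.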
DataCater automatically extends the set of attributes with the attribute __datacater_file_name and fills it with the name of the file.
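To show what the resulting records might look like, here is a minimal sketch that parses a CSV file with a header row and adds the file name attribute to each record. The function records_from_csv and the sample data are hypothetical, and the delimiter passed in the example is just an illustrative value, not a statement about the connector's default. The actual record layout produced by DataCater may differ.

```python
import csv
import io

FILE_NAME_ATTRIBUTE = "__datacater_file_name"


def records_from_csv(file_name, content, delimiter):
    """Parse CSV text with a header row and tag each record with its file name."""
    reader = csv.DictReader(io.StringIO(content), delimiter=delimiter)
    records = []
    for row in reader:
        record = dict(row)  # attribute names come from the header row
        record[FILE_NAME_ATTRIBUTE] = file_name
        records.append(record)
    return records


# Hypothetical file content used for illustration only.
records = records_from_csv("orders.csv", "id,amount\n1,9.99\n2,14.50\n", ",")
for record in records:
    print(record)
```

Carrying the file name on every record makes it possible to trace a record back to its source file in later pipeline steps.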