BigQuery Writer programmer's reference
BigQuery Writer properties
property | type | default value | notes |
---|---|---|---|
Allow Quoted Newlines | Boolean | False | This property has been deprecated and has no effect. It will be removed in a future release. |
Batch Policy | String | eventCount:1000000, Interval:90 | This property defines the maximum amount of data Striim can accumulate and how much time can pass before Striim sends a batch of data to BigQuery. The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. With the default setting, data will be written at least every 90 seconds or whenever the buffer accumulates 1,000,000 events. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to eventCount:1, Interval:0. When Streaming Upload is True, Optimized Merge is False, and Mode is MERGE, portions of the data may be uploaded to the staging area before the batch policy has been reached, to reduce the amount of memory required on the Striim server; once the batch policy is reached, the remaining data will be uploaded to the staging area and the whole batch will be written to the target. Similarly, when Streaming Upload is True and Mode is APPENDONLY, portions of the data may be written to the target before the batch policy has been reached; once the batch policy is reached, the remaining data will be written to the target. When Optimized Merge is True and an event includes a primary key update, the batch is sent to the target immediately, without waiting to reach the eventCount or interval. It is important that these settings do not, individually or in combination, result in your application exceeding BigQuery's quotas or limits (see Load jobs for the load method or Query jobs for the streaming method in the "Quotas and limits" section of Google's BigQuery documentation). For example, if you exceed the quota of batches per table per day, BigQueryWriter will throw an exception such as error code 500, "An internal error occurred and the request could not be completed," and stop the application. To avoid this, reduce the number of batches by increasing the event count and/or interval. Contact Striim support if you need assistance in keeping within Google's quotas. Monitoring reports and MON output for BigQuery Writer targets include Queued Batches Size Bytes, which reports the total current size of the buffer in bytes. See the first example following this table. |
Billing Project ID | String | | Optionally, specify a project (different from the one specified in Project ID) to be billed for query execution. The service account associated with the specified Service Account Key must have the permissions required to run query jobs on the billing project. |
CDDL Action | String | Process | See Handling schema evolution. When the captured source DDL may include TRUNCATE commands, to avoid deletion of events in the target, place a CQ that filters out the TRUNCATE operations between the source and the target. |
Column Delimiter | String | | (UTF-8 007C) | Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. This property defines which character will be used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the default delimiter character, specify a different one. |
Connection Profile Name | Enum | | Appears in Flow Designer only when Use Connection Profile is True. See Introducing connection profiles. |
Connection Retry Policy | String | totalTimeout=600, initialRetryDelay=10, retryDelayMultiplier=2.0, maxRetryDelay=60, maxAttempts=5, jittered=True, initialRpcTimeout=10, rpcTimeoutMultiplier=2.0, maxRpcTimeout=30 | Altering these properties can have significant effects on performance and stability. Do not change unless instructed to by Striim support. |
Data Location | String | | If necessary, specify the dataset's Data location property value, which specifies its Google Cloud region(s) (see BigQuery > Documentation > Resources > BigQuery locations). |
Encoding | String | UTF-8 | This property selects the encoding for the delimited text files in which BigQueryWriter accumulates batched data. Currently the only supported encoding is UTF-8 (see "Encoding" in BigQuery > Documentation > Guides > Introduction to loading data > Choosing a data ingestion method). Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. |
Excluded Tables | String | | When a wildcard is specified for Tables, specify here any tables you wish to exclude from the wildcard match. For example: Tables:'mydataset.%', ExcludedTables:'mydataset.ignore' |
Ignorable Exception Code | String | | Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information. No other values are supported. Striim writes ignored exceptions to the application's exception store (see CREATE EXCEPTIONSTORE). |
Include Insert ID | Boolean | True | When Streaming Upload is False, this setting is ignored, and is not displayed in the Flow Designer. When using the Storage Write API, this setting is also ignored. When using the legacy streaming API, the default value of True sends a unique insert ID with each record, which BigQuery uses for best-effort deduplication; set to False to omit insert IDs. |
Mode | String | APPENDONLY | With the default value of APPENDONLY, all operations, including updates and deletes, are written to the target as inserts. Set to MERGE to apply source updates and deletes as updates and deletes in the target (see also Optimized Merge). |
Null Marker | String | NULL | When Streaming Upload is False, this property specifies a string to be inserted into fields in the delimited text files in which BigQueryWriter accumulates batched data to indicate that a field has a null value. These are converted back to nulls in the target tables. If any field might contain the string NULL, specify a different marker that will never appear in the data. Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. |
Optimized Merge | Boolean | false | When the source events will always include full records, leave this set to False. Set to True only when Mode is MERGE, BigQuery Writer's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source, and the source events will include partial records. (For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates.) |
Parallel Threads | Integer | | See Creating multiple writer instances. |
Private Service Connect Endpoint | String | | If using Private Service Connect with BigQuery, specify the endpoint created in the target Virtual Private Cloud, as described in Private Service Connect support for Google cloud adapters. |
Project Id | String | | Specify the BigQuery project ID of the dataset. See Get the project ID for BigQuery. |
Quote Character | String | " (UTF-8 0022) | This property specifies the character(s) used to escape field values in the delimited text files in which the adapter accumulates batches. If the data will contain the quote character, specify a different value. Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. |
Service Account Key | String | | Specify the path (from root, the Striim program directory, or the UploadedFiles directory) and file name of the .json credentials file downloaded from Google (see Download the service account key). |
Standard SQL | Boolean | True | With the default setting of True, BigQuery Writer uses GoogleSQL (standard SQL). Set to False to use legacy SQL. See BigQuery > Documentation > Reference > Migrating to GoogleSQL for more information. Do not set to False if you have partitioned your target tables (see Create the dataset(s) and target tables in BigQuery; partitioning tables). |
Streaming Configuration | String | MaxRequestSizeInMB=5, MaxParallelRequests=10, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False | This property enables you to configure various options when using the Storage Write API (that is, when UseLegacyStreamingApi=False) or the legacy streaming API (that is, when UseLegacyStreamingApi=True); which sub-properties apply depends on which API you are using. When Streaming Upload is False, these settings are ignored, and are not displayed in the Flow Designer. For best performance, adjust the values of the sub-properties so as not to exceed Google's quotas (see BigQuery > Documentation > Resources > Quotas and limits). Contact Striim support if you need assistance in keeping within Google's quotas. See the second example following this table. |
Streaming Upload | Boolean | False | With the default value of False, the writer uses the load method. Set to True to use a streaming API. See Choose which writing method to use. When Streaming Upload is False, BigQuery Writer cannot write to arrays (see BigQuery > Documentation > Guides > Loading CSV data from Cloud Storage > Limitations). |
Tables | String | | Specify the name(s) of the table(s) to write to, in the format <dataset>.<table>. If the source is Database Reader and its Create Schema property is True, the specified dataset(s) and table(s) will be created in the target (see the discussion of Create Schema in Database Reader properties for more details). Otherwise, the table(s) must exist in BigQuery when the application is started, and if a specified target table does not exist, the application will terminate with an error. To skip writes to missing tables without terminating, specify TABLE_NOT_FOUND as an Ignorable Exception Code. When the target's input stream is a user-defined event, specify a single table. When the input stream is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), BigQuery Writer can write to multiple tables. In this case, specify the names of both the source and target tables, using the % wildcard to match multiple tables, for example: source.emp,target.emp or source.db1,target.db1;source.db2,target.db2 or source.%,target.% or source.mydatabase.emp%,target.mydb.% or source1.%,target1.%;source2.%,target2.%. MySQL and Oracle names are case-sensitive, SQL Server names are not; specify names as they appear in the source database. When the columns in the source and target are not in the same order, the application will halt, even if the column names are the same. To avoid this, use ColumnMap to map at least one column. See Mapping columns for additional options. See the first example following this table. |
Transport Options | String | connectTimeout=300, readTimeout=120 | Sets HTTP transport timeout options in seconds (see Java > Documentation > Reference > Class HttpTransportOptions) when using the legacy streaming API or the load method. With the default setting, the connect timeout is five minutes and the read timeout is two minutes. This property is ignored when using the Storage Write API. |
Use Connection Profile | Boolean | False | Set to True to select a profile with Connection Profile Name. See Introducing connection profiles. |
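The following is a minimal TQL sketch of a BigQuery Writer target using the default load method, combining several of the properties described above (a Tables wildcard mapping, Mode, Batch Policy, and Ignorable Exception Code). The project, dataset, stream, and key file names are hypothetical placeholders, not values from this guide; adjust them to your environment.

```
CREATE TARGET BigQueryMergeTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/my-service-account-key.json', -- hypothetical key file
  ProjectId: 'my-project',                                        -- hypothetical project ID
  Tables: 'SRC.%,mydataset.%',                                    -- wildcard source-to-target mapping
  Mode: 'MERGE',
  BatchPolicy: 'eventCount:100000, Interval:60',                  -- stay within BigQuery's load job quotas
  IgnorableExceptionCode: 'TABLE_NOT_FOUND'                       -- skip writes to missing tables
)
INPUT FROM OracleCDCStream;                                       -- hypothetical CDC source output stream
```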
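A second sketch, using the Storage Write API (Streaming Upload is True): again, all names are hypothetical, and the Streaming Configuration values shown are simply the defaults from the table above.

```
CREATE TARGET BigQueryStreamingTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/my-service-account-key.json',
  ProjectId: 'my-project',
  Tables: 'mydataset.events', -- a single target table, no wildcard
  Mode: 'APPENDONLY',
  StreamingUpload: 'true',
  StreamingConfiguration: 'MaxRequestSizeInMB=5, MaxParallelRequests=10, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False'
)
INPUT FROM EventStream;       -- hypothetical input stream
```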
BigQuery Writer data type support and correspondence
The following tables describe which source data types are supported by BigQuery Writer and how the source data types are mapped to BigQuery data types. See also Data type support & mapping for schema conversion & evolution.
With an input stream of a user-defined type
Striim type | BigQuery type |
---|---|
java.lang.Byte, java.lang.Byte[] | BYTES |
java.lang.Double | FLOAT |
java.lang.Float | FLOAT |
java.lang.Integer | INTEGER |
java.lang.Long | INTEGER |
java.lang.Short | INTEGER |
java.lang.String | STRING |
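To make the correspondence concrete, here is a sketch of a user-defined Striim type whose fields would map to the BigQuery types above; the type and field names are hypothetical.

```
CREATE TYPE OrderEventType (
  OrderId  java.lang.Integer, -- maps to INTEGER
  Amount   java.lang.Double,  -- maps to FLOAT
  Customer java.lang.String   -- maps to STRING
);
```

A target whose input stream is of such a type writes to a single table (see Tables above) whose columns use the corresponding BigQuery types.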
With an Oracle source
Oracle type | BigQuery type |
---|---|
BFILE | unsupported |
BINARY_DOUBLE | NUMERIC |
BINARY_FLOAT | FLOAT64 |
BLOB | BYTES |
CHAR | STRING |
CLOB | STRING An insert or update that contains a column of this type generates two CDC log entries: an insert or update in which the value for this column is null, followed by an update including the value. |
DATE | DATE |
FLOAT | FLOAT64 |
INTERVAL DAY TO SECOND | STRING |
INTERVAL YEAR TO MONTH | STRING |
LONG | unsupported |
LONG RAW | unsupported |
NCHAR | STRING |
NCLOB | STRING |
NESTED TABLE | unsupported |
NUMBER | NUMERIC |
NVARCHAR2 | STRING |
RAW | BYTES |
ROWID | unsupported |
TIMESTAMP | DATETIME |
TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP |
TIMESTAMP WITH TIME ZONE | TIMESTAMP |
UROWID | unsupported |
VARCHAR2 | STRING |
VARRAY | unsupported |
XMLTYPE | unsupported |
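As an illustration of these mappings, a BigQuery target table for an Oracle source table with NUMBER, VARCHAR2, RAW, and TIMESTAMP columns might be declared as follows; the dataset, table, and column names are hypothetical.

```
CREATE TABLE mydataset.orders (
  order_id NUMERIC,  -- Oracle NUMBER
  customer STRING,   -- Oracle VARCHAR2
  payload  BYTES,    -- Oracle RAW
  created  DATETIME  -- Oracle TIMESTAMP
);
```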
With a PostgreSQL source
PostgreSQL type | BigQuery type |
---|---|
bigint | INT64 |
bigserial | INT64 |
bit | unsupported |
bit varying | unsupported |
boolean | BOOLEAN |
box | unsupported |
bytea | BYTES |
character | STRING |
character varying | STRING |
cidr | unsupported |
circle | unsupported |
date | STRING |
double precision | FLOAT64 |
inet | INT64 |
integer | INT64 |
int2 | INT64 |
int4 | INT64 |
int4range | STRING |
int8 | INT64 |
int8range | STRING |
interval | STRING |
json | STRING |
jsonb | STRING |
line | unsupported |
lseg | unsupported |
macaddr | unsupported |
money | unsupported |
numeric | NUMERIC |
path | unsupported |
pg_lsn | unsupported |
point | unsupported |
polygon | unsupported |
real | FLOAT64 |
smallint | INT64 |
smallserial | INT64 |
serial | INT64 |
text | STRING |
time | STRING |
time with time zone | TIMESTAMP |
timestamp | DATETIME |
timestamp with time zone | TIMESTAMP |
tsquery | unsupported |
tsvector | unsupported |
txid_snapshot | unsupported |
uuid | unsupported |
xml | STRING |
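Similarly, a BigQuery target table for a PostgreSQL source table with bigint, numeric, text, and timestamp columns might be declared as follows; again, the names are hypothetical.

```
CREATE TABLE mydataset.accounts (
  account_id INT64,   -- PostgreSQL bigint
  balance    NUMERIC, -- PostgreSQL numeric
  notes      STRING,  -- PostgreSQL text
  updated    DATETIME -- PostgreSQL timestamp
);
```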