BigQuery Writer programmer's reference
BigQuery Writer properties
property | type | default value | notes |
---|---|---|---|
Allow Quoted Newlines | Boolean | False | This property has been deprecated and has no effect. It will be removed in a future release. |
Batch Policy | String | eventCount:1000000, Interval:90 | This property defines the maximum amount of data Striim can accumulate and how much time can pass before Striim sends a batch of data to BigQuery. The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. With the default setting, data will be written at least every 90 seconds or whenever the buffer accumulates 1,000,000 events. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to eventCount:1, Interval:0. When Streaming Upload is True, Optimized Merge is False, and Mode is MERGE, portions of the data may be uploaded to the staging area before the batch policy has been reached, to reduce the amount of memory required on the Striim server; once the batch policy is reached, the remaining data will be uploaded to the staging area and the whole batch will be written to the target. Similarly, when Streaming Upload is True and Mode is APPENDONLY, portions of the data may be written to the target before the batch policy has been reached; once the batch policy is reached, the remaining data will be written to the target. When Optimized Merge is True and an event includes a primary key update, the batch is sent to the target immediately, without waiting to reach the eventCount or interval. It is important that these settings do not, individually or in combination, result in your application exceeding BigQuery's quotas or limits (see Load jobs for the load method or Query jobs for the streaming method in the "Quotas and limits" section of Google's BigQuery documentation). For example, if you exceed the quota of batches per table per day, BigQueryWriter will throw an exception such as error code 500, "An internal error occurred and the request could not be completed," and stop the application. To avoid this, reduce the number of batches by increasing the event count and/or interval. Contact Striim support if you need assistance in keeping within Google's quotas. Monitoring reports and MON output for BigQuery Writer targets include Queued Batches Size Bytes, which reports the total current size of the buffer in bytes. See the first example following this table. |
Billing Project ID | String | | Optionally, specify a project (different from the one specified in Project ID) to be billed for query execution. The service account associated with the specified Service Account Key must have the permissions required to run query jobs on the billing project. |
CDDL Action | String | Process | See Handling schema evolution. When the captured source DDL may include TRUNCATE commands, to avoid deletion of events in the target, place a CQ that filters out the TRUNCATE operations between the source and the target. |
Column Delimiter | String | | (UTF-8 007C) | Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. This property defines which character will be used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the default delimiter character, specify a different one. |
Connection Profile Name | Enum | | Appears in Flow Designer only when Use Connection Profile is True. See Introducing connection profiles. |
Connection Retry Policy | String | totalTimeout=600, initialRetryDelay=10, retryDelayMultiplier=2.0, maxRetryDelay=60, maxAttempts=5, jittered=True, initialRpcTimeout=10, rpcTimeoutMultiplier=2.0, maxRpcTimeout=30 | Altering these properties can have significant effects on performance and stability. Do not change unless instructed to by Striim support. |
Data Location | String | | If necessary, specify the dataset's Data location property value, which specifies its Google Cloud region(s) (see BigQuery > Documentation > Resources > BigQuery locations). |
Encoding | String | UTF-8 | This property selects the encoding for the delimited text files in which BigQueryWriter accumulates batched data. Currently the only supported encoding is UTF-8 (see "Encoding" in BigQuery > Documentation > Guides > Introduction to loading data > Choosing a data ingestion method). Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. |
Excluded Tables | String | | When a wildcard is specified for Tables, specify here any tables you wish to exclude from the wildcard match. For example: Tables:'mydataset.%', ExcludedTables:'mydataset.ignore' |
Ignorable Exception Code | String | | Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information. No other values are supported. Striim writes ignored exceptions to the application's exception store (see CREATE EXCEPTIONSTORE). |
Include Insert ID | Boolean | True | When Streaming Upload is False, this setting is ignored, and is not displayed in the Flow Designer. When using the Storage Write API, this setting is also ignored. When using the legacy streaming API, the default value of True sends a unique insert ID with each record, which BigQuery uses for best-effort deduplication; set to False to omit insert IDs. |
Mode | String | APPENDONLY | With the default value of APPENDONLY, all operations, including updates and deletes, are written to the target as inserts. Set to MERGE to apply source updates and deletes as updates and deletes in the target (see also Optimized Merge). |
Null Marker | String | NULL | When Streaming Upload is False, this property specifies a string to be inserted into fields in the delimited text files in which BigQueryWriter accumulates batched data to indicate that a field has a null value. These are converted back to nulls in the target tables. If any field might contain the string NULL, specify a different marker that will never appear in the data. Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. |
Optimized Merge | Boolean | false | When the source events will always include full records, leave this set to False. Set to True only when Mode is MERGE, BigQuery Writer's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source, and the source events will include partial records. (For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates.) |
Parallel Threads | Integer | | See Creating multiple writer instances. |
Private Service Connect Endpoint | String | | If using Private Service Connect with BigQuery, specify the endpoint created in the target Virtual Private Cloud, as described in Private Service Connect support for Google cloud adapters. |
Project Id | String | | Specify the BigQuery project ID of the dataset. See Get the project ID for BigQuery. |
Quote Character | String | " (UTF-8 0022) | This property specifies the character(s) used to escape field values in the delimited text files in which the adapter accumulates batches. If the data will contain the quote character, specify a different value. Appears in Flow Designer only when Streaming Upload is False. When Streaming Upload is True, this setting has no effect. |
Service Account Key | String | | Specify the path (from root, the Striim program directory, or the UploadedFiles directory) and file name of the .json credentials file downloaded from Google (see Download the service account key). |
Standard SQL | Boolean | True | With the default setting of True, BigQuery Writer uses GoogleSQL (standard SQL). Set to False to use legacy SQL. See BigQuery > Documentation > Reference > Migrating to GoogleSQL for more information. Do not set to False if you have partitioned your target tables (see Create the dataset(s) and target tables in BigQuery; partitioning tables). |
Streaming Configuration | String | MaxRequestSizeInMB=5, MaxParallelRequests=10, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False | This property enables you to configure various options when using the Storage Write API (that is, when UseLegacyStreamingApi=False) or the legacy streaming API (that is, when UseLegacyStreamingApi=True); which sub-properties apply depends on which API you are using. When Streaming Upload is False, these settings are ignored, and are not displayed in the Flow Designer. For best performance, adjust the values of the sub-properties so as not to exceed Google's quotas (see BigQuery > Documentation > Resources > Quotas and limits). Contact Striim support if you need assistance in keeping within Google's quotas. See the second example following this table. |
Streaming Upload | Boolean | False | With the default value of False, the writer uses the load method. Set to True to use a streaming API. See Choose which writing method to use. When Streaming Upload is False, BigQuery Writer cannot write to arrays (see BigQuery > Documentation > Guides > Loading CSV data from Cloud Storage > Limitations). |
Tables | String | | Specify the name(s) of the table(s) to write to, in the format <dataset>.<table>. If the source is Database Reader and its Create Schema property is True, the specified dataset(s) and table(s) will be created in the target (see the discussion of Create Schema in Database Reader properties for more details). Otherwise, the table(s) must exist in BigQuery when the application is started, and if a specified target table does not exist, the application will terminate with an error. To skip writes to missing tables without terminating, specify TABLE_NOT_FOUND as an Ignorable Exception Code. When the target's input stream is a user-defined event, specify a single table. When the input stream is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), BigQuery Writer can write to multiple tables. In this case, specify the names of both the source and target tables, using the % wildcard to match multiple tables, for example: source.emp,target.emp or source.db1,target.db1;source.db2,target.db2 or source.%,target.% or source.mydatabase.emp%,target.mydb.% or source1.%,target1.%;source2.%,target2.%. MySQL and Oracle names are case-sensitive, SQL Server names are not; specify names as they appear in the source database. When the columns in the source and target are not in the same order, the application will halt, even if the column names are the same. To avoid this, use ColumnMap to map at least one column. See Mapping columns for additional options. See the first example following this table. |
Transport Options | String | connectTimeout=300, readTimeout=120 | Sets HTTP transport timeout options in seconds (see Java > Documentation > Reference > Class HttpTransportOptions) when using the legacy streaming API or the load method. With the default setting, the connect timeout is five minutes and the read timeout is two minutes. This property is ignored when using the Storage Write API. |
Use Connection Profile | Boolean | False | Set to True to select a profile with Connection Profile Name. See Introducing connection profiles. |
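The following is a minimal TQL sketch of a BigQuery Writer target using the default load method, combining several of the properties described above (a Tables wildcard mapping, Mode, Batch Policy, and Ignorable Exception Code). The project, dataset, stream, and key file names are hypothetical placeholders, not values from this guide; adjust them to your environment.

```
CREATE TARGET BigQueryMergeTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/my-service-account-key.json', -- hypothetical key file
  ProjectId: 'my-project',                                        -- hypothetical project ID
  Tables: 'SRC.%,mydataset.%',                                    -- wildcard source-to-target mapping
  Mode: 'MERGE',
  BatchPolicy: 'eventCount:100000, Interval:60',                  -- stay within BigQuery's load job quotas
  IgnorableExceptionCode: 'TABLE_NOT_FOUND'                       -- skip writes to missing tables
)
INPUT FROM OracleCDCStream;                                       -- hypothetical CDC source output stream
```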
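A second sketch, using the Storage Write API (Streaming Upload is True): again, all names are hypothetical, and the Streaming Configuration values shown are simply the defaults from the table above.

```
CREATE TARGET BigQueryStreamingTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/my-service-account-key.json',
  ProjectId: 'my-project',
  Tables: 'mydataset.events', -- a single target table, no wildcard
  Mode: 'APPENDONLY',
  StreamingUpload: 'true',
  StreamingConfiguration: 'MaxRequestSizeInMB=5, MaxParallelRequests=10, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False'
)
INPUT FROM EventStream;       -- hypothetical input stream
```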
BigQuery Writer data type support and correspondence
The following tables describe which source data types are supported by BigQuery Writer and how the source data types are mapped to BigQuery data types. See also Data type support & mapping for schema conversion & evolution.
With an input stream of a user-defined type
Striim type | BigQuery type |
---|---|
java.lang.Byte, java.lang.Byte[] | BYTES |
java.lang.Double | FLOAT |
java.lang.Float | FLOAT |
java.lang.Integer | INTEGER |
java.lang.Long | INTEGER |
java.lang.Short | INTEGER |
java.lang.String | STRING |
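To make the correspondence concrete, here is a sketch of a user-defined Striim type whose fields would map to the BigQuery types above; the type and field names are hypothetical.

```
CREATE TYPE OrderEventType (
  OrderId  java.lang.Integer, -- maps to INTEGER
  Amount   java.lang.Double,  -- maps to FLOAT
  Customer java.lang.String   -- maps to STRING
);
```

A target whose input stream is of such a type writes to a single table (see Tables above) whose columns use the corresponding BigQuery types.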
With an Oracle source
Oracle type | BigQuery type |
---|---|
BFILE | unsupported |
BINARY_DOUBLE | NUMERIC |
BINARY_FLOAT | FLOAT64 |
BLOB | BYTES |
CHAR | STRING |
CLOB | STRING An insert or update that contains a column of this type generates two CDC log entries: an insert or update in which the value for this column is null, followed by an update including the value. |
DATE | DATE |
FLOAT | FLOAT64 |
INTERVAL DAY TO SECOND | STRING |
INTERVAL YEAR TO MONTH | STRING |
LONG | unsupported |
LONG RAW | unsupported |
NCHAR | STRING |
NCLOB | STRING |
NESTED TABLE | unsupported |
NUMBER | NUMERIC |
NVARCHAR2 | STRING |
RAW | BYTES |
ROWID | unsupported |
TIMESTAMP | DATETIME |
TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP |
TIMESTAMP WITH TIME ZONE | TIMESTAMP |
UROWID | unsupported |
VARCHAR2 | STRING |
VARRAY | unsupported |
XMLTYPE | unsupported |
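As an illustration of these mappings, a BigQuery target table for an Oracle source table with NUMBER, VARCHAR2, RAW, and TIMESTAMP columns might be declared as follows; the dataset, table, and column names are hypothetical.

```
CREATE TABLE mydataset.orders (
  order_id NUMERIC,  -- Oracle NUMBER
  customer STRING,   -- Oracle VARCHAR2
  payload  BYTES,    -- Oracle RAW
  created  DATETIME  -- Oracle TIMESTAMP
);
```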
With a PostgreSQL source
PostgreSQL type | BigQuery type |
---|---|
bigint | INT64 |
bigserial | INT64 |
bit | unsupported |
bit varying | unsupported |
boolean | BOOLEAN |
box | unsupported |
bytea | BYTES |
character | STRING |
character varying | STRING |
cidr | unsupported |
circle | unsupported |
date | STRING |
double precision | FLOAT64 |
inet | INT64 |
integer | INT64 |
int2 | INT64 |
int4 | INT64 |
int4range | STRING |
int8 | INT64 |
int8range | STRING |
interval | STRING |
json | STRING |
jsonb | STRING |
line | unsupported |
lseg | unsupported |
macaddr | unsupported |
money | unsupported |
numeric | NUMERIC |
path | unsupported |
pg_lsn | unsupported |
point | unsupported |
polygon | unsupported |
real | FLOAT64 |
smallint | INT64 |
smallserial | INT64 |
serial | INT64 |
text | STRING |
time | STRING |
time with time zone | TIMESTAMP |
timestamp | DATETIME |
timestamp with time zone | TIMESTAMP |
tsquery | unsupported |
tsvector | unsupported |
txid_snapshot | unsupported |
uuid | unsupported |
xml | STRING |
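Similarly, a BigQuery target table for a PostgreSQL source table with bigint, numeric, text, and timestamp columns might be declared as follows; again, the names are hypothetical.

```
CREATE TABLE mydataset.accounts (
  account_id INT64,   -- PostgreSQL bigint
  balance    NUMERIC, -- PostgreSQL numeric
  notes      STRING,  -- PostgreSQL text
  updated    DATETIME -- PostgreSQL timestamp
);
```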