
Striim Platform 5.0 documentation

Databricks Writer programmer's reference

Databricks Writer properties

Authentication Type (enum, default: PersonalAccessToken)

Appears in Flow Designer only when Use Connection Profile is False.

The simplest way to use Microsoft Entra ID (OAuth) is to create a connection profile (see Introducing connection profiles). Alternatively, select Manual OAuth, follow the instructions in Configuring Microsoft Entra ID (formerly Azure Active Directory) for Databricks Writer manually, and specify Client ID, Client Secret, Refresh Token, and Tenant ID.

With the default setting PersonalAccessToken, Striim's connection to Databricks is authenticated using the token specified in Personal Access Token.
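
For example, a Manual OAuth configuration might include the following writer properties in TQL. This is a sketch only: the property spellings follow the Flow Designer labels, and they and the placeholder values are assumptions, not confirmed TQL syntax.

  authenticationType: 'ManualOAuth',
  clientId: '<your Entra application ID>',
  clientSecret: '********',
  refreshToken: '********',
  tenantId: '<your Entra tenant ID>',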

CDDL Action (enum, default: Process)

See Handling schema evolution.

If TRUNCATE commands may be entered in the source and you do not want to delete events in the target, precede the writer with a CQ whose select statement is SELECT * FROM <input stream name> x WHERE META(x, OperationName).toString() != 'Truncate'; (replacing <input stream name> with the name of the writer's input stream). Note that there will be no record in the target that the affected events were deleted.
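
For example, assuming the writer's input stream is named OracleCDCStream (all names in this sketch are illustrative), the CQ might look like:

  CREATE STREAM FilteredStream OF Global.WAEvent;

  CREATE CQ FilterTruncates
  INSERT INTO FilteredStream
  SELECT * FROM OracleCDCStream x
  WHERE META(x, OperationName).toString() != 'Truncate';

The writer's INPUT FROM clause would then reference FilteredStream rather than OracleCDCStream.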

Client ID (string)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property.

Client Secret (encrypted password)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property.

Connection Profile Name (enum)

Appears in Flow Designer only when Use Connection Profile is True. See Introducing connection profiles.

Connection Retry Policy (string, default: initialRetryDelay=10s, retryDelayMultiplier=2, maxRetryDelay=1m, maxAttempts=5, totalTimeout=10m)

Do not change unless instructed to by Striim support.

Connection URL (string)

Appears in Flow Designer only when Use Connection Profile is False.

Provide the JDBC URL from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster). If the URL starts with jdbc:spark://, change that to jdbc:databricks:// (this is required by the upgraded driver bundled with Striim).
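
For example, a URL copied from Databricks such as the following (the workspace host and httpPath are hypothetical):

  jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=...

would be specified in Striim as:

  jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=...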

External Stage Connection Profile Name (enum)

Appears in Flow Designer only when Use Connection Profile is True and External Stage Type is ADLSGen2 or S3.

Select or specify the name of the connection profile for the external stage. (When Databricks Writer uses a connection profile, you must use a connection profile for ADLSGen2 or S3 as well.)

External Stage Type (enum, default: DBFSROOT)

Set to ADLSGen2 or S3 to match the stage type you chose in Choose which staging area to use.

Note

Personal staging locations have been deprecated by Databricks on AWS (see Create metastore-level storage) and on Azure (see Create metastore-level storage).

Ignorable Exception Code (string)

Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information.

Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE).

Mode (enum, default: AppendOnly)

Set to Merge if that was your choice in Choose which writing mode to use.

Optimized Merge (boolean, default: false)

Appears in Flow Designer only when Mode is Merge.

Set to True only when Mode is Merge, the target's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source, and the source events will include partial records. For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates. When the source events will always include full records, leave this set to false.

Parallel Threads (integer)

Not supported when Mode is Merge.

See Creating multiple writer instances (parallel threads).

Personal Access Token (encrypted password)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is PersonalAccessToken.

Used to authenticate with the Databricks cluster (see Generate a personal access token). The user associated with the token must have read and write access to DBFS (see Important information about DBFS permissions). If table access control has been enabled, the user must also have MODIFY and READ_METADATA (see Data object privileges - Data governance model).

Personal Staging User Name (string)

Personal staging locations have been deprecated by Databricks on AWS (see Create metastore-level storage) and on Azure (see Create metastore-level storage).

Refresh Token (encrypted password)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property. The token expires after 90 days, at which point the application will halt. To avoid that, use a connection profile (see Introducing connection profiles), which allows you to update the token without stopping the application. Alternatively, stop the application and update the token before it expires.

Stage Location (string, default: /)

See Choose which staging area to use.

Tables (string)

The name(s) of the table(s) to write to. The table(s) must exist in the database.

Specify target table names as <catalog>.<database>.<table>. Not specifying the catalog (<database>.<table>) may result in errors if a table in another catalog has the same name. Target table names are case-insensitive.

When the target's input stream is a user-defined event, specify a single table.

The only special character allowed in target table names is underscore (_).

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), the writer can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the % wildcard only for table names, not for schemas or databases. If the reader uses three-part names, you must use them here as well.

Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%).

SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MS Jet (schema.%,schema.%).

Examples:

source.emp,target_database.emp
source_schema.%,target_catalog.target_database.%
source_database.source_schema.%,target_database.%
source_database.source_schema.%,target_catalog.target_database.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify source names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

See Mapping columns for additional options.
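
For example, to replicate all tables in an Oracle PDB schema captured by Oracle Reader to a Unity Catalog target, the Tables property might be set as follows (the PDB, schema, catalog, and database names are hypothetical):

  tables: 'MYPDB.HR.%,main_catalog.hr_db.%',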

Tenant ID (string)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property.

Upload Policy (string, default: eventcount:100000, interval:60s)

The upload policy may include eventcount and/or interval (see Setting output names and rollover / upload policies for syntax). Buffered data is written to the storage account every time any of the specified values is exceeded. With the default value, data will be written every 60 seconds or sooner if the buffer contains 100,000 events. When the app is quiesced, any data remaining in the buffer is written to the storage account; when the app is undeployed, any data remaining in the buffer is discarded.
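
For example, to trade some throughput for lower latency, a policy such as the following (values illustrative) uploads every 30 seconds, or sooner if 50,000 events are buffered:

  uploadPolicy: 'eventcount:50000,interval:30s',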

Use Connection Profile (boolean, default: False)

Set to True to use a connection profile instead of specifying the connection properties here. See Introducing connection profiles.
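
Putting these properties together, a minimal AppendOnly target in TQL might look like the following sketch. The adapter name DatabricksWriter, the property spellings, the input stream name, and all connection values are illustrative assumptions; compare with the TQL that Flow Designer generates for your release.

  CREATE TARGET DatabricksTarget USING DatabricksWriter (
    connectionUrl: 'jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;...',
    authenticationType: 'PersonalAccessToken',
    personalAccessToken: '********',
    tables: 'source_schema.%,target_catalog.target_database.%',
    externalStageType: 'DBFSROOT',
    stageLocation: '/',
    mode: 'AppendOnly',
    uploadPolicy: 'eventcount:100000,interval:60s'
  )
  INPUT FROM SourceStream;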

Azure Data Lake Storage (ADLS) Gen2 properties for Databricks Writer

To use an ADLS Gen2 container as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.

Azure Account Access Key (encrypted password)

When Authentication Type is set to ServiceAccountKey, specify the account access key from Storage accounts > <account name> > Access keys.

When Authentication Type is set to AzureAD, this property is ignored in TQL and not displayed in the Flow Designer.

Azure Account Name (string)

The name of the Azure storage account for the blob container.

Azure Container Name (string, default: striim-deltalakewriter-container)

The blob container name from Storage accounts > <account name> > Containers. If it does not exist, it will be created.
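
To stage through ADLS Gen2 instead of DBFS root, the relevant writer properties in TQL might look like this sketch (the property spellings, account name, and key are assumptions):

  externalStageType: 'ADLSGen2',
  azureAccountName: 'mystorageaccount',
  azureAccountAccessKey: '********',
  azureContainerName: 'striim-deltalakewriter-container',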

Amazon S3 properties for Databricks Writer

To use an Amazon S3 bucket as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.

S3 Access Key (string)

An AWS access key ID (created on the AWS Security Credentials page) for a user with read and write permissions on the bucket.

If the Striim host has default credentials stored in the .aws directory, you may leave this blank.

S3 Bucket Name (string, default: striim-deltalake-bucket)

Specify the S3 bucket to be used for staging. If it does not exist, it will be created.

S3 Region (string, default: us-west-1)

The AWS region of the bucket.

S3 Secret Access Key (encrypted password)

The secret access key corresponding to the S3 access key.

If the Striim host has default credentials stored in the .aws directory, you may leave this blank.
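
Similarly, an S3 staging configuration in TQL might look like this sketch (the property spellings and values are assumptions; omit the key properties to fall back to the credentials in the host's .aws directory):

  externalStageType: 'S3',
  s3AccessKey: 'AKIAXXXXXXXXXXXXXXXX',
  s3SecretAccessKey: '********',
  s3BucketName: 'striim-deltalake-bucket',
  s3Region: 'us-west-1',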

Databricks Writer data type support and correspondence

TQL type -> Delta Lake type

java.lang.Byte -> binary
java.lang.Double -> double
java.lang.Float -> float
java.lang.Integer -> int
java.lang.Long -> bigint
java.lang.Short -> smallint
java.lang.String -> string
org.joda.time.DateTime -> timestamp

For additional data type mappings, see Data type support & mapping for schema conversion & evolution.
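
For example, a user-defined TQL type built from these mappings (the type and field names are hypothetical) would produce the Delta Lake column types shown in the comments:

  CREATE TYPE OrderType (
    orderId java.lang.Integer,         -- Delta Lake: int
    amount java.lang.Double,           -- Delta Lake: double
    customerName java.lang.String,     -- Delta Lake: string
    orderTime org.joda.time.DateTime   -- Delta Lake: timestamp
  );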