
Striim Platform 5.0 documentation

Databricks Writer programmer's reference

Databricks Writer properties

Authentication Type (enum, default: PersonalAccessToken)

Appears in Flow Designer only when Use Connection Profile is False.

The simplest way to use Microsoft Entra ID (OAuth) is to create a connection profile (see Introducing connection profiles). Alternatively, select Manual OAuth, follow the instructions in Configuring Microsoft Entra ID (formerly Azure Active Directory) for Databricks Writer manually, and specify Client ID, Client Secret, Refresh Token, and Tenant ID.

With the default setting PersonalAccessToken, Striim's connection to Databricks is authenticated using the token specified in Personal Access Token.
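
For example, a Manual OAuth configuration might include the following writer properties in TQL. This is a sketch only: the property spellings follow the Flow Designer labels, and they and the placeholder values are assumptions, not confirmed TQL syntax.

  authenticationType: 'ManualOAuth',
  clientId: '<your Entra application ID>',
  clientSecret: '********',
  refreshToken: '********',
  tenantId: '<your Entra tenant ID>',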

CDDL Action (enum, default: Process)

See Handling schema evolution.

If TRUNCATE commands may be entered in the source and you do not want to delete events in the target, precede the writer with a CQ whose select statement is SELECT * FROM <input stream name> x WHERE META(x, OperationName).toString() != 'Truncate'; (replacing <input stream name> with the name of the writer's input stream). Note that there will be no record in the target that the affected events were deleted.
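
For example, assuming the writer's input stream is named OracleCDCStream (all names in this sketch are illustrative), the CQ might look like:

  CREATE STREAM FilteredStream OF Global.WAEvent;

  CREATE CQ FilterTruncates
  INSERT INTO FilteredStream
  SELECT * FROM OracleCDCStream x
  WHERE META(x, OperationName).toString() != 'Truncate';

The writer's INPUT FROM clause would then reference FilteredStream rather than OracleCDCStream.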

Client ID (string)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property.

Client Secret (encrypted password)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property.

Connection Profile Name (enum)

Appears in Flow Designer only when Use Connection Profile is True. See Introducing connection profiles.

Connection Retry Policy (string, default: initialRetryDelay=10s, retryDelayMultiplier=2, maxRetryDelay=1m, maxAttempts=5, totalTimeout=10m)

Do not change unless instructed to by Striim support.

Connection URL (string)

Appears in Flow Designer only when Use Connection Profile is False.

Provide the JDBC URL from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster). If the URL starts with jdbc:spark://, change that to jdbc:databricks:// (this is required by the upgraded driver bundled with Striim).
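
For example, a URL copied from Databricks such as the following (the workspace host and httpPath are hypothetical):

  jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=...

would be specified in Striim as:

  jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=...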

External Stage Connection Profile Name (enum)

Appears in Flow Designer only when Use Connection Profile is True and External Stage Type is ADLSGen2 or S3.

Select or specify the name of the connection profile for the external stage. (When Databricks Writer uses a connection profile, you must use a connection profile for ADLSGen2 or S3 as well.)

External Stage Type (enum, default: DBFSROOT)

Set to ADLSGen2 or S3 to match the stage type you chose in Choose which staging area to use.

Note

Personal staging locations have been deprecated by Databricks on AWS (see Create metastore-level storage) and on Azure (see Create metastore-level storage).

Ignorable Exception Code (string)

Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information.

Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE).

Mode (enum, default: AppendOnly)

Set to Merge if that was your choice in Choose which writing mode to use.

Optimized Merge (boolean, default: false)

Appears in Flow Designer only when Mode is Merge.

Set to True only when Mode is Merge, the target's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source, and the source events will include partial records. For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates. When the source events will always include full records, leave this set to false.

Parallel Threads (integer)

Not supported when Mode is Merge.

See Creating multiple writer instances (parallel threads).

Personal Access Token (encrypted password)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is PersonalAccessToken.

Used to authenticate with the Databricks cluster (see Generate a personal access token). The user associated with the token must have read and write access to DBFS (see Important information about DBFS permissions). If table access control has been enabled, the user must also have MODIFY and READ_METADATA (see Data object privileges - Data governance model).

Personal Staging User Name (string)

Personal staging locations have been deprecated by Databricks on AWS (see Create metastore-level storage) and on Azure (see Create metastore-level storage).

Refresh Token (encrypted password)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property. The token expires after 90 days, at which point the application will halt. To avoid that, use a connection profile (see Introducing connection profiles), which allows you to update the token without stopping the application. Alternatively, stop the application and update the token before it expires.

Stage Location (string, default: /)

See Choose which staging area to use.

Tables (string)

The name(s) of the table(s) to write to. The table(s) must exist in the database.

Specify target table names as <catalog>.<database>.<table>. Not specifying the catalog (<database>.<table>) may result in errors if a table in another catalog has the same name. Target table names are case-insensitive.

When the target's input stream is a user-defined event, specify a single table.

The only special character allowed in target table names is underscore (_).

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), the writer can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the % wildcard only for table names, not for schemas or databases. If the reader uses three-part names, you must use them here as well.

Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%).

SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MS Jet (schema.%,schema.%).

Examples:

source.emp,target_database.emp
source_schema.%,target_catalog.target_database.%
source_database.source_schema.%,target_database.%
source_database.source_schema.%,target_catalog.target_database.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify source names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

See Mapping columns for additional options.
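
For example, to replicate all tables in an Oracle PDB schema captured by Oracle Reader to a Unity Catalog target, the Tables property might be set as follows (the PDB, schema, catalog, and database names are hypothetical):

  tables: 'MYPDB.HR.%,main_catalog.hr_db.%',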

Tenant ID (string)

Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth.

This property is required when Manual OAuth is selected as the value of the Authentication Type property.

Upload Policy (string, default: eventcount:100000, interval:60s)

The upload policy may include eventcount and/or interval (see Setting output names and rollover / upload policies for syntax). Buffered data is written to the storage account every time any of the specified values is exceeded. With the default value, data will be written every 60 seconds or sooner if the buffer contains 100,000 events. When the app is quiesced, any data remaining in the buffer is written to the storage account; when the app is undeployed, any data remaining in the buffer is discarded.
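
For example, to trade some throughput for lower latency, a policy such as the following (values illustrative) uploads every 30 seconds, or sooner if 50,000 events are buffered:

  uploadPolicy: 'eventcount:50000,interval:30s',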

Use Connection Profile (boolean, default: False)

Set to True to use a connection profile instead of specifying the connection properties here. See Introducing connection profiles.
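
Putting these properties together, a minimal AppendOnly target in TQL might look like the following sketch. The adapter name DatabricksWriter, the property spellings, the input stream name, and all connection values are illustrative assumptions; compare with the TQL that Flow Designer generates for your release.

  CREATE TARGET DatabricksTarget USING DatabricksWriter (
    connectionUrl: 'jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;...',
    authenticationType: 'PersonalAccessToken',
    personalAccessToken: '********',
    tables: 'source_schema.%,target_catalog.target_database.%',
    externalStageType: 'DBFSROOT',
    stageLocation: '/',
    mode: 'AppendOnly',
    uploadPolicy: 'eventcount:100000,interval:60s'
  )
  INPUT FROM SourceStream;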

Azure Data Lake Storage (ADLS) Gen2 properties for Databricks Writer

To use an ADLS Gen2 container as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.

Azure Account Access Key (encrypted password)

When Authentication Type is set to ServiceAccountKey, specify the account access key from Storage accounts > <account name> > Access keys.

When Authentication Type is set to AzureAD, this property is ignored in TQL and not displayed in the Flow Designer.

Azure Account Name (string)

The name of the Azure storage account for the blob container.

Azure Container Name (string, default: striim-deltalakewriter-container)

The blob container name from Storage accounts > <account name> > Containers. If it does not exist, it will be created.
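
To stage through ADLS Gen2 instead of DBFS root, the relevant writer properties in TQL might look like this sketch (the property spellings, account name, and key are assumptions):

  externalStageType: 'ADLSGen2',
  azureAccountName: 'mystorageaccount',
  azureAccountAccessKey: '********',
  azureContainerName: 'striim-deltalakewriter-container',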

Amazon S3 properties for Databricks Writer

To use an Amazon S3 bucket as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.

S3 Access Key (string)

An AWS access key ID (created on the AWS Security Credentials page) for a user with read and write permissions on the bucket.

If the Striim host has default credentials stored in the .aws directory, you may leave this blank.

S3 Bucket Name (string, default: striim-deltalake-bucket)

Specify the S3 bucket to be used for staging. If it does not exist, it will be created.

S3 Region (string, default: us-west-1)

The AWS region of the bucket.

S3 Secret Access Key (encrypted password)

The secret access key corresponding to the S3 access key.

If the Striim host has default credentials stored in the .aws directory, you may leave this blank.
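
Similarly, an S3 staging configuration in TQL might look like this sketch (the property spellings and values are assumptions; omit the key properties to fall back to the credentials in the host's .aws directory):

  externalStageType: 'S3',
  s3AccessKey: 'AKIAXXXXXXXXXXXXXXXX',
  s3SecretAccessKey: '********',
  s3BucketName: 'striim-deltalake-bucket',
  s3Region: 'us-west-1',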

Databricks Writer data type support and correspondence

TQL type -> Delta Lake type

java.lang.Byte -> binary
java.lang.Double -> double
java.lang.Float -> float
java.lang.Integer -> int
java.lang.Long -> bigint
java.lang.Short -> smallint
java.lang.String -> string
org.joda.time.DateTime -> timestamp

For additional data type mappings, see Data type support & mapping for schema conversion & evolution.
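
For example, a user-defined TQL type built from these mappings (the type and field names are hypothetical) would produce the Delta Lake column types shown in the comments:

  CREATE TYPE OrderType (
    orderId java.lang.Integer,         -- Delta Lake: int
    amount java.lang.Double,           -- Delta Lake: double
    customerName java.lang.String,     -- Delta Lake: string
    orderTime org.joda.time.DateTime   -- Delta Lake: timestamp
  );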