Databricks Writer

Databricks Writer writes to Delta Lake tables in Databricks on AWS or Azure. Delta Lake is an open-source tabular storage framework that includes a transaction log to support features typically associated with relational databases, such as ACID transactions and optimistic concurrency control.

You can use Striim's Databricks Writer to write data from transactional databases such as Oracle and SQL Server, applications such as Salesforce and ServiceNow, NoSQL databases such as Cosmos DB and MongoDB, object stores such as Amazon S3 and Google Cloud Storage, and other supported sources to Delta Lake tables in Databricks on AWS or Azure.

Databricks Writer summary

Supported sources

Databricks Writer can write data from all sources supported by Striim.

Authentication

Azure Databricks: Databricks Writer authenticates its connection using a personal access token or Microsoft Entra ID (formerly Azure Active Directory).

Databricks on AWS: Databricks Writer authenticates its connection using a personal access token.
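
For example, a minimal TQL target authenticating with a personal access token might look like the sketch below. The adapter name DatabricksWriter and the property names (connectionUrl, personalAccessToken, tables) are illustrative assumptions; verify them against the Databricks Writer properties reference for your release.

  CREATE TARGET DatabricksPATTarget USING DatabricksWriter (
    -- property names are assumptions; check the properties reference for exact spellings
    connectionUrl: 'jdbc:databricks://<workspace-host>:443/default;transportMode=http;httpPath=<http-path>',
    personalAccessToken: '<personal-access-token>',
    tables: 'srcdb.srcschema.%,tgtcatalog.tgtschema.%'
  )
  INPUT FROM SourceStream;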

Supported write modes

Databricks Writer supports two write modes:

  • Merge: Records inserted, updated, or deleted in the source database(s) are correspondingly inserted, updated, or deleted in Databricks, so the data in Databricks mirrors the data in the source database(s).

  • Append Only: Insert, update, and delete operations in the source database(s) are all treated as inserts in Databricks. Thus, you can use Databricks to query old data that no longer exists in the source database(s), for example, for month-over-month or year-over-year reports.
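
In TQL, the write mode is typically selected with a mode property on the target, as in the hedged sketch below; the property name mode and the values MERGE and APPENDONLY are assumptions, so confirm the exact spellings in the Databricks Writer properties reference.

  CREATE TARGET DatabricksMergeTarget USING DatabricksWriter (
    connectionUrl: 'jdbc:databricks://<workspace-host>:443/default;transportMode=http;httpPath=<http-path>',
    personalAccessToken: '<personal-access-token>',
    tables: 'srcdb.public.orders,main.sales.orders',
    -- assumed property: use 'APPENDONLY' instead to keep the full history of source operations
    mode: 'MERGE'
  )
  INPUT FROM OracleCDCStream;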

Additional writing features

  • Supports auto-quiesce after an initial load from Cosmos DB Reader, Database Reader, Mongo Cosmos DB Reader, or MongoDB Reader.

  • Supports schema evolution to detect and propagate DDL changes from supported sources to the target tables in Databricks.

Supported staging areas

Databricks requires a staging area to temporarily hold new data while it is being written to tables. Databricks Writer supports the following staging areas:

  • Azure Databricks: Azure Data Lake Storage Gen2 or Databricks File System

  • Databricks on AWS: S3 or Databricks File System
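
As a hedged illustration, the staging area is chosen through writer properties. The names below (externalStageType, azureAccountName, azureContainerName) are assumptions intended to show the shape of an ADLS Gen2 configuration, not the authoritative property list.

  CREATE TARGET DatabricksADLSStagedTarget USING DatabricksWriter (
    connectionUrl: 'jdbc:databricks://<workspace-host>:443/default;transportMode=http;httpPath=<http-path>',
    personalAccessToken: '<personal-access-token>',
    tables: 'srcdb.public.%,main.sales.%',
    -- assumed staging properties; see the staging-area documentation for the exact set
    externalStageType: 'ADLSGen2',
    azureAccountName: '<storage-account>',
    azureContainerName: 'striim-staging'
  )
  INPUT FROM SourceStream;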

Resilience and recovery

  • Supports connection retry to avoid application halting due to transient connection issues.

  • Supports recovery with at-least-once processing (see Recovering applications).
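
Recovery is enabled at the application level rather than on the writer itself. A minimal TQL sketch (the one-minute interval is an arbitrary example):

  CREATE APPLICATION DatabricksPipeline RECOVERY 1 MINUTE INTERVAL;
  -- sources, streams, and the Databricks Writer target are defined here
  END APPLICATION DatabricksPipeline;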

Performance

Parallel threads (see Creating multiple writer instances (parallel threads)) can increase throughput to the target in certain situations.
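
For example, assuming the writer exposes a parallelThreads property (the name is an assumption; see Creating multiple writer instances (parallel threads) for the supported mechanism), a target could be configured as:

  CREATE TARGET DatabricksParallelTarget USING DatabricksWriter (
    connectionUrl: 'jdbc:databricks://<workspace-host>:443/default;transportMode=http;httpPath=<http-path>',
    personalAccessToken: '<personal-access-token>',
    tables: 'srcdb.public.%,main.sales.%',
    -- assumed property: runs four writer instances that share the event load
    parallelThreads: '4'
  )
  INPUT FROM SourceStream;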

Programmability

  • Flow Designer

  • TQL

  • Wizards in the web UI to create applications from the following sources:

    • Initial load with Auto Schema Conversion (using Database Reader) from BigQuery, MariaDB, MySQL, Oracle, PostgreSQL, Salesforce, Snowflake, or SQL Server

    • CDC from MariaDB, MySQL, Oracle, PostgreSQL, Salesforce, Snowflake, or SQL Server

    • ADLS

    • Amazon S3

    • Google Ads

    • Google Cloud Storage

    • HDFS

    • HubSpot

    • Incremental Batch Reader

    • Intercom

    • Jira

    • Salesforce

    • ServiceNow

    • Stripe

    • Zendesk

Metrics and auditing

Key metrics are available through Striim's monitoring features (see Monitoring Guide).

Drivers and other third-party libraries

Databricks Writer uses the Databricks JDBC driver version 2.6.29. It also uses the following libraries:

  • for authentication using Azure Active Directory and staging in ADLS Gen2: azure-identity version 1.5.3

  • for staging in ADLS Gen2: azure-storage-blob version 12.18.0

  • for staging in DBFS: databricks-rest-client version 3.2.2

  • for staging in S3: aws-java-sdk-s3 version 1.12.589 and aws-java-sdk-sts version 1.11.320

Key limitations

Data is written in batch mode. Streaming mode is not supported in this release.
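
Because writes are batched, latency and throughput depend on how often staged data is uploaded and applied to the target tables. The sketch below uses a hypothetical uploadPolicy property to illustrate the idea; the actual property name and value syntax may differ, so consult the properties reference.

  CREATE TARGET DatabricksBatchedTarget USING DatabricksWriter (
    connectionUrl: 'jdbc:databricks://<workspace-host>:443/default;transportMode=http;httpPath=<http-path>',
    personalAccessToken: '<personal-access-token>',
    tables: 'srcdb.public.orders,main.sales.orders',
    -- hypothetical policy: flush staged data every 100,000 events or every 60 seconds
    uploadPolicy: 'eventcount:100000,interval:60s'
  )
  INPUT FROM SourceStream;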
