Skip to main content

Striim Platform 5.0 documentation

GCS Writer

Google Cloud Storage is a cloud storage platform designed to store large, unstructured data sets. It provides unified file storage for data. You can use the GCS Writer to write data to a Google Cloud Storage bucket.

Summary

APIs used/data supported

Google Cloud Storage APIs: Put, Create and List bucket

Supported formatters

Avro, DSV, JSON, Parquet, XML. See Supported writer-formatter combinations.Supported writer-formatter combinations

Supported sources

All sources supported by Striim such as Oracle, SQL Server, MySQL, Amazon S3, or Azure Storage, that generate the following event types: JSONNodeEvent, ParquetEvent, user-defined, WAEvent, XMLNodeEvent. Unsupported sources include those that generate the AvroEvent type.

See Readers overview.Readers overview

Security and authentication

GCS Writer authenticates its connection to Google Cloud Storage using a Google service account key.

GCS Writer supports private endpoints. See Using Private Service Connect with Google Cloud adapters.

Resilience / recovery

  • Supports recovery with at-least-once processing (A1P). See Recovering applications.Recovering applications

  • Any API call to the GCS bucket is retried on a connection failure based on the Connection Retry Policy property (see GCS Writer properties).

Performance

Supports parallel threads to increase throughput to the target. See Creating multiple writer instances (parallel threads).Creating multiple writer instances

Programmability

  • Wizards in the web UI to build pipelines from sources such as databases or apps

  • Striim Flow Designer

  • Striim TQL

Metrics and auditing

Key metrics available through Striim monitoring. See GCS Reader monitoring metrics.

Typical use case and integration

A typical use case is to use Google Cloud Storage as a centralized repository that stores, processes, and secures files of any format. Striim is used to process and store the newly generated event from the source.

GCS Writer overview

Google Cloud Storage is binary large object (blob) storage from Google, and is functionally similar to Amazon S3 or Azure Blob Storage. Similar to Striim's S3 Writer, the GCS Writer stages files locally before they are uploaded to GCS. Like other blob storage, GCS doesn't allow appending to a file; a file is uploaded as is.

The local staging area is located at .striim/<appname>/<bucket-name>/<user-provided-folder-path>/<file>. Once a file is uploaded, it is deleted from the local staging area.

Google supports a number of different methods for uploading to cloud storage (see Uploads and downloads). GCS Writer uses the Upload objects from memory method. Once GCS Writer stages the file locally, it reads the contents at once in a byte array and uploads them to the bucket. This adapter requires you to provide:

  • A Google service account key (which you can download in JSON format from your Google Cloud account).

  • A project ID to which the Google service account has read/write permissions to an existing GCS bucket.

Initial Google Cloud Storage setup

Before you can use GCS Writer in a Striim application, you must perform initial set up steps. These include creating a custom Google Cloud Storage role with permissions on the bucket, assigning the role to a service account you create, and downloading the service account key needed to authenticate the GCS Writer.

  1. Create a custom Google Cloud Storage role with the following permissions on the bucket and assign it to your service account:

    • storage.buckets.create

    • storage.buckets.get

    • storage.objects.create

    • storage.objects.delete

    gcs-writer-1.png
  2. Create a service account and assign the above role to it (see Service accounts overview).

    1. Create a service account.

      gcs-writer-2.png
    2. Assign the role to the service account..

      gcs-writer-3.png
  3. Download the service account key needed to authenticate the GCS Writer.

    gcs-writer-4.png
Configuring Striim to work with Google Cloud Storage

All clients and drivers required by GCS Writer are bundled with Striim. No additional setup is required.

Create a GCS Writer application

Create a GCS Writer application using a wizard
gcs_writer_template.png

In Striim, select Apps > Create New, enter Target: Google Cloud Storage in the search bar, click the kind of application you want to create, and follow the prompts. See Creating apps using wizards for more information.Creating apps using templates

Create a GCS Writer application using TQL
CREATE APPLICATION testGCS;

CREATE SOURCE PosSource USING FileReader ( 
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF:false )
PARSE USING DSVParser (
  header:Yes,
  trimquote:false ) 
OUTPUT TO PosSource_Stream;

CREATE CQ PosSource_Stream_CQ 
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;

CREATE TARGET GCSOut USING GCSWriter  ( 
  bucketName: 'mybucket',
  objectName: 'myobjectname',
  serviceAccountKey: 'conf/myproject-ec6f8b0e3afe.json',
  projectId: 'myproject' ) 
FORMAT USING DSVFormatter  () 
INPUT FROM PosSource_TransformedStream;

END APPLICATION testGCS;
Create a GCS Writer application using the Flow Designer

See Creating and modifying apps using the Flow Designer.Creating and modifying apps using the Flow Designer

GCS Writer programmer's reference

GCS Writer properties

property

type

default value

notes

Bucket Name

String

The GCS bucket name. If it does not exist, it will be created (provided Location and Storage Class are specified).

See Setting output names and rollover / upload policies for advanced options.

Note the limitations in Google's Bucket and Object Naming Guidelines. Note particularly that bucket names must be unique not just within your project or account but across all Google Cloud Storage accounts.

See Setting output names and rollover / upload policies for advanced Striim options.

Client Configuration

String

Optionally, specify one or more of the following property-value pairs, separated by commas, to override Google's defaults (see Class RetrySettings):

  • connectionTimeout=<value>: default is 30000

  • msxErrorRetry=<value>: default is 3

  • retryDelay=<value>: default is 30000

Compression Type

String

Set to gzip to compress the output using gzip.

Data Encryption Key Passphrase

encrypted password

See Setting encryption policies.

Encryption Policy

String

See Setting encryption policies.

Folder Name

String

Optionally, specify a folder within the specified bucket. If it does not exist, it will be created.

See Setting output names and rollover / upload policies for advanced options.

Location

String

The location of the bucket, which you can find on the bucket's overview tab (see Bucket Locations).

Object Name

String

The base name of the files to be written. See Google's Bucket and Object Naming Guidelines.

See Setting output names and rollover / upload policies for advanced Striim options.

Parallel Threads

Integer

See Creating multiple writer instances (parallel threads).

Partition Key

String

This property has been deprecated and will be removed in a future release.. 

Private Service Connect Endpoint

String

If using Private Service Connect with Google Cloud Storage, specify the endpoint created in the target Virtual Private Cloud, as described in Private Service Connect support for Google cloud adapters.

Project Id

String

The Google Cloud Platform project for the specified bucket.

Roll Over on DDL

Boolean

True

Has effect only when the input stream is the output stream of a CDC reader source. With the default value of True, rolls over to a new file when a DDL event is received. Set to False to keep writing to the same file.

Service Account Key

String

The path (from root or the Striim program directory) and file name to the .json credentials file downloaded from Google (see Service Accounts). This file must be copied to the same location on each Striim server that will run this adapter, or to a network location accessible by all servers. The associated service account must have the Storage Legacy Bucket Writer role for the specified bucket.

Storage Class

String

The storage class of the bucket, which you can find on the bucket's overview tab (see Bucket Locations).

Upload Policy

String

eventcount:10000, interval:5m

The upload policy may include eventcount, interval, and/or filesize (see Setting output names and rollover / upload policies for syntax). Cached data is written to GCS every time any of the specified values is exceeded. With the default value, data will be written every five minutes or sooner if the cache contains 10,000 events. When the app is undeployed, all remaining data is written to GCS.

This adapter has a choice of formatters. See Supported writer-formatter combinations for more information.Supported writer-formatter combinations

GCS Writer runtime considerations

GCS Writer monitoring metrics

The following monitoring metrics are published by GCS Writer:

  • External I/O Latency: Time duration to complete I/O.

  • Last Checkpointed Position: Value of the last checkpointed position.

  • Last I/O Size: Size of the last upload in bytes.

  • Last I/O Time: Timestamp of the last I/O.

  • Last Uploaded Filename: Name of the last uploaded file.

  • Total Events In Last Upload: Total number of events in the last upload.

Optimizing GCS Writer performance

GCS Writer's Parallel Threads property may in some circumstances allow you to improve performance. For more information, see Creating multiple writer instances (parallel threads). Each instance runs in its own thread, increasing throughput. If your application were deployed to multiple servers, each server would run multiple instances, so the total number of instances would be the Parallel Threads value times the number of servers. Use Parallel Threads only when the target is not able to keep up with incoming events (that is, when its input stream has backpressure). See Understanding and managing backpressure. Otherwise, the overhead imposed by additional threads could reduce the application's performance.Creating multiple writer instances

Troubleshooting GCS Writer

This topic describes errors you may see when using GCS Writer, and possible resolutions.

Exception

Resolution

com.google.cloud.storage.StorageException: gcsreaderaccesslevelzero@xyz.gserviceaccount.com does not have access to the Google Cloud Storage object

  • Make sure you have the required permissions so GCSWriter can access Google Cloud Storage.

  • Make sure you have write and get permissions to the bucket (you can change permissions by going to the Permissions tab for a bucket).