Skip to main content

Striim Platform 5.0 documentation

Replicating Oracle data to Hive

See Hive Writer for information on storage types supported and limitations.

The following example assumes the following Oracle source table:

create table employee (Employee_name varchar2(30), 
Employeed_id number, 
CONSTRAINT employee_pk PRIMARY KEY (Employeed_id));

and the following Hive target table:

CREATE TABLE employee (emp_name string, emp_id int)
CLUSTERED BY (emp_id) into 2 buckets 
STORED AS ORC TBLPROPERTIES ('transactional'='true');

The following application will load existing data from Oracle to Hive:

CREATE SOURCE OracleJDBCSource USING DatabaseReader (
  Username:'oracleuser',
  Password:'********',
  ConnectionURL:'192.0.2.75:1521:orcl',
  Tables:'DEMO.EMPLOYEE',
  FetchSize:1
)
OUTPUT TO DataStream;

CREATE TARGET HiveTarget USING HiveWriter (
  ConnectionURL:’jdbc:hive2://localhost:10000’,
  Username:’hiveuser’, 
  Password:’********’,
  hadoopurl:'hdfs://18.144.17.75:9000/',
  Mode:’initiaload’,
  Tables:’DEMO.EMPLOYEE,employee’
)
INPUT FROM DataStream;

Once initial load is complete, the following application will read new data and continuously replicate it to Hive:

CREATE SOURCE OracleCDCSource USING OracleReader (
  Username:'oracleuser',
  Password:'********',
  ConnectionURL:'192.0.2.75:1521:orcl',
  Tables:'DEMO.EMPLOYEE',
  FetchSize:1
)
OUTPUT TO DataStream;

CREATE TARGET HiveTarget USING HiveWriter (
  ConnectionURL:’jdbc:hive2://192.0.2.76:10000’,
  Username:’hiveuser’, 
  Password:’********’,
  hadoopurl:'hdfs://192.0.2.76:9000/',
  Mode:’incrementalload’,
  Tables:’DEMO.EMPLOYEE,employee keycolumns(emp_id)’’
)
INPUT FROM DataStream;