RainStor integration on a Hadoop, Teradata... combo idea
TDCH Import command - Export Data from Teradata to Hadoop
All,
Could you please help and advise on the following:
We are in the process of exporting data from Teradata to Hadoop using TDCH (the Teradata Connector for Hadoop). We have managed to successfully import full tables from Teradata to Hadoop; however, where the Teradata tables are larger than 200 GB we want to import only the daily deltas.
We have changed our script to use SOURCEQUERY in place of SOURCETABLE and supplied the SQL with a WHERE clause that selects only a subset of the data based on the processing date. We have also specified the method as split.by.hash; however, this is being overridden by split.by.partition when the SOURCEQUERY parameter is used.
Using split.by.partition causes a staging table to be created in the database area that is the full size of the existing table. This is causing us issues, since we do not have the spare space to replicate the table in the database area, and therefore our job abends with "2644 - No more Room in Database".
Can anyone explain why the split.by.hash method cannot be used with SOURCEQUERY?
This is a sample script just to show the parameters we have used to invoke the import process using TDCH:
hadoop jar /usr/lib/sqoop/lib/teradata-connector-1.4.1.jar terajdbc4.jar \
com.teradata.hadoop.tool.TeradataImportTool \
-D mapreduce.job.queuename=insights \
-url jdbc:teradata://tdprod/database=insight \
-username xxxxxxxx \
-password xxxxxxxxx \
-classname com.teradata.jdbc.TeraDriver \
-fileformat textfile \
-splitbycolumn address_id \
-jobtype hdfs \
-method split.by.hash \
-targetpaths hdfs://dox/user/user1/td_prd_addresses/mail_drop_date='2015-11-20'/ \
-nummappers 1 \
-sourcequery "select col1, col2, col3, col4 from EDWPRODT.ADDRESSES where mail_drop_date = '2015-11-20'"
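One commonly suggested alternative for the delta-load case, sketched here with the caveat that option availability varies by TDCH version: keep -sourcetable (so split.by.hash remains valid) and push the date filter through -sourceconditions and the column list through -sourcefieldnames instead of using -sourcequery. The values below simply mirror the script above.
hadoop jar /usr/lib/sqoop/lib/teradata-connector-1.4.1.jar \
com.teradata.hadoop.tool.TeradataImportTool \
-D mapreduce.job.queuename=insights \
-url jdbc:teradata://tdprod/database=insight \
-username xxxxxxxx \
-password xxxxxxxxx \
-classname com.teradata.jdbc.TeraDriver \
-fileformat textfile \
-jobtype hdfs \
-method split.by.hash \
-splitbycolumn address_id \
-sourcetable EDWPRODT.ADDRESSES \
-sourcefieldnames "col1,col2,col3,col4" \
-sourceconditions "mail_drop_date = '2015-11-20'" \
-targetpaths hdfs://dox/user/user1/td_prd_addresses/mail_drop_date='2015-11-20'/ \
-nummappers 1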
Support for SUSE v12 on Teradata Appliance
Hi,
I'm trying to match up a configuration of Informatica's Big Data Management (BDM) tool with our Teradata Hadoop appliance. The BDM tool will support SUSE v12 in May 2016. What are the plans for Teradata to support SUSE v12 on its appliance?
Thanks,
-Rich
Top 10 Don’ts of Big Data Projects
Hello Guys,
These top 10 don’ts of big data projects describe practices that prevent data lakes from being built profitably. Steer clear of them and go the right way to make your project a success.
While many big data projects deliver significant profitability and scale up their activities in a short period of time, plenty of initiatives fall short of being fool-proof because of these wrong practices.
Using TDCH Export to move data from Hive view or query into Teradata
I have a Hive table that is too big for my Teradata database, but if I can exclude some of the columns it should fit. Since I don't want a duplicate copy of the table with fewer columns on my Hadoop cluster, I have two options: 1) use a view to filter out the columns, or 2) use a query to filter out the columns.
I have tried both with the TeradataExportTool, but I don't see any options in the documentation (the Teradata Connector for Hadoop Tutorial v1.0 is all I can find) for using a source query on export (only on import), and using the view throws an error. Does anyone know how this is done, or can point me to the proper documentation? Either would work for me, but I would like to know both since I have plans to use both. Thanks,
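If the TDCH version in use supports it, one possible route (a sketch, not confirmed against the v1.0 tutorial mentioned above) is to export the Hive table directly and restrict the columns with -sourcefieldnames and -targetfieldnames, so neither a view nor a duplicate table is needed. Database, table, and column names below are placeholders.
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $HIVE_LIB_JARS \
-url jdbc:teradata://tdhost/DATABASE=targetdb \
-username user \
-password "$PASS" \
-jobtype hive \
-fileformat orcfile \
-sourcedatabase default \
-sourcetable big_hive_table \
-sourcefieldnames "col1,col2,col3" \
-targettable TARGET_TABLE \
-targetfieldnames "col1,col2,col3" \
-nummappers 8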
POM.xml for mapreduce development - "Hadoop 2.7.1.2.3.2.0-2950"
Hi - My organization uses Hadoop 2.7.1.2.3.2.0-2950. I am new to Hadoop and MapReduce programming as a whole.
I am looking for a POM.xml for MapReduce program development. Is there a separate repository available for Teradata Hadoop?
Can someone please help? Thank you.
Teradata connector points to old lib when running TPT TDCH to import from Teradata to Hadoop
Hi everyone,
I have installed Teradata Connector for Hadoop 1.4.4 using teradata-connector-1.4.4-hadoop2.x.noarch.rpm, and I can run "hadoop jar /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar" against it. However, when I run TPT jobs via TDCH, the job log shows "the teradata connector for hadoop version is: 1.3.4".
I realised that the following files exist and that TPT references them every time, which is why version 1.3.4 is picked up:
/opt/teradata/client/15.00/tbuild/jar/teradata-connector-hdp2.1.jar
/opt/teradata/client/15.00/tbuild/jar/teradata-connector-hdp1.3.jar
Those files may be left over from a previous installation. Apparently they were not uninstalled when I installed TDCH 1.4.4. However, checking the readme file, I could not figure out how to uninstall them, because I don't know the original package name. I simply moved them to an archive directory, but the TPT job then failed with the error "Not a valid JAR: /opt/teradata/client/15.00/tbuild/jar/teradata-connector-hdp2.1.jar".
My question is whether there is a way to set a path so that TPT points to /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar, which is the correct location for the latest TDCH. I have tried a few options, e.g. the following, but they do not seem to work:
export TDCH_JAR=/usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar
export PATH=$PATH:/usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar
thanks
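One possible workaround, sketched under the assumption that TPT resolves the connector jar from fixed paths under tbuild/jar: back up the old jars and point those same paths at the 1.4.4 jar with symlinks. This is a local workaround, not a supported upgrade procedure.
cd /opt/teradata/client/15.00/tbuild/jar
mkdir -p backup
mv teradata-connector-hdp2.1.jar teradata-connector-hdp1.3.jar backup/
# point the file names TPT expects at the newer connector jar
ln -s /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar teradata-connector-hdp2.1.jar
ln -s /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar teradata-connector-hdp1.3.jar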
teradata.connector.plugins.xml
Hi,
I am trying to import data into Hive (HDP 2.4) from Teradata 14.10 using TDCH.
The TDCH version is 1.3.4, because with 1.4.1 the hadoop jar .... command gives an invalid jar file error. Here is the command I am using:
export HADOOP_HOME=/usr/hdp/current/hadoop-client/
export HIVE_HOME=/usr/hdp/current/hive-client/
export HCAT_HOME=/usr/hdp/current/hive-webhcat/
export USERLIBTDCH=/usr/lib/tdch/1.3/lib/teradata-connector-1.3.4.jar
export LIB_JARS=/usr/hdp/2.4.0.0-169/sqoop/lib/avro-1.7.5.jar,/usr/hdp/2.4.0.0-169/sqoop/lib/avro-mapred-1.7.5-hadoop2.jar,$HIVE_HOME/conf,$HIVE_HOME/lib/antlr-runtime-3.4.jar,$HIVE_HOME/lib/commons-dbcp-1.4.jar,$HIVE_HOME/lib/commons-pool-1.5.4.jar,$HIVE_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$HIVE_HOME/lib/datanucleus-core-3.2.10.jar,$HIVE_HOME/lib/datanucleus-rdbms-3.2.9.jar,$HIVE_HOME/lib/hive-cli.jar,$HIVE_HOME/lib/hive-exec.jar,$HIVE_HOME/lib/hive-jdbc.jar,$HIVE_HOME/lib/hive-metastore.jar,$HIVE_HOME/lib/jdo-api-3.0.1.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar,$HIVE_HOME/lib/libthrift-0.9.2.jar,$HCAT_HOME/share/hcatalog/hive-hcatalog-core.jar,/usr/lib/ambari-agent/DBConnectionVerification.jar
yarn jar $USERLIBTDCH com.teradata.connector.common.tool.ConnectorImportTool -libjars $LIB_JARS \
-url jdbc:teradata://ipaddress/database=db1 -username user1 -password pwd1 \
-jobtype hive -fileformat rcfile -sourcetable ITEM_GROUP_TYPE -nummappers 1 \
-targettable ITEM_GROUP_TYPE
When I run the above command, the following error is returned:
16/07/30 21:33:46 INFO tool.ConnectorImportTool: ConnectorImportTool starts at 1469914426192
16/07/30 21:33:49 INFO common.ConnectorPlugin: load plugins in file:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/hadoop-unjar561556663592675147/teradata.connector.plugins.xml
16/07/30 21:33:50 INFO tool.ConnectorImportTool: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
at com.teradata.connector.common.tool.ConnectorImportTool.processArgs(ConnectorImportTool.java:607)
at com.teradata.connector.common.tool.ConnectorImportTool.run(ConnectorImportTool.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.teradata.connector.common.tool.ConnectorImportTool.main(ConnectorImportTool.java:721)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.NoSuchObjectException
The file /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/hadoop-unjar561556663592675147/teradata.connector.plugins.xml referenced in the log is never actually created.
Am I missing something on the command line, or do I need to install any plugin explicitly?
Thanks
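A likely cause, hedged here since it depends on the environment: -libjars only ships the Hive jars to the MapReduce tasks, while the NoClassDefFoundError above is thrown in the client JVM before the job is submitted. Putting the same jars on HADOOP_CLASSPATH usually addresses that. A sketch reusing the variables from the command above:
# convert the comma-separated -libjars list into a classpath for the local JVM
export HADOOP_CLASSPATH=$(echo "$LIB_JARS" | tr ',' ':')
yarn jar $USERLIBTDCH com.teradata.connector.common.tool.ConnectorImportTool -libjars $LIB_JARS \
-url jdbc:teradata://ipaddress/database=db1 -username user1 -password pwd1 \
-jobtype hive -fileformat rcfile -sourcetable ITEM_GROUP_TYPE -nummappers 1 \
-targettable ITEM_GROUP_TYPE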
Sqoop importing into Hive - supported file formats
Hi, I'm new to Hadoop. We are using CDH5. I am managing to use Sqoop to load from Teradata into Hive as textfile, but that seems to be the only supported option. It seems like Avro should work, but when I specify --as-avrodatafile, I get a null pointer exception.
I'm using Sqoop version 1.4.6-cdh5.7.1, which is using "Cloudera Connector Powered by Teradata" version 1.4c5.
Is Avro expected to work? Or do I need to avoid Sqoop and use TDCH if I want to import into Hive in anything apart from textfile?
TDCH to Hive inserts NULL values
I have been trying to import Teradata tables into Hive using the TDCH utility. The utility seems to be working, since the connection is successful; however, after the job completes, when I check the values in the respective Hive table I get all NULL values. The separator and delimiters are fine, so either it is not able to fetch the data properly or there is some problem with my TDCH code. Please find my TDCH code below and let me know what the possible problem might be:
$Prefix, $Start_Date, and $End_Date are the arguments to the script that contains this TDCH snippet. $Prefix is also the name of the source table in the Teradata database.
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
-Ddfs.blocksize=134217728 \
-libjars $HIVE_LIB_JARS \
-classname com.teradata.jdbc.TeraDriver \
-url jdbc:teradata://xxxxxx/DATABASE=xxxx \
-username xxxx \
-password $PASS \
-jobtype hive \
-fileformat orcfile \
-method split.by.hash \
-sourcequery "SELECT $SRC_FIELDS FROM $Prefix WHERE CAST(xxxxxx AS DATE)>=${Start_Date} AND CAST(xxxxxx AS DATE) <=${End_Date}" \
-nummappers 2 \
-separator ',' \
-targetdatabase vijit \
-targettable $Curr_Table_Name \
-targetfieldnames "$TGT_FIELDS" 2>&1 | tee -a $Log_File
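One way to narrow this down, offered as a diagnostic sketch rather than part of the original script: run the same query as a plain HDFS text import first. If the files under the target path contain real values, the Teradata extraction is fine and the NULLs point at a Hive-side mismatch (column order or types between the query and the Hive table); if they are empty or garbled, the query or separator is the problem. The xxxxxx placeholders below are the same ones used in the script above, and the target path is arbitrary.
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $HIVE_LIB_JARS \
-url jdbc:teradata://xxxxxx/DATABASE=xxxx \
-username xxxx \
-password $PASS \
-jobtype hdfs \
-fileformat textfile \
-separator ',' \
-sourcequery "SELECT $SRC_FIELDS FROM $Prefix WHERE CAST(xxxxxx AS DATE)>=${Start_Date} AND CAST(xxxxxx AS DATE)<=${End_Date}" \
-targetpaths /tmp/tdch_debug_${Prefix} \
-nummappers 2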
What is the best way to import multiple tables into Hive from Teradata using TDCH?
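For the question above, one simple pattern is to drive the ConnectorImportTool from a shell loop, one TDCH job per table. This is a sketch under the assumption that per-table jobs are acceptable; the table list, connection details, and target database are placeholders.
for t in CUSTOMERS ORDERS ORDER_ITEMS; do
  hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
    -libjars $HIVE_LIB_JARS \
    -url jdbc:teradata://tdhost/DATABASE=srcdb \
    -username user -password "$PASS" \
    -jobtype hive -fileformat orcfile \
    -sourcetable "$t" \
    -targetdatabase staging -targettable "$t" \
    -nummappers 4
done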
Having an Issue with CHAR,VARCHAR,DATE Data types while exporting/importing from Hive to Teradata using TDCH
Hello,
I am trying to import/export tables between Teradata and Hive. But if the Hive table has CHAR/VARCHAR/DATE data types, I get the error below from the TDCH connector:
INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1473252737445
INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-unjar6516039745100009834/teradata.connector.plugins.xml
INFO hive.metastore: Trying to connect to metastore with URI thrift://el3207.bc:9083
INFO hive.metastore: Connected to metastore.
INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor starts at: 1473252738715
INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor ends at: 1473252738715
INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor is: 0s
INFO tool.ConnectorExportTool: com.teradata.connector.common.exception.ConnectorException: CHAR(6) Field data type is not supported
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.teradata.connector.common.tool.ConnectorExportTool.main(ConnectorExportTool.java:780)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/09/07 14:52:19 INFO tool.ConnectorExportTool: job completed with exit code 14006
The alternative I can see here is changing CHAR/VARCHAR/DATE to the STRING type in the Hive table and then doing the import/export.
Changing the existing tables is not an optimal solution; can anyone please help with this?
Thanks,
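One workaround sketch, assuming a temporary copy is acceptable so the original table is left untouched: build a staging table with the offending columns cast to STRING, export that with TDCH, then drop it. All database, table, and column names below are placeholders.
hive -e "CREATE TABLE stage_db.export_stage STORED AS ORC AS
  SELECT CAST(char_col AS STRING) AS char_col,
         CAST(varchar_col AS STRING) AS varchar_col,
         CAST(date_col AS STRING) AS date_col,
         other_col
  FROM src_db.src_table"
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $HIVE_LIB_JARS \
-url jdbc:teradata://tdhost/DATABASE=targetdb \
-username user -password "$PASS" \
-jobtype hive -fileformat orcfile \
-sourcedatabase stage_db -sourcetable export_stage \
-targettable TARGET_TABLE \
-nummappers 4
hive -e "DROP TABLE stage_db.export_stage"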