RainStor integration on a Hadoop, Teradata... combo idea
TDCH Import command - Export Data from Teradata to Hadoop
All,
Could you please help and advise on the following:
We are in the process of exporting data from Teradata to Hadoop using TDCH (the Teradata Connector for Hadoop). We have managed to successfully import full tables from Teradata to Hadoop; however, where the Teradata tables are larger than 200 GB we want to import only the daily deltas.
We have changed our script to use SOURCEQUERY in place of SOURCETABLE and supplied the SQL with a WHERE clause that selects only a subset of the data based on the processing date. We have also specified the method as split.by.hash; however, this is being overridden by split.by.partition when the SOURCEQUERY parameter is used.
Using split.by.partition causes a staging table to be created in the database area that is the full size of the existing table. This is causing us issues, since we do not have the spare space to replicate the table in the database area, and therefore our job abends with "2644 - No more Room in Database".
Can anyone explain why the split.by.hash method cannot be used with SOURCEQUERY?
This is a sample script just to show the parameters we have used to invoke the import process using TDCH:
hadoop jar /usr/lib/sqoop/lib/teradata-connector-1.4.1.jar terajdbc4.jar \
com.teradata.hadoop.tool.TeradataImportTool \
-D mapreduce.job.queuename=insights \
-url jdbc:teradata://tdprod/database=insight \
-username xxxxxxxx \
-password xxxxxxxxx \
-classname com.teradata.jdbc.TeraDriver \
-fileformat textfile \
-splitbycolumn address_id \
-jobtype hdfs \
-method split.by.hash \
-targetpaths hdfs://dox/user/user1/td_prd_addresses/mail_drop_date='2015-11-20'/ \
-nummappers 1 \
-sourcequery "select col1, col2, col3, col4 from EDWPRODT.ADDRESSES where mail_drop_date = '2015-11-20'"
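One commonly suggested alternative for the delta-load case, sketched here with the caveat that option availability varies by TDCH version: keep -sourcetable (so split.by.hash remains valid) and push the date filter through -sourceconditions and the column list through -sourcefieldnames instead of using -sourcequery. The values below simply mirror the script above.
hadoop jar /usr/lib/sqoop/lib/teradata-connector-1.4.1.jar \
com.teradata.hadoop.tool.TeradataImportTool \
-D mapreduce.job.queuename=insights \
-url jdbc:teradata://tdprod/database=insight \
-username xxxxxxxx \
-password xxxxxxxxx \
-classname com.teradata.jdbc.TeraDriver \
-fileformat textfile \
-jobtype hdfs \
-method split.by.hash \
-splitbycolumn address_id \
-sourcetable EDWPRODT.ADDRESSES \
-sourcefieldnames "col1,col2,col3,col4" \
-sourceconditions "mail_drop_date = '2015-11-20'" \
-targetpaths hdfs://dox/user/user1/td_prd_addresses/mail_drop_date='2015-11-20'/ \
-nummappers 1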
Support for SUSE v12 on Teradata Appliance
Hi,
I'm trying to match up a configuration of Informatica's Big Data Management (BDM) tool with our Teradata Hadoop appliance. The BDM tool will support SUSE v12 in May 2016. What are the plans for Teradata to support SUSE v12 on its appliance?
Thanks,
-Rich
Top 10 Don’ts of Big Data Projects
Hello Guys,
These top 10 don’ts of big data projects describe practices that prevent data lakes from being built profitably. Steer clear of them and go the right way to make your project a success.
While many big data projects deliver significant profitability and scale up their activities in a short period of time, plenty of initiatives fall short of being fool-proof because of these wrong practices.
Using TDCH Export to move data from Hive view or query into Teradata
I have a Hive table that is too big for my Teradata database, but if I can exclude some of the columns it should fit. Since I don't want a duplicate copy of the table with fewer columns on my Hadoop cluster, I have two options: 1) use a view to filter out the columns, or 2) use a query to filter out the columns.
I have tried both with the TeradataExportTool, but I don't see any options in the documentation (the Teradata Connector for Hadoop Tutorial v1.0 is all I can find) for using a source query on export (only on import), and using the view throws an error. Does anyone know how this is done, or can point me to the proper documentation? Either would work for me, but I would like to know both since I have plans to use both. Thanks,
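If the TDCH version in use supports it, one possible route (a sketch, not confirmed against the v1.0 tutorial mentioned above) is to export the Hive table directly and restrict the columns with -sourcefieldnames and -targetfieldnames, so neither a view nor a duplicate table is needed. Database, table, and column names below are placeholders.
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $HIVE_LIB_JARS \
-url jdbc:teradata://tdhost/DATABASE=targetdb \
-username user \
-password "$PASS" \
-jobtype hive \
-fileformat orcfile \
-sourcedatabase default \
-sourcetable big_hive_table \
-sourcefieldnames "col1,col2,col3" \
-targettable TARGET_TABLE \
-targetfieldnames "col1,col2,col3" \
-nummappers 8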
POM.xml for mapreduce development - "Hadoop 2.7.1.2.3.2.0-2950"
Hi - My organization uses Hadoop 2.7.1.2.3.2.0-2950. I am new to Hadoop and MapReduce programming as a whole.
I am looking for a POM.xml for MapReduce program development. Is there a separate repository available for Teradata Hadoop?
Can someone please help? Thank you.
Teradata connector points to old lib when running TPT TDCH to import from Teradata to Hadoop
Hi everyone,
I have installed Teradata Connector for Hadoop 1.4.4 using teradata-connector-1.4.4-hadoop2.x.noarch.rpm, and I can run "hadoop jar /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar" against it. However, when I run TPT jobs via TDCH, the job log shows "the teradata connector for hadoop version is: 1.3.4".
I realised that the following files exist and that TPT references them every time, which is why version 1.3.4 is picked up:
/opt/teradata/client/15.00/tbuild/jar/teradata-connector-hdp2.1.jar
/opt/teradata/client/15.00/tbuild/jar/teradata-connector-hdp1.3.jar
Those files may be left over from a previous installation. Apparently they were not uninstalled when I installed TDCH 1.4.4. However, checking the readme file, I could not figure out how to uninstall them, because I don't know the original package name. I simply moved them to an archive directory, but the TPT job then failed with the error "Not a valid JAR: /opt/teradata/client/15.00/tbuild/jar/teradata-connector-hdp2.1.jar".
My question is whether there is a way to set a path so that TPT points to /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar, which is the correct location for the latest TDCH. I have tried a few options, e.g. the following, but they do not seem to work:
export TDCH_JAR=/usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar
export PATH=$PATH:/usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar
thanks
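One possible workaround, sketched under the assumption that TPT resolves the connector jar from fixed paths under tbuild/jar: back up the old jars and point those same paths at the 1.4.4 jar with symlinks. This is a local workaround, not a supported upgrade procedure.
cd /opt/teradata/client/15.00/tbuild/jar
mkdir -p backup
mv teradata-connector-hdp2.1.jar teradata-connector-hdp1.3.jar backup/
# point the file names TPT expects at the newer connector jar
ln -s /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar teradata-connector-hdp2.1.jar
ln -s /usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar teradata-connector-hdp1.3.jar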
teradata.connector.plugins.xml
Hi,
I am trying to import data into Hive (HDP 2.4) from Teradata 14.10 using TDCH.
The TDCH version is 1.3.4, because with 1.4.1 the hadoop jar .... command gives an invalid jar file error. Here is the command I am using:
export HADOOP_HOME=/usr/hdp/current/hadoop-client/
export HIVE_HOME=/usr/hdp/current/hive-client/
export HCAT_HOME=/usr/hdp/current/hive-webhcat/
export USERLIBTDCH=/usr/lib/tdch/1.3/lib/teradata-connector-1.3.4.jar
export LIB_JARS=/usr/hdp/2.4.0.0-169/sqoop/lib/avro-1.7.5.jar,/usr/hdp/2.4.0.0-169/sqoop/lib/avro-mapred-1.7.5-hadoop2.jar,$HIVE_HOME/conf,$HIVE_HOME/lib/antlr-runtime-3.4.jar,$HIVE_HOME/lib/commons-dbcp-1.4.jar,$HIVE_HOME/lib/commons-pool-1.5.4.jar,$HIVE_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$HIVE_HOME/lib/datanucleus-core-3.2.10.jar,$HIVE_HOME/lib/datanucleus-rdbms-3.2.9.jar,$HIVE_HOME/lib/hive-cli.jar,$HIVE_HOME/lib/hive-exec.jar,$HIVE_HOME/lib/hive-jdbc.jar,$HIVE_HOME/lib/hive-metastore.jar,$HIVE_HOME/lib/jdo-api-3.0.1.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar,$HIVE_HOME/lib/libthrift-0.9.2.jar,$HCAT_HOME/share/hcatalog/hive-hcatalog-core.jar,/usr/lib/ambari-agent/DBConnectionVerification.jar
yarn jar $USERLIBTDCH com.teradata.connector.common.tool.ConnectorImportTool -libjars $LIB_JARS \
-url jdbc:teradata://ipaddress/database=db1 -username user1 -password pwd1 \
-jobtype hive -fileformat rcfile -sourcetable ITEM_GROUP_TYPE -nummappers 1 \
-targettable ITEM_GROUP_TYPE
When I run the above command, the following error is returned:
16/07/30 21:33:46 INFO tool.ConnectorImportTool: ConnectorImportTool starts at 1469914426192
16/07/30 21:33:49 INFO common.ConnectorPlugin: load plugins in file:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/hadoop-unjar561556663592675147/teradata.connector.plugins.xml
16/07/30 21:33:50 INFO tool.ConnectorImportTool: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
at com.teradata.connector.common.tool.ConnectorImportTool.processArgs(ConnectorImportTool.java:607)
at com.teradata.connector.common.tool.ConnectorImportTool.run(ConnectorImportTool.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.teradata.connector.common.tool.ConnectorImportTool.main(ConnectorImportTool.java:721)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.NoSuchObjectException
The file /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/hadoop-unjar561556663592675147/teradata.connector.plugins.xml referenced in the log is never actually created.
Am I missing something on the command line, or do I need to install any plugin explicitly?
Thanks
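A likely cause, hedged here since it depends on the environment: -libjars only ships the Hive jars to the MapReduce tasks, while the NoClassDefFoundError above is thrown in the client JVM before the job is submitted. Putting the same jars on HADOOP_CLASSPATH usually addresses that. A sketch reusing the variables from the command above:
# convert the comma-separated -libjars list into a classpath for the local JVM
export HADOOP_CLASSPATH=$(echo "$LIB_JARS" | tr ',' ':')
yarn jar $USERLIBTDCH com.teradata.connector.common.tool.ConnectorImportTool -libjars $LIB_JARS \
-url jdbc:teradata://ipaddress/database=db1 -username user1 -password pwd1 \
-jobtype hive -fileformat rcfile -sourcetable ITEM_GROUP_TYPE -nummappers 1 \
-targettable ITEM_GROUP_TYPE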
Sqoop importing into Hive - supported file formats
Hi, I'm new to Hadoop. We are using CDH5. I am managing to use Sqoop to load from Teradata into Hive as textfile, but that seems to be the only supported option. It seems like Avro should work, but when I specify --as-avrodatafile, I get a null pointer exception.
I'm using Sqoop version 1.4.6-cdh5.7.1, which is using "Cloudera Connector Powered by Teradata" version 1.4c5.
Is Avro expected to work? Or do I need to avoid Sqoop and use TDCH if I want to import into Hive in anything apart from textfile?
TDCH to Hive inserts NULL values
I have been trying to import Teradata tables into Hive using the TDCH utility. The utility seems to be working, since the connection is successful; however, after the job completes, when I check the values in the respective Hive table I get all NULL values. The separator and delimiters are fine, so either it is not able to fetch the data properly or there is some problem with my TDCH code. Please find my TDCH code below and let me know what the possible problem might be:
$Prefix, $Start_Date, and $End_Date are the arguments to the script that contains this TDCH snippet. $Prefix is also the name of the source table in the Teradata database.
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
-Ddfs.blocksize=134217728 \
-libjars $HIVE_LIB_JARS \
-classname com.teradata.jdbc.TeraDriver \
-url jdbc:teradata://xxxxxx/DATABASE=xxxx \
-username xxxx \
-password $PASS \
-jobtype hive \
-fileformat orcfile \
-method split.by.hash \
-sourcequery "SELECT $SRC_FIELDS FROM $Prefix WHERE CAST(xxxxxx AS DATE)>=${Start_Date} AND CAST(xxxxxx AS DATE) <=${End_Date}" \
-nummappers 2 \
-separator ',' \
-targetdatabase vijit \
-targettable $Curr_Table_Name \
-targetfieldnames "$TGT_FIELDS" 2>&1 | tee -a $Log_File
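One way to narrow this down, offered as a diagnostic sketch rather than part of the original script: run the same query as a plain HDFS text import first. If the files under the target path contain real values, the Teradata extraction is fine and the NULLs point at a Hive-side mismatch (column order or types between the query and the Hive table); if they are empty or garbled, the query or separator is the problem. The xxxxxx placeholders below are the same ones used in the script above, and the target path is arbitrary.
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $HIVE_LIB_JARS \
-url jdbc:teradata://xxxxxx/DATABASE=xxxx \
-username xxxx \
-password $PASS \
-jobtype hdfs \
-fileformat textfile \
-separator ',' \
-sourcequery "SELECT $SRC_FIELDS FROM $Prefix WHERE CAST(xxxxxx AS DATE)>=${Start_Date} AND CAST(xxxxxx AS DATE)<=${End_Date}" \
-targetpaths /tmp/tdch_debug_${Prefix} \
-nummappers 2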
What is the best way to import multiple tables into Hive from Teradata using TDCH?
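For the question above, one simple pattern is to drive the ConnectorImportTool from a shell loop, one TDCH job per table. This is a sketch under the assumption that per-table jobs are acceptable; the table list, connection details, and target database are placeholders.
for t in CUSTOMERS ORDERS ORDER_ITEMS; do
  hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
    -libjars $HIVE_LIB_JARS \
    -url jdbc:teradata://tdhost/DATABASE=srcdb \
    -username user -password "$PASS" \
    -jobtype hive -fileformat orcfile \
    -sourcetable "$t" \
    -targetdatabase staging -targettable "$t" \
    -nummappers 4
done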
Having an Issue with CHAR,VARCHAR,DATE Data types while exporting/importing from Hive to Teradata using TDCH
Hello,
I am trying to import/export tables between Teradata and Hive. But if the Hive table has CHAR/VARCHAR/DATE data types, I get the error below from the TDCH connector:
INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1473252737445
INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-unjar6516039745100009834/teradata.connector.plugins.xml
INFO hive.metastore: Trying to connect to metastore with URI thrift://el3207.bc:9083
INFO hive.metastore: Connected to metastore.
INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor starts at: 1473252738715
INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor ends at: 1473252738715
INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor is: 0s
INFO tool.ConnectorExportTool: com.teradata.connector.common.exception.ConnectorException: CHAR(6) Field data type is not supported
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.teradata.connector.common.tool.ConnectorExportTool.main(ConnectorExportTool.java:780)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/09/07 14:52:19 INFO tool.ConnectorExportTool: job completed with exit code 14006
The alternative I can see here is changing CHAR/VARCHAR/DATE to the STRING type in the Hive table and then doing the import/export.
Changing the existing tables is not an optimal solution; can anyone please help with this?
Thanks,
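One workaround sketch, assuming a temporary copy is acceptable so the original table is left untouched: build a staging table with the offending columns cast to STRING, export that with TDCH, then drop it. All database, table, and column names below are placeholders.
hive -e "CREATE TABLE stage_db.export_stage STORED AS ORC AS
  SELECT CAST(char_col AS STRING) AS char_col,
         CAST(varchar_col AS STRING) AS varchar_col,
         CAST(date_col AS STRING) AS date_col,
         other_col
  FROM src_db.src_table"
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $HIVE_LIB_JARS \
-url jdbc:teradata://tdhost/DATABASE=targetdb \
-username user -password "$PASS" \
-jobtype hive -fileformat orcfile \
-sourcedatabase stage_db -sourcetable export_stage \
-targettable TARGET_TABLE \
-nummappers 4
hive -e "DROP TABLE stage_db.export_stage"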