blog.Ring.idv.tw

Sqoop: A database import tool for Hadoop

Sqoop ("SQL-to-Hadoop") is, simply put, a tool that helps developers move data from a relational database into HDFS. It will not be officially included until the Hadoop 0.21.0 release, but if you want to use it now you can apply the HADOOP-5815 patch directly.

Sqoop connects to each database through JDBC, automatically generates a Java source file containing the O/R mapping, and finally uses a MapReduce job to move the database data into HDFS. Below is my test example:
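Conceptually, each generated map task just runs a SELECT over its own slice of the table's key range and writes the rows out as delimited text. Here is a minimal Python sketch of that idea, using sqlite3 in place of MySQL/JDBC and in-memory lists in place of HDFS files; the `import_slice` helper is my own illustration, not Sqoop code:

```python
import sqlite3

# In-memory stand-in for the MySQL "test" database used below.
conn = sqlite3.connect(":memory:")
conn.execute("create table student(id int primary key, name text, score int)")
conn.executemany("insert into student values (?, ?, ?)",
                 [(1, "mary", 98), (2, "jack", 94), (3, "john", 40)])

def import_slice(lo, hi):
    """Mimic one map task: SELECT a primary-key range, emit comma-delimited lines."""
    rows = conn.execute(
        "SELECT id, name, score FROM student WHERE id >= ? AND id < ?", (lo, hi))
    return ["%d,%s,%d" % r for r in rows]

# Two "map tasks" covering the key range, like Launched map tasks=2 in the log.
part0 = import_slice(1, 3)   # → ['1,mary,98', '2,jack,94']
part1 = import_slice(3, 4)   # → ['3,john,40']
print(part0 + part1)
```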

Sqoop example for MySQL

Download the JDBC Driver for MySQL (Connector/J 5.1) and add it to your CLASSPATH.

Database: test

Table: student

create table student(
    id int not null primary key,
    name varchar(20),
    score tinyint
);

Seed it with three rows:

mysql> select * from student;
+----+------+-------+
| id | name | score |
+----+------+-------+
|  1 | mary |    98 |
|  2 | jack |    94 |
|  3 | john |    40 |
+----+------+-------+
3 rows in set (0.00 sec)

The goal here is to export the student table into HDFS:

hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://localhost/test --table student

Execution log:

09/10/30 06:48:47 INFO sqoop.Sqoop: Beginning code generation
09/10/30 06:48:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM student AS t WHERE 1 = 1
09/10/30 06:48:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM student AS t WHERE 1 = 1
09/10/30 06:48:48 INFO orm.CompilationManager: HADOOP_HOME is /home/hdp/hadoop-0.20.1/bin/..
09/10/30 06:48:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/hdp/hadoop-0.20.1/bin/../hadoop-0.20.1-core.jar
09/10/30 06:48:48 INFO orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /home/hdp/hadoop-0.20.1/bin/../conf:/usr/lib/jvm/java-6-sun-1.6.0.10/lib/tools.jar:/home/hdp/hadoop-0.20.1/bin/..:/home/hdp/hadoop-0.20.1/bin/../hadoop-0.20.1-core.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-cli-1.2.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-codec-1.3.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-el-1.0.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-httpclient-3.0.1.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-logging-1.0.4.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-logging-api-1.0.4.jar:/home/hdp/hadoop-0.20.1/bin/../lib/commons-net-1.4.1.jar:/home/hdp/hadoop-0.20.1/bin/../lib/core-3.1.1.jar:/home/hdp/hadoop-0.20.1/bin/../lib/hsqldb-1.8.0.10.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jasper-compiler-5.5.12.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jasper-runtime-5.5.12.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jets3t-0.6.1.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jetty-6.1.14.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jetty-util-6.1.14.jar:/home/hdp/hadoop-0.20.1/bin/../lib/junit-3.8.1.jar:/home/hdp/hadoop-0.20.1/bin/../lib/kfs-0.2.2.jar:/home/hdp/hadoop-0.20.1/bin/../lib/log4j-1.2.15.jar:/home/hdp/hadoop-0.20.1/bin/../lib/oro-2.0.8.jar:/home/hdp/hadoop-0.20.1/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/hdp/hadoop-0.20.1/bin/../lib/slf4j-api-1.4.3.jar:/home/hdp/hadoop-0.20.1/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hdp/hadoop-0.20.1/bin/../lib/sqoop.jar:/home/hdp/hadoop-0.20.1/bin/../lib/xmlenc-0.52.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/hdp/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-api-2.1.jar:.:/home/hdp/hadoop-0.20.1/bin/../mysql-jdbc.jar:/home/hdp/hadoop-0.20.1/bin/../hadoop-0.20.1-core.jar ./student.java
09/10/30 06:48:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/student.jar
09/10/30 06:48:49 INFO mapred.ImportJob: Beginning data import of student
09/10/30 06:48:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM student AS t WHERE 1 = 1
09/10/30 06:48:50 INFO mapred.JobClient: Running job: job_200910300539_0002
09/10/30 06:48:51 INFO mapred.JobClient:  map 0% reduce 0%
09/10/30 06:49:04 INFO mapred.JobClient:  map 50% reduce 0%
09/10/30 06:49:07 INFO mapred.JobClient:  map 100% reduce 0%
09/10/30 06:49:09 INFO mapred.JobClient: Job complete: job_200910300539_0002
09/10/30 06:49:09 INFO mapred.JobClient: Counters: 6
09/10/30 06:49:09 INFO mapred.JobClient:   Job Counters
09/10/30 06:49:09 INFO mapred.JobClient:     Launched map tasks=2
09/10/30 06:49:09 INFO mapred.JobClient:   FileSystemCounters
09/10/30 06:49:09 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
09/10/30 06:49:09 INFO mapred.JobClient:   Map-Reduce Framework
09/10/30 06:49:09 INFO mapred.JobClient:     Map input records=3
09/10/30 06:49:09 INFO mapred.JobClient:     Spilled Records=0
09/10/30 06:49:09 INFO mapred.JobClient:     Map input bytes=3
09/10/30 06:49:09 INFO mapred.JobClient:     Map output records=3

After running the Sqoop command above, it automatically generates a "student.java" file (the ORM class) and packages it into "/tmp/sqoop/compile/student.jar". Besides "student.class", this jar also bundles "sqoop.jar" in its lib folder. Once the job completes successfully, the imported data can be seen in HDFS.
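The HDFS_BYTES_WRITTEN=30 counter in the log can even be sanity-checked by hand, assuming Sqoop's default output format of comma-delimited text with one newline-terminated line per row:

```python
rows = [(1, "mary", 98), (2, "jack", 94), (3, "john", 40)]

# Each row becomes "id,name,score\n" in the default text output format.
lines = ["%d,%s,%d\n" % r for r in rows]
total = sum(len(line) for line in lines)
print(total)  # 3 lines x 10 bytes = 30, matching HDFS_BYTES_WRITTEN=30
```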

From the log above, it is easy to see that Sqoop performs the import with the map phase alone (no reduce phase).
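The parallelism instead comes from splitting the primary-key range among the map tasks. A rough sketch of how such split boundaries can be computed follows; this illustrates the idea only and is not Sqoop's exact splitting code:

```python
def key_splits(min_key, max_key, num_tasks):
    """Divide [min_key, max_key] into half-open [lo, hi) ranges, one per task."""
    span = max_key - min_key + 1
    step = span / num_tasks  # fractional step so remainders spread across tasks
    splits = []
    for i in range(num_tasks):
        lo = min_key + round(i * step)
        hi = min_key + round((i + 1) * step)
        splits.append((lo, hi))
    return splits

# With ids 1..3 and 2 map tasks (as in the log), one task gets two rows
# and the other gets one — hence "map 50%" then "map 100%".
print(key_splits(1, 3, 2))  # → [(1, 3), (3, 4)]
```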

Related resources

Introducing Sqoop

Getting Started With Sqoop

Sqoop

2009-10-30 19:20:01

4 comments on "Sqoop: A database import tool for Hadoop"

  1. harold says:

    How would I export from a hive table (not hdfs) to mysql?

    2011-06-17 16:03:31

  2. Shen says:

    I think what you want is this:
    http://wiki.apache.org/hadoop/Hive/HiveJDBCInterface

    2011-06-17 22:56:04

  3. Harold says:

    I currently have four hadoop machines, and importing from mysql into hadoop-hive through sqoop already works. The one remaining problem is that a hive table cannot be exported to mysql through sqoop; sqoop seems only able to export data sitting on the dfs. I'll paste the process so you can see what I'm asking:
    hadoop@hadoop-00:~$ hadoop dfs -ls
    Found 2 items
    drwxr-xr-x - hadoop supergroup 0 2011-06-16 12:14 /user/hadoop/ip14
    drwxr-xr-x - hadoop supergroup 0 2011-06-16 18:22 /user/hadoop/test2
    hadoop@hadoop-00:~$ sqoop export --connect jdbc:mysql://localhost/sqoop --username root --password 'gib^%$321' --table test --export-dir test2
    11/06/20 17:58:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    11/06/20 17:58:47 INFO tool.CodeGenTool: Beginning code generation
    11/06/20 17:58:47 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    11/06/20 17:58:47 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    11/06/20 17:58:47 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
    11/06/20 17:58:47 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
    11/06/20 17:58:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/bfafe027889f98baea23583cfc1cf159/test.jar
    11/06/20 17:58:48 INFO mapreduce.ExportJobBase: Beginning export of test
    11/06/20 17:58:49 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    11/06/20 17:58:49 WARN mapreduce.ExportJobBase: Input path hdfs://hadoop-00:9000/user/hadoop/test2 contains no files
    11/06/20 17:58:49 INFO input.FileInputFormat: Total input paths to process : 0
    11/06/20 17:58:49 INFO input.FileInputFormat: Total input paths to process : 0
    11/06/20 17:58:49 INFO mapred.JobClient: Running job: job_201106161627_0047
    11/06/20 17:58:50 INFO mapred.JobClient: map 0% reduce 0%
    11/06/20 17:58:57 INFO mapred.JobClient: Job complete: job_201106161627_0047
    11/06/20 17:58:57 INFO mapred.JobClient: Counters: 0
    11/06/20 17:58:57 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 8.6208 seconds (0 bytes/sec)
    11/06/20 17:58:57 INFO mapreduce.ExportJobBase: Exported 0 records.

    The above shows dfs data can be loaded into mysql.
    Below, a hive table cannot be loaded into mysql:
    hadoop@hadoop-00:~$ hive -e "show tables;"
    Hive history file=/tmp/hadoop/hive_job_log_hadoop_201106201759_1266528486.txt
    OK
    db_1_table_1
    iis_vcip
    iis_vcip14
    iis_vcip14_count_asc
    iis_vcip14_count_desc
    iis_vcip15_count_asc
    iis_vcip15_count_desc
    iislog
    iislog14
    iislog15
    varnish
    vcip
    Time taken: 2.528 seconds
    FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'

    hadoop@hadoop-00:~$ sqoop export --connect jdbc:mysql://localhost/sqoop --username root --password 'gib^%$321' --table iis_vcip14_count_desc --export-dir iis_vcip14_count_desc
    11/06/20 18:01:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    11/06/20 18:01:06 INFO tool.CodeGenTool: Beginning code generation
    11/06/20 18:01:06 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `iis_vcip14_count_desc` AS t LIMIT 1
    11/06/20 18:01:06 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `iis_vcip14_count_desc` AS t LIMIT 1
    11/06/20 18:01:06 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
    11/06/20 18:01:06 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
    11/06/20 18:01:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/324a6378cde2c12529deb263545bc637/iis_vcip14_count_desc.jar
    11/06/20 18:01:07 INFO mapreduce.ExportJobBase: Beginning export of iis_vcip14_count_desc
    11/06/20 18:01:08 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `iis_vcip14_count_desc` AS t LIMIT 1
    11/06/20 18:01:08 WARN mapreduce.ExportJobBase: Input path hdfs://hadoop-00:9000/user/hadoop/iis_vcip14_count_desc does not exist
    11/06/20 18:01:08 ERROR tool.ExportTool: Encountered IOException running export job: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://hadoop-00:9000/user/hadoop/iis_vcip14_count_desc
    hadoop@hadoop-00:~$

    2011-06-20 18:13:42

  4. Shen says:

    Your paste shows that the "iis_vcip14_count_desc" table does not exist?
    List the directory with the HDFS commands to check~

    2011-06-20 21:11:33


Copyright (C) Ching-Shen Chen. All rights reserved.
