blog.Ring.idv.tw

Hadoop - 探討RunJar

Hadoop - 探討RunJar


通常要執行一個Hadoop Job時,會透過下述的指令來達成:

${HADOOP_HOME}/bin/hadoop jar your.jar mainClass args

當送出上述指令之後,透過「jps」指令可以觀察到有一個「org.apache.hadoop.util.RunJar」的程式正在執行:

19141 RunJar

而該「org.apache.hadoop.util.RunJar」的程式就是透過「${HADOOP_HOME}/bin/hadoop」shell來執行對應的Command「jar」,並啟動「org.apache.hadoop.util.RunJar」來進行Hadoop Job的第一步。

hadoop shell (lines:228-229)

elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar

main method開始來看,它會從你所執行的「your.jar」來試著取得manifest的Main-Class屬性用來當作mainClassName,如果沒有指定的話就從參數取得。

RunJar.java (lines:94-107)

Manifest manifest = jarFile.getManifest();
    if (manifest != null) {
      mainClassName = manifest.getMainAttributes().getValue("Main-Class");
    }
    jarFile.close();

    if (mainClassName == null) {
      if (args.length < 2) {
        System.err.println(usage);
        System.exit(-1);
      }
      mainClassName = args[firstArg++];
    }
    mainClassName = mainClassName.replaceAll("/", ".");

接著該程式會將「your.jar」解壓縮在一個暫存的目錄裡面,該目錄的位置會取決於「hadoop.tmp.dir」的設定,從「${HAOOP_HOME}/src/core/core-default.xml」可以得知該設定的預設值為「/tmp/hadoop-${user.name}」,所以從下述的原始碼可得知,它一開始會試著建立「/tmp/hadoop-${user.name}」目錄(通常只有第一次執行時),然後再透過「File.createTempFile()」方法來建立一個「hadoop-unjar*」的暫存目錄,所以「your.jar」解壓縮後的class檔都會放在此目錄裡面,當執行結束之後也會一併刪除該目錄。

RunJar.java (lines:109-132)

File tmpDir = new File(new Configuration().get("hadoop.tmp.dir"));
    tmpDir.mkdirs();
    if (!tmpDir.isDirectory()) { 
      System.err.println("Mkdirs failed to create " + tmpDir);
      System.exit(-1);
    }
    final File workDir = File.createTempFile("hadoop-unjar", "", tmpDir);
    workDir.delete();
    workDir.mkdirs();
    if (!workDir.isDirectory()) {
      System.err.println("Mkdirs failed to create " + workDir);
      System.exit(-1);
    }
    
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
          try {
            FileUtil.fullyDelete(workDir);
          } catch (IOException e) {
          }
        }
      });
     
    unJar(file, workDir);

最後才會透過Reflection機制來達成動態載入「your.jar」的mainClass。

RunJar.java (lines:134-159)

ArrayList<URL> classPath = new ArrayList<URL>();
    classPath.add(new File(workDir+"/").toURL());
    classPath.add(file.toURL());
    classPath.add(new File(workDir, "classes/").toURL());
    File[] libs = new File(workDir, "lib").listFiles();
    if (libs != null) {
      for (int i = 0; i < libs.length; i++) {
        classPath.add(libs[i].toURL());
      }
    }
    
    ClassLoader loader =
      new URLClassLoader(classPath.toArray(new URL[0]));

    Thread.currentThread().setContextClassLoader(loader);
    Class<?> mainClass = Class.forName(mainClassName, true, loader);
    Method main = mainClass.getMethod("main", new Class[] {
      Array.newInstance(String.class, 0).getClass()
    });
    String[] newArgs = Arrays.asList(args)
      .subList(firstArg, args.length).toArray(new String[0]);
    try {
      main.invoke(null, new Object[] { newArgs });
    } catch (InvocationTargetException e) {
      throw e.getTargetException();
    }

2009-12-16 00:09:37

Leave a Comment

Copyright (C) Ching-Shen Chen. All rights reserved.

::: 搜尋 :::

::: 分類 :::

::: Ads :::

::: 最新文章 :::

::: 最新回應 :::

::: 訂閱 :::

Atom feed
Atom Comment