blog.Ring.idv.tw

A Field Worth Studying - Hadoop

.2009/03/21 - Added "Hadoop on Windows with Eclipse"
.2010/07/07 - Added "Hadoop Summit 2010 - Presentation Slides"

Hadoop

.Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.

.Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

.Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch.
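
As a rough illustration of the MapReduce idea quoted above, here is a minimal single-process sketch in Python (not Hadoop's Java API). The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example; a real Hadoop job would spread the map and reduce tasks across the cluster and read its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, the job the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop implements MapReduce", "MapReduce divides work"]))
```

Each call to `map_phase` depends only on its own line, which is what allows the framework to run many map tasks in parallel, close to where the data blocks are stored.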

Hadoop Resources

Hadoop Summit 2010 - Presentation Slides

Hadoop Resources | Scale Unlimited

NCHC Cloud Computing Research Group

Hadoop Summit and Data-Intensive Computing Symposium Videos and Slides

Distributed Parallel Programming with Hadoop, Part 1 (用 Hadoop 进行分布式并行编程, 第 1 部分)

Academic Cluster Computing Initiative

Hadoop Study Notes 1: Brief Introduction (Hadoop学习笔记一 简要介绍)

Hadoop Study Notes 2: Installation and Deployment (Hadoop学习笔记二 安装部署)

Getting Started with Hadoop, Part 1

Yahoo!'s bet on Hadoop

Open Source Distributed Computing: Yahoo's Hadoop Support

Yahoo! Launches World's Largest Hadoop Production Application

Hadoop Wikipedia

Google Code for Distributed Systems

Scaling Powerset using Amazon's EC2 and S3

Running Hadoop MapReduce on Amazon EC2 and Amazon S3

Building an Inverted Index for an Online E-Book Store

Running Hadoop On Ubuntu Linux (Single-Node Cluster)

Running Hadoop On Ubuntu Linux (Multi-Node Cluster)

Yahoo! Hadoop Tutorial

Cloud Computing and Grid Computing

Hadoop on Windows with Eclipse

MapReduce Resources

Why Should You Care About MapReduce?

Distributed Systems - Google Code University - Google Code

MapReduce

Can Your Programming Language Do This? [Chinese translation]

Chubby Resources

An Introduction to ZooKeeper Video (Hadoop and Distributed Computing at Yahoo!)

ZooKeeper: Because coordinating distributed systems is a Zoo

Yahoo! Project: Zookeeper

SourceForge.net: ZooKeeper

KFS Resources

Kosmos Distributed File System (KFS)

2007-11-06 16:59:12

4 comments on "A Field Worth Studying - Hadoop"

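  1. didi says:

    I haven't actually used Hadoop, so I'd like to ask:

    When we upload a file to a Hadoop server, how does it store the file?

    Suppose we have 3 nodes. When we upload to one of them, does Hadoop automatically split the file and distribute it across the 3 nodes? And can the NameNode then be used to track where the data is?

    Does Hadoop not allow users to specify which nodes the pieces are placed on after the split?

    Thanks in advance for your answer.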

    2011-03-23 10:52:30

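  2. Shen says:

    Dear didi,

    1. Basically, when a file is written to HDFS it is split according to its size. By default, a file larger than 64MB (dfs.blocksize) is split, and the resulting blocks are written to three machines through a pipeline (dfs.replication = 3).

    2. As for block locations, the DataNodes report them to the NameNode only after HDFS has started. The NameNode itself records only the metadata for all directories and files.

    3. At present HDFS does not provide an API that lets you specify which node a block should be placed on.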

    2011-03-23 11:15:09
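
The block arithmetic in Shen's reply can be sketched roughly as follows. This is an illustration only: the constants mirror the 64MB block size and replication factor of 3 mentioned in the reply, while `split_into_blocks`, `place_replicas`, and the round-robin placement are hypothetical stand-ins. Real HDFS chooses DataNodes through its own block placement policy, not in this way.

```python
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # 64MB, the default cited in the reply (dfs.blocksize)
REPLICATION = 3                # the replication factor cited in the reply (dfs.replication)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the length in bytes of each block a file of file_size would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # Only the last block may be shorter than block_size.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy round-robin placement (an assumption, NOT the real HDFS policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(150 * 1024 * 1024)  # a 150MB file
    print(len(blocks), "blocks:", blocks)
    print(place_replicas(len(blocks), ["node1", "node2", "node3"]))
```

With the 3 nodes from didi's question and a replication factor of 3, every block ends up with a copy on every node, so the cluster stores three times the file's size in raw bytes.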

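  3. elf says:

    hello...
    Does using Hadoop mean backups are no longer needed?
    In a small or medium-sized business that has adopted virtualization to cut down on machines, would adding Hadoop require more machines and push the number of physical machines back up?

    Thanks!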

    2011-03-29 09:47:03

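  4. Shen says:

    Dear elf,

    1. In theory, yes. At present, apart from the NameNode, all other data is basically kept in three copies spread across different machines.
    2. Virtualization and Hadoop are not directly related. If a small or medium-sized business wants to solve a particular problem cheaply, it can launch some virtual machines on Amazon EC2 (say, 100), install Hadoop on them, and process the work in a distributed fashion to save on cost.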

    2011-03-29 15:23:33

Copyright (C) Ching-Shen Chen. All rights reserved.