Download Software
- Virtualization software
- Linux operating system
- PieTTY (PuTTY)
- Java
Passwordless SSH Login
- ssh-keygen -t rsa -P ""
- ssh-copy-id root@master
- ssh-copy-id root@slave01
- ssh-copy-id root@slave02
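The ssh-keygen step can be sanity-checked locally before copying keys to the cluster hosts (the scratch directory here is illustrative; the real setup writes to ~/.ssh):

```shell
# Generate an RSA key pair with an empty passphrase (-P ""), mirroring the
# ssh-keygen command above, but into a temporary directory.
# ssh-copy-id then appends the .pub half to each host's ~/.ssh/authorized_keys.
tmpdir=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$tmpdir/id_rsa" -q

# Both halves of the key pair should now exist.
ls "$tmpdir"
```

After `ssh-copy-id` succeeds against each host, `ssh root@master` should log in without prompting for a password.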
Web Management Interfaces
- HDFS:
- NameNode (default web UI on port 50070)
- DataNode (port 50075)
- YARN:
- ResourceManager (port 8088)
- NodeManager (port 8042)
- HBase (Master web UI on port 16010)
- Ambari
- Cloudera Manager
Start Services
- Master
- start-all.sh
- zkServer.sh start
- start-hbase.sh
- hadoop-daemon.sh start datanode
- yarn-daemon.sh start nodemanager
- Slave
- zkServer.sh start
- hadoop-daemon.sh start datanode
- yarn-daemon.sh start nodemanager
Test Files
- Project Gutenberg
- http://www.gutenberg.org
- Pride and Prejudice
- http://www.gutenberg.org/ebooks/1342
- wget http://www.gutenberg.org/files/1342/1342-0.txt -O ~/TestWord.txt
- Generate large files (each command overwrites TestBig.tmp; the sizes are 1, 3, and 5 GB)
- dd if=/dev/zero of=TestBig.tmp bs=1G count=1
- dd if=/dev/zero of=TestBig.tmp bs=1G count=3
- dd if=/dev/zero of=TestBig.tmp bs=1G count=5
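Each dd command writes bs × count bytes of zeros; a scaled-down run (1 MiB instead of 1 GiB) confirms the size arithmetic without filling the disk:

```shell
# Same pattern as above but with bs=1M, so the file is 1 MiB (1048576 bytes).
dd if=/dev/zero of=/tmp/TestSmall.tmp bs=1M count=1 2>/dev/null

# Print the resulting size in bytes (GNU stat; macOS would use stat -f %z).
size=$(stat -c %s /tmp/TestSmall.tmp)
echo "$size"   # 1048576
```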
Open Data
- Government Open Data Platform (Taiwan)
- Taipei City Government Open Data Platform
- Open Weather Data Platform
Word Count
- Linux
- cat TestWord.txt | tr -sc 'a-zA-Z' '\n' | grep -v '^$' | sort | uniq -c | awk '{t=$1;$1=$2;$2=t;print;}' | tr ' ' ',' > ~/TTT
- Hadoop
- hadoop jar /opt/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /TestWord.txt /Output
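The Linux pipeline above can be traced on a small inline sample (the sample text is invented for illustration): `tr -sc` turns every run of non-letters into a single newline, `sort | uniq -c` counts duplicates, the awk swaps the count and word columns, and the final `tr` makes the output comma-separated:

```shell
# Word-count a two-line sample with the same pipeline as above (no file I/O).
counts=$(printf 'the cat and the dog\nthe cat\n' \
  | tr -sc 'a-zA-Z' '\n' \
  | grep -v '^$' \
  | sort \
  | uniq -c \
  | awk '{t=$1;$1=$2;$2=t;print;}' \
  | tr ' ' ',')

echo "$counts"
# and,1
# cat,2
# dog,1
# the,3
```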
Add Cluster Nodes
- Hadoop, ZooKeeper, HBase
- hadoop-daemon.sh start datanode
- yarn-daemon.sh start nodemanager
- hdfs dfsadmin -refreshNodes
- start-balancer.sh
Changing the Secondary NameNode
- hdfs-site.xml
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave01:50090</value>
</property>
Hadoop Ecosystem Download List
- wget http://apache.stu.edu.tw/hadoop/common/stable/hadoop-2.9.0.tar.gz
- wget http://apache.stu.edu.tw/zookeeper/stable/zookeeper-3.4.10.tar.gz
- wget http://apache.stu.edu.tw/hbase/stable/hbase-1.2.6-bin.tar.gz
- wget http://apache.stu.edu.tw/hive/stable-2/apache-hive-2.3.3-bin.tar.gz
- wget http://apache.stu.edu.tw/pig/latest/pig-0.17.0.tar.gz
- wget http://apache.stu.edu.tw/mahout/0.13.0/apache-mahout-distribution-0.13.0.tar.gz
- wget http://downloads.lightbend.com/scala/2.12.5/scala-2.12.5.tgz
- wget http://apache.stu.edu.tw/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
- wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.1.5/ambari.repo
- wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
Hadoop Ecosystem Configuration Files
- hosts
192.168.56.100 master master.hadoop
192.168.56.101 slave01 slave01.hadoop
192.168.56.102 slave02 slave02.hadoop
- profile
export JAVA_HOME=/usr/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop-2.9.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export HBASE_HOME=/opt/hbase-1.2.6
export PATH=$PATH:$HBASE_HOME/bin
export HIVE_HOME=/opt/hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
export PIG_HOME=/opt/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
export MAHOUT_HOME=/opt/mahout-0.13.0
export PATH=$PATH:$MAHOUT_HOME/bin
export HADOOP_CONF_DIR=/opt/hadoop-2.9.0/etc/hadoop
export SCALA_HOME=/opt/scala-2.12.5
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/opt/spark-2.3.0
export PATH=$PATH:$SPARK_HOME/bin
- start-dfs.sh
HDFS_DATANODE_USER=root
# HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
- start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
- core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-2.9.0/tmp</value>
</property>
- hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop-2.9.0/NameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop-2.9.0/DataNode</value>
</property>
- yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
- mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
- hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/hbase-1.2.6/tmp</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave01,slave02</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/zookeeper-3.4.10/data</value>
</property>
- hive-site.xml
<property>
<name>system:java.io.tmpdir</name>
<value>/opt/hive-2.3.3/tmp</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>a12345</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.mariadb.jdbc.Driver</value>
</property>
Hadoop Ecosystem Tests
- Hadoop
- hadoop fs -put ~/TestWord.txt /
- hadoop jar /opt/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /TestWord.txt /Out
- HBase
- hbase shell
- create 'scores','grad','course'
- put 'scores','kath','grad:','1'
- put 'scores','kath','course:math','87'
- get 'scores','kath'
- get 'scores','kath',{COLUMN=>'course:math'}
- Hive
- hive
- CREATE DATABASE myhive;
- SHOW DATABASES;
- USE myhive;
- SHOW TABLES;
- CREATE TABLE student(name STRING, scores INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
- echo "kath 92" > student.txt
- echo "john 87" >> student.txt
- LOAD DATA LOCAL INPATH "student.txt" INTO TABLE student;
- SELECT * FROM student;
- Spark
- echo "My Name is Tony, I am a Teacher, I am Fine, Nice to Meet You." > TestTony.txt
- hadoop fs -put TestTony.txt /
- spark-shell
- val txtFile=sc.textFile("hdfs://master:9000/TestTony.txt")
- val stringRDD=txtFile.flatMap(line => line.split(" "))
- val countsRDD=stringRDD.map(word => (word,1)).reduceByKey(_ + _)
- countsRDD.sortByKey().collect.foreach(println)
- Hadoop Streaming
- hadoop jar /opt/hadoop-2.9.0/share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar -input /Input -output /Output1 -mapper /bin/cat -reducer /usr/bin/wc
- hadoop jar /opt/hadoop-2.9.0/share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar -input /Input -output /Output2 -mapper org.apache.hadoop.mapred.lib.IdentityMapper -reducer /bin/wc
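Because Hadoop Streaming simply pipes input lines through the mapper, sorts them (the shuffle), and pipes the result through the reducer, the first job above can be approximated locally with an ordinary shell pipe. A sketch with invented input, using `wc -l` rather than plain `wc` so the result is a single number:

```shell
# Local approximation of: -mapper /bin/cat -reducer wc
# Streaming runs: input -> mapper -> sort (shuffle) -> reducer.
lines=$(printf 'alpha beta\ngamma\n' | /bin/cat | sort | wc -l)
echo "$lines"
```

This pattern (cat input | mapper | sort | reducer) is a common way to debug streaming mapper and reducer scripts before submitting them to the cluster.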