7.1 Introduction to WordCount.java
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
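Before moving to the Hadoop version below, the map/reduce idea behind word count can be sketched in plain, single-process Java: the "map" step tokenizes the input into (word, 1) pairs, and the "reduce" step sums the counts per word. This is only an illustration of the logic the Hadoop job implements; the class name LocalWordCount and its method are made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Illustrative sketch only: a local word count that mirrors the
// map (tokenize) and reduce (sum per key) phases of the Hadoop job.
public class LocalWordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // "map" phase: split the input into tokens, one (word, 1) pair each
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // "reduce" phase folded in: sum the 1s emitted for each word
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop hello world"));
    }
}
```

In the real job these two phases run on different machines, with Hadoop shuffling all pairs for the same word to one reducer.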
7.2 Editing WordCount.java
Step 1: Create the wordcount directory

mkdir -p ~/wordcount/input
cd ~/wordcount

Step 2: Edit WordCount.java

gedit WordCount.java

Step 3: Import the required libraries and define the WordCount class
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
}

Step 4: Create the main function
public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = Job.getInstance(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Step 5: Create the TokenizerMapper class
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context
                  ) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

Step 6: Create the IntSumReducer class
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values,
                     Context context
                     ) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}

7.3 Compiling WordCount.java
Step 1: Edit the environment-variable file needed for compilation

sudo gedit ~/.bashrc

Add the following lines:

export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

Step 2: Apply the modified ~/.bashrc settings

source ~/.bashrc

Step 3: Compile

hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
ll

7.4 Uploading a text file to HDFS
hadoop fs -mkdir -p /user/hduser/wordcount/input
cd /usr/local/hadoop
ll LICENSE.txt
hadoop fs -copyFromLocal LICENSE.txt /user/hduser/wordcount/input
hadoop fs -ls /user/hduser/wordcount/input

7.5 Running WordCount.java

cd ~/wordcount
hadoop jar wc.jar WordCount /user/hduser/wordcount/input/LICENSE.txt /user/hduser/wordcount/output

7.6 Viewing the results

hadoop fs -ls /user/hduser/wordcount/output
hadoop fs -cat /user/hduser/wordcount/output/part-r-00000
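Each line of the part-r-00000 file printed above holds one word followed by a tab and its count (the default TextOutputFormat separator). As a small sketch of reading that format, a line can be split at the tab; the class name ParseWordCountLine and the sample line are made up for illustration.

```java
public class ParseWordCountLine {
    // Splits one "word<TAB>count" output line into its two fields.
    public static String[] parse(String line) {
        int tab = line.indexOf('\t');
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        // Hypothetical sample line from part-r-00000
        String[] parts = parse("hadoop\t42");
        System.out.println(parts[0] + " occurred " + parts[1] + " times");
    }
}
```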
The content above is excerpted from the following book, which covers these steps in detail:
Hadoop+Spark大數據巨量分析與機器學習整合開發實戰 http://www.books.com.tw/products/0010695285
Hello, I followed the steps in the book and got this error message when compiling:

WordCount.java:38: error: incompatible types
for (IntWritable val : values) {
                       ^
  required: IntWritable
  found:    Object
Note: WordCount.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 error

I'm not sure how to resolve this, so I'm asking here. Thanks.
You can download the WordCount.java file included with the sample code and copy its contents in.
1. The original sample had:
public void reduce(Text key, Iterable values,
2. Changed to the book's version:
public void reduce(Text key, Iterable<IntWritable> values,
3. Tested OK
Teacher, could you share the VOF file for the Hadoop cluster environment? I followed every step in the book, but it still fails: each time only one node shows as alive (live), and the others don't appear.