Chapter 8: Apache Spark Installation Commands


8.1 Installing Scala
Steps 1–4: Download and install Scala
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
tar xvf scala-2.11.6.tgz
sudo mv scala-2.11.6 /usr/local/scala
Step 5: Set the Scala environment variables for the user
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Add the following lines:
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
Step 6: Make the ~/.bashrc changes take effect
source ~/.bashrc
8.2 Installing Spark
Steps 1–3: Download and install Spark
wget http://apache.stu.edu.tw/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar zxf spark-1.4.0-bin-hadoop2.6.tgz
sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark
Step 4: Set the Spark environment variables for the user
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Add the following lines:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Step 5: Make the ~/.bashrc changes take effect
source ~/.bashrc
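A safer pattern than appending blindly is to add each export line only if it is not already present, so re-running the step never duplicates entries. A minimal sketch, using a temporary file in place of ~/.bashrc for demonstration:

```shell
# Idempotently append the Spark environment settings.
# Demo writes to a temp file; point BASHRC at ~/.bashrc for real use.
BASHRC=$(mktemp)
add_line() { grep -qxF "$1" "$BASHRC" || echo "$1" >> "$BASHRC"; }
add_line 'export SPARK_HOME=/usr/local/spark'
add_line 'export PATH=$PATH:$SPARK_HOME/bin'
add_line 'export SPARK_HOME=/usr/local/spark'   # second call is a no-op
wc -l < "$BASHRC"   # 2 lines, not 3
```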

8.4 Starting the spark-shell interactive interface

spark-shell
8.5 Configuring spark-shell log output

cd /usr/local/spark/conf
cp log4j.properties.template log4j.properties
Edit log4j.properties:
sudo gedit log4j.properties
In gedit, change the log level in log4j.properties from INFO to WARN.
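The line to edit is the root logger level; the shipped template sets it to INFO (a config fragment, shown here for reference):

```properties
# Before: log4j.rootCategory=INFO, console
log4j.rootCategory=WARN, console
```

With WARN, spark-shell suppresses the verbose INFO messages and only shows warnings and errors.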
8.6 Starting Hadoop
start-all.sh
8.7 Running spark-shell locally
Step 1: Enter spark-shell
spark-shell --master local[4]
Step 2: Read a local file
val textFile = sc.textFile("/usr/local/spark/README.md")
textFile.count
Step 3: Read a file from HDFS
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile.count
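Going one step beyond count, the classic word count can be run on the same textFile RDD inside spark-shell. A sketch, assuming the Step 2 read above succeeded:

```scala
// Continue in the same spark-shell session:
// split each line into words, pair each word with 1, sum per word.
val counts = textFile
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.take(5).foreach(println)   // print a few (word, count) pairs
```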
8.8 Running spark-shell on Hadoop YARN
Step 1: Launch spark-shell on Hadoop YARN
SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
Step 2: Read a local file
Incorrect syntax (on YARN the default filesystem is HDFS, so a bare path is resolved against HDFS and the local file is not found):
val textFile = sc.textFile("/usr/local/spark/README.md")
textFile.count
Correct syntax (the file: prefix forces the local filesystem):
val textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count
Step 3: Read a file from HDFS (both forms are equivalent when the default filesystem is hdfs://master:9000)
val textFile = sc.textFile("/user/hduser/wordcount/input/pg5000.txt")
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile.count
(Figure from the official Apache Spark website, https://spark.apache.org/)
Comments:

  1. Hello,
    In section 8-10, after running "spark-shell --master spark://master:7077" and then the file-reading steps, I hit an insufficient-resources error.
    Another site suggested running plain "spark-shell" without the trailing arguments, and that worked; sharing for anyone who runs into the same problem.
  2. Addendum:

    export SCALA_HOME=/usr/local/scala/scala-2.11.6 (the directory the files were extracted to)

    export SPARK_HOME=/usr/local/spark/spark-1.5.2-bin-hadoop2.6 (the extracted directory)
    cd /usr/local/spark/spark-1.5.2-bin-hadoop2.6/conf

    Wherever similar paths appear below, likewise substitute spark with spark/spark-1.5.2-bin-hadoop2.6 (the extracted directory).
  3. Addendum:
    SPARK_JAR=/usr/local/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar
    HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/spark-1.5.2-bin-hadoop2.6/bin/spark-shell
  6. Teacher Lin:

    On p. 168 of the book, running spark-shell on Hadoop YARN and reading a local file fails with "file not found"; reading the HDFS file works fine.

    Could you help? Thank you!


    --Reading the HDFS file works--
    scala> val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
    textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at :21
    scala> textFile.count
    res0: Long = 418
    -----it's OK---
    --Reading the local file fails with "file not found"--
    scala> val textFile = sc.textFile("file:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md")
    textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at :21

    scala> textFile.count
    16/09/01 12:13:12 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, data2): java.io.FileNotFoundException: File file:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)

    --Reading the local file reports "file not found"--could you help? Thank you!
    --APP INVENTOR 2 study notes--please delete the duplicate comments; an intermittent network connection caused them--thank you!
  7. A question, please:
    I am using Spark 2.2.0 with Hadoop 2.7.2.
    When running SPARK_JAR=/usr/local/spark/lib/spark-assembly-2.2.0-hadoop2.7.2.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell

    I get the error ":14: error: not found: value spark".
    How can this be solved? Thanks!