Friday, September 18, 2015

Chapter 8: Apache Spark Installation Commands


8.1 Installing Scala
Step 1~4: Download and install Scala
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
tar xvf scala-2.11.6.tgz
sudo mv scala-2.11.6 /usr/local/scala
Step 5: Set the Scala environment variables for the user
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Add the following lines:
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
Step 6: Make the ~/.bashrc changes take effect
source ~/.bashrc
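To verify the installation took effect, check the Scala version from the shell; given the archive above, it should report 2.11.6:
scala -version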
8.2 Installing Spark
Step 1~3: Download and install Spark
wget http://apache.stu.edu.tw/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar zxf spark-1.4.0-bin-hadoop2.6.tgz
sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark/
Step 4: Set the Spark environment variables for the user
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Add the following lines:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Step 5: Make the ~/.bashrc changes take effect
source ~/.bashrc
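Likewise, Spark can be verified from the command line; spark-submit prints its version banner and exits:
spark-submit --version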

8.4 Starting the spark-shell interactive interface

spark-shell
8.5 Configuring spark-shell's log messages

cd /usr/local/spark/conf
cp log4j.properties.template log4j.properties 
Edit log4j.properties:
sudo gedit log4j.properties
In gedit, change the log level that is originally INFO to WARN, then save the file.
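Equivalently, the change can be scripted; this sed one-liner assumes the template's default root-logger line reads log4j.rootCategory=INFO, console:
# switch the root logger from INFO to WARN
sudo sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' log4j.properties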
8.6 Starting Hadoop
start-all.sh
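To check that the Hadoop daemons came up, list the running Java processes; on this book's multi-node layout the master would typically show NameNode, SecondaryNameNode, and ResourceManager (the exact list depends on the cluster configuration):
jps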
8.7 Running spark-shell programs locally
Step 1: Enter spark-shell
spark-shell --master local[4]
Step 2: Read a local file
val textFile=sc.textFile("/usr/local/spark/README.md")
textFile.count
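Step 3 below assumes pg5000.txt has already been uploaded to HDFS (this was done in an earlier chapter). If it is missing, a minimal sketch of the upload, assuming the file sits in the current local directory:
# create the target directory (and any parents), then copy the local file into HDFS
hadoop fs -mkdir -p /user/hduser/wordcount/input
hadoop fs -put pg5000.txt /user/hduser/wordcount/input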
Step 3: Read an HDFS file
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile.count
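The same reads can be scripted rather than typed by hand, since spark-shell also accepts commands piped to standard input; a minimal sketch:
# count the lines of the local README non-interactively and print the result
echo 'println(sc.textFile("file:/usr/local/spark/README.md").count)' | spark-shell --master local[4]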
8.8 Running spark-shell on Hadoop YARN
Step 1: Run spark-shell on Hadoop YARN
SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
Step 2: Read a local file
Incorrect syntax (without a scheme, the path is resolved against HDFS):
val textFile=sc.textFile("/usr/local/spark/README.md")
textFile.count
Correct syntax (the file: scheme must be given explicitly; note the file must also exist at the same path on every worker node):
val textFile=sc.textFile("file:/usr/local/spark/README.md")
textFile.count
Step 3: Read an HDFS file (either the short path or the full HDFS URI works)
val textFile = sc.textFile("/user/hduser/wordcount/input/pg5000.txt")
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile.count
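Beyond the interactive shell, batch jobs can also be submitted to YARN with spark-submit. A minimal sketch using the bundled SparkPi example; the jar name assumes the Spark 1.4.0 / Hadoop 2.6 binary distribution laid out as above:
# run the SparkPi example on YARN in client mode, with 10 partitions
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /usr/local/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar 10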
(Figure from the official Spark website: https://spark.apache.org/)

6 comments:

  1. Hello,
    in section 8-10, after running "spark-shell --master spark://master:7077" and then doing the text-reading steps that follow, I ran into an insufficient-resources problem.
    Other sites suggested using plain "spark-shell" without the trailing argument, which ran fine; sharing this for anyone who hits the same issue.

  2. Addendum:

    export SCALA_HOME=/usr/local/scala/scala-2.11.6 (the directory where the files are)/

    export SPARK_HOME=/usr/local/spark/spark-1.5.2-bin-hadoop2.6 (the directory where the files are)/
    cd /usr/local/spark/spark-1.5.2-bin-hadoop2.6 (the directory where the files are)/conf

    Similar commands below follow the same pattern: substitute spark with spark/spark-1.5.2-bin-hadoop2.6 (the directory where the files are).

  3. Addendum:
    SPARK_JAR=/usr/local/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar
    HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/spark-1.5.2-bin-hadoop2.6/bin/spark-shell

  4. Teacher Lin:

    Following p. 168 of the book, I ran spark-shell on Hadoop YARN. Reading a local file fails with a file-not-found error, while reading the HDFS file works fine.

    Could you please help? Thank you!


    --Reading the HDFS file works fine--
    scala> val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
    textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21
    scala> textFile.count
    res0: Long = 418
    ----- it's ok ---
    --Reading the local file fails: file not found--
    scala> val textFile=sc.textFile("file:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md")
    textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21

    scala> textFile.count
    16/09/01 12:13:12 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, data2): java.io.FileNotFoundException: File file:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)

    --Reading the local file fails with file not found-- can you assist? Thank you!

  5. May I ask you something?
    I am using spark 2.2.0 with hadoop2.7.2.
    Running SPARK_JAR=/usr/local/spark/lib/spark-assembly-2.2.0-hadoop2.7.2.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell

    gives the error: <console>:14: error: not found: value spark
    How should this be resolved? Thanks~

  6. The spark-shell command fails. How can this be resolved?

    hduser@master:~$ spark-shell
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/spark/jars/hadoop-auth-2.6.5.jar) to method sun.security.krb5.Config.getInstance()
    WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    18/08/20 12:47:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
    ** Note that as of 2.8 scala does not assume use of the java classpath.
    ** For the old behavior pass -usejavacp to scala, or if using a Settings
    ** object programmatically, settings.usejavacp.value = true.

    Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
    ** Note that as of 2.8 scala does not assume use of the java classpath.
    ** For the old behavior pass -usejavacp to scala, or if using a Settings
    ** object programmatically, settings.usejavacp.value = true.
    Exception in thread "main" java.lang.NullPointerException
    at scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
    at scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896)
    at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895)
    at scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895)
    at scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895)
    at scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918)
    at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1337)
    at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1336)
    at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
    at scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1336)
    at scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:908)
    at scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:1002)
    at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997)
    at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at
