8.1 Installing Scala
Step 1~4: Download and install Scala
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
tar xvf scala-2.11.6.tgz
sudo mv scala-2.11.6 /usr/local/scala
Step 5: Set the Scala user environment variables
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Enter the following lines:
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
Step 6: Make the ~/.bashrc changes take effect
source ~/.bashrc
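To confirm the installation took effect, one quick check (an editorial sketch, not from the original):
scala -version    # should report Scala 2.11.6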
8.2 Installing Spark
Step 1~3: Download and install Spark
wget http://apache.stu.edu.tw/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar zxf spark-1.4.0-bin-hadoop2.6.tgz
sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark
Step 4: Set the Spark user environment variables
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Enter the following lines:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Step 5: Make the ~/.bashrc changes take effect
source ~/.bashrc
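As a quick sanity check (an editorial sketch, not from the original), the installed version can be printed with:
spark-submit --version    # should report Spark 1.4.0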
8.4 Starting the spark-shell interactive interface
spark-shell
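Once the scala> prompt appears, the SparkContext is already available as sc; a minimal check (an editorial sketch, not from the original):
sc.version   // the Spark version, e.g. 1.4.0
sc.master    // the master URL the shell connected to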
8.5 Configuring spark-shell log messages
cd /usr/local/spark/conf
cp log4j.properties.template log4j.properties
Edit log4j.properties:
sudo gedit log4j.properties
This opens log4j.properties in gedit; change the log level from INFO to WARN.
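A minimal sketch of the edit, assuming the stock Spark 1.x template:
# log4j.properties -- before:
log4j.rootCategory=INFO, console
# after:
log4j.rootCategory=WARN, console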
8.6 Starting Hadoop
start-all.sh
8.7 Running spark-shell programs locally
Step 1: Enter spark-shell
spark-shell --master local[4]
Step 2: Read a local file
val textFile = sc.textFile("/usr/local/spark/README.md")
textFile.count
Step 3: Read an HDFS file
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt") textFile.count8.8 在Hadoop YARN執行spark-shell
8.8 Running spark-shell on Hadoop YARN
Step 1: Start spark-shell on Hadoop YARN
SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
Step 2: Read a local file
Incorrect syntax:
val textFile = sc.textFile("/usr/local/spark/README.md")
textFile.count
Correct syntax:
val textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count
On YARN the default filesystem is HDFS, so a local path must carry an explicit file: scheme.
Step 3: Read an HDFS file
val textFile = sc.textFile("/user/hduser/wordcount/input/pg5000.txt") val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt") textFile.count此圖出自Spark官網 https://spark.apache.org/
Comments

Hello,
In section 8-10, after running "spark-shell --master spark://master:7077" and then the file-reading steps that follow, I ran into an insufficient-resources problem.
From other sites I learned that running plain "spark-shell" without the extra arguments works fine; sharing this for anyone who hits the same issue.
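Editor's note, not part of the original thread: another option is to keep the standalone master but ask for fewer resources; a sketch (the values are assumptions to tune for your cluster):
spark-shell --master spark://master:7077 --executor-memory 512m --total-executor-cores 2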
Supplement:
export SCALA_HOME=/usr/local/scala/scala-2.11.6 (the directory the files were extracted to)
export SPARK_HOME=/usr/local/spark/spark-1.5.2-bin-hadoop2.6 (the directory the files were extracted to)
cd /usr/local/spark/spark-1.5.2-bin-hadoop2.6/conf (the conf subdirectory of that directory)
Wherever similar paths appear below, substitute the spark/spark-1.5.2-bin-hadoop2.6 directory for "spark" in the same way.
Supplement:
SPARK_JAR=/usr/local/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/spark-1.5.2-bin-hadoop2.6/bin/spark-shell
Teacher Lin:
Following p. 168 of the book, running spark-shell on Hadoop YARN and reading a local file fails with a file-not-found error; reading the HDFS file works fine.
Could you assist with this? Thank you!
--Reading the HDFS file works fine--
scala> val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21
scala> textFile.count
res0: Long = 418
-----it's OK-----
--Reading the local file fails with a file-not-found error--
scala> val textFile=sc.textFile("file:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md")
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21
scala> textFile.count
16/09/01 12:13:12 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, data2): java.io.FileNotFoundException: File file:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
--Reading the local file fails with a file-not-found error--can you assist? Thank you!
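Editor's note, not part of the original thread: the stack trace shows the task failing on worker node data2, so a plausible cause is that a file: path must exist at the same location on every node when the job runs on YARN. One workaround (a sketch; the data1 hostname and paths are assumptions) is to copy the file to each worker, or simply read it from HDFS instead:
scp /usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md hduser@data1:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/
scp /usr/local/spark/spark-1.5.2-bin-hadoop2.6/README.md hduser@data2:/usr/local/spark/spark-1.5.2-bin-hadoop2.6/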
I would like to ask:
The versions in use are spark 2.2.0 and hadoop 2.7.2.
When running SPARK_JAR=/usr/local/spark/lib/spark-assembly-2.2.0-hadoop2.7.2.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
the error "<console>:14: error: not found: value spark" appears.
How can this be solved? Thank you!
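Editor's note, not part of the original thread: Spark 2.x no longer ships a spark-assembly jar, so the SPARK_JAR setting from the book does not apply, and "not found: value spark" means the SparkSession failed to initialize at shell startup. Assuming the same layout, the usual 2.x invocation would be:
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop spark-shell --master yarn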
The spark-shell command fails. How can I resolve this?
hduser@master:~$ spark-shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/spark/jars/hadoop-auth-2.6.5.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
18/08/20 12:47:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException
at scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
at scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896)
at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895)
at scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895)
at scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895)
at scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918)
at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1337)
at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1336)
at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
at scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1336)
at scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:908)
at scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:1002)
at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997)
at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
at scala.collection.immutable.List.foreach(List.scala:381)
at
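Editor's note, not part of the original thread: the "object java.lang.Object in compiler mirror not found" failure, together with the illegal-reflective-access warnings above it, is typical of running this Spark/Scala combination on Java 9 or newer, which it does not support. Pointing JAVA_HOME at a Java 8 JDK usually resolves it (the path below is an assumption):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed Java 8 install path
export PATH=$JAVA_HOME/bin:$PATH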