Reference:
http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn
The page above gives the detailed steps; this note records the special cases I ran into at each step.
Important: install the JDK and R before following the instructions, otherwise the build will fail.
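The prerequisite check above can be sketched as a small shell snippet (a sketch; `javac` stands in for the JDK check and `Rscript` for R):

```shell
# Pre-flight check before building Spark: verify the JDK and R are on PATH.
# (A sketch; javac stands in for the JDK, Rscript for R.)
for tool in javac Rscript; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "OK: $tool"
  else
    echo "MISSING: $tool"
  fi
done
```

If either line reports MISSING, install that tool before starting the build.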
First, download Spark from the Spark download page; the version used here is spark-1.6.0.
1. Building with build/mvn
Ran ./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
which failed with an error:
The goal you specified requires a project to execute but there is no POM in this directory
The end of the error message points to a reference page; reading up on MissingProjectException there shows that Maven has to be told where the POM is. Adding the POM-path argument fixed it.
The POM used was the pom.xml in the spark-1.6.0 directory.
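One way to pass the POM path is Maven's `--file` (`-f`) flag. A sketch, assuming the source was unpacked to /opt/spark-1.6.0 (a hypothetical path; adjust it to your checkout):

```shell
# Hypothetical location of the unpacked source; adjust to your machine.
SPARK_SRC=/opt/spark-1.6.0
# --file points mvn at the POM explicitly, which avoids the
# MissingProjectException when the build is launched from another directory.
"$SPARK_SRC/build/mvn" --file "$SPARK_SRC/pom.xml" -Pyarn -Phadoop-2.4 -DskipTests clean package
```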
2. Building a Runnable Distribution
Ran ./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
I first tried this step without R installed and got an error that Rscript could not be executed.
Installing R fixed it.
3. Setting up Maven's Memory Usage
Ran export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
No errors.
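A small sketch of setting and verifying the variable. One detail worth noting: -XX:MaxPermSize only applies on JDK 7, which Spark 1.6 targets; JDK 8+ removed the permanent generation and ignores the flag with a warning.

```shell
# Give Maven more heap and code-cache room for the Spark build.
# -XX:MaxPermSize matters on JDK 7; JDK 8+ prints a warning and ignores it.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
# Confirm the setting is visible to child processes such as mvn:
echo "$MAVEN_OPTS"
```

Because the variable only lives in the current shell, re-export it (or add it to your shell profile) before each build session.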
4. Specifying the Hadoop Version
For an overview of Hadoop versions, see:
http://blog.csdn.net/wind520/article/details/37757411
I chose Hadoop 2.2.x and ran mvn -Pyarn -Phadoop-2.2 -DskipTests clean package
This hit the same missing-POM error as step 1; redoing it with the POM path fixed it.
5. Building With Hive and JDBC Support
Ran mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
No errors.
6. Building for Scala 2.11
Ran ./dev/change-scala-version.sh 2.11
then mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
No errors.
7. Spark Tests in Maven
The docs note that some tests require Spark to be packaged first, so:
1. Running mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package failed with:
bad symbolic reference. A signature in WebUI.class refers to term eclipse in package org which is not available
Following http://stackoverflow.com/questions/16719029/scala-either-and-akka-dispatch-future-missing-while-compiling
I added -Dscala-2.11 and reran: mvn -Pyarn -Phadoop-2.3 -Dscala-2.11 -DskipTests -Phive -Phive-thriftserver clean package
This succeeded.
2.