Spark: From Beginner to Expert
Environment Setup
Prerequisites
Create the installation directory
mkdir /opt/soft
cd /opt/soft
Download Scala
wget https://downloads.lightbend.com/scala/2.13.10/scala-2.13.10.tgz -P /opt/soft
Extract Scala
tar -zxvf scala-2.13.10.tgz
Rename the Scala directory
mv scala-2.13.10 scala-2
Download Spark
wget https://dlcdn.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3-scala2.13.tgz -P /opt/soft
Extract Spark
tar -zxvf spark-3.4.0-bin-hadoop3-scala2.13.tgz
Rename the Spark directory
mv spark-3.4.0-bin-hadoop3-scala2.13 spark3
Edit environment variables
vim /etc/profile
export JAVA_HOME=/opt/soft/jdk8
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export ZOOKEEPER_HOME=/opt/soft/zookeeper
export HADOOP_HOME=/opt/soft/hadoop3
export HADOOP_INSTALL=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HIVE_HOME=/opt/soft/hive3
export HCAT_HOME=/opt/soft/hive3/hcatalog
export SQOOP_HOME=/opt/soft/sqoop-1
export FLUME_HOME=/opt/soft/flume
export HBASE_HOME=/opt/soft/hbase2
export PHOENIX_HOME=/opt/soft/phoenix
export SCALA_HOME=/opt/soft/scala-2
export SPARK_HOME=/opt/soft/spark3
export SPARKPYTHON=/opt/soft/spark3/python
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$SQOOP_HOME/bin:$FLUME_HOME/bin:$HBASE_HOME/bin:$PHOENIX_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SPARKPYTHON
source /etc/profile
Local Mode
scala / java (spark-shell)
Start
spark-shell
Web UI: http://spark01:4040
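Inside spark-shell, a SparkContext (sc) and SparkSession (spark) are already created. A quick sanity check might look like this (a minimal sketch; any small computation works):

// run inside spark-shell; sc is the pre-built SparkContext
val nums = sc.parallelize(1 to 100)
println(nums.sum()) // 5050.0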
Exit
:quit
pyspark
Start
pyspark
Web UI: http://spark01:4040
Exit
quit() or Ctrl-D
Submit an application in local mode
Run the following from the Spark installation directory:
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[2] \
./examples/jars/spark-examples_2.13-3.4.0.jar \
10
- --class is the main class of the application to run; replace it with your own application's main class (a minimal example application is sketched below)
- --master local[2] sets the deploy mode; the default is local mode, and the number is the count of virtual CPU cores to allocate
- spark-examples_2.13-3.4.0.jar is the jar containing the application class; in practice, use your own packaged jar
- The number 10 is an argument passed to the program's entry point; for this example it sets the number of tasks
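For reference, a minimal sketch of an application that could be submitted the same way (the package, class name, and logic are hypothetical, not part of the original examples; pass the class via --class and your own jar in place of spark-examples):

package com.example

import org.apache.spark.{SparkConf, SparkContext}

// A tiny stand-in for SparkPi: spreads a count over the requested number of tasks.
object MySubmitDemo {
  def main(args: Array[String]): Unit = {
    // the trailing spark-submit argument (e.g. 10) arrives here as args(0)
    val slices = if (args.length > 0) args(0).toInt else 2
    // no setMaster here: the master is taken from --master on the command line
    val conf = new SparkConf().setAppName("my-submit-demo")
    val sc = new SparkContext(conf)
    val count = sc.parallelize(1 to 100000, slices).count()
    println(s"count = $count")
    sc.stop()
  }
}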
Standalone Mode
Edit the core configuration file
In the conf directory:
cd /opt/soft/spark3/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/opt/soft/jdk8
export HADOOP_HOME=/opt/soft/hadoop3
export HADOOP_CONF_DIR=/opt/soft/hadoop3/etc/hadoop
export JAVA_LIBRARY_PATH=/opt/soft/hadoop3/lib/native
export SPARK_MASTER_HOST=spark01
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=4
export SPARK_MASTER_WEBUI_PORT=6633
Edit the workers file (named slaves in older Spark releases)
cp workers.template workers
vim workers
spark01
spark02
spark03
Configure event logging and the history server
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://lihaozhe/spark-log
hdfs dfs -mkdir /spark-log
vim spark-env.sh
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://lihaozhe/spark-log"
Rename the start/stop scripts (to avoid clashing with Hadoop's start-all.sh / stop-all.sh)
mv sbin/start-all.sh sbin/start-spark.sh
mv sbin/stop-all.sh sbin/stop-spark.sh
Distribute to the other nodes
scp -r /opt/soft/spark3 root@spark02:/opt/soft
scp -r /opt/soft/spark3 root@spark03:/opt/soft
scp -r /etc/profile root@spark02:/etc
scp -r /etc/profile root@spark03:/etc
Refresh the environment variables on the other nodes
source /etc/profile
Start
start-spark.sh
start-history-server.sh
Web UI
http://spark01:6633
http://spark01:18080
Submit a job to the cluster
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://spark01:7077 \
./examples/jars/spark-examples_2.13-3.4.0.jar \
10
HA Mode
Edit the core configuration file
In the conf directory:
cd /opt/soft/spark3/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/opt/soft/jdk8
export HADOOP_HOME=/opt/soft/hadoop3
export HADOOP_CONF_DIR=/opt/soft/hadoop3/etc/hadoop
export JAVA_LIBRARY_PATH=/opt/soft/hadoop3/lib/native
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=spark01:2181,spark02:2181,spark03:2181
-Dspark.deploy.zookeeper.dir=/spark3"
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=4
export SPARK_MASTER_WEBUI_PORT=6633
Note: the /spark3 path in spark.deploy.zookeeper.dir is a znode inside ZooKeeper (created automatically by Spark), not an HDFS path, so no HDFS directory needs to be created for it.
Edit the workers file (named slaves in older Spark releases)
cp workers.template workers
vim workers
spark01
spark02
spark03
Configure event logging and the history server
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://lihaozhe/spark-log
hdfs dfs -mkdir /spark-log
vim spark-env.sh
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://lihaozhe/spark-log"
Rename the start/stop scripts (to avoid clashing with Hadoop's start-all.sh / stop-all.sh)
mv sbin/start-all.sh sbin/start-spark.sh
mv sbin/stop-all.sh sbin/stop-spark.sh
Distribute to the other nodes
scp -r /opt/soft/spark3 root@spark02:/opt/soft
scp -r /opt/soft/spark3 root@spark03:/opt/soft
scp -r /etc/profile root@spark02:/etc
scp -r /etc/profile root@spark03:/etc
Refresh the environment variables on the other nodes
source /etc/profile
Start
start-spark.sh
start-history-server.sh
Web UI
http://spark01:6633
http://spark01:18080
Submit a job to the cluster (in HA mode all masters can be listed, e.g. --master spark://spark01:7077,spark02:7077)
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://spark01:7077 \
./examples/jars/spark-examples_2.13-3.4.0.jar \
10
Submit a job to YARN
bin/spark-submit --master yarn \
--class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.13-3.4.0.jar 10
spark-code
spark-core
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.lihaozhe</groupId>
<artifactId>spark-code</artifactId>
<version>1.0.0</version>
<properties>
<jdk.version>1.8</jdk.version>
<scala.version>2.13.10</scala.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<commons-lang3.version>3.12.0</commons-lang3.version>
<java-testdata-generator.version>1.1.2</java-testdata-generator.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>com.github.binarywang</groupId>
<artifactId>java-testdata-generator</artifactId>
<version>${java-testdata-generator.version}</version>
</dependency>
<!-- spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.13</artifactId>
<version>3.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.4.0</version>
</dependency>
<!-- junit-jupiter-api -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>5.9.3</version>
<scope>test</scope>
</dependency>
<!-- junit-jupiter-engine -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>5.9.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.26</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.14.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.3.5</version>
</dependency>
<!-- commons-pool2 -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
<version>2.11.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.3</version>
</dependency>
<!-- commons-lang3 -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>${commons-lang3.version}</version>
</dependency>
</dependencies>
<build>
<finalName>${project.artifactId}</finalName>
<!--<outputDirectory>../package</outputDirectory>-->
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.11.0</version>
<configuration>
<!-- source file encoding -->
<encoding>UTF-8</encoding>
<!-- compiler JDK version -->
<source>${jdk.version}</source>
<target>${jdk.version}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-clean-plugin</artifactId>
<version>3.2.0</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>3.3.1</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-war-plugin</artifactId>
<version>3.3.2</version>
</plugin>
<!-- skip unit tests when packaging -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.2</version>
<configuration>
<skip>true</skip>
</configuration>
</plugin>
<!-- compiles Scala sources to class files -->
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>4.8.1</version>
<configuration>
<scalaCompatVersion>2.13</scalaCompatVersion>
<scalaVersion>2.13.10</scalaVersion>
</configuration>
<executions>
<execution>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
<execution>
<id>compile-scala</id>
<phase>compile</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>test-compile-scala</id>
<phase>test-compile</phase>
<goals>
<goal>add-source</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.5.0</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
hdfs-conf
Place the HDFS configuration files core-site.xml and hdfs-site.xml in the resources directory.
The bundled HDFS configuration files are those of the test cluster.
Because the production environment differs from the test environment, exclude the HDFS configuration files when packaging the project; the target file system can instead be supplied at runtime, as sketched below.
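A minimal sketch of reading from HDFS without bundled config files, assuming an existing SparkContext sc built as in the examples below (the NameNode host and port are placeholders, not values from this tutorial):

// read via an explicit HDFS URI so no core-site.xml / hdfs-site.xml is needed on the classpath
val rdd = sc.textFile("hdfs://namenode-host:8020/data/words.txt")
// alternatively, set individual Hadoop properties programmatically before reading
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode-host:8020")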
RDD
Building an RDD from a collection
package com.lihaozhe.course01;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;
import java.util.List;
/**
* Build an RDD from a parallelized collection
*
* @author 李昊哲
* @version 1.0.0 2023/5/12 17:23
*/
public class JavaDemo01 {
public static void main(String[] args) {
String appName = "rdd";
// SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
// basic Spark configuration
SparkConf conf = new SparkConf().setAppName(appName);
// run locally
conf.setMaster("local");
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
// build an RDD from a parallelized collection
JavaRDD<Integer> javaRDD = sc.parallelize(data);
List<Integer> result = javaRDD.collect();
result.forEach(System.out::println);
}
}
}
package com.lihaozhe.course01
import org.apache.spark.{SparkConf, SparkContext}
/**
* Build an RDD from a parallelized collection
*
* @author 李昊哲
* @version 1.0.0 2023/5/12 16:59
*/
object ScalaDemo01 {
def main(args: Array[String]): Unit = {
val appName: String = "rdd"
// val conf = new SparkConf().setAppName(appName).setMaster("local")
// basic Spark configuration
val conf = new SparkConf().setAppName(appName)
// run locally
conf.setMaster("local")
// build the SparkContext
val sc = new SparkContext(conf)
val data = Array(1, 2, 3, 4, 5)
// create an RDD from the collection (parallelized collection)
val distData = sc.parallelize(data)
// collect and print the RDD contents
distData.collect().foreach(println)
}
}
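As a side note, parallelize also accepts an explicit partition count via its numSlices argument; a small sketch reusing sc and data from the example above (the count 3 is arbitrary):

// request 3 partitions explicitly instead of the default parallelism
val distData3 = sc.parallelize(data, numSlices = 3)
// getNumPartitions reports how the RDD was split
println(distData3.getNumPartitions) // 3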
Building an RDD from a local file
words.txt
linux shell
java mysql jdbc
hadoop hdfs mapreduce
hive presto
flume kafka
hbase phoenix
scala spark
sqoop flink
linux shell
java mysql jdbc
hadoop hdfs mapreduce
hive presto
flume kafka
hbase phoenix
scala spark
sqoop flink
base phoenix
scala spark
sqoop flink
linux shell
java mysql jdbc
hadoop hdfs mapreduce
java mysql jdbc
hadoop hdfs mapreduce
hive presto
flume kafka
hbase phoenix
scala spark
java mysql jdbc
hadoop hdfs mapreduce
java mysql jdbc
hadoop hdfs mapreduce
hive presto
package com.lihaozhe.course01;
import org.apache.spark.Partition;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;
/**
* Create an RDD by reading a dataset from a file
*
* @author 李昊哲
* @version 1.0.0 2023/5/12 17:23
*/
public class JavaDemo02 {
public static void main(String[] args) {
String appName = "rdd";
// SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
// basic Spark configuration
SparkConf conf = new SparkConf().setAppName(appName);
// run locally
conf.setMaster("local");
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
// create an RDD by reading from a local file
JavaRDD<String> javaRDD = sc.textFile("file:///home/lsl/IdeaProjects/spark-code/words.txt");
List<String> lines = javaRDD.collect();
lines.forEach(System.out::println);
}
}
}
package com.lihaozhe.course01
import org.apache.spark.{SparkConf, SparkContext}
/**
* Create an RDD by reading a dataset from a file
*
* @author 李昊哲
* @version 1.0.0 2023/5/12 16:59
*/
object ScalaDemo02 {
def main(args: Array[String]): Unit = {
val appName: String = "rdd"
// val conf = new SparkConf().setAppName(appName).setMaster("local")
// basic Spark configuration
val conf = new SparkConf().setAppName(appName)
// run locally
conf.setMaster("local")
// build the SparkContext
val sc = new SparkContext(conf)
// create an RDD by reading from a local file
val distFile = sc.textFile("file:///home/lsl/IdeaProjects/spark-code/words.txt")
// print each line
distFile.foreach(println)
}
}
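textFile also accepts a directory or glob pattern and an optional minimum partition count; a small sketch (the glob path is an assumption based on the project layout above):

// read every .txt file under the project directory, asking for at least 4 partitions
val manyFiles = sc.textFile("file:///home/lsl/IdeaProjects/spark-code/*.txt", minPartitions = 4)
println(manyFiles.getNumPartitions)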
Building an RDD from an HDFS file
package com.lihaozhe.course01;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;
/**
* Create an RDD by reading a dataset from a file
*
* @author 李昊哲
* @version 1.0.0 2023/5/12 17:23
*/
public class JavaDemo03 {
public static void main(String[] args) {
System.setProperty("HADOOP_USER_NAME", "root");
String appName = "rdd";
// SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
// basic Spark configuration
SparkConf conf = new SparkConf().setAppName(appName);
// run locally
conf.setMaster("local");
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
// create an RDD by reading from HDFS
// JavaRDD<String> javaRDD = sc.textFile("hdfs://lihaozhe/data/words.txt");
JavaRDD<String> javaRDD = sc.textFile("data/words.txt");
List<String> lines = javaRDD.collect();
lines.forEach(System.out::println);
}
}
}
package com.lihaozhe.course01
import org.apache.spark.{SparkConf, SparkContext}
/**
* Create an RDD by reading a dataset from a file
*
* @author 李昊哲
* @version 1.0.0 2023/5/12 16:59
*/
object ScalaDemo03 {
def main(args: Array[String]): Unit = {
System.setProperty("HADOOP_USER_NAME", "root")
val appName: String = "rdd"
// val conf = new SparkConf().setAppName(appName).setMaster("local")
// basic Spark configuration
val conf = new SparkConf().setAppName(appName)
// run locally
conf.setMaster("local")
// build the SparkContext
val sc = new SparkContext(conf)
// create an RDD by reading from HDFS
// val distFile = sc.textFile("hdfs://lihaozhe/data/words.txt")
// the relative path resolves to hdfs://lihaozhe/user/root/data/words.txt
val distFile = sc.textFile("data/words.txt")
// print each line
distFile.foreach(println)
}
}
WordCount
JavaWordCount
package com.lihaozhe.course02;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;
/**
* @author 李昊哲
* @version 1.0.0 2023/5/12 21:37
*/
public class JavaWordCount {
public static void main(String[] args) {
System.setProperty("HADOOP_USER_NAME", "root");
String appName = "rdd";
// SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
// basic Spark configuration
SparkConf conf = new SparkConf().setAppName(appName);
// run locally
conf.setMaster("local");
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
// create an RDD by reading the input file
JavaRDD<String> javaRDD = sc.textFile("data/words.txt");
// FlatMapFunction<T, R>: the first type parameter is the input element type, the second is the output element type
// javaRDD.flatMap(new FlatMapFunction<String, String>() {
// @Override
// public Iterator<String> call(String s) throws Exception {
// return null;
// }
// });
JavaRDD<String> wordsRdd = javaRDD.flatMap((FlatMapFunction<String, String>) line -> Arrays.asList(line.split(" ")).listIterator());
// (love,1)
// wordsRdd.mapToPair(new PairFunction<String, String, Integer>() {
// @Override
// public Tuple2<String, Integer> call(String s) throws Exception {
// return null;
// }
// });
JavaPairRDD<String, Integer> pairRDD = wordsRdd.mapToPair((PairFunction<String, String, Integer>) word -> new Tuple2<>(word, 1));
// aggregate the counts by word (key)
// pairRDD.reduceByKey(new Function2<Integer, Integer, Integer>() {
// @Override
// public Integer call(Integer integer, Integer integer2) throws Exception {
// return null;
// }
// });
// pairRDD.reduceByKey((Function2<Integer, Integer, Integer>) (x, y) -> x + y);
JavaPairRDD<String, Integer> javaPairRDD = pairRDD.reduceByKey((Function2<Integer, Integer, Integer>) Integer::sum);
List<Tuple2<String, Integer>> collect = javaPairRDD.collect();
collect.forEach(System.out::println);
javaPairRDD.saveAsTextFile("result");
}
}
}
ScalaWordCount
package com.lihaozhe.course02
import org.apache.spark.{SparkConf, SparkContext}
/**
* @author 李昊哲
* @version 1.0.0 2023/5/12 21:38
*/
object ScalaWordCount01 {
def main(args: Array[String]): Unit = {
System.setProperty("HADOOP_USER_NAME", "root")
// val conf = new SparkConf().setAppName("词频统计").setMaster("local")
val conf = new SparkConf().setAppName("词频统计")
conf.setMaster("local")
val sc = new SparkContext(conf)
val content = sc.textFile("data/words.txt")
// print each line
content.foreach(println)
// split each line into words with flatMap
val words = content.flatMap(_.split(" "))
words.foreach(println)
// group into key/value form
// (love,Seq(love, love, love, love, love))
val wordGroup = words.groupBy(word => word)
wordGroup.foreach(println)
val wordCount = wordGroup.mapValues(_.size)
wordCount.foreach(println)
// write the result to disk
wordCount.saveAsTextFile("result")
sc.stop()
}
}
package com.lihaozhe.course02
import org.apache.spark.{SparkConf, SparkContext}
/**
* @author 李昊哲
* @version 1.0.0 2023/5/12 21:38
*/
object ScalaWordCount02 {
def main(args: Array[String]): Unit = {
System.setProperty("HADOOP_USER_NAME", "root")
// val conf = new SparkConf().setAppName("词频统计").setMaster("local")
val conf = new SparkConf().setAppName("词频统计")
conf.setMaster("local")
val sc = new SparkContext(conf)
val content = sc.textFile("data/words.txt")
// print each line
content.foreach(println)
// split each line into words with flatMap
val words = content.flatMap(_.split(" "))
words.foreach(println)
// convert to key/value pairs
// each word becomes a (word, 1) tuple; reduceByKey below is provided via an implicit conversion to PairRDDFunctions
// (China,1)
val wordMap = words.map((_, 1))
wordMap.foreach(println)
val wordCount = wordMap.reduceByKey(_ + _)
wordCount.foreach(println)
// write the result to disk
wordCount.saveAsTextFile("result")
sc.stop()
}
}
Package and deploy the project
mvn package
Upload the jar file to the cluster
Submit on the cluster (note: remove or comment out conf.setMaster("local") before packaging, otherwise it takes precedence over the --master yarn flag passed to spark-submit)
spark-submit --master yarn --class com.lihaozhe.course02.JavaWordCount spark-code.jar
spark-submit --master yarn --class com.lihaozhe.course02.ScalaWordCount01 spark-code.jar
spark-submit --master yarn --class com.lihaozhe.course02.ScalaWordCount02 spark-code.jar