A First WordCount Exercise

Running WordCount from the IDEA client

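1: Create a mapper class that extends Mapper (pick the Hadoop type, org.apache.hadoop.mapreduce.Mapper, when importing)

// LongWritable: input key, the byte offset of each line within the input file
// Text: input value, the full text of one line (all the words on that line)
// Text: output key, a single word
// IntWritable: output value, a count of 1 for each occurrence of the word
public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
}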

1.2 Override the map method (Ctrl+O, then select map)

public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key: the offset of the current line
        // value: one line of text, i.e. a string of words
        // context: used to write the output
    }
}

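1.3 Mapper steps

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Get the line as a String
        String line = value.toString();
        // 2. Split on spaces
        String[] words = line.split(" ");
        for (String word : words) {
            // Two consecutive spaces produce an empty token; skip it
            if (word.isEmpty()) {
                continue;
            }
            // 3. Emit each word with a count of 1
            context.write(new Text(word), new IntWritable(1));
        }
    }
}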
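The empty-token check in step 2 is needed because String.split(" ") returns an empty string between two consecutive spaces. The sketch below is plain Java with no Hadoop dependency; the class name SplitDemo and both helper methods are hypothetical and exist only to show the behavior. The second helper, which splits on runs of whitespace, is one possible alternative and not what the mapper above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the WordCount job itself.
class SplitDemo {

    // Mirrors the mapper: split on a single space and skip empty tokens.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (word.isEmpty()) {
                continue; // produced by two consecutive spaces or a leading space
            }
            out.add(word);
        }
        return out;
    }

    // Alternative (an assumption, not the original code): split on runs of whitespace.
    static List<String> tokensByWhitespace(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        String line = "hello  hadoop hello"; // note the double space
        System.out.println(Arrays.toString(line.split(" "))); // raw split, includes an empty token
        System.out.println(tokens(line));
        System.out.println(tokensByWhitespace(line));
    }
}
```

Splitting on whitespace runs would also handle tabs, which the single-space split treats as part of a word; whether that matters depends on the input file.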

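2: Create a reducer class that extends Reducer (pick the Hadoop type, org.apache.hadoop.mapreduce.Reducer, when importing)

// Text: input key, a single word emitted by the mapper
// IntWritable: input value, the 1 emitted for each occurrence of the word
// Text: output key, a single word
// IntWritable: output value, the total count for that word
public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
}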

2.2 Override the reduce method (Ctrl+O, then select reduce)

public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // key: a single word
        // values: the counts collected for that word, e.g. {1,1,1,1,1,1,1}
        // context: used to write the output
    }
}

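2.3 Reducer steps

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // 1. Sum the counts for this word
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        // 2. Emit the word with its total count
        context.write(key, new IntWritable(count));
    }
}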
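The mapper and reducer steps can be traced without a cluster. The following is a minimal sketch in plain Java that imitates the same data flow in memory: emit (word, 1), group the 1s by word, then sum each group. The class LocalWordCount and its count method are hypothetical names, the sample lines in main are made up for illustration, and nothing here uses the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the WordCount data flow.
class LocalWordCount {

    static Map<String, Integer> count(List<String> lines) {
        // "Map" phase plus grouping: emit (word, 1) and collect the 1s per word.
        // A TreeMap keeps the keys sorted, similar to how reducer input arrives sorted by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // same empty-token check as the mapper
                }
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // "Reduce" phase: sum the list of 1s for each word.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int one : entry.getValue()) {
                sum += one;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample input, two lines with a double space in the second.
        List<String> lines = Arrays.asList("hello hadoop", "hello  world");
        System.out.println(count(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

In the real job the grouping step is done by the framework between the map and reduce phases, so the reducer only ever sees one key together with its list of values.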

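3: Create the driver main class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class wordcountDriver {
    public static void main(String[] args) throws Exception {
        // 1. Create the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. Locate the jar through the driver class
        job.setJarByClass(wordcountDriver.class);
        // 3. Set the mapper and reducer classes
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        // 4. Set the mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        // Location of the data to process
        FileInputFormat.setInputPaths(job, "hdfs://192.168.100.100:8020/hello/mapreduce/test.txt");
        // Location where the result is saved (this directory must not already exist)
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.100.100:8020/hello/mapreduce/wordcountout/"));
        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}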

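3.1 Check test.txt (the unprocessed input)

Run the driver

Run result

View and download the result in the NameNode web UI (ip:50070)

Open the downloaded file and inspect it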

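Running WordCount on a Linux cluster

1: Change the input and output paths in the driver

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class wordcountDriver {
    public static void main(String[] args) throws Exception {
        // 1. Create the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. Locate the jar through the driver class
        job.setJarByClass(wordcountDriver.class);
        // 3. Set the mapper and reducer classes
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        // 4. Set the mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        // The hard-coded paths used in IDEA are commented out:
        // FileInputFormat.setInputPaths(job, "hdfs://192.168.100.100:8020/hello/mapreduce/test.txt");
        // FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.100.100:8020/hello/mapreduce/wordcountout/"));
        // On the cluster, both paths come from the command line instead
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}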
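Because the driver now reads args[0] and args[1] directly, starting it with fewer than two arguments fails with an ArrayIndexOutOfBoundsException. A small validation step can make that failure easier to read. The sketch below is plain Java; the class DriverArgs, its fields, and the usage text are all hypothetical and not part of the original driver.

```java
// Hypothetical helper: validates the two command-line paths before the job is configured.
class DriverArgs {
    final String input;
    final String output;

    private DriverArgs(String input, String output) {
        this.input = input;
        this.output = output;
    }

    // Expects exactly two arguments: the input path and the output path.
    static DriverArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: hadoop jar <jar> [main class] <input path> <output path>");
        }
        return new DriverArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        try {
            DriverArgs parsed = parse(args);
            System.out.println("input=" + parsed.input + ", output=" + parsed.output);
        } catch (IllegalArgumentException e) {
            // Print the usage message instead of a stack trace and exit non-zero.
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```

In the driver, the parsed values would take the place of args[0] and args[1] when the input and output paths are set.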

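2: Build the jar in IDEA (the packaging steps are skipped here)

3: Copy the jar to the Linux machine (there are many ways to do this)

4: Upload the unprocessed text file to HDFS

5: Run hadoop jar *******.jar /input/text.txt /output, where the first path is the unprocessed file and the second is the result directory. If the jar's manifest does not name a main class, put the driver class name right after the jar name.

6: Run result

7: Download the processed file from the web UI

8: View the processed file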
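The driver does not set an output format, so the job uses Hadoop's default text output: one line per key, with the word and its count separated by a tab, in a file typically named part-r-00000. As a minimal sketch of reading such a file after downloading it, the plain-Java helper below parses those lines into a map. The class ResultParser is a hypothetical name and the sample lines in main are made up, not real job output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading downloaded WordCount output lines of the form word<TAB>count.
class ResultParser {

    static Map<String, Integer> parse(List<String> lines) {
        // LinkedHashMap preserves the order in which the lines appear in the file.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Not a word<TAB>count line: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of a downloaded result file.
        List<String> sample = Arrays.asList("hadoop\t1", "hello\t2");
        System.out.println(parse(sample));
    }
}
```

To read a real file, the lines could be loaded with java.nio.file.Files.readAllLines and passed to parse.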
