mapreduce剖析气象站平均气温

一、气象数据

按行并以 ASCII 格式存储，每一行是一条记录。下图展示了一行采样数据。

1998            #year
03          #month
09          #day
17          #hour
11           #temperature
-100            #dew
10237           #pressure
60          #wind_direction
72          #wind_speed
0           #sky_condition
0           #rain_1h
-9999           #rain_6h

将每个气象站的数据文件拼接成一个单独的文件更容易处理，预处理过的数据文件示例如下所示：

1985 07 31 02   200    94 10137   220    26     1     0 -9999
1985 07 31 03   172    94 10142   240     0     0     0 -9999
1985 07 31 04   156    83 10148   260    10     0     0 -9999
1985 07 31 05   133    78 -9999   250     0 -9999     0 -9999
1985 07 31 06   122    72 -9999    90     0 -9999     0     0
1985 07 31 07   117    67 -9999    60     0 -9999     0 -9999
1985 07 31 08   111    61 -9999    90     0 -9999     0 -9999
1985 07 31 09   111    61 -9999    60     5 -9999     0 -9999
1985 07 31 10   106    67 -9999    80     0 -9999     0 -9999
1985 07 31 11   100    56 -9999    50     5 -9999     0 -9999

二、数据导入hdfs

使用myeclipse链接集群hdfs，将数据导入hdfs系统的/home/hadoop/weather目录下（自行创建一个存放数据的目录）。

命令行访问

三、思路分析

1.map 阶段输入原始数据。选择文本格式作为输入格式，将数据集的每一行作为文本输入。提取气象站和气温信息，并将它们作为输出。

2.reduce阶段接收map 函数的输出结果：每个气象站后面紧跟着一系列气温数据，使用reduce 函数遍历整个列表并统计出平均气温。

四、代码

1.编写 Mapper 类，实现 map() 方法，提取气象站和气温数据。

/** 定义一个Mapper类实现map（）方法解析气象站数据* input key=偏移量，input value =每行数据* output key=weatherStationID， output value =temperature */public class TempratureMapper extends Mapper<LongWritable,Text,Text,IntWritable>{public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException{ //将每行数据转换成string类型String line = value.toString();//提取气温值int temperature = Integer.parseInt(line.substring(14,19).trim());//过滤无效数据if(temperature != -9999){//获取分片FileSplit filesplit = (FileSplit) context.getInputSplit();//获取气象站编号String weatherStationID  = filesplit.getPath().getName().substring(5, 10);//输出处理好的数据context.write(new Text(weatherStationID),new IntWritable(temperature));}}

2.编写reduce函数，统计平均气温

/** 定义一个reducer类实现reduce()方法统计各个气象站的平均气温* input key=weatherStationID，input value =temperature* output key=weatherStationID， output value =average(temperature) */public class TemperatureReducer extends Reducer<Text,IntWritable,Text,IntWritable>{//自定义成员变量result用以保存输出结果private IntWritable result = new IntWritable();public void reduce(Text key,Iterable< IntWritable> values,Context context) throws IOException, InterruptedException{//统计相同气象站的所有气温值int sum = 0;int count = 0;for(IntWritable i : values){//对所有气温值累加sum += i.get();//统计集合大小count++;}//求同一个气象站的平均气温result.set(sum/count);context.write(key, result);}}

ps：Iterable：迭代器，就是专门处理集合元素的对象，例如删除和获取集合中的元素。但是该对象比较特殊，不能直接创建对象（通过new），该对象是以内部类的形式存在于每个集合类的内部。

3.编写函数，运行 MapReduce 作业。

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/*** * @function 统计美国各个气象站30年来的平均气温* @author 小讲**/
public class Temperature extends Configured implements Tool {/*** @function 任务驱动方法* @param args* @return* @throws Exception*/@Overridepublic int run(String[] args) throws Exception {// TODO Auto-generated method stubConfiguration conf = new Configuration();//读取配置文件Path mypath = new Path(args[1]);FileSystem hdfs = mypath.getFileSystem(conf);if (hdfs.isDirectory(mypath)) {hdfs.delete(mypath, true);}Job job = new Job(conf, "temperature");//新建一个任务job.setJarByClass(Temperature.class);// 设置主类FileInputFormat.addInputPath(job, new Path(args[0]));// 输入路径FileOutputFormat.setOutputPath(job, new Path(args[1]));// 输出路径job.setMapperClass(TemperatureMapper.class);// Mapperjob.setReducerClass(TemperatureReducer.class);// Reducerjob.setOutputKeyClass(Text.class);job.setOutputValueClass(IntWritable.class);     return job.waitForCompletion(true)?0:1;//提交任务}/*** @function main 方法* @param args* @throws Exception*/public static void main(String[] args) throws Exception {//数据输入路径和输出路径String[] args0 = {"hdfs://pc1:9000/home/hadoop/weather/","hdfs://pc1:9000/home/hadoop/weather/out/"};int ec = ToolRunner.run(new Configuration(), new Temperature(), args0);System.exit(ec);}

4.对代码进行单元测试及debug调试。

编写mapper测试方法

public class TemperatureMapperTest {private Mapper mapper;//定义一个Mapper对象private MapDriver driver;//定义一个MapDriver 对象@Beforepublic void init() {mapper = new Temperature.TemperatureMapper();//实例化一个Temperature中的TemperatureMapper对象driver = new MapDriver(mapper);//实例化MapDriver对象}@Testpublic void test() throws IOException {//输入一行测试数据String line = "1985 07 31 02   200    94 10137   220    26     1     0 -9999";driver.withInput(new LongWritable(), new Text(line))//跟TemperatureMapper输入类型一致.withOutput(new Text("weatherStationId"), new IntWritable(200))//跟TemperatureMapper输出类型一致.runTest();}
}

鼠标放在 TemperatureMapperTest 类上右击，选择 Run As ——> JUnit test，运行结果如下图所示:

左边的对话框里显示”Runs：1/1，Errors：0，Failures：0”，说明 Mapper 测试成功了。

编写reduce测试方法

public class TemperatureReduceTest {private Reducer reducer;//定义一个Reducer对象 private ReduceDriver driver;//定义一个ReduceDriver对象@Beforepublic void init() {reducer = new Temperature.TemperatureReducer();//实例化一个Temperature中的TemperatureReducer对象driver = new ReduceDriver(reducer);//实例化ReduceDriver对象}@Testpublic void test() throws IOException {String key = "weatherStationId";//声明一个key值List values = new ArrayList();values.add(new IntWritable(200));//添加第一个value值values.add(new IntWritable(100));//添加第二个value值driver.withInput(new Text("weatherStationId"), values)//跟TemperatureReducer输入类型一致.withOutput(new Text("weatherStationId"), new IntWritable(150))//跟TemperatureReducer输出类型一致.runTest();}
}

右键 Temperature 项目，选择”Debug As” ——> “java Application”，在程序中打上断点后直接进入调试模式，如下图所示。

五、运行程序

1.在myeclipse上运行程序

2.将 MapReduce作业打成jar包即temperature.jar，然后然后上传至/home/hadoop/ 目录下，由 hadoop 脚本来执行。

保存为Temp.jar，通过rz命令将Temp.jar上传到、home/hadoop/app/hadoop目录下。执行命令：

[hadoop@pc1 hadoop]$ bin/hadoop jar Temp.jar com.pc1.hadoop.test.Temperature home/hadoop/weather/ home/hadoop/weather/out/

最后在 out目录查看结果，与本地结果一致。

mapreduce剖析气象站平均气温相关推荐

spark分析某国气象站平均气温实例
一.数据集分析数据文件按照气象站和日期进行组织,每个气象站都是一个总目录,而且每个气象站下面从 1980 年到 2010 年,每一年又都作为一个子目录. 因为某国有成千上万个气象站,所以整个数据集由 ...
mapreduce实战：统计美国各个气象站30年来的平均气温项目分析（MapReduce处理多文件数据）
气象数据集我们要写一个气象数据挖掘的程序.气象数据是通过分布在美国各地区的很多气象传感器每隔一小时进行收集,这些数据是半结构化数据且是按照记录方式存储的,因此非常适合使用 MapReduce 程序来 ...
mapreduce实战：统计美国各个气象站30年来的平均气温项目分析
气象数据集我们要写一个气象数据挖掘的程序.气象数据是通过分布在美国各地区的很多气象传感器每隔一小时进行收集,这些数据是半结构化数据且是按照记录方式存储的,因此非常适合使用 MapReduce 程序来 ...
各纬度气候分布图_读我国一月平均气温分布图，寻找我国冬季气温最高和最低的地方...
地图是地理学的语言,地图能够科学地反映出自然和社会经济现象的分布特征及其相互关系,所以我们阅读地图也能获取许多的地理信息.之前我们已经通过读中国年平均气温分布图.世界年降水量分布图.世界年平均气温分布 ...
各纬度气候分布图_读中国年平均气温分布图，寻找中国全年平均气温最高和最低的地方...
自然地理环境包括地形.气候.土壤.植被和水文等要素,一个地区的气候又是由气温.降水.光照等要素来组成的.在分析一个地区的气温特征时,我们主要从年均温和气温年较差两个角度来入手,影响气温高低的因素有很多 ...
利用Excel VBA批量计算气象数据多个台站多年来春季和冬季降水量和平均气温
气象数据是地理数据的重要组成部分,存储量虽然不大,但是处理过程非常繁琐,长时序数据更不用说.本文总结了一个气象数据的基本处理方法. 如下图所示,气象数据的排列格式是区站号→年→月→降水量→平均气温,时 ...
java mapreduce程序_简单的java Hadoop MapReduce程序(计算平均成绩)从打包到提交及运行...
[TOC] 简单的java Hadoop MapReduce程序(计算平均成绩)从打包到提交及运行程序源码 import java.io.IOException; import java.util. ...
运用自回归滑动平均模型、灰色预测模型、BP神经网络三种模型分别预测全球平均气温，并进行预测精度对比（附代码、数据）
大家好,我是带我去滑雪,每天教你一个小技巧!全球变暖是近十年来,人们关注度最高的话题.2022年夏天,蔓延全球40℃以上的极端天气不断刷新人们对于高温的认知,人们再也不会像从前那样认为全球变暖离我们遥 ...
数据分享｜中国各省、各市、各区县分年、分月、逐日平均气温数据（2000年～2019年）
今天分享给大家的是 2000 年-2019 年中国各省.各市.各县的分年.分月.逐日的平均气温数据(单位:摄氏度) 原始数据来源于国家气象科学数据共享服务平台-中国地面气候资料日值数据集(V3.0), ...

mapreduce剖析气象站平均气温

mapreduce剖析气象站平均气温相关推荐

最新文章

热门文章