Big data stream computing: Apache Storm study notes 01 (Wang Wenjun course)
Wang Wenjun (汪文君): Apache Storm 1.2.2 in Action
9 hours of video in total
https://www.bilibili.com/video/BV1et41147xq?p=1
P1 Lecture 01: Introduction to Apache Storm (20:53)
P2 Lecture 02: Key components of Apache Storm in detail (14:42)
P3 Lecture 03: Setting up an Apache Storm cluster (33:09)
P4 Lecture 04: First Topology program, local mode (23:53)
P5 Lecture 05: First Topology program, cluster mode (23:26)
P6 Lecture 06: Core concepts of Storm parallelism (42:54)
P7 Lecture 07: Analyzing Topology parallelism by running a program, part 1 (29:56)
P8 Lecture 08: Analyzing Topology parallelism by running a program, part 2 (17:39)
P9 Lecture 09: Rebalancing a Storm Topology (10:41)
P10 Lecture 10: Additional notes on Storm Topology parallelism (11:26)
P11 Lecture 11: shuffle grouping in detail (25:46)
P12 Lecture 12: shuffle grouping in detail, supplement (05:49)
P13 Lecture 13: fields grouping in detail (listen with headphones) (23:29)
P14 Lecture 14: Analysis leading up to all grouping (20:26)
P15 Lecture 15: all grouping in detail (14:16)
P16 Lecture 16: global grouping in detail (25:51)
P17 Lecture 17: direct grouping in detail (26:17)
P18 Lecture 18: none grouping & localOrShuffle grouping (04:52)
P19 Lecture 19: How to write a custom Storm grouping (33:33)
P20 Lecture 20: Case study, real-time statistics of caller and callee phone numbers (24:01)
P21 Lecture 21: Case study, Word Count (25:05)
P22 Lecture 22: Guaranteed processing in Storm: ack, failed, timeout, exception (listen with headphones) (26:53)
P23 Lecture 23: Case analysis of fully guaranteed processing in Storm (15:38)
P24 Lecture 24: Some approaches to guaranteeing that data is processed (16:30)
P25 Lecture 25: Advanced Tuple usage (Tick Tuple) (16:42)
Commands
----- Start nimbus
storm nimbus &
--- Change the log level to make the logs easier to read, e.g. switch the level from info to warn
[root@storm51 log4j2]# vim /usr/local/storm/log4j2/worker.xml
---- View the log
[root@storm51 logs]# tail -n 200 -f /usr/local/storm/logs/nimbus.log
--- View a topology's logs
[root@storm53 workers-artifacts]# pwd
/usr/local/storm/logs/workers-artifacts
[root@storm53 workers-artifacts]# ll
total 0
drwxr-xr-x. 3 root root 18 Apr 10 02:13 tp1-1-1618046001
drwxr-xr-x. 3 root root 18 Apr 10 02:51 tp2-2-1618048284
drwxr-xr-x. 3 root root 18 Apr 10 03:51 tp6-7-1618051908
P1 Lecture 01: Introduction to Apache Storm (20:53)
No. | Topic |
---|---|
1 | Java8 In Action |
2 | Apache Flume |
3 | PowerMock |
4 | Concordion |
5 | Mockito |
6 | Apache Sqoop |
7 | Java Concurrency |
8 | Google Guava |
9 | Scala In Action |
10 | Apache Kafka 0.11.x |
11 | Metrics |
12 | JMH |
Official site: https://storm.apache.org/
Stream computing products (Streaming Compute Systems)
No. | Product |
---|---|
1 | Apache Spark Streaming |
2 | Apache Flink |
3 | Apache Kafka Streams |
4 | Apache Storm |
5 | … |
Course outline
P2 Lecture 02: Key components of Apache Storm in detail (14:42)
A Storm cluster follows a master-slave model.
1. Master node: Nimbus
The Nimbus node is the master in a Storm cluster.
Nimbus is stateless and stores all of its data in ZooKeeper. There is a single Nimbus node in a Storm cluster.
2. Supervisor nodes (worker nodes)
3. Tuples: the Storm data model
4. Storm Topology
5. Spout & Stream & Bolt
5.1 Spout: A spout is the source of tuples in a Storm topology.
5.2 Bolt: A bolt is the processing powerhouse of a Storm topology and is responsible for transforming a stream.
5.3 Stream: The key abstraction in Storm is that of a stream.
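To make the three roles concrete, here is a minimal plain-Java model. It is not the Storm API: the spout is modeled as a `Supplier` of tuples, the bolt as a `Function` that turns one tuple into another, and the stream is simply the sequence that flows between them. The names `StreamModel` and `run`, and the use of `List<Object>` for a tuple, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java model of spout -> stream -> bolt; illustrative only, not Storm classes.
final class StreamModel {

    // Pulls `count` tuples from the spout, passes each one through the bolt,
    // and returns the transformed stream in emission order.
    static List<List<Object>> run(Supplier<List<Object>> spout,
                                  Function<List<Object>, List<Object>> bolt,
                                  int count) {
        List<List<Object>> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(bolt.apply(spout.get()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Spout: the source of tuples (a fixed word list here instead of a live feed).
        Iterator<String> words = Arrays.asList("storm", "nimbus", "bolt").iterator();
        Supplier<List<Object>> spout = () -> Arrays.<Object>asList(words.next());

        // Bolt: transforms each incoming tuple (upper-cases its first value).
        Function<List<Object>, List<Object>> bolt =
                t -> Arrays.<Object>asList(((String) t.get(0)).toUpperCase());

        System.out.println(run(spout, bolt, 3)); // [[STORM], [NIMBUS], [BOLT]]
    }
}
```

In a real topology the spout and bolt run in separate executors and Storm moves the tuples between them; the loop above only stands in for that transport.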
P3 Lecture 03: Setting up an Apache Storm cluster (33:09)
The instructor covers this in 30 minutes; Leo needed 4 hours to finish the setup (2021-04-05). The instructor uses 1.2.2, Leo uses 1.2.3.
Installing Apache Storm 1.2.3 on Linux CentOS 7.6
https://blog.csdn.net/wei198621/article/details/115449855
P4 Lecture 04: First Topology program, local mode (23:53)
The spout generates random values and hands them to the bolts for display.
**** outputCollector is used to send data
step1. RandomStringSpout.java
------------- SpoutOutputCollector.emit(new Values(...)); sends the data through the collector as a tuple
------------- OutputFieldsDeclarer.declare(new Fields("stream")); names the field of the emitted tuple stream
step2. WrapStarBolt.java Tuple.getStringByField("stream"); reads the field named stream from the tuple sent by the spout
step3. WrapWellBolt.java Tuple.getStringByField("stream"); reads the field named stream from the tuple sent by the spout
step4. RandomStringTopologyLocal.java
4.1 topoBuilder = new TopologyBuilder() registers the spout and bolts from steps 1 to 3 on this TopologyBuilder
4.2 cluster = new LocalCluster() creates the local cluster
4.3 conf = new Config() creates the configuration for the cluster
4.4 cluster.submitTopology(*, conf, topoBuilder.createTopology()); submits the topology built by the TopologyBuilder to the cluster (* is the topology name)
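The data flow of the four steps can be sketched in plain Java without a Storm dependency. This is only a model, not the course code: a tuple is represented as a map from field name to value so that the field `stream` can be read by name, both bolts are assumed to subscribe directly to the spout, and the wrapping characters (`*` for WrapStarBolt, `#` for WrapWellBolt) are guessed from the class names. `RandomStringFlow`, `tuple`, `nextTuple`, `wrapStar`, `wrapWell` and `runLocal` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Plain-Java model of the four steps above; these are NOT Storm classes.
final class RandomStringFlow {

    // A tuple modeled as field name -> value, so values can be read by field name.
    static Map<String, Object> tuple(String field, Object value) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put(field, value);
        return t;
    }

    // Step 1: the spout emits one random lowercase string under the field "stream".
    static Map<String, Object> nextTuple(Random random, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return tuple("stream", sb.toString());
    }

    // Steps 2 and 3: each bolt reads the "stream" field and wraps it.
    // The wrapping characters are an assumption based on the class names.
    static String wrapStar(Map<String, Object> t) {
        return "*" + t.get("stream") + "*";
    }

    static String wrapWell(Map<String, Object> t) {
        return "#" + t.get("stream") + "#";
    }

    // Step 4: wire the spout to both bolts and run a few rounds in-process.
    static List<String> runLocal(long seed, int rounds) {
        Random random = new Random(seed);
        List<String> displayed = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            Map<String, Object> t = nextTuple(random, 5);
            displayed.add(wrapStar(t)); // both bolts receive the same tuple
            displayed.add(wrapWell(t));
        }
        return displayed;
    }

    public static void main(String[] args) {
        runLocal(42L, 3).forEach(System.out::println);
    }
}
```

In the real program the wiring in `runLocal` is what TopologyBuilder and LocalCluster take care of; the model only shows which component produces and which components consume the `stream` field.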
P5 Lecture 05: First Topology program, cluster mode (23:26)
Start the ZooKeeper and Storm clusters
[root@centos7-7 bin]# ll
total 24
-rwxr--r--. 1 root root 466 Apr 8 06:06 ctlkafkaauto.sh
-rwxr--r--. 1 root root 133 Apr 8 09:23 ctlstorm.sh
-rwxr--r--. 1 root root 394 Apr 8 06:06 ctlzookeeperauto.sh
lrwxrwxrwx. 1 root root 36 Apr 8 06:09 jps -> /usr/local/java/jdk1.8.0_251/bin/jps
-rwxr--r--. 1 root root 154 Apr 8 06:06 xcallkafka.sh
-rwxr--r--. 1 root root 153 Apr 8 08:31 xcallstorm.sh
-rwxr--r--. 1 root root 145 Apr 8 06:06 xcallzk.sh
[root@centos7-7 bin]# ctlzookeeperauto.sh start
... batch execution does not work yet; the commands have to be run manually
[root@centos7-7 bin]# cat ctlstorm.sh
ssh storm51 "storm ui 1>/dev/null 2>&1"
ssh storm51 "storm nimbus 1>/dev/null 2>&1"
ssh storm52 "storm supervisor 1>/dev/null 2>&1"
ssh storm53 "storm supervisor 1>/dev/null 2>&1"
[root@centos7-7 bin]# xcallzk.sh jps
============= zk1 jps =============
9360 QuorumPeerMain
29191 Jps
============= zk2 jps =============
29269 Jps
9384 QuorumPeerMain
============= zk3 jps =============
9181 QuorumPeerMain
29085 Jps
[root@centos7-7 bin]# xcallstorm.sh jps
============= storm51 jps =============
12126 core
13054 nimbus
17663 Jps
============= storm52 jps =============
30615 Supervisor
31870 Jps
============= storm53 jps =============
30256 worker
27252 Supervisor
31726 Jps
Open the Storm UI
http://192.168.121.51:8080/index.html
Submit the topology
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test01Random.RandomStringTopologyRemote
2340 [main] WARN o.a.s.u.Utils - STORM-VERSION new 1.2.3 old 1.2.3
3033 [main] INFO o.a.s.StormSubmitter - Finished submitting topology: RandomStringTopologyRemote
[root@storm51 ~]#
Check whether the topology has started
[root@storm51 ~]# storm list
Running: .../usr/local/apache-storm-1.2.3/bin org.apache.storm.command.list
5402 [main] INFO o.a.s.u.NimbusClient - Found leader nimbus : master:6627
Topology_name Status Num_tasks Num_workers Uptime_secs
-------------------------------------------------------------------
RandomStringTopologyRemote ACTIVE 12 3 98
[root@storm51 ~]#
Check the running processes with jps at this point
[root@centos7-7 bin]# xcallstorm.sh jps
============= storm51 jps =============
12126 core
13054 nimbus
17663 Jps
============= storm52 jps =============
30615 Supervisor
31870 Jps
============= storm53 jps =============
30256 worker
27252 Supervisor
30214 worker
30215 worker
30185 LogWriter
30186 LogWriter
30221 LogWriter
31726 Jps
Deactivate (pause) a topology
storm deactivate RandomStringTopologyRemote
P6 Lecture 06: Core concepts of Storm parallelism (42:54)
Change the log level to warn to make the logs easier to read
[root@storm52 log4j2]# vim /usr/local/storm/log4j2/worker.xml
[root@storm53 log4j2]# vim /usr/local/storm/log4j2/worker.xml
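P7 Lecture 07: Analyzing Topology parallelism by running a program, part 1 (29:56)
On the zhan laptop, start ZooKeeper and Storm with the scripts
Open http://192.168.121.51:8080/index.html and confirm everything is OK
Build the code from the previous step and copy the jar to the Storm master node, storm51
Test 1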
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology
------ Fails with an error: arguments are required
====================================================================================
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp1 tp1 1 2 1 2 1
{topologyName='tp1', prefix='tp1', workers=1, spoutParallelHint=2, spoutTasks=1, boltParallelHint=2, boltTasks=1}
-------------------------------------------------------------------------------
Conclusion: parallelism (parallelHint), Num executors = 1 worker + 1 spoutTask + 1 boltTask = 3
====================================================================================
Test 2
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp2 tp2 2 2 1 2 1
{topologyName='tp2', prefix='tp2', workers=2, spoutParallelHint=2, spoutTasks=1, boltParallelHint=2, boltTasks=1}
-------------------------------------------------------------------------------
Conclusion: parallelism (parallelHint), Num executors = 2 workers + 1 spoutTask + 1 boltTask = 4
Test 3
====================================================================================
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp3 tp3 1 2 2 2 2
{topologyName='tp3', prefix='tp3', workers=1, spoutParallelHint=2, spoutTasks=2, boltParallelHint=2, boltTasks=2}
-------------------------------------------------------------------------------
Conclusion: parallelism (parallelHint), Num executors = 1 worker + 2 spoutTasks + 2 boltTasks = 5
Test 4
====================================================================================
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp4 tp4 3 3 1 3 2
{topologyName='tp4', prefix='tp4', workers=3, spoutParallelHint=3, spoutTasks=1, boltParallelHint=3, boltTasks=2}
-------------------------------------------------------------------------------
Expected conclusion: parallelism (parallelHint), Num executors = 3 workers + 1 spoutTask + 2 boltTasks = 6
Actual result: parallelism (parallelHint), Num executors = 3 workers + 2 spoutTasks + 2 boltTasks = 7. Why?
2021-10-30: the reason is that the task count = (worker + + boltTask*2)
Test 5
====================================================================================
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp5 tp5 3 2 2 2 2
{topologyName='tp5', prefix='tp5', workers=3, spoutParallelHint=2, spoutTasks=2, boltParallelHint=2, boltTasks=2}
-------------------------------------------------------------------------------
Conclusion: parallelism (parallelHint), Num executors = 3 workers + 2 spoutTasks + 2 boltTasks = 7
Test 6
====================================================================================
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp6 tp6 1 2 4 2 4
{topologyName='tp6', prefix='tp6', workers=1, spoutParallelHint=2, spoutTasks=4, boltParallelHint=2, boltTasks=4}
-------------------------------------------------------------------------------
Conclusion: parallelism (parallelHint), Num executors = 1 worker + 4 spoutTasks + 4 boltTasks = 9
Test 7
====================================================================================
[root@storm51 ~]# storm jar storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.tiza.leo.bigdata.storm.test02Parallel.SimpleTopology tp7 tp7 1 1 4 1 4
{topologyName='tp7', prefix='tp7', workers=1, spoutParallelHint=1, spoutTasks=4, boltParallelHint=1, boltTasks=4}
-------------------------------------------------------------------------------
Conclusion: parallelism (parallelHint), Num executors = 1 worker + 4 spoutTasks + 4 boltTasks = 9
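The conclusions for the seven tests all add up the same three numbers. The sketch below encodes that sum and compares it with the totals recorded for tp1 to tp7; only tp4 disagrees (recorded 7, sum 6). The class and method names are made up for illustration. One possible reading, which the notes do not confirm, is that the `workers` term stands for acker tasks, since Storm starts one acker per worker when `topology.acker.executors` is unset; under that reading the sum is a task count, and it equals the executor count only while the parallelism hint is at least the task count.

```java
// Encodes the sum used in the test conclusions; names are illustrative.
final class ParallelismSum {

    // total = workers + spoutTasks + boltTasks, exactly as the notes add them up.
    static int total(int workers, int spoutTasks, int boltTasks) {
        return workers + spoutTasks + boltTasks;
    }

    public static void main(String[] args) {
        // {workers, spoutTasks, boltTasks, total recorded in the notes}
        int[][] runs = {
            {1, 1, 1, 3}, // tp1
            {2, 1, 1, 4}, // tp2
            {1, 2, 2, 5}, // tp3
            {3, 1, 2, 7}, // tp4: recorded 7, the sum gives 6
            {3, 2, 2, 7}, // tp5
            {1, 4, 4, 9}, // tp6
            {1, 4, 4, 9}, // tp7
        };
        for (int i = 0; i < runs.length; i++) {
            int[] r = runs[i];
            int predicted = total(r[0], r[1], r[2]);
            System.out.printf("tp%d predicted=%d recorded=%d%s%n",
                    i + 1, predicted, r[3], predicted == r[3] ? "" : "  <-- mismatch");
        }
    }
}
```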
The previous parameters were -n 1 -e tp7-SimpleSpout=4 -e tp7-SimpleBolt=4
Change them with
storm rebalance tp7 -n 2 -e tp7-SimpleSpout=8 -e tp7-SimpleBolt=8
General form: the topology name, then -n for the number of workers (2 here), then one -e per component for its executors (spout 4, bolt 4 here)
bin/storm rebalance SampleStormClusterTopology -n 2 -e SampleSpout=4 -e SampleBolt=4
[root@storm51 ~]# storm rebalance tp7 -n 2 -e tp7-SimpleSpout=8 -e tp7-SimpleBolt=8
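Rebalancing changes the number of workers (`-n`) and the executors per component (`-e`), but the number of tasks of a component stays fixed for the lifetime of a topology, and the Storm documentation states that executors never exceed tasks. Assuming tp7 keeps the 4 tasks per component it was submitted with, a request for 8 executors would therefore yield at most 4. The sketch below models only that cap; the class and method names are illustrative and not part of the Storm API.

```java
// Models the documented Storm constraint that executors never exceed tasks.
final class RebalanceCap {

    // Executors a component can actually get: the requested count, capped by its task count.
    static int effectiveExecutors(int requestedExecutors, int numTasks) {
        return Math.min(requestedExecutors, numTasks);
    }

    public static void main(String[] args) {
        int tasks = 4; // tp7 was submitted with spoutTasks=4 and boltTasks=4
        System.out.println(effectiveExecutors(8, tasks)); // 4
        System.out.println(effectiveExecutors(2, tasks)); // 2
    }
}
```

To run a component with 8 executors, the topology would have to be submitted with at least 8 tasks for that component in the first place.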
P8 Lecture 08: Analyzing Topology parallelism by running a program, part 2 (17:39)
See above
P9 Lecture 09: Rebalancing a Storm Topology (10:41)
See above