1. NetEase Music's data warehouse construction journey:

https://mp.weixin.qq.com/s/FIKCe6oV8NproiKYzis_6w

2. StreamSets was founded in 2014 by Girish Pancha, Informatica's former chief product officer, and Arvind Prabhakar, a former engineering leader at Cloudera; the company is headquartered in San Francisco. Its flagship offering is a big-data ETL tool that supports structured as well as semi-/unstructured data sources and provides a drag-and-drop visual pipeline designer. StreamSets ships three products:

  1. StreamSets Data Collector (core product, open source): the big-data ETL tool itself;
  2. StreamSets Data Collector Edge (open source): a lightweight agent that can be installed on IoT and other edge devices, with a small memory and CPU footprint;
  3. StreamSets Control Hub (commercial): pipelines built in Data Collector can be registered with Control Hub for management, scheduled execution, and pipeline topologies.

The discussion that follows therefore centers on StreamSets Data Collector, the core open-source product.

https://blog.csdn.net/qq_39657909/article/details/107685907

3. Real-time data lake: streaming writes to Hudi with Flink CDC

https://mp.weixin.qq.com/s/JkCbvfJhdz9gT-Tw1pUBIA

4. Debezium-Flink-Hudi: real-time streaming CDC

Debezium is a CDC tool that is easy to deploy and operate; it efficiently extracts change data from relational databases (RDBMS) into a messaging system, where different downstream applications can consume it. Flink, in turn, integrates directly with both Debezium and Hudi, which greatly simplifies real-time data ingestion in data-lake scenarios.

https://mp.weixin.qq.com/s?__biz=MzIyMzQ0NjA0MQ==&mid=2247486157&idx=2&sn=eeb1c5f3bbeb32c99933b32152db49c9&chksm=e81f5fbbdf68d6adf91c2638cfc439353221c6a9b2730b6ed0ad69ec727c0912dfbc4b5f1cc3&scene=21#wechat_redirect
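To make the Debezium → Flink → Hudi path concrete, here is a minimal sketch (an illustration under stated assumptions, not code from the linked articles) that reads Debezium-JSON change events from a Kafka topic as a Flink SQL changelog source and streams them into a Hudi table. The topic name dbserver1.inventory.products, the bootstrap servers, the HDFS path, and the schema are placeholders, and Hudi's Flink connector options can differ between releases:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class DebeziumToHudiSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The Hudi sink commits data on checkpoints, so checkpointing is required.
        env.enableCheckpointing(10_000L);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Source: Debezium change events for an RDBMS table, read from Kafka.
        // Topic and bootstrap servers are placeholders.
        tEnv.executeSql(
                "CREATE TABLE products_cdc ("
                        + "  id INT NOT NULL,"
                        + "  name STRING,"
                        + "  weight DECIMAL(10,3),"
                        + "  PRIMARY KEY (id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'kafka',"
                        + "  'topic' = 'dbserver1.inventory.products',"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',"
                        + "  'scan.startup.mode' = 'earliest-offset',"
                        + "  'format' = 'debezium-json'"
                        + ")");

        // Sink: a Hudi MERGE_ON_READ table on the lake; the path is a placeholder
        // and option names may vary across Hudi versions.
        tEnv.executeSql(
                "CREATE TABLE products_hudi ("
                        + "  id INT NOT NULL,"
                        + "  name STRING,"
                        + "  weight DECIMAL(10,3),"
                        + "  PRIMARY KEY (id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'hudi',"
                        + "  'path' = 'hdfs:///lake/products',"
                        + "  'table.type' = 'MERGE_ON_READ'"
                        + ")");

        // Inserts, updates, and deletes captured by Debezium are upserted into Hudi by key.
        tEnv.executeSql("INSERT INTO products_hudi SELECT id, name, weight FROM products_cdc");
    }
}

The PRIMARY KEY on both tables tells Flink how Debezium's inserts, updates, and deletes should be keyed and applied as upserts.

For reference, the integration test below comes from the Flink codebase (KafkaChangelogTableITCase) and exercises the 'debezium-json' changelog source end to end: Debezium change events are first produced into Kafka, then read back as a changelog table and continuously aggregated into an upsert sink. Helper fields such as kafkaServer, brokerConnectionStrings, secureProps, and standardProps come from the KafkaTestBaseWithFlink base class.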

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.streaming.connectors.kafka.table;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase;
import org.apache.flink.streaming.connectors.kafka.KafkaTestBaseWithFlink;
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkFixedPartitioner;
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.planner.factories.TestValuesTableFactory;

import org.junit.Before;
import org.junit.Test;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import static org.junit.Assert.assertEquals;

/** IT cases for Kafka with changelog format for Table API & SQL. */
public class KafkaChangelogTableITCase extends KafkaTestBaseWithFlink {

    protected StreamExecutionEnvironment env;
    protected StreamTableEnvironment tEnv;

    @Before
    public void setup() {
        TestValuesTableFactory.clearAllData();
        env = StreamExecutionEnvironment.getExecutionEnvironment();
        tEnv =
                StreamTableEnvironment.create(
                        env,
                        EnvironmentSettings.newInstance()
                                // Watermark is only supported in blink planner.
                                .useBlinkPlanner()
                                .inStreamingMode()
                                .build());
        env.getConfig().setRestartStrategy(RestartStrategies.noRestart());
        // we have to use single parallelism,
        // because we will count the messages in sink to terminate the job
        env.setParallelism(1);
    }

    @Test
    public void testKafkaDebeziumChangelogSource() throws Exception {
        final String topic = "changelog_topic";
        createTestTopic(topic, 1, 1);

        // enables MiniBatch processing to verify MiniBatch + FLIP-95, see FLINK-18769
        Configuration tableConf = tEnv.getConfig().getConfiguration();
        tableConf.setString("table.exec.mini-batch.enabled", "true");
        tableConf.setString("table.exec.mini-batch.allow-latency", "1s");
        tableConf.setString("table.exec.mini-batch.size", "5000");
        tableConf.setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");

        // ---------- Write the Debezium json into Kafka -------------------
        List<String> lines = readLines("debezium-data-schema-exclude.txt");
        DataStreamSource<String> stream = env.fromCollection(lines);
        SerializationSchema<String> serSchema = new SimpleStringSchema();
        FlinkKafkaPartitioner<String> partitioner = new FlinkFixedPartitioner<>();

        // the producer must not produce duplicates
        Properties producerProperties =
                FlinkKafkaProducerBase.getPropertiesFromBrokerList(brokerConnectionStrings);
        producerProperties.setProperty("retries", "0");
        producerProperties.putAll(secureProps);
        kafkaServer.produceIntoKafka(stream, topic, serSchema, producerProperties, partitioner);
        try {
            env.execute("Write sequence");
        } catch (Exception e) {
            throw new Exception("Failed to write debezium data to Kafka.", e);
        }

        // ---------- Produce an event time stream into Kafka -------------------
        String bootstraps = standardProps.getProperty("bootstrap.servers");
        String sourceDDL =
                String.format(
                        "CREATE TABLE debezium_source ("
                                + " id INT NOT NULL,"
                                + " name STRING,"
                                + " description STRING,"
                                + " weight DECIMAL(10,3)"
                                + ") WITH ("
                                + " 'connector' = 'kafka',"
                                + " 'topic' = '%s',"
                                + " 'properties.bootstrap.servers' = '%s',"
                                + " 'scan.startup.mode' = 'earliest-offset',"
                                + " 'format' = 'debezium-json'"
                                + ")",
                        topic, bootstraps);
        String sinkDDL =
                "CREATE TABLE sink ("
                        + " name STRING,"
                        + " weightSum DECIMAL(10,3),"
                        + " PRIMARY KEY (name) NOT ENFORCED"
                        + ") WITH ("
                        + " 'connector' = 'values',"
                        + " 'sink-insert-only' = 'false'"
                        + ")";
        tEnv.executeSql(sourceDDL);
        tEnv.executeSql(sinkDDL);
        TableResult tableResult =
                tEnv.executeSql(
                        "INSERT INTO sink SELECT name, SUM(weight) FROM debezium_source GROUP BY name");

        // Debezium captures change data on the `products` table:
        //
        // CREATE TABLE products (
        //  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
        //  name VARCHAR(255),
        //  description VARCHAR(512),
        //  weight FLOAT
        // );
        // ALTER TABLE products AUTO_INCREMENT = 101;
        //
        // INSERT INTO products
        // VALUES (default,"scooter","Small 2-wheel scooter",3.14),
        //        (default,"car battery","12V car battery",8.1),
        //        (default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3",0.8),
        //        (default,"hammer","12oz carpenter's hammer",0.75),
        //        (default,"hammer","14oz carpenter's hammer",0.875),
        //        (default,"hammer","16oz carpenter's hammer",1.0),
        //        (default,"rocks","box of assorted rocks",5.3),
        //        (default,"jacket","water resistent black wind breaker",0.1),
        //        (default,"spare tire","24 inch spare tire",22.2);
        // UPDATE products SET description='18oz carpenter hammer' WHERE id=106;
        // UPDATE products SET weight='5.1' WHERE id=107;
        // INSERT INTO products VALUES (default,"jacket","water resistent white wind breaker",0.2);
        // INSERT INTO products VALUES (default,"scooter","Big 2-wheel scooter ",5.18);
        // UPDATE products SET description='new water resistent white wind breaker', weight='0.5' WHERE id=110;
        // UPDATE products SET weight='5.17' WHERE id=111;
        // DELETE FROM products WHERE id=111;
        //
        // > SELECT * FROM products;
        // +-----+--------------------+----------------------------------------------------------+--------+
        // | id  | name               | description                                              | weight |
        // +-----+--------------------+----------------------------------------------------------+--------+
        // | 101 | scooter            | Small 2-wheel scooter                                    |   3.14 |
        // | 102 | car battery        | 12V car battery                                          |    8.1 |
        // | 103 | 12-pack drill bits | 12-pack of drill bits with sizes ranging from #40 to #3  |    0.8 |
        // | 104 | hammer             | 12oz carpenter's hammer                                  |   0.75 |
        // | 105 | hammer             | 14oz carpenter's hammer                                  |  0.875 |
        // | 106 | hammer             | 18oz carpenter hammer                                    |      1 |
        // | 107 | rocks              | box of assorted rocks                                    |    5.1 |
        // | 108 | jacket             | water resistent black wind breaker                       |    0.1 |
        // | 109 | spare tire         | 24 inch spare tire                                       |   22.2 |
        // | 110 | jacket             | new water resistent white wind breaker                   |    0.5 |
        // +-----+--------------------+----------------------------------------------------------+--------+

        List<String> expected =
                Arrays.asList(
                        "scooter,3.140",
                        "car battery,8.100",
                        "12-pack drill bits,0.800",
                        "hammer,2.625",
                        "rocks,5.100",
                        "jacket,0.600",
                        "spare tire,22.200");
        waitingExpectedResults("sink", expected, Duration.ofSeconds(10));

        // ------------- cleanup -------------------
        tableResult.getJobClient().get().cancel().get(); // stop the job
        deleteTestTopic(topic);
    }

    // --------------------------------------------------------------------------------------------
    // Utilities
    // --------------------------------------------------------------------------------------------

    private static List<String> readLines(String resource) throws IOException {
        final URL url = KafkaChangelogTableITCase.class.getClassLoader().getResource(resource);
        assert url != null;
        Path path = new File(url.getFile()).toPath();
        return Files.readAllLines(path);
    }

    private static void waitingExpectedResults(
            String sinkName, List<String> expected, Duration timeout) throws InterruptedException {
        long now = System.currentTimeMillis();
        long stop = now + timeout.toMillis();
        Collections.sort(expected);
        while (System.currentTimeMillis() < stop) {
            List<String> actual = TestValuesTableFactory.getResults(sinkName);
            Collections.sort(actual);
            if (expected.equals(actual)) {
                return;
            }
            Thread.sleep(100);
        }
        // timeout, assert again
        List<String> actual = TestValuesTableFactory.getResults(sinkName);
        Collections.sort(actual);
        assertEquals(expected, actual);
    }
}
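A few design choices in the test are worth noting: parallelism is fixed at 1 because the test counts the messages arriving in the sink to decide when to stop; producer retries are set to 0 so the Kafka writer cannot introduce duplicates; and mini-batch aggregation with a TWO_PHASE strategy is switched on to verify that the FLIP-95 source works together with mini-batch processing (FLINK-18769).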

5. Alibaba Cloud documentation for the Flink API: developing applications with Flink SQL

https://help.aliyun.com/document_detail/98951.html?utm_content=g_1000230851&spm=5176.20966629.toubu.3.f2991ddcpxxvD1#title-h8j-e6f-5xk

