[Dev Log] 2022.09.02 ZENO----Audio----Beat detection algorithm----Combine Wav&Mp3_minimp3 and ffmpeg (EndlessDaydream's blog, CSDN) https://blog.csdn.net/Angelloveyatou/article/details/126670613

4 Audio Detection Algorithm Design

4.1 Beat Detection Algorithm

4.1.1 Beat Detection Algorithms

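To implement a beat detection algorithm, we first compute the sound energy within a chosen frequency range. This can be done by running an FFT and summing the squared magnitudes of the frequency bins that fall in that range. We then compute the average energy over a period (for example, a few seconds) preceding the current playback position.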
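
A minimal sketch of the band-energy computation, assuming the FFT result is already available as a vector of complex values. The function name `bandEnergy` and the inclusive `[lo, hi]` bin range are illustrative choices, not part of the system's code.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Sum of squared magnitudes of the FFT bins lo..hi (inclusive).
// Bins beyond the end of the spectrum are ignored.
double bandEnergy(const std::vector<std::complex<double>>& spectrum,
                  std::size_t lo, std::size_t hi) {
    double energy = 0.0;
    for (std::size_t i = lo; i <= hi && i < spectrum.size(); ++i) {
        energy += std::norm(spectrum[i]);  // std::norm(z) = re*re + im*im
    }
    return energy;
}
```

Calling `bandEnergy(spectrum, k, k + n)` once per analysis window yields the sequence of energy values that the rest of the algorithm works on.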

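Once we have the average energy, we compare it with the current energy in the chosen frequency range. If the current energy exceeds the average by more than a certain threshold, we conclude that a beat has occurred. The threshold can be tuned to control the sensitivity of the detection.

To run the algorithm in real time, we keep a buffer of previous energy values and update it every time a new energy value is computed. A circular buffer works well for storing the energy values, with an index (pointer) tracking the current position in the buffer.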
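
One way such a circular buffer could look is sketched below. The class name `EnergyHistory` and its members are hypothetical; the system itself uses a `std::deque`, as described later.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity circular buffer holding the N most recent energy values.
template <std::size_t N>
class EnergyHistory {
public:
    // Store a new value, overwriting the oldest one once the buffer is full.
    void push(double energy) {
        values_[next_] = energy;
        next_ = (next_ + 1) % N;  // advance the write position and wrap around
        if (count_ < N) ++count_;
    }

    // Average of the values stored so far (0 if the buffer is empty).
    double average() const {
        if (count_ == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += values_[i];
        return sum / count_;
    }

    std::size_t size() const { return count_; }

private:
    std::array<double, N> values_{};
    std::size_t next_ = 0;   // index of the slot that will be written next
    std::size_t count_ = 0;  // number of valid entries, at most N
};
```

With `EnergyHistory<43>` the buffer holds about one second of history at the window size and sampling rate used later in this section.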

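This kind of beat detection is not perfect: it can miss some beats and report false positives. Still, it approximates the rhythm of a song well enough to synchronize visual effects or to trigger events in games and interactive applications.

Beat detection algorithms analyze an audio signal to determine its rhythm or beat. Some common approaches are:

1. Autocorrelation: this method detects beats by computing the autocorrelation function of the signal, which measures how strongly the signal correlates with a time-delayed copy of itself. When the signal contains a repeating pattern, the autocorrelation function shows distinct peaks at delays that match the repetition period, which corresponds to the beat period.

2. Peak detection: this method detects beats by looking for peaks in the signal, usually peaks in its intensity or energy. Ideally, the number of peaks found in a time span matches the number of beats in that span.

3. Fast Fourier transform: this method applies the FFT to move the signal into the frequency domain, where the individual frequencies and their strengths can be examined. Beats are then determined from energy peaks in the frequency domain.

4. Model-based methods: these use a time-based model that represents the signal as a sequence of timed events and uses that model to recognize beat patterns.

These algorithms can be used alone or in combination to obtain more accurate beat detection results.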
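
To make the autocorrelation idea concrete, here is a small sketch that picks the delay with the strongest autocorrelation in an energy envelope. It is not used by this system; `dominantLag` is a hypothetical name, and the raw (unnormalized) autocorrelation used here slightly favors short delays.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Returns the lag in [minLag, maxLag] whose autocorrelation
// sum_t envelope[t] * envelope[t + lag] is largest.
std::size_t dominantLag(const std::vector<double>& envelope,
                        std::size_t minLag, std::size_t maxLag) {
    std::size_t bestLag = minLag;
    double bestScore = std::numeric_limits<double>::lowest();
    for (std::size_t lag = minLag; lag <= maxLag && lag < envelope.size(); ++lag) {
        double score = 0.0;
        for (std::size_t t = 0; t + lag < envelope.size(); ++t) {
            score += envelope[t] * envelope[t + lag];
        }
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}
```

If the envelope holds one energy value per 1024-sample window at 44100 Hz (about 43 values per second), a dominant lag of 22 windows would correspond to roughly 117 beats per minute.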

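This system uses a simple statistical model based on sound energy.

4.1.2 Detecting Beats with a Simple Statistical Model Based on Sound Energy

This system implements a simple beat detection algorithm built on a simple statistical model of sound energy. The basic idea is to use changes in the energy of the audio data to detect the rhythm: we compute the average energy of the sound over the few seconds before the current playback position and compare it with the current energy. If the difference exceeds a certain threshold, we say there is a beat.

With a window size of 1024 samples and a sampling rate of 44100 Hz, we need a buffer of 44100/1024 ≈ 43 elements to store 1 second of history. The value for each window can be obtained from the FFT analysis.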

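We focus the analysis on the first bars of the spectrum, that is, on the lower frequencies, in order to capture the kick drum and snare drum of the drum kit, which is among the instruments most commonly used to carry the rhythm of a song. In our experiments we use the bass range of 60 Hz-130 Hz, where the kick drum is found, and the low midrange of 301 Hz-750 Hz, where the snare drum can be found. The low midrange contains the low-order harmonics of most instruments and is generally regarded as the bass presence range.

We therefore need the sound information in these ranges, that is, the corresponding elements of the FFT result. To get the frequency of each element, we compute the frequency resolution (44100/1024 ≈ 43 Hz) and multiply it by the index into the data array. So the first element holds the result for roughly 0-43 Hz, the second for 43-86 Hz, the third for 86-129 Hz, and so on.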

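Algorithm

Let k and k+n be the limits of the range being processed, and let FFT[i] be the amplitude of the frequency at position i. The current energy E of the range is computed from the amplitudes FFT[k] through FFT[k+n].

We store this value together with the values of the next 42 windows, so that 43 values make up 1 second of history (H).

The average of the band, avg(H), can now be computed from this history.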
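
The formulas for these quantities are not shown in the text. One reading that is consistent with the description in section 4.1.1 (energy as the sum of squared bin magnitudes) and with a 43-element history is written out below; treat it as a reconstruction under those assumptions. The node implementation later in this section additionally divides the energy by the window size, which only rescales E.

```latex
E = \sum_{i=k}^{k+n} \bigl\lvert \mathrm{FFT}[i] \bigr\rvert^{2},
\qquad
\operatorname{avg}(H) = \frac{1}{43} \sum_{j=0}^{42} H[j],
\qquad
\operatorname{var}(H) = \frac{1}{43} \sum_{j=0}^{42} \bigl( H[j] - \operatorname{avg}(H) \bigr)^{2}
```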

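Usually, a value exceeding the average plus half of the average (that is, 1.5 times the average) is a good threshold for detecting a beat. We can, however, use the variance of the history values to adjust this factor. In very noisy music such as hard rock or rock and roll, beat detection becomes tricky, so we need to lower the threshold for higher variance values.

We can define a linear equation that relates the threshold to the variance, taking (0, 1.55) and (0.02, 1.25) as two points on the line in the (variance, threshold) plane.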
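
Working the line out from the two given points yields the slope and intercept that also appear in the `AudioBeats` node in section 4.1.3, where the factor is written as `-15 * var_H + 1.55`. The symbol C for the threshold factor is introduced here only for illustration.

```latex
m = \frac{1.25 - 1.55}{0.02 - 0} = -15,
\qquad
C(v) = -15\,v + 1.55,
\qquad
\text{beat} \iff E > C\bigl(\operatorname{var}(H)\bigr) \cdot \operatorname{avg}(H)
```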

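Our FFT results lie in the range 0..1, so the variance values also lie in the range 0..1.

Finally, a beat is detected when the current energy exceeds the history average multiplied by this variance-dependent threshold factor.

In that case the node outputs 1, and otherwise 0, producing a sequence of 0s and 1s that is passed on to the next node.

To implement the algorithm, I define variables that hold the history data, the sampling frequency, the window size and similar information, and write helper code that computes the average, the variance and the threshold. To store the history I use a double-ended queue (deque), so that new values can be appended at the back and the oldest values removed from the front.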

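The beat detection in this system proceeds as follows:

1. Split the audio data into windows of the chosen size and compute the average energy of each window.

2. Select the bass range of the spectrum, for example 60 Hz-130 Hz, to capture the kick drum and snare drum of the drum kit.

3. For the data in each window, compute the FFT and take the frequency amplitudes that fall in the bass range.

4. Over a given span of history, for example the last 1 second, compute the energy at the current time point and the average energy of the history.

5. Adjust the threshold applied to the average energy according to the variance of the history.

6. Check whether the energy at the current time point exceeds the threshold, and report a beat if it does.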
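
Steps 4 to 6 can be sketched on a precomputed list of per-window energies as shown below. This is a simplified illustration, not the system's node: `detectBeats` is a hypothetical name, the additive `threshold` input of the real node is left out, and `historySize` is assumed to be at least 1.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Turns a sequence of per-window energies into a 0/1 beat sequence.
std::vector<int> detectBeats(const std::vector<double>& energies,
                             std::size_t historySize = 43) {
    std::deque<double> history;
    std::vector<int> beats;
    beats.reserve(energies.size());
    for (double e : energies) {
        // Append the newest value and drop the oldest ones beyond the history length.
        history.push_back(e);
        while (history.size() > historySize) history.pop_front();

        double mean = 0.0;
        for (double h : history) mean += h;
        mean /= history.size();

        double variance = 0.0;
        for (double h : history) variance += (h - mean) * (h - mean);
        variance /= history.size();

        // Threshold factor from the line through (0, 1.55) and (0.02, 1.25).
        double factor = -15.0 * variance + 1.55;
        beats.push_back(e > factor * mean ? 1 : 0);
    }
    return beats;
}
```

As in the node below, the current value is added to the history before the average and variance are computed, so a strong spike also raises its own threshold a little.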

4.1.3 Selected Audio Nodes in This System

Implementation

    namespace zeno {

    // Beat detector: compares the energy of the current 1024-sample window
    // with a history of up to 43 windows (about 1 second at 44100 Hz).
    struct AudioBeats : zeno::INode {
        std::deque<double> H;  // energy history, oldest value at the front

        virtual void apply() override {
            auto wave = get_input<PrimitiveObject>("wave");
            float threshold = get_input<NumericObject>("threshold")->get<float>();
            auto start_time = get_input<NumericObject>("time")->get<float>();
            float sampleFrequency = wave->userData().get<zeno::NumericObject>("SampleRate")->get<float>();
            int start_index = int(sampleFrequency * start_time);
            int duration_count = 1024;
            auto fft = Aquila::FftFactory::getFft(duration_count);
            std::vector<double> samples;
            samples.resize(duration_count);
            for (auto i = 0; i < duration_count; i++) {
                // Clamp the index so that reads past the end repeat the last sample.
                samples[i] = wave->attr<float>("value")[min((start_index + i), wave->size()-1)];
            }
            Aquila::SpectrumType spectrums = fft->fft(samples.data());
            {
                // Energy of the window: sum of squared magnitudes over all bins.
                double E = 0;
                for (const auto& spectrum: spectrums) {
                    E += spectrum.real() * spectrum.real() + spectrum.imag() * spectrum.imag();
                }
                E /= duration_count;
                H.push_back(E);
            }
            while (H.size() > 43) {
                H.pop_front();
            }
            double avg_H = 0;
            for (const auto& E: H) {
                avg_H += E;
            }
            avg_H /= H.size();
            double var_H = 0;
            for (const auto& E: H) {
                var_H += (E - avg_H) * (E - avg_H);
            }
            var_H /= H.size();
            // Threshold factor from the line through (0, 1.55) and (0.02, 1.25).
            int beat = H.back() - threshold > (-15 * var_H + 1.55) * avg_H;
            set_output("beat", std::make_shared<NumericObject>(beat));
            set_output("var_H", std::make_shared<NumericObject>((float)var_H));
            // Output the history, padded with leading zeros to 43 entries.
            auto output_H = std::make_shared<ListObject>();
            for (int i = 0; i < 43 - H.size(); i++) {
                output_H->arr.emplace_back(std::make_shared<NumericObject>((float)0));
            }
            for (const auto & h: H) {
                output_H->arr.emplace_back(std::make_shared<NumericObject>((float)h));
            }
            set_output("H", output_H);
            // Output the squared magnitude of every bin of the current window.
            auto output_E = std::make_shared<ListObject>();
            for (const auto& spectrum: spectrums) {
                double e = spectrum.real() * spectrum.real() + spectrum.imag() * spectrum.imag();
                output_E->arr.emplace_back(std::make_shared<NumericObject>((float)e));
            }
            set_output("E", output_E);
        }
    };

    ZENDEFNODE(AudioBeats, {
        {
            "wave",
            {"float", "time", "0"},
            {"float", "threshold", "0.005"},
        },
        {"beat", "var_H", "H", "E"},
        {},
        {"audio"},
    });

    // Energy-based detector: precomputes the energy of every window of the clip,
    // normalizes with the global min/max, and compares the current energy
    // with mean + threshold * standard deviation of the preceding windows.
    struct AudioEnergy : zeno::INode {
        double minE = std::numeric_limits<double>::max();
        double maxE = std::numeric_limits<double>::min();
        std::vector<double> init;  // per-window energy of the whole clip, filled on first use

        virtual void apply() override {
            auto wave = get_input<PrimitiveObject>("wave");
            int duration_count = 1024;
            if (init.empty()) {
                auto fft = Aquila::FftFactory::getFft(duration_count);
                int clip_count = wave->size() / duration_count;
                init.reserve(clip_count);
                for (auto i = 0; i < clip_count; i++) {
                    std::vector<double> samples;
                    samples.resize(duration_count);
                    for (auto j = 0; j < duration_count; j++) {
                        samples[j] = wave->attr<float>("value")[min(duration_count * i + j, wave->size()-1)];
                    }
                    Aquila::SpectrumType spectrums = fft->fft(samples.data());
                    {
                        double E = 0;
                        for (const auto& spectrum: spectrums) {
                            E += spectrum.real() * spectrum.real() + spectrum.imag() * spectrum.imag();
                        }
                        E /= duration_count;
                        minE = min(minE, E);
                        maxE = max(maxE, E);
                        init.push_back(E);
                    }
                }
            }
            set_output("minE", std::make_shared<NumericObject>((float)minE));
            set_output("maxE", std::make_shared<NumericObject>((float)maxE));

            // Energy of the window that starts at the requested time.
            auto start_time = get_input2<float>("time");
            float sampleFrequency = wave->userData().get<zeno::NumericObject>("SampleRate")->get<float>();
            int start_index = int(sampleFrequency * start_time);
            auto fft = Aquila::FftFactory::getFft(duration_count);
            std::vector<double> samples;
            samples.resize(duration_count);
            for (auto i = 0; i < duration_count; i++) {
                samples[i] = wave->attr<float>("value")[min((start_index + i), wave->size()-1)];
            }
            Aquila::SpectrumType spectrums = fft->fft(samples.data());
            double E = 0;
            for (const auto& spectrum: spectrums) {
                E += spectrum.real() * spectrum.real() + spectrum.imag() * spectrum.imag();
            }
            E /= duration_count;
            set_output("E", std::make_shared<NumericObject>((float)E));
            double uniE = (E - minE) / (maxE - minE);  // energy normalized to 0..1
            set_output("uniE", std::make_shared<NumericObject>((float)uniE));

            // Collect the normalized energies of up to 43 preceding windows.
            start_index /= duration_count;
            start_index = min(start_index, init.size() - 1);
            std::vector<double> _queue;
            for (int i = max(start_index - 43, 0); i < start_index; i++) {
                _queue.push_back((init[i] - minE) / (maxE - minE));
            }
            if (_queue.size() > 0) {
                double avg_H = 0;
                for (const double & e: _queue) {
                    avg_H += e;
                }
                avg_H /= _queue.size();
                double var_H = 0;
                for (const double & e: _queue) {
                    var_H += (e - avg_H) * (e - avg_H);
                }
                var_H /= _queue.size();
                double std_H = sqrt(var_H);
                float threshold = get_input2<float>("threshold");
                int beat = uniE > avg_H + std_H * threshold;
                set_output("beat", std::make_shared<NumericObject>(beat));
            }
            else {
                set_output("beat", std::make_shared<NumericObject>(0));
            }
        }
    };

    ZENDEFNODE(AudioEnergy, {
        {
            "wave",
            {"float", "time", "0"},
            {"float", "threshold", "1"},
        },
        {"beat", "E", "uniE", "minE", "maxE"},
        {},
        {"audio"},
    });

    // FFT of one 1024-sample window, with optional pre-emphasis and Hamming window.
    struct AudioFFT : zeno::INode {
        virtual void apply() override {
            auto wave = get_input<PrimitiveObject>("wave");
            int duration_count = 1024;
            auto start_time = get_input2<float>("time");
            float sampleFrequency = wave->userData().get<zeno::NumericObject>("SampleRate")->get<float>();
            int start_index = int(sampleFrequency * start_time);
            // One extra sample is read so that pre-emphasis can look one step ahead.
            std::vector<double> samples;
            samples.resize(duration_count+1);
            for (auto i = 0; i < duration_count+1; i++) {
                samples[i] = wave->attr<float>("value")[min((start_index + i), wave->size()-1)];
            }
            auto pre_emphasis = get_input2<int>("preEmphasis");
            if (pre_emphasis) {
                auto alpha = get_input2<float>("preEmphasisAlpha");
                for (auto i = 0; i < duration_count; i++) {
                    samples[i] = samples[i+1] - alpha * samples[i];
                }
            }
            samples.pop_back();
            auto hamming_window = get_input2<int>("hammingWindow");
            if (hamming_window) {
                for (auto i = 0; i < duration_count; i++) {
                    double i_value = 0.54 - 0.46 * std::cos(2.0 * M_PI * i / (duration_count - 1));
                    samples[i] = samples[i] * i_value;
                }
            }
            auto fft = Aquila::FftFactory::getFft(duration_count);
            Aquila::SpectrumType spectrums = fft->fft(samples.data());
            // Keep only the non-redundant half of the spectrum (duration_count / 2 + 1 bins).
            auto fft_prim = std::make_shared<PrimitiveObject>();
            fft_prim->resize(duration_count / 2 + 1);
            auto &freq = fft_prim->add_attr<float>("freq");
            auto &real = fft_prim->add_attr<float>("real");
            auto &image = fft_prim->add_attr<float>("image");
            auto &square = fft_prim->add_attr<float>("square");
            auto &power = fft_prim->add_attr<float>("power");
            for (std::size_t i = 0; i < fft_prim->verts.size(); ++i) {
                float r = spectrums[i].real();
                float im = spectrums[i].imag();
                freq[i] = float(i);  // bin index, not a frequency in Hz
                real[i] = r;
                image[i] = im;
                float square_v = r * r + im * im;
                square[i] = square_v;
                power[i] = square_v / duration_count;
            }
            set_output("FFTPrim", fft_prim);
        }
    };

    ZENDEFNODE(AudioFFT, {
        {
            "wave",
            {"float", "time", "0"},
            {"bool", "preEmphasis", "0"},
            {"float", "preEmphasisAlpha", "0.97"},
            {"bool", "hammingWindow", "1"},
        },
        {"FFTPrim"},
        {},
        {"audio"},
    });

    // Mel filter bank applied to the power spectrum produced by AudioFFT.
    struct MelFilter : zeno::INode {
        virtual void apply() override {
            auto fftPrim = get_input<PrimitiveObject>("FFTPrim");
            auto &power = fftPrim->attr<float>("power");
            auto sampleFreq = get_input2<float>("sampleFreq");
            auto rangePerFilter = get_input2<float>("rangePerFilter");
            float halfFreq = sampleFreq / 2;
            auto count = get_input2<int>("count");
            // count + 2 points evenly spaced on the mel scale between 0 and the Nyquist frequency.
            std::vector<float> hz_points;
            float mel_fh = 2595.0 * log10(1+halfFreq/700.0);
            for (int i = 0; i <= count + 1; i++) {
                float mel = mel_fh * i / (count + 1);
                float hz = 700.0 * (pow(10.0, mel / 2595.0) - 1);
                hz_points.push_back(hz);
            }
            // Convert each frequency point to an FFT bin index.
            std::vector<int> bin;
            for (const auto& hz: hz_points) {
                int index = (1024.0+1.0) * hz / sampleFreq;
                bin.push_back(index);
            }
            auto fbank = std::make_shared<PrimitiveObject>();
            fbank->resize(count);
            auto& fbank_v = fbank->add_attr<float>("fbank");
            for (auto i = 1; i <= count; i++) {
                int s = bin[i-1];
                int m = bin[i];
                int e = bin[i+1];
                s = (int) zaudio::lerp(m, s, rangePerFilter);
                e = (int) zaudio::lerp(m, e, rangePerFilter);
                // NOTE: a conventional triangular filter rises as (j - s) / (m - s)
                // and falls as (e - j) / (e - m); the weights below are kept
                // exactly as in the original source.
                float total = 0;
                for (auto j = s; j < m; j++) {
                    float cof = (float)(m - j) / (float)(m - s);
                    total += power[j] * cof;
                }
                for (auto j = m; j < e; j++) {
                    float cof = 1 - (float)(m - j) / (float)(e - m);
                    total += power[j] * cof;
                }
                if (total == 0) {
                    fbank_v[i-1] = std::numeric_limits<float>::min();
                }
                else {
                    fbank_v[i-1] = log(total);
                }
            }
            auto indexType = get_input2<std::string>("indexType");
            if (indexType == "index") {
                auto& index = fbank->add_attr<float>("i");
                for (auto i = 1; i <= count; i++) {
                    index[i-1] = (float)(i-1);
                }
            } else if (indexType == "indexdivcount") {
                auto& index = fbank->add_attr<float>("i");
                for (auto i = 1; i <= count; i++) {
                    index[i-1] = (float)(i-1) / count;
                }
            }
            set_output("FilterBank", fbank);
        }
    };

    ZENDEFNODE(MelFilter, {
        {
            "FFTPrim",
            {"int", "count", "15"},
            {"float", "sampleFreq", "44100"},
            {"float", "rangePerFilter", "1"},
            {"enum none index indexdivcount", "indexType", "index"},
        },
        {"FilterBank"},
        {},
        {"audio"},
    });

    } // namespace zeno
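
For reference, the mel-scale conversion used in `MelFilter` and the textbook form of a triangular filter weight can be isolated as below. This is a standalone sketch: the names `hzToMel`, `melToHz` and `triangleWeight` are hypothetical, and the weight function shows the conventional rise-then-fall shape rather than the exact weights of the node above.

```cpp
#include <cmath>

// Hz to mel, with the same constants as in MelFilter.
double hzToMel(double hz) {
    return 2595.0 * std::log10(1.0 + hz / 700.0);
}

// Mel back to Hz (inverse of hzToMel).
double melToHz(double mel) {
    return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0);
}

// Conventional triangular filter weight for an FFT bin: 0 at start and end,
// rising linearly to 1 at mid and falling linearly back to 0.
double triangleWeight(int bin, int start, int mid, int end) {
    if (bin <= start || bin >= end) return 0.0;
    if (bin <= mid) return static_cast<double>(bin - start) / (mid - start);
    return static_cast<double>(end - bin) / (end - mid);
}
```

A filter bank output is then the sum of `power[bin] * triangleWeight(bin, start, mid, end)` over the bins of one filter, followed by a logarithm as in the node.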

References

BEAT DETECTION ALGORITHMS.doc (parallelcube.com) https://www.parallelcube.com/web/wp-content/uploads/2018/03/BeatDetectionAlgorithms.pdf

TODO:

"A Review on Audio Event Detection," H. Su, et al., IEEE Access, vol. 8, pp. 77580-77593, 2020.
"Acoustic Event Detection with SEDNN: A Deep Learning Approach," P. Jaiswal and Y. Han, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 316-320.
"Event Detection Using Multitask Learning of Auditory Features and Sound Event Classifiers," D. D. Lee, et al., IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1190-1201, 2017.
"Audio Event Detection Using Deep Learning with Mel-Frequency Cepstral Coefficients," S. Gupta, et al., 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 2019, pp. 186-191.
"Deep Convolutional Neural Networks for Acoustic Event Detection in Domestic Environments," M. L. Seltzer, et al., IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 1, pp. 111-125, 2016.
"Environmental Sound Classification with Convolutional Neural Networks," J. Salamon, et al., IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia, 2015, pp. 732-736.

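The papers above introduce some commonly used audio detection algorithms, including deep learning methods and traditional feature-based methods. These algorithms can be used to identify various events in audio, such as speech, sneezes and car horns. A suitable algorithm can be chosen for implementation according to actual needs.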


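[Graduation Project] An Ecological Simulation and 3D Content Generation System Based on Procedural Generation and Audio Detection----Audio Detection Algorithm Design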