Paper notes: "SONG FROM PI: A MUSICALLY PLAUSIBLE NETWORK FOR POP MUSIC GENERATION"

Venue: ICLR 2017

Motivation

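The paper proposes a general RNN-based model for pop music generation whose hierarchical structure encodes prior music-theory knowledge (prior knowledge about how pop music is composed). The bottom layers generate the melody, while the higher levels generate the drums, chords, and so on. In human listening tests, the results were preferred over those of the model proposed by Google. The authors also build two small applications on top of the model: neural dancing and karaoke, as well as neural story singing.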

Introduction

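The authors start from the spread of machine learning into the arts. Progress has already been made in imitating Van Gogh's painting style, generating stories, writing in the style of Shakespeare, and so on, and music is one of these areas. RNNs have proven strengths in natural language text processing, so building music generation on top of them is feasible, as in [1,2,3,4]. However, most of this earlier work generates notes for a single track only; multi-track generation has been studied in [5] (polyphonic music). The authors want to generate the melody, chords, drums, and other instrument tracks together, so that the result is a complete pop song. The idea was inspired by a YouTube video in which a piano piece is played from the digit sequence of $\pi$ (https://youtu.be/OMq9he-5HUU). A few generation rules turn this random, non-repeating sequence into music (shows both the randomness and the regularity of music. On one hand, since any possible digit sequence is a subset of the $\pi$ digit sequence, this implies that pleasing music can be created even from a totally random base signal. On the other hand, the composer uses specific rules such as A Harmonic Minor scale and harmonies to convert the digit sequence into a music sheet. It is these rules that play the key role in converting randomness into music.)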

Related work

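Broadly, automatic composition has moved from early machine learning combined with music theory [6], to neural network learning [1,2,3], and then to deep learning (RNN) [4,7,8] with less reliance on music theory.

Music background

What is a note? It defines the basic unit that music is composed of.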

12-tone equal temperament: Music follows the 12-tone system, i.e., 12 is the cycle length of all notes. The 12 tones are: $C$, $C^\sharp=D^\flat$, $D$, $D^\sharp=E^\flat$, $E$, $F$, $F^\sharp=G^\flat$, $G$, $G^\sharp=A^\flat$, $A$, $A^\sharp=B^\flat$, $B$.
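
The cycle of 12 tones and the enharmonic pairs listed above can be sketched in a few lines. This is only an illustration: the names `PITCH_CLASSES`, `FLAT_TO_SHARP`, `pitch_class`, and `transpose` are made up for these notes and do not come from the paper.

```python
# The 12 pitch classes, indexed by semitone offset from C (sharp spellings).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Flat spellings that name the same pitch classes (enharmonic equivalents).
FLAT_TO_SHARP = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}


def pitch_class(name: str) -> int:
    """Return the semitone index (0-11) of a note name such as 'F#' or 'Gb'."""
    return PITCH_CLASSES.index(FLAT_TO_SHARP.get(name, name))


def transpose(name: str, semitones: int) -> str:
    """Shift a note by a number of semitones, wrapping around with period 12."""
    return PITCH_CLASSES[(pitch_class(name) + semitones) % 12]
```

Because the index wraps modulo 12, moving up one semitone from B lands on C again, which is what "12 is the cycle length" means here.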

A bar is a short segment of time that corresponds to a specific number of beats (notes).

A scale is a subset of notes. The four most common scale types are Major (Minor), Harmonic Minor, Melodic Minor, and Blues. For example, the C Major scale starts from C: the subset of notes specified by C Major is thus C, D, E, F, G, A, and B (a subset of seven notes). All scale types have a subset of seven notes except for Blues, which has six. In total we have 48 unique scales, i.e. 4 scale types and 12 possible starting notes. We treat Major and Minor as one type, as for a Major scale there is always a Minor that has exactly the same set of notes. In music theory, this is referred to as the Relative Minor.

Chord

The Circle of Fifths

The circle of fifths makes it easy to judge which way a chord tends to move (strong chord progression), which keeps the whole piece harmonious.
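
The counting argument above (4 scale types × 12 starting notes = 48 scales, and a relative minor sharing the notes of its major) can be checked with a small sketch. The semitone patterns below are the standard textbook ones (ascending form for Melodic Minor, six-note minor Blues) and are an assumption of these notes; the paper may define the scale types slightly differently.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the starting note (assumed textbook definitions).
SCALE_STEPS = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "harmonic_minor": [0, 2, 3, 5, 7, 8, 11],
    "melodic_minor": [0, 2, 3, 5, 7, 9, 11],
    "blues": [0, 3, 5, 6, 7, 10],
}
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]


def scale_notes(root, steps):
    """List the note names of a scale given its starting note and offsets."""
    start = NOTES.index(root)
    return [NOTES[(start + s) % 12] for s in steps]


c_major = scale_notes("C", SCALE_STEPS["major"])
a_minor = scale_notes("A", NATURAL_MINOR)  # relative minor of C Major

# Every (scale type, starting note) pair.
all_scales = [(kind, root) for kind in SCALE_STEPS for root in NOTES]
```

Here `a_minor` contains exactly the same seven notes as `c_major`, only starting from A, which is why Major and Minor can be treated as one type.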
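
As a rough illustration, the circle of fifths can be generated by repeatedly stepping up a perfect fifth (7 semitones). The helper names `circle_of_fifths` and `circle_distance` are hypothetical, and using the distance on the circle as a proxy for how closely two chord roots are related is a simplification made for these notes, not the paper's exact rule.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_fifths(start="C"):
    """Visit all 12 notes by stepping up 7 semitones (a perfect fifth) each time."""
    i = NOTES.index(start)
    return [NOTES[(i + 7 * k) % 12] for k in range(12)]


def circle_distance(a, b):
    """Number of steps between two notes along the circle (0 to 6)."""
    circle = circle_of_fifths()
    d = abs(circle.index(a) - circle.index(b))
    return min(d, 12 - d)


circle = circle_of_fifths()
```

Neighbours on the circle, such as C and G or C and F, are one step apart, while C and F# sit on opposite sides.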

Model architecture

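When generating music, the scale is given as a condition so that the model can choose notes accordingly. At each time step, the melody is represented by two random variables: the key layer and the press layer, which stand for the key that is pressed and its duration. For the chords and drums, the authors assume that they are independent of each other given the melody: at each time step, the chord layer and the drum layer are generated conditioned on the melody.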
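
One way to write down this conditioning structure is the factorization below. It is a sketch based on my reading of these notes, not a formula copied from the paper: $s$ denotes the scale, $y_{key}^t$ and $y_{prs}^t$ are the key and press variables, and $y_{chd}^t$ and $y_{drm}^t$ are symbols introduced here for the chord and drum layers.

$$
p(y_{key}, y_{prs}, y_{chd}, y_{drm} \mid s) = \prod_t p(y_{key}^t \mid y_{key}^{<t}, y_{prs}^{<t}, s)\; p(y_{prs}^t \mid y_{key}^{\le t}, y_{prs}^{<t})\; p(y_{chd}^t \mid y_{chd}^{<t}, y_{key}, y_{prs})\; p(y_{drm}^t \mid y_{drm}^{<t}, y_{key}, y_{prs})
$$

The last two factors share the melody $(y_{key}, y_{prs})$ as a condition but do not depend on each other, which is the independence assumption for chords and drums.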

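In the experiments, the authors preprocess the data for the scale condition. A scale of a given type usually uses only 7 of the 12 tones, or 6 in the case of Blues. After sampling 100 hours of pop songs from the midi_man dataset, the authors normalize all notes by shifting the starting note to C (the remaining notes are shifted by the same amount), which makes it easy to group all songs into the 4 scale types.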
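
The normalization step can be sketched as a simple transposition of MIDI pitch numbers. The function name `normalize_to_c` is made up for illustration, and the choice to shift by the smaller of the two possible directions (down by at most 6 semitones, otherwise up) is an assumption; the notes above do not say which direction the authors use.

```python
def normalize_to_c(pitches, root_pitch_class):
    """Transpose MIDI pitches so that the given root pitch class lands on C.

    root_pitch_class is 0-11 (0 = C, 2 = D, 9 = A, ...).
    Assumption: shift down for roots up to F#, otherwise shift up,
    so that no song moves by more than 6 semitones.
    """
    if root_pitch_class <= 6:
        shift = -root_pitch_class
    else:
        shift = 12 - root_pitch_class
    return [p + shift for p in pitches]


# A D major triad (D4, F#4, A4) becomes a C major triad (C4, E4, G4).
d_major_triad = [62, 66, 69]
normalized = normalize_to_c(d_major_triad, 2)
```

After this step every song's scale starts on C, so only the scale type still needs to be distinguished.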

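Melody generation uses a two-layer RNN (LSTM) that generates notes conditioned on the chosen scale. The first layer is the key layer and the second is the press layer.

Open question: since there are different scales, do the parameters differ per scale, so that the model has to be retrained for each one? The input and output range of notes is restricted to C3 to C6. Outputs are encouraged, but not forced, to stay within the scale. This gives three octaves (12 notes each) plus silence, for a total of 37 possible output values. The output of the press layer uses a softmax (open question: why?).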
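
A minimal sketch of the 37-way key encoding follows. Two things are assumed here and are not stated in the notes: C3 is taken to be MIDI number 48 (the C4 = 60 convention), and the upper bound C6 is treated as exclusive, so that 3 × 12 = 36 pitches plus one silence class give 37. The names `encode_key` and `decode_key` are hypothetical.

```python
LOW = 48                 # assumed MIDI number of C3 (C4 = 60 convention)
NUM_PITCHES = 36         # three octaves of 12 notes each
SILENCE = NUM_PITCHES    # index 36 is reserved for silence
KEY_DIM = NUM_PITCHES + 1


def encode_key(pitch):
    """One-hot encode a MIDI pitch, or None for silence, as a 37-dim list."""
    if pitch is None:
        index = SILENCE
    elif LOW <= pitch < LOW + NUM_PITCHES:
        index = pitch - LOW
    else:
        raise ValueError(f"pitch {pitch} is outside the modelled range")
    vec = [0] * KEY_DIM
    vec[index] = 1
    return vec


def decode_key(vec):
    """Invert encode_key: return the MIDI pitch, or None for silence."""
    index = vec.index(1)
    return None if index == SILENCE else LOW + index
```

With this layout, `encode_key(60)` sets index 12 (one octave above C3) and `encode_key(None)` sets the last index.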

The input to the LSTM includes the following.

The note output of the previous time step, encoded as a one-hot vector.

Lookback features (proposed by Google Magenta), which make it easier for the model to remember what it generated recently and to repeat it later. There are some data-structure details here, such as recording how the outputs one bar and two bars ago relate to the current output, which require reading the code to understand fully.

The melody profile, which captures the high-level music flow. To get the profile for each song, we compute the local note histogram at each time step with width of two bars, and cluster all local histograms within the song into 10 clusters via k-means. We order the 10 clusters with mean note ordered from low to high as cluster 1 to 10, and apply moving averages on the cluster id sequence to encourage local smoothness. This results in a 10-dimensional one-hot vector representation of the cluster id for each time step. This additional information allows the user to set the melody's ups and downs of the song. My understanding is that the profile defines whether the melody is heading upward or downward.

An increasing sequence 1, 2, 3, ... is used to represent how long a key has been held. The authors point out that this has an advantage over Magenta's single note-on message: This is important, as Waite et al. has extremely unbalanced output distributions dominated by the repeat-of-holding event. We represent press $y_{prs}^t$ as an 8-dimensional one-hot vector. The input to our LSTM is $y_{prs}^{t-1}$, concatenated with the 37-dimensional one-hot encoding of the melody key $y_{key}^t$.
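
The press representation can be illustrated as follows. This is a sketch under assumptions: the counter restarts at 1 whenever the key changes, it is capped at 8 to fit the 8-dimensional one-hot vector, and a re-struck note of the same pitch is not distinguished from a held one. The function names are invented for these notes.

```python
PRESS_DIM = 8   # size of the one-hot press vector
KEY_DIM = 37    # size of the one-hot key vector


def press_counters(keys):
    """Turn a per-time-step key sequence into counters 1, 2, 3, ...

    The counter grows while the same key is held and restarts at 1
    when the key changes (assumed cap: PRESS_DIM).
    """
    counters = []
    for t, key in enumerate(keys):
        if t > 0 and key == keys[t - 1]:
            counters.append(min(counters[-1] + 1, PRESS_DIM))
        else:
            counters.append(1)
    return counters


def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec


def press_layer_input(prev_press, key_index):
    """Concatenate the previous press one-hot with the current key one-hot."""
    return one_hot(prev_press - 1, PRESS_DIM) + one_hot(key_index, KEY_DIM)


counters = press_counters([12, 12, 12, 14, 14, 36])
```

Compared with emitting a "hold" event at most time steps, the counter spreads the targets over several classes, which is the balance argument quoted above. The concatenated press-layer input has 8 + 37 = 45 dimensions.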

Chord layer

The authors find that 99.19% of the chords belong to one of 72 chord classes (6 types × 12 start notes), and that the chord is strongly correlated with the melody.

[Figure: statistics of the correspondence between the first note and the chord]

[1] Jamshed J. Bharucha and Peter M. Todd. Modeling the perception of tonal structure with neural nets. Computer Music Journal, 13(4):44–53, 1989.

[2] Michael C. Mozer. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing. Connection Science, 6(2-3), 1994.

[3] Chun-Chi J. Chen and Risto Miikkulainen. Creating melodies with evolving recurrent neural networks. In International Joint Conference on Neural Networks, 2001.

[4] Douglas Eck and Juergen Schmidhuber. A first look at music composition using LSTM recurrent neural networks. 2002.

[5] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In ICML, 2012.

[6] Michael Chan, John Potter, and Emery Schubert. Improving algorithmic music composition with machine learning. In 9th International Conference on Music Perception and Cognition, 2006.

[7] Semin Kang, Soo-Yol Ok, and Young-Min Kang. Automatic Music Generation and Machine Learning Based Evaluation, pp. 436–443. Springer Berlin Heidelberg, 2012. (Polyphonic, but scale type is enforced.)

[8] Allen Huang and Raymond Wu. Deep learning for music. arXiv preprint arXiv:1606.04930, 2016. (2-layer LSTM, able to create chords.)

Reposted from: https://www.cnblogs.com/punkcure/p/8072370.html
