Dirichlet Process

  • Dirichlet_tutorial
  • 一、Introduction
  • 二、Gaussian Mixture Model (GMM)
  • 三、Construction of Dirichlet Process
  • 四、Stick-Breaking Construction
  • 五、The nature of Dirichlet distribution
  • 六、Chinese Restaurant Process

Dirichlet_tutorial

Author: Li Dong; Time: 14 April 2020


一、Introduction

The following are my rough notes on the Dirichlet Process video lectures by Professor Richard Xu. Any errors are mine.

二、Gaussian Mixture Model (GMM)

A motivating example

Let’s assume the data above come from a GMM, X = {x1, x2, …, xN}. Suppose they are drawn from a mixture of K Gaussian distributions; then the joint probability of all samples is

P(X) = ∏_{i=1}^{N} ∑_{k=1}^{K} αk·N(xi ∣ μk, Σk),  where α1 + ⋯ + αK = 1

Now the question is: how do we determine K? Physically, K is the number of Gaussian components in the mixture (in EM, K is a constant). In this model the parameter is θ = {μ1…μK, Σ1…ΣK, α1…αK}, and we cannot read the value of K off the graph above.
      One idea is to take K as a parameter: θ = {μ1…μK, Σ1…ΣK, α1…αK, K}, and choose K by argmax P(X). In that case the answer must be K = N (N is the total number of data points), because the likelihood keeps growing with K and is maximized when every point has its own component. This clustering is obviously not what we were hoping for.
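To see this degeneracy concretely, here is a minimal sketch (assuming numpy and scikit-learn are available; the synthetic 1-D data are a stand-in for the figure's data, not the original dataset): the fitted log-likelihood keeps increasing as K grows, so maximum likelihood alone never picks a sensible K.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data from a 3-component mixture (a stand-in for the figure's data).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-5, 1, 100),
                    rng.normal(0, 1, 100),
                    rng.normal(5, 1, 100)]).reshape(-1, 1)

# Maximum likelihood never says "stop": adding components never hurts the fit.
for K in [1, 2, 3, 10, 50]:
    gmm = GaussianMixture(n_components=K, n_init=3, random_state=0).fit(X)
    print(f"K={K:3d}  avg log-likelihood={gmm.score(X):.3f}")
```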
     Instead, we want the model to determine K automatically; say K is some function of N, K = f(N). (For the Dirichlet Process, E[K] ∝ log N.) We have a set of data X = {x1, x2, …, xN}, and each data point comes from a distribution with its own parameter θ, so the data correspond to a set of parameters {θ1, ⋯, θN}. Now we need a distribution that explains these parameters.
      Here are the assumptions:

θi∼H(θ)

If H is a continuous distribution, then no two samples drawn from H can be exactly the same (because if θi, θj ∼ H, then P(θi = θj) = 0). In that case the outcome is K = N, which brings us right back to where we don’t want to be. So we need a discrete distribution G with θi ∼ G; meanwhile, we want G to be similar to H. This is where we introduce the Dirichlet Process to construct G:

G∼DP(α,H)

Here H, the same H as above, is called the base measure, and α is a scalar (α > 0) that describes the degree of dispersion of G: when α = 0, G is maximally discrete and puts all its mass on a single value; when α = ∞, G = H. This is exactly what we expected!
       In practice, in the Dirichlet Process, H can also be a continuous distribution; G is a random distribution drawn from the DP.
The character of such a G can be seen as follows. First, we partition the support of G into different regions with several vertical lines. It can be divided into any number of regions, and the size of each region is arbitrary. We give each region a name; for example, divide it into d regions a1, …, ad.


Since G is a random measure, the total weight of G in each region (that is, the summed lengths of the vertical lines falling in it), written G(a1), G(a2), …, is itself random. These quantities have a probabilistic character, and its nature is that they jointly obey a Dirichlet distribution:

(G(a1), G(a2), ⋯, G(ad)) ∼ Dir(αH(a1), αH(a2), ⋯, αH(ad))

This is the definition of the Dirichlet Process. Note that G(a1) here denotes the total weight of G in region a1, and H(a1) the total weight of H in region a1. In other words, under any finite partition, the region weights of a sample from the DP jointly follow a Dirichlet distribution.
The relevant properties of the Dirichlet distribution are as follows: if (p1, ⋯, pd) ∼ Dir(α1, ⋯, αd) and α0 = α1 + ⋯ + αd, then

E[pi] = αi/α0,  Var[pi] = αi(α0 − αi) / (α0²(α0 + 1))

So, for the DP above, substituting αi = αH(ai) and α0 = α (because H(a1) + ⋯ + H(ad) = 1), we can write

E[G(ai)] = H(ai),  Var[G(ai)] = H(ai)(1 − H(ai)) / (α + 1)

        We find that α does not appear in the mean, which means α has no effect on the mean. Next we look at the two extreme cases. When α = ∞, the variance is 0; in other words, the measure of G in any region equals the measure of H in that region, i.e. G = H. When α = 0, Var[G(ak)] = H(ak)(1 − H(ak)); the mass of G in a region then behaves like a Bernoulli variable (either all of G falls in the region or none of it does), and G is the most discrete it can be. This agrees with the behaviour of α described above.
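These moment formulas are easy to check numerically. A minimal sketch (assuming numpy; the partition weights H = (0.2, 0.3, 0.5) are an arbitrary choice):

```python
import numpy as np

alpha = 5.0                      # concentration parameter α
H = np.array([0.2, 0.3, 0.5])    # base-measure weights of an arbitrary 3-region partition

# Draw many samples of (G(a1), G(a2), G(a3)) ~ Dir(αH(a1), αH(a2), αH(a3)).
samples = np.random.dirichlet(alpha * H, size=200_000)

print("empirical mean  :", samples.mean(axis=0))      # should be ≈ H
print("theoretical mean:", H)
print("empirical var   :", samples.var(axis=0))       # should be ≈ H(1-H)/(α+1)
print("theoretical var :", H * (1 - H) / (alpha + 1))
```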

三、Construction of Dirichlet Process

First, let us explain what “construction” means. For an ordinary distribution in statistics we have a probability density function, and we sample points from it. In the Dirichlet Process, however, each sample is a random measure containing countless points and their corresponding weights. Sampling such an object directly is very difficult, and the definition alone does not tell us how to do it. Therefore we need a construction method that can generate such a measure.

四、Stick-Breaking Construction

Earlier we said that each sample from a DP is a random measure, consisting of countless points and their corresponding weights; in statistics such a point is called an atom. One draw from a DP is therefore obtained by generating countless atoms together with their weights. Let us see how to sample the atoms and their weights step by step. We have a base distribution H. First, we randomly draw a value θ from H. This value corresponds to an atom, and its weight is the height of the vertical line at that position. The atom of the first sample is θ1:

θ1∼H

Next we need to determine the height of the vertical line, which is the weight corresponding to this atom. We first draw a value from a Beta distribution with parameters (1, α):

β1∼Beta(1,α)


       The horizontal line in the figure above is a segment of length 1, with left endpoint 0 and right endpoint 1. We randomly select an atom, and its weight is obtained by first drawing a result β1 from Beta(1, α); the weight is then π1 = β1.
       For the second sample we again randomly draw an atom θ2, and its weight is obtained by first drawing a result β2 from Beta(1, α); the weight is then π2 = (1 − π1)β2. This means taking the remaining piece of the stick, from the first break point β1 to the endpoint 1, as a new segment, and breaking off a new piece of it to obtain the new weight. The new break point must therefore lie between β1 and 1, as the picture shows. In other words, the result of drawing the second atom is:

θ2 ∼ H,  π2 = (1 − π1)β2

Subsequent atoms are drawn in the same manner, θk ∼ H and πk = βk·(1 − β1)⋯(1 − βk−1) with βk ∼ Beta(1, α); all the atoms and weights obtained together make up one draw of G. When α = 0, E[β1] = 1; in other words all the weight sits on the first atom and the other weights are all 0, which is the most discrete case. When α = ∞, E[β1] = 0, so every atom’s weight is vanishingly small and close to 0, which is to say G = H. Therefore, if G is a sample of the DP, it is composed of countless atoms and their weights, and G can be written as

G = ∑_{k=1}^{∞} πk·δ(θk)

where δ(θk) denotes a point mass at θk.
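Here is a minimal stick-breaking sketch (assuming numpy, taking a standard normal as the base measure H; the truncation level is a practical approximation, since we cannot draw infinitely many atoms):

```python
import numpy as np

def stick_breaking(alpha, base_sample, truncation=1000, rng=None):
    """Draw one (truncated) sample G ~ DP(alpha, H) as atoms with weights."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=truncation)             # β_k ~ Beta(1, α)
    leftover = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * leftover                                # π_k = β_k · Π_{l<k} (1-β_l)
    atoms = base_sample(truncation, rng)                      # θ_k ~ H
    return atoms, weights

# Base measure H: a standard normal (an arbitrary choice for illustration).
H = lambda n, rng: rng.normal(0.0, 1.0, size=n)

atoms, weights = stick_breaking(alpha=2.0, base_sample=H)
print("weight captured by truncation:", weights.sum())        # ≈ 1 for a long stick
print("three heaviest atoms:", atoms[np.argsort(weights)[-3:]])
```

Smaller α concentrates the weight on the first few atoms (the discrete extreme); larger α spreads it over many atoms, so G looks more and more like H.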

五、The nature of Dirichlet distribution

In the Dirichlet Process Mixture Model, we have some data x1 … xn, and each data point is generated by some parameter; call these corresponding parameters θ1 … θn. All of these parameters should be generated from a discrete measure G, and a G generated from the Dirichlet Process is exactly what can be used here.


The question now is: given the observed θ1 … θn, what is the posterior of G?

The relationship between the Dirichlet distribution and the Multinomial distribution is conjugacy: if (p1, ⋯, pd) ∼ Dir(α1, ⋯, αd) and the counts (n1, ⋯, nd) are drawn from Mult(p1, ⋯, pd), then

(p1, ⋯, pd) ∣ n ∼ Dir(α1 + n1, ⋯, αd + nd)
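A quick numerical check of this conjugacy, as a sketch in the d = 2 special case (Dirichlet reduces to Beta and Multinomial to Binomial; scipy assumed, prior and counts arbitrary):

```python
import numpy as np
from scipy.stats import beta, binom

a0, b0 = 2.0, 5.0                 # prior Beta(2, 5), an arbitrary choice
n, n1 = 20, 14                    # n trials, n1 of them in category 1

grid = np.linspace(1e-6, 1 - 1e-6, 10_000)
unnorm = beta.pdf(grid, a0, b0) * binom.pmf(n1, n, grid)   # prior × likelihood
numeric = unnorm / (unnorm.sum() * (grid[1] - grid[0]))    # normalized numerically
analytic = beta.pdf(grid, a0 + n1, b0 + n - n1)            # conjugate posterior

print("max abs difference:", np.max(np.abs(numeric - analytic)))  # ≈ 0
```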

The relationship between the Dirichlet distribution and the Dirichlet Process is the defining partition property above: for any finite partition (a1, ⋯, ad),

(G(a1), ⋯, G(ad)) ∼ Dir(αH(a1), ⋯, αH(ad))

Bringing the conjugacy property into this partition view, and letting nj be the number of observed θi that fall in region aj, we get

(G(a1), ⋯, G(ad)) ∣ θ1, ⋯, θn ∼ Dir(αH(a1) + n1, ⋯, αH(ad) + nd)

And, since this holds for every finite partition, the posterior is again a DP:

G ∣ θ1, ⋯, θn ∼ DP(α + n, (αH + ∑_{i=1}^{n} δ(θi)) / (α + n))

       In the base measure of the formula above, the first part (αH) is a continuous measure and the latter part (the sum of point masses δ(θi)) is a discrete measure. A mixture of this kind is called spike and slab in statistics.
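The posterior base measure suggests a direct way to sample the next parameter (the Pólya-urn view): with probability α/(α + n) draw a fresh value from H, otherwise repeat one of the observed θi chosen uniformly. A minimal sketch assuming numpy and a standard-normal H:

```python
import numpy as np

def draw_next_theta(observed, alpha, rng):
    """Sample θ_{n+1} | θ_1..θ_n from the DP posterior's base measure."""
    n = len(observed)
    if rng.uniform() < alpha / (alpha + n):
        return rng.normal(0.0, 1.0)          # "slab": fresh draw from the continuous H
    return observed[rng.integers(n)]         # "spike": reuse an existing atom

rng = np.random.default_rng(1)
thetas = [rng.normal(0.0, 1.0)]              # θ1 ~ H
for _ in range(2000):
    thetas.append(draw_next_theta(thetas, alpha=2.0, rng=rng))

print("distinct values among 2001 draws:", len(set(thetas)))
```

The count of distinct values grows roughly like α·log n, which previews the E[K] ∝ log N claim from the introduction.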

六、Chinese Restaurant Process

Suppose a Chinese restaurant has unlimited tables. The first customer sits at the first table on arrival. When the second customer comes, he can choose to sit at the first table or at a new table. Suppose that when the (n+1)-th customer arrives there are already k occupied tables with n1, n2, …, nk customers at them; then the probability that the (n+1)-th customer sits at the i-th table is ni/(n + α), and the probability that he opens a new table is α/(n + α).

The mathematical derivation of the Chinese Restaurant Process is as follows. Suppose θ1 … θn are generated from the same distribution, and we want P(θi ∣ θ-i), where θ-i = {θ1 … θi-1, θi+1 …}. (I’m a bit confused here; shouldn’t it be {θ1 … θi-1}?) Suppose W is the parameter of this distribution, so:

P(θi ∣ θ-i) = ∫ P(θi ∣ W)·P(W ∣ θ-i) dW

       In this expression we do not care about the actual value of θi, only which class it belongs to; in the corresponding CRP this is which table the i-th person goes to. So we introduce {z1 … zn}, in one-to-one correspondence with {θ1 … θn}, where zi indicates which class θi is in, i.e. the label of the table the person goes to. With class proportions p = (p1, ⋯, pk) given a symmetric prior p ∼ Dir(α/k, ⋯, α/k), the same integral becomes:

P(zi = m ∣ z-i) = ∫ P(zi = m ∣ p)·P(p ∣ z-i) dp

       Because the Dirichlet distribution is the conjugate prior of the Multinomial distribution, we know:

P(p ∣ z-i) = Dir(α/k + n1,-i, ⋯, α/k + nk,-i)

       Regarding the problem of the combinatorial coefficient: this coefficient arises when we count partitions into categories and care only about the class counts. In the Dirichlet Process, however, classes with the same counts are not interchangeable, so we drop this coefficient and substitute the posterior into the integral above.

so:

P(zi = m ∣ z-i) = (nm,-i + α/k) / (n − 1 + α)

       Here nm,-i means the number of entries of z-i = {z1 … zi-1, zi+1 …} that are equal to m, where m ranges from 1 to k. Now let k → ∞: for an occupied class m the term α/k vanishes, while summing the probability (α/k)/(n − 1 + α) over the infinitely many empty classes leaves a total mass of α/(n − 1 + α) for starting a new class.

That means:

P(zi = m ∣ z-i) = nm,-i / (n − 1 + α) for an existing class m,
P(zi = new ∣ z-i) = α / (n − 1 + α)

This result is called the Chinese Restaurant Process.
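A minimal CRP simulation (assuming numpy) that seats customers by exactly this rule; it also checks the introduction's claim that E[K] ∝ log N:

```python
import numpy as np

def crp(n_customers, alpha, rng):
    """Seat customers one by one; return the table occupancy counts n_1..n_k."""
    counts = []                                           # counts[m] = people at table m
    for n in range(n_customers):
        probs = np.array(counts + [alpha]) / (n + alpha)  # existing tables, then a new one
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)                              # open a new table
        else:
            counts[table] += 1
    return counts

rng = np.random.default_rng(0)
alpha = 2.0
for N in [100, 1000, 10000]:
    ks = [len(crp(N, alpha, rng)) for _ in range(20)]
    print(f"N={N:6d}  mean #tables={np.mean(ks):6.2f}  alpha*log(N)={alpha * np.log(N):6.2f}")
```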
