Dirichlet Process

  • Dirichlet_tutorial
  • 一、Introduction
  • 二、Gaussian Mixture Model (GMM)
  • 三、Construction of Dirichlet Process
  • 四、Stick-Breaking Construction
  • 五、The nature of Dirichlet distribution
  • 六、Chinese Restaurant Process

Dirichlet_tutorial

Author: Li Dong; Time: 14 April 2020


一、Introduction

The following are my rough notes on the Dirichlet Process video lectures by Professor Richard Xu. Any errors are mine.

二、Gaussian Mixture Model (GMM)

A motivating example

Let’s assume the data above come from a GMM, X = {x1, x2, …, xN}. Suppose they are drawn from a mixture of K Gaussian distributions; then the joint probability of all samples is

P(X) = ∏_{i=1}^{N} ∑_{k=1}^{K} αk·N(xi ∣ μk, Σk),  where α1 + ⋯ + αK = 1

Now the question is: how do we determine K? Physically, K is the number of Gaussian components in the mixture (in EM, K is a constant). In this model the parameter is θ = {μ1…μK, Σ1…ΣK, α1…αK}, and we cannot read the value of K off the graph above.
      One idea is to take K as a parameter: θ = {μ1…μK, Σ1…ΣK, α1…αK, K}, and choose K by argmax P(X). In that case the answer must be K = N (N is the total number of data points), because the likelihood keeps growing with K and is maximized when every point has its own component. This clustering is obviously not what we were hoping for.
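To see this degeneracy concretely, here is a minimal sketch (assuming numpy and scikit-learn are available; the synthetic 1-D data are a stand-in for the figure's data, not the original dataset): the fitted log-likelihood keeps increasing as K grows, so maximum likelihood alone never picks a sensible K.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data from a 3-component mixture (a stand-in for the figure's data).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-5, 1, 100),
                    rng.normal(0, 1, 100),
                    rng.normal(5, 1, 100)]).reshape(-1, 1)

# Maximum likelihood never says "stop": adding components never hurts the fit.
for K in [1, 2, 3, 10, 50]:
    gmm = GaussianMixture(n_components=K, n_init=3, random_state=0).fit(X)
    print(f"K={K:3d}  avg log-likelihood={gmm.score(X):.3f}")
```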
     Instead, we want the model to determine K automatically; say K is some function of N, K = f(N). (For the Dirichlet Process, E[K] ∝ log N.) We have a set of data X = {x1, x2, …, xN}, and each data point comes from a distribution with its own parameter θ, so the data correspond to a set of parameters {θ1, ⋯, θN}. Now we need a distribution that explains these parameters.
      Here are the assumptions:

θi∼H(θ)

If H is a continuous distribution, then no two samples drawn from H can be exactly the same (because if θi, θj ∼ H, then P(θi = θj) = 0). In that case the outcome is K = N, which brings us right back to where we don’t want to be. So we need a discrete distribution G with θi ∼ G; meanwhile, we want G to be similar to H. This is where we introduce the Dirichlet Process to construct G:

G∼DP(α,H)

Here H, the same H as above, is called the base measure, and α is a scalar (α > 0) that describes the degree of dispersion of G: when α = 0, G is maximally discrete and puts all its mass on a single value; when α = ∞, G = H. This is exactly what we expected!
       In practice, in the Dirichlet Process, H can also be a continuous distribution; G is a random distribution drawn from the DP.
The character of such a G can be seen as follows. First, we partition the support of G into different regions with several vertical lines. It can be divided into any number of regions, and the size of each region is arbitrary. We give each region a name; for example, divide it into d regions a1, …, ad.


Since G is a random measure, the total weight of G in each region (that is, the summed lengths of the vertical lines falling in it), written G(a1), G(a2), …, is itself random. These quantities have a probabilistic character, and its nature is that they jointly obey a Dirichlet distribution:

(G(a1), G(a2), ⋯, G(ad)) ∼ Dir(αH(a1), αH(a2), ⋯, αH(ad))

This is the definition of the Dirichlet Process. Note that G(a1) here denotes the total weight of G in region a1, and H(a1) the total weight of H in region a1. In other words, under any finite partition, the region weights of a sample from the DP jointly follow a Dirichlet distribution.
The relevant properties of the Dirichlet distribution are as follows: if (p1, ⋯, pd) ∼ Dir(α1, ⋯, αd) and α0 = α1 + ⋯ + αd, then

E[pi] = αi/α0,  Var[pi] = αi(α0 − αi) / (α0²(α0 + 1))

So, for the DP above, substituting αi = αH(ai) and α0 = α (because H(a1) + ⋯ + H(ad) = 1), we can write

E[G(ai)] = H(ai),  Var[G(ai)] = H(ai)(1 − H(ai)) / (α + 1)

        We find that α does not appear in the mean, which means α has no effect on the mean. Next we look at the two extreme cases. When α = ∞, the variance is 0; in other words, the measure of G in any region equals the measure of H in that region, i.e. G = H. When α = 0, Var[G(ak)] = H(ak)(1 − H(ak)); the mass of G in a region then behaves like a Bernoulli variable (either all of G falls in the region or none of it does), and G is the most discrete it can be. This agrees with the behaviour of α described above.
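These moment formulas are easy to check numerically. A minimal sketch (assuming numpy; the partition weights H = (0.2, 0.3, 0.5) are an arbitrary choice):

```python
import numpy as np

alpha = 5.0                      # concentration parameter α
H = np.array([0.2, 0.3, 0.5])    # base-measure weights of an arbitrary 3-region partition

# Draw many samples of (G(a1), G(a2), G(a3)) ~ Dir(αH(a1), αH(a2), αH(a3)).
samples = np.random.dirichlet(alpha * H, size=200_000)

print("empirical mean  :", samples.mean(axis=0))      # should be ≈ H
print("theoretical mean:", H)
print("empirical var   :", samples.var(axis=0))       # should be ≈ H(1-H)/(α+1)
print("theoretical var :", H * (1 - H) / (alpha + 1))
```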

三、Construction of Dirichlet Process

First, let us explain what “construction” means. For an ordinary distribution in statistics we have a probability density function, and we sample points from it. In the Dirichlet Process, however, each sample is a random measure containing countless points and their corresponding weights. Sampling such an object directly is very difficult, and the definition alone does not tell us how to do it. Therefore we need a construction method that can generate such a measure.

四、Stick-Breaking Construction

Earlier we said that each sample from a DP is a random measure, consisting of countless points and their corresponding weights; in statistics such a point is called an atom. One draw from a DP is therefore obtained by generating countless atoms together with their weights. Let us see how to sample the atoms and their weights step by step. We have a base distribution H. First, we randomly draw a value θ from H. This value corresponds to an atom, and its weight is the height of the vertical line at that position. The atom of the first sample is θ1:

θ1∼H

Next we need to determine the height of the vertical line, which is the weight corresponding to this atom. We first draw a value from a Beta distribution with parameters (1, α):

β1∼Beta(1,α)


       The horizontal line in the figure above is a segment of length 1, with left endpoint 0 and right endpoint 1. We randomly select an atom, and its weight is obtained by first drawing a result β1 from Beta(1, α); the weight is then π1 = β1.
       For the second sample we again randomly draw an atom θ2, and its weight is obtained by first drawing a result β2 from Beta(1, α); the weight is then π2 = (1 − π1)β2. This means taking the remaining piece of the stick, from the first break point β1 to the endpoint 1, as a new segment, and breaking off a new piece of it to obtain the new weight. The new break point must therefore lie between β1 and 1, as the picture shows. In other words, the result of drawing the second atom is:

θ2 ∼ H,  π2 = (1 − π1)β2

Subsequent atoms are drawn in the same manner, θk ∼ H and πk = βk·(1 − β1)⋯(1 − βk−1) with βk ∼ Beta(1, α); all the atoms and weights obtained together make up one draw of G. When α = 0, E[β1] = 1; in other words all the weight sits on the first atom and the other weights are all 0, which is the most discrete case. When α = ∞, E[β1] = 0, so every atom’s weight is vanishingly small and close to 0, which is to say G = H. Therefore, if G is a sample of the DP, it is composed of countless atoms and their weights, and G can be written as

G = ∑_{k=1}^{∞} πk·δ(θk)

where δ(θk) denotes a point mass at θk.
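Here is a minimal stick-breaking sketch (assuming numpy, taking a standard normal as the base measure H; the truncation level is a practical approximation, since we cannot draw infinitely many atoms):

```python
import numpy as np

def stick_breaking(alpha, base_sample, truncation=1000, rng=None):
    """Draw one (truncated) sample G ~ DP(alpha, H) as atoms with weights."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=truncation)             # β_k ~ Beta(1, α)
    leftover = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * leftover                                # π_k = β_k · Π_{l<k} (1-β_l)
    atoms = base_sample(truncation, rng)                      # θ_k ~ H
    return atoms, weights

# Base measure H: a standard normal (an arbitrary choice for illustration).
H = lambda n, rng: rng.normal(0.0, 1.0, size=n)

atoms, weights = stick_breaking(alpha=2.0, base_sample=H)
print("weight captured by truncation:", weights.sum())        # ≈ 1 for a long stick
print("three heaviest atoms:", atoms[np.argsort(weights)[-3:]])
```

Smaller α concentrates the weight on the first few atoms (the discrete extreme); larger α spreads it over many atoms, so G looks more and more like H.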

五、The nature of Dirichlet distribution

In the Dirichlet Process Mixture Model, we have some data x1 … xn, and each data point is generated by some parameter; call these corresponding parameters θ1 … θn. All of these parameters should be generated from a discrete measure G, and a G generated from the Dirichlet Process is exactly what can be used here.


The question now is: given the observed θ1 … θn, what is the posterior of G?

The relationship between the Dirichlet distribution and the Multinomial distribution is conjugacy: if (p1, ⋯, pd) ∼ Dir(α1, ⋯, αd) and the counts (n1, ⋯, nd) are drawn from Mult(p1, ⋯, pd), then

(p1, ⋯, pd) ∣ n ∼ Dir(α1 + n1, ⋯, αd + nd)
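A quick numerical check of this conjugacy, as a sketch in the d = 2 special case (Dirichlet reduces to Beta and Multinomial to Binomial; scipy assumed, prior and counts arbitrary):

```python
import numpy as np
from scipy.stats import beta, binom

a0, b0 = 2.0, 5.0                 # prior Beta(2, 5), an arbitrary choice
n, n1 = 20, 14                    # n trials, n1 of them in category 1

grid = np.linspace(1e-6, 1 - 1e-6, 10_000)
unnorm = beta.pdf(grid, a0, b0) * binom.pmf(n1, n, grid)   # prior × likelihood
numeric = unnorm / (unnorm.sum() * (grid[1] - grid[0]))    # normalized numerically
analytic = beta.pdf(grid, a0 + n1, b0 + n - n1)            # conjugate posterior

print("max abs difference:", np.max(np.abs(numeric - analytic)))  # ≈ 0
```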

The relationship between the Dirichlet distribution and the Dirichlet Process is the defining partition property above: for any finite partition (a1, ⋯, ad),

(G(a1), ⋯, G(ad)) ∼ Dir(αH(a1), ⋯, αH(ad))

Bringing the conjugacy property into this partition view, and letting nj be the number of observed θi that fall in region aj, we get

(G(a1), ⋯, G(ad)) ∣ θ1, ⋯, θn ∼ Dir(αH(a1) + n1, ⋯, αH(ad) + nd)

And, since this holds for every finite partition, the posterior is again a DP:

G ∣ θ1, ⋯, θn ∼ DP(α + n, (αH + ∑_{i=1}^{n} δ(θi)) / (α + n))

       In the base measure of the formula above, the first part (αH) is a continuous measure and the latter part (the sum of point masses δ(θi)) is a discrete measure. A mixture of this kind is called spike and slab in statistics.
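The posterior base measure suggests a direct way to sample the next parameter (the Pólya-urn view): with probability α/(α + n) draw a fresh value from H, otherwise repeat one of the observed θi chosen uniformly. A minimal sketch assuming numpy and a standard-normal H:

```python
import numpy as np

def draw_next_theta(observed, alpha, rng):
    """Sample θ_{n+1} | θ_1..θ_n from the DP posterior's base measure."""
    n = len(observed)
    if rng.uniform() < alpha / (alpha + n):
        return rng.normal(0.0, 1.0)          # "slab": fresh draw from the continuous H
    return observed[rng.integers(n)]         # "spike": reuse an existing atom

rng = np.random.default_rng(1)
thetas = [rng.normal(0.0, 1.0)]              # θ1 ~ H
for _ in range(2000):
    thetas.append(draw_next_theta(thetas, alpha=2.0, rng=rng))

print("distinct values among 2001 draws:", len(set(thetas)))
```

The count of distinct values grows roughly like α·log n, which previews the E[K] ∝ log N claim from the introduction.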

六、Chinese Restaurant Process

Suppose a Chinese restaurant has unlimited tables. The first customer sits at the first table on arrival. When the second customer comes, he can choose to sit at the first table or at a new table. Suppose that when the (n+1)-th customer arrives there are already k occupied tables with n1, n2, …, nk customers at them; then the probability that the (n+1)-th customer sits at the i-th table is ni/(n + α), and the probability that he opens a new table is α/(n + α).

The mathematical derivation of the Chinese Restaurant Process is as follows. Suppose θ1 … θn are generated from the same distribution, and we want P(θi ∣ θ-i), where θ-i = {θ1 … θi-1, θi+1 …}. (I’m a bit confused here; shouldn’t it be {θ1 … θi-1}?) Suppose W is the parameter of this distribution, so:

P(θi ∣ θ-i) = ∫ P(θi ∣ W)·P(W ∣ θ-i) dW

       In this expression we do not care about the actual value of θi, only which class it belongs to; in the corresponding CRP this is which table the i-th person goes to. So we introduce {z1 … zn}, in one-to-one correspondence with {θ1 … θn}, where zi indicates which class θi is in, i.e. the label of the table the person goes to. With class proportions p = (p1, ⋯, pk) given a symmetric prior p ∼ Dir(α/k, ⋯, α/k), the same integral becomes:

P(zi = m ∣ z-i) = ∫ P(zi = m ∣ p)·P(p ∣ z-i) dp

       Because the Dirichlet distribution is the conjugate prior of the Multinomial distribution, we know:

P(p ∣ z-i) = Dir(α/k + n1,-i, ⋯, α/k + nk,-i)

       Regarding the problem of the combinatorial coefficient: this coefficient arises when we count partitions into categories and care only about the class counts. In the Dirichlet Process, however, classes with the same counts are not interchangeable, so we drop this coefficient and substitute the posterior into the integral above.

so:

P(zi = m ∣ z-i) = (nm,-i + α/k) / (n − 1 + α)

       Here nm,-i means the number of entries of z-i = {z1 … zi-1, zi+1 …} that are equal to m, where m ranges from 1 to k. Now let k → ∞: for an occupied class m the term α/k vanishes, while summing the probability (α/k)/(n − 1 + α) over the infinitely many empty classes leaves a total mass of α/(n − 1 + α) for starting a new class.

That means:

P(zi = m ∣ z-i) = nm,-i / (n − 1 + α) for an existing class m,
P(zi = new ∣ z-i) = α / (n − 1 + α)

This result is called the Chinese Restaurant Process.
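A minimal CRP simulation (assuming numpy) that seats customers by exactly this rule; it also checks the introduction's claim that E[K] ∝ log N:

```python
import numpy as np

def crp(n_customers, alpha, rng):
    """Seat customers one by one; return the table occupancy counts n_1..n_k."""
    counts = []                                           # counts[m] = people at table m
    for n in range(n_customers):
        probs = np.array(counts + [alpha]) / (n + alpha)  # existing tables, then a new one
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)                              # open a new table
        else:
            counts[table] += 1
    return counts

rng = np.random.default_rng(0)
alpha = 2.0
for N in [100, 1000, 10000]:
    ks = [len(crp(N, alpha, rng)) for _ in range(20)]
    print(f"N={N:6d}  mean #tables={np.mean(ks):6.2f}  alpha*log(N)={alpha * np.log(N):6.2f}")
```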
