Week Six

F Score

$$
\begin{aligned}
F_{1} &= \dfrac{2}{\dfrac{1}{P}+\dfrac{1}{R}}\\
&= 2\,\dfrac{PR}{P+R}
\end{aligned}
$$
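As a worked sketch, precision, recall, and the F score computed from binary predictions in Octave (`pred` and `y` are assumed to be 0/1 vectors; the variable names are mine, not course code):

```matlab
tp = sum((pred == 1) & (y == 1));   % true positives
fp = sum((pred == 1) & (y == 0));   % false positives
fn = sum((pred == 0) & (y == 1));   % false negatives
P  = tp / (tp + fp);                % precision
R  = tp / (tp + fn);                % recall
F1 = 2 * P * R / (P + R);           % same as 2 / (1/P + 1/R)
```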

Week Seven

Support Vector Machine

Cost Function

$$
\begin{aligned}
&\min_{\theta}\Big[-\dfrac{1}{m}\sum_{y_{i}\in Y,\,x_{i}\in X}\big(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\big)+\dfrac{\lambda}{2m}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\Big]\\
\Rightarrow\ &\min_{\theta}\Big[-\sum_{y_{i}\in Y,\,x_{i}\in X}\big(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\big)+\dfrac{\lambda}{2}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\Big]\\
\Rightarrow\ &\min_{\theta}\Big[-C\sum_{y_{i}\in Y,\,x_{i}\in X}\big(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\big)+\dfrac{1}{2}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\Big]
\end{aligned}
$$

C plays the role of $\dfrac{1}{\lambda}$: multiplying the objective first by $m$ and then by $\dfrac{1}{\lambda}$ leaves the minimizer unchanged, since both are positive constants.
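A minimal Octave sketch of the C-parameterized objective, assuming a sigmoid hypothesis and leaving the bias term $\theta_1$ unregularized (both choices are mine, not from the notes):

```matlab
function J = svm_like_cost(theta, X, y, C)
  % Logistic cost in the C * (error) + 0.5 * (regularizer) form.
  h = 1 ./ (1 + exp(-X * theta));                 % h(theta' x) per row of X
  err = -sum(y .* log(h) + (1 - y) .* log(1 - h));
  J = C * err + 0.5 * sum(theta(2:end) .^ 2);     % theta(1) left unregularized
end
```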

  • Large C:
    • Lower bias, higher variance
  • Small C:
    • Higher bias, lower variance
  • Large $\sigma^2$: features $f_{i}$ vary more smoothly.
    • Higher bias, lower variance
  • Small $\sigma^2$: features $f_{i}$ vary more sharply.
    • Lower bias, higher variance
When C is very large, the optimization effectively reduces to the large-margin problem:

$$
\begin{aligned}
\min_{\theta}\ &\dfrac{1}{2}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\\
\text{s.t. }\ &\theta^{T}x_{i} \geq 1, \text{ if } y_{i} = 1\\
&\theta^{T}x_{i} \leq -1, \text{ if } y_{i} = 0
\end{aligned}
$$
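One step the notes leave implicit (my summary of the standard argument): any $x_{i}$ satisfying the constraints lies at distance at least

$$
\dfrac{|\theta^{T}x_{i}|}{\lVert\theta\rVert} \geq \dfrac{1}{\lVert\theta\rVert}
$$

from the decision boundary $\theta^{T}x = 0$, so minimizing $\dfrac{1}{2}\sum\theta_{i}^{2} = \dfrac{1}{2}\lVert\theta\rVert^{2}$ maximizes the margin $\dfrac{1}{\lVert\theta\rVert}$.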

PS

If n (the number of features) is large relative to m, use logistic regression or an SVM without a kernel (linear kernel).

If n is small and m is intermediate, use an SVM with a Gaussian kernel.

If n is small and m is large, add more features, then use logistic regression or an SVM without a kernel.

Week Eight

K-means

Cost Function

K-means tries to minimize the distortion

$$
\min_{\mu}\ \dfrac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}-\mu_{c^{(i)}}\rVert^{2}
$$

In each iteration, the first (cluster-assignment) loop minimizes the cost with the centroids fixed, assigning every x in the training set to its nearest centroid; the second (move-centroid) loop minimizes the cost with the assignments fixed, moving each centroid to the mean of the points assigned to it.
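A minimal Octave sketch of the two steps (my own code, assuming `X` is m×n and `mu` holds K initial centroids as rows; empty clusters are left unhandled):

```matlab
c = zeros(m, 1);                 % cluster assignment for every example
for iter = 1:max_iters
  % Step 1: assign each example to its nearest centroid (mu fixed).
  for i = 1:m
    [~, c(i)] = min(sum((mu - X(i, :)) .^ 2, 2));
  end
  % Step 2: move each centroid to the mean of its members (c fixed).
  for k = 1:K
    mu(k, :) = mean(X(c == k, :), 1);
  end
end
```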

Initialize

Initialize the centroids randomly: pick K samples from the training set at random and set the centroids to those samples.
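In Octave this initialization is two lines (`randperm` guarantees the K picks are distinct):

```matlab
idx = randperm(m);      % random permutation of 1..m
mu  = X(idx(1:K), :);   % first K rows become the initial centroids
```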

K-means can fall into a local minimum, so repeat the random initialization and rerun until the cost (distortion) is low enough for your purposes.

K-means always converges, and the cost never increases during training. More centroids should decrease the cost; if they do not, the run has fallen into a local minimum, and the centroids should be reinitialized until the cost drops.
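A restart loop along these lines (a sketch; `run_kmeans` is a hypothetical wrapper around the iteration above, and all names are mine):

```matlab
best_J = Inf;
for trial = 1:100                            % e.g. 50-1000 random restarts
  idx = randperm(m);
  [mu, c] = run_kmeans(X, X(idx(1:K), :));   % hypothetical helper
  J = mean(sum((X - mu(c, :)) .^ 2, 2));     % distortion of this run
  if J < best_J
    best_J = J; best_mu = mu; best_c = c;    % keep the cheapest clustering
  end
end
```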

PCA (Principal Component Analysis)

Reconstruct x from z such that the following inequality holds (i.e., 99% of the variance is retained):
$$
1-\dfrac{\dfrac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}-x^{(i)}_{approximation}\rVert^{2}}{\dfrac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}\rVert^{2}}\geq 0.99
$$
PS:
The inequality is equivalent to the following condition on the singular values of the covariance matrix:

$$
\begin{aligned}
[U, S, V] &= svd(\Sigma)\\
U_{reduce} &= U(:, 1\!:\!k)\\
z &= U_{reduce}' * x\\
x_{approximation} &= U_{reduce} * z\\
S &= \begin{pmatrix}
s_{11}&0&\cdots&0\\
0&s_{22}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&s_{nn}
\end{pmatrix}\\
\dfrac{\sum_{i=1}^{k}s_{ii}}{\sum_{i=1}^{n}s_{ii}} &\geq 0.99
\end{aligned}
$$
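In Octave, choosing the smallest k that meets the bound can look like this (a sketch assuming `X` is already mean-normalized, m×n):

```matlab
Sigma = (1 / m) * (X' * X);                % n x n covariance matrix
[U, S, V] = svd(Sigma);
s = diag(S);                               % singular values s_11..s_nn
k = find(cumsum(s) / sum(s) >= 0.99, 1);   % smallest k retaining 99%
U_reduce = U(:, 1:k);
Z = X * U_reduce;                          % project: one z per row of X
X_approx = Z * U_reduce';                  % reconstruct the approximation
```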

Week Nine

Anomaly Detection

Gaussian Distribution

The multivariate Gaussian distribution takes the correlations between different variables into account:
$$
p(x) = \dfrac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}
$$
The univariate Gaussian model is a special case of the multivariate Gaussian in which the covariance matrix is diagonal:

$$
\Sigma = \begin{pmatrix}
\sigma_{11}&&&\\
&\sigma_{22}&&\\
&&\ddots&\\
&&&\sigma_{nn}
\end{pmatrix}
$$
When training the anomaly detector, we can fit the parameters by maximum likelihood estimation:

$$
\begin{aligned}
\mu &= \dfrac{1}{m}\sum_{i=1}^{m}x^{(i)}\\
\Sigma &= \dfrac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu)(x^{(i)}-\mu)^{T}
\end{aligned}
$$
The univariate model is computationally much cheaper than the multivariate one, but it may require adding new features by hand to distinguish normal from anomalous examples.
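A sketch of the multivariate version in Octave (`n` is the number of features; `epsilon` is assumed to be chosen on a labeled cross-validation set, and `x_test` is a hypothetical n×1 example):

```matlab
mu = mean(X, 1)';                          % n x 1 mean vector
Xc = X - mu';                              % centered data, m x n
Sigma = (1 / m) * (Xc' * Xc);              % maximum-likelihood covariance
% Density of a single n x 1 example under the fitted Gaussian.
p = @(x) exp(-0.5 * (x - mu)' * (Sigma \ (x - mu))) ...
         / ((2 * pi) ^ (n / 2) * sqrt(det(Sigma)));
is_anomaly = p(x_test) < epsilon;          % flag low-density examples
```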

Recommender System

Cost Function

$$
\begin{aligned}
J(X,\Theta) &= \dfrac{1}{2}\sum_{(i,j):r(i,j)=1}\big((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)}\big)^{2}+\dfrac{\lambda}{2}\Big[\sum_{i=1}^{n_{m}}\sum_{k=1}^{n}\big(x_{k}^{(i)}\big)^{2}+\sum_{j=1}^{n_{u}}\sum_{k=1}^{n}\big(\theta_{k}^{(j)}\big)^{2}\Big]\\
J(X,\Theta) &= \dfrac{1}{2}Sum\{(X\Theta'-Y).*R\}+\dfrac{\lambda}{2}\big(Sum\{\Theta.^2\}+Sum\{X.^2\}\big)
\end{aligned}
$$
$$
\begin{aligned}
\dfrac{\partial J}{\partial X} &= ((X\Theta'-Y).*R)\,\Theta+\lambda X\\
\dfrac{\partial J}{\partial \Theta} &= ((X\Theta'-Y).*R)'\,X+\lambda\Theta
\end{aligned}
$$
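The matrix form translates almost directly into Octave (a sketch; `X` is n_m×n, `Theta` is n_u×n, `Y` and `R` are n_m×n_u):

```matlab
E = (X * Theta' - Y) .* R;                 % errors only where r(i,j) = 1
J = 0.5 * sum(E(:) .^ 2) ...
    + (lambda / 2) * (sum(Theta(:) .^ 2) + sum(X(:) .^ 2));
X_grad     = E * Theta + lambda * X;       % dJ/dX,     n_m x n
Theta_grad = E' * X   + lambda * Theta;    % dJ/dTheta, n_u x n
```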
