Size-Constrained Clustering

1 Main methods

# Import libraries
from size_constrained_clustering import fcm, equal, minmax, shrinkage, da
# Euclidean distance is the default, but other distance functions can be selected
from sklearn.metrics.pairwise import haversine_distances
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt

1.1 Fuzzy C-means Algorithm

Similar to KMeans, except that each sample is given a membership probability for every cluster rather than a hard 0-or-1 assignment.

n_samples = 2000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
# Generate the dataset: 2000 samples with two features each, in four groups
# placed around the cluster centers given above
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0, centers=centers, shuffle=False, random_state=42)

model = fcm.FCM(n_clusters)
# use another distance function, e.g. haversine distance
# (sklearn's haversine_distances expects [latitude, longitude] in radians):
# model = fcm.FCM(n_clusters, distance_func=haversine_distances)
model.fit(X)
centers = model.cluster_centers_
'''
Cluster centers after fitting the model:
array([[ 0.06913083,  0.07352352],
       [-5.01038079, -4.98275774],
       [ 6.99974221, 10.01169349],
       [ 4.98686053,  5.0026792 ]])
'''
labels = model.labels_
# Cluster label of each sample after fitting


plt.figure(figsize=(10, 10))
colors = ['red', 'green', 'blue', 'yellow']
# Plot each cluster in its own color
for i, color in enumerate(colors):
    color_tmp = np.where(labels == i)[0]
    plt.scatter(X[color_tmp, 0], X[color_tmp, 1], c=color, label=i)
plt.legend()
# Mark the cluster centers
plt.scatter(centers[:, 0], centers[:, 1], s=1000, c='black')
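
To make the membership idea concrete, here is a minimal NumPy sketch of the standard fuzzy C-means membership update. It is an illustration only and is not taken from the library's source; the helper name fcm_memberships, the fuzzifier m=2.0, and the tiny demo arrays are all assumptions made for this example.

```python
import numpy as np

def fcm_memberships(samples, centers, m=2.0):
    """Soft membership of every sample in every cluster (standard FCM update).

    Hypothetical helper for illustration; m is the fuzzifier (m > 1).
    """
    # Pairwise Euclidean distances, shape (n_samples, n_clusters)
    dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    # Guard against division by zero when a sample coincides with a center
    dist = np.fmax(dist, 1e-12)
    inv = dist ** (-2.0 / (m - 1.0))
    # Normalize so that each row sums to 1
    return inv / inv.sum(axis=1, keepdims=True)

X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
centers_demo = np.array([[0.0, 0.0], [4.0, 0.0]])
U = fcm_memberships(X_demo, centers_demo)
# A hard label, if needed, is the cluster with the highest membership
hard_labels = U.argmax(axis=1)
print(U)
print(hard_labels)
```

Each row of U sums to 1, so a sample lying between two centers is shared between them instead of being forced into one cluster.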

1.2 Same Size Constrained KMeans Heuristics

Uses a heuristic method to obtain clusters of equal size.

n_samples = 2000
n_clusters = 4
X = np.random.rand(n_samples, 2)
# solve with the heuristic approach
model = equal.SameSizeKMeansHeuristics(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

plt.figure(figsize=(10, 10))
colors = ['red', 'green', 'blue', 'yellow']
for i, color in enumerate(colors):
    color_tmp = np.where(labels == i)[0]
    plt.scatter(X[color_tmp, 0], X[color_tmp, 1], c=color, label=i)
plt.legend()
plt.scatter(centers[:,0],centers[:,1],s=1000,c='black')
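
As a rough illustration of what an equal-size heuristic can look like, the sketch below assigns points greedily to their nearest center that still has room. This is one simple heuristic chosen for the example, not necessarily the one SameSizeKMeansHeuristics implements; the function name greedy_equal_assign and the demo data are assumptions.

```python
import numpy as np

def greedy_equal_assign(points, centers):
    """Greedy equal-size assignment (hypothetical illustration)."""
    n, k = len(points), len(centers)
    capacity = int(np.ceil(n / k))  # maximum number of points per cluster
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # Handle first the points that would lose the most if denied their nearest center
    order = np.argsort(dist.min(axis=1) - dist.max(axis=1))
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for i in order:
        # Try centers from nearest to farthest and take the first one with room
        for j in np.argsort(dist[i]):
            if counts[j] < capacity:
                labels[i] = j
                counts[j] += 1
                break
    return labels

rng = np.random.default_rng(0)
points = rng.random((200, 2))
demo_labels = greedy_equal_assign(points, points[:4])
print(np.bincount(demo_labels, minlength=4))
```

With 200 points and 4 clusters the total capacity equals the number of points, so every cluster ends up with exactly 50 members. A full algorithm would alternate this assignment step with recomputing the centers.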

1.3 Same Size Constrained KMeans Inspired by Minimum Cost Flow Problem

Recasts clustering as an assignment problem and solves it with a minimum cost flow formulation.

n_samples = 2000
n_clusters = 4
X = np.random.rand(n_samples, 2)
# use minimum cost flow framework to solve
model = equal.SameSizeKMeansMinCostFlow(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

plt.figure(figsize=(10, 10))
colors = ['red', 'green', 'blue', 'yellow']
for i, color in enumerate(colors):
    color_tmp = np.where(labels == i)[0]
    plt.scatter(X[color_tmp, 0], X[color_tmp, 1], c=color, label=i)
plt.legend()
plt.scatter(centers[:, 0], centers[:, 1], s=1000, c='black')
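
The assignment view can be sketched on a small scale as follows. Instead of a minimum cost flow solver, this example uses an equivalent formulation for equal sizes: every cluster is given n / k slots and scipy.optimize.linear_sum_assignment picks the cheapest point-to-slot matching. It is an assumption-laden illustration, not the library's internals; equal_size_assign and the demo data are made up for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def equal_size_assign(points, centers):
    """Equal-size assignment step via slot matching (hypothetical illustration)."""
    n, k = len(points), len(centers)
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    size = n // k
    # Squared Euclidean cost of putting point i into cluster j, shape (n, k)
    cost = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Give each cluster `size` identical slots: columns j*size .. (j+1)*size - 1
    slot_cost = np.repeat(cost, size, axis=1)
    rows, cols = linear_sum_assignment(slot_cost)
    labels = np.empty(n, dtype=int)
    labels[rows] = cols // size  # map each slot back to its cluster
    return labels

rng = np.random.default_rng(0)
points = rng.random((40, 2))
demo_labels = equal_size_assign(points, points[:4])
print(np.bincount(demo_labels, minlength=4))
```

The slot matrix has n x n entries, so this formulation only suits small inputs; a flow-based solver avoids materializing it.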

1.4 Minimum and Maximum Size Constrained KMeans Inspired by Minimum Cost Flow Problem

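Recasts clustering as an assignment problem solved with a minimum cost flow formulation, with added lower and upper bounds on cluster size.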

n_samples = 2000
n_clusters = 4
X = np.random.rand(n_samples, 2)
# use minimum cost flow framework to solve
model = minmax.MinMaxKMeansMinCostFlow(n_clusters, size_min=200, size_max=800)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

plt.figure(figsize=(10, 10))
colors=['red','green','blue','yellow']
for i, color in enumerate(colors):
    color_tmp = np.where(labels == i)[0]
    plt.scatter(X[color_tmp, 0], X[color_tmp, 1], c=color, label=i)
plt.legend()
plt.scatter(centers[:,0],centers[:,1],s=1000,c='black')
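
A quick way to confirm that a result respects the size bounds is to count the labels. The helpers below are a small sketch written for this note (sizes_within_bounds and bounds_feasible are hypothetical names, and the label array is made-up demo data); they only rely on np.bincount.

```python
import numpy as np

def sizes_within_bounds(labels, n_clusters, size_min, size_max):
    """Return per-cluster counts and whether all of them lie in [size_min, size_max]."""
    counts = np.bincount(labels, minlength=n_clusters)
    ok = bool(np.all((counts >= size_min) & (counts <= size_max)))
    return counts, ok

def bounds_feasible(n_samples, n_clusters, size_min, size_max):
    """The bounds can only be met if n_samples fits between the total min and total max."""
    return n_clusters * size_min <= n_samples <= n_clusters * size_max

# Made-up labels standing in for model.labels_
labels_demo = np.array([0] * 300 + [1] * 700 + [2] * 500 + [3] * 500)
counts, ok = sizes_within_bounds(labels_demo, 4, size_min=200, size_max=800)
print(counts, ok)
print(bounds_feasible(2000, 4, 200, 800))
```

Checking feasibility before fitting is cheap: with 2000 samples and 4 clusters, a size_min of 600 could never be satisfied because 4 * 600 exceeds 2000.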

1.5 Deterministic Annealing Algorithm

Takes the target proportion of samples for each cluster as input and returns clusters of the corresponding sizes.

n_samples = 2000
n_clusters = 4
X = np.random.rand(n_samples, 2)
# distribution gives the target share of samples for each cluster
model = da.DeterministicAnnealing(n_clusters, distribution=[0.1, 0.2, 0.4, 0.3])
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

plt.figure(figsize=(10, 10))
colors=['red','green','blue','yellow']
for i, color in enumerate(colors):
    color_tmp = np.where(labels == i)[0]
    plt.scatter(X[color_tmp, 0], X[color_tmp, 1], c=color, label=i)
plt.legend()
plt.scatter(centers[:,0],centers[:,1],s=1000,c='black')
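
To see what a given distribution implies in absolute numbers, the proportions can be converted into target cluster sizes. The sketch below is a hypothetical helper written for this note (target_sizes is not part of the library), and it assumes the proportions sum to 1.

```python
import numpy as np

def target_sizes(distribution, n_samples):
    """Convert per-cluster proportions into integer target sizes (hypothetical helper)."""
    dist = np.asarray(distribution, dtype=float)
    if not np.isclose(dist.sum(), 1.0):
        raise ValueError("proportions are assumed to sum to 1")
    sizes = np.rint(dist * n_samples).astype(int)
    # Absorb any rounding drift in the largest cluster so the sizes sum to n_samples
    sizes[np.argmax(sizes)] += n_samples - sizes.sum()
    return sizes

expected = target_sizes([0.1, 0.2, 0.4, 0.3], 2000)
print(expected)
```

For the example above this gives 200, 400, 800 and 600 samples. Comparing these numbers with np.bincount(labels, minlength=n_clusters) shows how closely the fitted clusters follow the requested proportions.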
