faiss中的索引基于几个基础算法构建,只不过在faiss中是一种高效的实现。分别是

  • k-means聚类
  • PCA降维
  • PQ编码
  • 解码

k-means聚类

关于k-means聚类算法在之前的博客《OpenCV中的Kmeans聚类》中有介绍过,是一种无监督算法,自动的基于设定的聚类中心数进行聚类。

# 导入faiss
import sys
import faiss
import numpy as np # 构建数据
d = 512  # 维数
n_data = 2000
np.random.seed(0)
data = []
mu = 3
sigma = 0.1
for i in range(n_data):data.append(np.random.normal(mu, sigma, d))data = np.array(data).astype('float32')# 聚类
ncentroids = 100  # 设定聚类中心100个
niter = 20
verbose = True
d = data.shape[1]
kmeans = faiss.Kmeans(d, ncentroids, niter=niter, verbose=verbose)
kmeans.train(data)# 输出聚类中心
print(kmeans.centroids)
print(kmeans.centroids.shape)  # 聚类中心维度
[[3.0195723 3.0102541 3.0154822 ... 3.0012543 2.983653  2.9833684][2.963544  2.9899907 2.9942687 ... 2.989136  2.9432213 2.9964879][3.038704  2.9829412 3.0110557 ... 2.9955654 3.0334764 3.0621648]...[3.1335988 3.0283628 3.0185556 ... 2.9817123 2.8893135 2.846763 ][3.0277164 2.9488263 2.9712245 ... 3.046851  3.0523338 2.9560134][2.9731872 2.9688954 3.0551343 ... 3.0077536 3.038112  3.0422602]]
(100, 512)

结果保存在kmeans对象中:

  • kmeans.centroids:聚类中心
  • kmeans.obj:迭代时候目标函数的值(总方差)

更多参数选择:

  • nredo:运行此次数的群集,并保持最佳质心(根据聚类目标选择)

  • verbose:使聚类更详细

  • spherical:执行球形K-means,每次迭代后,质心为L2标准化

  • INT_CORDROID:圆形心质坐标取整

  • update_index:每次迭代后重新训练索引?

  • min_points_per_centroid / max_points_per_centroid:以上训练集是分支的

  • seed:随机数发生器的种子

计算某个向量属于哪一个子类,返回聚类中心次序和L2距离

D, I = kmeans.index.search(data[:5], 1)
print(D)
print(I)
[[5.027096 ][4.712036 ][4.7764983][4.5557623][4.403679 ]][[61][44][61][49][83]]
# 返回距离每个聚类中心最近的5个向量
index = faiss.IndexFlatL2 (d)
index.add(data)
D, I = index.search (kmeans.centroids, 5)
print(D)
print(I)
[[3.8203125e+00 4.3466797e+00 4.3652344e+00 4.3691406e+00 4.3789062e+00][3.6269531e+00 4.0800781e+00 4.1953125e+00 4.2304688e+00 4.2705078e+00][3.7167969e+00 4.0781250e+00 4.1015625e+00 4.1796875e+00 4.2685547e+00][3.4667969e+00 3.8759766e+00 4.1923828e+00 4.4365234e+00 4.5742188e+00][0.0000000e+00 9.0585938e+00 9.0957031e+00 9.1376953e+00 9.1484375e+00][3.7412109e+00 4.3515625e+00 4.4199219e+00 4.5517578e+00 4.5849609e+00][3.8154297e+00 4.2207031e+00 4.4433594e+00 4.5625000e+00 4.5986328e+00][3.5927734e+00 3.9169922e+00 4.3535156e+00 4.4375000e+00 4.4453125e+00][3.9101562e+00 4.3496094e+00 4.3554688e+00 4.4189453e+00 4.4384766e+00][0.0000000e+00 8.6308594e+00 8.8496094e+00 8.9980469e+00 9.0107422e+00][3.8427734e+00 4.2568359e+00 4.3339844e+00 4.3779297e+00 4.3789062e+00][3.5175781e+00 4.0654297e+00 4.1015625e+00 4.2148438e+00 4.2558594e+00][2.3457031e+00 2.3486328e+00 6.8046875e+00 6.8427734e+00 6.9658203e+00][9.7656250e-04 8.8378906e+00 8.9287109e+00 8.9941406e+00 9.0283203e+00][3.7519531e+00 4.3740234e+00 4.5185547e+00 4.5371094e+00 4.6191406e+00][0.0000000e+00 8.5371094e+00 8.6289062e+00 8.6933594e+00 8.9248047e+00][3.5009766e+00 4.2011719e+00 4.2431641e+00 4.3730469e+00 4.4199219e+00][3.3779297e+00 3.6240234e+00 3.6523438e+00 3.7822266e+00 3.9042969e+00][3.7656250e+00 4.2851562e+00 4.3125000e+00 4.5205078e+00 4.5849609e+00][3.7421875e+00 4.2714844e+00 4.4677734e+00 4.5693359e+00 4.5761719e+00][3.8076172e+00 4.3037109e+00 4.3144531e+00 4.3457031e+00 4.3574219e+00][0.0000000e+00 8.8740234e+00 8.9199219e+00 8.9599609e+00 8.9638672e+00][3.3349609e+00 3.7226562e+00 3.7910156e+00 4.0488281e+00 4.3642578e+00][2.9335938e+00 3.1669922e+00 3.4218750e+00 5.9628906e+00 6.0009766e+00][2.2109375e+00 2.2128906e+00 6.5175781e+00 6.7275391e+00 6.8320312e+00][2.0791016e+00 2.0800781e+00 6.6953125e+00 6.7089844e+00 6.7255859e+00][0.0000000e+00 8.6171875e+00 8.6777344e+00 9.0263672e+00 9.0664062e+00][9.7656250e-04 8.8359375e+00 8.8808594e+00 8.9208984e+00 8.9423828e+00][2.4062500e+00 2.4082031e+00 6.8105469e+00 6.9472656e+00 7.0068359e+00][9.7656250e-04 8.6279297e+00 8.7968750e+00 8.9531250e+00 8.9550781e+00][0.0000000e+00 8.9794922e+00 9.0263672e+00 9.0439453e+00 9.0703125e+00][3.2343750e+00 3.5175781e+00 3.5957031e+00 3.9169922e+00 5.3691406e+00][2.2929688e+00 2.2949219e+00 6.6669922e+00 6.6992188e+00 6.7089844e+00][3.8671875e+00 4.3222656e+00 4.3359375e+00 4.3564453e+00 4.4472656e+00][3.7900391e+00 4.3183594e+00 4.3662109e+00 4.3964844e+00 4.5839844e+00][3.7949219e+00 4.3251953e+00 4.3828125e+00 4.3964844e+00 4.4072266e+00][0.0000000e+00 8.5292969e+00 8.8564453e+00 8.8642578e+00 8.8750000e+00][2.8613281e+00 3.0986328e+00 3.3378906e+00 6.0507812e+00 6.0644531e+00][2.9062500e+00 2.9882812e+00 3.3222656e+00 5.8398438e+00 6.0195312e+00][3.7910156e+00 4.1953125e+00 4.2392578e+00 4.3154297e+00 4.3330078e+00][3.8828125e+00 4.2753906e+00 4.3330078e+00 4.4433594e+00 4.4492188e+00][3.4697266e+00 4.0683594e+00 4.3691406e+00 4.4130859e+00 4.4150391e+00][3.8027344e+00 4.3623047e+00 4.4492188e+00 4.4511719e+00 4.5117188e+00][3.8144531e+00 4.4375000e+00 4.5009766e+00 4.5097656e+00 4.5244141e+00][3.7324219e+00 4.3505859e+00 4.3671875e+00 4.3886719e+00 4.4052734e+00][3.5830078e+00 4.1464844e+00 4.2402344e+00 4.2773438e+00 4.4667969e+00][3.3808594e+00 3.9414062e+00 4.0000000e+00 4.1132812e+00 4.1464844e+00][2.9091797e+00 3.1630859e+00 3.4140625e+00 5.7685547e+00 5.8164062e+00][0.0000000e+00 9.0380859e+00 9.0937500e+00 9.0957031e+00 9.1191406e+00][3.6074219e+00 4.2148438e+00 4.2587891e+00 4.3574219e+00 4.3632812e+00][9.7656250e-04 8.8525391e+00 8.9111328e+00 9.0576172e+00 9.1298828e+00][2.3740234e+00 2.3750000e+00 6.7568359e+00 6.9619141e+00 7.0683594e+00][2.2177734e+00 2.2197266e+00 6.6367188e+00 6.7744141e+00 6.8906250e+00][1.9531250e-03 8.6640625e+00 9.1230469e+00 9.1894531e+00 9.2167969e+00][0.0000000e+00 8.7958984e+00 9.0039062e+00 9.0644531e+00 9.0917969e+00][2.8320312e+00 3.1093750e+00 3.2480469e+00 5.9785156e+00 6.0566406e+00][3.8183594e+00 4.3847656e+00 4.4121094e+00 4.4179688e+00 4.4355469e+00][2.4335938e+00 2.4355469e+00 6.8212891e+00 6.8222656e+00 6.8789062e+00][0.0000000e+00 8.7382812e+00 8.8437500e+00 8.9365234e+00 9.0361328e+00][3.8437500e+00 4.3125000e+00 4.3320312e+00 4.3818359e+00 4.4121094e+00][3.3281250e+00 3.5781250e+00 3.9941406e+00 4.0322266e+00 4.4687500e+00][3.7285156e+00 4.1611328e+00 4.2333984e+00 4.2851562e+00 4.3378906e+00][3.7382812e+00 4.1406250e+00 4.3603516e+00 4.4218750e+00 4.4697266e+00][2.2978516e+00 2.2998047e+00 6.7763672e+00 6.7939453e+00 6.7998047e+00][3.7988281e+00 4.3564453e+00 4.4707031e+00 4.5048828e+00 4.5126953e+00][3.7900391e+00 4.3642578e+00 4.4472656e+00 4.4902344e+00 4.4931641e+00][2.1865234e+00 2.1894531e+00 6.5410156e+00 6.6201172e+00 6.7519531e+00][2.2285156e+00 2.2285156e+00 6.6171875e+00 6.6210938e+00 6.6757812e+00][3.8369141e+00 4.2773438e+00 4.3076172e+00 4.4609375e+00 4.4980469e+00][2.9296875e-03 8.6015625e+00 8.6787109e+00 8.8613281e+00 8.9052734e+00][3.8437500e+00 4.3750000e+00 4.4257812e+00 4.4609375e+00 4.4843750e+00][3.8623047e+00 4.2587891e+00 4.2812500e+00 4.3554688e+00 4.3710938e+00][3.6523438e+00 3.9287109e+00 3.9990234e+00 4.1103516e+00 4.1933594e+00][0.0000000e+00 8.9931641e+00 9.1396484e+00 9.1943359e+00 9.2109375e+00][2.2480469e+00 2.2480469e+00 6.8378906e+00 6.9013672e+00 6.9023438e+00][3.7441406e+00 4.4804688e+00 4.5458984e+00 4.5761719e+00 4.5996094e+00][3.6289062e+00 4.0683594e+00 4.3623047e+00 4.4296875e+00 4.4677734e+00][3.7363281e+00 4.3417969e+00 4.3515625e+00 4.4511719e+00 4.4960938e+00][0.0000000e+00 8.7128906e+00 8.9687500e+00 9.0869141e+00 9.1064453e+00][3.4726562e+00 3.6914062e+00 4.0546875e+00 4.0927734e+00 4.4355469e+00][3.5927734e+00 4.3212891e+00 4.3271484e+00 4.4785156e+00 4.5058594e+00][3.1914062e+00 3.4531250e+00 3.5703125e+00 3.5722656e+00 5.5263672e+00][3.7792969e+00 4.2929688e+00 4.3574219e+00 4.3710938e+00 4.3955078e+00][3.7207031e+00 4.0117188e+00 4.1591797e+00 4.4082031e+00 4.4091797e+00][2.3359375e+00 2.3359375e+00 6.6621094e+00 6.9716797e+00 7.1054688e+00][3.6054688e+00 4.1835938e+00 4.1933594e+00 4.3076172e+00 4.3876953e+00][3.0058594e+00 3.3808594e+00 3.4121094e+00 5.8144531e+00 5.9804688e+00][3.6806641e+00 4.0449219e+00 4.0625000e+00 4.2500000e+00 4.3759766e+00][2.2041016e+00 2.2070312e+00 6.7275391e+00 7.0244141e+00 7.0302734e+00][2.8291016e+00 3.2236328e+00 3.3925781e+00 5.7548828e+00 5.9121094e+00][3.8447266e+00 4.2128906e+00 4.3095703e+00 4.3681641e+00 4.3886719e+00][3.0585938e+00 3.7578125e+00 3.8242188e+00 3.8652344e+00 5.5468750e+00][3.5449219e+00 3.8789062e+00 4.0400391e+00 4.1953125e+00 4.2910156e+00][3.5341797e+00 4.1005859e+00 4.3300781e+00 4.3359375e+00 4.3554688e+00][3.6171875e+00 4.1240234e+00 4.1435547e+00 4.3378906e+00 4.3505859e+00][3.7167969e+00 4.3369141e+00 4.4003906e+00 4.4765625e+00 4.5156250e+00][3.7050781e+00 4.2529297e+00 4.2763672e+00 4.3886719e+00 4.4355469e+00][3.2031250e+00 3.4453125e+00 3.5498047e+00 3.7802734e+00 5.4550781e+00][3.3593750e+00 3.8710938e+00 4.1171875e+00 4.1445312e+00 4.5146484e+00][3.6142578e+00 3.8007812e+00 4.0078125e+00 4.2519531e+00 4.3789062e+00]]
[[1083  277 1624 1555  981][1411  198  620  414 1664][ 140   28  317 1915 1597][1704 1252 1745 1906  126][ 867 1055  308 1483  788][1882 1937 1568  981 1695][1885 1826  481   37  779][1725 1417   19 1356 1785][1801  981 1624  145 1524][1011  835  700  650  148][1496  650  681 1624  518][ 780  545  881   61  101][1428 1042 1489  389  511][ 454  259  329 1336  444][1273 1727  106  981 1438][1027 1686  106  121  835][1374  287 1825  637   35][1342  288 1146 1958 1348][ 528 1678 1167  592  145][ 440  981  337  402 1168][1174  773 1686  981  519][ 294   30  685  655  133][ 607 1512  623 1262 1960][1191  668 1140 1100  724][1854 1851  121  681  276][ 781 1927  704  610 1417][1708  896 1707 1172 1331][ 908 1558 1116  519 1686][  88 1848 1624  277  773][ 111  981 1417 1666  464][1961 1116 1664 1185  228][ 368 1905  890 1642  604][1607 1376  163   48 1893][1653  981  427 1523  655][1087 1353   31 1120  832][1800  981  413 1624  106][1347  182  740 1987 1618][  58 1404  105  981  604][1455   81  170  198  685][ 685  277  981  145  879][1850 1334  981  277 1886][1758  922 1600 1506 1192][ 269  981 1483 1162  277][1125 1624  106 1907 1401][1504  511  981  106   28][  62 1377  222 1839  128][1205  423  939  862 1824][1124  216 1104  145 1353][ 435  586 1660  456  269][ 740 1181 1578 1743  155][1809 1571  608  497 1956][1050 1203 1624 1536 1666][1171  307  156  906  790][ 658 1241 1337  804 1540][ 425  629 1318  277  427][ 783 1449  419  121  650][  66  981 1459  277  315][1856 1114 1055  940 1558][  87 1374  916 1707 1057][ 712  981  182  277  106][1574  329  136  197  971][ 879  981 1624  277  106][ 568  389 1241  981  560][1528 1697 1008  725 1523][1255  835 1384  981 1115][1912  981  548   94  277][1004 1048 1558  574  182][1299 1418 1624  121 1470][1070 1384  981  145 1624][ 514  879 1949 1540  329][1876  981  277 1459  650][ 627 1008  981  963  529][1699 1132 1158 1431 1837][1843  902  402 1575 1304][ 340 1190  842  504 1558][ 453  457  648 1079  234][ 344  410   18  470 1877][ 124  277  981 1624 1555][ 275 1237 1381 1439  875][ 793 1249  142  575  664][ 711  952 1751 1229  619][1722 1141 1672  917 1654][1540 1707  277  879  103][1176  574 1459    4  256][ 584 1092  829  681  291][ 687  874 1644  646  535][1577 1715 1239 1623  259][1897 1130 1179 1553 1587][ 479  841  122   28 1561][1921 1343  109  518 1686][  78  981  277  896  145][1736  229 1453 1157  542][1858  610  616 1144    9][ 741 1895 1400  131  836][1774  586 1470   56  172][ 249 1554  277  613  981][ 159 1970  255  605  471][1392  909  228  104  121][ 666  540  403 1804  372][ 244   99 1705 1331  231]]

PCA降维

mat = faiss.PCAMatrix (512, 64)  # 从512维降为64维
mat.train(data)
print(data.shape)
assert mat.is_trained
tr = mat.apply_py(data)
print(tr.shape)
(2000, 512)
(2000, 64)

可以看到维度从512维度降低到64维度

PQ编码/解码

d = 512  # 数据维度
cs = 4  # code size (bytes)# 训练数据集
xt = data# dataset to encode (could be same as train)
x = datapq = faiss.ProductQuantizer(d, cs, 8)
pq.train(xt)  # 训练# encode编码
codes = pq.compute_codes(x)# decode解码
x2 = pq.decode(codes)# 编码-解码后与原始数据的差
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()
print(avg_relative_error)
0.0008764966

标量量化器(scalar quantizer)与之类似

d = 512  # 数据维度# 训练集
xt = data# dataset to encode (could be same as train)
x = data# QT_8bit allocates 8 bits per dimension (QT_4bit also works)
sq = faiss.ScalarQuantizer(d, faiss.ScalarQuantizer.QT_8bit)
sq.train(xt)# encode 编码
codes = sq.compute_codes(x)# decode 解码
x2 = sq.decode(codes)# 计算编码-解码后与原始数据的差
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()
print(avg_relative_error)
6.7287445e-08

参考

  • https://github.com/facebookresearch/faiss/wiki/Faiss-building-blocks:-clustering,-PCA,-quantization
  • https://github.com/liqima/faiss_note

faiss-3: 基础模块相关推荐

  1. 三, for循环----文件对象-----函数基础------模块基础

    for循环 文件对象 函数基础 模块基础 一,for 循环 1,for循环详解 二.文件对象 1,文件打开方法 2,文件输入 3,文件输出 4,操作文件 三,函数基础 1,函数基本操作 2,函数参数 ...

  2. 负数比较大小_小升初总复习第七个基础模块:负数的认识

    今天我们开始第七个基础模块的复习:负数的认识 负数的认识一共分为3个知识考点. 第一,正负数的意义 以前我们学过的6,10,7,85.....这样大于0的数叫做正数.正数前面也可以 加"+& ...

  3. 计算机应用基础模块4客观题,国开河北[课程号]00815《计算机应用基础》模块4PowerPoint2010电子演示文稿系统——客观题辅导答案...

    学校:国家开放大学河北 [课程号]00815-计算机应用基础-模块4 PowerPoint 2010 电子演示文稿系统--客观题试卷满分:100分得分:100 1.在poewrpoint中," ...

  4. 天津市职高高一计算机试题及答案,职高(中职)数学(基础模块)上册题库.doc

    职高(中职)数学(基础模块)上册题库.doc 文档编号:388585 文档页数:13 上传时间: 2018-08-30 文档级别:精品资源 文档类型:doc 文档大小:653.00KB 宁波至达教育 ...

  5. 分数怎么化成带分数_小升初数学总复习第三个基础模块:分数的认识

    今天我们开始小升初数学总复习第三个基础模块的复习:分数的认识 分数的认识一共分为8个知识考点. 第一,分数的意义 把单位"1".平均分成若干份,表示这样的一份或者几份的数叫做分数. ...

  6. ASP.NET MVC +EasyUI 权限设计(三)基础模块

    请注明转载地址:http://www.cnblogs.com/arhat 在上一章中呢,我们基本上搭建好了环境,那么本章我们就从基础模块开始写起.由于用户,角色,动作三个当中,都是依赖与动作的,所以本 ...

  7. 中职计算机英语课件ppt,语文版中职英语(基础模块 上册)Unit 7《Computers》ppt课件1.ppt...

    语文版中职英语(基础模块 上册)Unit 7<Computers>ppt课件1.ppt 文档编号:114197 文档页数:48 上传时间: 2018-04-04 文档级别:精品资源 文档类 ...

  8. JAVASE基础模块十五(StringBuffer类)

    JAVASE基础模块十五(StringBuffer类) public class Stbuffer { public static void main(String[] args) { //总共创建五 ...

  9. JAVASE基础模块三十四( 菜鸡版简单登录验证模块系统IO流文件写入)

    JAVASE基础模块三十四( 菜鸡版简单登录验证模块系统IO流文件写入) 开发流程:需求文档 接口文档 效果图 开发环境统一:JDK1.8 IDEA win10 GIT SSM SSH SpringB ...

  10. 00815 计算机基础,国开(山东)00815-计算机应用基础-模块1 windows 7 操作系统——客观题-辅导资料...

    国开(山东)00815-计算机应用基础-模块1 windows 7 操作系统--客观题-辅导资料 (9页) 本资源提供全文预览,点击全文预览即可全文预览,如果喜欢文档就下载吧,查找使用更方便哦! 14 ...

最新文章

  1. 在一个无序的int数组上构建一个最小堆的时间复杂度_漫画:寻找无序数组的第k大元素(修订版)...
  2. java通过maven构建项目实现日志生成模拟(三)通过logback 打印日志
  3. mysql5.17免安装教程_详细介绍MySql5.7.17免安装配置教程的示例代码
  4. 曝苹果iPhone 13将于8月开始大批量产,恶劣天气不会造成影响
  5. Python 3.6模拟输入并爬取百度前10页密切相关链接
  6. Sklearn之datasets和训练
  7. 秒杀系统设计中的业务性思考
  8. HTML5实现扫描识别二维码/生成二维码
  9. C#编程如何安装Nupkg包
  10. 润乾报表-单元格函数
  11. 解析eas webservice
  12. setw和width
  13. mac 删除磁盘图标_如何在Mac上更改硬盘图标
  14. 书城项目第三阶段及其源码
  15. vue引入 wps在线编辑版
  16. 隐私合规:检测第三方SDK调用的隐私权限
  17. 轻量应用服务器 饥荒,饥荒服务端世界设置生成
  18. 资深WEB前端开发(社区)
  19. 搜狐狐友营销的十二个办法
  20. 解决Oracle MySQL 安全漏洞(CVE-2021-22570)

热门文章

  1. 使用pascal voc训练测试faster rcnn
  2. android 音量调节不起作用,Android音量控制
  3. Swift游戏实战-跑酷熊猫 09 移除场景之外的平台
  4. CTO、技术总监、首席架构师的区别
  5. 【论文解读--Xdog】(宇树科技鼻祖)新型电驱式四足机器人研制与测试
  6. 5ic计算机考试考卷读取错误,北京自考出现错误试卷
  7. struts2与常用表格ajax操作的json传值问题
  8. 【趣读官方文档】1.管家的抉择 (Android进程生命周期)
  9. ZOJ 3987 Numbers(Java枚举)
  10. 解决-redis保存的中文变为unicode, redis的基本操作