Python Machine Learning Algorithms: Logistic Regression

The training data and the original source files (Python 2.x) are on the author's (Zhao Zhiyong, 赵志勇) GitHub:

https://github.com/zhaozhiyong19890102/Python-Machine-Learning-Algorithm

The files below have been modified to run under Python 3.x.
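For reference, these are the quantities the two scripts compute. The notation is my own: X is the m × n feature matrix whose first column is the bias term x0 = 1 (`feature` in train.py), y is the m × 1 vector of 0/1 labels (`label`), w is the n × 1 weight vector, and α is the learning rate (`alpha`). J is the mean cross-entropy loss that `error_rate` is meant to report, and the last line is the update performed in `lr_train_bgd`.

$$
h = \sigma(Xw), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)
$$

$$
J(w) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y_i \log h_i + (1 - y_i)\log(1 - h_i)\Bigr]
$$

$$
\nabla_w J = \frac{1}{m} X^{\top}(h - y)
$$

$$
w \leftarrow w + \alpha\, X^{\top}(y - h)
$$

The update drops the 1/m factor of the gradient, so it is gradient descent on the summed (rather than averaged) loss; equivalently, a step of size αm on J.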

1. Training script: train.py

```python
# coding:UTF-8
import numpy as np


def sig(x):
    '''sigmoid function
    :param x: scalar or matrix of linear scores
    :return: element-wise 1 / (1 + exp(-x))
    '''
    return 1.0 / (1 + np.exp(-x))  # the 1.0 numerator makes the float result explicit


def error_rate(h, label):
    '''compute the loss function value (mean cross-entropy)
    :param h: (m, 1) matrix of predicted probabilities
    :param label: (m, 1) matrix of 0/1 labels
    :return: average loss over the samples
    '''
    m = np.shape(h)[0]  # number of rows (samples) in h
    sum_err = 0.0
    for i in range(m):
        # log() needs a strictly positive argument, so skip h == 0 and h == 1
        if h[i, 0] > 0 and (1 - h[i, 0]) > 0:
            sum_err -= (label[i, 0] * np.log(h[i, 0]) +
                        (1 - label[i, 0]) * np.log(1 - h[i, 0]))
    return sum_err / m


def lr_train_bgd(feature, label, maxCycle, alpha):
    '''train the weights with batch gradient descent
    :param feature: (m, n) feature matrix; m samples of dimension n
    :param label: (m, 1) label matrix
    :param maxCycle: number of iterations
    :param alpha: learning rate
    :return: (n, 1) weight matrix w
    '''
    n = np.shape(feature)[1]  # shape is (rows, cols); [1] is the number of columns
    w = np.asmatrix(np.ones((n, 1)))  # n x 1 matrix initialized to ones (np.mat was removed in NumPy 2.0)
    i = 0
    while i < maxCycle:  # exactly maxCycle iterations
        i += 1
        h = sig(feature * w)  # (m, n) * (n, 1) -> (m, 1)
        err = label - h  # (m, 1)
        if i % 100 == 0:
            print("\t--------iter=" + str(i) +
                  ", train error rate= " + str(error_rate(h, label)))
        # batch gradient descent update (matrix form): alpha is a scalar,
        # feature.T is (n, m) and err is (m, 1), so the step is (n, 1) like w
        w = w + alpha * feature.T * err
    return w


def load_data(file_name):
    '''load the training data
    input:  file_name(string): path of the training data
    output: feature_data(mat): features
            label_data(mat): labels
    '''
    feature_data = []
    label_data = []
    with open(file_name) as f:
        for line in f:  # one sample per line
            # strip() removes leading/trailing whitespace; split("\t") splits on tabs
            lines = line.strip().split("\t")
            feature_tmp = [1]  # x0 = 1 (bias term); see the book for details
            for i in range(len(lines) - 1):
                feature_tmp.append(float(lines[i]))  # every column but the last is a feature
            feature_data.append(feature_tmp)
            label_data.append([float(lines[-1])])  # lines[-1] is the last column: the label
    # feature_data becomes an m x n matrix, label_data an m x 1 matrix
    return np.asmatrix(feature_data), np.asmatrix(label_data)


def save_model(file_name, w):
    '''save the weights as one tab-separated line'''
    m = np.shape(w)[0]
    w_array = []
    for i in range(m):  # xrange does not exist in Python 3; range replaces it
        w_array.append(str(w[i, 0]))
    with open(file_name, "w") as f_w:
        f_w.write("\t".join(w_array))  # put a tab between adjacent weights


if __name__ == "__main__":
    # load data
    print("---------- 1.load data ------------")
    feature, label = load_data("data.txt")
    # train
    print("---------- 2.training ------------")
    w = lr_train_bgd(feature, label, 1000, 0.01)
    # save
    print("---------- 3.save model ------------")
    save_model("weights", w)
```
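The loops in train.py can also be written in vectorized form. Below is a minimal sketch using plain NumPy ndarrays and the `@` operator instead of `np.matrix`, which NumPy discourages for new code. The function names mirror train.py, but this is my own variant, not the book author's code; in particular the `eps` clipping parameter is an assumption I added so that `log()` never receives 0 (train.py skips such samples instead).

```python
import numpy as np


def sig(x):
    """Element-wise sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


def error_rate(h, label, eps=1e-12):
    """Mean cross-entropy; h is clipped into (0, 1) so log() stays finite."""
    h = np.clip(h, eps, 1.0 - eps)
    return -np.mean(label * np.log(h) + (1.0 - label) * np.log(1.0 - h))


def lr_train_bgd(feature, label, max_cycle, alpha):
    """Batch gradient descent; feature is (m, n) incl. the x0 = 1 column, label is (m, 1)."""
    n = feature.shape[1]
    w = np.ones((n, 1))                          # same all-ones start as train.py
    for _ in range(max_cycle):
        h = sig(feature @ w)                     # (m, 1) predicted probabilities
        w = w + alpha * feature.T @ (label - h)  # same update rule as train.py
    return w
```

The matrices returned by train.py's `load_data` can be passed in after `np.asarray(feature)` and `np.asarray(label)`, which convert them to ordinary 2-D arrays.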

2. Test script: test.py

```python
# coding:UTF-8
import numpy as np
from train import sig  # sig() is defined in train.py above


def load_weight(file_name):
    '''load the weights saved by train.py as a 1 x n matrix'''
    w = []
    with open(file_name) as f:
        for line in f:
            lines = line.strip().split("\t")
            w.append([float(x) for x in lines])
    return np.asmatrix(w)


def load_data(file_name, n):
    '''load the test data; each valid line holds n - 1 tab-separated features'''
    feature_data = []
    with open(file_name) as f:
        for line in f:
            lines = line.strip().split("\t")
            if len(lines) != n - 1:  # skip lines whose feature count does not match w
                continue
            feature_tmp = [1]  # x0 = 1 (bias term), as in training
            for x in lines:
                feature_tmp.append(float(x))
            feature_data.append(feature_tmp)
    return np.asmatrix(feature_data)


def predict(data, w):
    '''predict 0/1 labels: data is (m, n), w is (1, n)'''
    h = sig(data * w.T)  # (m, 1) probabilities
    m = np.shape(h)[0]
    for i in range(m):
        if h[i, 0] < 0.5:
            h[i, 0] = 0.0
        else:
            h[i, 0] = 1.0
    return h


def save_result(file_name, result):
    '''save the predictions as one tab-separated line'''
    m = np.shape(result)[0]
    tmp = []
    for i in range(m):
        tmp.append(str(result[i, 0]))
    with open(file_name, "w") as f_result:
        f_result.write("\t".join(tmp))


if __name__ == "__main__":
    # 1
    print("---------- 1.load model ------------")
    w = load_weight("weights")
    n = np.shape(w)[1]
    # 2
    print("---------- 2.load data ------------")
    testData = load_data("test_data", n)
    # 3
    print("---------- 3.get prediction ------------")
    h = predict(testData, w)
    # 4
    print("---------- 4.save prediction ------------")
    save_result("result2", h)
```
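A shorter, vectorized counterpart to `load_weight` and `predict` might look like the sketch below. It assumes the weights file holds a single tab-separated line, as `save_model` writes it: `np.loadtxt` splits on any whitespace by default, and `ndmin=2` keeps that single row as a 1 × n array. The `accuracy` helper is hypothetical and only useful if you also have true labels for the test data.

```python
import numpy as np


def load_weight(file_name):
    """Read the weights as a (1, n) array; ndmin=2 keeps a single row 2-D."""
    return np.loadtxt(file_name, ndmin=2)


def predict(data, w):
    """0/1 labels for data (m, n) with the x0 = 1 column, given weights w (1, n)."""
    prob = 1.0 / (1.0 + np.exp(-(data @ w.T)))  # (m, 1) probabilities
    return (prob >= 0.5).astype(float)          # same 0.5 threshold as test.py


def accuracy(pred, label):
    """Hypothetical helper: fraction of predictions equal to the true labels."""
    return float(np.mean(pred == label))
```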
