Exercise 1: Machine Learning Frameworks

What machine learning frameworks are there? Use one of them to build a machine learning pipeline.

Machine learning frameworks

TensorFlow
PyTorch
PaddlePaddle
CNTK
MXNet
MindSpore
OneFlow
MegEngine
Jittor
scikit-learn (sklearn)

Building a machine learning pipeline

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

# Load the built-in iris dataset
dataset = load_iris()
# print(dataset)
X = dataset.data
y = dataset.target

# Build the pipeline: scale the features, then classify with k-nearest neighbors
scaling_pipeline = Pipeline([('scale', MinMaxScaler()), ('predict', KNeighborsClassifier())])
scores = cross_val_score(scaling_pipeline, X, y, scoring='accuracy')
print("Prediction accuracy: {0:.1f}%".format(np.mean(scores) * 100))
Prediction accuracy: 96.0%
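Because each pipeline step is named, its hyperparameters can also be tuned as a single unit. The sketch below is not part of the original answer; it assumes the scaling_pipeline, X, and y defined above, uses GridSearchCV with the standard 'step__parameter' naming convention, and the candidate neighbor counts are illustrative.

# A minimal sketch: tuning the KNN step of the pipeline with GridSearchCV
from sklearn.model_selection import GridSearchCV

param_grid = {'predict__n_neighbors': [3, 5, 7, 9]}  # illustrative candidate values
grid = GridSearchCV(scaling_pipeline, param_grid, scoring='accuracy', cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)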

Exercise 2: Loading Data in Machine Learning

In class we have already seen how to load the built-in iris dataset. Use the same approach to load the
Boston house-price dataset and examine its format and structure.

from sklearn.datasets import load_boston
# Load the Boston house-price dataset
boston = load_boston()
boston
{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02, 4.9800e+00],
        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02, 9.1400e+00],
        ...,
        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02, 7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, ..., 23.9, 22. , 11.9]),
 'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'),
 'DESCR': ".. _boston_dataset:\n\nBoston house prices dataset\n... (506 instances, 13 numeric/categorical predictive attributes; MEDV, the median home value, is usually the target) ...",
 'filename': 'D:\\dell\\lib\\site-packages\\sklearn\\datasets\\data\\boston_house_prices.csv'}
boston.feature_names
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD','TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')

There are 13 feature attributes in total. The data can also be opened as a CSV file on disk, at the path shown in the 'filename' field at the end of the output above.
Variables in order:
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centres
RAD index of accessibility to radial highways
TAX full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in $1000’s
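Note (not part of the original exercise): load_boston was deprecated in scikit-learn 1.0 and removed in 1.2 because of ethical concerns about the B feature. On newer versions the same arrays can be rebuilt from the original StatLib source; the sketch below follows the recipe given in scikit-learn's deprecation notice (the URL and the interleaved row layout come from that notice, so verify against your installed version).

# Sketch for scikit-learn >= 1.2, where load_boston no longer exists.
# Each sample spans two physical lines in the raw file; MEDV is column 2 of the second line.
import numpy as np
import pandas as pd

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])   # 506 x 13 feature matrix
target = raw_df.values[1::2, 2]                                      # MEDV, 506 values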

!pip install pandas
import pandas as pd
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df
# df.info()                # check column dtypes and completeness
# df.describe()            # summary statistics (mean, std, etc.)
# df.dropna(inplace=True)  # drop samples with missing values
Requirement already satisfied: pandas in d:\dell\lib\site-packages (1.2.4)
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33
... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 21.0 391.99 9.67
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 21.0 396.90 9.08
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1.0 273.0 21.0 396.90 5.64
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 21.0 393.45 6.48
505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 21.0 396.90 7.88

506 rows × 13 columns

Exercise 3: Splitting the Dataset

Split the loaded dataset into a training set and a test set using the train_test_split() method.

from sklearn.model_selection import train_test_split

df['target'] = boston.target
# Use three features (LSTAT, RM, PTRATIO) as predictors
features = pd.DataFrame(np.c_[df['LSTAT'], df['RM'], df['PTRATIO']], columns=['LSTAT', 'RM', 'PTRATIO'])
target = df['target']
x_train, x_test, y_train, y_test = train_test_split(features, target, random_state=5, test_size=0.17)
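A quick sanity check of the split sizes (not in the original), assuming the variables defined above:

# With test_size=0.17 and 506 samples, roughly 87 samples end up in the test set.
print(x_train.shape, x_test.shape)   # expected roughly (419, 3) and (87, 3)
print(y_train.shape, y_test.shape)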

Exercise 4: Training a Model

Create any regression model and train it on the Boston house-price data.

from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

lr = LinearRegression()            # instantiate a linear regression model
lr.fit(x_train, y_train)           # fit the coefficients and intercept
y_pred = lr.predict(x_test)        # predict on the test set
print(r2_score(y_test, y_pred))    # evaluate with the coefficient of determination (R^2)
0.7017302408287501
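For comparison (not part of the original answer), any other regressor from sklearn.linear_model can be swapped in the same way. A minimal sketch with Ridge regression, where the regularization strength alpha is an illustrative value:

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0)           # alpha chosen only for illustration
ridge.fit(x_train, y_train)
print(r2_score(y_test, ridge.predict(x_test)))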

Exercise 5: Evaluating the Training Result

Use MSE to judge how well the model was trained.

from sklearn.metrics import mean_squared_error
print("mse=", mean_squared_error(y_test, y_pred))  # mean squared error
mse= 23.515599635089057
# from sklearn.linear_model import ElasticNet
# EN = ElasticNet(0.02)            # instantiate an elastic-net regression model
# EN.fit(x_train, y_train)         # train
# y_pred = EN.predict(x_test)      # predict
# # Evaluate
# print(r2_score(y_test, y_pred))
# print("mse=", mean_squared_error(y_test, y_pred))  # mean squared error
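Alongside MSE, RMSE and MAE are often reported because they are in the same units as the target (thousands of dollars of MEDV). A minimal sketch, not asked for by the exercise, assuming y_test and y_pred from above:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mse = mean_squared_error(y_test, y_pred)
print("rmse =", np.sqrt(mse))                         # root mean squared error
print("mae  =", mean_absolute_error(y_test, y_pred))  # mean absolute error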

Exercise 6: Cross-Validation

Use cross-validation to check how well the model trains.

from sklearn.model_selection import cross_val_score
scores = cross_val_score(lr, boston.data, boston.target, cv=5)  # 5-fold R^2 scores
scores
array([ 0.63919994,  0.71386698,  0.58702344,  0.07923081, -0.25294154])
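The last folds score poorly, most likely because the Boston samples are stored in a non-random order, so plain cv=5 splits them into contiguous, unrepresentative blocks. A sketch (not in the original answer) that shuffles before splitting, with an arbitrary random_state:

from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=0)   # random_state is illustrative
shuffled_scores = cross_val_score(lr, boston.data, boston.target, cv=cv)
print(shuffled_scores, shuffled_scores.mean())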

Exercise 7: Saving the Model

Save the model we just trained.

!pip install joblib
Requirement already satisfied: joblib in d:\dell\lib\site-packages (1.0.1)
import joblib  # the joblib module
# Save the model (note: if saving into a subdirectory, create it beforehand or an error is raised)
joblib.dump(lr, 'lr.pkl')
['lr.pkl']
# Load the model back
clf3 = joblib.load('lr.pkl')
# Test the reloaded model
# print(clf3.predict(x_test))
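A quick check (not in the original) that the reloaded model reproduces the original predictions, assuming lr, clf3, x_test, y_test, and r2_score from the cells above:

import numpy as np

# The deserialized model should give predictions identical to the in-memory one.
assert np.allclose(lr.predict(x_test), clf3.predict(x_test))
print("reloaded model R^2:", r2_score(y_test, clf3.predict(x_test)))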
