Tang Yudi (唐宇迪) Data Analysis Study Notes

Day 1: NumPy

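import numpy
world_alcohol = numpy.genfromtxt("world_alcohol.txt", delimiter=",", dtype='str')  # genfromtxt reads a text file into an array; the delimiter here is a comma
print(type(world_alcohol))   # ndarray is NumPy's core structure: an n-dimensional array (here a 2-D matrix), not a Python list
print(world_alcohol)
help(numpy.genfromtxt)  # show the documentation for the parameters that can be passed to numpy.genfromtxt(...)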
_____________
<class 'numpy.ndarray'>
[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]
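
To try genfromtxt without the data file at hand, the sketch below feeds it a few rows from an in-memory buffer. The rows are copied from the output above; using io.StringIO in place of world_alcohol.txt, and the names sample_text and sample, are assumptions made only to keep the example self-contained.

```python
import io
import numpy

# A few rows copied from the output above, standing in for world_alcohol.txt
sample_text = (
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# genfromtxt accepts a file-like object as well as a file name
sample = numpy.genfromtxt(io.StringIO(sample_text), delimiter=",", dtype="str")
print(type(sample))
print(sample.shape)   # (rows, columns)
```

Because dtype="str" is passed, every field, including the numeric ones, is kept as text.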

# The numpy.array() function can take a list or a list of lists as input. A list gives a one-dimensional array:
vector = numpy.array([5, 10, 15, 20])                          # 1-D array: one pair of brackets
# A list of lists gives a matrix (a 2-D array):
matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
print(vector)
print(matrix)
______________
[ 5 10 15 20]
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]

# We can use the ndarray.shape property to see how many elements the array has along each dimension
vector = numpy.array([1, 2, 3, 4])      # the elements of a NumPy array must all have the same data type
print(vector.shape)
#For matrices, the shape property contains a tuple with 2 elements.
matrix = numpy.array([[5, 10, 15], [20, 25, 30]])
print(matrix.shape)
________________
(4,)
(2, 3)
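
As a small complement to shape, the sketch below also prints ndim (the number of axes) and size (the total number of elements) for the same two arrays. Only standard ndarray attributes are used.

```python
import numpy

vector = numpy.array([1, 2, 3, 4])
matrix = numpy.array([[5, 10, 15], [20, 25, 30]])

# shape: length along each axis; ndim: number of axes; size: total element count
print(vector.shape, vector.ndim, vector.size)
print(matrix.shape, matrix.ndim, matrix.size)
```

For the matrix, size equals the product of the entries in shape (2 * 3 = 6).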

#Each value in a NumPy array has to have the same data type
#NumPy will automatically figure out an appropriate data type when reading in data or converting lists to arrays.
#You can check the data type of a NumPy array using the dtype property.
number = numpy.array([1, 2, 3, 4])        # all int
print(number)
print(number.dtype)
num = numpy.array([1.0, 2, 3, 4])         # one float, so every element becomes float
print(num)
print(num.dtype)
numbers = numpy.array([1, 2, 3, '4.0'])   # one string, so every element becomes a string
print(numbers)
print(numbers.dtype)
________________________
[1 2 3 4]
int64
[1. 2. 3. 4.]
float64
['1' '2' '3' '4.0']
<U21
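
When an array ends up as strings like the last one above, astype can convert it back to numbers. The following is a minimal sketch; the variable name as_float is made up for the illustration.

```python
import numpy

numbers = numpy.array(["1", "2", "3", "4.0"])   # string array, as in the last example above
as_float = numbers.astype(float)                # parse every string as a float
print(as_float.dtype)
print(as_float)
```

astype returns a new array and leaves numbers unchanged. It raises a ValueError if any string cannot be parsed as a number.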

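# When NumPy can't convert a value to a numeric data type like float or integer, it uses a special nan value that stands for Not a Number
# nan marks missing data
# 1.98600000e+03 is scientific notation for 1.986 * 10 ^ 3
# The output below is what world_alcohol looks like when the file is read with genfromtxt's default float dtype instead of dtype='str'
world_alcohol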
_______________
array([[             nan,              nan,              nan,              nan,              nan],
       [  1.98600000e+03,              nan,              nan,              nan,   0.00000000e+00],
       [  1.98600000e+03,              nan,              nan,              nan,   5.00000000e-01],
       ...,
       [  1.98700000e+03,              nan,              nan,              nan,   7.50000000e-01],
       [  1.98900000e+03,              nan,              nan,              nan,   1.50000000e+00],
       [  1.98500000e+03,              nan,              nan,              nan,   3.10000000e-01]])
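
The nan behaviour can be reproduced on a tiny sample. This sketch assumes an in-memory buffer (io.StringIO) with rows copied from the earlier output instead of the real file, and uses numpy.isnan to locate the fields that could not be converted.

```python
import io
import numpy

# Same kind of rows as world_alcohol.txt, kept in memory for the example
sample_text = (
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# No dtype given, so genfromtxt tries to read every field as a float
as_float = numpy.genfromtxt(io.StringIO(sample_text), delimiter=",")
missing = numpy.isnan(as_float)   # True wherever a field could not be converted
print(as_float)
print(missing.sum())              # number of nan entries
```

The header row and the three text columns turn into nan, while Year and Display Value survive as floats.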

world_alcohol = numpy.genfromtxt("world_alcohol.txt", delimiter=",", dtype="str", skip_header=1)  # skip the header row
print(world_alcohol)
________________
[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]

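uruguay_other_1986 = world_alcohol[1,4]   # row 1 (the Uruguay row), column 4 (Display Value)
third_country = world_alcohol[2,2]        # row 2 (the third row), column 2 (Country)
print(uruguay_other_1986)
print(third_country)
____________________
0.5
Cte d'Ivoire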
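
Indexing a 2-D array takes the form array[row, column], and both indices start at 0. The sketch below rebuilds the first three data rows shown earlier as a small array (the name rows is made up for the example) so the lookups can be checked by eye.

```python
import numpy

# First three data rows from the output above (header already skipped)
rows = numpy.array([
    ["1986", "Western Pacific", "Viet Nam", "Wine", "0"],
    ["1986", "Americas", "Uruguay", "Other", "0.5"],
    ["1985", "Africa", "Cte d'Ivoire", "Wine", "1.62"],
])

print(rows[1, 4])     # row 1, column 4: Display Value of the Uruguay row
print(rows[2, 2])     # row 2, column 2: Country of the third row
print(rows[-1, -1])   # negative indices count from the end
```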

vector = numpy.array([5, 10, 15, 20])
print(vector[0:3])
______________
[ 5 10 15]

matrix = numpy.array([[5, 10, 15], [20, 25, 30],[35, 40, 45]])
print(matrix)
print(matrix[:,1])   # all rows, second column
__________________
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]
[10 25 40]

matrix = numpy.array([[5, 10, 15], [20, 25, 30],[35, 40, 45]])
print(matrix[:,0:2])   # all rows, first two columns (the stop index 2 is excluded)
________________
[[ 5 10]
 [20 25]
 [35 40]]

matrix = numpy.array([[5, 10, 15], [20, 25, 30],[35, 40, 45]])
print(matrix[1:3,0:2])
________________
[[20 25]
 [35 40]]
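
One detail worth knowing about the slices above: basic slicing returns a view of the original array, not a copy. The sketch below illustrates this; the names block and independent are made up for the example.

```python
import numpy

matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

block = matrix[1:3, 0:2]      # rows 1 and 2, columns 0 and 1
block[0, 0] = -1              # writes through to matrix, because block is a view
print(matrix[1, 0])

independent = matrix[1:3, 0:2].copy()   # .copy() gives a separate array
independent[0, 0] = 99                  # matrix is not affected this time
print(matrix[1, 0])
```

Calling .copy() is the usual way to get a slice that can be modified without touching the source array.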

import numpy
# The comparison checks the value on the right against each element in the vector
# If the values are equal, the result for that element is True; otherwise it is False
vector = numpy.array([5, 10, 15, 20])  # each element is compared in turn, giving an array of bool values
vector == 15   # "==" performs the element-wise comparison
______________
array([False, False,  True, False])

vector = numpy.array([5, 10, 15, 20])
print(vector)
_____________
[ 5 10 15 20]

matrix = numpy.array([[5, 10, 15], [20, 25, 30],[35, 40, 45]])
matrix == 25
______________
array([[False, False, False],
       [False,  True, False],
       [False, False, False]])
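
A Boolean array like the one above can be summarized directly. As a small sketch, any() reports whether at least one element matched, and sum() counts the matches because True counts as 1. The name is_25 is made up for the example.

```python
import numpy

matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

is_25 = (matrix == 25)          # element-wise comparison, same shape as matrix
print(is_25.any())              # is there at least one 25?
print((matrix > 20).sum())      # how many elements are greater than 20?
```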

# Comparing vector to the value 10 generates a new Boolean vector [False, True, False, False]. The result can be assigned to a variable (ab in the commented lines below)
vector = numpy.array([5, 10, 15, 20])
vector == 10
# ab = (vector == 10)
# print (ab)
# print(vector[ab])
_________________
array([False,  True, False, False])

ab =(vector == 10)
print(ab)
__________
[False  True False False]

print(vector[ab])    # indexing with a bool array returns only the elements where the mask is True, as an array; if nothing is True, the result is an empty array
____________
[10]
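
Boolean masks can also be combined before indexing. The sketch below uses & (and) and | (or); the parentheses around each comparison are required because & and | bind more tightly than ==. The variable names are made up for the example.

```python
import numpy

vector = numpy.array([5, 10, 15, 20])

ten_and_five = (vector == 10) & (vector == 5)   # no element can be both
ten_or_five = (vector == 10) | (vector == 5)

print(vector[ten_or_five])    # elements equal to 10 or 5
print(vector[ten_and_five])   # empty array: no True in the mask
```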

matrix = numpy.array([[5, 10, 15], [20, 25, 30],[35, 40, 45]])
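second_column_25 = (matrix[:,1] == 25)     # for every row, check whether the value in the second column equals 25
print(second_column_25)
print(matrix[second_column_25, :])      # use the mask as the row index: print all columns of the rows where the second column is 25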
_______________
[False  True False]
[[20 25 30]]
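
The same kind of mask can be used on the left side of an assignment to change only the matching entries. A minimal sketch, reusing the matrix above; the replacement value 10 is arbitrary.

```python
import numpy

matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

second_column_25 = (matrix[:, 1] == 25)   # mask over rows
matrix[second_column_25, 1] = 10          # overwrite column 1 only in the matching rows
print(matrix)
```

Only the row whose second column was 25 is modified; the other rows are left as they were.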

Link: https://pan.baidu.com/s/16KviRbUoV1j4MZKWopz7FA  Extraction code: 7vrk
