Matplotlib

Case: Demographics

The World Bank has estimates of the world population for the years 1950 up to 2100. The years are loaded in your workspace as a list called year, and the corresponding populations (in billions) as a list called pop.

print(year[-1])
print(pop[-1])
Output:
2100
10.85

from matplotlib import pyplot as plt
plt.plot(year, pop)
plt.show()

Let’s start working on the data that Professor Hans Rosling used to build his beautiful bubble chart. It was collected in 2007. Two lists are available for you:

life_exp, which contains the life expectancy for each country
gdp_cap, which contains the GDP per capita (i.e. per person) for each country, expressed in US dollars

print(gdp_cap[-1])
print(life_exp[-1])
Output:
469.70929810000007
43.487

plt.plot(gdp_cap, life_exp)
plt.show()

When you’re trying to assess whether there’s a correlation between two variables, a scatter plot is usually the better choice.

plt.scatter(gdp_cap, life_exp)
plt.xscale('log')  # with GDP per capita on a log scale, the correlation becomes clear
plt.show()

You saw that a higher GDP usually corresponds to a higher life expectancy. In other words, there is a positive correlation. Do you think there’s a relationship between the population and the life expectancy of a country?

import matplotlib.pyplot as plt
plt.scatter(pop, life_exp)
plt.show()

To see how life expectancy in different countries is distributed, let’s create a histogram of life_exp:

import matplotlib.pyplot as plt
plt.hist(life_exp)
plt.show()

In the previous exercise, you didn’t specify the number of bins. In that case, Matplotlib uses 10 bins by default. The number of bins is pretty important: too few bins oversimplify reality and hide the details, while too many bins overcomplicate reality and hide the bigger picture.
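To see the effect of the bin count without drawing anything, here is a small sketch using numpy.histogram(), which computes per-bin counts of the same kind that plt.hist() draws. The values in life_exp_sample are made up for illustration; they are not the course data.

```python
import numpy as np

# Hypothetical life-expectancy values, for illustration only
life_exp_sample = np.array([43.8, 52.9, 58.0, 64.2, 65.5, 71.3,
                            72.8, 74.0, 76.4, 78.2, 80.7, 82.6])

for n_bins in (5, 10, 20):
    counts, edges = np.histogram(life_exp_sample, bins=n_bins)
    # counts has one entry per bin; edges has one more entry than counts
    print(n_bins, "bins:", counts.tolist())
```

With 20 bins and only 12 values, many bins are empty, which is the "too many bins" problem in miniature.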

import matplotlib.pyplot as plt
plt.hist(life_exp, 5)
plt.show()
plt.clf()  # clear the current figure

import matplotlib.pyplot as plt
plt.hist(life_exp, 20)
plt.show()
plt.clf()

Let’s do a similar comparison. life_exp contains life expectancy data for different countries in 2007. You also have access to a second list now, life_exp1950, containing similar data for 1950. Can you make a histogram for both datasets?

import matplotlib.pyplot as plt
plt.hist(life_exp, 15)
plt.show()
plt.clf()
plt.hist(life_exp1950, 15)
plt.show()
plt.clf()

You’re going to work on the scatter plot with world development data: GDP per capita on the x-axis (logarithmic scale), life expectancy on the y-axis. As a first step, let’s add axis labels and a title to the plot.

import matplotlib.pyplot as plt
plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.show()

Let’s customize the x-axis of your world development chart with the xticks() function: the tick values 1000, 10000 and 100000 should be replaced by 1k, 10k and 100k.

import matplotlib.pyplot as plt
plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
plt.xticks(tick_val, tick_lab)
plt.show()

Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let’s change this. Wouldn’t it be nice if the size of each dot corresponded to the country’s population?

import numpy as np
np_pop = np.array(pop)  # store pop as a NumPy array: np_pop
np_pop = np_pop * 2  # double the values to enlarge the markers
plt.scatter(gdp_cap, life_exp, s = np_pop)  # use np_pop as the marker size (s argument)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
plt.show()

The next step is making the plot more colorful! To do this, a list col has been created for you. It’s a list with a color for each country, depending on the continent the country is part of, following this mapping:

continent_colors = {
    'Asia': 'red',
    'Europe': 'green',
    'Africa': 'blue',
    'Americas': 'yellow',
    'Oceania': 'black'
}
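The course supplies col ready-made. A minimal sketch of how such a list could be derived, assuming a hypothetical list continents that gives each country’s continent in the same order as gdp_cap and life_exp:

```python
# Mapping from continent to marker color (same mapping as above)
continent_colors = {
    'Asia': 'red',
    'Europe': 'green',
    'Africa': 'blue',
    'Americas': 'yellow',
    'Oceania': 'black',
}

# Hypothetical per-country continent labels, in the same order as the data lists
continents = ['Asia', 'Europe', 'Africa', 'Americas', 'Asia', 'Oceania']

# One color per country: look up each country's continent in the mapping
col = [continent_colors[c] for c in continents]
print(col)
```

Because col has one entry per point, it can be passed straight to the c argument of plt.scatter().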

plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.show()

Additional customizations: text annotations and gridlines.

plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
plt.grid(True)
plt.show()

Dictionaries

Find the index of a country name, then use it to look up the matching capital:

countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
ind_ger = countries.index('germany')  # index of Germany
print(capitals[ind_ger])
Output:
berlin

Create a dictionary:

countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
europe = { 'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
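print(europe)
Output:
{'spain': 'madrid', 'germany': 'berlin', 'norway': 'oslo', 'france': 'paris'}

(This output, like the other dictionary outputs below, comes from an older Python version. Since Python 3.7, dictionaries keep insertion order, so the keys print in the order they were added.)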

Access the value that belongs to a key:

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
print(europe.keys())
print(europe['norway'])
Output:
dict_keys(['spain', 'germany', 'norway', 'france'])
oslo

Add a new key-value pair to the dictionary:

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
europe['italy'] = 'rome'
print('italy' in europe)
europe['poland'] = 'warsaw'
print(europe)
Output:
True
{'spain': 'madrid', 'germany': 'berlin', 'italy': 'rome', 'norway': 'oslo', 'france': 'paris', 'poland': 'warsaw'}

Update and delete existing key-value pairs:

europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn','norway':'oslo', 'italy':'rome', 'poland':'warsaw','australia':'vienna' }
europe['germany'] = 'berlin'  # update the existing value 'bonn'
del europe['australia']  # remove the incorrect entry
print(europe)
Output:
{'spain': 'madrid', 'germany': 'berlin', 'italy': 'rome', 'norway': 'oslo', 'france': 'paris', 'poland': 'warsaw'}

Nested dictionaries:

europe = {
    'spain': {'capital': 'madrid', 'population': 46.77},
    'france': {'capital': 'paris', 'population': 66.03},
    'germany': {'capital': 'berlin', 'population': 80.62},
    'norway': {'capital': 'oslo', 'population': 5.084}
}
print(europe['france'])
data = {'capital': 'rome', 'population': 59.83}  # sub-dictionary for Italy
europe['italy'] = data  # add it as a new entry
print(europe)
Output:
{'capital': 'paris', 'population': 66.03}
{'spain': {'capital': 'madrid', 'population': 46.77}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'italy': {'capital': 'rome', 'population': 59.83}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'france': {'capital': 'paris', 'population': 66.03}}
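To read a single value from a nested dictionary, chain the square brackets: the first key selects the inner dictionary, the second selects a field inside it. A small sketch reusing part of the europe data above:

```python
europe = {
    'spain': {'capital': 'madrid', 'population': 46.77},
    'france': {'capital': 'paris', 'population': 66.03},
}

# First key picks the country's inner dictionary, second key picks the field
print(europe['france']['capital'])      # paris
print(europe['spain']['population'])    # 46.77

# Loop over the outer dictionary and read one field from each inner one
for country, info in europe.items():
    print(country + ": " + info['capital'])
```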

Pandas

The DataFrame is one of pandas’ most important data structures. It’s basically a way to store tabular data in which you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

my_dict = {'country':names, 'drives_right':dr, 'cars_per_cap':cpc}  # create a dictionary
cars = pd.DataFrame(my_dict)  # build a DataFrame from it
print(cars)
Output:
   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
...

Specify the row labels of a DataFrame by setting its index attribute:

import pandas as pd
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
cars.index = row_labels  # set the row labels
print(cars)
Output:
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
...
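Instead of assigning cars.index after the fact, you can also pass the row labels directly to the DataFrame constructor through its index parameter. A sketch with a shortened, three-country version of the same data:

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan']
dr = [True, False, False]
cpc = [809, 731, 588]
row_labels = ['US', 'AUS', 'JPN']

# index= sets the row labels at construction time
cars = pd.DataFrame({'country': names, 'drives_right': dr, 'cars_per_cap': cpc},
                    index=row_labels)
print(cars)
```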

Putting data in a dictionary and then building a DataFrame works, but it’s not very efficient. What if you’re dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for “comma-separated values”.

import pandas as pd
cars = pd.read_csv('cars.csv')
print(cars)
Output:
  Unnamed: 0  cars_per_cap        country  drives_right
0         US           809  United States          True
1        AUS           731      Australia         False
2        JPN           588          Japan         False
...

Your read_csv() call to import the CSV data didn’t generate an error, but the output is not entirely what we wanted: the row labels were imported as another column without a name. To use that first column as the row labels instead, pass index_col=0:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
print(cars)
Output:
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
...
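You can reproduce the difference without a cars.csv file by reading text from memory with io.StringIO. The CSV content below is a hypothetical three-row excerpt in the layout the output above suggests: an unnamed first column holding the row labels.

```python
import io
import pandas as pd

# Hypothetical excerpt of cars.csv: the first header field is empty
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# Without index_col, the label column becomes a regular column named "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# With index_col=0, the first column is used as the row labels
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars.index.tolist())
```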

Square Brackets

Select columns with square brackets:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars['country'])  # single brackets: a pandas Series
Output:
US     United States
AUS        Australia
JPN            Japan
...
Name: country, dtype: object

print(cars[['country']])  # double brackets: a pandas DataFrame
Output:
           country
US   United States
AUS      Australia
JPN          Japan
...

print(cars[['country', 'drives_right']])
Output:
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
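The difference between single and double brackets shows up in the type of the result. A quick check on a small stand-in DataFrame, built inline so the snippet runs without cars.csv:

```python
import pandas as pd

cars = pd.DataFrame({'country': ['United States', 'Australia', 'Japan'],
                     'drives_right': [True, False, False]},
                    index=['US', 'AUS', 'JPN'])

one = cars['country']      # single brackets: one column as a Series
many = cars[['country']]   # a list of column names gives a DataFrame

print(type(one).__name__)   # Series
print(type(many).__name__)  # DataFrame
print(many.shape)           # (3, 1)
```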

Besides selecting columns, you can also use square brackets with a slice to select rows (observations):

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars[0:3])
print(cars[3:6])
Output:
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
     cars_per_cap  country  drives_right
IN             18    India         False
RU            200   Russia          True
MOR            70  Morocco          True

loc and iloc

With loc and iloc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you specify rows and columns by their row and column labels. iloc is integer-index-based, so you specify rows and columns by their integer positions, like you did in the previous exercise.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars.loc[['JPN']])  # Japan's row, as a DataFrame
Output:
     cars_per_cap country  drives_right
JPN           588   Japan         False

print(cars.iloc[2])  # the same row selected by position, as a Series
Output:
cars_per_cap      588
country         Japan
drives_right    False
Name: JPN, dtype: object

print(cars.loc[['AUS', 'EG']])
Output:
     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True
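Because loc works with labels and iloc with integer positions, the same selection can be written both ways. A sketch on an inline, three-row stand-in for cars (the positions used here belong to this example, not necessarily to cars.csv):

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731, 588],
                     'country': ['United States', 'Australia', 'Japan'],
                     'drives_right': [True, False, False]},
                    index=['US', 'AUS', 'JPN'])

# Label-based: row 'JPN'; position-based: row 2 (counting from 0)
by_label = cars.loc['JPN']
by_position = cars.iloc[2]
print(by_label.equals(by_position))  # True

# A list inside the brackets keeps a DataFrame in both cases
print(cars.loc[['AUS', 'JPN']].equals(cars.iloc[[1, 2]]))  # True
```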

loc and iloc also allow you to select both rows and columns from a DataFrame.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars.loc['MOR', 'drives_right'])  # Morocco's drives_right value
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
Output:
True
     country  drives_right
RU    Russia          True
MOR  Morocco          True

It’s also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end (a bare :) in front of the comma.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars.loc[:, 'drives_right'])
Output:
US      True
AUS    False
JPN    False
...
Name: drives_right, dtype: bool

print(cars.loc[:, ['drives_right']])
Output:
     drives_right
US           True
AUS         False
JPN         False

print(cars.loc[:, ['cars_per_cap', 'drives_right']])
Output:
     cars_per_cap  drives_right
US            809          True
AUS           731         False
JPN           588         False
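The iloc counterparts of these column selections use column positions instead of names. Assuming the column order cars_per_cap, country, drives_right (as in the printed output), a sketch on an inline stand-in:

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731, 588],
                     'country': ['United States', 'Australia', 'Japan'],
                     'drives_right': [True, False, False]},
                    index=['US', 'AUS', 'JPN'])

col_series = cars.iloc[:, 2]     # like cars.loc[:, 'drives_right'] -> Series
col_frame = cars.iloc[:, [2]]    # like cars.loc[:, ['drives_right']] -> DataFrame
two_cols = cars.iloc[:, [0, 2]]  # like cars.loc[:, ['cars_per_cap', 'drives_right']]

print(col_series.equals(cars.loc[:, 'drives_right']))                   # True
print(two_cols.equals(cars.loc[:, ['cars_per_cap', 'drives_right']]))  # True
```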

Logic

Boolean operators with NumPy: np.logical_and(), np.logical_or() and np.logical_not()

import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
print(np.logical_or(my_house > 18.5, my_house < 10))
print(np.logical_and(my_house < 11, your_house < 11))
Output:
[False  True False  True]
[False False False  True]
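np.logical_not(), the third operator in the list, isn’t used in the exercise. A short sketch with the same my_house array; it also shows the element-wise operators ~, & and | that NumPy accepts on boolean arrays (the parentheses around each comparison are required because of operator precedence):

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])

# Element-wise "not": True where the room is NOT bigger than 15 sqm
small = np.logical_not(my_house > 15)
print(small)  # [False False  True  True]

# Equivalent shorthand with ~ on a boolean array
print(np.array_equal(small, ~(my_house > 15)))  # True

# | mirrors np.logical_or (and & mirrors np.logical_and)
print(np.array_equal((my_house > 18.5) | (my_house < 10),
                     np.logical_or(my_house > 18.5, my_house < 10)))  # True
```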

Control Flow

Use if statements to check the room:

room = "kit"
area = 14.0
if room == "kit":
    print("looking around in the kitchen.")
if area > 15:
    print("big place!")
Output:
looking around in the kitchen.

Extend the if statements with else:

room = "kit"
area = 14.0
if room == "kit":
    print("looking around in the kitchen.")
else:
    print("looking around elsewhere.")
if area > 15:
    print("big place!")
else:
    print("pretty small.")
Output:
looking around in the kitchen.
pretty small.

Going further: use elif

room = "bed"
area = 14.0
if room == "kit":
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else:
    print("looking around elsewhere.")
if area > 15:
    print("big place!")
elif area > 10:
    print("medium size, nice!")
else:
    print("pretty small.")
Output:
looking around in the bedroom.
medium size, nice!
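The area check can be wrapped in a small function so each branch is easy to try out. The function name describe_area is made up for this sketch; the thresholds are the ones used above:

```python
def describe_area(area):
    """Return the size message for a room, using the thresholds from the exercise."""
    if area > 15:
        return "big place!"
    elif area > 10:
        return "medium size, nice!"
    else:
        return "pretty small."

for a in [20.0, 14.0, 9.5, 15.0]:
    print(a, describe_area(a))
```

The conditions are checked top to bottom and only the first true branch runs, so 15.0 ends up in the elif branch: it is not greater than 15, but it is greater than 10.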

Filtering

Select the rows where drives_right is True:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
dr = cars['drives_right']  # extract drives_right as a boolean Series
sel = cars.loc[dr]  # keep only the rows where dr is True
print(sel)
Output:
     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

Simplify the code above to a single line:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
sel = cars[cars['drives_right']]
print(sel)

This time you want to find out which countries have a high number of cars per capita.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
cpc = cars['cars_per_cap']
many_cars = cpc > 500
car_maniac = cars[many_cars]
print(car_maniac)
Output:
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False

Find the observations with cars_per_cap between 100 and 500:

import pandas as pd
import numpy as np
cars = pd.read_csv('cars.csv', index_col = 0)
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
print(medium)
Output:
    cars_per_cap country  drives_right
RU           200  Russia          True
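As an alternative to np.logical_and(), pandas Series also support the element-wise & operator, provided each comparison is wrapped in parentheses. A sketch on an inline stand-in for cars:

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
                     'country': ['United States', 'Australia', 'Japan', 'India',
                                 'Russia', 'Morocco', 'Egypt']},
                    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

cpc = cars['cars_per_cap']
# Parentheses are required: & binds more tightly than > and <
medium = cars[(cpc > 100) & (cpc < 500)]
print(medium)
```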

Loops

while loop:

offset = 3
while offset != 0:
    print("correcting...")
    offset = offset - 1
    print(offset)
Output:
correcting...
2
correcting...
1
correcting...
0

while loop with a negative offset:

offset = -3
while offset != 0:
    print("correcting...")
    if offset > 0:
        offset = offset - 1
    else:
        offset = offset + 1
    print(offset)
Output:
correcting...
-2
correcting...
-1
correcting...
0

Loop over a list:

areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for element in areas:
    print(element)
Output:
11.25
18.0
20.0
10.75
9.5

Using a for loop to iterate over a list only gives you access to each list element in turn. If you also want the index information, that is, where the element you’re iterating over is located in the list, you can use enumerate().

areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for index, a in enumerate(areas):
    print("room " + str(index) + ": " + str(a))
Output:
room 0: 11.25
room 1: 18.0
room 2: 20.0
room 3: 10.75
room 4: 9.5

For non-programmers, room 0: 11.25 looks strange. Wouldn’t it be better if the count started at 1?

areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for index, area in enumerate(areas):
    print("room " + str(index + 1) + ": " + str(area))
Output:
room 1: 11.25
room 2: 18.0
room 3: 20.0
room 4: 10.75
room 5: 9.5
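Instead of adding 1 inside the loop, enumerate() accepts a start argument that sets the first index. The same loop, sketched that way:

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# start=1 makes enumerate count from 1 instead of 0
lines = []
for index, area in enumerate(areas, start=1):
    lines.append("room " + str(index) + ": " + str(area))

print("\n".join(lines))
```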

Loop over a list of sublists:

house = [["hallway", 11.25], ["kitchen", 18.0], ["living room", 20.0], ["bedroom", 10.75], ["bathroom", 9.50]]
for x, y in house:
    print("the " + str(x) + " is " + str(y) + " sqm")
Output:
the hallway is 11.25 sqm
the kitchen is 18.0 sqm
the living room is 20.0 sqm
the bedroom is 10.75 sqm
the bathroom is 9.5 sqm

Loop over a dictionary:

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin','norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
for key, value in europe.items():
    print("the capital of " + key + " is " + str(value))
Output:
the capital of austria is vienna
the capital of norway is oslo
the capital of spain is madrid
...

Loop over NumPy arrays:

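import numpy as np
# np_height and np_baseball are assumed to be already loaded in the workspace
for x in np_height:  # iterate over a 1D array
    print(str(x) + " inches")
Output:
74 inches
74 inches
72 inches

import numpy as np
for x in np.nditer(np_baseball):  # iterate over every element of a 2D (or higher) array
    print(x)
Output:
74
74
...
180
215
...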
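Because np_baseball isn’t defined here, this sketch uses a small hypothetical 2D array (height, weight per row) to show the difference: a plain for loop yields whole rows, while np.nditer() yields single elements in the array’s memory order (its default is order='K'). For an array built from nested lists, that is row by row. The recorded output above lists heights before weights, which would be consistent with a column-major array.

```python
import numpy as np

# Hypothetical stand-in for np_baseball: one row per player, columns = height, weight
baseball = np.array([[74, 180],
                     [74, 215],
                     [72, 210]])

# A plain for loop over a 2D array yields one row (a 1D array) per iteration
rows = [row.tolist() for row in baseball]
print(rows)  # [[74, 180], [74, 215], [72, 210]]

# np.nditer yields every single element; for this C-ordered array, row by row
elements = [int(x) for x in np.nditer(baseball)]
print(elements)  # [74, 180, 74, 215, 72, 210]

# Forcing column-major order visits all heights first, then all weights
by_column = [int(x) for x in np.nditer(baseball, order='F')]
print(by_column)  # [74, 74, 72, 180, 215, 210]
```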

Loop over a DataFrame:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
for lab, row in cars.iterrows():
    print(lab)
    print(row)
Output:
US
cars_per_cap              809
country         United States
drives_right             True
Name: US, dtype: object
...

The row data that iterrows() generates on every iteration is a pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Series using square brackets.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
for lab, row in cars.iterrows():
    print(lab + ": " + str(row['cars_per_cap']))
Output:
US: 809
AUS: 731
JPN: 588
...

Add a column to a DataFrame:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
for lab, row in cars.iterrows():  # loop that adds the COUNTRY column
    cars.loc[lab, "COUNTRY"] = row['country'].upper()
print(cars)
Output:
     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JPN           588          Japan         False          JAPAN
...

If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, use apply(), which calls the function on every value of the column.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
cars["COUNTRY"] = cars["country"].apply(str.upper)  # apply str.upper to every value; no loop needed
print(cars)
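pandas also offers vectorized string methods through the .str accessor, so the same column can be built without apply(). A sketch on an inline stand-in for cars:

```python
import pandas as pd

cars = pd.DataFrame({'country': ['United States', 'Australia', 'Japan']},
                    index=['US', 'AUS', 'JPN'])

# apply() calls str.upper once per value
cars['COUNTRY'] = cars['country'].apply(str.upper)

# .str.upper() is the vectorized string-method equivalent
print(cars['country'].str.upper().tolist() == cars['COUNTRY'].tolist())  # True
print(cars)
```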

Case: Hacker Statistics

Randomness has many uses in science, art, statistics, cryptography, gaming, gambling, and other fields. You’re going to use randomness to simulate a game.

All the functionality you need is contained in the random package, a sub-package of numpy. In this exercise, you’ll be using two functions from this package:

seed(): sets the random seed, so that your results are reproducible between simulations. As an argument, it takes an integer of your choosing. Calling the function generates no output.
rand(): if you don’t specify any arguments, it generates a random float between zero (inclusive) and one (exclusive).

import numpy as np
np.random.seed(123) # Set the seed
print(np.random.rand())
Output:
0.6964691855978616
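A quick way to convince yourself that seed() makes results reproducible: reseed with the same value and the same sequence of numbers comes back.

```python
import numpy as np

np.random.seed(123)
first = [np.random.rand() for _ in range(3)]

np.random.seed(123)  # same seed again
second = [np.random.rand() for _ in range(3)]

print(first == second)  # True
print(first[0])         # 0.6964691855978616, as in the output above
```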

Use randint() to generate a random integer:

import numpy as np
np.random.seed(123)
print(np.random.randint(1, 7))  # randint(1, 7) returns an integer from 1 to 6, like a die
print(np.random.randint(1, 7))
Output:
6
3

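Whether you win the Empire State Building game depends on the number you roll at each step. As encoded below: a 1 or 2 takes you one step down, a 3, 4 or 5 takes you one step up, and a 6 means you throw the die again and walk up the number of steps it shows. Simulate a single throw with if/elif/else: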

import numpy as np
np.random.seed(123)
step = 50
dice = np.random.randint(1, 7)
if dice <= 2:
    step = step - 1
elif dice <= 5:
    step = step + 1
else:
    step = step + np.random.randint(1, 7)
print(dice)
print(step)
Output:
6
53

Before, you already wrote Python code that determines the next step based on the previous step. Now it’s time to put this code inside a for loop so that we can simulate a random walk.

import numpy as np
np.random.seed(123)
random_walk = [0]
for x in range(100):
    step = random_walk[-1]  # last element in random_walk
    dice = np.random.randint(1, 7)
    if dice <= 2:
        step = step - 1
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1, 7)
    random_walk.append(step)  # append the new step to random_walk
print(random_walk)
Output:
[0, 3, 4, 5, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 0, -1, ..., 57, 58, 59]

Things are shaping up nicely! You already have code that calculates your location in the Empire State Building after 100 dice throws. However, there’s something we haven’t thought about: you can’t go below 0! A typical way to solve this is to use max().

import numpy as np
np.random.seed(123)
random_walk = [0]
for x in range(100):
    step = random_walk[-1]
    dice = np.random.randint(1, 7)
    if dice <= 2:
        step = max(0, step - 1)  # use max() so that step never drops below 0
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1, 7)
    random_walk.append(step)
print(random_walk)
Output:
[0, 3, 4, 5, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 0, 0, ..., 58, 59, 60]

Plot the walk as a line chart:

import numpy as np
np.random.seed(123)
random_walk = [0]
for x in range(100):
    step = random_walk[-1]
    dice = np.random.randint(1, 7)
    if dice <= 2:
        step = max(0, step - 1)
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1, 7)
    random_walk.append(step)

import matplotlib.pyplot as plt
plt.plot(random_walk)
plt.show()

A single random walk is one thing, but it doesn’t tell you whether you have a good chance of winning the bet. To get an idea of how likely you are to reach 60 steps, you can repeatedly simulate the random walk and collect the results. That’s exactly what you’ll do in this exercise.

import numpy as np
np.random.seed(123)
all_walks = []
for i in range(10):  # simulate the random walk 10 times
    random_walk = [0]
    for x in range(100):
        step = random_walk[-1]
        dice = np.random.randint(1, 7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1, 7)
        random_walk.append(step)
    all_walks.append(random_walk)

import matplotlib.pyplot as plt
np_aw = np.array(all_walks)  # shape (10, 101): one row per walk
plt.plot(np_aw)  # plt.plot() draws one line per column, so these lines are not the walks
plt.show()
plt.clf()
np_aw_t = np.transpose(np_aw)  # transpose so that each column is one walk
plt.plot(np_aw_t)
plt.show()

You’re a bit clumsy: at each throw, you have a 0.1% chance of falling down. That calls for another random number. Basically, you can generate a random float between 0 and 1; if this value is less than or equal to 0.001, you reset step to 0.

import numpy as np
np.random.seed(123)
all_walks = []
for i in range(250):
    random_walk = [0]
    for x in range(100):
        step = random_walk[-1]
        dice = np.random.randint(1, 7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1, 7)
        if np.random.rand() <= 0.001:  # implement clumsiness: fall back to step 0
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)

import matplotlib.pyplot as plt
np_aw_t = np.transpose(np.array(all_walks))
plt.plot(np_aw_t)
plt.show()
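The nested loops can be easier to read if one walk is moved into its own function. This is a refactoring sketch, not part of the original exercise: the function name simulate_walk and its parameters are hypothetical, and it applies the same rules (max() at the bottom, 0.1% clumsiness reset) with the same legacy np.random functions.

```python
import numpy as np

def simulate_walk(n_throws=100, clumsy_p=0.001):
    """Simulate one walk up the Empire State Building using the exercise's rules."""
    random_walk = [0]
    for _ in range(n_throws):
        step = random_walk[-1]
        dice = np.random.randint(1, 7)
        if dice <= 2:
            step = max(0, step - 1)            # can't go below step 0
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1, 7)
        if np.random.rand() <= clumsy_p:       # fall down: back to step 0
            step = 0
        random_walk.append(step)
    return random_walk

np.random.seed(123)
all_walks = [simulate_walk() for _ in range(250)]
np_aw_t = np.transpose(np.array(all_walks))  # shape (101, 250): one column per walk
print(np_aw_t.shape)
```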

All these fancy visualizations have put us on a sidetrack. We still have to solve the million-dollar problem: what are the odds that you’ll reach 60 steps high on the Empire State Building?

Basically, you want to know about the end points of all the random walks you’ve simulated. These end points have a certain distribution that you can visualize with a histogram.

import numpy as np
np.random.seed(123)
all_walks = []
for i in range(500):
    random_walk = [0]
    for x in range(100):
        step = random_walk[-1]
        dice = np.random.randint(1, 7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1, 7)
        if np.random.rand() <= 0.001:
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)

import matplotlib.pyplot as plt
np_aw_t = np.transpose(np.array(all_walks))
ends = np_aw_t[-1, :]  # last row of np_aw_t: the end point of every walk
plt.hist(ends)
plt.show()

print(np.mean(ends >= 60))  # share of walks that end at step 60 or higher
Output:
0.784
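The last line works because NumPy treats True as 1 and False as 0, so the mean of a boolean array is the fraction of True values. A tiny example with made-up end points:

```python
import numpy as np

# Hypothetical end points of five walks
ends = np.array([72, 45, 60, 88, 59])

reached = ends >= 60
print(reached)           # [ True False  True  True False]
print(reached.sum())     # 3 walks ended at step 60 or higher
print(np.mean(reached))  # 0.6, i.e. an estimated 60% chance
```

With 500 simulated walks, the same calculation gives the 0.784 estimate above.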
