Module 7: Bucketing and pivoting numeric data 1

Preface
Download materials
7.0 This week's topics
7.1 Pattern 1: Filtering and aggregating csv data
- 7.1.1 Small example: aggregation
- 7.1.2 The split method, the index operator, and skipping the first line
- 7.1.3 Question: Summer rain (hands-on exercise)
7.2 Pattern 2: Field-dependent aggregation
- 7.2.1 Small example: aggregating with a dictionary
- 7.2.2 What is the dictionary type?
- 7.2.3 Updating and creating entries & Testing for existing keys & Formatting output from dictionaries
- 7.2.4 Question: 9am humidity (hands-on exercise)
7.3 Pattern 3: Simple slice-and-dice
- 7.3.1 Small example: slice-and-dice
- 7.3.2 Missing or empty aggregates
- 7.3.3 Question: Where does the wind come from? (hands-on exercise)
7.4 Pattern 4: Data binning
- 7.4.1 Small example: data binning
- 7.4.2 Aggregating in lists & Alternative aggregation
- 7.4.3 Question: Temperature ranges (hands-on exercise)
Summary

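Preface

From Module 7 onwards, the material is explained in a new way that should be easier to follow: each chapter makes clear what it covers, what you need to learn, and how to write the code. Module 3 to Module 6 will be updated and improved when time allows.
I try to be as detailed as possible and explain every step. There is more than one correct answer, so experts are welcome to leave a comment. Other chapters can be found on my homepage. Blank lines and # comments do not count towards the code line numbers mentioned in the explanations, so skip them when counting lines. Do not copy the Question code directly: think it through, write it yourself, and reflect on it.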

Download materials

Data file links:
climate_data_Dec2017.csv climate_data_2017.csv

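7.0 This week's topics

This week's module is about processing numeric data. Tabular data is commonly stored in the comma-separated values (CSV) format.
Example of tabular data:

A	B	C
12	34	56
78	90	01

The same data in CSV format:

A , B , C
12 , 34 , 56
78 , 90 , 01

To process data efficiently, a new data container type is introduced: the dictionary. A dictionary is similar to the list introduced in the previous module, but instead of using a positional index it maps each value to a specified key. This module shows how to (and how not to) use dictionaries effectively to store and process data of arbitrary type.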
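As a minimal sketch of the list-versus-dictionary contrast described above (the city names and temperatures are made-up illustration values, not taken from the course files), a list is read by position while a dictionary is read by key:

```python
# Hypothetical sample values, for illustration only.
temps_list = [26.6, 31.2, 22.4]                                # looked up by position
temps_dict = {"Sydney": 26.6, "Perth": 31.2, "Hobart": 22.4}   # looked up by key

second_temp = temps_list[1]        # index 1 is the second element
perth_temp = temps_dict["Perth"]   # the key names the value we want

print(second_temp)
print(perth_temp)
```

With the list you have to remember which position belongs to which city; with the dictionary the key carries that meaning.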
What you need to learn:

Filtering and aggregating CSV data
Field-dependent aggregation
Simple slice-and-dice
Data binning

7.1 Pattern 1: Filtering and aggregating csv data

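This section has three parts:

An example that finds the average maximum temperature on windy days, written as an aggregation program.
Splitting with split, the index operator, and skipping the first line.
Question

7.1.1 Small example: aggregation

Here is some data from Australian weather stations xxxxx. We want to find the average maximum temperature on windy days, that is, days on which the wind speed exceeded 60 km/h. The temperature is recorded in the 6th column and the wind speed in the 11th column. Here is the program: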

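# First initialise the variables needed for the calculation
temp = 0  # running total of the temperatures
count = 0  # number of temperatures added so far
is_first_line = True  # used to skip the first line inside the loop
for row in open('climate_data_Dec2017.csv'):
    if is_first_line:
        is_first_line = False  # set to False on the first iteration so the header is skipped
    else:
        values = row.split(',')  # split the line into a list of columns, using the comma as separator
        wind_speed = float(values[10])  # the wind speed is in the 11th column, i.e. list index 10
        if wind_speed > 60:  # only continue if the wind speed is above 60
            count += 1  # count this day; needed for the mean later
            temp += float(values[5])  # add the temperature of this day to the total
mean = temp / count  # total temperature divided by the number of days gives the mean
print("Average max. temperature on windy days:", mean)
-----Output-----
Average max. temperature on windy days: 26.1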
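The same filter-then-aggregate pattern can be tried without the data file. The sketch below uses a small in-memory list of made-up rows (a hypothetical three-column layout, not the real file's columns) and adds a guard for the case where no row passes the filter, which would otherwise cause a division by zero:

```python
# Hypothetical rows: date, max temperature, wind speed (km/h).
rows = [
    "date,max_temp,wind_speed",
    "2017-12-01,24.0,72.0",
    "2017-12-02,30.5,41.0",
    "2017-12-03,28.0,65.0",
]

total = 0.0
count = 0
for row in rows[1:]:                # slicing from 1 skips the header row
    values = row.split(",")
    if float(values[2]) > 60:       # filter: windy days only
        total += float(values[1])   # aggregate: add up the max temperatures
        count += 1

# Guard against dividing by zero when no row passes the filter.
mean = total / count if count > 0 else None
print("Average max. temperature on windy days:", mean)
```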

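7.1.2 The split method, the index operator, and skipping the first line

The split method reviewed

Link: 菜鸟教程 (Runoob tutorial): the split method.

The split method takes a separator as its argument and splits the data at that separator.

row = "2017-01-01,22.7,26.6"
values = row.split(',')
print(values)
-----------Output----------
['2017-01-01', '22.7', '26.6']

Each time split meets a comma in row, it starts a new list element. For tabular data in which every line holds comma-separated values, this technique can therefore be used to extract the individual values.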
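One detail worth knowing when the row comes from a file rather than a string literal: each line usually ends with a newline character, and split(',') leaves it attached to the last element. The sketch below (same sample row, with a newline added as an assumption about how the line is read) shows the difference that rstrip makes:

```python
# A line as it typically arrives from a file: it ends with "\n".
row = "2017-01-01,22.7,26.6\n"

raw_values = row.split(",")
clean_values = row.rstrip("\n").split(",")

print(raw_values)     # the last element still carries the newline
print(clean_values)
```

float() ignores surrounding whitespace, so numeric columns convert correctly either way, but a text column at the end of a row would keep the newline unless it is stripped.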

The index operator reviewed

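Data containers such as lists and strings support indexing: they are ordered, and every element has a fixed position. The position of an element in a list or string is called its index. Indexing in Python, as in most programming languages, starts at zero, so index 1 actually refers to the second element.

row = "2017-01-02,21.2,26.3,0.4,3.4"
values = row.split(',')
print(row[1])  # the result is the 0 in 2017
print(values[1])  # the result is 21.2
----------Output----------
0
21.2

Negative (reverse) indexing: the indices -1, -2, -3 refer to the last element, the second-to-last element, the third-to-last element, and so on. This notation is very convenient when you want to access elements near the end of a container. As with positive indices, take care not to request an index that lies outside the container.

row = "2017-01-02,21.2,26.3,0.4,3.4"
values = row.split(',')
print(values[-1])  # last element: 3.4
print(values[-2])  # second-to-last element: 0.4
----------Output----------
3.4
0.4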
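To illustrate the warning about out-of-range indices, here is a small sketch using the same sample row. It shows that -1 and len(values) - 1 name the same element, and that asking for a position that does not exist raises IndexError:

```python
row = "2017-01-02,21.2,26.3,0.4,3.4"
values = row.split(",")

last = values[-1]                  # negative index: last element
same = values[len(values) - 1]     # equivalent positive index

# The list has 5 elements (indices 0 to 4), so index 5 does not exist.
try:
    values[5]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last, same, out_of_range)
```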

Skipping lines in files

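The file used in the examples has a header line. That line contains a description of the data rather than data, so we do not want to filter or convert it. To skip it, we create a Boolean variable that tells us whether we are processing the first line of the file:

# is_first_line starts as True, so on the first pass through the for loop the if branch runs and sets it to False; the header is therefore not processed.
# Strictly speaking the first line is not skipped: it is still read, but nothing is done with it. All real work happens in the else block.
is_first_line = True
for row in open("climate_data_Dec2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(',')
        wind_speed = float(values[10])
        if wind_speed > 65:
            print(values[0], values[2], wind_speed)
----------Output----------
2017-12-02 Wollongong 87.0
2017-12-03 Perth 70.0
2017-12-05 Wollongong 69.0

A second way to skip the first line: here line_count is initially set to zero and incremented at the end of every iteration. The if condition only becomes true once the counter has been incremented at least once, which means the if block is not entered during the first iteration of the loop. With a counter variable you can, in principle, skip as many lines as needed, both at the beginning and at the end of a file, for example footer lines.

line_count = 0
for row in open("climate_data_Dec2017.csv"):
    if line_count > 0:
        values = row.split(",")
        wind_speed = float(values[10])
        if wind_speed > 65:
            print(values[0], values[2], wind_speed)
    line_count += 1
----------Output----------
2017-12-02 Wollongong 87.0
2017-12-03 Perth 70.0
2017-12-05 Wollongong 69.0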
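Skipping lines at the end of a file requires knowing how many lines there are in total. One possible approach, sketched here with a made-up in-memory list that stands in for the lines of a file (the footer line and the three-column layout are assumptions for illustration), is to hold all lines in a list and slice off the unwanted ones:

```python
# Hypothetical file content with one header line and one footer line.
lines = [
    "date,city,wind_speed",
    "2017-12-02,Wollongong,87.0",
    "2017-12-03,Perth,70.0",
    "2017-12-04,Hobart,30.0",
    "end of data",
]

skip_start = 1   # number of lines to ignore at the top
skip_end = 1     # number of lines to ignore at the bottom

windy = []
for row in lines[skip_start:len(lines) - skip_end]:
    values = row.split(",")
    if float(values[2]) > 65:
        windy.append(values[1])

print(windy)
```

For a real file, the list could be obtained with list(open(filename)), at the cost of holding the whole file in memory.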

7.1.3 Question: Summer rain (hands-on exercise)

Task:

There’s nothing like a refreshing gush of rain after a sweltering hot summer day. Unfortunately, this relief from the heat isn’t a very common occurrence. For the full climate record of 2017, your task is to find the maximum amount of rainfall on days where the maximum temperature was above 35 degrees Celsius.

We have downloaded the full climate record of 2017 from Sydney weather stations and stored the data in a file called climate_data_2017.csv. The maximum temperature record is in the third column (index 2) and the rainfall is in the fourth column (index 3).

Let your program print out the result within the following sentence:

Maximum amount of rainfall on hot days: 3.8 mm
Make sure your program output matches this sample output exactly in capitalisation, punctuation and whitespaces.

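Key points of the task: find the maximum rainfall on days where the maximum temperature was above 35 degrees. The maximum temperature is in the third column (index 2) and the rainfall in the fourth column (index 3). Make sure the output matches the sample output exactly in capitalisation, punctuation and whitespace.
Sample output: Maximum amount of rainfall on hot days: 3.8 mm

Approach: the task asks for a maximum, so use the max() function. Start with a variable initialised to 0 that holds the largest value seen so far. Then skip the first line, split at the commas, pick out the rows whose temperature meets the condition, and use max() to keep the larger of the stored value and the current rainfall.

max_rain = 0
is_first_line = True
for row in open("climate_data_2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        temp = float(values[2])
        if temp > 35:
            rain = float(values[3])
            max_rain = max(max_rain, rain)
print("Maximum amount of rainfall on hot days: "+str(max_rain)+" mm")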

7.2 Pattern 2: Field-dependent aggregation

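Key points of this section:

An example of aggregating data with a Python dictionary, explained step by step.
What is the dictionary type?
Updating and creating entries, testing for keys, and formatting dictionary output.
Question: 9am humidity (hands-on exercise)

7.2.1 Small example: aggregating with a dictionary

When you want to aggregate one field grouped by the values of another field, you can use a Python dictionary, a data container similar to lists and strings.

For example, suppose we want to find the total number of records for each wind direction, i.e. north (N), north-northeast (NNE), northeast (NE) and so on.

Example code:

wind_directions = {}  # start with an empty dictionary
is_first_line = True  # Boolean used to skip the first line
for row in open("climate_data_Dec2017.csv"):  # loop over the file line by line
    if is_first_line:  # first iteration: set to False and skip the header
        is_first_line = False
    else:
        values = row.split(",")  # split the line into a list at each comma
        wdir = values[9]  # the wind direction
        if wdir in wind_directions:  # use in to check whether the dictionary already has an entry for this wind direction
            wind_directions[wdir] += 1  # if so, add 1 to its count
        else:  # if not, create a new entry with the value 1
            wind_directions[wdir] = 1
print("Wind directions by number of days:")
print(wind_directions)
----------OUTPUT----------
Wind directions by number of days:
{'NE': 8, 'E': 2, 'N': 2, 'W': 5, 'NNE': 4, 'NNW': 5, 'NW': 6, 'SSW': 13, 'S': 9, 'WNW': 10, 'SW': 4, 'SSE': 5, 'ENE': 4, 'WSW': 6, 'SE': 7, 'ESE': 2}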
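As an aside, the if/else test above can be written more compactly with the dictionary method get, which returns a default value when the key is missing. This is an alternative to the course example, not a replacement for it; the readings below are made-up values so the sketch runs without the data file:

```python
# Hypothetical wind-direction readings, one per day.
readings = ["NE", "SSW", "NE", "S", "SSW", "SSW"]

wind_directions = {}
for wdir in readings:
    # get() returns the current count, or 0 if the key is not there yet.
    wind_directions[wdir] = wind_directions.get(wdir, 0) + 1

print(wind_directions)
```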

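7.2.2 What is the dictionary type?

A Python dictionary is built from two basic elements: a key and its corresponding value. A key-value pair is called an item. Each key is separated from its value by a colon ( : ), and items are separated by commas. All items of a dictionary are enclosed in curly braces {}.

The basic syntax for creating a dictionary is: dictionary = {key1: value1, key2: value2}

For a more detailed introduction to dictionaries, see:
Link: 菜鸟教程 (Runoob tutorial): Python3 dictionaries
Link: [Python study notes] Chapter 6, Container data types, 6.5 Dictionaries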

7.2.3 Updating and creating entries & Testing for existing keys & Formatting output from dictionaries

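Updating and creating entries

Updating an entry: an entry in a dictionary is updated by assigning to it again:

wind_directions = {'E' : 6 , 'C' : 8 , 'S' : 9}
print('Before:' , wind_directions['E'])
wind_directions['E'] *= 3  # the value for key 'E' is 6; after the update it is 3 * 6
print('After:' , wind_directions['E'])
---------OUTPUT----------
Before: 6
After: 18

Creating an entry: a new item is created by using a new key in the square-bracket operator and assigning a value to it:

wind_directions = {"E": 6, "SE": 2, "SSE": 6}
wind_directions["SW"] = 3  # the dictionary now holds the original entries plus the new one
print(wind_directions)
----------OUTPUT----------
{'E': 6, 'SE': 2, 'SSE': 6, 'SW': 3}

Testing for existing keys

Whenever a dictionary is used in an iterative aggregation, each step usually does one of two things:
Create a new entry
Update an existing entry

To check whether a given key already exists in a dictionary, use the in keyword like this:

wind_directions = {"S": 2, "ESE": 1, "WSW": 1}
if "S" in wind_directions:
    print("S:", wind_directions["S"])
----------OUTPUT----------
S: 2

Checking whether a given key already exists in the dictionary ensures that its value is not overwritten, which is crucial in the aggregation task from the beginning of this section. The general approach is to create a new entry if the key does not exist, and to update the entry if it does.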
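A short sketch of why the test matters: updating a key that does not exist yet fails with KeyError, so the create-or-update decision has to be made first. The dictionary contents here are made-up illustration values:

```python
wind_directions = {"S": 2}

# Updating a key that does not exist yet raises KeyError ...
try:
    wind_directions["NE"] += 1
    update_failed = False
except KeyError:
    update_failed = True

# ... so test for the key first, then create or update.
for wdir in ["NE", "S"]:
    if wdir in wind_directions:
        wind_directions[wdir] += 1   # key exists: update
    else:
        wind_directions[wdir] = 1    # key missing: create

print(update_failed, wind_directions)
```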

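Formatting output from dictionaries

To format the contents of a dictionary more nicely, two things are needed:

Print the dictionary items in a fixed order (determined by their keys)

Print each item on its own line

For the first point, the dictionary keys need to be sorted. Strings should be sorted alphabetically and numbers by their value. Fortunately, Python has a built-in function called sorted that works as follows:

letters = ["NE", "ENE", "NNE", "W", "E"]
numbers = [5, 2, 1, 4, 3]
print(sorted(letters))
print(sorted(numbers))
----------OUTPUT----------
['E', 'ENE', 'NE', 'NNE', 'W']
[1, 2, 3, 4, 5]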
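For the second point, a for loop over the sorted keys prints one item per line. The sketch below uses a small made-up dictionary; passing a dictionary to sorted returns a sorted list of its keys:

```python
wind_directions = {"NE": 8, "E": 2, "W": 5}

# sorted() on a dictionary returns its keys in sorted order.
ordered_keys = sorted(wind_directions)

for key in ordered_keys:
    print(key, ":", wind_directions[key])   # one item per line
```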

7.2.4 Question: 9am humidity (hands-on exercise)

For the climate data of the first week of December 2017, your task is to find the highest value of the 9am humidity (stored in column 12, starting from zero) for each state.

Using the provided climate_data_Dec2017.csv file, create a dictionary and aggregate the humidity records as values with the states as keys.

Let your program print out all dictionary items sorted by their key and with one item per line. The output should look like this:
NSW : 100.0
NT : 70.0
QLD : 99.0
SA : 84.0
VIC : 98.0
WA : 89.0
If you’re unsure how to start this problem, take another close look at the example on the two previous slides.

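Task summary: find the highest 9am humidity (stored in column 12) for each state. Create a dictionary and aggregate the humidity records as values, with the states as keys.

Approach: create a new dictionary to hold the key-value pairs, skip the first line, split each line with split, pick out the humidity and state values by index, decide between creating and updating with the "Testing for existing keys" method described above, use max to keep the highest value, and sort the keys with sorted.

Example code: understand it, write it yourself, think about it, and do not copy.

states = {}
is_first_line = True
for row in open("climate_data_Dec2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        humidity = float(values[12])
        state = values[1]
        if state in states:
            states[state] = max(humidity, states[state])
        else:
            states[state] = humidity
for key in sorted(states):
    print(key, ":", states[key])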

7.3 Pattern 3: Simple slice-and-dice

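Key points of this section:

An example explained with the slice-and-dice method
Missing or empty entries
Question: Where does the wind come from? (hands-on exercise)

7.3.1 Small example: slice-and-dice

This pattern extends the field-dependent dictionary aggregation from the previous section by combining aggregation with filtering.

Suppose we want to find, for each state, the highest daily rainfall on days where the relative humidity at 9am (column 9am relative humidity (%)) was above 60%. (State names are used as dictionary keys.)

Example code:

# This example extends the pattern from 7.2.1: first create an empty dictionary and a Boolean variable for skipping the file header
max_rain_per_state = {}
is_first_line = True
for row in open("climate_data_Dec2017.csv"):  # open the file and loop over its lines
    if is_first_line:
        is_first_line = False  # skip the header line
    else:
        values = row.split(",")  # split the line with split
        state = values[1]  # the state
        rain = float(values[6])  # the rainfall; remember to convert to float
        humidity = float(values[12])  # the humidity; remember to convert to float
        if humidity > 60:  # filter: only continue if the humidity is above 60
            if state in max_rain_per_state:  # the "Testing for existing keys" method (see 7.2.3)
                max_rain_per_state[state] = max(rain, max_rain_per_state[state])
            else:
                max_rain_per_state[state] = rain
print("Maximum rainfall per state on humid days:")
print(max_rain_per_state)
----------OUTPUT----------
Maximum rainfall per state on humid days:
{'NSW': 19.6, 'VIC': 4.6, 'QLD': 54.8, 'NT': 20.0, 'SA': 2.4}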

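7.3.2 Missing or empty aggregates

If the filter is applied before dictionary values are assigned, it is easy to end up with missing dictionary keys.

In some cases you may not care that a key is missing entirely because none of the data for that key met the filter criterion. In other cases every key should still appear in the result.

A fairly simple modification of the existing example is to use a special value, one that can never occur when data meeting the filter criterion exists. In this problem, -1 may be a suitable special value (sometimes called a sentinel), since rainfall is never negative. To use it, change the order of the dictionary assignment and the filter statement in the code. For example:

Example code:

max_rain_per_state = {}
is_first_line = True
for row in open("climate_data_Dec2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        state = values[1]
        rain = float(values[6])
        humidity = float(values[12])
        if state not in max_rain_per_state:
            if humidity > 60:
                max_rain_per_state[state] = rain
            else:
                max_rain_per_state[state] = -1
        else:
            if humidity > 60:
                max_rain_per_state[state] = max(rain, max_rain_per_state[state])
print("Maximum rainfall per state on humid days:")
print(max_rain_per_state)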
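The effect of the sentinel is easiest to see on a tiny data set. The sketch below uses made-up rows in a hypothetical three-column layout (state, rainfall, humidity) and a slightly condensed version of the logic: every state is first registered with -1, and the filter is applied afterwards. A state with no humid day keeps the sentinel instead of disappearing from the dictionary:

```python
# Hypothetical rows: state, rainfall (mm), 9am humidity (%).
rows = [
    "NSW,19.6,75",
    "WA,0.0,40",
    "NSW,3.2,80",
    "WA,1.2,55",
]

max_rain_per_state = {}
for row in rows:
    values = row.split(",")
    state = values[0]
    rain = float(values[1])
    humidity = float(values[2])
    if state not in max_rain_per_state:
        max_rain_per_state[state] = -1   # sentinel: no humid day seen yet
    if humidity > 60:
        max_rain_per_state[state] = max(rain, max_rain_per_state[state])

print(max_rain_per_state)
```

Because real rainfall values are never negative, max(rain, -1) always replaces the sentinel as soon as one humid day is found.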

7.3.3 Question: Where does the wind come from? (hands-on exercise)

Using the provided climate_data_Dec2017.csv file, your task is to find out how many readings there were across our recorded weather stations for each wind direction on the 26th of December.

The file contains data for more than this particular day, so you will have to use a filtering statement to make sure you only include readings from the 26th.

Aggregate the number of readings in a dictionary. If there aren’t any records for a particular wind direction for 26th of December, it should still appear in the dictionary and have the value zero, as long as there is at least one record for that particular wind direction for other days.

Let your program print out all dictionary items sorted by their key and with one item per line. The output should look like this:
E : 4
ENE : 2
ESE : 1
N : 0
NE : 1
NNE : 1
NNW : 1
NW : 0
S : 0
SE : 3
SSE : 3
SSW : 0
SW : 1
W : 0
WNW : 0
WSW : 0

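Key points of the task: find out how many readings there were across the recorded weather stations for each wind direction on the 26th of December. Aggregate the number of readings in a dictionary. If there are no records for a particular wind direction on the 26th of December, it should still appear in the dictionary with the value zero, as long as there is at least one record for that wind direction on other days.

Approach: set up an empty dictionary and a Boolean for skipping the first line, and split each line with split. Then use two conditions: the first checks whether the wind direction is already a key in the dictionary and, if not, creates it with the value 0; the second checks the date and, if it matches, adds one. Finally, sort with sorted.

Example code:

wind_directions = {}
is_first_line = True
for row in open("climate_data_Dec2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        wdir = values[9]
        date = values[0]
        if wdir not in wind_directions:
            wind_directions[wdir] = 0
        if date == "2017-12-26":
            wind_directions[wdir] += 1
for key in sorted(wind_directions):
    print(key, ":", wind_directions[key])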

7.4 Pattern 4: Data binning

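Key points of this section:

Small example: data binning
Aggregating in lists and alternative aggregation
Question: Temperature ranges (hands-on exercise)

7.4.1 Small example: data binning

Data binning is a common technique in which data is divided among bins, where each bin holds the information that corresponds to a range of values in some data field. For example, one bin might hold the information from those rows where the temperature is between 10 and 15 degrees, another bin the rows with temperatures between 15 and 20 degrees, and so on.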
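As a rough illustration of the temperature bins just described, the sketch below maps a temperature to the 5-degree range it falls into. The function name bin_label and the choice that a boundary value such as 15.0 belongs to the upper bin are assumptions made for this example:

```python
def bin_label(temp, width=5):
    """Return the range a temperature falls into, e.g. '10-15'."""
    lower = int(temp // width) * width   # floor division finds the bin start
    return str(lower) + "-" + str(lower + width)

labels = [bin_label(t) for t in [12.3, 15.0, 19.9]]
print(labels)
```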

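Example task: for each month of data, we want to find the average maximum temperature recorded at the Sydney weather station in that month. Here every bin corresponds to one month, that is, it reflects the information of all dates from the first to the last day of that month.

Example code:

monthly_max_temps = {}  # new dictionary; each value will be the list of temperatures for one month
is_first_line = True  # used to skip the first line
for row in open("climate_data_2017.csv"):
    if is_first_line:  # skip the header
        is_first_line = False
    else:
        values = row.split(",")  # split each line of data into a list of values
        city = values[2]  # the city field
        if city == "Sydney":  # all remaining code is inside this if block, so rows for other cities fall through to the next iteration
            date = values[0]  # for every Sydney row, extract the date and the temperature from the list of values
            # The temperature is used numerically (to compute the mean), so convert it to float straight away
            temp = float(values[5])
            # Dates have the form "yyyy-mm-dd"; splitting at "-" separates year, month and day
            month = date.split("-")[1]
            # Once the month is extracted, check whether the dictionary already has an entry for it
            if month not in monthly_max_temps:
                # No entry yet: create one with the month as key. Square brackets around the variable create a one-element list
                monthly_max_temps[month] = [temp]
            else:
                # The entry already holds a list, so append the new temperature to it
                monthly_max_temps[month].append(temp)
print("Average maximum temperatures per month:")
for key in sorted(monthly_max_temps):
    temps = monthly_max_temps[key]  # the list of temperatures for the given key
    print(key, ":", sum(temps)/len(temps))  # sum adds up all elements; len gives the number of elements
-----------OUTPUT----------
Average maximum temperatures per month:
06 : 18.163333333333338
07 : 19.132258064516133
08 : 19.532258064516128
09 : 23.263333333333343
10 : 23.967741935483872
11 : 23.819999999999997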
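The long decimal tails in this output come from floating-point arithmetic. If a tidier display is wanted, the mean can be rounded or formatted before printing. This is an optional sketch with made-up temperatures, not part of the course example:

```python
# Hypothetical daily maximum temperatures for one month.
temps = [18.0, 17.5, 19.0]

mean = sum(temps) / len(temps)
rounded = round(mean, 1)      # a float rounded to one decimal place
formatted = f"{mean:.1f}"     # a string with exactly one decimal place

print("06", ":", rounded)
print("06", ":", formatted)
```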

7.4.2 Aggregating in lists & Alternative aggregation

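Aggregating in lists:

The example aggregates on several levels: a dictionary stores the temperatures of each month, and a list stores the individual daily temperatures that belong to that month.

To aggregate values in a list, first create a list from the first temperature encountered. As soon as a pair of square brackets is placed around a variable, Python creates a list that holds that variable as its single element. Take a look:

temp = 27.5
temps = [temp]
print(type(temp), type(temps))
----------OUTPUT----------
<class 'float'> <class 'list'>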

Alternative aggregation:

To compute the mean of a series of values, we generally need to know two things: the sum of the values and how many values there are in total. In the example, we store all values in a list and then use the built-in functions sum and len to obtain these two quantities.

The same result can be achieved in two alternative ways: by summing as we go and keeping the running sum and the number of values together in a combined dictionary value, or by using two separate dictionaries for the sum and the number of values.
Storing only the sum and the number of values
For the first alternative, a two-element list can hold the total temperature and the number of values:

monthly_max_temps = {}
is_first_line = True
for row in open("climate_data_2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        city = values[2]
        if city == "Sydney":
            date = values[0]
            temp = float(values[5])
            month = date.split("-")[1]
            if month in monthly_max_temps:
                monthly_max_temps[month][0] += temp
                monthly_max_temps[month][1] += 1
            else:
                monthly_max_temps[month] = [temp, 1]
print("Average maximum temperatures per month:")
for key in sorted(monthly_max_temps):
    temps = monthly_max_temps[key]
    print(key, ":", temps[0]/temps[1])

Using two dictionaries
The second alternative suggested above uses two separate dictionaries to keep track of the sum and the number of values respectively. Take a look at the following code:

summed_temps = {}
number_of_values = {}
is_first_line = True
for row in open("climate_data_2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        city = values[2]
        if city == "Sydney":
            date = values[0]
            temp = float(values[5])
            month = date.split("-")[1]
            if month in summed_temps:
                summed_temps[month] += temp
                number_of_values[month] += 1
            else:
                summed_temps[month] = temp
                number_of_values[month] = 1
print("Average maximum temperatures per month:")
for key in sorted(summed_temps):
    print(key, ":", summed_temps[key]/number_of_values[key])
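To check that the list-based approach and the sum-and-count approach agree, the sketch below runs both side by side on a few made-up (month, temperature) pairs instead of the data file:

```python
# Hypothetical (month, temperature) pairs.
records = [("06", 18.0), ("07", 19.5), ("06", 17.0)]

lists = {}    # approach from the main example: keep every value
totals = {}   # alternative: keep only [sum, count]
for month, temp in records:
    if month in lists:
        lists[month].append(temp)
        totals[month][0] += temp
        totals[month][1] += 1
    else:
        lists[month] = [temp]
        totals[month] = [temp, 1]

for key in sorted(lists):
    mean_from_list = sum(lists[key]) / len(lists[key])
    mean_from_totals = totals[key][0] / totals[key][1]
    print(key, ":", mean_from_list, mean_from_totals)
```

Storing only the sum and the count needs less memory per key, while keeping the full list allows other statistics (such as the minimum or the median) to be computed later.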

7.4.3 Question: Temperature ranges (hands-on exercise)

Using the provided climate_data_Dec2017.csv file, your task is to bin the temperature data into 5-degree ranges and find out which weather stations, that is, which city, fell in which 5-degree range on the 25th of December 2017.

The file contains data for more than this particular day, so you will have to use a filtering statement to make sure you only include readings from the 25th.

For each city, use the maximum temperature recording from column 5 (counting from zero) and find out in which bin it belongs.

We want to use integer numbers from 0 to 7 to specify the bins. Bin 0 will aggregate cities where the temperatures were between 0 and 5, bin 1 between 5 and 10 and so on. You can calculate in which bin a given temperature belongs as follows:
temp = 23.5
print(temp // 5)
----------OUTPUT----------
4.0
To get the bin number, you need to divide the temperature by 5 in a way that ensures that a whole number is the result. This can be done using what is called “floor division” indicated with double-slashes. In this example, we get 4.0, which indicates the 20-25 degrees range.

Use a dictionary with the bin numbers as keys and aggregate the cities for each bin in individual lists. After the aggregation, print out the dictionary items sorted by their keys. The individual dictionary items (the lists) should also be sorted. Overall, the output of your program should look like this:
4.0 : ['Ballarat', 'Canberra', 'Geelong', 'Melbourne', 'Newcastle', 'Wollongong']
5.0 : ['Adelaide', 'Albury', 'Bendigo', 'Darwin']
6.0 : ['Cairns', 'Perth', 'Sunshine C', 'Toowoomba', 'Townsville']
7.0 : ['Brisbane', 'Gold Coast']
There aren’t any temperatures below 20 degrees recorded on that day, so the lowest bin index your program should find is 4.0.

In order to help with the output formatting, we have provided some skeleton code for you in the editor on the right. You can change the name of the dictionary if you want, but if you do remember to change it for the printing as well.

Example code (think it through yourself first):

temp_bins = {}
is_first_line = True
for row in open("climate_data_Dec2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        date = values[0]
        if date == "2017-12-25":
            temp = float(values[5])
            bin_num = temp // 5
            city = values[2]
            if bin_num in temp_bins:
                temp_bins[bin_num].append(city)
            else:
                temp_bins[bin_num] = [city]
for key in sorted(temp_bins):
    print(key, ":", sorted(temp_bins[key]))

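Summary

Think more, practise more, and try to apply what you have learned to new problems.
If a detail in this post is wrong or poorly explained, or if you have a better approach, leave a comment or send me a private message so we can improve together.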


USYD (University of Sydney) DATA1002 detailed assignment walkthrough, Module 7 (new-style explanation)
