python - Add a row to a pandas.DataFrame

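As far as I understand, pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one. What is the best way to do this?

I successfully created an empty DataFrame with:

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill in one field with:

res = res.set_value(len(res), 'qty1', 10.0)

It works, but it seems very odd :-/ (and it fails when adding a string value).

How can I add a new row, with columns of different types, to my DataFrame?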

PhE asked 2019-01-25T09:34:12Z

18 answers

305 votes

An example following @Nasser's answer:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     # the upper bound of np.random.randint is exclusive, so 2 yields -1, 0 or 1
...     df.loc[i] = [np.random.randint(-1, 2) for n in range(3)]
...
>>> print(df)
  lib qty1 qty2
0   0    0   -1
1  -1   -1    1
2   1   -1    1
3   0    0    0
4   1   -1   -1

[5 rows x 3 columns]

fred answered 2019-01-25T09:34:24Z

231 votes

You can use pandas.concat() or DataFrame.append(). For details and examples, see "Merge, join, and concatenate" in the pandas documentation.
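
DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pandas.concat() is the option that remains. A minimal sketch, assuming the rows arrive as small DataFrames with identical columns (the names `pieces` and `result` are illustrative and not from the answer):

```python
import pandas as pd

# Hypothetical input: each piece is a one-row DataFrame with the same columns.
pieces = [
    pd.DataFrame({"lib": ["a"], "qty1": [1.0], "qty2": [2.0]}),
    pd.DataFrame({"lib": ["b"], "qty1": [3.0], "qty2": [4.0]}),
]

# Concatenate once at the end; ignore_index=True renumbers the rows 0..n-1.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Collecting the pieces in a list and calling concat a single time avoids copying the growing frame on every iteration.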

NPE answered 2019-01-25T09:34:48Z

229 votes

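If you can get all the data for the DataFrame up front, there is a much faster approach than appending to a DataFrame:

1. Create a list of dictionaries in which each dictionary corresponds to one input data row.

2. Create a DataFrame from this list.

I had a similar task where appending to a DataFrame row by row took 30 minutes, while creating a DataFrame from a list of dictionaries completed within seconds.

rows_list = []
for row in input_rows:
    dict1 = {}
    # get input row in dictionary format
    # key = col_name
    dict1.update(blah..)
    rows_list.append(dict1)

df = pd.DataFrame(rows_list)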
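
The snippet above leaves the row conversion elided. As a runnable sketch of the same idea, assuming the input is a list of tuples (`input_rows` and its contents are made up here purely for illustration):

```python
import pandas as pd

# Hypothetical input rows; in practice these might come from a file or an API.
input_rows = [("a", 1.0, 2.0), ("b", 3.0, 4.0), ("c", 5.0, 6.0)]

rows_list = []
for lib, qty1, qty2 in input_rows:
    # One dict per row: key = column name, value = cell value.
    rows_list.append({"lib": lib, "qty1": qty1, "qty2": qty2})

# Build the DataFrame in a single call instead of growing it row by row.
df = pd.DataFrame(rows_list)
print(df.dtypes)
```

Because the frame is built in one step, pandas can infer a dtype per column, which also covers the question's case of columns with different types.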

ShikharDua answered 2019-01-25T09:35:42Z

69 votes

If you know the number of entries ahead of time, you should preallocate the space by also providing the index (taking the data example from a different answer):

import pandas as pd
import numpy as np

# we know we're gonna have 5 rows of data
numberOfRows = 5

# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2'))

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    # loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]

In[23]: df
Out[23]:
  lib qty1 qty2
0  -1   -1   -1
1   0    0    0
2  -1    0   -1
3   0   -1    0
4  -1    0    0

Speed comparison

In[30]: %timeit tryThis() # function wrapper for this answer
1000 loops, best of 3: 1.23 ms per loop

In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
100 loops, best of 3: 2.31 ms per loop

And, from the comments, with a size of 6000 the speed difference becomes even larger:

Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s

FooBar answered 2019-01-25T09:36:31Z

58 votes

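For efficient appending, see "How to add an extra row to a pandas dataframe" and "Setting With Enlargement".

Add rows through loc/ix on non-existent key index data. For example:

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]:
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64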

Or:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                  columns=['A','B'])
   .....:

In [2]: dfi
Out[2]:
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

Nasser Al-Wohaibi answered 2019-01-25T09:37:08Z

51 votes

import pandas as pd

mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    # len(df) is the next free integer label on a default 0..n-1 index
    df.loc[len(df)] = row
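
The same pattern also handles rows whose columns have different types, which is what the question asks for. A small sketch, with made-up values and a final `infer_objects()` call that is an addition of this example rather than part of the answer:

```python
import pandas as pd

df = pd.DataFrame(columns=["lib", "qty1", "qty2"])

# Each assignment to a label that does not exist yet enlarges the frame.
# len(df) is only a safe "next label" while the index is the default 0..n-1.
df.loc[len(df)] = ["apple", 10.0, 1]
df.loc[len(df)] = ["pear", 2.5, 7]

# A frame created with only column names starts with object-dtype columns;
# infer_objects() asks pandas to pick more specific dtypes where it can.
df = df.infer_objects()
print(df)
```

If rows have been deleted or the index is not a plain 0..n-1 range, `len(df)` may collide with an existing label and overwrite that row instead of adding a new one.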

Lydia answered 2019-01-25T09:37:24Z

37 votes

You can append a single row as a dictionary using the ignore_index option.

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black
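
On pandas 2.0 and later, where DataFrame.append() no longer exists, roughly the same result can be obtained with pandas.concat(). A sketch under that assumption; the helper name `append_row` is hypothetical and not a pandas function:

```python
import pandas as pd

def append_row(df, row):
    """Return a new DataFrame with the dict `row` added as the last row."""
    # Wrap the dict in a list so pandas builds a one-row DataFrame from it.
    new_row = pd.DataFrame([row])
    return pd.concat([df, new_row], ignore_index=True)

f = pd.DataFrame({"Animal": ["cow", "horse"], "Color": ["blue", "red"]})
f = append_row(f, {"Animal": "mouse", "Color": "black"})
print(f)
```

Like append(), this returns a new frame rather than modifying `f` in place, so the result has to be assigned back.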

W.P. McNeill answered 2019-01-25T09:37:54Z

32 votes

For a Pythonic way, here is my answer:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

hkyi answered 2019-01-25T09:38:18Z

17 votes

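It has been a long time, but I ran into the same problem and found a lot of interesting answers here, so I was confused about which method to use.

When adding a large number of rows to a DataFrame, I am interested in speed, so I tried the 3 most popular methods and measured how fast they are.

Speed performance

1. Using .append (NPE's answer)

2. Using .loc (fred's answer and FooBar's answer)

3. Using a dict and creating the DataFrame at the end (ShikharDua's answer)

Results (in seconds):

Adding     1000 rows   5000 rows   10000 rows
.append    1.04        4.84        9.56
.loc       1.16        5.59        11.50
dict       0.23        0.26        0.34

So I add rows through a dictionary myself.

Code: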

import pandas
import numpy
import time

numOfRows = 10000

# Method 1: .append
startTime = time.perf_counter()
df1 = pandas.DataFrame(numpy.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range(1, numOfRows):
    df1 = df1.append(dict((a, numpy.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))

# Method 2: .loc
startTime = time.perf_counter()
df2 = pandas.DataFrame(numpy.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range(1, numOfRows):
    df2.loc[df2.index.max()+1] = numpy.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))

# Method 3: list of dicts, DataFrame created once at the end
startTime = time.perf_counter()
row_list = []
for i in range(0, 5):
    row_list.append(dict((a, numpy.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range(1, numOfRows):
    dict1 = dict((a, numpy.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)
df3 = pandas.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))

P.S. I believe my implementation is not perfect, and there may be room for optimization.
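
For anyone who wants to repeat the comparison on a pandas version without DataFrame.append(), the following is a reduced sketch that times only the .loc and dict approaches with the standard timeit module. The function names and the row count are choices made for this example, and the timings it prints will differ from the table above:

```python
import timeit

import numpy as np
import pandas as pd

COLUMNS = ["A", "B", "C", "D", "E"]

def build_with_loc(n):
    """Grow the frame one row at a time through .loc enlargement."""
    df = pd.DataFrame(columns=COLUMNS)
    for i in range(n):
        df.loc[i] = np.random.randint(100, size=len(COLUMNS))
    return df

def build_from_dicts(n):
    """Collect plain dicts first, then construct the frame once."""
    rows = [dict(zip(COLUMNS, np.random.randint(100, size=len(COLUMNS))))
            for _ in range(n)]
    return pd.DataFrame(rows, columns=COLUMNS)

n_rows = 500  # kept small so the sketch finishes quickly
for build in (build_with_loc, build_from_dicts):
    seconds = timeit.timeit(lambda: build(n_rows), number=3)
    print(f"{build.__name__}: {seconds:.3f} s for 3 runs of {n_rows} rows")
```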

Mikhail_Sam answered 2019-01-25T09:40:30Z

13 votes

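This is not an answer to the OP's question, but a toy example to illustrate @ShikharDua's answer, which I found very useful.

While this snippet is trivial, in the actual data I had 1,000 rows and many columns, and I wanted to be able to group by different columns and then compute the statistics below for more than one target column. So having a reliable method for building the DataFrame one row at a time was a great convenience. Thank you @ShikharDua!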

import pandas as pd

BaseData = pd.DataFrame({'Customer': ['Acme','Mega','Acme','Acme','Mega','Acme'],
                         'Territory': ['West','East','South','West','East','South'],
                         'Product': ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer', 'Num Unique Products', 'List Unique Products']

rows_list = []
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd = {}  # initialise an empty dict
    RecordtoAdd.update({'Customer': name})
    RecordtoAdd.update({'Num Unique Products': len(pd.unique(group['Product']))})
    RecordtoAdd.update({'List Unique Products': pd.unique(group['Product'])})
    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n', BaseData, '\n\n Analysed Data : \n', AnalysedData)

user3250815 answered 2019-01-25T09:41:03Z

7 votes

You can also build up a list of lists and convert it to a DataFrame:

import pandas as pd

rows = []
columns = ['i', 'double', 'square']

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

which gives

   i  double  square
0  0       0       0
1  1       2       1
2  2       4       4
3  3       6       9
4  4       8      16
5  5      10      25

Brian Burns answered 2019-01-25T09:41:45Z

5 votes

Create a new record (a DataFrame) and add it to old_data_frame.

Pass a list of values and the corresponding column names to create new_record (a DataFrame):

new_record = pd.DataFrame([[0,'abcd',0,1,123]], columns=['a','b','c','d','e'])
old_data_frame = pd.concat([old_data_frame, new_record])

Jack Daniel answered 2019-01-25T09:42:17Z

4 votes

Figured out a simple and nice way:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

Qinsi answered 2019-01-25T09:42:44Z

3 votes

Another way to do it (probably not very performant):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

import pandas as pd

def add_row(self, row):
    self.loc[len(self.index)] = row

pd.DataFrame.add_row = add_row

qed answered 2019-01-25T09:43:19Z

1 votes

To keep it simple: take a list as input and append it as a row of the DataFrame:

import pandas as pd

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
for i in range(5):
    # read one line of whitespace-separated integers from standard input
    res_list = list(map(int, input().split()))
    res = res.append(pd.Series(res_list, index=['lib','qty1','qty2']), ignore_index=True)

Vineet Jain answered 2019-01-25T09:43:49Z

1 votes

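This is a way to add a row to a pandas DataFrame:

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1
    return df.sort_index()

add_row(df, [1,2,3])

It can be used to insert a row into an empty or a populated pandas DataFrame. Note that after sort_index() the new row ends up first, not last.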
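
To make the effect of the three steps concrete, here is a small usage sketch on a one-row frame. The sample data is invented for this example; the function body is the one given above:

```python
import pandas as pd

def add_row(df, row):
    # Same steps as the function above: label the new row -1,
    # shift every label up by one, then sort so the new row becomes row 0.
    df.loc[-1] = row
    df.index = df.index + 1
    return df.sort_index()

df = pd.DataFrame({"A": [10], "B": [20], "C": [30]})
df = add_row(df, [1, 2, 3])
print(df)  # the new row [1, 2, 3] comes first, the original row second
```

The function also mutates the frame passed in before returning the sorted copy, so the return value should be assigned back as shown.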

Shivam Agrawal answered 2019-01-25T09:44:33Z

0 votes

import pandas as pd

# Assumes `rows` is a list of row value lists and `columns` is the list of column names.
t1 = pd.DataFrame()
for i in range(len(rows)):
    # add rows as columns
    t1[i] = list(rows[i])
t1 = t1.transpose()
t1.columns = list(columns)

Vicky answered 2019-01-25T09:44:49Z

-1 votes

This takes care of adding an item to an empty DataFrame. The issue is that df.index.max() == nan for the first index:

import math
import pandas as pd

df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])
df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]

tomatom answered 2019-01-25T09:45:13Z
