pandas: averaging rows that share a date and putting the result back into the DataFrame

Reference: "pandas: find duplicate rows, take their mean, and merge them"

import pandas as pd
import numpy as np
import matplotlib as mpl
%matplotlib inline
from ggplot import *
theme_bw()

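ggplot is a real hassle. Internally it relies on old pandas APIs such as sort, and something date-related is broken as well. The only way to fix it is to edit the package's .py files by hand.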

import numpy as np
import pymongo
import pandas as pd
from bson import ObjectId
import matplotlib as mb
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import plotnine as p9
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
from dateutil import parser
from ggplot import *
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

Importing the data

pcfr = pd.read_excel('hair_dryer.xlsx')
df = pcfr
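# To analyse a single product, replace the title after '==' with the one you want
m = df[df['product_title']=='remington ac2015 t|studio salon collection pearl ceramic hair dryer, deep purple']
# This line overrides the filter above and uses the whole table; comment it out to keep the subset
m = df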

Sentiment analysis function

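Partway through, df['review_date'] = pd.to_datetime(df['review_date']) can fail with a baffling error. The conversion that works on the whole table is not the one that works on a table sliced by brand, so the function below keeps both variants.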
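The post does not show the actual error, so the following is only a sketch under two assumptions: that the trouble comes either from assigning into a filtered slice of the table or from strings that cannot be parsed as dates. Working on a copy and passing errors="coerce" to pd.to_datetime lets the same helper handle both the whole table and a per-brand slice. The sample frame and the helper name with_parsed_dates are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real table comes from hair_dryer.xlsx.
df = pd.DataFrame({
    "product_title": ["a", "a", "b", "b"],
    "review_date": ["8/31/2015", "8/30/2015", "not a date", "8/29/2015"],
})

def with_parsed_dates(frame):
    """Return a copy of frame with review_date parsed to datetime64."""
    out = frame.copy()  # never write into a slice of another DataFrame
    # Unparseable values become NaT instead of raising
    out["review_date"] = pd.to_datetime(out["review_date"], errors="coerce")
    # Drop the rows whose date could not be parsed
    return out.dropna(subset=["review_date"])

whole = with_parsed_dates(df)                               # whole table
brand_a = with_parsed_dates(df[df["product_title"] == "a"])  # one brand
```

Note that errors="coerce" silently discards rows with bad dates, so it is worth comparing len(df) with len(whole) before relying on the result.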

def s_c_f(df):
    # Duplicates: this only counts duplicate rows, the result is not used
    df.duplicated().value_counts()
    # NaN removal: the result of this call is not assigned either
    df['review_body'].str.split(expand=True)
    # Convert the date format
    # For a table sliced by brand, use this:
    # df['review_date'] = df.review_date.apply(lambda x: parser.parse(x))
    # For the whole table, use this:
    df['review_date'] = pd.to_datetime(df['review_date'])
    # Set the date as the index
    df = df.set_index('review_date')

    ## Sentiment analysis
    # Polarity of one review
    def sentiment_calc(text):
        try:
            return TextBlob(text).sentiment.polarity
        except Exception:
            return None

    # Subjectivity of one review
    def sentiment_calc_sub(text):
        try:
            return TextBlob(text).sentiment.subjectivity
        except Exception:
            return None

    df['polarity'] = df['review_body'].apply(sentiment_calc)
    df['subjectivity'] = df['review_body'].apply(sentiment_calc_sub)
    return df

Plotting functions

Plot polarity only

def drawfig_polarity(result):
    # s_c_f made the date the index via set_index, so the yearly mean can be computed like this:
    # m = result.groupby(result.index.year).mean()
    '''reset_index by default does not modify the DataFrame; it returns a new DataFrame with the reset index.
    If you want to modify the original, use the inplace argument: df.reset_index(drop=True, inplace=True).
    Alternatively, assign the result of reset_index by doing df = df.reset_index(drop=True).'''
    new = result.reset_index()
    plot = ggplot(aes(x='review_date', y='polarity'), data=new) + \
        geom_point() + \
        geom_line(color='blue') + \
        stat_smooth(span=0.1)
    return plot
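The two ideas noted inside drawfig_polarity, grouping by result.index.year and the fact that reset_index returns a new frame, can be tried on a toy frame. This is a minimal sketch with made-up polarity values; only the column and index names are taken from the post.

```python
import pandas as pd

# Made-up polarity scores indexed by review date
result = pd.DataFrame(
    {"polarity": [0.2, 0.4, -0.1, 0.3]},
    index=pd.to_datetime(["2014-03-01", "2014-07-15", "2015-01-20", "2015-08-31"]),
)
result.index.name = "review_date"

# Yearly mean: group on the year component of the DatetimeIndex
yearly = result.groupby(result.index.year)["polarity"].mean()

# reset_index returns a new frame with review_date as an ordinary column;
# result itself keeps its DatetimeIndex
flat = result.reset_index()
```

Selecting the column before calling mean() sidesteps the question of how a given pandas version treats non-numeric columns such as review_body.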

Plot all three series together

def drawfig_star(df):
    # s_c_f made the date the index via set_index, so the yearly mean can be computed like this:
    # d = result.groupby(result.index.year).mean()
    # Taking the mean automatically drops the non-numeric columns
    '''reset_index by default does not modify the DataFrame; it returns a new DataFrame with the reset index.
    If you want to modify the original, use the inplace argument: df.reset_index(drop=True, inplace=True).
    Alternatively, assign the result of reset_index by doing df = df.reset_index(drop=True).'''
    # df = d.drop(['helpful_votes','total_votes','help_precentage'], axis=1)
    # Drop columns by name; axis defaults to 0 (rows), axis=1 means columns
    # Put the series on a common scale
    # df['subjectivity'] = df['subjectivity'].map(lambda x: x*45*0.76*0.8*0.7*0.4*1.08)
    # df['polarity'] = df['polarity'].map(lambda x: x*45*0.76*0.8*0.7)
    def normalize(data):
        return (data - data.mean()) / data.std()
    # df['subjectivity'] = df['subjectivity'].map(lambda x: normalize(x))
    df['subjectivity'] = normalize(df['subjectivity'])
    # df['polarity'] = df['polarity'].map(lambda x: normalize(x))
    df['polarity'] = normalize(df['polarity'])
    df['star_rating'] = normalize(df['star_rating'])
    # ggplot
    df['x'] = df.index
    df = pd.melt(df, id_vars='x')
    plot = ggplot(aes(x='x', y='value', color='variable'), df) + \
        geom_point() + stat_smooth(span=0.1) + geom_line()
    plot.make()
    # plot.fig.set_size_inches(30, 5, forward=True)
    # plot.fig.set_dpi(100)
    # plot.fig
    return plot
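The scaling and reshaping steps in drawfig_star can be checked without ggplot. The sketch below uses made-up numbers and applies the same normalize function to every column, then melts the frame into the long format that the color='variable' aesthetic expects. The frame contents are assumptions for illustration only.

```python
import pandas as pd

def normalize(series):
    # z-score: needs the mean and std of the whole column,
    # so it must be applied to a Series, not mapped over single values
    return (series - series.mean()) / series.std()

# Made-up values; the real frame would come from s_c_f
frame = pd.DataFrame(
    {"star_rating": [1, 3, 5], "polarity": [-0.5, 0.0, 0.5]},
    index=pd.to_datetime(["2015-08-29", "2015-08-30", "2015-08-31"]),
)
frame.index.name = "review_date"

# Scale each column independently
scaled = frame.apply(normalize)

# Long format: one row per (date, variable) pair
long = scaled.reset_index().melt(id_vars="review_date")
```

This also shows why the commented-out .map(lambda x: normalize(x)) variants could not work: map passes one scalar at a time, and a scalar has no mean or standard deviation to scale by.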

result = s_c_f(m)

Remove whatever you do not need.
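The step named in the title, averaging rows that share a date, is not written out in the code above. A minimal sketch, assuming the frame is indexed by review_date as s_c_f returns it, is to group on the index and take the mean of the numeric columns. The sample values and the helper name average_duplicate_dates are hypothetical.

```python
import pandas as pd

# Made-up scores with two dates that occur twice
scored = pd.DataFrame(
    {
        "review_date": pd.to_datetime(
            ["2015-08-31", "2015-08-31", "2015-08-30", "2015-08-29", "2015-08-29"]
        ),
        "star_rating": [5, 3, 4, 1, 2],
        "polarity": [0.8, 0.2, 0.5, -0.4, 0.0],
        "review_body": ["great", "ok", "good", "bad", "meh"],
    }
).set_index("review_date")

def average_duplicate_dates(frame, value_cols):
    """Collapse rows sharing an index date into one row of column means."""
    # level=0 groups on the (date) index; only value_cols are averaged
    return frame.groupby(level=0)[value_cols].mean().sort_index()

daily = average_duplicate_dates(scored, ["star_rating", "polarity"])

# Back to an ordinary DataFrame with review_date as a column, ready for plotting
back = daily.reset_index()
```

Text columns such as review_body have no meaningful mean, so they are left out by naming the value columns explicitly.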
