Women's Clothing Data Analysis (E-commerce Data), Version 1

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
data = pd.read_csv('Womens_Clothing.csv')
# Inspect the data structure
data
Unnamed: 0 Clothing ID Age Title Review Text Rating Recommended IND Positive Feedback Count Division Name Department Name Class Name
0 0 767 33 NaN Absolutely wonderful - silky and sexy and comf... 4 1 0 Initmates Intimate Intimates
1 1 1080 34 NaN Love this dress! it's sooo pretty. i happene... 5 1 4 General Dresses Dresses
2 2 1077 60 Some major design flaws I had such high hopes for this dress and reall... 3 0 0 General Dresses Dresses
3 3 1049 50 My favorite buy! I love, love, love this jumpsuit. it's fun, fl... 5 1 0 General Petite Bottoms Pants
4 4 847 47 Flattering shirt This shirt is very flattering to all due to th... 5 1 6 General Tops Blouses
... ... ... ... ... ... ... ... ... ... ... ...
23481 23481 1104 34 Great dress for many occasions I was very happy to snag this dress at such a ... 5 1 0 General Petite Dresses Dresses
23482 23482 862 48 Wish it was made of cotton It reminds me of maternity clothes. soft, stre... 3 1 0 General Petite Tops Knits
23483 23483 1104 31 Cute, but see through This fit well, but the top was very see throug... 3 0 1 General Petite Dresses Dresses
23484 23484 1084 28 Very cute dress, perfect for summer parties an... I bought this dress for a wedding i have this ... 3 1 2 General Dresses Dresses
23485 23485 1104 52 Please make more like this one! This dress in a lovely platinum is feminine an... 5 1 22 General Petite Dresses Dresses

23486 rows × 11 columns

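From the output above:

The dataset contains 23,486 rows and 10 feature variables; the eleventh column, `Unnamed: 0`, simply repeats the row index. Each row corresponds to one customer review and includes the following variables, listed under their column names:

**Clothing ID:** integer categorical variable identifying the specific item being reviewed.
**Age:** positive integer, the age of the reviewer.
**Title:** string, the title of the review.
**Review Text:** string, the body of the review.
**Rating:** ordinal integer, the product score given by the customer, from 1 (worst) to 5 (best).
**Recommended IND:** binary variable, 1 if the customer recommends the product and 0 if not.
**Positive Feedback Count:** non-negative integer, the number of other customers who found this review positive.
**Division Name:** categorical name of the product's high-level division.
**Department Name:** categorical name of the product's department.
**Class Name:** categorical name of the product's class.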
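The extra `Unnamed: 0` column is what pandas typically produces when a CSV was saved together with its row index. A minimal sketch of reading such a file without it, assuming the first column of `Womens_Clothing.csv` really is that saved index; the three-row `csv_text` below is a hypothetical stand-in for the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for Womens_Clothing.csv. The empty first header cell
# is what pandas labels "Unnamed: 0" when it is read as ordinary data.
csv_text = (
    ",Clothing ID,Age,Rating\n"
    "0,767,33,4\n"
    "1,1080,34,5\n"
    "2,1077,60,3\n"
)

with_index_column = pd.read_csv(io.StringIO(csv_text))
without_index_column = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_index_column.columns))
print(list(without_index_column.columns))
```

With `index_col=0` the frame keeps only the real feature columns, so the column count matches the number of variables described above.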

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23486 entries, 0 to 23485
Data columns (total 11 columns):
Unnamed: 0                 23486 non-null int64
Clothing ID                23486 non-null int64
Age                        23486 non-null int64
Title                      19676 non-null object
Review Text                22641 non-null object
Rating                     23486 non-null int64
Recommended IND            23486 non-null int64
Positive Feedback Count    23486 non-null int64
Division Name              23472 non-null object
Department Name            23472 non-null object
Class Name                 23472 non-null object
dtypes: int64(6), object(5)
memory usage: 2.0+ MB
# Check for missing values
# data.isnull()
# Drop rows with missing values
df = data.dropna()
df
Unnamed: 0 Clothing ID Age Title Review Text Rating Recommended IND Positive Feedback Count Division Name Department Name Class Name
2 2 1077 60 Some major design flaws I had such high hopes for this dress and reall... 3 0 0 General Dresses Dresses
3 3 1049 50 My favorite buy! I love, love, love this jumpsuit. it's fun, fl... 5 1 0 General Petite Bottoms Pants
4 4 847 47 Flattering shirt This shirt is very flattering to all due to th... 5 1 6 General Tops Blouses
5 5 1080 49 Not for the very petite I love tracy reese dresses, but this one is no... 2 0 4 General Dresses Dresses
6 6 858 39 Cagrcoal shimmer fun I aded this in my basket at hte last mintue to... 5 1 1 General Petite Tops Knits
... ... ... ... ... ... ... ... ... ... ... ...
23481 23481 1104 34 Great dress for many occasions I was very happy to snag this dress at such a ... 5 1 0 General Petite Dresses Dresses
23482 23482 862 48 Wish it was made of cotton It reminds me of maternity clothes. soft, stre... 3 1 0 General Petite Tops Knits
23483 23483 1104 31 Cute, but see through This fit well, but the top was very see throug... 3 0 1 General Petite Dresses Dresses
23484 23484 1084 28 Very cute dress, perfect for summer parties an... I bought this dress for a wedding i have this ... 3 1 2 General Dresses Dresses
23485 23485 1104 52 Please make more like this one! This dress in a lovely platinum is feminine an... 5 1 22 General Petite Dresses Dresses

19662 rows × 11 columns
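According to `data.info()`, `Title` is the column with the most gaps (19,676 non-null values out of 23,486), so most of the rows removed by `dropna()` are reviews that only lack a title. A small sketch of counting the gaps per column and of a gentler alternative that drops only rows without review text; the `toy` frame is hypothetical and the `subset` choice is an assumption, not what the post does:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the review table with gaps in two columns
toy = pd.DataFrame({
    "Title": ["Some major design flaws", np.nan, "Flattering shirt", np.nan],
    "Review Text": ["I had such high hopes", "Love this dress!",
                    "This shirt is very flattering", np.nan],
    "Rating": [3, 5, 5, 4],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

drop_any = toy.dropna()                               # same call as in the post
drop_text_only = toy.dropna(subset=["Review Text"])   # keeps rows that only lack a title
print(len(toy), len(drop_any), len(drop_text_only))
```

Which variant is appropriate depends on whether later steps actually use `Title`; the analysis below only reads `Review Text`.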

Analysis

# 1. Visualize the age distribution of reviewers
plt.hist(df['Age'], color=color[1], label='age')
plt.legend()
plt.xlabel('age')
plt.ylabel('count')
plt.title('age of commentator')
print('\n figure 01')
 figure 01

Conclusion

Figure 01 shows that most reviewers are between 25 and 45 years old, so young and middle-aged adults make up the majority.
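A reading taken from a histogram can be backed by a number. The sketch below computes the share of reviewers aged 25 to 45; the ages are hypothetical, and on the real data `df['Age']` would take the place of `ages`:

```python
import pandas as pd

# Hypothetical ages, not taken from the dataset
ages = pd.Series([23, 28, 31, 34, 39, 41, 44, 47, 52, 60], name="Age")

# between() is inclusive on both ends by default
in_range = ages.between(25, 45)
share = in_range.mean()   # mean of a boolean Series is the fraction of True values
print(f"{share:.0%} of reviewers are aged 25 to 45")
```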

# 2. Box plot of reviewer age for each rating
plt.figure(figsize=(10, 8))
sns.boxplot(x='Rating', y='Age', data=df)
plt.title('age of rating')
print('\n figure 02')
 figure 02

Conclusion

Figure 02 shows that the age distribution is roughly the same for every rating.
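The numbers behind a box plot can also be tabulated, which makes "roughly the same" easier to check. A minimal sketch with a hypothetical `toy` frame; the median and interquartile range are the quantities a box plot draws:

```python
import pandas as pd

# Hypothetical ratings and ages
toy = pd.DataFrame({
    "Rating": [1, 1, 3, 3, 5, 5, 5],
    "Age":    [30, 50, 35, 45, 28, 40, 52],
})

grouped = toy.groupby("Rating")["Age"]
age_by_rating = grouped.agg(["count", "median"])
# Interquartile range: the height of each box in the box plot
age_by_rating["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
print(age_by_rating)
```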

3. Which clothes are recommended in each department?
Check the unique values of Division Name, Department Name and Class Name

print('Division Name', df['Division Name'].unique())
print()
print('Department Name', df['Department Name'].unique())
print()
print('Class Name', df['Class Name'].unique())
Division Name ['General' 'General Petite' 'Initmates']

Department Name ['Dresses' 'Bottoms' 'Tops' 'Intimate' 'Jackets' 'Trend']

Class Name ['Dresses' 'Pants' 'Blouses' 'Knits' 'Intimates' 'Outerwear' 'Lounge'
 'Sweaters' 'Skirts' 'Fine gauge' 'Sleep' 'Jackets' 'Swim' 'Trend' 'Jeans'
 'Shorts' 'Legwear' 'Layering' 'Casual bottoms' 'Chemises']

Split the data by Recommended IND: 1 (recommended) and 0 (not recommended)

# recommend  not_recommend
recommend = df[df['Recommended IND'] == 1]
not_recommend = df[df['Recommended IND'] == 0]
# recommend.head()
not_recommend.head()
Unnamed: 0 Clothing ID Age Title Review Text Rating Recommended IND Positive Feedback Count Division Name Department Name Class Name
2 2 1077 60 Some major design flaws I had such high hopes for this dress and reall... 3 0 0 General Dresses Dresses
5 5 1080 49 Not for the very petite I love tracy reese dresses, but this one is no... 2 0 4 General Dresses Dresses
10 10 1077 53 Dress looks like it's made of cheap material Dress runs small esp where the zipper area run... 3 0 14 General Dresses Dresses
22 22 1077 31 Not what it looks like First of all, this is not pullover styling. th... 2 0 7 General Dresses Dresses
25 25 697 31 Falls flat Loved the material, but i didnt really look at... 3 0 0 Initmates Intimate Lounge
# 4. Overlaid histograms of recommended and not recommended reviews by department
plt.figure(figsize=(12,8))
plt.hist(recommend['Department Name'], color=color[2], alpha=0.5, label='recommend')
plt.hist(not_recommend['Department Name'], color=color[4], alpha=0.5, label='not_recommend')
plt.legend()
plt.xticks(rotation=45)
plt.title('Department recommend and not_recommend')
print('\n figure 03')
 figure 03

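Conclusion

Figure 03 shows that the green area (recommend) is larger than the not_recommend area, which indicates that in most departments reviewers recommend the products more often than not.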
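Comparing areas by eye works for a rough impression; a cross-tabulation gives the same comparison as numbers. A sketch assuming the column names from the dataset and a hypothetical `toy` frame in place of `df`:

```python
import pandas as pd

# Hypothetical reviews
toy = pd.DataFrame({
    "Department Name": ["Dresses", "Dresses", "Dresses", "Tops", "Tops",
                        "Bottoms", "Bottoms", "Bottoms", "Bottoms"],
    "Recommended IND": [1, 0, 1, 1, 1, 1, 1, 1, 0],
})

# Raw counts: one row per department, columns 0 (not recommended) and 1 (recommended)
counts = pd.crosstab(toy["Department Name"], toy["Recommended IND"])
# Row-wise shares, so departments of different size can be compared
shares = pd.crosstab(toy["Department Name"], toy["Recommended IND"], normalize="index")

print(counts)
print(shares.round(2))
```

If a chart is still wanted, `counts.plot(kind="bar")` draws one pair of bars per department directly from this table.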

# Overlaid histograms of recommended and not recommended reviews by product class
plt.figure(figsize=(12,8))
plt.hist(recommend['Class Name'], color=color[1], alpha=0.5, label='recommend')
plt.hist(not_recommend['Class Name'], color=color[5], alpha=0.5, label='not_recommend')
plt.legend()
plt.xticks(rotation=45)
plt.title('Class recommend and not_recommend')
print('\n figure 04')
 figure 04

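Conclusion

Figure 04 shows that Knits, the class with the most reviews (and presumably the most sales), is not the class with the highest recommendation rate.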
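A recommendation rate is hard to read off overlaid histograms, and `plt.hist` with its default of 10 bins may merge neighbouring classes into one bar when there are 20 class labels. The sketch below computes the review count and the recommendation rate per class directly; the `toy` data is hypothetical and only mimics the pattern described above:

```python
import pandas as pd

# Hypothetical reviews: Knits has the most reviews but not the best rate
toy = pd.DataFrame({
    "Class Name": ["Knits"] * 5 + ["Dresses"] * 3 + ["Jeans"] * 2,
    "Recommended IND": [1, 1, 1, 0, 0, 1, 1, 0, 1, 1],
})

by_class = (
    toy.groupby("Class Name")["Recommended IND"]
       .agg(reviews="count", recommend_rate="mean")   # mean of a 0/1 column is the rate
       .sort_values("reviews", ascending=False)
)
print(by_class)
```

Sorting by `reviews` puts the most-reviewed class first, so its rate can be compared with the others at a glance.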

# Which age groups write what kind of reviews about which clothes
df = df.copy()  # work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
df['Review Length'] = df['Review Text'].astype(str).apply(len)
df
Unnamed: 0 Clothing ID Age Title Review Text Rating Recommended IND Positive Feedback Count Division Name Department Name Class Name Review Length
2 2 1077 60 Some major design flaws I had such high hopes for this dress and reall... 3 0 0 General Dresses Dresses 500
3 3 1049 50 My favorite buy! I love, love, love this jumpsuit. it's fun, fl... 5 1 0 General Petite Bottoms Pants 124
4 4 847 47 Flattering shirt This shirt is very flattering to all due to th... 5 1 6 General Tops Blouses 192
5 5 1080 49 Not for the very petite I love tracy reese dresses, but this one is no... 2 0 4 General Dresses Dresses 488
6 6 858 39 Cagrcoal shimmer fun I aded this in my basket at hte last mintue to... 5 1 1 General Petite Tops Knits 496
... ... ... ... ... ... ... ... ... ... ... ... ...
23481 23481 1104 34 Great dress for many occasions I was very happy to snag this dress at such a ... 5 1 0 General Petite Dresses Dresses 131
23482 23482 862 48 Wish it was made of cotton It reminds me of maternity clothes. soft, stre... 3 1 0 General Petite Tops Knits 223
23483 23483 1104 31 Cute, but see through This fit well, but the top was very see throug... 3 0 1 General Petite Dresses Dresses 208
23484 23484 1084 28 Very cute dress, perfect for summer parties an... I bought this dress for a wedding i have this ... 3 1 2 General Dresses Dresses 427
23485 23485 1104 52 Please make more like this one! This dress in a lovely platinum is feminine an... 5 1 22 General Petite Dresses Dresses 110

19662 rows × 12 columns

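# Plot the distribution of the single variable Review Length
# sns.distplot() is the most convenient way to plot a univariate distribution. By default it draws a histogram and fits a kernel density estimate (KDE)
fig = plt.figure(figsize=(12, 8))
ax = sns.distplot(df['Review Length'], color=color[3])
plt.title("Length of Reviews")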
print('\n figure 05')
 figure 05

Conclusion

Figure 05 shows that most reviews are close to 500 characters long.
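Quantiles make the "close to 500" reading concrete. A sketch with hypothetical review strings; the 450-character threshold is an arbitrary assumption for illustration. Note also that recent seaborn releases deprecate `distplot` in favour of `histplot` and `displot`, so the plotting call may need updating on a current installation.

```python
import pandas as pd

# Hypothetical reviews of known length
reviews = pd.Series(["a" * 120, "b" * 250, "c" * 480, "d" * 500, "e" * 500])

lengths = reviews.str.len()
quartiles = lengths.quantile([0.25, 0.5, 0.75])
# Share of reviews within 50 characters of 500 (threshold chosen for illustration)
share_near_500 = (lengths >= 450).mean()

print(quartiles)
print(f"{share_near_500:.0%} of reviews have at least 450 characters")
```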

# Box plot of review length for each age
plt.figure(figsize=(18,8))
sns.boxplot(x='Age', y='Review Length', data=df)
print('\n figure 06')
 figure 06
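With one box per individual age, Figure 06 gets crowded. One option is to group ages into bands before comparing review lengths. A minimal sketch; the cut points, the labels and the `toy` data are all assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical ages and review lengths
toy = pd.DataFrame({
    "Age": [22, 27, 33, 38, 45, 51, 58, 66],
    "Review Length": [150, 320, 480, 500, 410, 300, 260, 200],
})

bins = [17, 29, 39, 49, 59, 100]                     # right-inclusive edges
labels = ["18-29", "30-39", "40-49", "50-59", "60+"]
toy["Age Group"] = pd.cut(toy["Age"], bins=bins, labels=labels)

median_length = toy.groupby("Age Group", observed=True)["Review Length"].median()
print(median_length)
```

The same `Age Group` column could be passed as `x` to `sns.boxplot` to get one box per band instead of one per age.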

# Rating versus positive feedback count
plt.figure(figsize=(12,8))
sns.boxplot(x = 'Rating', y = 'Positive Feedback Count', data = df)
print('\n figure 07')
 figure 07

Conclusion

Figure 07 shows that reviews with a rating of 3 or higher tend to have larger positive feedback counts.
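Feedback counts are mostly small with a few large values, so a box plot is dominated by its outliers. A per-rating summary table can support the reading of the figure. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical ratings and feedback counts
toy = pd.DataFrame({
    "Rating": [1, 1, 2, 3, 3, 4, 5, 5, 5],
    "Positive Feedback Count": [0, 7, 4, 0, 14, 0, 0, 6, 22],
})

feedback_by_rating = (
    toy.groupby("Rating")["Positive Feedback Count"]
       .agg(["count", "median", "mean", "max"])
)
print(feedback_by_rating)
```

Comparing `median` with `max` shows how much of the visible difference between ratings comes from a few heavily upvoted reviews.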

Word cloud visualization of the reviews

# 1. Data cleaning
import re
from wordcloud import WordCloud, STOPWORDS

def clean_data(text):
    letters_only = re.sub("[^a-zA-Z]", " ", text)  # replace punctuation and other non-letter characters with spaces
    words = letters_only.lower().split()
    return " ".join(words)
    # return letters_only

# Garment names appear at every rating, so they are added to the stop words
stopwords = set(STOPWORDS) | {'skirt', 'blouse', 'dress', 'sweater', 'shirt', 'bottom', 'pant', 'pants', 'jean', 'jeans', 'jacket', 'top', 'dresse'}

def create_cloud(rating):
    x = [i for i in rating]
    y = ' '.join(x)
    cloud = WordCloud(background_color='white', width=1600, height=800, max_words=100, stopwords=stopwords).generate(y)
    plt.figure(figsize=(15, 7.5))
    plt.axis('off')
    plt.imshow(cloud)
    plt.show()
# Word cloud for rating 5
rating5= df[df['Rating']==5]['Review Text'].apply(clean_data)
create_cloud(rating5)

# Word cloud for rating 4
rating4= df[df['Rating']==4]['Review Text'].apply(clean_data)
create_cloud(rating4)

# Word cloud for rating 3
rating3= df[df['Rating']==3]['Review Text'].apply(clean_data)
create_cloud(rating3)

# Word cloud for rating 2
rating2= df[df['Rating']==2]['Review Text'].apply(clean_data)
create_cloud(rating2)

# Word cloud for rating 1
rating1= df[df['Rating']==1]['Review Text'].apply(clean_data)
create_cloud(rating1)
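A word cloud shows relative frequency only through font size. When exact counts are needed, the same cleaning step can feed a plain word counter. A minimal standard-library sketch; the reviews and the deliberately tiny stop-word set are hypothetical, and `clean_data` is redefined here so the block runs on its own:

```python
import re
from collections import Counter

def clean_data(text):
    """Keep letters only, lower-case the text and collapse whitespace."""
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    return " ".join(letters_only.lower().split())

# Hypothetical reviews and stop words
reviews = [
    "Love this dress! It's sooo pretty.",
    "Love the fit, love the color.",
    "Pretty, but the fabric is thin.",
]
stop_words = {"this", "the", "it", "s", "is", "but", "dress"}

counts = Counter(
    word
    for review in reviews
    for word in clean_data(review).split()
    if word not in stop_words
)
top_words = counts.most_common(2)
print(top_words)
```

Running this once per rating, as the word clouds do, would give a ranked list of words that can be compared across ratings.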
