bokeh pandas

by Gautham Koorma

Rolling Stone’s 500 Greatest Albums Visualized Using Pandas and Bokeh

In 2003, Rolling Stone magazine polled musicians, producers, and industry executives about their favorite albums. The result was a special issue titled “The 500 Greatest Albums of All Time.”

The list, which was revised in 2012, mainly features American and British music from the 1960s and the 1970s.

As an ardent music fan and an aspiring music producer, I listen to a wide variety of genres. The Rolling Stone list served as an introduction to rock music for me back in the day.

One day, while browsing through Kaggle for a simple data set on which to test my newly acquired data visualization skills, I stumbled upon the list uploaded as a CSV dataset. I decided to get my hands dirty by using pandas to explore the data and bokeh to visualize the results.

Bokeh is a Python library for interactive visualization. It features a powerful interface that supports high-level charting, intermediate-level plotting, and lower-level modeling.

The complete code I used for reading, refining, exploring, and visualizing the data can be found on my GitHub page, and also in this notebook submitted on Kaggle.

This post will describe the approaches I took, complete with my visualizations and the insight I gained from building them.

Getting and Structuring the Data

Getting the data was simple, since it was in a 500 x 6 Excel spreadsheet. All I had to do was read it directly into a pandas data frame using the read_excel() function.

The data frame had 500 rows, one per album, listing the Chart Number, Year, Album, Artist, Genre, and Subgenre. The Genre and Subgenre columns held multiple comma-separated values in a single string, so I split each string at the first comma and kept only the first value, stored in new columns, as the most relevant categorization of the album’s Genre and Subgenre.

The master data frame became 500 x 8 after the Genre_Refined and Subgenre_Refined columns were added.
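The splitting step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are placeholders rather than real entries from the list, and the column spellings (Number, Genre, Subgenre, and the two refined columns) are assumed from the description above.

```python
import pandas as pd

# Placeholder rows standing in for the real spreadsheet (assumed column names).
albums = pd.DataFrame({
    "Number": [1, 2, 3],
    "Year": [1967, 1966, 1971],
    "Album": ["Album A", "Album B", "Album C"],
    "Artist": ["Artist X", "Artist Y", "Artist X"],
    "Genre": ["Rock", "Rock, Pop", "Funk / Soul"],
    "Subgenre": ["Rock & Roll, Psychedelic Rock", "Pop Rock", "Soul"],
})

# Split each string at the first comma only (n=1) and keep the leading value.
albums["Genre_Refined"] = albums["Genre"].str.split(",", n=1).str[0].str.strip()
albums["Subgenre_Refined"] = (
    albums["Subgenre"].str.split(",", n=1).str[0].str.strip()
)

# Six original columns plus the two refined ones.
print(albums.shape)
```

With the real file, the placeholder frame would presumably be replaced by a call to pd.read_excel() on the downloaded spreadsheet.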

I used a Python 3.5.2 kernel (Anaconda 4.2.0 distribution) on a Jupyter notebook.

Exploring the Data and Gaining Insights

In most cases I adopted the split-apply-combine strategy using pandas’ built-in groupby() method; in a single case I reshaped the data with the built-in pivot_table() function. I fed the resulting data frames into bokeh charts and figures.

Here are the questions I posed and their resulting visualizations.

The top-10 artists who have the most albums on the list

To get the top 10 artists, I used groupby() on the artists column, took a count, and sorted the resulting data frame to get the 10 artists with the most albums.
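A rough sketch of that group, count, and sort sequence is shown here. The artist and album names are placeholders, and the column names Artist and Album are assumptions based on the column list given earlier.

```python
import pandas as pd

# Placeholder data; the real frame has 500 rows.
albums = pd.DataFrame({
    "Artist": ["Artist X", "Artist Y", "Artist X",
               "Artist Z", "Artist X", "Artist Y"],
    "Album": ["A", "B", "C", "D", "E", "F"],
})

# Split by artist, count albums per group, then keep the ten largest counts.
top_artists = (
    albums.groupby("Artist")["Album"]
    .count()
    .sort_values(ascending=False)
    .head(10)
)
print(top_artists)
```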

To visualize the results, I used a figure object from the bokeh.plotting library and drew black circles using the circle() method.
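A plot along these lines might look like the sketch below. It uses placeholder counts, and it calls scatter() with marker="circle" instead of circle(), on the assumption that a recent Bokeh release is installed, where scatter() is the favored way to draw fixed-size markers.

```python
from bokeh.plotting import figure

# Placeholder artists and counts, not the real results.
artists = ["Artist X", "Artist Y", "Artist Z"]
album_counts = [3, 2, 1]

# A categorical x-axis is created by passing the labels as x_range.
p = figure(x_range=artists, title="Albums per artist (placeholder data)")

# One black circle per artist, positioned at that artist's album count.
p.scatter(x=artists, y=album_counts, size=12, color="black", marker="circle")
p.xaxis.axis_label = "Artist"
p.yaxis.axis_label = "Albums on the list"
```

Passing p to show() from bokeh.plotting renders the figure in a browser tab or, after output_notebook(), inline in a Jupyter notebook.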

Clearly, the Beatles, Bob Dylan, and the Rolling Stones topped the list with 10 albums apiece.

Year-wise count of the number of albums in the list

To get this, I used groupby() on the year column and took a count. I then sorted the data by year and plotted the resulting data frame using a line chart from bokeh.charts.
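The counting step, without the chart, could be sketched like this. The years and album names are placeholders, and the column names Year and Album are assumed.

```python
import pandas as pd

# Placeholder data; the real frame has one row per album.
albums = pd.DataFrame({
    "Year": [1970, 1967, 1970, 1991, 1967, 1970],
    "Album": ["A", "B", "C", "D", "E", "F"],
})

# Count albums per release year, ordered chronologically for a line plot.
per_year = albums.groupby("Year")["Album"].count().sort_index()

# The year with the highest count.
peak_year = per_year.idxmax()
print(per_year)
print(peak_year)
```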

More albums on the list were released in 1970 than in any other year. Albums from the late 1960s and early 1970s are also well represented. A final spike appears in the early 1990s, reflecting the rise of Pop, R&B, and Hip-Hop music.

Top Genres and Subgenres

To identify the top genres and the subgenres within them, I reshaped the data using the pandas pivot_table() function, setting the index to the Genre_Refined and Subgenre_Refined columns and the aggfunc parameter to count.

After taking a subset of the data frame using a condition that there should be more than 5 albums in a subgenre, I fed the data frame to a bokeh donut chart and set the palette to Purples9.
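The reshaping and filtering can be sketched as follows, leaving out the donut chart. The rows and the subgenre label "Subgenre Z" are placeholders, and counting the Album column is an assumption about which values were aggregated.

```python
import pandas as pd

# Placeholder data: 6 + 2 Rock albums and 2 Hip Hop albums.
albums = pd.DataFrame({
    "Album": [f"Album {i}" for i in range(10)],
    "Genre_Refined": ["Rock"] * 8 + ["Hip Hop"] * 2,
    "Subgenre_Refined": ["Pop Rock"] * 6 + ["Blues Rock"] * 2 + ["Subgenre Z"] * 2,
})

# Count albums for every (genre, subgenre) pair.
counts = albums.pivot_table(
    index=["Genre_Refined", "Subgenre_Refined"],
    values="Album",
    aggfunc="count",
)

# Keep only subgenres with more than 5 albums.
frequent = counts[counts["Album"] > 5]
print(frequent)
```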

Rock and its subgenres account for about 80% of the selection. Hip-Hop, R&B, Soul, Country, and Electronic music albums make up the remaining 20%.

Albums in Each Genre by Year

To get this data, I did a groupby() on Year and Genre_Refined, took the count, sorted the values by Year, and fed the resulting data frame to a bokeh heatmap. This time I used the Reds9 palette.
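A sketch of this two-key grouping is shown below, using placeholder rows. The final pivot into a genre-by-year grid is an addition for illustration only: it shows the matrix shape a heatmap represents and is not necessarily how the chart was fed.

```python
import pandas as pd

# Placeholder data with assumed column names.
albums = pd.DataFrame({
    "Year": [1967, 1967, 1970, 1970, 1991],
    "Album": ["A", "B", "C", "D", "E"],
    "Genre_Refined": ["Rock", "Rock", "Rock", "Funk / Soul", "Hip Hop"],
})

# Count albums for each (year, genre) pair, in long format sorted by year.
by_year_genre = (
    albums.groupby(["Year", "Genre_Refined"])["Album"]
    .count()
    .reset_index(name="Count")
    .sort_values("Year")
)

# Optional: a genre x year grid, with 0 where a genre has no albums that year.
grid = (
    by_year_genre.pivot(index="Genre_Refined", columns="Year", values="Count")
    .fillna(0)
    .astype(int)
)
print(grid)
```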

Rock music albums from the late 60s and the 70s are clearly the most numerous. Funk, Soul, and Jazz music albums declined in number over the years, paving the way for Hip-Hop and Electronic albums.

Subgenres of Rock Over the Years

To get this data, I did a groupby() on the Year, Genre_Refined, and Subgenre_Refined, took a count, and subset the data frame to include just Rock in the Genre_Refined column. I then fed the resulting data frame to a bokeh heatmap.
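The three-key grouping and the Rock filter could look roughly like this. The rows are placeholders, and the subgenre label "Bop" for the Jazz row is made up for the example.

```python
import pandas as pd

# Placeholder data with assumed column names.
albums = pd.DataFrame({
    "Year": [1957, 1957, 1966, 1966, 1977],
    "Album": ["A", "B", "C", "D", "E"],
    "Genre_Refined": ["Rock", "Rock", "Rock", "Jazz", "Rock"],
    "Subgenre_Refined": ["Rock & Roll", "Rock & Roll", "Blues Rock",
                         "Bop", "Alternative Rock"],
})

# Count albums for each (year, genre, subgenre) combination.
counts = (
    albums.groupby(["Year", "Genre_Refined", "Subgenre_Refined"])["Album"]
    .count()
    .reset_index(name="Count")
)

# Keep only the rows whose refined genre is Rock.
rock = counts[counts["Genre_Refined"] == "Rock"]
print(rock)
```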

The initial few years were dominated by Rock & Roll, while Blues Rock and Pop Rock slowly increased in number by the mid 1960s. By the mid 1970s, Alternative Rock started coming into the picture, followed by Indie Rock in the mid 1980s.

A summary of the Top 10 albums

Finally, I summarized the top 10 albums in the list after grouping it by artist.

The final results are not really surprising. The Rolling Stone magazine list mostly contains albums from Rock and its various subgenres, with a few outliers in the form of Hip-Hop, R&B, Soul, Country, and Electronic music albums.

If you’re like me and like to occasionally reconnect with the music of the Beatles, Bob Dylan, the Rolling Stones, and the other pioneers of Rock and Roll from the 60s and 70s, I suggest you give these top albums a listen, then explore from there.

If you’re curious, you can read the full list of albums here.

I’m a technology consultant, data science enthusiast, and aspiring music producer. If you have writing opportunities or are interested in getting in touch for work, feel free to write to me at contact at gautham dot biz.

If you liked this article, please hit the recommend button and share it with your friends.

Translated from: https://www.freecodecamp.org/news/visualising-rolling-stones-500-greatest-songs-using-bokeh-78ebc0eaff3f/
