Extracting image links when scraping Toutiao (今日头条) pages loaded via Ajax

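When scraping images from Toutiao, I found that the site loads its images via Ajax, so I used the re module to extract the links from the page source. During extraction I ran into a small problem. The match looked like this: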

['"{\\"count\\":9,\\"sub_images\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\"}],\\"uri\\":
\\"origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\
\\/552992907_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\",\\"height\\":1200}],\\"max_img_width\\":1200,\\"labels\\":[\\"\\\\u6444\\\\u5f71\\"],\\"sub_abstracts\\":[\\" \\\\u6444\\\\u5f71\\\\uff1a\\\\u61d2\\\\u4ebade\\\\u903b\\\\u8f91\\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\"],\\"sub_titles\\":[\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\"]}"']
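For context, a match of this shape can come from a pattern like the one below. This is only a sketch: the original page source and regular expression are not shown in this post, so the `gallery: JSON.parse("...")` wrapper, the shortened `html` sample, and the pattern itself are all assumptions made for illustration.

```python
import re

# Hypothetical, shortened page source. The `gallery: JSON.parse("...")`
# wrapper is an assumption; the real page may embed the data differently.
html = r'var DATA = { gallery: JSON.parse("{\"count\":1,\"sub_images\":[]}"), id: 1 };'

# Capture the quoted argument of JSON.parse, including its enclosing quotes.
pattern = re.compile(r'gallery: JSON\.parse\((".*?")\),')
matches = pattern.findall(html)

# Printing the list shows the repr of the string, so every backslash
# in the captured text appears doubled, as in the output above.
print(matches)
```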

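The extracted text is fully escaped: every double quote and every backslash carries an extra backslash. The fix is simple: use replace to undo the escaping: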

replace('\\\\', '\\').replace('\\"', '"')

Then strip the enclosing double quotes, if the match captured them, and call json.loads() to convert the str into a dict.
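The two steps can be sketched as follows. The `raw` value is a shortened sample built from the first image in the output above, and the variable names are illustrative rather than taken from the original code.

```python
import json

# Shortened sample of one regex match, written as a raw string so the
# backslashes appear exactly as they do in the scraped text.
raw = (
    r'"{\"count\":1,\"sub_images\":[{\"url\":'
    r'\"http:\\/\\/p3.pstatp.com\\/origin\\/tuchong.fullscreen\\/418185332_tt\"}]}"'
)

# Undo the extra escaping: \\ becomes \ first, then \" becomes ".
cleaned = raw.replace('\\\\', '\\').replace('\\"', '"')

# The match still carries its enclosing double quotes; drop them before parsing.
cleaned = cleaned.strip('"')

data = json.loads(cleaned)
print(data['sub_images'][0]['url'])
```

The remaining `\/` sequences are valid JSON escapes, so json.loads turns them into plain slashes.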

This yields normal JSON data.
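An alternative worth considering, which is not the method used in this post: the matched text looks like a JSON string literal that wraps a JSON document, so calling json.loads twice should decode it without any manual replacement. The sketch below assumes the match includes its enclosing double quotes, as in the output shown earlier; the shortened `raw` sample and the variable names are illustrative.

```python
import json

# Shortened sample of one regex match (two images and the labels field),
# written as raw strings so the backslashes match the scraped text.
raw = (
    r'"{\"count\":2,\"sub_images\":['
    r'{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/tuchong.fullscreen\\/418185332_tt\"},'
    r'{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/tuchong.fullscreen\\/529858694_tt\"}],'
    r'\"labels\":[\"\\u6444\\u5f71\"]}"'
)

inner = json.loads(raw)   # first pass: JSON string literal -> str
data = json.loads(inner)  # second pass: JSON object -> dict

# Collect the primary URL of each image.
image_urls = [item['url'] for item in data['sub_images']]
for url in image_urls:
    print(url)
```

The second pass also decodes the `\u` escapes, so fields such as `labels` come back as readable text.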
