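I am a learner fascinated by digging value out of data. In my daily work and study I try to uncover that value and find the secrets hidden in data. I believe the value of data is not limited to companies: individuals can also experience its appeal, use technology to decode behavior, and let big data give everyone a boost. Readers are welcome to follow my WeChat official account so that we can discuss the interesting things found in data together.

My official account: livandata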

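Website analysis is a common skill for data analysts. By understanding the customers, the website, and the business logic, the analyst builds a complete metric system that shows how the site is operating. The analysis helps product managers see how different features are used, helps marketers learn which kinds of customers prefer which products, and helps operations staff find unreasonable steps in a process. In short, website analysis currently gives decision makers a full online data set for understanding the business.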

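Website analysis can be divided into log analysis, traffic analysis, customer-segment analysis, and so on.

Commonly used metrics are:

1) time on page;

2) bounce rate / exit rate;

3) page depth;

4) unique pageviews;

5) visitors / unique visitors;

6) visit frequency / new vs. returning customers;

7) impressions / impression attainment rate / unique impressions / impression frequency;

8) clicks / click attainment rate / unique clicks / click frequency;

9) click-through rate (clicks per impression);

10) conversion rate;

11) page views (PV) / unique visitors (UV).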
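As a minimal sketch of the last metric: PV counts every page-view event, while UV counts distinct visitors. The event tuples and the `pv_uv` helper below are hypothetical illustrations, not part of the project code.

```python
from collections import defaultdict


def pv_uv(events):
    """Return {page_id: (pv, uv)} for an iterable of (user_id, page_id) events."""
    views = defaultdict(int)      # page_id -> number of view events (PV)
    visitors = defaultdict(set)   # page_id -> distinct user ids (UV)
    for user_id, page_id in events:
        views[page_id] += 1
        visitors[page_id].add(user_id)
    return {page: (views[page], len(visitors[page])) for page in views}


# Hypothetical sample log: u1 opens the login page twice.
sample_events = [
    ("u1", "page_login_1"),
    ("u1", "page_login_1"),
    ("u2", "page_login_1"),
    ("u1", "page_print_1"),
]
stats = pv_uv(sample_events)
```

Here the login page has three views from two distinct users, so repeat visits raise PV but not UV.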

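For a recent project I built a set of conversion-rate analysis code for constructing user funnels. This article covers conversion rate first; later articles will work through the remaining metrics one by one.

The project started from the need to measure users' conversion rate on each page of an APP. The usual approach is to compute the PV and UV of each page first, then compute the conversion rate between the pages at successive nodes of a page-visit flow.

That is: PV conversion rate = PV(node i+1) / PV(node i).

UV conversion rate = UV(node i+1) / UV(node i).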
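The two formulas can be sketched in a few lines. The function names and the sample counts are assumptions made for illustration; `overall_rates` shows the variant that divides every later node by the first node, which the project code reports as the overall conversion rate.

```python
def step_rates(counts):
    """Step-to-step rates: counts[i + 1] / counts[i] for consecutive nodes."""
    return [
        round(nxt / cur, 6) if cur else 0.0
        for cur, nxt in zip(counts, counts[1:])
    ]


def overall_rates(counts):
    """Rates of every later node relative to the first node."""
    first = counts[0]
    return [round(c / first, 6) if first else 0.0 for c in counts[1:]]


# Hypothetical PV counts for a three-node funnel.
pv = [1000, 400, 100]
single = step_rates(pv)
overall = overall_rates(pv)
```

The same functions apply unchanged to UV counts; a zero denominator is reported as 0.0 here rather than raising an error, which is a design choice of this sketch.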

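Because it was not yet clear which identifier UV should be counted on, I built two metrics (UV and UV_M). This does not affect the approach; the two can be compared later to see which is more accurate.

During the project I ran into a problem: the Python computing platform and the data platform could not reach each other over the network. After consulting senior colleagues I settled on a workaround: generate the HQL with code, upload it to the data platform by hand as a file, run the HQL every day and store the results in csv files, then manually move the data to the computing platform, compute the conversion rates, and build a conversion-rate monitoring platform with moving-average smoothing.

The project workflow is therefore:

1) list the flows that need a funnel and save them to an excel file in a fixed format;

2) generate the HQL from the contents of the excel file;

3) upload the generated HQL to the data platform and extract, on a schedule, the PV/UV values for each day, each funnel, and each node;

4) move the data to the computing platform, aggregate by node, and compute the corresponding PV/UV conversion rates;

5) display the conversion rates in funnel charts;

6) pass each day's conversion rates to the monitoring platform and monitor every flow's conversion rate by day.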
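The monitoring in step 6 can be sketched as a moving mean with a tolerance band. This is a minimal illustration: the forward-looking window and the 10% band mirror the utility and plotting code in this article, while the function names, the `window=3` setting, and the sample rates are assumptions.

```python
def forward_means(values, window=10):
    """Mean of each value and up to window - 1 values that follow it."""
    means = []
    for i in range(len(values)):
        chunk = values[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def out_of_band(values, window=10, tolerance=0.1):
    """Indexes of days whose rate leaves mean * (1 +/- tolerance)."""
    flagged = []
    means = forward_means(values, window)
    for i, (value, mean) in enumerate(zip(values, means)):
        if not mean * (1 - tolerance) <= value <= mean * (1 + tolerance):
            flagged.append(i)
    return flagged


# Hypothetical daily conversion rates; the rate at index 3 drops sharply.
daily_rates = [0.40, 0.41, 0.39, 0.20, 0.40, 0.42]
alerts = out_of_band(daily_rates, window=3)
```

Because the window looks forward, the drop at index 3 also pulls down the means of the two days before it, so indexes 1, 2, and 3 are all flagged. A trailing window would flag only the day of the drop and the days after it; which one suits a monitoring chart better is a judgment call.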

The funnel template used is:

funnel_name funnel_time_start funnel_time_end funnel_node_name step event_type event_name action page_id page_name product_id inner_id push_id last_page_id label_id label context_id context datetime
login 20190203 20190305 login_page 1 page_level 页面级别 5 page_login_1 login_page                 20190203
login 20190203 20190305 login_label 2 label_level 区域级别 5 page_login_1 login_page         label_login_1 login_label     20190203
print 20190203 20190305 print_page 1 page_level 页面级别 5 page_print_1 print_page                 20190203
print 20190203 20190305 print_page 1 page_level 页面级别 5 page_print_2 print_page                 20190203
print 20190203 20190305 print_page 1 page_level 页面级别 5 page_print_3 print_page                 20190203
print 20190203 20190305 print_label 2 label_level 区域级别 5 page_print_2 print_page       page_print_1 label_print print_label     20190203
print 20190203 20190305 print_label 2 label_level 区域级别 5 page_print_4 print_page       page_print_1 label_print print_label     20190203

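Notes on reading the data:

1) A page flow has three levels: page level (页面级别), area level (区域级别), and event level.

2) A page contains multiple areas and an area contains multiple events, and users can act on any of these nodes. When a user acts on an area, the page field is also populated; when a user acts on an event, both the page and area fields are populated.

3) Because the data volume is large, the data platform partitions the data by day: one HQL statement per day computes the statistics, and the results are then merged.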
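Points 2 and 3 can be sketched together: the node's level decides which id filters are combined, and one query is produced per daily partition. The table and column names are taken from the HQL shown in this article; the helper names, the simplified SELECT list, and the GROUP BY clause are assumptions of this sketch rather than the project's generated query.

```python
from datetime import datetime, timedelta


def daily_partitions(start, end):
    """List 'YYYYMMDD' strings from start up to, but not including, end."""
    first = datetime.strptime(start, "%Y%m%d")
    last = datetime.strptime(end, "%Y%m%d")
    return [
        (first + timedelta(days=n)).strftime("%Y%m%d")
        for n in range((last - first).days)
    ]


def node_filter(page_id, label_id=None, context_id=None):
    """AND together the ids that are set for a page/area/event level node."""
    parts = ["t.page_id='%s'" % page_id]
    if label_id:
        parts.append("t.label_id='%s'" % label_id)
    if context_id:
        parts.append("t.name_id='%s'" % context_id)
    return " AND ".join(parts)


def daily_queries(start, end, page_id, label_id=None, context_id=None):
    """One COUNT query per day, joined with UNION ALL."""
    where = node_filter(page_id, label_id, context_id)
    queries = [
        "SELECT t.dt, count(1) pv FROM mid.tracker_action_event t "
        "WHERE t.dt='%s' AND %s GROUP BY t.dt" % (day, where)
        for day in daily_partitions(start, end)
    ]
    return "\nUNION ALL\n".join(queries)


days = daily_partitions("20190203", "20190206")
hql = daily_queries("20190203", "20190205", "page_login_1", "label_login_1")
```

The end date is treated as exclusive here, matching the behavior of the project's own date helper; include one more day in the range if the last day should be counted.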

The corresponding code is:

1) The Funnel_build file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime

import pandas as pd

# Template columns that are copied into every SELECT as string constants.
CONSTANT_COLUMNS = [
    'funnel_name', 'funnel_time_start', 'funnel_time_end', 'funnel_node_name',
    'step', 'event_type', 'event_name', 'action', 'page_id', 'page_name',
    'product_id', 'inner_id', 'push_id', 'last_page_id', 'label_id', 'label',
    'context_id', 'context',
]


# Funnel builder: turns the Excel template into HQL.
class Funnel_build(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        # Read every cell as an object and mark empty cells with the string 'nan'.
        return pd.read_excel(self.path, dtype='O').fillna('nan')

    def get_sql(self, row, cond):
        # One SELECT per template row and day; pv, uv and uv_m are the aggregates.
        constants = ''.join("\n\t '%s' %s," % (row[col], col)
                            for col in CONSTANT_COLUMNS)
        tail = ('\n\t t.dt datetime,'
                '\n\t count(1) pv,'
                '\n\t size(collect_set(t.becif_no)) uv,'
                '\n\t count(distinct t.mid) uv_m'
                '\nFROM mid.tracker_action_event t'
                '\nWHERE'
                '\n\t %s') % cond
        return 'SELECT' + constants + tail

    def add_times(self, funnel_time_start, funnel_time_end):
        # Days from the start date up to, but not including, the end date,
        # formatted as 'YYYYMMDD'.
        start = datetime.datetime.strptime(str(funnel_time_start), '%Y%m%d')
        end = datetime.datetime.strptime(str(funnel_time_end), '%Y%m%d')
        return [(start + datetime.timedelta(days=j)).strftime('%Y%m%d')
                for j in range((end - start).days)]

    def join_sqls(self, sql_list):
        j0 = '\n\nunion all\n'
        return j0.join(sql_list)

    def built_sql(self):
        sqls = []
        funnel_data = self.get_data()
        for _, row in funnel_data.iterrows():
            has_page = row['page_id'] != 'nan'
            has_label = row['label_id'] != 'nan'
            has_context = row['context_id'] != 'nan'
            # Valid rows are page level, page + area level, or page + area + event level.
            if not has_page or (has_context and not has_label):
                raise ValueError('unsupported id combination in template row: %s'
                                 % row['funnel_node_name'])
            parts = ["\nand\tt.page_id='%s'" % row['page_id']]
            if has_label:
                parts.append("\nand\tt.label_id='%s'" % row['label_id'])
            if has_context:
                parts.append("\nand\tt.name_id='%s'" % row['context_id'])
            cond = '\t'.join(parts)
            # One statement per day in the funnel's date range.
            for day in self.add_times(row['funnel_time_start'], row['funnel_time_end']):
                conds = "t.dt='" + day + "'\t" + cond
                sqls.append(self.get_sql(row, conds))
        return self.join_sqls(sqls)

2) General utility code: Funnel_utils

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from matplotlib.font_manager import FontProperties


def ch_dtype(df):
    # np.str and np.int were removed from NumPy; the builtins are equivalent.
    dtype = dict(funnel_id=str, idx=int, page_id=str,
                 page_name=str, pv=np.int64, date_=str,
                 funnel_name=str, uv=np.int64, dt=str,
                 product_id=str, push_id=str, inner_id=str)
    return df.astype(dtype)


def write(string, path):
    with open(path, 'w') as f:
        f.write(string)


def write_excel(df, path):
    # The context manager closes the writer even if to_excel raises.
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer)


def set_ch_font():
    # Font used to render Chinese labels.
    return FontProperties(fname='../data/font/msyh.ttf')


def read_page_info():
    return pd.read_csv('info/page_info.csv', index_col=0).set_index('page_id')


def write_csv(df, path):
    df.to_csv(path)


def read_funnel_info():
    return pd.read_table('../data/funnel_info.txt').fillna('nan')


def read_funnel_info_xls():
    return pd.read_excel('../data/funnel_infos.xlsx').fillna('nan')


def join_sqls(sql_list):
    j0 = '\n\nUNION ALL\n'
    return j0.join(sql_list)


def format_cols(df):
    # Strip the 'table.' prefix from column names such as 't.page_id'.
    df.columns = [c.split('.')[1] for c in df.columns]


def product(x):
    # Cartesian product of the lists in x, built from the last list backwards.
    def _product(x, y):
        if not x:
            return y
        if y:
            z = []
            for i in x.pop():
                for k in y:
                    if isinstance(k, list):
                        z.append([i] + k)
                    else:
                        z.append([i, k])
            y = z
        else:
            y = x.pop()
        return _product(x, y)
    # Work on a copy so the caller's list is not emptied by pop().
    return _product(list(x), [])


def flow_means(data, window=10):
    # Forward-looking moving mean: each value is averaged with up to
    # window - 1 values that follow it.
    ms = []
    for i in range(len(data)):
        chunk = data[i:i + window]
        ms.append(sum(chunk) / len(chunk))
    return ms

3) Data preparation code: Funnel_data

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd

METRICS = ['pv', 'uv', 'uv_m']
MAX_STEP = 7  # steps 1..7, i.e. at most six transitions per funnel
SINGLE_LABELS = ['%d-->%d' % (i, i + 1) for i in range(1, MAX_STEP)]
TOTAL_LABELS = ['1-->%d' % (i + 1) for i in range(1, MAX_STEP)]


def step_rates(data, metric, cumulative):
    # Rates for the six transitions of one funnel. With cumulative=True the
    # denominator is always step 1, otherwise it is the preceding step.
    # A missing step or a zero denominator yields 0.
    rates = []
    for i in range(1, MAX_STEP):
        t1 = data[data['step'] == i + 1][metric]
        t2 = data[data['step'] == (1 if cumulative else i)][metric]
        if len(t1.values) != 0 and len(t2.values) != 0 and float(t2.iloc[0]) != 0:
            rates.append(round(float(t1.iloc[0]) / float(t2.iloc[0]), 6))
        else:
            rates.append(0)
    return rates


class Funnel_data(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        return pd.read_excel(self.path, dtype='O').fillna('nan')

    def get_funnels_name(self):
        return self.get_data()['funnel_name'].drop_duplicates().tolist()

    def _sum_by_step(self):
        # Sum each metric over all days for every (funnel, step) pair.
        funnel_data = self.get_data()[['funnel_name', 'step'] + METRICS]
        keys = [funnel_data['funnel_name'], funnel_data['step']]
        sums = [funnel_data[m].groupby(keys).sum() for m in METRICS]
        return pd.DataFrame(pd.concat(sums, axis=1)).reset_index()

    def _calculate(self, cumulative, labels):
        # One table per metric: rows are transitions ('环节'), columns are funnels.
        sum_data = self._sum_by_step()
        funnel_names = sum_data['funnel_name'].drop_duplicates()
        tables = []
        for metric in METRICS:
            rates = {}
            for funnel_name in funnel_names:
                data = sum_data[sum_data['funnel_name'] == funnel_name]
                rates[funnel_name] = step_rates(data, metric, cumulative)
            table = pd.DataFrame(rates, index=labels)
            table.index.name = '环节'  # column name meaning "step"
            tables.append(table.reset_index())
        return tuple(tables)

    # Step-to-step conversion rate: this funnel, this node, summed over all days.
    def result_calculate_single(self):
        return self._calculate(False, SINGLE_LABELS)

    # Overall conversion rate relative to step 1: this funnel, this node,
    # summed over all days.
    def result_calculate_total(self):
        return self._calculate(True, TOTAL_LABELS)

    # Data for a single funnel: this funnel, this node, summed over all days.
    def data_combine_funnel(self, funnel_name):
        totals = self.result_calculate_total()
        singles = self.result_calculate_single()
        combined = []
        for single, total in zip(singles, totals):
            # The single-step column must come first so that it matches the
            # column names assigned below.
            table = pd.concat([single['环节'], single[funnel_name], total[funnel_name]],
                              axis=1)
            # Column names: step, single-step conversion rate, overall conversion rate.
            table.columns = ['环节', '单一环节转化率', '总体转化率']
            combined.append(table)
        funnel_pv, funnel_uv, funnel_uv_m = combined
        return funnel_pv, funnel_uv, funnel_uv_m

4) Plotting code: Funnel_plot

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written against the pyecharts 0.5.x API, where charts are imported from the
# top-level package.
import os

from pyecharts import Funnel, Page, Line
from Funnel_livan import Funnel_utils

# Order of the six tables returned by Funnel_check.day_data_combine_funnel.
METRIC_NAMES = ['pv_total', 'uv_total', 'uv_m_total',
                'pv_single', 'uv_single', 'uv_m_single']

# Options shared by every series of the monitoring charts
# ('转化率' is the y-axis name, "conversion rate").
LINE_OPTS = dict(yaxis_min=0, yaxis_max="dataMax", yaxis_name='转化率',
                 is_yaxis_show=True, is_stack=True, is_label_show=True)


class Funnel_plot(object):
    def __init__(self, name, data=None):
        self.name = name
        self.data = data if data is not None else []

    def draw_plot(self):
        funnel_name = self.name
        page = Page()
        os.makedirs('./' + funnel_name, exist_ok=True)
        os.makedirs('./plots', exist_ok=True)
        for funnel in self.data:
            funnel_list = funnel['环节'].tolist()                # step labels
            funnel_l_total = (funnel.iloc[:, 1] * 100).tolist()  # rates in percent
            funnel_plot = Funnel('%s' % funnel_name, width=800, height=400,
                                 title_pos='center')
            funnel_plot.add(name=funnel_name,          # legend name
                            attr=funnel_list,          # attribute names
                            value=funnel_l_total,      # funnel values
                            is_label_show=True,        # show labels
                            label_formatter='{c}%',    # label format
                            label_pos="inside",        # label position
                            legend_orient='vertical',  # legend orientation
                            legend_pos='left',         # legend position
                            is_legend_show=True)       # show the legend
            funnel_plot.render(path='./%s/%s.gif' % (funnel_name, funnel_name))
            page.add(funnel_plot)
        page.render("./plots/%s.html" % funnel_name)
        return page

    def check_plot(self):
        funnel_name = self.name
        page = Page()
        os.makedirs('./' + funnel_name, exist_ok=True)
        os.makedirs('./check', exist_ok=True)
        # Each table holds one metric; its rows are transitions and its
        # columns (after '环节') are days.
        for metric, funnel in zip(METRIC_NAMES, self.data):
            steps = funnel['环节'].tolist()
            # x axis: one entry per day
            attr = [str(c) for c in funnel.columns[1:]]
            for k, step in enumerate(steps):
                values = funnel[funnel['环节'] == step].iloc[0, 1:].tolist()
                mov_mean = Funnel_utils.flow_means(values)
                mov_mean_up = [m * 1.1 for m in mov_mean]    # moving mean + 10%
                mov_mean_down = [m * 0.9 for m in mov_mean]  # moving mean - 10%
                # Title each chart with its metric and transition so that the
                # 36 charts can be told apart.
                line = Line("%s %s转化率 %s" % (funnel_name, metric, step))
                line.add("平均下限", attr, mov_mean_down, **LINE_OPTS)  # lower bound
                line.add(step, attr, values, **LINE_OPTS)
                line.add("平均上限", attr, mov_mean_up, **LINE_OPTS)    # upper bound
                line.render(path='./%s/%s转化率_%d.gif' % (funnel_name, metric, k + 1))
                page.add(line)
        page.render("./check/%s.html" % funnel_name)

5) Conversion-rate monitoring: Funnel_check

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd

# Each day's conversion rate of a funnel becomes one point; the points are
# turned into a trend chart and shown on the plot.
# The input is the daily conversion rate; the x axis is time and the y axis is
# the conversion rate.
# Charts are built per funnel, one monitoring chart per funnel, and a funnel
# has at most six conversion steps.
# Given a funnel's conversion rates over a period, produce the trend chart.

METRICS = ['pv', 'uv', 'uv_m']
MAX_STEP = 7  # steps 1..7, i.e. at most six transitions per funnel
SINGLE_LABELS = ['%d-->%d' % (i, i + 1) for i in range(1, MAX_STEP)]
TOTAL_LABELS = ['1-->%d' % (i + 1) for i in range(1, MAX_STEP)]


def step_rates(data, metric, cumulative):
    # Rates for the six transitions of one funnel on one day. With
    # cumulative=True the denominator is always step 1, otherwise it is the
    # preceding step. A missing step or a zero denominator yields 0.
    rates = []
    for i in range(1, MAX_STEP):
        t1 = data[data['step'] == i + 1][metric]
        t2 = data[data['step'] == (1 if cumulative else i)][metric]
        if len(t1.values) != 0 and len(t2.values) != 0 and float(t2.iloc[0]) != 0:
            rates.append(round(float(t1.iloc[0]) / float(t2.iloc[0]), 6))
        else:
            rates.append(0)
    return rates


class Funnel_check(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        return pd.read_excel(self.path, dtype='O').fillna('nan')

    def get_funnels_name(self):
        return self.get_data()['funnel_name'].drop_duplicates().tolist()

    def _sum_by_step_and_day(self):
        # Sum each metric for every (funnel, step, day) combination.
        funnel_data = self.get_data()[['funnel_name', 'step', 'datetime'] + METRICS]
        keys = [funnel_data['funnel_name'], funnel_data['step'], funnel_data['datetime']]
        sums = [funnel_data[m].groupby(keys).sum() for m in METRICS]
        return pd.DataFrame(pd.concat(sums, axis=1)).reset_index()

    def _calculate(self, cumulative, labels):
        # For each metric, build {funnel_name: table}; table rows are
        # transitions ('环节') and table columns are days.
        sum_data = self._sum_by_step_and_day()
        funnel_names = sum_data['funnel_name'].drop_duplicates()
        days = sum_data['datetime'].drop_duplicates()
        results = []
        for metric in METRICS:
            tables = {}
            for funnel_name in funnel_names:
                daily = {}
                for day in days:
                    data = sum_data[(sum_data['funnel_name'] == funnel_name)
                                    & (sum_data['datetime'] == day)]
                    daily[day] = step_rates(data, metric, cumulative)
                table = pd.DataFrame(daily, index=labels)
                table.index.name = '环节'  # column name meaning "step"
                tables[funnel_name] = table.reset_index()
            results.append(tables)
        return tuple(results)

    # Step-to-step conversion rate: this funnel, this node, each day.
    def day_result_calculate_single(self):
        return self._calculate(False, SINGLE_LABELS)

    # Overall conversion rate relative to step 1: this funnel, this node,
    # each day. Rows are labelled '1-->2', '1-->3', ... to match the denominator.
    def day_result_calculate_total(self):
        return self._calculate(True, TOTAL_LABELS)

    # Data for a single funnel: this funnel, this node, each day.
    def day_data_combine_funnel(self, funnel_name):
        totals = self.day_result_calculate_total()
        singles = self.day_result_calculate_single()
        # Six tables in the order pv_total, uv_total, uv_m_total,
        # pv_single, uv_single, uv_m_single.
        return [tables[funnel_name] for tables in totals + singles]

6) The main function:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from Funnel_livan import Funnel_build, Funnel_plot, Funnel_data, Funnel_check

if __name__ == '__main__':
    # 1. Generate the SQL.
    # path = '/Users/livan/PycharmProjects/offices/data/train_data.xlsx'
    # funnels = Funnel_build.Funnel_build(path=path)
    # sqls = funnels.built_sql()
    # with open('sqls.sql', 'w+') as f:
    #     f.write(sqls)

    # 2. Load the data, compute conversion rates, and build the funnel over all days:
    # path = '/Users/livan/PycharmProjects/offices/data/result_data.xlsx'
    # result = Funnel_data.Funnel_data(path=path)
    # funnels_name = result.get_funnels_name()
    # for i in range(len(funnels_name)):
    #     funnel_pv, funnel_uv, funnel_uv_m = result.data_combine_funnel(funnels_name[i])
    #     # The previous step yields six funnels to plot:
    #     funnel_pv_s = funnel_pv[['环节', '单一环节转化率']]
    #     funnel_uv_s = funnel_uv[['环节', '单一环节转化率']]
    #     funnel_uv_m_s = funnel_uv_m[['环节', '单一环节转化率']]
    #     funnel_pv_t = funnel_pv[['环节', '总体转化率']]
    #     funnel_uv_t = funnel_uv[['环节', '总体转化率']]
    #     funnel_uv_m_t = funnel_uv_m[['环节', '总体转化率']]
    #     funnel_d = [funnel_pv_s,
    #                 funnel_uv_s,
    #                 funnel_uv_m_s,
    #                 funnel_pv_t,
    #                 funnel_uv_t,
    #                 funnel_uv_m_t]
    #     Funnel_plot.Funnel_plot(funnels_name[i], funnel_d).draw_plot()

    # 3. Conversion-rate trend analysis, one funnel per day:
    path = '/Users/livan/PycharmProjects/offices/data/result_data.xlsx'
    result = Funnel_check.Funnel_check(path=path)
    funnels_name = result.get_funnels_name()
    for i in range(len(funnels_name)):
        funnel_data = result.day_data_combine_funnel(funnels_name[i])
        Funnel_plot.Funnel_plot(funnels_name[i], funnel_data).check_plot()

The generated HQL is:

SELECT
     'login' funnel_name,
     '20190203' funnel_time_start,
     '20190305' funnel_time_end,
     'login_page' funnel_node_name,
     '1' step,
     'page_level' event_type,
     '页面级别' event_name,
     '5' action,
     'page_login_1' page_id,
     'login_page' page_name,
     'nan' product_id,
     'nan' inner_id,
     'nan' push_id,
     'nan' last_page_id,
     'nan' label_id,
     'nan' label,
     'nan' context_id,
     'nan' context,
     t.dt datetime,
     count(1) pv,
     size(collect_set(t.becif_no)) uv,
     count(distinct t.mid) uv_m
FROM mid.tracker_action_event t
WHERE
     t.dt='20190203'
and  t.page_id='page_login_1'

union all
……

The charts in this article are drawn with pyecharts, which is convenient and easy to use; an introduction to it will follow.
