SQL and Hive Scenario Questions with Answers

If this helps, a like and a bookmark are appreciated!

Older MySQL versions do not support window functions; support was added in MySQL 8.0. Oracle and Hive both support window functions. All of the questions below are solved in Hive.

Question 1:

Which window functions do you know, and what does each do? Give an application in a business scenario. Write out a window function and explain its purpose, then write any SQL statement that uses a window function and explain what it means.

Window functions: the usual form is an aggregate or analytic function combined with an over() clause.

-- Return all detail rows
select * from t_order;
-- Return only the total row count
select count(*) from t_order;
-- With a window function, the total count is appended to every detail row:
select *, count(*) over() from t_order;

2. For each channel, compute the number of views and the total watch time.

Data: video table

uid channel min
1 1 23
2 1 12
3 1 12
4 1 32
5 1 342
6 2 13
7 2 34
8 2 13
9 2 134
create table video(
uid int,
channel string,
min int
)
row format delimited
fields terminated by ' '
;
load data local inpath './data/video' into table video;

Answer:

select channel,count(*) count,sum(min) total from video group by channel;

3. Write SQL for the following.

Data:

userid,month,visits
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,1
drop table visits;
create table visits(
userid string,
month string,
visits int
)
row format delimited
fields terminated by ','
;
load data local inpath './hivedata/visits' overwrite into table visits;

Requirement: for each user, compute the maximum single-month visit count up to and including each month, and the cumulative visit count up to that month. The result should look like:

+---------+----------+---------+-------------+---------------+--+
| userid  |  month   | visits  | max_visits  | total_visits  |
+---------+----------+---------+-------------+---------------+--+
| A       | 2015-01  | 33      | 33          | 33            |
| A       | 2015-02  | 10      | 33          | 43            |
| A       | 2015-03  | 38      | 38          | 81            |
| B       | 2015-01  | 30      | 30          | 30            |
| B       | 2015-02  | 15      | 30          | 45            |
| B       | 2015-03  | 34      | 34          | 79            |
+---------+----------+---------+-------------+---------------+--+
select userid,month,
visits,
max(visits) over(distribute by userid sort by month) max_visits,
sum(visits) over(distribute by userid sort by month) total_visits
from (
select userid,month,
sum(visits) visits
from visits
group by userid,month
) t;
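
The cumulative behavior above comes from the default window frame. Since the subquery leaves one row per (userid, month), the query is equivalent to the following sketch with the frame written out explicitly:

select userid,month,visits,
max(visits) over(partition by userid order by month rows between unbounded preceding and current row) max_visits,
sum(visits) over(partition by userid order by month rows between unbounded preceding and current row) total_visits
from (
select userid,month,sum(visits) visits
from visits
group by userid,month
) t;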

4. Count the total number of users who logged in for 7 consecutive days.

Data: login table

Uid dt login_status (1 = successful login, 0 = failed)
1 2019-07-11 1
1 2019-07-12 1
1 2019-07-13 1
1 2019-07-14 1
1 2019-07-15 1
1 2019-07-16 1
1 2019-07-17 1
1 2019-07-18 1
2 2019-07-11 1
2 2019-07-12 1
2 2019-07-13 0
2 2019-07-14 1
2 2019-07-15 1
2 2019-07-16 0
2 2019-07-17 1
2 2019-07-18 0
2 2019-07-19 1
2 2019-07-20 0
2 2019-07-21 1
2 2019-07-22 0
2 2019-07-23 1
2 2019-07-24 0
3 2019-07-11 1
3 2019-07-12 1
3 2019-07-13 1
3 2019-07-14 1
3 2019-07-15 1
3 2019-07-16 1
3 2019-07-17 1
3 2019-07-18 1
drop table login;
create table login(
Uid int,
dt string,
login_status int
)
row format delimited
fields terminated by ' '
;load data local inpath './hivedata/login' into table login;
select count(*)
from
(
select distinct uid
from(
select uid,dt,lag(dt,6) over(partition by uid order by dt) pre_dt,
sum(login_status) over(partition by uid order by dt rows between 6 preceding and current row) total
from login
) t
where date_sub(dt,6)=pre_dt and t.total=7) t1;

5. Which ranking functions do you know, and how do they differ? (A text description is enough.)

  • row_number() starts at 1 and assigns a strictly sequential number within each group; the numbers never repeat, and tied values are numbered in the order the rows appear

  • rank() assigns each row its rank within the group; tied values share the same rank and the following rank numbers are skipped (gaps are left)

  • dense_rank() assigns each row its rank within the group; tied values share the same rank and no gaps are left in the numbering
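
A minimal sketch of the difference, run against the stu table defined in question 6 below:

select class,stu_no,score,
row_number() over(partition by class order by score desc) rn,
rank() over(partition by class order by score desc) rk,
dense_rank() over(partition by class order by score desc) drk
from stu;
-- For the two 90-point rows in class 1901: rn = 1,2; rk = 1,1 with the next score ranked 3; drk = 1,1 with the next score ranked 2.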

6. Write SQL that returns the top 3 students in each class (equal scores are not given the same rank), along with the score difference between adjacent ranks among the top 3:

Data: stu table

Stu_no class score
1   1901    90
2   1901    90
3   1901    83
4   1901    60
5   1902    66
6   1902    23
7   1902    99
8   1902    67
9   1902    87
drop table stu;
create table stu(
Stu_no int,
class string,
score int
)
row format delimited
fields terminated by '\t'
;load data local inpath './hivedata/stu' into table stu;

Write SQL so that the result is as follows:

+--------+---------+--------+-----+----------+--+
| class  | stu_no  | score  | rn  | rn_diff  |
+--------+---------+--------+-----+----------+--+
| 1901   | 2       | 90     | 1   | 90       |
| 1901   | 1       | 90     | 2   | 0        |
| 1901   | 3       | 83     | 3   | -7       |
| 1902   | 7       | 99     | 1   | 99       |
| 1902   | 9       | 87     | 2   | -12      |
| 1902   | 8       | 67     | 3   | -20      |
+--------+---------+--------+-----+----------+--+
select class,stu_no,score,rn,rn_diff
from(
select class,stu_no,score,
row_number() over(partition by class order by score desc) rn,
score-nvl(lag(score,1) over(partition by class order by score desc),0) rn_diff
from stu
) t
where t.rn<4;

7. What approaches do you know for converting between rows and columns? Describe each one (a verbal description is enough).

Row to column: case when / if(...) combined with group by; column to row: explode with lateral view; collapsing multiple rows into one field: collect_set / collect_list with concat_ws.

8. Write SQL to pivot rows into columns. The data is as follows:

id sid subject score
1,001,语文,90
2,001,数学,92
3,001,英语,80
4,002,语文,88
5,002,数学,90
6,002,英语,75.5
7,003,语文,70
8,003,数学,85
9,003,英语,90
10,003,政治,82

Write SQL to produce the following result:

+---------+--------+--------+--------+--------+-----------+--+
| sid     | 语文   | 数学   | 英语   | 政治   | total     |
+---------+--------+--------+--------+--------+-----------+--+
| 001     | 90.0   | 92.0   | 80.0   | 0.0    | 262.0     |
| 002     | 88.0   | 90.0   | 75.5   | 0.0    | 253.5     |
| 003     | 70.0   | 85.0   | 90.0   | 82.0   | 327.0     |
| total   | 248.0  | 267.0  | 245.5  | 82.0   | 842.5     |
+---------+--------+--------+--------+--------+-----------+--+
drop table score;
create table score(
id int,
sid string,
subject string,
score double
)
row format delimited
fields terminated by ','
;load data local inpath './hivedata/score' into table score;
select
sid,
sum(if(subject="语文",score,0)) as `语文`,
sum(if(subject="数学",score,0)) as `数学`,
sum(case when subject="英语" then score else 0 end) as `英语`,
sum(case when subject="政治" then score else 0 end) as `政治`,
sum(score) total
from score
group by sid
union
select "total",sum(`语文`),sum(`数学`),sum(`英语`),sum(`政治`),sum(total) from
(
select
sid,
sum(if(subject="语文",score,0)) as `语文`,
sum(if(subject="数学",score,0)) as `数学`,
sum(case when subject="英语" then score else 0 end) as `英语`,
sum(case when subject="政治" then score else 0 end) as `政治`,
sum(score) total
from score
group by sid
)t;

9. Write SQL for the following.

Data: t1 table

uid tags
1   1,2,3
2   2,3
3   1,2

Write SQL to produce the following result:

uid tag
1   1
1   2
1   3
2   2
2   3
3   1
3   2
create table t1(
uid int,
tags string
)
row format delimited
fields terminated by '\t'
;load data local inpath './hivedata/t1' into table t1;
select uid,tag
from t1 lateral view explode(split(tags,",")) t  as tag;

10. Rows to columns

Data: T2 table:

Tags
1,2,3
1,2
2,3

T3 table:

id lab
1 A
2 B
3 C

Using the data in T2 and T3, write SQL to produce the following result:

+--------+--------+--+
|  tags  |  labs  |
+--------+--------+--+
| 1,2    | A,B    |
| 1,2,3  | A,B,C  |
| 2,3    | B,C    |
+--------+--------+--+
create table t2(
tags string
);
load data local inpath './hivedata/t2' overwrite into table t2;
create table t3(
id int,
lab string
)
row format delimited
fields terminated by ' '
;load data local inpath './hivedata/t3' overwrite into table t3;
select tags,
concat_ws(",",collect_set(lab)) labs
from
(select tags,lab
from
(select tags,tag
from t2 lateral view explode(split(tags,",")) A as tag) B
join t3 on B.tag = t3.id) C
group by tags;

11. Rows to columns

Data: t4 table:

id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8

Write SQL to produce the following result:

id tag flag
a b 1|2|3
c d 6|8
create table t4(
id string,
tag string,
flag int
)
row format delimited
fields terminated by ' '
;load data local inpath './hivedata/t4' overwrite into table t4;
select id,tag,concat_ws("|",sort_array(collect_set(cast(flag as string)))) from t4 group by id,tag;

12. Columns to rows

Data: t5 table

uid name tags
1   goudan  chihuo,huaci
2   mazi    sleep
3   laotie  paly

Write SQL to produce the following result:

uid name tag
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly
create table t5(
uid string,
name string,
tags string
)
row format delimited
fields terminated by '\t' ;load data local inpath './hivedata/t5' overwrite into table t5;
select
uid,
name,
tag
from t5
lateral view explode(split(tags,",")) t1 as tag ;

13. Rows to columns

Data: content table:

uid contents
1   i|love|china
2   china|is|good|i|i|like

Produce the statistics below; when counts are equal, sort by the content value:

+----------+------+--+
| content  | num  |
+----------+------+--+
| i        | 3    |
| china    | 2    |
| good     | 1    |
| is       | 1    |
| like     | 1    |
| love     | 1    |
+----------+------+--+
drop table content;
create table content(
uid int,
contents string
)
row format delimited
fields terminated by '\t'
;
load data local inpath './hivedata/content' overwrite into table content;
select content,count(1) num
from content lateral view explode(split(contents,"\\|")) tmp as content
group by content
order by num desc,content;

14. Columns to rows

Data: course table

id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e

Write SQL to get the result below (1 means the course was taken, 0 means it was not):

+-----+----+----+----+----+----+----+--+
| id  | a  | b  | c  | d  | e  | f  |
+-----+----+----+----+----+----+----+--+
| 1   | 1  | 1  | 1  | 0  | 1  | 0  |
| 2   | 1  | 0  | 1  | 1  | 0  | 1  |
| 3   | 1  | 1  | 1  | 0  | 1  | 0  |
+-----+----+----+----+----+----+----+--+
create table course(
id int,
course string
)
row format delimited
fields terminated by ','
;load data local inpath './hivedata/course' overwrite into table course;
select
id,
sum(case when c.course="a" then 1 else 0 end) as `a`,
sum(case when c.course="b" then 1 else 0 end) as `b`,
sum(case when c.course="c" then 1 else 0 end) as `c`,
sum(case when c.course="d" then 1 else 0 end) as `d`,
sum(case when c.course="e" then 1 else 0 end) as `e`,
sum(case when c.course="f" then 1 else 0 end) as `f`
from course c
group by id ;

15. Timestamp functions: unix_timestamp, from_unixtime

Get the current Unix timestamp:

select unix_timestamp();

获取"2019-07-31 11:57:25"对应的时间戳:

select unix_timestamp("2019-07-31 11:57:25");

获取"2019-07-31 11:57"对应的时间戳:

select unix_timestamp("2019-07-31 11:57","yyyy-MM-dd HH:mm");

Get the date and time (hour, minute, second) for timestamp 1564545445:

select from_unixtime(1564545445);

Get the date and hour (yyyy/MM/dd HH) for timestamp 1564545446:

select from_unixtime(1564545446,"yyyy/MM/dd HH");

16. Date format conversion: yyyyMMdd -> yyyy-MM-dd

Data: dt table

20190730
20190731

Write SQL to produce the following result:

2019-07-30
2019-07-31
create table dt(
dt string
);load data local inpath './hivedata/dt' overwrite into table dt;
select from_unixtime(unix_timestamp(dt,"yyyyMMdd"),"yyyy-MM-dd") from dt;
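
An equivalent approach that avoids the round trip through a Unix timestamp is plain string slicing; a minimal sketch:

select concat(substr(dt,1,4),'-',substr(dt,5,2),'-',substr(dt,7,2)) from dt;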

17. Data: store, month, amount

sid month money
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
create table store(
sid string,
month string,
money int
)
row format delimited
fields terminated by ','
;load data local inpath './hivedata/store' overwrite into table store;

Write Hive HQL to compute each store's sales for each month and the cumulative sales up to that month.

select sid,month,sales,
sum(sales) over(partition by sid order by month) total
from (
select sid,month,sum(money) sales from store group by sid,month
) t;

18. Have you run into data skew in Hive? How did you handle it, and what is the underlying principle?
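
The original leaves this answer blank. One common mitigation for a skewed group-by key, sketched here with a hypothetical skewed_table and skew_key column (not from the original), is to salt the key and aggregate in two stages:

select skew_key, sum(partial_cnt) cnt
from (
select skew_key, salt, count(1) partial_cnt      -- stage 1: partial counts per (key, salt)
from (
select skew_key, cast(rand() * 10 as int) salt   -- spread each hot key across up to 10 groups
from skewed_table
) t
group by skew_key, salt
) s
group by skew_key;                               -- stage 2: combine the partial counts per key
-- Hive can apply a similar two-stage plan for group by automatically:
-- set hive.map.aggr=true; set hive.groupby.skewindata=true;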


19. When have you used array and map in Hive, and why?
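
The original leaves this answer blank. A typical reason is keeping one-to-many attributes in a single row; a minimal sketch with a hypothetical user_profile table (not from the original):

create table user_profile(
uid int,
hobbies array<string>,
props map<string,string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';
-- element access, collection size, and map lookup
select uid, hobbies[0] first_hobby, size(hobbies) hobby_cnt, props['city'] city from user_profile;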


20. Implement the following requirement in SQL.

Table A records user login times; each login within the month produces one record. Table A:

log_time uid
2018-10-01 12:34:11 123
2018-10-02 13:21:08 123
2018-10-02 14:21:08 123
2018-10-02 14:08:09 456
2018-10-04 05:10:22 123
2018-10-04 21:38:38 456
2018-10-04 22:38:38 456
2018-10-05 09:57:32 123
2018-10-06 13:22:56 123
2018-11-01 12:34:11 123
2018-11-02 13:21:08 123
2018-11-02 14:21:08 123
2018-11-02 14:08:09 456
2018-11-04 05:10:22 123
2018-11-04 21:38:38 456
2018-11-05 09:57:32 123
2018-11-06 13:22:56 123

Compute each user's maximum number of consecutive login days within the month. In the sample data of table A, user 123's maximum streak is 3 days, while user 456's is 1 day.

drop table login_time;
create table login_time(
log_time timestamp,
uid string
)
row format delimited
fields terminated by '\t';
load data local inpath './hivedata/login_time' overwrite into table login_time;
-- Note: the raw data may need cleaning so that each user has at most one login record per day
select uid,max(cnt)
from
(select uid,date_format(dt,"yyyy-MM"),count(1) cnt
from
(select uid,dt,date_sub(dt,row_number() over(partition by uid order by dt)) grp_dt
from
(
select distinct uid,date_format(log_time,"yyyy-MM-dd") as dt
from login_time
)a
)b group by uid,grp_dt,date_format(dt,"yyyy-MM")
)c
group by uid;
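
Why the date_sub(dt, row_number()) trick works: every row in the same consecutive run collapses onto the same anchor date, so counting rows per (uid, anchor) gives the streak length. A walk-through of user 123's October days:

dt           rn   date_sub(dt, rn)
2018-10-01   1    2018-09-30
2018-10-02   2    2018-09-30   (same anchor: consecutive)
2018-10-04   3    2018-10-01   (gap, so a new anchor starts)
2018-10-05   4    2018-10-01
2018-10-06   5    2018-10-01   -> 3 rows share this anchor = a 3-day streak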

21. Describe the LAG and LEAD functions.

  • lag(col, n) returns the value of col from the row n rows before the current row within its window
  • lead(col, n) returns the value of col from the row n rows after the current row within its window

Requirement: find the date of each customer's next purchase.

select name,orderdate,cost,lead(orderdate,1) over(partition by name order by orderdate ) as time1
from t_order;

22. Use Hive to find the difference between two data sets (rows that appear in only one of them).

Data: diff_t1 table:

id name
1 zs
2 ls

diff_t2 table:

id name
1 zs
3 ww

The result is:

id name
2 ls
3 ww
create table diff_t1(
id string,
name string
)
row format delimited
fields terminated by ' '
;load data local inpath './hivedata/diff_t1.txt' overwrite into table diff_t1;
create table diff_t2(
id string,
name string
)
row format delimited
fields terminated by ' '
;load data local inpath './hivedata/diff_t2.txt' overwrite into table diff_t2;
select t1.id id, t1.name name
from diff_t1 t1
left join diff_t2 t2 on t1.id=t2.id
where t2.id is null
union
select t2.id id, t2.name name
from diff_t1 t1
right join diff_t2 t2 on t1.id=t2.id
where t1.id is null ;
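
The same difference can also be computed in one pass with a full outer join; a minimal sketch:

select nvl(t1.id,t2.id) id, nvl(t1.name,t2.name) name
from diff_t1 t1
full outer join diff_t2 t2 on t1.id = t2.id
where t1.id is null or t2.id is null;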

23. Implement the following requirement in Hive HQL.

A website's purchase records have the following fields:

orderid,userid,productid,price,timestamp,date
121,张三,3,100,1535945356,2018-08-07
122,张三,3,200,1535945356,2018-08-08
123,李四,3,200,1535945356,2018-08-08
124,王五,1,200,1535945356,2018-08-08
125,张三,3,200,1535945356,2018-08-09
126,张三,2,200,1535945356,2018-08-09
127,李四,3,200,1535945356,2018-08-09
128,李四,3,200,1535945356,2018-08-10
129,李四,3,200,1535945356,2018-08-11

Use SQL to find users who bought product 3 both today and yesterday, together with their total spending from yesterday.

drop table product;
create table product(
orderid string,
userid string,
productid int,
price int,
tamp int,
dt date
)
row format delimited
fields terminated by ',';load data local inpath './hivedata/product.txt' overwrite into table product;
select *
from(
select userid,dt,
lag(dt,1) over(partition by userid order by dt) yesterday,
case when
datediff(dt,lag(dt,1) over(partition by userid order by dt))=1
then lag(total_price) over(partition by userid order by dt)
else null end yesterday_price
from
(
select distinct userid,dt,productid,
sum(price) over(partition by userid,dt) total_price
from product
)A
where productid = 3
)B where yesterday_price is not null;

24. Use Hive HQL for the following.

The user_action_log table holds user behavior data:

uid   time   action
1   Time1   Read
3   Time2   Comment
1   Time3   Share
2   Time4   Like
1   Time5   Write
2   Time6   like
3   Time7   Write
2   Time8   Read

To analyze behavior patterns, find each user's first action in the table.

drop table user_action_log;
create table user_action_log(
uid int,
time string,
action string
)
row format delimited
fields terminated by '\t';load data local inpath './hivedata/user_action_log.txt' overwrite into table user_action_log;

Implementation:

select uid,time,action
from (
select uid,time,action,row_number() over(partition by uid order by time) rn
from user_action_log
)A
where rn = 1;

25. What is each user's maximum number of consecutive login days?

Data: user_login table

uid,dt
1,2019-08-01
1,2019-08-02
1,2019-08-03
2,2019-08-01
2,2019-08-02
3,2019-08-01
3,2019-08-03
4,2019-07-28
4,2019-07-29
4,2019-08-01
4,2019-08-02
4,2019-08-03

The result is:

uid cnt_days
1 3
2 2
3 1
4 3
create table user_login(
uid int,
dt date
)
row format delimited
fields terminated by ',';load data local inpath './hivedata/user_login.txt' overwrite into table user_login;
select uid,max(cnt)
from
(select uid,dt_sub,count(1) cnt
from
(select uid,dt,
date_sub(dt,row_number() over(partition by uid order by dt)) dt_sub
from user_login
)A
group by uid, dt_sub
)B group by uid;

26. Use SQL to compute PV and UV.

Data:

user_net_log table

uid dt url
1 2019-08-06 http://www.baidu.com
2 2019-08-06 http://www.baidu.com
3 2019-08-06 http://www.baidu.com
3 2019-08-06 http://www.soho.com
3 2019-08-06 http://www.meituan.com
3 2019-08-06

The result is:

dt uv pv
2019-08-06 3 5
create table user_net_log(
uid int,
dt date,
url string
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/user_net_log.txt' overwrite into table user_net_log;
select dt,count(distinct uid),count(url)
from user_net_log
group by dt;

27. What do coalesce(), nvl(), concat_ws(), collect_list(), collect_set(), and regexp_replace() do in Hive?

Left as an exercise by the original author. In short: coalesce returns its first non-null argument; nvl substitutes a default when the value is null; concat_ws joins strings (or an array of strings) with a separator; collect_list aggregates values into an array and keeps duplicates; collect_set does the same but removes duplicates; regexp_replace replaces every substring that matches a regular expression.
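
A minimal sketch of what each returns (the second query reuses the t1 table from question 9):

select coalesce(null,null,'a'),          -- 'a'   : first non-null argument
nvl(null,0),                             -- 0     : fallback when the value is null
concat_ws('-','x','y','z'),              -- 'x-y-z'
regexp_replace('a1b2c3','[0-9]','');     -- 'abc' : strip the digits
select uid,
concat_ws(',',collect_list(tag)) all_tags,   -- keeps duplicates
concat_ws(',',collect_set(tag)) uniq_tags    -- removes duplicates
from t1 lateral view explode(split(tags,',')) t as tag
group by uid;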

28. The table below records, for each game of a MOBA title played by a student (Zhizhi), the heroes picked in that game.

pk_moba table

id   names
1   亚索,挖据机,艾瑞莉亚,洛,卡莎
2   亚索,盖伦,奥巴马,牛头,皇子
3   亚索,盖伦,艾瑞莉亚,宝石,琴女
4   亚索,盖伦,赵信,老鼠,锤石

Use Hive SQL to compute the top 3 most-picked heroes and their pick rate (= games picked / total games).

create table pk_moba(
id int,
names array<string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ',';load data local inpath './hivedata/pk_moba.txt' overwrite into table pk_moba;
select name,cnt,rn,concat(round(cnt/4*100,0),"%") lv
from
(
select name,cnt,dense_rank() over(sort by cnt desc) rn
from (
select name,count(name) cnt
from pk_moba lateral view explode(names) tmp as name
group by name
)A
)B where rn<4;

29. There are three tables:

District (district): two fields, district id (disid) and district name (disname).

City (city): two fields, city id (cityid) and district id (disid).

Order (order): four fields, order id (orderid), user id (userid), city id (cityid), and amount (amount).

district table:

disid disname
1 华中
2 西南
create table district(
disid int,
disname string
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/district.txt' overwrite into table district;

city table:

cityid disid
1 1
2 1
3 2
4 2
5 2
create table city(
cityid int,
disid int
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/city.txt' overwrite into table city;

order table:

oid userid cityid amount
1 1 1 1223.9
2 1 1 9999.9
3 2 2 2322
4 2 2 8909
5 2 3 6789
6 2 3 798
7 3 4 56786
8 4 5 78890
create table order_t(
oid int,
userid int,
cityid int,
amount float
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/order.txt' overwrite into table order_t;

A high spender is a user whose total spend is greater than 10,000. Use Hive HQL to generate the following report:

District name | Number of high spenders | Total spend

select disname,count(1),sum(amount)
from(
select disname,userid,sum(amount) amount from district A join city B on A.disid = B.disid
join order_t C on B.cityid = C.cityid
group by disname,userid
)D
where amount>10000
group by disname;

30. An app's daily access data is stored in the access_log table, with a date field ds, a user type field user_type, a user account field user_id, and a visit time field log_time. Use Hive HQL to implement the following:

(1) Overall UV and PV per day?

select ds,count(distinct user_id) uv,count(1) pv
from access_log
group by ds;

(2) UV and PV per day and per user type?

select ds,user_type,count(distinct user_id) uv,count(1) pv
from access_log
group by ds,user_type;

(3) Earliest and latest visit time per day and per user type?

select ds,user_type,min(log_time),max(log_time)
from access_log
group by ds,user_type;

(4) The 10 users with the most visits per day and per user type?

select ds,user_type,user_id,cnt,rn
from(
select ds,user_type,user_id,cnt,
row_number() over(partition by ds,user_type order by cnt desc) rn
from
(select ds,user_type,user_id,count(1) cnt
from access_log
group by ds,user_type,user_id
)A
)B where rn <11;

31. Describe the NTILE function.

It distributes the ordered rows of a group evenly into the specified number (num) of buckets and assigns each row its bucket number. If the rows cannot be divided evenly, the lower-numbered buckets are filled first, and no two buckets differ in size by more than 1.

Example:

select name,orderdate,cost,
ntile(3) over(partition by name) -- partition by name and split each partition's rows into 3 buckets
from t_order;
mart 2018-04-13  94  1
mart    2018-04-11  75  1
mart    2018-04-09  68  2
mart    2018-04-08  62  3
neil    2018-06-12  80  1
neil    2018-05-10  12  2
saml    2018-01-01  10  1
saml    2018-01-08  55  1
saml    2018-04-06  42  2
saml    2018-01-05  46  2
saml    2018-02-03  23  3
tony    2018-01-07  50  1
tony    2018-01-02  15  2
tony    2018-01-04  29  3

32. Given the following three tables:

login_a table (logins):

ds user_id
2019-08-06 1
2019-08-06 2
2019-08-06 3
2019-08-06 4
create table login_a(
ds date,
user_id int
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/login_a.txt' overwrite into table login_a;

read_b table (reading):

ds user_id read_num
2019-08-06 1 2
2019-08-06 2 3
2019-08-06 3 6
create table read_b(
ds date,
user_id int,
read_num int
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/read_b.txt' overwrite into table read_b;

cost_c table (payments):

ds user_id price
2019-08-06 1 55.6
2019-08-06 2 55.8
create table cost_c(
ds date,
user_id int,
price float
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/cost_c.txt' overwrite into table cost_c;

Based on the three tables above, use Hive HQL to implement the following:

(1) The number of users who logged in and read something on the same day, and the total number of books read.

select A.ds,count(distinct A.user_id),sum(B.read_num)
from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds
group by A.ds;

(2) The number of users who logged in and read but did not pay.

select A.ds,count(1)
from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds
left join cost_c C on B.user_id = C.user_id and B.ds = C.ds
where C.price is null
group by A.ds;

(3) Users who logged in and paid: the number of paying users and the total amount paid.

select A.ds,count(distinct A.user_id),sum(C.price)
from login_a A join cost_c C on A.user_id = C.user_id and A.ds = C.ds
group by A.ds;

33. In Hive HQL, what is the difference between left outer join and left semi join?

left semi join is a semi join: it can only return columns from the left table, each left row is returned at most once, and it is mainly used to test whether a left-table row has a match on the right (an efficient equivalent of IN / EXISTS). A left outer join, by contrast, keeps every left-table row even when there is no match and can also return right-table columns.
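
A minimal sketch of the difference, reusing the diff_t1 and diff_t2 tables from question 22:

-- left outer join: keeps every diff_t1 row and may select right-table columns
select t1.id, t1.name, t2.name
from diff_t1 t1
left outer join diff_t2 t2 on t1.id = t2.id;
-- left semi join: only left-table columns, only rows with a match, each at most once
select t1.id, t1.name
from diff_t1 t1
left semi join diff_t2 t2 on t1.id = t2.id;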

34. There is an order table and a channel table, structured as follows:

create table order(
order_id bigint,
user_id bigint comment 'user id',
amount double comment 'order amount',
channel string comment 'channel',
time string comment 'order time, yyyy-MM-dd HH:mm:ss'
)
partitioned by (dt string comment 'day, yyyy-MM-dd');

For 2019-08-06, use Hive HQL to find the number of ordering users and the total order amount per channel.

The result header should be: channel user_num order_amount

select channel,count(distinct user_id) user_num,sum(amount) order_amount
from order
where dt = '2019-08-06'
group by channel;

35. Table design and querying:

Design database tables to store basic student information, course information, and students' courses and scores, then write a query that returns all students whose average score is greater than 85.

create table stu_1(
id string,
name string,
age int,
addr string
)
row format delimited
fields terminated by ','
;create table course_1(
cid string,
cname string
)
row format delimited
fields terminated by ',' ;create table course_sc(
id string,
cid string,
score int
)
row format delimited
fields terminated by ','
;load data local inpath '/hivedata/course_1.txt' overwrite into table course_1;
load data local inpath '/hivedata/stu_1.txt' overwrite into table stu_1;
load data local inpath '/hivedata/course_sc.txt' overwrite into table course_sc;
select s.id, s.name, avg(cs.score) avgscore
from course_sc cs join stu_1 s on cs.id = s.id
group by s.id, s.name
having avg(cs.score) > 85;

36. Requirement:

There is a user table user(uid, name) and a blacklist table BanUser(uid).

  1. Use a left join to return all users not on the blacklist
  2. Use not exists to return all users not on the blacklist
create table u(
id string,
name string
)
row format delimited
fields terminated by ','
;create table banuser(
id string
);load data local inpath '/hivedata/banuser.txt' overwrite into table banuser;
load data local inpath '/hivedata/u.txt' overwrite into table u;
select u.*
from u left join banuser on u.id = banuser.id
where banuser.id is null;
select u.*
from u
where not exists (select 1 from banuser where banuser.id = u.id);

37. Requirement:

course_score table data:

1,zhangsan,数学,80,2015
2,lisi,语文,90,2016
3,lisi,数学,70,2016
4,wangwu,化学,80,2017
5,zhangsan,语文,85,2015
6,zhangsan,化学,90,2015
create table course_score(
id string,
name string,
course string,
score int,
year string
)
row format delimited
fields terminated by ','
;load data local inpath './hivedata/course_score.txt' overwrite into table course_score;

1. For each term (year) and course, return the record with the highest score (all 5 fields).

-- Option 1: window function, then keep only the rows that equal the per-group maximum
select id,name,course,score,year
from (
select id,name,course,score,year,
max(score) over(partition by year,course) max_score
from course_score
) t
where score = max_score;
-- Option 2: join back against the per-(year, course) maximum
select s.id, s.name, s.course, s.score, s.year
from course_score s
join
(select course, year, max(score) score
from course_score
group by course, year) t1
on s.course = t1.course
and s.year = t1.year
and s.score = t1.score;

2. Within a single term, return the math (数学) score records, with all fields, of students whose Chinese (语文) score is 90 or above.

-- Option 1: self join
select A.id,A.name,A.course,A.score,A.year
from course_score A join course_score B
on A.name = B.name and A.year = B.year
where B.course='语文' and B.score>=90 and A.course='数学';
-- Option 2: join against the qualifying Chinese records
select s.id, s.name, s.course, s.score, s.year
from course_score s
join
(select name, year from course_score
where score >= 90 and course='语文') t1
on s.name = t1.name and s.year = t1.year
where s.course='数学';

38. Requirement:

t1_1 table:

name course score
aa English 75
bb math 85
aa math 90
create table t1_1(
name string,
course string,
score int
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/t1_1.txt' overwrite into table t1_1;

Use HQL to produce the following result:

name English math
aa 75 90
bb 0 85
select name,
max(if(course="English",score,0)) English,
max(if(course="math",score,0)) math
from t1_1
group by name;

39. Requirement:

t1 table:

user product
A P1
B P1
A P2
B P3

Use HQL to transform it into the following result:

user P1 P2 P3
A 1 1 0
B 1 0 1
select username,
max(if(product="P1",1,0)) P1,
max(if(product="P2",1,0)) P2,
max(if(product="P3",1,0)) P3
from t1
group by username;

40. Requirement:

dpt (department) table

dpt_id   dpt_name
1   产品
2   技术

User table

User_id  Dpt_id
1   1
2   1
3   2
4   2
5   3

result table

user_id  dpt_id  dpt_name
1   1   产品
2   1   产品
3   2   技术
4   2   技术
5   3   其他部门

The source tables are user and dpt; use HQL to produce the result table.

select u.user_id, u.dpt_id, nvl(d.dpt_name, '其他部门') dpt_name
from user u
left join dpt d
on u.dpt_id = d.dpt_id;

41. Requirement:

t1_order table:

order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10
create table t1_order(
order_id string,
order_type string,
order_time string
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/t1_order.txt' overwrite into table t1_order;

Use HQL to produce the following result:

order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10
select order_id,order_type_1,order_type_2,order_time_1,order_time_2
from(
select order_id,order_type order_type_1,
lead(order_type,1) over(partition by order_id order by order_time) order_type_2,
order_time order_time_1,
lead(order_time,1) over(partition by order_id order by order_time) order_time_2
from t1_order
)A
where order_type_2 is not null;

42. Requirement:

t1_hobby table

name sex hobby
janson 男 打乒乓球、游泳、看电影
tom 男 打乒乓球、看电影
drop table t1_hobby;
create table t1_hobby(
name string,
sex string,
hobby string
)
row format delimited
fields terminated by ' ';load data local inpath './hivedata/t1_hobby.txt' overwrite into table t1_hobby;

hobby holds at most 3 values; use HQL to produce the following result:

name sex hobby1 hobby2 hobby3
janson 男 打乒乓球 游泳 看电影
tom 男 打乒乓球 看电影
select name,sex,split(hobby,"、")[0] hobby1,
split(hobby,"、")[1] hobby2,
nvl(split(hobby,"、")[2],"") hobby3
from t1_hobby;

Collecting and writing these up takes real effort, so a bookmark or a like is appreciated!
