501. Set the hostname of the master node to master and the hostname of the slaver1 node to slaver1. Submit the hostname query output from both nodes to the answer box as text.
[root@master ~]# hostname
[root@slave ~]# hostname
[root@master ~]# cat /etc/hosts
localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 master slave
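
The name-to-address mapping can also be checked mechanically. The following is a minimal sketch, not part of the original answer: `has_host_entry` is a hypothetical helper, it assumes each hosts line has the form `address name [alias...]`, and the addresses in the sample file are placeholders from the documentation range rather than the cluster's real ones.

```shell
# has_host_entry FILE NAME: succeed if NAME appears as a hostname or alias
# (any field after the address) on a non-comment line of FILE.
has_host_entry() {
  awk -v name="$2" '
    /^[[:space:]]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Sample hosts file with placeholder addresses (192.0.2.0/24 is reserved
# for documentation); on a real node pass /etc/hosts instead.
sample_hosts=$(mktemp)
cat > "$sample_hosts" <<'EOF'
127.0.0.1 localhost localhost.localdomain
192.0.2.10 master
192.0.2.20 slave
EOF

for node in master slave; do
  if has_host_entry "$sample_hosts" "$node"; then
    echo "$node: mapped"
  else
    echo "$node: missing"
  fi
done
```

Matching whole fields rather than substrings keeps a name such as `slave` from being satisfied by an entry for `slaver1`.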

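503. Configure both nodes to use the Ambari yum repository and the CentOS 7 yum repository from the IaaS package. The Ambari yum repository is in the XianDian-BigData-v2.1-BASE.iso package.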
[root@master ~]# cat /etc/yum.repos.d/ambari.repo

[root@slave ~]# ntpdate master
21 Jan 02:21:08 ntpdate[10527]: adjust time server offset -0.000312 sec

[root@slave ~]# ssh master
Last login: Mon Jan 21 02:33:16 2019 from

# Welcome to XianDian #

[root@master ~]# ssh slave
Last login: Mon Jan 21 02:33:25 2019 from

# Welcome to XianDian #
[root@master ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
[root@slave ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

[root@master ~]# ls /var/www/html/

[root@master ~]# cat /etc/yum.repos.d/ambari.repo
[root@slave ~]# cat /etc/yum.repos.d/ambari.repo
[root@master ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
[root@slave ~]# ntpdate master
21 Jan 02:50:01 ntpdate[12868]: adjust time server offset -0.028440 sec
[root@master ~]# systemctl status httpd
httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
Active: active (running) since Sun 2019-01-20 06:55:47 UTC; 19h ago
Docs: man:httpd(8)
Main PID: 11732 (httpd)
Status: "Total requests: 168; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
├─11732 /usr/sbin/httpd -DFOREGROUND
├─11734 /usr/sbin/httpd -DFOREGROUND
├─11735 /usr/sbin/httpd -DFOREGROUND
├─11736 /usr/sbin/httpd -DFOREGROUND
├─11737 /usr/sbin/httpd -DFOREGROUND
├─11738 /usr/sbin/httpd -DFOREGROUND
├─14012 /usr/sbin/httpd -DFOREGROUND
├─14013 /usr/sbin/httpd -DFOREGROUND
├─14014 /usr/sbin/httpd -DFOREGROUND
└─25143 /usr/sbin/httpd -DFOREGROUND

Jan 20 06:55:47 master httpd[11732]: AH00558: httpd: Could not reliably d…ge
Jan 20 06:55:47 master systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.

MariaDB [ambari]> show tables;
| Tables_in_ambari |
| ClusterHostMapping |
| adminpermission |
| adminprincipal |
| adminprincipaltype |
| adminprivilege |
| adminresource |
| adminresourcetype |
| alert_current |
| alert_definition |
| alert_group |
| alert_group_target |
| alert_grouping |
| alert_history |
| alert_notice |
| alert_target |
| alert_target_states |
| ambari_operation_history |
| ambari_sequences |
| artifact |
| blueprint |
| blueprint_configuration |
| blueprint_setting |
| clusterconfig |
| clusters |
| clusterservices |
| clusterstate |
| confgroupclusterconfigmapping |
| configgroup |
| configgrouphostmapping |
| execution_command |
| extension |
| extensionlink |
| groups |
| host_role_command |
| host_version |
| hostcomponentdesiredstate |
| hostcomponentstate |
| hostconfigmapping |
| hostgroup |
| hostgroup_component |
| hostgroup_configuration |
| hosts |
| hoststate |
| kerberos_descriptor |
| kerberos_principal |
| kerberos_principal_host |
| key_value_store |
| members |
| metainfo |
| permission_roleauthorization |
| remoteambaricluster |
| remoteambariclusterservice |
| repo_version |
| request |
| requestoperationlevel |
| requestresourcefilter |
| requestschedule |
| requestschedulebatchrequest |
| role_success_criteria |
| roleauthorization |
| servicecomponent_version |
| servicecomponentdesiredstate |
| serviceconfig |
| serviceconfighosts |
| serviceconfigmapping |
| servicedesiredstate |
| setting |
| stack |
| stage |
| topology_host_info |
| topology_host_request |
| topology_host_task |
| topology_hostgroup |
| topology_logical_request |
| topology_logical_task |
| topology_request |
| upgrade |
| upgrade_group |
| upgrade_history |
| upgrade_item |
| users |
| viewentity |
| viewinstance |
| viewinstancedata |
| viewinstanceproperty |
| viewmain |
| viewparameter |
| viewresource |
| viewurl |
| widget |
| widget_layout |
| widget_layout_user_widget |
103 rows in set (0.00 sec)

MariaDB [mysql]> select * from user;
| Host | User | Password | Select_priv | Insert_priv | Update_priv | Delete_priv | Create_priv | Drop_priv | Reload_priv | Shutdown_priv | Process_priv | File_priv | Grant_priv | References_priv | Index_priv | Alter_priv | Show_db_priv | Super_priv | Create_tmp_table_priv | Lock_tables_priv | Execute_priv | Repl_slave_priv | Repl_client_priv | Create_view_priv | Show_view_priv | Create_routine_priv | Alter_routine_priv | Create_user_priv | Event_priv | Trigger_priv | Create_tablespace_priv | ssl_type | ssl_cipher | x509_issuer | x509_subject | max_questions | max_updates | max_connections | max_user_connections | plugin | authentication_string |
| localhost | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| master | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| ::1 | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| localhost | ambari | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
| % | ambari | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
| localhost | hive | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
| % | hive | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
8 rows in set (0.00 sec)

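511. On the master node, run ambari-server setup to configure ambari-server, specifying the JDK installation path and the database host, port, user, and password, then start the ambari-server service. When configuration is complete, use curl in the Linux shell to query the page at http://master:8080 and submit the result to the answer box as text.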
[root@master ~]# curl http://master:8080
<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*     http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
--><!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" href="stylesheets/vendor.css"><link rel="stylesheet" href="stylesheets/app.css"><script src="javascripts/vendor.js"></script><script src="javascripts/app.js"></script><script>$(document).ready(function() {require('initialize');// make favicon work in firefox$('link[type*=icon]').detach().appendTo('head');$('#loading').remove();});</script><title>先电大数据平台</title><link rel="shortcut icon" href="/img/logo.png" type="image/x-icon">
<body><div id="loading">...加载中...</div><div id="wrapper"><!-- ApplicationView --></div><footer><div class="container"><a href="http://www.1daoyun.com/" target="_blank">© 南京第五十五所技术开发有限公司 版权所有 版本号:V2.2</a>.<br><a href="/licenses/NOTICE.txt" target="_blank">查看使用的第三方工具/资源,以及各自归属</a></div></footer>

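512. On the master node, run ambari-server setup to configure ambari-server, specifying the JDK installation path and the database host, port, user, and password, then start the ambari-server service. When configuration is complete, query the running status of ambari-server and submit the result to the answer box as text.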
[root@master ~]# ambari-server status
Using python /usr/bin/python
Ambari-server status
Ambari Server running
Found Ambari Server PID: 13714 at: /var/run/ambari-server/ambari-server.pid

[root@slave ~]# tail /var/log/ambari-agent/ambari-agent.log
INFO 2019-01-21 03:23:10,096 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 03:23:10,096 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 03:23:10,349 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup
INFO 2019-01-21 03:23:10,351 Controller.py:320 - Sending Heartbeat (id = 76214)
INFO 2019-01-21 03:23:10,358 Controller.py:333 - Heartbeat response received (id = 76215)
INFO 2019-01-21 03:23:10,358 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2019-01-21 03:23:10,358 Controller.py:380 - Updating configurations from heartbeat
INFO 2019-01-21 03:23:10,358 Controller.py:389 - Adding cancel/execution commands
INFO 2019-01-21 03:23:10,358 Controller.py:406 - Adding recovery commands
INFO 2019-01-21 03:23:10,359 Controller.py:475 - Waiting 0.9 for next heartbeat

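514. In the XianDian big data platform, create the Hadoop cluster "XIANDIAN HDP", select HDP 2.4 as the stack, and install the services HDFS, YARN + MapReduce2, Zookeeper, and Ambari Metrics. After installation, view the Hadoop cluster's service processes in the Linux shell on the master and slaver nodes and submit the result to the answer box as text.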
[root@slave ~]# jps
21344 SecondaryNameNode
4417 QuorumPeerMain
17971 Jps
18821 DataNode
20503 NodeManager
19689 ApplicationHistoryServer
20266 JobHistoryServer
20876 ResourceManager
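
A process list like the one above can be compared against the daemons a node is expected to run. The sketch below is illustrative only: `missing_daemons` is a hypothetical helper, and the expected-daemon list passed to it is an assumption chosen for the example, not a requirement stated in the question.

```shell
# missing_daemons "NAME1 NAME2 ...": read jps output on stdin and print each
# expected daemon name that is absent (jps prints "PID ClassName" per line).
missing_daemons() {
  awk -v want="$1" '
    { seen[$2] = 1 }
    END {
      n = split(want, names, " ")
      for (i = 1; i <= n; i++) if (!(names[i] in seen)) print names[i]
    }'
}

# The process list recorded on the slave node above.
jps_output='21344 SecondaryNameNode
4417 QuorumPeerMain
17971 Jps
18821 DataNode
20503 NodeManager
19689 ApplicationHistoryServer
20266 JobHistoryServer
20876 ResourceManager'

printf '%s\n' "$jps_output" | missing_daemons "DataNode NodeManager QuorumPeerMain NameNode"
```

With this input the only name printed is `NameNode`, because the recorded slave output lists `SecondaryNameNode` but no `NameNode`. On a live node, pipe `jps` into the function instead of the saved text.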

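515. In the XianDian big data platform, create the Hadoop cluster "XIANDIAN HDP", select HDP 2.4 as the stack, and install the services HDFS, YARN + MapReduce2, Zookeeper, and Ambari Metrics. After installation, view the basic statistics of the Hadoop cluster in the Linux shell and submit the query command and its output to the answer box as text.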
[root@master ~]# hdfs fsck /
Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from / for path / at Mon Jan 21 04:23:40 UTC 2019
/app-logs/ambari-qa/logs/application_1547971095325_0001/slave_45454_1547971260134: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/app-logs/ambari-qa/logs/application_1547971095325_0002/master_45454_1547971343610: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/app-logs/ambari-qa/logs/application_1547971095325_0002/slave_45454_1547971344834: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/mr-history/done/2019/01/20/000000/job_1547971095325_0002-1547971269040-ambari%2Dqa-word+count-1547971336333-1-1-SUCCEEDED-default-1547971292433.jhist: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/mr-history/done/2019/01/20/000000/job_1547971095325_0002_conf.xml: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/tmp/id000a0d00_date582019: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/tmp/idtest.ambari-qa.1547973953.18.in: Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/tmp/idtest.ambari-qa.1547973953.18.pig: Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/DistributedShell/application_1547971095325_0001/AppMaster.jar: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/DistributedShell/application_1547971095325_0001/shellCommands: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/mapredsmokeinput: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/user/ambari-qa/mapredsmokeoutput/part-r-00000: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Total size: 502825041 B
Total dirs: 48
Total files: 18
Total symlinks: 0
Total blocks (validated): 19 (avg. block size 26464475 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 5
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 33 (46.478874 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Mon Jan 21 04:23:40 UTC 2019 in 16 milliseconds

The filesystem under path '/' is HEALTHY
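
The summary figures can be cross-checked against the per-file lines. Seven of the listed blocks have a target of 5 replicas and twelve have a target of 3, each with 2 live replicas, so 7 x 3 + 12 x 1 = 33 replicas are missing. The printed percentage is consistent with missing / (missing + live), where live = 19 blocks x 2.0 = 38; that formula is inferred from the numbers above, not quoted from the Hadoop source. A small sketch of both calculations, with hypothetical helper names and placeholder sample lines:

```shell
# missing_replicas: read `hdfs fsck` output on stdin and sum (target - live)
# over every "Target Replicas is T but found L live replica(s)" line.
missing_replicas() {
  awk '
    /Target Replicas is/ {
      for (i = 1; i < NF; i++) {
        if ($i == "Replicas" && $(i + 1) == "is") target = $(i + 2)
        if ($i == "found")                         live   = $(i + 1)
      }
      missing += target - live
    }
    END { print missing + 0 }'
}

# missing_percent MISSING LIVE: share of expected replicas that are missing.
missing_percent() {
  awk -v m="$1" -v l="$2" 'BEGIN { printf "%.2f\n", 100 * m / (m + l) }'
}

# Two shortened sample lines in the format shown above (paths and block pool
# names are placeholders). Expected sum: 1 + 3 = 4.
printf '%s\n' \
  '/a: Under replicated BP-x Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s).' \
  '/b: Under replicated BP-x Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s).' \
  | missing_replicas

# Figures from the summary above: 33 missing, 38 live. Prints 46.48.
missing_percent 33 38
```

Every block is reported as under-replicated because the cluster has only two DataNodes, so no target above 2 can be met; the filesystem is still HEALTHY because no block is corrupt or entirely missing.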

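516. Check that the master node's hostname is master and the slaver1 node's hostname is slaver1. Edit the hosts file on both nodes to map IP addresses to hostnames. Submit the contents of both nodes' hosts files to the answer box as text.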
[root@master ~]# cat /etc/hosts
localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 master slave
[root@slave ~]# cat /etc/hosts
localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 master slave

[root@slave ~]# ntpdate master
21 Jan 04:27:46 ntpdate[23426]: adjust time server offset -0.146752 sec

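518. Check the running status of ambari-server on the master node and start the ambari-server service if it is not running. Use curl in the Linux shell to query the page at http://master:8080 and submit the result to the answer box as text.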
[root@master ~]# curl http://master:8080
<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*     http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
--><!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" href="stylesheets/vendor.css"><link rel="stylesheet" href="stylesheets/app.css"><script src="javascripts/vendor.js"></script><script src="javascripts/app.js"></script><script>$(document).ready(function() {require('initialize');// make favicon work in firefox$('link[type*=icon]').detach().appendTo('head');$('#loading').remove();});</script><title>先电大数据平台</title><link rel="shortcut icon" href="/img/logo.png" type="image/x-icon">
<body><div id="loading">...加载中...</div><div id="wrapper"><!-- ApplicationView --></div><footer><div class="container"><a href="http://www.1daoyun.com/" target="_blank">© 南京第五十五所技术开发有限公司 版权所有 版本号:V2.2</a>.<br><a href="/licenses/NOTICE.txt" target="_blank">查看使用的第三方工具/资源,以及各自归属</a></div></footer>

[root@master ~]# ambari-server status
Using python /usr/bin/python
Ambari-server status
Ambari Server running
Found Ambari Server PID: 13714 at: /var/run/ambari-server/ambari-server.pid
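
Question 518 asks for the server to be started only when it is not already running. One way to script that check is sketched below. The helper names are hypothetical, and the match string is taken from the status output shown above; other Ambari versions may word the status line differently.

```shell
# is_ambari_running: read `ambari-server status` output on stdin and succeed
# only if a line begins with "Ambari Server running".
is_ambari_running() {
  grep -q '^Ambari Server running'
}

# ensure_ambari_running: start the server when the status text does not
# report it as running. Does nothing on hosts without ambari-server.
ensure_ambari_running() {
  command -v ambari-server >/dev/null 2>&1 || { echo "ambari-server not found"; return 0; }
  if ambari-server status | is_ambari_running; then
    echo "already running"
  else
    ambari-server start
  fi
}

ensure_ambari_running
```

Anchoring the pattern at the start of the line keeps a line such as "Ambari Server not running" from being treated as a match.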

[root@slave ~]# tail /var/log/ambari-agent/ambari-agent.log
INFO 2019-01-21 04:29:36,593 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 04:29:36,593 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 04:29:36,841 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup
INFO 2019-01-21 04:29:36,844 Controller.py:320 - Sending Heartbeat (id = 80504)
INFO 2019-01-21 04:29:36,850 Controller.py:333 - Heartbeat response received (id = 80505)
INFO 2019-01-21 04:29:36,850 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2019-01-21 04:29:36,851 Controller.py:380 - Updating configurations from heartbeat
INFO 2019-01-21 04:29:36,851 Controller.py:389 - Adding cancel/execution commands
INFO 2019-01-21 04:29:36,851 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2019-01-21 04:29:37,751 Controller.py:482 - Wait for next heartbeat over

521. After a successful start, view the Hadoop cluster's service processes in the Linux shell on the master and slaver nodes and submit the result to the answer box as text.
[root@slave ~]# jps
21344 SecondaryNameNode
23601 Jps
4417 QuorumPeerMain
18821 DataNode
20503 NodeManager
19689 ApplicationHistoryServer
20266 JobHistoryServer
20876 ResourceManager

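522. After a successful start, view the basic statistics of the Hadoop cluster in the Linux shell and submit the query command and its output to the answer box as text.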
[root@master ~]# hdfs fsck /
Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from / for path / at Mon Jan 21 04:31:09 UTC 2019
/app-logs/ambari-qa/logs/application_1547971095325_0001/slave_45454_1547971260134: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/app-logs/ambari-qa/logs/application_1547971095325_0002/master_45454_1547971343610: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/app-logs/ambari-qa/logs/application_1547971095325_0002/slave_45454_1547971344834: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/hdp/apps/ Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/mr-history/done/2019/01/20/000000/job_1547971095325_0002-1547971269040-ambari%2Dqa-word+count-1547971336333-1-1-SUCCEEDED-default-1547971292433.jhist: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/mr-history/done/2019/01/20/000000/job_1547971095325_0002_conf.xml: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/tmp/id000a0d00_date582019: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/tmp/idtest.ambari-qa.1547973953.18.in: Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/tmp/idtest.ambari-qa.1547973953.18.pig: Under replicated BP-1577517373- Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/DistributedShell/application_1547971095325_0001/AppMaster.jar: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/DistributedShell/application_1547971095325_0001/shellCommands: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/mapredsmokeinput: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/user/ambari-qa/mapredsmokeoutput/part-r-00000: Under replicated BP-1577517373- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Total size: 502825041 B
Total dirs: 48
Total files: 18
Total symlinks: 0
Total blocks (validated): 19 (avg. block size 26464475 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 5
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 33 (46.478874 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Mon Jan 21 04:31:09 UTC 2019 in 7 milliseconds

The filesystem under path '/' is HEALTHY

[root@master ~]# hadoop fs -ls /1daoyun/file
Found 1 items
-rw-r--r-- 3 root hdfs 24811184 2019-01-12 08:23 /1daoyun/file/BigDataSkills.txt

[root@master ~]# hadoop fsck /1daoyun/file/BigDataSkills.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F1daoyun%2Ffile%2FBigDataSkills.txt
FSCK started by root (auth:SIMPLE) from / for path /1daoyun/file/BigDataSkills.txt at Sat Jan 12 08:42:55 CST 2019
/1daoyun/file/BigDataSkills.txt: Under replicated BP-1205401636- Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Total size: 24811184 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 24811184 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 1 (33.333332 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Jan 12 08:42:55 CST 2019 in 4 milliseconds

The filesystem under path '/1daoyun/file/BigDataSkills.txt' is HEALTHY
[root@master ~]# hadoop fs -D dfs.replication=2 -put BigDataSkills.txt /1daoyun/file
[root@master ~]# hadoop fsck /1daoyun/file/BigDataSkills.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F1daoyun%2Ffile%2FBigDataSkills.txt
FSCK started by root (auth:SIMPLE) from / for path /1daoyun/file/BigDataSkills.txt at Sat Jan 12 08:44:57 CST 2019
.Status: HEALTHY
Total size: 24811184 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 24811184 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Jan 12 08:44:57 CST 2019 in 0 milliseconds

The filesystem under path '/1daoyun/file/BigDataSkills.txt' is HEALTHY

[hdfs@master root]$ hadoop dfsadmin -allowSnapshot /apps
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Allowing snaphot on /apps succeeded
[hdfs@master root]$ hadoop fs -createSnapshot /apps apps_1daoyun
Created snapshot /apps/.snapshot/apps_1daoyun
[hdfs@master root]$ hadoop fs -ls /apps/.snapshot
Found 1 items
drwxrwxrwx - hdfs hdfs 0 2019-01-12 08:53 /apps/.snapshot/apps_1daoyun

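605. The /user/root/small-file directory in HDFS contains a number of small files. Use the Hadoop Archive tool to archive them into a single file named xiandian-data.tar. After archiving, list the contents of xiandian-data.tar and submit the commands and their output to the answer box as text.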
[root@master ~]# hadoop archive -archiveName xiandian-data.har -p /user/ambari-qa /user/root
[root@master ~]# hadoop fs -ls /user/root/xiandian-data.har
Found 4 items
-rw-r--r-- 3 root hdfs 0 2019-01-12 09:05 /user/root/xiandian-data.har/_SUCCESS
-rw-r--r-- 3 root hdfs 959 2019-01-12 09:05 /user/root/xiandian-data.har/_index
-rw-r--r-- 3 root hdfs 23 2019-01-12 09:05 /user/root/xiandian-data.har/_masterindex
-rw-r--r-- 3 root hdfs 49747 2019-01-12 09:05 /user/root/xiandian-data.har/part-0

[hdfs@master root]$ hadoop dfsadmin -safemode enter
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is ON
[hdfs@master root]$ hadoop dfsadmin -safemode get
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is ON

Advanced core-site



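608. To guard against accidental deletion by operators, HDFS provides a trash (recycle bin) feature, but too many trashed files take up a large amount of storage. Use the vi command in the Linux shell to edit the relevant configuration file and parameters, then restart the affected services. Submit the commands and the modified parameters to the answer box as text.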
[root@master ~]# vi /etc/hadoop/


[root@master ~]# su - hdfs

Last login: Mon May 8 09:31:52 UTC 2017

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop namenode

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop datanode

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode
[root@master ~]# hadoop fs -find / -name cetc55.txt
find: Permission denied: user=root, access=READ_EXECUTE, inode="/apps/falcon/extensions/mirroring":falcon:users:drwxrwx---
find: Permission denied: user=root, access=READ_EXECUTE, inode="/apps/hbase/staging":hbase:hdfs:drwx--x--x
find: Permission denied: user=root, access=READ_EXECUTE, inode="/ats/done":yarn:hadoop:drwx------
find: Permission denied: user=root, access=READ_EXECUTE, inode="/mr-history/done/2019/01/12":mapred:hadoop:drwxrwx---
find: Permission denied: user=root, access=READ_EXECUTE, inode="/tmp/hive/hive/721658a5-bd0e-4d51-95ce-db0cadd38cfb":hive:hdfs:drwx------
find: Permission denied: user=root, access=READ_EXECUTE, inode="/user/ambari-qa":ambari-qa:hdfs:drwxrwx---
find: Permission denied: user=root, access=READ_EXECUTE, inode="/webhdfs/v1":hive:hadoop:drwx------
[root@master ~]# hadoop fs -mv /user/root/.Trash/Current/cetc55.txt /
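
For the trash setting in question 608, the parameter in stock Hadoop is `fs.trash.interval` in core-site.xml, and its value is a number of minutes. The retention period is not given above, so the sketch below uses a hypothetical 7 days purely to show the conversion; `days_to_minutes` is an illustrative helper, not part of Hadoop.

```shell
# days_to_minutes DAYS: fs.trash.interval is expressed in minutes.
days_to_minutes() {
  echo $(( $1 * 24 * 60 ))
}

retention_days=7   # hypothetical retention period, not taken from the question
echo "fs.trash.interval=$(days_to_minutes "$retention_days")"

# On a cluster node, print the value currently in effect
# (skipped on machines where the hdfs command is absent).
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.trash.interval
fi
```

With the assumed 7 days the first line printed is `fs.trash.interval=10080`. A value of 0 disables the trash, in which case deleted files cannot be recovered from `.Trash` as was done above.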

Block replication

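611. Hosts in a Hadoop cluster can crash or suffer system damage, and when that happens data files in HDFS may be corrupted or lost. To keep HDFS reliable, change the cluster's replication factor to 5. Use the vi command in the Linux shell to edit the relevant configuration file and parameters, then restart the affected services. Submit the commands and the modified parameters to the answer box as text.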
[root@master ~]# vi /etc/hadoop/
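
The replication factor is controlled by `dfs.replication` in hdfs-site.xml. After the restart, the value the NameNode reports can be read from the "Default replication factor" line of an `hdfs fsck` summary. The check below is a minimal sketch with a hypothetical helper name; it runs against a two-line sample rather than a live cluster.

```shell
# default_replication: read an `hdfs fsck` summary on stdin and print the
# value of the "Default replication factor" line.
default_replication() {
  awk -F':[[:space:]]*' '/Default replication factor/ { print $2; exit }'
}

expected=5   # target value from question 611
actual=$(printf '%s\n' 'Default replication factor: 5' 'Average block replication: 2.0' | default_replication)

if [ "$actual" = "$expected" ]; then
  echo "replication factor is $actual"
else
  echo "replication factor is $actual, expected $expected"
fi
```

On a cluster node the sample can be replaced with `hdfs fsck / | default_replication`. Raising the default does not create replicas that the cluster cannot place: with two DataNodes, blocks written with a target of 5 still hold only 2 live replicas.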

[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 5 5
Number of Maps = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
19/01/12 12:11:29 INFO client.RMProxy: Connecting to ResourceManager at slaver/
19/01/12 12:11:29 INFO client.AHSProxy: Connecting to Application History server at slaver/
19/01/12 12:11:31 INFO input.FileInputFormat: Total input paths to process : 5
19/01/12 12:11:31 INFO mapreduce.JobSubmitter: number of splits:5
19/01/12 12:11:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547258293975_0001
19/01/12 12:11:32 INFO impl.YarnClientImpl: Submitted application application_1547258293975_0001
19/01/12 12:11:32 INFO mapreduce.Job: The url to track the job: http://slaver:8088/proxy/application_1547258293975_0001/
19/01/12 12:11:32 INFO mapreduce.Job: Running job: job_1547258293975_0001
19/01/12 12:11:52 INFO mapreduce.Job: Job job_1547258293975_0001 running in uber mode : false
19/01/12 12:11:52 INFO mapreduce.Job: map 0% reduce 0%
19/01/12 12:12:06 INFO mapreduce.Job: map 40% reduce 0%
19/01/12 12:12:19 INFO mapreduce.Job: map 40% reduce 13%
19/01/12 12:12:20 INFO mapreduce.Job: map 100% reduce 13%
19/01/12 12:12:24 INFO mapreduce.Job: map 100% reduce 100%
19/01/12 12:12:24 INFO mapreduce.Job: Job job_1547258293975_0001 completed successfully
19/01/12 12:12:24 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=116
FILE: Number of bytes written=888813
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1305
HDFS: Number of bytes written=215
HDFS: Number of read operations=23
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=5
Launched reduce tasks=1
Data-local map tasks=5
Total time spent by all maps in occupied slots (ms)=94972
Total time spent by all reduces in occupied slots (ms)=30068
Total time spent by all map tasks (ms)=94972
Total time spent by all reduce tasks (ms)=15034
Total vcore-milliseconds taken by all map tasks=94972
Total vcore-milliseconds taken by all reduce tasks=15034
Total megabyte-milliseconds taken by all map tasks=64770904
Total megabyte-milliseconds taken by all reduce tasks=20506376
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=90
Map output materialized bytes=140
Input split bytes=715
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=140
Reduce input records=10
Reduce output records=0
Spilled Records=20
Shuffled Maps =5
Failed Shuffles=0
Merged Map outputs=5
GC time elapsed (ms)=977
CPU time spent (ms)=11310
Physical memory (bytes) snapshot=2232025088
Virtual memory (bytes) snapshot=15569510400
Total committed heap usage (bytes)=2539126784
Shuffle Errors
File Input Format Counters
Bytes Read=590
File Output Format Counters
Bytes Written=97
Job Finished in 55.522 seconds
Estimated value of Pi is 3.68000000000000000000
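
The estimate 3.68 follows from 4 x inside / total with 5 x 5 = 25 sample points, which implies 23 points fell inside the circle. The sketch below reproduces that figure with a Halton-sequence sampler. It assumes the example job draws quasi-random points in bases 2 and 3 starting at index 1 and tests them against a circle of radius 0.5 centred in the unit square; treat it as an illustration of the technique, not as the job's actual code. `estimate_pi` is a hypothetical name.

```shell
# estimate_pi N: sample N Halton points (bases 2 and 3) in the unit square,
# count those inside the inscribed circle, and print "inside estimate".
estimate_pi() {
  awk -v n="$1" '
    # Radical inverse of i in the given base (one Halton coordinate).
    function halton(i, base,   f, r) {
      f = 1; r = 0
      while (i > 0) {
        f = f / base
        r += f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      inside = 0
      for (i = 1; i <= n; i++) {
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++
      }
      printf "%d %.2f\n", inside, 4 * inside / n
    }'
}

estimate_pi 25
```

With 25 points this prints `23 3.68`, matching the job output. The coarse result reflects the tiny sample size; larger map and sample counts bring the estimate closer to 3.14.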

[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar wordcount /1daoyun/file/BigDataSkills.txt /1daoyun/output
[root@master hadoop-mapreduce]# hadoop fs -cat /1daoyun/output/part-r-00000


[root@master hadoop-mapreduce]# cat /opt/txt/puzzle.dta
8 ? ? ? ? ? ? ? ?
? ? 3 6 ? ? ? ? ?
? 7 ? ? 9 ? 2 ? ?
? 5 ? ? ? 7 ? ? ?
? ? ? ? 4 5 7 ? ?
? ? ? 1 ? ? ? 3 ?
? ? 1 ? ? ? ? 6 8
? ? 8 5 ? ? ? 1 ?
? 9 ? ? ? ? 4 ? ?
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples- sudoku /root/puzzle.dta
Solving /root/puzzle.dta
8 6 1 5 7 2 3 9 4
5 2 4 3 8 9 1 7 6
3 7 9 1 4 6 5 8 2
4 3 6 2 5 8 9 1 7
7 9 8 6 3 1 2 4 5
1 5 2 4 9 7 8 6 3
2 4 7 9 1 5 6 3 8
9 8 5 7 6 3 4 2 1
6 1 3 8 2 4 7 5 9

Found 1 solutions

[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar grep /1daoyun/file/BigDataSkills.txt /1daoyun/output Hadoop
[root@master hadoop-mapreduce]# hadoop fs -cat /1daoyun/output/*

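616. Start the HBase database on the XianDian big data platform, using the RegionServer on the master node. Launch the HBase shell from the Linux shell and check the HBase version. Submit the commands used (all database commands in lowercase) to the answer box as text.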
[root@master ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version, r718c773662346de98a8ce6fd3b5f64e279cb87d4, Wed May 31 03:27:31 UTC 2017

hbase(main):004:0> version, r718c773662346de98a8ce6fd3b5f64e279cb87d4, Wed May 31 03:27:31 UTC 2017

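617. Start the HBase database on the XianDian big data platform, using the RegionServer on the master node. Launch the HBase shell from the Linux shell and check the HBase status. Submit the commands used (all database commands in lowercase) to the answer box as text.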
[root@master ~]# hbase shell
hbase(main):015:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 4.0000 average load

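618. Start the HBase database on the XianDian big data platform, using the RegionServer on the master node. Launch the HBase shell from the Linux shell and show the current system user inside the HBase shell. Submit the commands used (all database commands in lowercase) to the answer box as text.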
[root@master ~]# hbase shell
hbase(main):019:0> whoami
root (auth:SIMPLE)
groups: root

hbase(main):036:0> create 'xiandian_user','info'
0 row(s) in 7.4640 seconds
=> Hbase::Table - xiandian_user

hbase(main):045:0> describe 'xiandian_user'
Table xiandian_user is ENABLED
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KE
1 row(s) in 0.4000 seconds

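620. Enable HBase security authorization. In the HBase shell, grant the root user read, write, and execute permissions on the table xiandian_user, then use the appropriate command to view its permissions. Submit the parameter and value that enable HBase security authorization, the commands used (all database commands in lowercase), and the query results to the answer box as text.
Parameter: hbase.security.authorization
Value: true
hbase(main):001:0> grant 'root','RWX','xiandian_user'
hbase(main):002:0> user_permission 'xiandian_user'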


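622. In the HBase shell, use the get command to query the info data for rowkey 88 in the xiandian_user table. Submit the command used (all database commands in lowercase) and the query result to the answer box as text.

623. In the HBase shell, count the rows in the xiandian_user table with a count interval of 100 and a cache size of 500. Submit the command used (all database commands in lowercase) and the query result to the answer box as text.

624. In the HBase shell, insert a row into the xiandian_user table with rowkey 620, info:age 58, and info:name user620, then use the get command to query the inserted data. Submit the commands used (all database commands in lowercase) and the query result to the answer box as text.

625. In the HBase shell, delete the info:age value for rowkey 73 in the xiandian_user table, then use the get command to query the data for rowkey 73. Submit the commands used (all database commands in lowercase) and the query result to the answer box as text.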

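626. Start the Hive data warehouse on the XianDian big data platform, launch the Hive client, and list all Hadoop file paths from within Hive (all database commands in lowercase). Submit the query result to the answer box as text.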
hive> dfs -ls;
Found 4 items
drwx------ - root hdfs 0 2019-01-12 08:31 .Trash
drwxr-xr-x - root hdfs 0 2019-01-13 17:42 .hiveJars
drwx------ - root hdfs 0 2019-01-12 12:34 .staging
drwxr-xr-x - root hdfs 0 2019-01-12 09:05 xiandian-data.har

stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
Time taken: 13.391 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
Time taken: 1.177 seconds
hive> dfs -ls /apps/hive/warehouse;
Found 1 items
drwxrwxrwx - root hadoop 0 2019-01-14 21:53 /apps/hive/warehouse/xd_phy_course

stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/qdaoyun/data/hive';
Time taken: 0.278 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=450]
Time taken: 0.945 seconds
hive> desc xd_phy_course;
stname string
stid int
class string
opt_cour string
Time taken: 0.495 seconds, Fetched: 4 row(s)

stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
Time taken: 0.646 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
Time taken: 0.999 seconds
hive> select * from xd_phy_course where class='Software_1403' and opt_cour='volleyball';
student409 10120408 Software_1403 volleyball
student411 10120410 Software_1403 volleyball
student413 10120412 Software_1403 volleyball
student419 10120418 Software_1403 volleyball
student421 10120420 Software_1403 volleyball
student422 10120421 Software_1403 volleyball
student424 10120423 Software_1403 volleyball
student432 10120431 Software_1403 volleyball
student438 10120437 Software_1403 volleyball
student447 10120446 Software_1403 volleyball
Time taken: 0.98 seconds, Fetched: 10 row(s)

stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
Time taken: 0.225 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
Time taken: 0.91 seconds
hive> create table phy_opt_count(opt_cour string,cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';
Time taken: 0.206 seconds
hive> insert overwrite table phy_opt_count select xd_phy_course.opt_cour,count(distinct xd_phy_course.stID) from xd_phy_course group by xd_phy_course.opt_cour;
Query ID = root_20190115051024_6c7a70fe-a7b0-49a8-b8ab-cc7ee2854c47
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1547338155253_0005)

Loading data to table default.phy_opt_count
Table default.phy_opt_count stats: [numFiles=1, numRows=1, totalSize=14, rawDataSize=13]
Time taken: 35.618 seconds
hive> select * from phy_opt_count;
volleyball 10
Time taken: 0.094 seconds, Fetched: 1 row(s)
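
The `count(distinct stID) ... group by opt_cour` step can be checked by hand on a small sample. A plain-Python sketch, assuming the ten (stID, opt_cour) pairs listed in the earlier volleyball query output are the whole data set; the function name is hypothetical:

```python
from collections import defaultdict

# (stID, opt_cour) pairs taken from the select output shown earlier
rows = [(sid, "volleyball") for sid in (
    10120408, 10120410, 10120412, 10120418, 10120420,
    10120421, 10120423, 10120431, 10120437, 10120446,
)]

def count_distinct_ids_by_course(pairs):
    """Group by course and count distinct student IDs in each group."""
    ids = defaultdict(set)
    for st_id, opt_cour in pairs:
        ids[opt_cour].add(st_id)
    return {course: len(members) for course, members in ids.items()}

print(count_distinct_ids_by_course(rows))   # {'volleyball': 10}
```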

stname(string) stID(int) class(string) opt_cour(string) score(float)
hive> create table phy_course_score_xd(stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';
Time taken: 0.202 seconds
hive> load data local inpath '/opt/txt/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, numRows=0, totalSize=354, rawDataSize=0]
Time taken: 0.836 seconds
hive> select * from phy_course_score_xd where class='Software_1403' and score>90;
student433 10120432 Software_1403 football 98.0
student444 10120443 Software_1403 swimming 99.0
student445 10120444 Software_1403 tabletennis 97.0
student450 10120449 Software_1403 basketball 97.0
Time taken: 0.087 seconds, Fetched: 4 row(s)

stname(string) stID(int) class(string) opt_cour(string) score(float)
hive> select class,round(avg(score)) from phy_course_score_xd group by class;
Query ID = root_20190115054732_ff2d91be-9eb8-44e1-b824-0fcf736bb512
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1547338155253_0006)

Software_1403 98.0
Software_1403 badminton NULL
Software_1403 tabletennis NULL
Software_1403 volleyball NULL
Time taken: 6.85 seconds, Fetched: 4 row(s)

stname(string) stID(int) class(string) opt_cour(string) score(float)
hive> select class,max(score) from phy_course_score_xd group by class;
Query ID = root_20190115054845_87524d45-31bb-46ea-8ed4-173fc84c1b25
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1547338155253_0006)

Software_1403 99.0
Software_1403 badminton NULL
Software_1403 tabletennis NULL
Software_1403 volleyball NULL
Time taken: 16.028 seconds, Fetched: 4 row(s)


md5(STRING) url(STRING) request_date (STRING) request_time (STRING) ip(STRING)
hive> create table weblog_entries(md5 string,url string,request_data string,request_time string,ip string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/weblog/';
Time taken: 0.384 seconds
hive> load data local inpath '/opt/txt/weblog_entries.txt' into table weblog_entries;
Loading data to table default.weblog_entries
Table default.weblog_entries stats: [numFiles=1, totalSize=251130]
Time taken: 0.868 seconds
hive> select concat_ws('_',request_data,request_time) from weblog_entries;
Time taken: 0.123 seconds, Fetched: 3000 row(s)


md5(STRING) url(STRING) request_date (STRING) request_time (STRING) ip(STRING)
hive> create table ip_to_country(ip string,country string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/ip_to_country/';
Time taken: 0.211 seconds
hive> load data local inpath '/opt/txt/ip_to_country.txt' into table ip_to_country;
Loading data to table default.ip_to_country
Table default.ip_to_country stats: [numFiles=1, totalSize=7552856]
Time taken: 5.443 seconds
hive> select wle.*,itc.country from weblog_entries wle join ip_to_country itc on wle.ip=itc.ip;
Query ID = root_20190115100452_7e9f9e8f-8c18-4546-83cf-3557d8ff9bbf
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1547338155253_0009)

Time taken: 24.07 seconds

md5(STRING) url(STRING) request_date (STRING) request_time (STRING) ip(STRING)
hive> create table weblog_entries_url_length as select url,request_data,request_time,length(url) as url_length from weblog_entries;
Query ID = root_20190115101234_1b0ca195-da6a-481f-9276-52812e2ff32d
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1547338155253_0009)

Moving data to directory hdfs://master:8020/apps/hive/warehouse/weblog_entries_url_length
Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]
Time taken: 8.019 seconds
hive> select * from weblog_entries_url_length;
/apliivnfonuq.html 2012-05-10 21:20:51 18
/cvjcxq.html 2012-05-10 21:34:54 12
/oduuw.html 2012-05-10 21:23:00 11
/uytd.html 2012-05-10 21:10:22 10
/frpnqyqqa.html 2012-05-10 21:18:48 15
/n.html 2012-05-10 21:12:25 7
/qnrxlxqacgiudbtfggcg.html 2012-05-10 21:29:01 26
/sbbiuot.html 2012-05-10 21:13:47 13
/ofxi.html 2012-05-10 21:12:37 10
/hjmdhaoogwqhp.html 2012-05-10 21:34:20 19
/angjbmea.html 2012-05-10 21:27:00 14
/mmdttqsnjfifkihcvqu.html 2012-05-10 21:33:53 25
/eorxuryjadhkiwsf.html 2012-05-10 21:10:19 22
/e.html 2012-05-10 21:12:05 7
/khvc.html 2012-05-10 21:25:58 10
/c.html 2012-05-10 21:34:28 7
Time taken: 0.087 seconds, Fetched: 3000 row(s)
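
The `length(url)` column can be spot-checked against a few of the rows printed above. A plain-Python sketch (the helper name is hypothetical; the three URLs are copied from the output):

```python
def with_url_length(urls):
    """Pair each URL with its character count, like length(url) in the CTAS query."""
    return [(url, len(url)) for url in urls]

sample_urls = ["/apliivnfonuq.html", "/cvjcxq.html", "/n.html"]
for url, url_length in with_url_length(sample_urls):
    print(url, url_length)
```

The printed lengths should agree with the last column of the corresponding rows in the table.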

637. Install the Sqoop Clients on the master and slaver nodes. When done, check the Sqoop version on the master node and submit the command and its output to the answer box as text.
[root@master ~]# sqoop version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/15 10:33:58 INFO sqoop.Sqoop: Running Sqoop version:
git commit id 99af1205a99646445a9c7254ad2770706e1cc6a4
Compiled by jenkins on Wed May 31 03:22:43 UTC 2017

[root@master ~]# sqoop list-databases --connect jdbc:mysql://localhost --username root --password bigdata
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/15 10:38:02 INFO sqoop.Sqoop: Running Sqoop version:
19/01/15 10:38:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/15 10:38:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

[root@master ~]# sqoop list-tables --connect jdbc:mysql://localhost/ambari --username root --password bigdata
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/15 10:39:31 INFO sqoop.Sqoop: Running Sqoop version:
19/01/15 10:39:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/15 10:39:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

stname VARCHAR(20) stID INT(1) class VARCHAR(20) opt_cour VARCHAR(20)
stname(string) stID(int) class(string) opt_cour(string)
[root@master ~]# mysql -uroot -pbigdata
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 195
Server version: 5.5.44-MariaDB MariaDB Server

Copyright © 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database xiandian;
Query OK, 1 row affected (0.01 sec)

MariaDB [(none)]> use xiandian;
Database changed
MariaDB [xiandian]> create table xd_phy_course(stname varchar(20),stID int(1),class varchar(20),opt_cour varchar(20));
Query OK, 0 rows affected (0.03 sec)
hive> create table xd_phy_course3(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
Time taken: 2.773 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course3;
Loading data to table default.xd_phy_course3
Table default.xd_phy_course3 stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
Time taken: 0.853 seconds

stname(string) stID(int) class(string) opt_cour(string)
[root@master ~]# hive

WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/

hive> create table xd_phy_course4 (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';


Time taken: 2.329 seconds

[root@master ~]# sqoop import --connect jdbc:mysql://localhost:3306/xiandian --username root --password bigdata --table xd_phy_course --hive-import --hive-overwrite --hive-table xd_phy_course4 -m 1 --fields-terminated-by '\t' --lines-terminated-by '\n'

642. Install the Pig Clients on the master node, open a Linux shell, and start Grunt in MapReduce mode. Submit the start command and its output to the answer box as text.
[root@master ~]# pig -x mapreduce
19/01/15 12:11:19 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/01/15 12:11:19 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
19/01/15 12:11:19 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2019-01-15 12:11:19,262 [main] INFO org.apache.pig.Main - Apache Pig version (rexported) compiled May 31 2017, 03:39:20
2019-01-15 12:11:19,262 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1547525479260.log
2019-01-15 12:11:19,293 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2019-01-15 12:11:19,962 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2019-01-15 12:11:20,771 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-2b845d6b-dda1-446f-8bb0-4c0f1fccfd40
2019-01-15 12:11:21,272 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver:8188/ws/v1/timeline/
2019-01-15 12:11:21,414 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
643. Install the Pig Clients on the master node, open a Linux shell, and start Grunt in Local mode. Submit the start command and its output to the answer box as text.
[root@master ~]# pig -x local
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2019-01-15 12:10:29,354 [main] INFO org.apache.pig.Main - Apache Pig version (rexported) compiled May 31 2017, 03:39:20
2019-01-15 12:10:29,355 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1547525429353.log
2019-01-15 12:10:29,394 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2019-01-15 12:10:29,784 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2019-01-15 12:10:30,188 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-310a210d-52ce-4823-8cab-8129fb97da01
2019-01-15 12:10:30,189 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false

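644. Using Pig in Local mode, compute the number of hits per IP in the system log access_log.txt. Use a GROUP BY statement to group by IP, iterate over the relation's columns with the FOREACH operator to count the rows in each group, and finally use a DUMP statement to show the result. Submit the commands and the query result to the answer box as text.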
[root@master ~]# pig -x local
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2019-01-15 12:10:29,354 [main] INFO org.apache.pig.Main - Apache Pig version (rexported) compiled May 31 2017, 03:39:20
2019-01-15 12:10:29,355 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1547525429353.log
2019-01-15 12:10:29,394 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2019-01-15 12:10:29,784 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2019-01-15 12:10:30,188 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-310a210d-52ce-4823-8cab-8129fb97da01
2019-01-15 12:10:30,189 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt> copyFromLocal /opt/txt/access.txt /user/root/input/log1.txt
grunt> A = LOAD '/user/root/input/log1.txt' USING PigStorage(' ') as (ip,others);
grunt> group_ip = group A by ip;
grunt> result = foreach group_ip generate group,COUNT(A);
grunt> dump result;
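
What the GROUP/FOREACH/COUNT pipeline computes can be sketched in plain Python. This is an analogue rather than Pig itself, and the log lines are hypothetical; only the first space-separated field is treated as the IP, as in the LOAD statement.

```python
from collections import Counter

def hits_per_ip(lines):
    """Count lines per IP, taking the first space-separated field as the IP."""
    return Counter(line.split(" ", 1)[0] for line in lines if line.strip())

log_lines = [                      # hypothetical sample, not access_log.txt
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /a.html",
    "10.0.0.1 GET /b.html",
]
for ip, hits in sorted(hits_per_ip(log_lines).items()):
    print((ip, hits))
```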

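645. Using Pig, compute the highest temperature per year in the weather data set temperature.txt. Use a GROUP BY statement to group by year, iterate over the relation's columns with the FOREACH operator to find the maximum in each group, and finally use a DUMP statement to show the result. Submit the commands and the query result to the answer box as text.
grunt> copyFromLocal /opt/txt/temperature.txt /user/root/temprature.txt
grunt> A = LOAD '/user/root/temprature.txt' USING PigStorage(' ') AS (year:int,temperature:int);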
grunt> B = GROUP A BY year;
grunt> C = FOREACH B GENERATE group,MAX(A.temperature);
grunt> dump C;
Vertex Stats:
VertexId Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
scope-20 1 1 357 0 357 32 61 2852 0 A,B,C
scope-21 1 1 0 1 1 61 0 0 6 C GROUP_BY hdfs://master:8020/tmp/temp1707840247/tmp-1432654154,

Successfully read 357 records (2852 bytes) from: "/user/root/temprature.txt"

Successfully stored 1 records (6 bytes) in: "hdfs://master:8020/tmp/temp1707840247/tmp-1432654154"

2019-01-15 18:10:37,781 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-01-15 18:10:37,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
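
The per-year maximum that `GROUP A BY year` followed by `MAX(A.temperature)` produces can be illustrated in plain Python. The records below are hypothetical and are not taken from temperature.txt.

```python
def max_temperature_by_year(records):
    """Return {year: highest temperature seen for that year}."""
    best = {}
    for year, temperature in records:
        if year not in best or temperature > best[year]:
            best[year] = temperature
    return best

records = [(2015, 31), (2015, 36), (2016, 29), (2016, 27)]   # hypothetical sample
print(max_temperature_by_year(records))
```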

646. Using Pig, count the number of IP addresses per country in the data set ip_to_country. Use a GROUP BY statement to group by country, iterate over the relation's columns with the FOREACH operator to count the IP addresses in each group, save the result to the /data/pig/output directory, and view the resulting data. Submit the commands and the query result to the answer box as text.
grunt> copyFromLocal /opt/txt/ip_to_country.txt /user/root/ip_to_country.txt
grunt> ip_countries = LOAD '/user/root/ip_to_country.txt' AS (ip:chararray,country:chararray);
grunt> country_grpd = GROUP ip_countries BY country;
grunt> country_counts = FOREACH country_grpd GENERATE FLATTEN(group),COUNT(ip_countries) as counts;
grunt> STORE country_counts INTO '/data/pig/output';
Vertex Stats:
VertexId Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
scope-19 1 1 248284 0 248284 32 1935 3922915 0 country_counts,country_grpd,ip_countries
scope-20 1 1 0 246 246 1935 0 0 1618 country_counts GROUP_BY /data/pig/output,

Successfully read 248284 records (3922915 bytes) from: "/user/root/ip_to_country.txt"

Successfully stored 246 records (1618 bytes) in: "/data/pig/output"

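647. Install the Mahout Client on the master node, open a Linux shell, and run the mahout command to list the example programs bundled with Mahout. Submit the output to the answer box as text.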
[root@master ~]# mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/ and HADOOP_CONF_DIR=/usr/hdp/
MAHOUT-JOB: /usr/hdp/
An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
buildforest: : Build the random forest classifier
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix

[root@master ~]# mkdir 20news
[root@master ~]# tar -xzf 20news-bydate.tar.gz -C 20news
[root@master ~]# hadoop fs -mkdir -p /data/mahout/20news/20news-all
[root@master ~]# hadoop fs -put 20news/* /data/mahout/20news/20news-all
[root@master ~]# mahout seqdirectory -i /data/mahout/20news/20news-all -o /data/mahout/20news/output/20news-seq
19/01/16 04:36:03 INFO mapreduce.Job: map 100% reduce 0%
19/01/16 04:36:04 INFO mapreduce.Job: Job job_1547338155253_0016 completed successfully
19/01/16 04:36:05 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=151642
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=37878493
HDFS: Number of bytes written=13631587
HDFS: Number of read operations=75388
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=93308
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=93308
Total vcore-milliseconds taken by all map tasks=93308
Total megabyte-milliseconds taken by all map tasks=63636056
Map-Reduce Framework
Map input records=18846
Map output records=18846
Input split bytes=2023490
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=745
CPU time spent (ms)=71930
Physical memory (bytes) snapshot=226930688
Virtual memory (bytes) snapshot=2505920512
Total committed heap usage (bytes)=108003328
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=13631587
19/01/16 04:36:05 INFO driver.MahoutDriver: Program took 134759 ms (Minutes: 2.245983333333333)

[root@master ~]# mkdir 20news
[root@master ~]# tar -xzf 20news-bydate.tar.gz -C 20news
[root@master ~]# hadoop fs -mkdir -p /data/mahout/20news/20news-all
[root@master ~]# hadoop fs -put 20news/* /data/mahout/20news/20news-all
[root@master ~]# mahout seqdirectory -i /data/mahout/20news/20news-all -o /data/mahout/20news/output/20news-seq
[root@master ~]# hadoop fs -text /data/mahout/20news/output/20news-seq/part-m-00000 | head -n 20
19/01/16 04:39:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
/20news-bydate-test/alt.atheism/53068 From: decay@cbnewsj.cb.att.com (dean.kaflowitz)
Subject: Re: about the bible quiz answers
Organization: AT&T
Distribution: na
Lines: 18

In article healta.153.735242337@saturn.wwc.edu, healta@saturn.wwc.edu (Tammy R Healy) writes:

.> #12) The 2 cheribums are on the Ark of the Covenant. When God said make no
.> graven image, he was refering to idols, which were created to be worshipped.
.> The Ark of the Covenant wasn’t wrodhipped and only the high priest could
.> enter the Holy of Holies where it was kept once a year, on the Day of
.> Atonement.

I am not familiar with, or knowledgeable about the original language,
but I believe there is a word for “idol” and that the translator
would have used the word “idol” instead of “graven image” had
the original said “idol.” So I think you’re wrong here, but
then again I could be too. I just suggesting a way to determine
text: Unable to write to output stream.

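recommendation, using the item-based collaborative filtering algorithm with the Euclidean distance definition, 3 recommendations per user, non-Boolean data, a maximum preference value of 4, and a minimum preference value of 1. Save the recommendation output to the output directory and use the -cat command to view the contents of part-r-00000. Submit the command that runs the recommendation algorithm and the query result to the answer box as text.
[hdfs@master ~]$ hadoop fs -mkdir -p /data/mahout/project

[hdfs@master ~]$ hadoop fs -put user-item-score.txt /data/mahout/project

[hdfs@master ~]$ mahout recommenditembased -i /data/mahout/project/user-item-score.txt -o /data/mahout/project/output -n 3 -b false -s SIMILARITY_EUCLIDEAN_DISTANCE --maxPrefsPerUser 4 --minPrefsPerUser 1 --maxPrefsInItemSimilarity 4 --tempDir /data/mahout/project/temp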
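
To make the idea behind item-based filtering with a Euclidean similarity concrete, here is a small plain-Python sketch. It is an illustration under stated assumptions and not Mahout's implementation: similarity is taken as 1 / (1 + distance) over users who rated both items, predictions are similarity-weighted averages, and the ratings are hypothetical values in the range 1 to 4.

```python
from math import sqrt

def euclidean_similarity(a, b, ratings):
    """Similarity in (0, 1] from the Euclidean distance over users who rated both items."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0
    distance = sqrt(sum((ratings[u][a] - ratings[u][b]) ** 2 for u in common))
    return 1.0 / (1.0 + distance)

def recommend(user, ratings, n=3):
    """Rank unseen items by the similarity-weighted average of the user's own ratings."""
    all_items = {item for prefs in ratings.values() for item in prefs}
    seen = ratings[user]
    scores = {}
    for candidate in all_items - set(seen):
        weighted = total = 0.0
        for item, value in seen.items():
            sim = euclidean_similarity(candidate, item, ratings)
            weighted += sim * value
            total += sim
        if total > 0:
            scores[candidate] = weighted / total
    # Highest score first; ties broken by item name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

ratings = {   # hypothetical user -> {item: preference}
    "u1": {"i1": 4, "i2": 3, "i3": 1},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i2": 2, "i3": 1, "i4": 4},
}
print(recommend("u2", ratings))
```

The `n` argument plays the role of the per-user recommendation count (`-n 3` in the command above).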

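651. Install and start the Flume component on the master node, open a Linux shell, and run the flume-ng help command to view the flume-ng usage information. Submit the output to the answer box as text.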
[root@master ~]# flume-ng help

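652. Based on the provided template file log-example.conf, use Flume NG to collect the master node's system log /var/log/secure. Prefix the names of the collected log files with "xiandian-sec", store them in the HDFS directory /1daoyun/file/flume, and set the timestamp of the files generated in HDFS to 10 minutes. After collection, list the contents of /1daoyun/file/flume in HDFS. Submit the commands, their output, and the contents of the modified log-example.conf file to the answer box.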
[root@master ~]# hadoop fs -ls /1daoyun/file/flume

Found 1 items

-rw-r--r-- 3 root hdfs 1142 2017-05-08 10:29 /1daoyun/file/flume/xiandian-sec.1494239316323

[root@master ~]# cat log-example.conf

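653. Based on the provided template file hdfs-example.conf, use Flume NG to set the master node's system path /opt/xiandian/ as the path from which files are uploaded to HDFS in real time, set the HDFS storage path to /data/flume/, keep the uploaded file names unchanged, and set the file type to DataStream; then start the flume-ng agent. Submit the commands and the contents of the modified hdfs-example.conf file to the answer box.
[root@master ~]# flume-ng agent --conf-file hdfs-example.conf --name master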
[root@master ~]# cat hdfs-example.conf

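654. Deploy the Spark service component on the XianDian big data platform, open a Linux shell, and start the spark-shell terminal. Submit the process information of the started program to the answer box as text.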
[root@master ~]# spark-shell
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
19/01/15 21:04:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
19/01/15 21:04:17 INFO spark.SecurityManager: Changing view acls to: root
19/01/15 21:04:17 INFO spark.SecurityManager: Changing modify acls to: root
19/01/15 21:04:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
19/01/15 21:04:18 INFO spark.HttpServer: Starting HTTP Server
19/01/15 21:04:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
19/01/15 21:04:18 INFO server.AbstractConnector: Started SocketConnector@
19/01/15 21:04:18 INFO util.Utils: Successfully started service 'HTTP class server' on port 42191.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.3
      /_/

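655. After starting spark-shell, load the data "1,2,3,4,5,6,7,8,9,10" in Scala, find the numbers whose doubled value is divisible by 3, and use the toDebugString method to view the RDD lineage. Submit the commands and their output to the answer box as text.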

scala> val number=sc.parallelize(1 to 10)
number: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:27
scala> val doublenum=number.map(_*2)
doublenum: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:29
scala> val threenum=doublenum.filter(_%3==0)
threenum: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[2] at filter at <console>:31
scala> threenum.collect
19/01/15 21:16:38 INFO spark.SparkContext: Starting job: collect at <console>:34
19/01/15 21:16:38 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:34) with 4 output partitions
19/01/15 21:16:38 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (collect at <console>:34)
res0: Array[Int] = Array(6, 12, 18)
scala> threenum.toDebugString
res5: String =
(4) MapPartitionsRDD[2] at filter at <console>:31 []
| MapPartitionsRDD[1] at map at <console>:29 []
| ParallelCollectionRDD[0] at parallelize at <console>:27 []
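
As a quick cross-check of the collected result, the same map-then-filter logic can be written in plain Python. This is only an analogue of the two transformations; it does not use Spark and has no lineage to inspect.

```python
numbers = list(range(1, 11))                        # 1 to 10
doubled = [n * 2 for n in numbers]                  # the map step
divisible = [n for n in doubled if n % 3 == 0]      # the filter step
print(divisible)   # [6, 12, 18]
```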

656. After starting spark-shell, load the key-value data ("A",1),("B",2),("C",3),("A",4),("B",5),("C",4),("A",3),("A",9),("B",4),("D",5) in Scala, sort it in ascending order by key, and group it by key. Submit the commands and their output to the answer box as text.
scala> val kv1=sc.parallelize(List(("A",1),("B",2),("C",3),("A",4),("B",5),("C",4),("A",3),("A",9),("B",4),("D",5)))
kv1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[3] at parallelize at <console>:27
scala> kv1.sortByKey().collect
19/01/15 22:33:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
19/01/15 22:33:03 INFO scheduler.DAGScheduler: ResultStage 3 (collect at <console>:30) finished in 0.140 s
19/01/15 22:33:03 INFO scheduler.DAGScheduler: Job 2 finished: collect at <console>:30, took 0.347574 s
res6: Array[(String, Int)] = Array((A,1), (A,4), (A,3), (A,9), (B,2), (B,5), (B,4), (C,3), (C,4), (D,5))
scala> kv1.groupByKey().collect
19/01/15 22:34:33 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
19/01/15 22:34:33 INFO scheduler.DAGScheduler: ResultStage 5 (collect at <console>:30) finished in 1.701 s
19/01/15 22:34:33 INFO scheduler.DAGScheduler: Job 3 finished: collect at <console>:30, took 1.873657 s
res7: Array[(String, Iterable[Int])] = Array((D,CompactBuffer(5)), (A,CompactBuffer(1, 4, 3, 9)), (B,CompactBuffer(2, 5, 4)), (C,CompactBuffer(3, 4)))
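
The sort-by-key and group-by-key results can be reproduced on the same pairs in plain Python. This is a sketch of the semantics only: Python's sort is stable, whereas Spark makes no such promise about the order of values within a key, and the order of keys in the grouped result may differ.

```python
pairs = [("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5),
         ("C", 4), ("A", 3), ("A", 9), ("B", 4), ("D", 5)]

# Sort by key only; values keep their original relative order
sorted_pairs = sorted(pairs, key=lambda kv: kv[0])

# Group values under their key
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)

print(sorted_pairs)
print(grouped)
```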

657. After starting spark-shell, load the key-value data ("A",1),("B",3),("C",5),("D",4),("B",7),("C",4),("E",5),("A",8),("B",4),("D",5) in Scala, sort it in ascending order by key, and sum the values of identical keys. Submit the commands and their output to the answer box as text.
scala> val kv2=sc.parallelize(List(("A",1),("B",3),("C",5),("D",4), ("B",7), ("C",4), ("E",5), ("A",8), ("B",4), ("D",5)))
kv2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[8] at parallelize at <console>:27
scala> kv2.sortByKey().collect
19/01/15 22:40:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
19/01/15 22:40:15 INFO scheduler.DAGScheduler: ResultStage 8 (collect at <console>:30) finished in 0.040 s
19/01/15 22:40:15 INFO scheduler.DAGScheduler: Job 5 finished: collect at <console>:30, took 0.234400 s
res8: Array[(String, Int)] = Array((A,1), (A,8), (B,3), (B,7), (B,4), (C,5), (C,4), (D,4), (D,5), (E,5))
scala> kv2.reduceByKey(_+_).collect
19/01/15 22:41:32 INFO scheduler.DAGScheduler: ResultStage 10 (collect at <console>:30) finished in 0.020 s
19/01/15 22:41:32 INFO scheduler.DAGScheduler: Job 6 finished: collect at <console>:30, took 0.091262 s
res9: Array[(String, Int)] = Array((D,9), (A,9), (E,5), (B,14), (C,9))
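
The per-key sums can be verified by hand with a few lines of plain Python. This mirrors what reducing by key with addition computes; it is not Spark code.

```python
pairs = [("A", 1), ("B", 3), ("C", 5), ("D", 4), ("B", 7),
         ("C", 4), ("E", 5), ("A", 8), ("B", 4), ("D", 5)]

totals = {}
for key, value in pairs:
    totals[key] = totals.get(key, 0) + value   # fold values of the same key with +

print(sorted(totals.items()))
```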

658.启动spark-shell后,在scala中加载Key-Value数据“(“A”,4),(“A”,2),(“C”,3),(“A”,4),(“B”,5),(“C”,3),(“A”,4),以Key为基准进行去重操作,并通过toDebugString 方法来查看RDD的谱系。将以上操作命令和结果信息以文本形式提交到答题框中。
scala> val kv1=sc.parallelize(List((“A”,4),(“A”,2),(“C”,3),(“A”,4),(“B”,5),(“C”,3),(“A”,4)))
kv1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at :27

scala> kv1.distinct.collect
19/01/16 05:02:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
19/01/16 05:02:08 INFO scheduler.DAGScheduler: ResultStage 1 (collect at <console>:30) finished in 0.072 s
19/01/16 05:02:08 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:30, took 0.647297 s
res0: Array[(String, Int)] = Array((A,4), (B,5), (A,2), (C,3))
scala> kv1.toDebugString
res1: String = (4) ParallelCollectionRDD[0] at parallelize at <console>:27 []
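
Note that `kv1.toDebugString` only describes the source RDD, and that `distinct` compares whole (key, value) pairs rather than keys alone. The sketch below, under stated assumptions, prints the lineage of the deduplicated RDD instead and shows one possible reading of "deduplicate by key". The `ctx`, `distinctPairs`, and `onePerKey` names and the local master settings are illustrative, not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative settings (assumptions); inside spark-shell the existing context is reused.
val conf = new SparkConf().setMaster("local[2]").setAppName("distinct-lineage-sketch")
val ctx = SparkContext.getOrCreate(conf)

val kv1 = ctx.parallelize(List(("A",4),("A",2),("C",3),("A",4),("B",5),("C",3),("A",4)))

// distinct compares whole pairs, so ("A",4) and ("A",2) both survive.
val distinctPairs = kv1.distinct()

// Lineage of the derived RDD, including the shuffle that distinct introduces.
val lineage = distinctPairs.toDebugString
println(lineage)

// One possible reading of "deduplicate by key": keep a single pair per key.
// Which value survives for a repeated key is not guaranteed.
val onePerKey = kv1.reduceByKey((first, _) => first).collect()
onePerKey.foreach(println)
```

Whether the task expects pair-level or key-level deduplication is not stated in the question, so both variants are shown.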

scala> val kv5=sc.parallelize(List(("A",1),("B",2),("C",3),("A",4),("B",5)))
kv5: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[4] at parallelize at <console>:27
scala> val kv6=sc.parallelize(List(("A",1),("B",2),("C",3),("A",4),("B",5)))
kv6: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[5] at parallelize at <console>:27
scala> kv5.join(kv6).collect
19/01/16 05:08:02 INFO scheduler.DAGScheduler: ResultStage 4 (collect at <console>:32) finished in 0.070 s
19/01/16 05:08:02 INFO scheduler.DAGScheduler: Job 1 finished: collect at <console>:32, took 0.173849 s
res2: Array[(String, (Int, Int))] = Array((A,(1,1)), (A,(1,4)), (A,(4,1)), (A,(4,4)), (B,(2,2)), (B,(2,5)), (B,(5,2)), (B,(5,5)), (C,(3,3)))

scala> var rdd4 = sc.textFile("hdfs://")
19/01/16 05:13:38 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 90.8 KB, free 511.0 MB)
19/01/16 05:13:38 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 29.9 KB, free 511.0 MB)
19/01/16 05:13:38 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:60121 (size: 29.9 KB, free: 511.1 MB)
19/01/16 05:13:38 INFO spark.SparkContext: Created broadcast 5 from textFile at <console>:27
rdd4: org.apache.spark.rdd.RDD[String] = hdfs:// MapPartitionsRDD[10] at textFile at <console>:27
scala> rdd4.toDebugString
scala> val words=rdd4.flatMap(_.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at flatMap at <console>:29
scala> val wordscount=words.map(word => (word,1)).reduceByKey(_+_)
scala> wordscount.collect
scala> wordscount.toDebugString
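
Because the HDFS path is truncated in the transcript above, the same word-count pipeline can be tried on in-memory data. This is a minimal sketch under that assumption: the sample lines, the `ctx` name, and the local master settings are hypothetical stand-ins, not the original input.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative settings (assumptions); inside spark-shell the existing context is reused.
val conf = new SparkConf().setMaster("local[2]").setAppName("wordcount-sketch")
val ctx = SparkContext.getOrCreate(conf)

// Hypothetical lines standing in for the contents of the HDFS file.
val lines = ctx.parallelize(Seq("spark hadoop spark", "hadoop hive"))

// Split each line into words, pair every word with 1, then sum the 1s per word.
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

wordCounts.collect().foreach(println)

// The lineage shows the narrow map steps and the shuffle added by reduceByKey.
println(wordCounts.toDebugString)
```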

Access time, user ID, query term, rank of the URL in the returned results, sequence number of the user's click, URL clicked by the user
scala> val ardd = sc.textFile("/data/search.txt")
19/01/16 05:30:11 INFO storage.MemoryStore: Block broadcast_9 stored as values in memory (estimated size 349.6 KB, free 509.6 MB)
19/01/16 05:30:11 INFO storage.MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 29.9 KB, free 509.5 MB)
19/01/16 05:30:11 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on localhost:60121 (size: 29.9 KB, free: 511.0 MB)
19/01/16 05:30:11 INFO spark.SparkContext: Created broadcast 9 from textFile at <console>:27
ardd: org.apache.spark.rdd.RDD[String] = /data/search.txt MapPartitionsRDD[22] at textFile at <console>:27
scala> val mapardd = ardd.map(_.split('\t')).filter(_.length >= 6)
mapardd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[24] at filter at <console>:29
scala> val filterardd = mapardd.filter(_(3).toString != "2").filter(_(4).toString != "1")
filterardd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[26] at filter at <console>:31
scala> filterardd.count
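
The filter chain above can be checked on a few hand-written records before running it on /data/search.txt. This is a sketch only: the sample records, the `ctx` name, and the local master settings are hypothetical, and the field positions follow the column list given earlier (index 3 is the rank, index 4 is the click order).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative settings (assumptions); inside spark-shell the existing context is reused.
val conf = new SparkConf().setMaster("local[2]").setAppName("search-log-filter-sketch")
val ctx = SparkContext.getOrCreate(conf)

// Hypothetical tab-separated records: time, user ID, query, rank, click order, URL.
val sampleLines = Seq(
  "00:00:01\tu1\tspark\t1\t2\thttp://example.com/a",  // kept
  "00:00:02\tu2\thadoop\t2\t1\thttp://example.com/b", // dropped: rank is 2
  "00:00:03\tu3\thive\t3\t1\thttp://example.com/c",   // dropped: click order is 1
  "00:00:04\tu4\tbroken",                             // dropped: fewer than 6 fields
  "00:00:05\tu5\thbase\t4\t3\thttp://example.com/d"   // kept
)

// Same steps as the transcript: split on tabs, drop short rows, then filter.
val records = ctx.parallelize(sampleLines).map(_.split('\t')).filter(_.length >= 6)
val kept = records.filter(fields => fields(3) != "2").filter(fields => fields(4) != "1")

val keptCount = kept.count()
println(keptCount) // prints 2
```

Checking the field count before indexing matters here, since `fields(3)` would throw on the malformed row if the length filter were skipped.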

