Table of Contents

  • 1. OpenLDAP + Open-Source Ranger Solution Overview
    • 1.1 Solution Architecture
    • 1.2 Ranger in Detail
  • 2. Installation & Integration
    • 2.1 Prerequisites
      • 2.1.1 Create EC2 Instances as Ranger and OpenLDAP Server
      • 2.1.2 Download Installer
      • 2.1.3 Upload SSH Key File
      • 2.1.4 Export Environment-Specific Variables
    • 2.2 All-In-One Installation
      • 2.2.1 Quick Start
      • 2.2.2 Customization
    • 2.3 Step-By-Step Installation
      • 2.3.1 Init EC2
      • 2.3.2 Install OpenLDAP
      • 2.3.3 Install Ranger
      • 2.3.4 Create EMR Cluster
      • 2.3.5 Install Ranger Plugins
      • 2.3.6 Install SSSD
      • 2.3.7 Configure Hue
      • 2.3.8 Create Example Users
  • 3. Verification
    • 3.1 HDFS Access Control Verification
    • 3.2 Hive Access Control Verification
  • 4. Appendix

In the previous 2 articles, we introduced the EMR-native Ranger integration solutions with OpenLDAP and Windows AD. Starting with this article, we turn to open-source Ranger integration, beginning with “OpenLDAP + Open-Source Ranger”. The address of this article is https://laurence.blog.csdn.net/article/details/128799548; please indicate the source when reprinting.

1. OpenLDAP + Open-Source Ranger Solution Overview

1.1 Solution Architecture

In this solution, OpenLDAP acts as the authentication provider and stores all user account data. Ranger acts as the authorization controller: it syncs account data from OpenLDAP so that privileges can be granted to OpenLDAP user accounts. The EMR cluster needs a set of Ranger plugins installed; these plugins check with the Ranger server to determine whether the current user has permission to perform an action. The EMR cluster also syncs account data from OpenLDAP via SSSD, so that users can log in to the cluster nodes and submit jobs. End users can log in to the nodes of the EMR cluster via SSH with their OpenLDAP accounts and, if Hue is available, log in to Hue with the same accounts.

1.2 Ranger in Detail

Let’s dive deeper into Ranger. Its architecture is as follows:

The installer performs the following jobs:

① Install MySQL as Policy DB for Ranger;
② Install Solr as Audit Store for Ranger;
③ Install Ranger Admin;
④ Install Ranger UserSync;
⑤ Install HDFS Ranger Plugin;
⑥ Install Hive Ranger Plugin;

2. Installation & Integration

Generally, the installation and integration process can be divided into 3 stages: 1. Prerequisites -> 2. All-In-One Install -> 3. Create EMR Cluster. The following diagram illustrates the process in detail:

At stage 1, we do some preparatory work. At stage 2, we install and integrate, and there are 2 options: an all-in-one installation driven by a command-line workflow, or a step-by-step installation. In most cases the all-in-one installation is the best choice. However, the workflow may be interrupted by unforeseen errors; if you want to continue from the last failed step, use the step-by-step installation. It is also the better choice when you want to retry a step with different argument values to find the right ones. At stage 3, we create an EMR cluster; if you already have one, skip this job. In practice, Ranger usually has to be installed on an existing cluster rather than a new one. With EMR-native Ranger this is impossible, because EMR-native Ranger plugins can only be installed when the cluster is created. Open-source Ranger does NOT have this limitation: you are free to install it on an existing or a new EMR cluster.

There is a slight overlap in execution sequence between stages 2 and 3. At step 2.4, the installation pauses: the installer prompts users to create their own cluster and keeps monitoring the target cluster’s status. Once the cluster is ready, the installation resumes and performs the remaining actions.

As a design principle, the installer does NOT include any action that creates an EMR cluster; you should always create the cluster yourself. An EMR cluster in practice can have all kinds of settings that the installer cannot predict, e.g., application-specific (HDFS, YARN, etc.) configuration, step scripts, bootstrap scripts and so on, so coupling Ranger’s installation with the cluster’s creation is not advisable.

Notes:

  1. The installer treats the local host as the Ranger server and installs all Ranger components on it. For non-Ranger operations, e.g., installing OpenLDAP, it initiates remote operations via SSH. So you can stay on the Ranger server to execute all command lines, with no need to switch among multiple hosts.

  2. Although it is not required, we suggest always using an FQDN as the host address. Neither an IP address nor a hostname without a domain name is recommended.
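
As a quick sanity check for note 2, the following sketch classifies a host address as an FQDN, a short hostname, or something that looks like an IPv4 address. The function name classify_host is hypothetical and not part of the installer, and the check is purely syntactic: it does not resolve anything through DNS.

```shell
# Hypothetical helper, not part of the installer.
# Prints one of: empty, fqdn, short-hostname, ipv4 (syntactic check only).
classify_host() {
    host="$1"
    case "$host" in
        "")
            echo "empty" ;;
        *[!0-9.]*)
            # Contains something other than digits and dots, so it is a name.
            case "$host" in
                *.*) echo "fqdn" ;;
                *)   echo "short-hostname" ;;
            esac ;;
        *)
            # Only digits and dots: looks like an IPv4 address.
            echo "ipv4" ;;
    esac
}

# Example: classify the name this host reports for itself.
# classify_host "$(hostname -f)"
```

Only values classified as fqdn follow the recommendation above.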

2.1 Prerequisites

2.1.1 Create EC2 Instances as Ranger and OpenLDAP Server

First, we need to prepare 2 EC2 instances, one as the Ranger server and the other as the OpenLDAP server. When creating the instances, select the Amazon Linux 2 image and make sure the instances and the cluster to be created can reach each other over the network.

As a best practice, it’s recommended to add the Ranger server to the ElasticMapReduce-master security group: Ranger works so closely with the EMR cluster that it can be regarded as a master service that is not built into EMR. For OpenLDAP, we have to make sure its port 389 is reachable from the Ranger server and from all nodes of the EMR cluster to be created; to keep things simple, you can also add the OpenLDAP server to the ElasticMapReduce-master security group.
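
One way to confirm that port 389 is reachable is a plain TCP connection test. The sketch below is a hypothetical helper, not part of the installer; it assumes bash and the coreutils timeout command are available on the host where it runs.

```shell
# Hypothetical helper, not part of the installer.
# Returns 0 if a TCP connection to HOST:PORT succeeds within the timeout,
# 2 on a usage error, and another non-zero code otherwise.
port_open() {
    host="$1"; port="$2"; timeout_s="${3:-3}"
    if [ -z "$host" ] || [ -z "$port" ]; then
        echo "usage: port_open HOST PORT [TIMEOUT_SECONDS]" >&2
        return 2
    fi
    # /dev/tcp is a bash feature, so the attempt runs in an explicit bash process.
    timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (run on the Ranger server, and later on an EMR node):
# port_open "$OPENLDAP_HOST" 389 && echo "LDAP port reachable"
```

A successful connection only shows that the security groups allow the traffic; it does not prove that OpenLDAP itself is configured correctly.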

2.1.2 Download Installer

After the EC2 instances are ready, log in to the Ranger server via SSH and run the following commands to download the installer package:

sudo yum -y install git
git clone https://github.com/bluishglc/ranger-emr-cli-installer.git

2.1.3 Upload SSH Key File

As mentioned before, the installer runs on the local host (the Ranger server). To perform remote installation actions on the OpenLDAP server or the EMR cluster, it needs an SSH private key, so upload the key file to the Ranger server and make a note of its path; it will be the value of the variable SSH_KEY.
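
OpenSSH generally refuses private keys that other users can read, so it may be worth checking the uploaded file’s mode. The sketch below is a hypothetical helper; whether the installer itself enforces this is an assumption, and the stat -c syntax assumes GNU coreutils (as on Amazon Linux 2).

```shell
# Hypothetical helper, not part of the installer.
# Returns 0 if the key file has mode 600 or 400, 1 if it is more permissive,
# and 2 if the file does not exist.
check_key_permissions() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "not found: $key" >&2
        return 2
    fi
    mode="$(stat -c '%a' "$key")"   # GNU stat: octal permission bits
    case "$mode" in
        600|400)
            echo "ok: $key has mode $mode" ;;
        *)
            echo "too open: $key has mode $mode; consider chmod 600" >&2
            return 1 ;;
    esac
}

# Example:
# check_key_permissions /home/ec2-user/key.pem
```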

2.1.4 Export Environment-Specific Variables

During installation, the following environment-specific arguments are passed more than once. It’s recommended to export them first, so that all command lines can refer to these variables instead of literals.

export REGION='TO_BE_REPLACED'
export ACCESS_KEY_ID='TO_BE_REPLACED'
export SECRET_ACCESS_KEY='TO_BE_REPLACED'
export SSH_KEY='TO_BE_REPLACED'
export OPENLDAP_HOST='TO_BE_REPLACED'

The variables are described below:

  • REGION: the AWS region, e.g., cn-north-1, us-east-1 and so on.
  • ACCESS_KEY_ID: the AWS access key id of your IAM account. Be sure your account has enough privileges; admin permissions are preferable.
  • SECRET_ACCESS_KEY: the AWS secret access key of your IAM account.
  • SSH_KEY: the path on the local host of the SSH private key file you just uploaded.
  • OPENLDAP_HOST: the FQDN of the OpenLDAP server.

Please carefully replace the above variables’ values according to your environment, and remember to use an FQDN as the hostname, e.g., for OPENLDAP_HOST. The following is an example:

export REGION='cn-north-1'
export ACCESS_KEY_ID='<change-to-your-aws-access-key-id>'
export SECRET_ACCESS_KEY='<change-to-your-aws-secret-access-key>'
export SSH_KEY='/home/ec2-user/key.pem'
export OPENLDAP_HOST='ip-10-0-14-0.cn-north-1.compute.internal'
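
Before moving on, it can help to confirm that none of the variables is empty or still set to the TO_BE_REPLACED placeholder. The following is a minimal sketch; check_required_vars is a hypothetical helper and not part of the installer.

```shell
# Hypothetical helper, not part of the installer.
# Returns 0 only if every named variable is set, non-empty,
# and no longer equal to the TO_BE_REPLACED placeholder.
check_required_vars() {
    missing=0
    for name in "$@"; do
        # Read the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ] || [ "$value" = "TO_BE_REPLACED" ]; then
            echo "variable $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_required_vars REGION ACCESS_KEY_ID SECRET_ACCESS_KEY SSH_KEY OPENLDAP_HOST
```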

2.2 All-In-One Installation

2.2.1 Quick Start

Now, let’s start an all-in-one installation by executing this command line:

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'open-source-hdfs,open-source-hive'

For the parameter specifications of the above command line, please refer to the appendix. If everything goes well, the command line executes steps 2.1 to 2.3 of the workflow diagram, which may take 10 minutes or more depending on your network bandwidth. It then pauses and prompts the user to enter the EMR cluster id. If the target cluster already exists, we can enter its id immediately; if not, we should switch to the EMR web console to create it. Next, the command line asks the user to confirm whether Hue should be integrated with LDAP. If yes, once the cluster is ready, the installer updates the EMR configuration with Hue-specific settings (this action will overwrite the existing EMR configuration).

After filling in the above 2 items, enter “y” to confirm all inputs and the installation resumes. If the target EMR cluster is not ready yet, the command line keeps monitoring it until it enters the “WAITING” status. The following is a snapshot of the command line at this moment:

When the cluster is ready (status is “WAITING”), the command line continues with steps 2.5 to 2.8 of the workflow and finally ends with an “ALL DONE!!” message.
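
To illustrate what waiting for the “WAITING” status amounts to, here is a minimal polling sketch built on the AWS CLI command aws emr describe-cluster. It is not the installer’s actual code; the function names, the 30-second default interval and the retry limit are assumptions made for this example.

```shell
# Illustrative sketch only; the installer's own monitoring logic may differ.
# Prints the current state of the cluster, e.g. STARTING, RUNNING, WAITING.
get_cluster_state() {
    aws emr describe-cluster --region "$REGION" --cluster-id "$1" \
        --query 'Cluster.Status.State' --output text
}

# Polls until the cluster is WAITING.
# Returns 0 when WAITING, 1 if the cluster is terminating or terminated,
# 2 if the state was not reached within max_tries polls.
wait_until_waiting() {
    cluster_id="$1"; interval="${2:-30}"; max_tries="${3:-120}"
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        state="$(get_cluster_state "$cluster_id")"
        echo "cluster $cluster_id is $state" >&2
        case "$state" in
            WAITING)
                return 0 ;;
            TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS)
                return 1 ;;
        esac
        sleep "$interval"
        i=$((i + 1))
    done
    return 2
}

# Example:
# wait_until_waiting 'j-2S04VJZ5YQHZ4' && echo "cluster is ready"
```

Keeping the state lookup in its own function makes the loop easy to try out without a real cluster, by temporarily redefining get_cluster_state.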

2.2.2 Customization

Now that the all-in-one installation is done, let’s look at customization. Generally, this installer follows the principle of “Convention over Configuration”: most parameters are preset with default values. The equivalent of the above command line with the full parameter list is as follows:

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --java-home '/usr/lib/jvm/java' \
    --skip-install-mysql 'false' \
    --skip-install-solr 'false' \
    --skip-install-openldap 'false' \
    --skip-configure-hue 'false' \
    --ranger-host $(hostname -f) \
    --ranger-version '2.1.0' \
    --mysql-host $(hostname -f) \
    --mysql-root-password 'Admin1234!' \
    --mysql-ranger-db-user-password 'Admin1234!' \
    --solr-host $(hostname -f) \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --restart-interval 30

The full-parameter version gives a complete view of all custom options. You may want to change some options’ values in the following scenarios:

  1. If you want to change the default organization name dc=example,dc=com or the default password Admin1234!, please run the full-parameter version and replace them with your own values.

  2. If you need to integrate with external facilities, e.g., a centralized OpenLDAP or an existing MySQL or Solr, please add the corresponding --skip-xxx-xxx options and set them to true.

  3. If you have other predefined bind DNs for Hue, Ranger and SSSD, please add the corresponding --xxx-bind-dn and --xxx-bind-password options to set them. Note that the bind DNs for Hue, Ranger and SSSD are created automatically when OpenLDAP is installed, but they are FIXED to the naming pattern cn=hue|ranger|sssd,ou=services,<your-base-dn>, regardless of the value given to the --xxx-bind-dn option. So if you assign another DN with the --xxx-bind-dn option, you MUST create that DN yourself in advance. The installer does NOT create the DN assigned by the --xxx-bind-dn option because a DN is actually a tree path: creating it means creating every node on the path, and implementing such a small but complicated function is not cost-effective.
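
The fixed naming pattern from item 3 can be written out as a small helper, together with a check that a custom bind DN already exists. Both functions are hypothetical and not part of the installer. The second one assumes the ldapsearch client (openldap-clients) is installed, and the root DN passed to it, e.g., cn=admin,dc=example,dc=com, is an assumption about how the root cn and the base DN combine.

```shell
# Hypothetical helpers, not part of the installer.

# Prints the bind DN that is created automatically for a service,
# following the pattern cn=hue|ranger|sssd,ou=services,<your-base-dn>.
default_bind_dn() {
    service="$1"; base_dn="$2"
    case "$service" in
        hue|ranger|sssd)
            printf 'cn=%s,ou=services,%s\n' "$service" "$base_dn" ;;
        *)
            echo "unknown service: $service" >&2
            return 1 ;;
    esac
}

# Returns 0 if the given DN exists on the OpenLDAP server.
# Arguments: host, root DN (assumed form: cn=admin,<base-dn>), root password, DN to check.
bind_dn_exists() {
    ldapsearch -x -H "ldap://$1" -D "$2" -w "$3" -b "$4" -s base dn >/dev/null 2>&1
}

# Example:
# default_bind_dn ranger 'dc=example,dc=com'
# bind_dn_exists "$OPENLDAP_HOST" 'cn=admin,dc=example,dc=com' 'Admin1234!' \
#     'cn=ranger,ou=services,dc=example,dc=com' || echo "create this DN first"
```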

2.3 Step-By-Step Installation

As an alternative, you can choose the step-by-step installation instead of the all-in-one installation. We give the command line of each step below; for comments on each parameter, please refer to the appendix.

2.3.1 Init EC2

This step finishes some fundamental jobs, e.g., installing the AWS CLI, the JDK and so on.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh init-ec2 \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY"

2.3.2 Install OpenLDAP

This step installs OpenLDAP on the given OpenLDAP host. As mentioned above, although this action is performed on the OpenLDAP server, you DON’T need to log in to it; just run the command line on the local host (the Ranger server).

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-openldap \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!'

2.3.3 Install Ranger

This step will install all server-side components of Ranger, including MySQL, Solr, Ranger Admin and Ranger UserSync.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson'
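
In the user DN pattern and the group search filter, {0} is a placeholder for the user name. The following sketch shows how the two values expand for a given user; expand_ldap_pattern is a hypothetical helper written for illustration, and it substitutes only the first {0}.

```shell
# Hypothetical helper for illustration, not part of the installer.
# Replaces the first {0} in an LDAP pattern with the given user name.
expand_ldap_pattern() {
    pattern="$1"; user="$2"
    case "$pattern" in
        *"{0}"*)
            # Text before the placeholder + user name + text after it.
            printf '%s%s%s\n' "${pattern%%"{0}"*}" "$user" "${pattern#*"{0}"}" ;;
        *)
            printf '%s\n' "$pattern" ;;
    esac
}

# Example:
# expand_ldap_pattern 'uid={0},ou=users,dc=example,dc=com' 'example-user-1'
# expand_ldap_pattern '(member=uid={0},ou=users,dc=example,dc=com)' 'example-user-1'
```

If the expanded DN does not match where your users actually live in the directory tree, adjust the pattern before installing Ranger.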

2.3.4 Create EMR Cluster

For the step-by-step installation, there is no interactive process for creating the EMR cluster, so feel free to create the cluster on the EMR web console. We have to wait until the cluster is completely ready (in “WAITING” status), then export the following environment-specific variable:

export EMR_CLUSTER_ID='TO_BE_REPLACED'

The following is an example:

export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'

2.3.5 Install Ranger Plugins

This step installs the HDFS and Hive plugins on both the Ranger server side and the agent side (EMR nodes). This differs from the EMR-native Ranger solution: with EMR-native Ranger, EMR installs the agent side on each node automatically, while with open-source Ranger we have to do this job ourselves via this installer.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger-plugins \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --emr-cluster-id "$EMR_CLUSTER_ID"

2.3.6 Install SSSD

This step installs and configures SSSD on each node of the EMR cluster. As with installing OpenLDAP, we stay on the local host to run the command line, and it performs the actions on remote nodes via SSH.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-sssd \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"

2.3.7 Configure Hue

This step updates the Hue configuration of EMR. As highlighted in the all-in-one installation, if you have other customized EMR configuration, please skip this step; you can still manually merge the Hue configuration JSON file generated by the command line into your own JSON.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh configure-hue \
    --region "$REGION" \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --openldap-user-object-class 'inetOrgPerson' \
    --emr-cluster-id "$EMR_CLUSTER_ID"

2.3.8 Create Example Users

This step creates 2 example users to facilitate the following verification.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh add-example-users \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --example-users 'example-user-1,example-user-2'
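
Once SSSD is installed and the example users exist, a quick way to check that an EMR node resolves OpenLDAP accounts is to look them up through the system’s name service with getent. The helper names below are hypothetical; the sketch is meant to be run on an EMR node, not on the Ranger server.

```shell
# Hypothetical helpers, not part of the installer. Run on an EMR node.

# Returns 0 if the name service (local files or SSSD) knows the user.
user_resolvable() {
    getent passwd "$1" >/dev/null
}

# Checks several users and returns non-zero if any of them cannot be resolved.
check_users() {
    rc=0
    for u in "$@"; do
        if user_resolvable "$u"; then
            echo "resolved: $u"
        else
            echo "NOT resolved: $u" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# check_users example-user-1 example-user-2
```

If a user is not resolved, the SSSD service on that node and its connection to OpenLDAP are the first things to look at.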

3. Verification

After the installation and integration are completed, it’s time to check whether Ranger works. The verification is divided into 2 parts, one for HDFS and one for Hive. First, let’s log in to OpenLDAP with a client, e.g., LdapAdmin or Apache Directory Studio, and check all DNs; they should look as follows:

Next, open the Ranger web console at http://<YOUR-RANGER-HOST>:6080; the default admin account/password is admin/admin. After logging in, open the “Users/Groups/Roles” page first and check whether the example users on OpenLDAP have already been synchronized to Ranger, as follows:

In addition, log in to the master node of the EMR cluster and export the cluster id, because subsequent command lines need this variable.
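
The same check can be sketched from the command line against Ranger Admin’s REST interface. This is an illustration under several assumptions: it uses the xusers endpoint /service/xusers/users/userName/<name>, assumes that a missing user produces an HTTP error status, and reads the credentials from the hypothetical variables RANGER_ADMIN_USER and RANGER_ADMIN_PASSWORD, falling back to admin/admin.

```shell
# Illustrative sketch; function and variable names are hypothetical.

# Prints the REST URL used to look up one user on a Ranger Admin host.
ranger_user_url() {
    printf 'http://%s:6080/service/xusers/users/userName/%s\n' "$1" "$2"
}

# Returns 0 if Ranger Admin answers the lookup with a success status.
ranger_user_exists() {
    # -f makes curl return non-zero on HTTP error responses.
    curl -sf -u "${RANGER_ADMIN_USER:-admin}:${RANGER_ADMIN_PASSWORD:-admin}" \
        -H 'Accept: application/json' \
        "$(ranger_user_url "$1" "$2")" >/dev/null
}

# Example (run on the Ranger server):
# ranger_user_exists "$(hostname -f)" example-user-1 && echo "user is synced"
```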

# run on master node of emr cluster
export EMR_CLUSTER_ID='TO_BE_REPLACED'

The following is an example:

# run on master node of emr cluster
export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'

3.1 HDFS Access Control Verification

Usually, there is a set of predefined policies for the HDFS plugin after installation, as follows:

We do NOT configure any HDFS permissions for example-user-1, but if we log in to Hue with the account example-user-1, we will see that it can browse most directories and files on HDFS. This is because most directories and files have the a+w permission. Please keep in mind that HDFS r/w/x file mode attributes and Ranger-based permissions always take effect at the same time.

To verify that the HDFS plugin works, we test in “blacklist” mode. First, let’s create a directory named /ranger-test on HDFS and set example-user-1 as its owner:

# run on master node of emr cluster
sudo -u hdfs hdfs dfs -mkdir /ranger-test
sudo -u hdfs hdfs dfs -chown example-user-1:example-group /ranger-test
sudo -u hdfs hdfs dfs -chmod 700 /ranger-test

Next, let’s add a deny policy that prevents example-user-1 from reading and writing ranger-test:

Any policy change on the Ranger web console is synced to the agent side (EMR cluster nodes) within 30 seconds. We can run the following commands on the master node to check whether the local policy file has been updated:

# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat /etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json
    sleep 3
done
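
As a variation on the loop above, the following sketch waits until the modification time of a policy cache file actually changes instead of printing its status a fixed number of times. The function name and its defaults are assumptions made for this example; it relies on GNU stat, and on an EMR node it may need to run as root, since the loop above reads the file with sudo.

```shell
# Hypothetical helper, not part of the installer.
# Waits until the modification time of FILE changes.
# Returns 0 on change, 1 on timeout, 2 if the file cannot be read.
wait_for_policy_refresh() {
    file="$1"; timeout_s="${2:-60}"; interval="${3:-3}"
    before="$(stat -c %Y "$file")" || return 2   # mtime in seconds since epoch
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        sleep "$interval"
        elapsed=$((elapsed + interval))
        now="$(stat -c %Y "$file")" || return 2
        if [ "$now" != "$before" ]; then
            echo "policy cache refreshed: $file"
            return 0
        fi
    done
    echo "no change within ${timeout_s}s: $file" >&2
    return 1
}

# Example: start it, then save the policy change on the Ranger web console.
# wait_for_policy_refresh "/etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json"
```

Start the function before saving the policy change; otherwise the refresh may already have happened and the call will time out.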

Once the local policy file is up to date, the deny policy becomes effective. Then log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open “File Browser”, click the root directory “/”, then click the “ranger-test” folder; we will get an error message: “Cannot access:/ranger-test”:

Even though the current user example-user-1 is the owner of this folder, access is still blocked by the Ranger HDFS plugin, which shows that HDFS access control is managed by Ranger.

Finally, remember to REMOVE the “ranger-test” policy so that example-user-1 regains full privileges on this folder, because the following Hive verification will reuse it.

3.2 Hive Access Control Verification

Usually, there is a set of predefined policies for the Hive plugin after installation. To eliminate interference and keep the verification simple, let’s REMOVE them first:

Any policy change on the Ranger web console is synced to the agent side (EMR cluster nodes) within 30 seconds. We can run the following commands on the master node to check whether the local policy file has been updated:

# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat /etc/ranger/HIVE_${EMR_CLUSTER_ID}/policycache/hiveServer2_HIVE_${EMR_CLUSTER_ID}.json
    sleep 3
done

Once the local policy file is up to date, the removal of all policies becomes effective. Then log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open the Hive editor, and enter the following SQL to create a test table located in the /ranger-test directory created earlier:

-- run in hue hive editor
create table ranger_test (
    id bigint
)
row format delimited
stored as textfile
location '/ranger-test';

Then run it, and an error occurs:

The error shows that example-user-1 is blocked by database-related permissions, which proves that the Hive plugin is working. Now go back to Ranger and add a Hive policy named “all - database, table, column” as follows:

It grants example-user-1 all privileges on all databases, tables and columns. Check the policy file again on the master node with the previous command line; once it is updated, go back to Hue and re-run the SQL, and it will succeed as follows:

To double-check that example-user-1 has full read and write permissions on the table, we can run the following SQL:

insert into ranger_test(id) values(1);
insert into ranger_test(id) values(2);
insert into ranger_test(id) values(3);
select * from ranger_test;

The execution result is:

At this point, the Hive access control verification has passed.

4. Appendix

The following are the parameter specifications:

  • --region: the AWS region.
  • --access-key-id: the AWS access key id of your IAM account.
  • --secret-access-key: the AWS secret access key of your IAM account.
  • --ssh-key: the SSH private key file path.
  • --solution: the solution name, accepted values are ‘open-source’ or ‘emr-native’.
  • --auth-provider: the authentication provider, accepted values are ‘ad’ or ‘openldap’.
  • --openldap-host: the FQDN of the OpenLDAP host.
  • --openldap-base-dn: the base DN of OpenLDAP, for example: ‘dc=example,dc=com’, change it according to your env.
  • --openldap-root-cn: the cn of the root account, for example: ‘admin’, change it according to your env.
  • --openldap-root-password: the password of the root account, for example: ‘Admin1234!’, change it according to your env.
  • --ranger-bind-dn: the bind DN for Ranger, for example: ‘cn=ranger,ou=services,dc=example,dc=com’, this should be an existing DN on Windows AD / OpenLDAP, change it according to your env.
  • --ranger-bind-password: the password of the Ranger bind DN, for example: ‘Admin1234!’, change it according to your env.
  • --openldap-user-dn-pattern: the DN pattern for Ranger to search users on OpenLDAP, for example: ‘uid={0},ou=users,dc=example,dc=com’, change it according to your env.
  • --openldap-group-search-filter: the filter for Ranger to search groups on OpenLDAP, for example: ‘(member=uid={0},ou=users,dc=example,dc=com)’, change it according to your env.
  • --openldap-user-object-class: the user object class for Ranger to search users, for example: ‘inetOrgPerson’, change it according to your env.
  • --hue-bind-dn: the bind DN for Hue, for example: ‘cn=hue,ou=services,dc=example,dc=com’, this should be an existing DN on Windows AD / OpenLDAP, change it according to your env.
  • --hue-bind-password: the password of the Hue bind DN, for example: ‘Admin1234!’, change it according to your env.
  • --example-users: the example users to be created on OpenLDAP & Kerberos so as to demo Ranger’s features; this parameter is optional, if omitted, no example users will be created.
  • --sssd-bind-dn: the bind DN for SSSD, for example: ‘cn=sssd,ou=services,dc=example,dc=com’, this should be an existing DN on Windows AD / OpenLDAP, change it according to your env.
  • --sssd-bind-password: the password of the SSSD bind DN, for example: ‘Admin1234!’, change it according to your env.
  • --ranger-plugins: the Ranger plugins to be installed, comma separated for multiple values, for example: ‘open-source-hdfs,open-source-hive’, change it according to your env.
  • --skip-configure-hue: skip configuring Hue, accepted values are ‘true’ or ‘false’, default value is ‘false’.
  • --skip-migrate-kerberos-db: skip migrating the Kerberos database, accepted values are ‘true’ or ‘false’, default value is ‘false’.

Related Reading:

Apache Ranger and AWS EMR Automated Installation Series
Apache Ranger and AWS EMR Automated Installation Series (1): Solutions Overview
Apache Ranger and AWS EMR Automated Installation Series (2): OpenLDAP + EMR-Native Ranger
Apache Ranger and AWS EMR Automated Installation Series (3): Windows AD + EMR-Native Ranger
Apache Ranger and AWS EMR Automated Installation Series (4): OpenLDAP + Open-Source Ranger
Apache Ranger and AWS EMR Automated Installation Series (5): Windows AD + Open-Source Ranger
