Big Data Software: Detailed Installation Guide
Hadoop (Apache): data storage and compute
Install the JDK
1. Remove any existing JDK
(1) Check whether Java is installed:
[atguigu@hadoop101 opt]$ rpm -qa | grep java
(2) If the installed version is below 1.7, uninstall it:
[atguigu@hadoop101 opt]$ sudo rpm -e <package-name>
(3) Check the JDK install path:
[atguigu@hadoop101 ~]$ which java
2. Upload and extract
[atguigu@hadoop101 software]$ tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
3. Configure the environment
[atguigu@hadoop101 software]$ sudo vi /etc/profile
profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
4. Verify
java -version
Install Hadoop
1. Upload and extract
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
2. Configure PATH
sudo vi /etc/profile
profile
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
3. Verify
hadoop version
Fully distributed cluster setup
1. VM preparation
Firewall
sudo service iptables stop
sudo chkconfig iptables off
Static IP
vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="eth0"
IPADDR=192.168.5.101
PREFIX=24
GATEWAY=192.168.5.2
DNS1=192.168.5.2
vim /etc/udev/rules.d/70-persistent-net.rules
Hostname
vim /etc/sysconfig/network
hosts file
vim /etc/hosts
192.168.1.100 hadoop100
192.168.1.101 hadoop101
192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104
192.168.1.105 hadoop105
192.168.1.106 hadoop106
192.168.1.107 hadoop107
192.168.1.108 hadoop108
192.168.1.109 hadoop109
Create a user
useradd atguigu
passwd atguigu
vim /etc/sudoers
root ALL=(ALL) ALL
atguigu ALL=(ALL) NOPASSWD:ALL
Create directories
mkdir /opt/module /opt/software
chown atguigu:atguigu /opt/module /opt/software
Create the distribution script (xsync)
cd ~
vi xsync
#!/bin/bash
#1 Get the number of arguments; exit if there are none
pcount=$#
if ((pcount==0)); then
echo no args;
exit;
fi
#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname
#3 Resolve the parent directory to an absolute path
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
#4 Get the current user
user=`whoami`
#5 Loop over the target hosts
for((host=103; host<105; host++)); do
echo ------------------- hadoop$host --------------
rsync -av $pdir/$fname $user@hadoop$host:$pdir
done
chmod +x xsync
sudo cp xsync /bin
sudo xsync /bin/xsync
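Usage example: once installed, a single call pushes a path to every host covered by the loop above, e.g.
xsync /opt/module/jdk1.8.0_144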
2. Configure SSH
ssh-keygen -t rsa
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104
ssh hadoop103
exit
ssh hadoop104
exit
xsync /home/atguigu/.ssh
3. Edit the configuration files
Environment scripts
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
core-site.xml
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:9000</value>
</property>
<!-- Storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
hdfs-site.xml
<!-- Number of data replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Host for the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:50090</value>
</property>
yarn-site.xml
<!-- Site specific YARN configuration properties -->
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
mapred-site.xml
Configuration
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop104:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop104:19888</value>
</property>
Start
Start the history server: mr-jobhistory-daemon.sh start historyserver
slaves
hadoop102
hadoop103
hadoop104
4. Start the cluster and test
Distribute
xsync /opt/module/hadoop-2.7.2/etc
Format
hdfs namenode -format
Start
start-dfs.sh
start-yarn.sh
If something goes wrong (see the recovery sketch below)
rm -rf data logs
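A hedged recovery sequence: stale NameNode metadata usually means wiping state on every node before reformatting (the paths assume the install layout above).
sbin/stop-dfs.sh
sbin/stop-yarn.sh
rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs   # run on every node
hdfs namenode -format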
LZO compression setup
Download the hadoop-lzo jar and place it in hadoop/share/hadoop/common
Sync it to every node
Add the following to core-site.xml and sync
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
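A quick smoke test, assuming an .lzo file already sits on HDFS and that the jar is named hadoop-lzo-0.4.20.jar (both are assumptions); DistributedLzoIndexer builds the split index LZO needs for parallel processing:
hadoop jar share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo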
Expanding Hadoop storage
Linux disk expansion and mounting
Create and format a new partition
fdisk /dev/sda
m        # show the help menu
n        # add a new partition
p        # make it a primary partition
press Enter through the prompts   # but note the partition number
w        # write the partition table and apply
reboot   # restart
fdisk -l
mkfs.xfs /dev/sdaX   # X is the partition number
Create a mount point and mount the partition
mkdir /newdisk
Temporary mount
mount /dev/sdaX /newdisk
Permanent mount
vim /etc/fstab
/dev/sdaX /newdisk xfs defaults 0 0   # the fs type must match mkfs.xfs above
Grant ownership
chown -R atguigu:atguigu /newdisk
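Verify the mount took effect:
df -h /newdisk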
HDFS expansion
vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data,/newdisk</value>
</property>
Hadoop (CDH): data storage and compute
VM preparation
Install MySQL
Uninstall MySQL
Check whether MySQL is installed
rpm -qa | grep -i mysql
Check whether the MySQL service is running; stop it
sudo service mysql status
sudo service mysql stop
Remove the installed MySQL packages
sudo rpm -e MySQL-server-5.6.24-1.el6.x86_64
sudo rpm -e MySQL-client-5.6.24-1.el6.x86_64
Find and delete MySQL-related files
whereis mysql
sudo find / -name mysql
sudo rm -rf /var/lib/mysql
sudo rm -rf /usr/lib64/mysql
Install MySQL
Install and start the server
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret
service mysql status
service mysql start
Install the client
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p
mysql>SET PASSWORD=PASSWORD('000000');
mysql>exit
Configure the user table
mysql -uroot -p
show databases;
use mysql;
show tables;
desc user;
update user set host='%' where host='localhost';
delete from user where Host='hadoop101';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
flush privileges;
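A standard sanity check that only the '%' host entry remains:
select user, host from user;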
Create the distribution script (xsync)
cd ~
vi xsync
#!/bin/bash
#1 Get the number of arguments; exit if there are none
pcount=$#
if ((pcount==0)); then
echo no args;
exit;
fi
#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname
#3 Resolve the parent directory to an absolute path
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
#4 Get the current user
user=`whoami`
#5 Loop over the target hosts
for((host=103; host<105; host++)); do
echo ------------------- hadoop$host --------------
rsync -av $pdir/$fname $user@hadoop$host:$pdir
done
chmod +x xsync
sudo cp xsync /bin
sudo xsync /bin/xsync
Install the JDK
1. Remove any existing JDK
(1) Check whether Java is installed:
[atguigu@hadoop101 opt]$ rpm -qa | grep java
(2) If the installed version is below 1.7, uninstall it:
[atguigu@hadoop101 opt]$ sudo rpm -e <package-name>
(3) Check the JDK install path:
[atguigu@hadoop101 ~]$ which java
2. Upload and extract
[atguigu@hadoop101 software]$ tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
3. Configure the environment
[atguigu@hadoop101 software]$ sudo vi /etc/profile
profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
4. Verify
java -version
Disable SELinux
setenforce 0   # temporary
vim /etc/selinux/config
SELINUX=disabled
xsync /etc/selinux/config
Install third-party dependencies
yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs redhat-lsb
Install CM (Cloudera Manager)
tar -zxvf /opt/software/cloudera-manager-el6-cm5.12.1_x86_64.tar.gz -C /opt/module/cm/
Create the user on every node
useradd \
--system \
--home=/opt/module/cm/cm-5.12.1/run/cloudera-scm-server \
--no-create-home \
--shell=/bin/false \
--comment "Cloudera SCM User" cloudera-scm
Edit the agent config
vim /opt/module/cm/cm-5.12.1/etc/cloudera-scm-agent/config.ini
server_host=hadoop102
Configure the database
mkdir /usr/share/java/
tar -zxvf mysql-connector-java-5.1.27.tar.gz
cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /usr/share/java/
mv /usr/share/java/mysql-connector-java-5.1.27-bin.jar /usr/share/java/mysql-connector-java.jar
Create the CM database and distribute
/opt/module/cm/cm-5.12.1/share/cmf/schema/scm_prepare_database.sh mysql cm -hhadoop102 -uroot -p000000 --scm-host hadoop102 scm scm scm
xsync /opt/module/cm
Create the parcel-repo
mkdir -p /opt/cloudera/parcel-repo
chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
Copy these files into /opt/cloudera/parcel-repo
CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel
CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1
manifest.json
mv CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1 CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha
Create /opt/cloudera/parcels
mkdir -p /opt/cloudera/parcels
Change ownership
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
Distribute
xsync /opt/cloudera/
Start and stop the services
Start
Server node
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-server start
Agent nodes
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent start
Open the web UI
http://hadoop102:7180 (username and password: admin)
Stop
Agent nodes
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent stop
Server node
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-server stop
CM cluster deployment
Wizard-driven install
Hadoop (HDP): data storage and compute
VM preparation (needed on all three nodes)
Disable the firewall
chkconfig iptables off
service iptables stop
chkconfig --list iptables
Disable SELinux
vim /etc/sysconfig/selinux
SELINUX=disabled
Install the JDK
Passwordless SSH (needed on all three nodes)
ssh-keygen -t rsa
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104
Switch the yum repository
vim /etc/resolv.conf
nameserver 223.5.5.5
nameserver 223.6.6.6
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bk
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
yum makecache
Install ntp
yum install -y ntp
chkconfig --list ntpd
chkconfig ntpd on
service ntpd start
Disable Linux transparent huge pages (THP)
vim /etc/grub.conf
transparent_hugepage=never
vim /etc/rc.local
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
exit 0
Verify
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]
Set UMASK
umask 0022
Disable PackageKit offline updates
vim /etc/yum/pluginconf.d/refresh-packagekit.conf
enabled=0
Install the Ambari cluster
Build a local repository
Configure the httpd service
chkconfig httpd on
service httpd start
Install tooling
yum install yum-utils createrepo yum-plugin-priorities -y
vim /etc/yum/pluginconf.d/priorities.conf
gpgcheck=0
Download and extract ambari-2.5.0.3 / HDP-2.6.0.3 / HDP-UTILS-1.1.0.21
tar -zxvf /opt/software/ambari-2.5.0.3-centos6.tar.gz -C /var/www/html/
mkdir /var/www/html/hdp
tar -zxvf /opt/software/HDP-2.6.0.3-centos6-rpm.tar.gz -C /var/www/html/hdp
tar -zxvf /opt/software/HDP-UTILS-1.1.0.21-centos6.tar.gz -C /var/www/html/hdp
Create the local repo
cd /var/www/html/
createrepo ./
Download the Ambari repository file onto the install host
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.1.5/ambari.repo -O /etc/yum.repos.d/ambari.repo
Edit the repo files
vim /etc/yum.repos.d/ambari.repo
#VERSION_NUMBER=2.6.1.5-3
[ambari-2.6.1.5]
name=ambari Version - ambari-2.6.1.5
baseurl=http://hadoop102/ambari/centos6/
gpgcheck=0
gpgkey=http://hadoop102/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
vim /var/www/html/hdp/hdp-util.repo
[HDP-UTILS-1.1.0.21]
name=Hortonworks Data Platform Version - HDP-UTILS-1.1.0.21
baseurl=http://hadoop102/hdp/
gpgcheck=0
enabled=1
priority=1
vim /var/www/html/hdp/HDP/centos6/hdp.repo
#VERSION_NUMBER=2.6.0.3-8
[HDP-2.6.0.3]
name=HDP Version - HDP-2.6.0.3
baseurl=http://hadoop102/hdp/HDP/centos6/
gpgcheck=0
gpgkey=http://hadoop102/hdp/HDP/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://hadoop102/hdp/
gpgcheck=0
gpgkey=http://hadoop102/hdp/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
yum clean all
yum makecache
Verify
http://hadoop102/ambari/centos6/
http://hadoop102/hdp/HDP/centos6/
http://hadoop102/hdp/
Install MySQL
Uninstall MySQL
Check whether MySQL is installed
rpm -qa | grep -i mysql
Check whether the MySQL service is running; stop it
sudo service mysql status
sudo service mysql stop
Remove the installed MySQL packages
sudo rpm -e MySQL-server-5.6.24-1.el6.x86_64
sudo rpm -e MySQL-client-5.6.24-1.el6.x86_64
rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64
Find and delete MySQL-related files
whereis mysql
sudo find / -name mysql
sudo rm -rf /var/lib/mysql
sudo rm -rf /usr/lib64/mysql
Install MySQL
Install and start the server
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret
service mysql status
service mysql start
Install the client
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p
mysql>SET PASSWORD=PASSWORD('000000');
mysql>exit
Configure the user table
mysql -uroot -p
show databases;
use mysql;
show tables;
desc user;
update user set host='%' where host='localhost';
delete from user where Host='hadoop101';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
flush privileges;
Install Ambari
Install ambari-server
yum install ambari-server
Copy the MySQL driver
mkdir /usr/share/java
cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /usr/share/java/mysql-connector-java.jar
cp /usr/share/java/mysql-connector-java.jar /var/lib/ambari-server/resources/mysql-jdbc-driver.jar
vim /etc/ambari-server/conf/ambari.properties
server.jdbc.driver.path=/usr/share/java/mysql-connector-java.jar
Create the database in MySQL
create database ambari;
use ambari;
source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql;
grant all privileges on *.* to 'root'@'%' identified by '000000';
flush privileges;
Configure Ambari
ambari-server setup
y
3
/opt/module/jdk1.8.0_144
y
3
y
Start Ambari
ambari-server start
ambari-server stop
HDP cluster deployment
Cluster setup
http://hadoop102:8080/
Launch Install Wizard
Local repo URLs
http://hadoop158/hdp/HDP/centos6/
http://hadoop158/hdp/
id_rsa
Target hosts
hadoop158
hadoop161
hadoop162
The id_rsa file is a hidden file under /root/.ssh
Install Hive (JDBC driver setup)
mkdir -p /path/to/mysql/
cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /path/to/mysql/mysql-connector-java.jar
ambari-server setup --jdbc-db=mysql --jdbc-driver=/path/to/mysql/mysql-connector-java.jar
Install Hive
Create the Hive database
create database hive;
Add the service and follow the wizard
Configure HDFS HA
Add the service
nameservice
Ranger
Add the Ambari Infra service
Install Ranger
Create and grant a MySQL user
mysql -uroot -p
create database ranger;
CREATE USER 'rangerdba'@'localhost' IDENTIFIED BY 'rangerdba';
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'localhost';
CREATE USER 'rangerdba'@'%' IDENTIFIED BY 'rangerdba';
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'localhost' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
Configure the JDBC connection
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
Add the service and fill in the configuration
Ranger DB host
Ranger DB username
Ranger DB password
Database Administrator (DBA) password
Ranger Audit
Audit to Solr
SolrCloud
Audit to HDFS
Enable the plugins
Configs
Ranger Plugin
HDFS Ranger Plugin
YARN Ranger Plugin
Using Ranger
Open the web UI
Username and password are both admin
Add an access policy
Verify it takes effect
su - atguigu
hadoop fs -ls /user
hadoop fs -mkdir /user/test
Zookeeper: distributed coordination
Zookeeper
Extract and distribute
tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/
xsync zookeeper-3.4.10/
Configure the server id
mkdir -p zkData
touch myid
vi myid
2
xsync myid
Configure zoo.cfg (in conf/)
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/opt/module/zookeeper-3.4.10/zkData
#######################cluster##########################
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888
xsync zoo.cfg
server.A=B:C:D   # A = server id (matches myid), B = hostname, C = follower-to-leader port, D = leader-election port
Start
bin/zkServer.sh start
bin/zkServer.sh status
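zkServer.sh must run on every node; a small helper loop, assuming the same install path on all three hosts and passwordless SSH:
for host in hadoop102 hadoop103 hadoop104; do
ssh $host "/opt/module/zookeeper-3.4.10/bin/zkServer.sh start"
done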
HDFS HA
Create the directory and copy hadoop
mkdir /opt/ha
cp -r hadoop-2.7.2/ /opt/ha/
Configure hadoop
Environment
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
core-site.xml
<configuration>
<!-- Combine the two NameNode addresses into one logical cluster, mycluster -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Storage directory for files hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/ha/hadoop-2.7.2/data/tmp</value>
</property>
<!-- Automatic failover -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- Nameservice id for the fully distributed cluster -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- The NameNodes in the cluster -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop102:9000</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop103:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop102:50070</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop103:50070</value>
</property>
<!-- Where NameNode edits are stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Fencing: only one NameNode may serve at a time -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/atguigu/.ssh/id_rsa</value>
</property>
<!-- JournalNode storage directory -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/ha/hadoop-2.7.2/data/jn</value>
</property>
<!-- Disable permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- Failover proxy provider the client uses to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
xsync /opt/ha
Start
sbin/hadoop-daemon.sh start journalnode
Format and start NN1
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
Bootstrap NN2 from NN1 and start it
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
Start the DataNodes and switch nn1 to Active
sbin/hadoop-daemons.sh start datanode
bin/hdfs haadmin -transitionToActive nn1
bin/hdfs haadmin -getServiceState nn1
After configuring automatic failover, restart as follows
sbin/stop-dfs.sh
bin/zkServer.sh start
bin/hdfs zkfc -formatZK
sbin/start-dfs.sh
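To confirm automatic failover actually works, one common test is to kill the active NameNode and watch the standby take over (the pid lookup via jps is an assumption about your process list):
jps                                      # note the NameNode pid on the active node
kill -9 <NameNode-pid>
bin/hdfs haadmin -getServiceState nn2    # should now report active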
YARN HA
Configure yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Declare the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster-yarn1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop102</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop103</value>
</property>
<!-- Zookeeper quorum address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state in the Zookeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
Start HDFS
sbin/hadoop-daemon.sh start journalnode
Format NN1 and start it
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
Bootstrap NN2 and start it
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
Start all DataNodes
sbin/hadoop-daemons.sh start datanode
bin/hdfs haadmin -transitionToActive nn1
Start YARN
sbin/start-yarn.sh
sbin/yarn-daemon.sh start resourcemanager
bin/yarn rmadmin -getServiceState rm1
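As with HDFS HA, a hedged failover check: stop the active ResourceManager and confirm the other one reports active:
sbin/yarn-daemon.sh stop resourcemanager   # on the active RM host
bin/yarn rmadmin -getServiceState rm2      # should now report active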
Hive: querying
Install and configure Hive
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/module/
mv apache-hive-1.2.1-bin/ hive
mv hive-env.sh.template hive-env.sh
Configure hive-env.sh
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export HIVE_CONF_DIR=/opt/module/hive/conf
Hadoop cluster prerequisites
HDFS and YARN must be running
sbin/start-dfs.sh
sbin/start-yarn.sh
Create /tmp and /user/hive/warehouse on HDFS and set group write permission
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir -p /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
Install MySQL
Uninstall MySQL
Check whether MySQL is installed
rpm -qa | grep -i mysql
Check whether the MySQL service is running; stop it
sudo service mysql status
sudo service mysql stop
Remove the installed MySQL packages
sudo rpm -e MySQL-server-5.6.24-1.el6.x86_64
sudo rpm -e MySQL-client-5.6.24-1.el6.x86_64
Find and delete MySQL-related files
whereis mysql
sudo find / -name mysql
sudo rm -rf /var/lib/mysql
sudo rm -rf /usr/lib64/mysql
Install MySQL
Install and start the server
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret
service mysql status
service mysql start
Install the client
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p
mysql>SET PASSWORD=PASSWORD('000000');
mysql>exit
Configure the user table
mysql -uroot -p
show databases;
use mysql;
show tables;
desc user;
update user set host='%' where host='localhost';
delete from user where Host='hadoop102';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
flush privileges;
Store the Hive metastore in MySQL
Copy the driver
tar -zxvf mysql-connector-java-5.1.27.tar.gz
cp mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
Point the metastore at MySQL
touch /opt/module/hive/conf/hive-site.xml
vi hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>000000</value>
<description>password to use against metastore database</description>
</property>
</configuration>
Warehouse configuration
The default warehouse location on HDFS is /user/hive/warehouse.
No directory is created under the warehouse for the default database; tables in default get their directories created directly under the warehouse path.
hive-site.xml
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
bin/hdfs dfs -chmod g+w /user/hive/warehouse
Query result display settings
hive-site.xml
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
Logging configuration
By default Hive writes its log to /tmp/atguigu/hive.log
Move the Hive log to /opt/module/hive/logs
mv hive-log4j.properties.template hive-log4j.properties
vim hive-log4j.properties
hive.log.dir=/opt/module/hive/logs
Configure the Tez engine for Hive
hive-env.sh
hive-site.xml
Configure Tez
/opt/module/hive/conf
tez-site.xml
Upload Tez to the cluster
hadoop fs -mkdir /tez
hadoop fs -put /opt/module/tez-0.9.1/ /tez
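The outline names tez-site.xml but not its contents; a minimal sketch, assuming Tez was uploaded to /tez as above (tez.lib.uris and hive.execution.engine are standard properties; the exact paths are assumptions):
<!-- tez-site.xml -->
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<!-- hive-site.xml: make Tez the execution engine -->
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>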
If Tez containers are killed by the NodeManager (see the sketch below)
Option 1
yarn-site.xml
Option 2
mapred-site.xml
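Hedged sketches of the two options (both are standard YARN/MapReduce properties; the values are examples):
<!-- Option 1, yarn-site.xml: disable the virtual-memory check -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- Option 2, mapred-site.xml: raise the container memory instead -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>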
Enable compression of map output
In the Hive CLI
set hive.exec.compress.intermediate=true;
set mapreduce.map.output.compress=true;
set mapreduce.map.output.compress.codec= org.apache.hadoop.io.compress.SnappyCodec;
Enable compression of the final output
In the Hive CLI
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
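A hypothetical query to exercise the settings (the emp table is an assumption); the exported files should come out Snappy-compressed:
insert overwrite local directory '/opt/module/data/compress-result' select * from emp;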
Flume: log collection
Extract, install, configure
tar -zxf apache-flume-1.7.0-bin.tar.gz -C /opt/module/
mv apache-flume-1.7.0-bin flume
mv flume-env.sh.template flume-env.sh
vi flume-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
To write to HDFS, the required Hadoop jars must be placed in /opt/module/flume/lib
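The outline stops at installation; a minimal example agent that tails a file into HDFS (file and agent names are assumptions; the source, channel, and sink types are standard Flume components):
# example.conf: a1 = exec source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/data/app.log
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# start the agent
bin/flume-ng agent --conf conf --conf-file example.conf --name a1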
Kafka: message queue for real-time processing
Install Kafka
tar -zxvf kafka_2.11-0.11.0.0.tgz -C /opt/module/
mv kafka_2.11-0.11.0.0/ kafka
Configuration file
mkdir logs
cd config/
vi server.properties
# Globally unique broker id; must differ on every node
broker.id=0
# Enable topic deletion
delete.topic.enable=true
# Number of threads handling network requests
num.network.threads=3
# Number of threads handling disk I/O
num.io.threads=8
# Socket send buffer size
socket.send.buffer.bytes=102400
# Socket receive buffer size
socket.receive.buffer.bytes=102400
# Maximum size of a socket request
socket.request.max.bytes=104857600
# Directory where Kafka stores its log segments
log.dirs=/opt/module/kafka/logs
# Default number of partitions per topic on this broker
num.partitions=1
# Threads per data dir for log recovery and cleanup
num.recovery.threads.per.data.dir=1
# Maximum time a segment file is retained before deletion
log.retention.hours=168
# Zookeeper connection string
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181
Configure environment variables
sudo vi /etc/profile
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
source /etc/profile
xsync kafka/
On the other two nodes, set broker.id=1 and broker.id=2 in /opt/module/kafka/config/server.properties
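After distributing, start a broker on each node and run a quick produce/consume test (the topic name first is an example; the commands match the 0.11 scripts):
bin/kafka-server-start.sh -daemon config/server.properties   # on every node
bin/kafka-topics.sh --zookeeper hadoop102:2181 --create --replication-factor 3 --partitions 1 --topic first
bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
bin/kafka-console-consumer.sh --zookeeper hadoop102:2181 --from-beginning --topic first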
Kafka Monitor
Download the package and extract it to /opt/module/kafka-offset-console on the cluster
Create a start.sh launch script
#!/bin/bash
java -cp KafkaOffsetMonitor-assembly-0.4.6-SNAPSHOT.jar \
com.quantifind.kafka.offsetapp.OffsetGetterWeb \
--offsetStorage kafka \
--kafkaBrokers hadoop102:9092,hadoop103:9092,hadoop104:9092 \
--kafkaSecurityProtocol PLAINTEXT \
--zk hadoop102:2181,hadoop103:2181,hadoop104:2181 \
--port 8086 \
--refresh 10.seconds \
--retain 2.days \
--dbName offsetapp_kafka &
Create the mobile-logs directory
Start Zookeeper and Kafka, then start Kafka Monitor
Kafka Manager
Download and extract
Edit the conf file
application.conf
kafka-manager.zkhosts="hadoop102:2181,hadoop103:2181,hadoop104:2181"
Configure JMX in the startup script (see the sketch below)
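One common way to expose the JMX metrics Kafka Manager reads is to export JMX_PORT before the broker starts (JMX_PORT is honored by Kafka's run scripts; the port number is an example):
vim bin/kafka-server-start.sh
export JMX_PORT=9988   # add near the top, then restart the brokers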
HBase: NoSQL database
Start Zookeeper
Start HDFS
Extract
tar -zxvf hbase-1.3.1-bin.tar.gz -C /opt/module
Configuration files
hbase-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
export HBASE_MANAGES_ZK=false
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop102:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Changed after 0.98; earlier versions had no .port property and defaulted to 60000 -->
<property>
<name>hbase.master.port</name>
<value>16000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop102,hadoop103,hadoop104</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/module/zookeeper-3.4.10/zkData</value>
</property>
</configuration>
regionservers
hadoop102
hadoop103
hadoop104
Distribute
xsync hbase/
Start
bin/start-hbase.sh
http://hadoop102:16010
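A quick smoke test from the HBase shell (table and column-family names are examples):
bin/hbase shell
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'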
Sqoop: data transfer
Extract
tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
Edit the config file
Rename
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export ZOOCFGDIR=/opt/module/zookeeper-3.4.10/conf
export HBASE_HOME=/opt/module/hbase
Copy the JDBC driver
cp mysql-connector-java-5.1.27-bin.jar /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
Verify the connection
bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password 000000
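A sample import to confirm the setup end to end (the company database and staff table are hypothetical):
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/company --username root --password 000000 --table staff --target-dir /user/company --num-mappers 1 --fields-terminated-by "\t"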
Oozie: job scheduling (CDH build)
Install and deploy hadoop (CDH build)
Extract hadoop-2.5.0-cdh5.3.6.tar.gz
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/module/cdh
Edit the configuration files
Environment scripts
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
core-site.xml
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<!-- Storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
</property>
<!-- Hosts from which the Oozie server's proxy user may connect -->
<property>
<name>hadoop.proxyuser.atguigu.hosts</name>
<value>*</value>
</property>
<!-- Groups that may be proxied by Oozie -->
<property>
<name>hadoop.proxyuser.atguigu.groups</name>
<value>*</value>
</property>
hdfs-site.xml
<!-- Number of data replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Host for the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:50090</value>
</property>
yarn-site.xml
<!-- Site specific YARN configuration properties -->
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<!-- Job history log server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs/</value>
</property>
mapred-site.xml
Configuration
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop104:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop104:19888</value>
</property>
Start
Start the history server: mr-jobhistory-daemon.sh start historyserver
slaves
hadoop102
hadoop103
hadoop104
Sync
xsync cdh
Format and start the cluster
hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
Deploy Oozie
Extract and copy
Extract Oozie
tar -zxvf /opt/software/cdh/oozie-4.0.0-cdh5.3.6.tar.gz -C /opt/module
Extract oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz
tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../
Create libext
mkdir libext/
Copy the dependency jars
cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
cp -a /opt/software/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./libext/
Copy ext-2.2
cp -a /opt/software/cdh/ext-2.2.zip libext/
Edit the Oozie config
vi oozie-site.xml
Property: oozie.service.JPAService.jdbc.driver
Value: com.mysql.jdbc.Driver
Meaning: the JDBC driver
Property: oozie.service.JPAService.jdbc.url
Value: jdbc:mysql://hadoop102:3306/oozie
Meaning: the database Oozie uses
Property: oozie.service.JPAService.jdbc.username
Value: root
Meaning: database username
Property: oozie.service.JPAService.jdbc.password
Value: 000000
Meaning: database password
Property: oozie.service.HadoopAccessorService.hadoop.configurations
Value: *=/opt/module/CDH/hadoop-2.5.0-cdh5.3.6/etc/hadoop
Meaning: points Oozie at Hadoop's configuration files
Create the Oozie database in MySQL
create database oozie;
Initialize Oozie
Upload the yarn.tar.gz from the Oozie directory to HDFS
bin/oozie-setup.sh sharelib create -fs hdfs://hadoop131:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
Create the oozie.sql file
bin/ooziedb.sh create -sqlfile oozie.sql -run
Package the project into a war
bin/oozie-setup.sh prepare-war
Start
bin/oozied.sh start
Web UI
http://hadoop131:11000/oozie
Azkaban: job scheduling
Install Azkaban
mkdir /opt/module/azkaban
Extract the server, executor, and sql-script archives into that directory
Rename the extracted directories
mv azkaban-web-2.5.0/ server
mv azkaban-executor-2.5.0/ executor
Create the database and import the schema script
mysql -uroot -p000000
mysql> create database azkaban;
mysql> use azkaban;
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
Generate the keystore and certificate
keytool -keystore keystore -alias jetty -genkey -keyalg RSA
mv keystore /opt/module/azkaban/server/
Time sync setup (skip if the three machines are already in sync)
tzselect
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
sudo date -s '2018-10-18 16:39:30'
Config files
Web server config
vim /opt/module/azkaban/server/conf/azkaban.properties
vim /opt/module/azkaban/server/conf/azkaban-users.xml
Executor server config
vim /opt/module/azkaban/executor/conf/azkaban.properties
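The keys that usually need editing in the web server's azkaban.properties, as a hedged sketch (the property names are standard for Azkaban 2.5; the values are examples for this cluster):
default.timezone.id=Asia/Shanghai
database.type=mysql
mysql.host=hadoop102
mysql.database=azkaban
mysql.user=root
mysql.password=000000
jetty.keystore=keystore
jetty.password=000000
jetty.keypassword=000000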
Kettle: ETL tool
Portable install: extract and use
Kettle cluster setup
Start hadoop
Upload and extract the Kettle package
Config files
/opt/module/data-integration/pwd
vim carte-config-master-8080.xml
<slaveserver>
<name>master</name>
<hostname>hadoop102</hostname>
<port>8080</port>
<master>Y</master>
<username>cluster</username>
<password>cluster</password>
</slaveserver>
carte-config-8081.xml
<masters>
<slaveserver>
<name>master</name>
<hostname>hadoop102</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
</slaveserver>
</masters>
<report_to_masters>Y</report_to_masters>
<slaveserver>
<name>slave1</name>
<hostname>hadoop103</hostname>
<port>8081</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
</slaveserver>
carte-config-8082.xml
<masters>
<slaveserver>
<name>master</name>
<hostname>hadoop102</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
</slaveserver>
</masters>
<report_to_masters>Y</report_to_masters>
<slaveserver>
<name>slave2</name>
<hostname>hadoop104</hostname>
<port>8082</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
</slaveserver>
Distribute
xsync data-integration
Start
./carte.sh hadoop102 8080
./carte.sh hadoop103 8081
./carte.sh hadoop104 8082
Open the web UI
http://hadoop102:8080
ClickHouse: columnar database for online analytics
Pre-install preparation
Raise the CentOS open-file limits
vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
vim /etc/security/limits.d/90-nproc.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
Disable SELinux on CentOS
vim /etc/selinux/config
SELINUX=disabled
Disable the firewall
service iptables stop
Install dependencies
yum install -y libtool
yum install -y *unixODBC*
yum install libicu.x86_64
Reboot
Single-node mode
Upload the files to /opt/software/ck
Install the four rpm packages
rpm -ivh clickhouse-common-static-19.7.3.9-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-common-19.7.3.9-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-19.7.3.9-1.el6.x86_64.rpm
rpm -ivh clickhouse-client-19.7.3.9-1.el6.x86_64.rpm
Start the ClickHouse server and connect with the client
service clickhouse-server start
clickhouse-client
Distributed cluster install
Sync the CentOS configs
xsync /etc/security/limits.conf
xsync /etc/security/limits.d/90-nproc.conf
xsync /etc/selinux/config
Edit config.xml on each of the three machines (cannot simply be distributed)
vim /etc/clickhouse-server/config.xml
<listen_host>::</listen_host>
Create metrika.xml on all three machines
vim /etc/metrika.xml
<yandex>
<clickhouse_remote_servers>
<clickhouse_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>hadoop131</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>hadoop145</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>hadoop146</host>
<port>9000</port>
</replica>
</shard>
</clickhouse_cluster>
</clickhouse_remote_servers>
<zookeeper-servers>
<node index="1">
<host>hadoop131</host>
<port>2181</port>
</node>
<node index="2">
<host>hadoop145</host>
<port>2181</port>
</node>
<node index="3">
<host>hadoop146</host>
<port>2181</port>
</node>
</zookeeper-servers>
<macros>
<replica>hadoop131</replica> <!-- set to each machine's own hostname -->
</macros>
<networks>
<ip>::/0</ip>
</networks>
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
</yandex>
Start
Start Zookeeper first
Start the service
select * from system.clusters
How to uninstall
rpm -qa | grep clickhouse
rpm -e <package-name>
rpm -e clickhouse-client-19.7.3.9-1.el6.x86_64
rpm -e clickhouse-server-19.7.3.9-1.el6.x86_64
rpm -e clickhouse-common-static-19.7.3.9-1.el6.x86_64
rpm -e clickhouse-server-common-19.7.3.9-1.el6.x86_64
DataX: offline sync tool for heterogeneous data sources
Prerequisites
Linux
JDK 1.8
Python 2.6.x
Extract and use
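DataX ships with a stream-to-stream self-check job; running it verifies the environment:
python bin/datax.py job/job.json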
MongoDB: database
Upload the tarball to the VM and extract it
Rename
mv mongodb-linux-x86_64-4.0.10/ mongodb
Create the data directory
sudo mkdir -p /data/db
sudo chmod 777 -R /data/db/
Start the service
bin/mongod
bin/mongo
docker
Install
docker search mongo
docker pull mongo
docker images mongo
docker run -p 27017:27017 -v $PWD/db:/data/db -d mongo:latest
docker exec -it <container_id> /bin/bash
mongod
mongo
Elasticsearch and Kibana
Elasticsearch
Pull and run the image
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.2.0
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.2.0
docker ps
Enter the Elasticsearch container
docker exec -it <container_id> /bin/bash
Test that it is running
curl http://localhost:9200
http://localhost:9200
Kibana
Pull and run the image
docker pull docker.elastic.co/kibana/kibana:7.2.0
docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.2.0
Run Kibana
bin/kibana
docker exec -it <container_id> /bin/bash
Test that it is running
http://localhost:5601
Presto: ad-hoc querying
Install Presto Server
Download, extract, and rename the directory to presto
Create data and etc directories under presto
In the etc directory
vim jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
mkdir catalog
vim hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop131:9083
Distribute
In the etc directory
vim node.properties
hadoop131
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/opt/module/presto/data
hadoop145
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffffe
node.data-dir=/opt/module/presto/data
hadoop146
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffffd
node.data-dir=/opt/module/presto/data
vim config.properties
hadoop131
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery-server.enabled=true
discovery.uri=http://hadoop131:8881
hadoop145
coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery.uri=http://hadoop131:8881
hadoop146
coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery.uri=http://hadoop131:8881
Start the Hive Metastore
nohup bin/hive --service metastore >/dev/null 2>&1 &
Start Presto Server in the background
bin/launcher start
Install the Presto CLI client
Upload it into the presto directory, make it executable, and rename it prestocli
Start
./prestocli --server hadoop131:8881 --catalog hive --schema default
Test query
select * from schema.table limit 100;
Install the Presto web client (yanagishima)
Download and extract
In the conf directory
vim yanagishima.properties
jetty.port=7080
presto.datasources=atguigu-presto
presto.coordinator.server.atguigu-presto=http://hadoop131:8881
catalog.atguigu-presto=hive
schema.atguigu-presto=default
sql.query.engines=presto
Start in the background
nohup bin/yanagishima-start.sh >y.log 2>&1 &
Open the page
http://hadoop131:7080
Druid: ad-hoc querying
Kylin: ad-hoc querying
Extract and use
Requires the following in /etc/profile
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
#HBASE_HOME
export HBASE_HOME=/opt/module/hbase-1.3.1
export PATH=$PATH:$HBASE_HOME/bin
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
Before use, the following must already be running
hadoop
historyserver
zookeeper
hbase
Spark
Local mode
Download, extract, and use
Standalone mode
Download and extract
Edit the config files
cd conf
mv slaves.template slaves
vim slaves
hadoop131
hadoop145
hadoop146
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
SPARK_MASTER_HOST=hadoop131
SPARK_MASTER_PORT=7077
JobHistoryServer setup
mv spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop131:9000/directory
Create the directory on HDFS in advance
hadoop fs -mkdir /directory
vi spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://hadoop131:9000/directory"
HA configuration
vi spark-env.sh
Comment out:
#SPARK_MASTER_HOST=hadoop131
#SPARK_MASTER_PORT=7077
Add:
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hadoop131,hadoop145,hadoop146
-Dspark.deploy.zookeeper.dir=/spark"
Distribute
xsync spark/
Start
sbin/start-all.sh
hadoop131:8080
sbin/start-master.sh
If you hit a 'JAVA_HOME not set' error
vim spark-config.sh
export JAVA_HOME=XXXX
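A standard smoke test against the standalone master (the examples jar name depends on your Spark version, so it is an assumption here):
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop131:7077 \
examples/jars/spark-examples_2.11-2.1.1.jar \
100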
YARN mode
Edit hadoop's yarn-site.xml
vi yarn-site.xml
<!-- Whether to enforce physical-memory limits on tasks and kill those that exceed their allocation; default true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to enforce virtual-memory limits on tasks and kill those that exceed their allocation; default true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Edit spark-env.sh
vi spark-env.sh
YARN_CONF_DIR=/opt/module/hadoop-2.7.2/etc/hadoop
Log viewing
vim spark-defaults.conf
spark.yarn.historyServer.address=hadoop131:18080
spark.history.ui.port=18080
Distribute
Start
sbin/start-history-server.sh
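The same job in YARN mode (the jar name is again version-dependent):
bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client examples/jars/spark-examples_2.11-2.1.1.jar 100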
Solr
Download, extract, and rename
Edit the config file
vim solr/bin/solr.in.sh
# add the following
ZK_HOST="hadoop102:2181,hadoop103:2181,hadoop104:2181"
SOLR_HOST="hadoop102"
# Sets the port Solr binds to, default is 8983
# the port can be changed
SOLR_PORT=8983
Adjust per host after syncing
xsync solr
vim solr/bin/solr.in.sh
SOLR_HOST="<local hostname>"   # set to each machine's own hostname
Start (see below)
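The start command is missing from the outline; with ZK_HOST set, starting Solr on each node joins it to the SolrCloud cluster:
bin/solr start   # run on every node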
Atlas
Install steps
Download, extract, rename
Integrate external frameworks
Integrate HBase
Edit the config file
vim atlas-application.properties
atlas.graph.storage.hostname=hadoop102:2181,hadoop103:2181,hadoop104:2181
Link in the HBase cluster config
ln -s /opt/module/hbase/conf/ /opt/module/atlas/conf/hbase/
Add the HBase path
vim atlas-env.sh
export HBASE_CONF_DIR=/opt/module/atlas/conf/hbase/conf
Integrate Solr
Edit the config file
Copy Atlas's bundled solr folder to every node of the external Solr cluster
cp -r /opt/module/atlas/conf/solr /opt/module/solr/
Rename the copied folder
cd into the solr directory
mv solr atlas_conf
Integrate Kafka
Integrate Hive
Other settings
Build Atlas