Flume

2021-04-12 09:01:25

Flume notes

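Overview

Flume is a highly available, highly reliable, distributed system, originally developed by Cloudera and now an Apache project, for collecting, aggregating, and transporting large volumes of log data.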

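How it works

The core role is the agent: a Java process that usually runs on the node where logs are collected. An agent has three components:

Source: the collection source, which connects to a data source and receives data from it

Sink: the destination to which the collected data is delivered

Channel: the transfer channel inside the agent, which passes data from the source to the sink

Data travels through the agent wrapped in events, the most basic unit of data transfer inside Flume. For a text file, one event is usually one line. The event is also the basic unit of a transaction.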
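In configuration terms, the channel's total size and its per-transaction size are set separately. A minimal sketch in Flume's properties format, using the agent name a1 and channel name c1 that the examples in this note also use (the values are illustrative, not recommendations):

```properties
# A memory channel: fast, but events still in the channel are lost if the agent process dies
a1.channels.c1.type = memory
# Maximum number of events the channel can hold at once
a1.channels.c1.capacity = 1000
# Maximum number of events a source puts, or a sink takes, in one transaction;
# Flume rejects a value larger than capacity
a1.channels.c1.transactionCapacity = 100
```

Within each transaction a batch of events is committed as a whole: if the sink fails to deliver its batch, the transaction rolls back and the events stay in the channel to be retried.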

[Figure: Flume collection system architecture]

Flume examples

Collecting data from a network port into a memory channel

# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink: write each event to the agent's log
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

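Run (from the Flume installation directory):
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Then connect with telnet localhost 44444 and type a few lines; each line shows up as an event in the agent's console output.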

Collecting files from a directory into HDFS

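# Name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the source: watch a spooling directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/apache/flumeSpool
# Add the absolute path of the source file to each event's headers
a1.sources.r1.fileHeader = true
# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
# Round the timestamp used in the path down to 10-minute buckets
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# The time escapes in hdfs.path need a timestamp; spooldir events carry none, so use local time
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
# Percentage of byteCapacity kept as headroom for event headers
a1.channels.c1.byteCapacityBufferPercentage = 20
# Maximum total bytes of event bodies held in the channel
a1.channels.c1.byteCapacity = 800000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1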

Run it the same way, pointing --conf-file at the file that holds this configuration:
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
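The spooling directory source expects every file to be complete and uniquely named once it lands in the directory: if a file is written to after being placed there, or a file name is reused, Flume logs an error and stops processing. So files are usually written elsewhere and then moved in. A minimal sketch of the related source options, shown with what are, to the best of my knowledge, Flume's default values (check them against the User Guide for your version):

```properties
# Suffix appended to a file's name once it has been fully ingested
a1.sources.r1.fileSuffix = .COMPLETED
# never = keep completed files (renamed with fileSuffix); immediate = delete them
a1.sources.r1.deletePolicy = never
# Regular expression for file names to skip; the default ^$ skips nothing
a1.sources.r1.ignorePattern = ^$
# Header key that holds the absolute file path when fileHeader = true
a1.sources.r1.fileHeaderKey = file
```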

Collecting data from a file into HDFS

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source: follow the file as it grows
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/logs/test.log
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Roll to a new file after 3 s, 20 bytes, or 5 events, whichever comes first
# (tiny values for demonstration; 0 disables a trigger)
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
# Number of events written to the file before it is flushed to HDFS
a1.sinks.k1.hdfs.batchSize = 1
# Use the agent's local time for the time escapes in hdfs.path
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Output file type: defaults to SequenceFile; DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Flume load-balance and failover

load-balance (load balancing)

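When starting a multi-tier Flume deployment, start the agents farthest from the data source first, so that each upstream agent's sink finds its downstream agent already listening.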
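A minimal sketch of a two-tier setup with load balancing, assuming an upstream agent a1 whose source and channel c1 are defined as in the earlier examples, and two downstream agents on hosts named node2 and node3 (placeholder names; the port 52020 is also an arbitrary choice). The upstream Avro sinks connect to the downstream Avro sources, which is why the downstream agents must be up first:

```properties
# Upstream agent a1: two Avro sinks grouped under a load-balancing sink processor
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# Temporarily stop sending to a sink that fails
a1.sinkgroups.g1.processor.backoff = true
# round_robin or random
a1.sinkgroups.g1.processor.selector = round_robin

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node2
a1.sinks.k1.port = 52020
a1.sinks.k1.channel = c1
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = node3
a1.sinks.k2.port = 52020
a1.sinks.k2.channel = c1

# Downstream agent a2 on node2 (node3 is configured the same way):
# an Avro source listening on the port the upstream sinks connect to
a2.sources = r1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 52020
a2.sources.r1.channels = c1
```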

failover (fault tolerance)
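A minimal sketch of a failover sink group, assuming k1 and k2 are two sinks already defined on agent a1 (for example, Avro sinks pointing at two downstream agents). All events go to the available sink with the highest priority; a failed sink is set aside and retried after a back-off period:

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# A larger value means a higher priority: k1 is the primary, k2 the standby
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper bound, in milliseconds, on the back-off period for a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000
```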

Flume interceptors: a hands-on example

Log collection and aggregation
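A minimal sketch of how Flume's built-in interceptors can tag events on the collection agents so that an aggregating agent can route them. The agent names a1/a2, the header value access, and the HDFS directory /source/logs are hypothetical; sources, channels, and the Avro hop between the tiers are assumed to be set up as in the earlier examples:

```properties
# On each collection agent: tag every event before it leaves the source
a1.sources.r1.interceptors = i1 i2 i3
# Adds a "timestamp" header (event time in ms), used by %Y%m%d in the HDFS path
a1.sources.r1.interceptors.i1.type = timestamp
# Adds the agent's IP address (host name if useIP = false) under the "host" header
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.hostHeader = host
# Adds a fixed header; "type" = "access" is an example key/value pair
a1.sources.r1.interceptors.i3.type = static
a1.sources.r1.interceptors.i3.key = type
a1.sources.r1.interceptors.i3.value = access

# On the aggregating agent: %{type} is replaced by the value of the "type" header
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = /source/logs/%{type}/%Y%m%d
a2.sinks.k1.hdfs.fileType = DataStream
```

With one source per log file, each source gets its own static value, and the aggregating agent writes each log type into its own directory.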

Custom Flume interceptors
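A custom interceptor is written in Java by implementing org.apache.flume.interceptor.Interceptor (initialize, intercept for a single Event and for a List of events, close) together with a nested Builder class, then packaged as a jar and placed on Flume's classpath, for example in its lib directory. A minimal sketch of how it would be wired into a source; com.example.flume.MyInterceptor and the fieldName key are hypothetical:

```properties
# A custom interceptor is referenced by the fully qualified name of its Builder
# (hypothetical class; $Builder denotes the nested Builder class)
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.flume.MyInterceptor$Builder
# Extra keys under i1 are handed to the Builder's configure(Context) method
a1.sources.r1.interceptors.i1.fieldName = userId
```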

Advanced Flume custom components

Custom Flume Source (extension)
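A custom source is typically a Java class that extends AbstractSource, implements Configurable, and implements either PollableSource (Flume calls its process() method repeatedly) or EventDrivenSource. Once its jar is on Flume's classpath, it is referenced by class name. A minimal sketch; com.example.flume.MySource and the prefix/delay keys are hypothetical:

```properties
# Hypothetical custom source class, referenced by its fully qualified name
a1.sources.r1.type = com.example.flume.MySource
# Custom keys, read by the source in its configure(Context) method
a1.sources.r1.prefix = log-
a1.sources.r1.delay = 1000
a1.sources.r1.channels = c1
```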

Custom Flume Sink (extension)
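A custom sink is typically a Java class that extends AbstractSink and implements Configurable. Its process() method opens a channel transaction, takes events, commits on success or rolls back on failure, and returns Status.READY or Status.BACKOFF. A minimal sketch of referencing it; com.example.flume.MySink and the suffix key are hypothetical:

```properties
# Hypothetical custom sink class, referenced by its fully qualified name
a1.sinks.k1.type = com.example.flume.MySink
# Custom key, read by the sink in its configure(Context) method
a1.sinks.k1.suffix = -done
a1.sinks.k1.channel = c1
```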