Flume-Kafka-Storm-HBase Integration Demo
- flume: log collection component
- kafka: high-throughput message/stream transport component
- storm: real-time stream computation system
- hbase: column-oriented database with random, real-time reads and writes
A classic scenario: Flume collects data from different kinds of sources, the Flume Kafka sink delivers the data to a Kafka topic, Storm pulls the buffered data from that topic for processing, and the bolts write the results into HBase as part of the computation.
The demo uses four nodes, s1 - s4, with IPs 192.168.10.201 - 192.168.10.204.
I. Prepare the Flume configuration file
Here I use a spooling-directory source, which monitors a directory for new files.
agent.sources = r1
agent.channels = c1
agent.sinks = k1

# source: spooling directory
agent.sources.r1.type = spooldir
# directory to monitor
agent.sources.r1.spoolDir = /home/centos/soft/flume/flumeSpool
agent.sources.r1.fileHeader = true

# Kafka sink
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka topic name (auto-created if it was not created manually)
agent.sinks.k1.kafka.topic = test
agent.sinks.k1.kafka.bootstrap.servers = s1:9092
agent.sinks.k1.kafka.flumeBatchSize = 20
agent.sinks.k1.kafka.producer.acks = 1
agent.sinks.k1.kafka.producer.linger.ms = 1
agent.sinks.k1.kafka.producer.compression.type = snappy

# in-memory channel
agent.channels.c1.type = memory

# bind the source and the sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
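Before starting the agent you can also create the topic yourself and watch it with a console consumer to confirm that Flume is delivering events. A minimal sketch, assuming the Kafka 0.9 scripts are on the PATH and ZooKeeper runs on s1:2181 (partition and replication counts are just placeholders):

# create the topic used by the Flume Kafka sink (Kafka 0.9 still addresses ZooKeeper directly)
kafka-topics.sh --create --zookeeper s1:2181 --replication-factor 1 --partitions 1 --topic test

# tail the topic to verify that events arrive
kafka-console-consumer.sh --zookeeper s1:2181 --topic test --from-beginning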
II. Prepare the Storm code
1. Add the Maven dependencies
<dependencies>
    <!-- kafka -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>0.9.0.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
            <exclusion>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.9.0.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- storm-kafka -->
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka</artifactId>
        <version>1.2.2</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka-client</artifactId>
        <version>1.2.2</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- storm -->
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>1.2.2</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- hbase -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>2.1.3</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
2. App.java
package top.it1002.stormhbase;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.*;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class App {
    public static void main(String[] args) {
        // Kafka spout: read the "test" topic, tracking offsets through ZooKeeper
        String zkStr = "192.168.10.201:2181,192.168.10.202:2181,192.168.10.203:2181";
        BrokerHosts brokerHosts = new ZkHosts(zkStr);
        String topic = "test";
        SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, "/" + topic, UUID.randomUUID().toString());
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

        // topology: kafka-spout -> split-bolt -> hbase-bolt
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout);
        builder.setBolt("split-bolt", new SplitBolt()).shuffleGrouping("kafka-spout");
        builder.setBolt("hbase-bolt", new MyHBaseBolt()).shuffleGrouping("split-bolt");

        // run in an in-process local cluster for the demo
        LocalCluster cluster = new LocalCluster();
        Config config = new Config();
        Map<String, Object> hbaseConf = new HashMap<String, Object>();
        hbaseConf.put("hbase.rootdir", "hdfs://192.168.10.201:9000/hbase");
        hbaseConf.put("hbase.zookeeper.quorum", "192.168.10.201:2181,192.168.10.202:2181,192.168.10.203:2181");
        config.put("hbase.conf", hbaseConf);
        cluster.submitTopology("kafka-storm-hbase", config, builder.createTopology());
    }
}
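Because the topology is submitted to a LocalCluster, it can be run straight from the project directory. One way to do this (an assumption on my part; the exec plugin is not declared in the pom above, but Maven resolves it by its plugin prefix) is:

# run the demo topology in-process on a machine that can reach ZooKeeper, Kafka and HBase
mvn compile exec:java -Dexec.mainClass=top.it1002.stormhbase.App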
3. SplitBolt.java
package top.it1002.stormhbase;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;

/**
 * Splits each incoming line of the form "name age" into two fields.
 */
public class SplitBolt implements IRichBolt {
    private TopologyContext context;
    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.context = context;
        this.collector = collector;
    }

    public void execute(Tuple input) {
        String s = input.getString(0);
        String[] arr = s.split(" ");
        try {
            String name = arr[0];
            String age = arr[1];
            collector.emit(new Values(name, age));
        } catch (Exception e) {
            // silently drop malformed lines
        }
        // ack so the KafkaSpout does not replay the tuple
        collector.ack(input);
    }

    public void cleanup() {
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("name", "age"));
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
4. MyHBaseBolt.java
package top.it1002.stormhbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;

import java.util.Map;

/**
 * Writes each (name, age) tuple as a row into the HBase table it1002:logs.
 */
public class MyHBaseBolt implements IRichBolt {
    private TopologyContext context;
    private OutputCollector collector;
    private Connection conn;
    private Table tb;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.context = context;
        this.collector = collector;
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://192.168.10.201:9000/hbase");
        // ZooKeeper nodes used by HBase
        conf.set("hbase.zookeeper.quorum", "192.168.10.201:2181,192.168.10.202:2181");
        try {
            conn = ConnectionFactory.createConnection(conf);
            // the namespace and table must already exist
            TableName tableName = TableName.valueOf("it1002:logs");
            tb = conn.getTable(tableName);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void execute(Tuple input) {
        String name = input.getString(0);
        String age = input.getString(1);
        // row key "name:age"; both fields go into column family f1
        String rowkey = name + ":" + age;
        byte[] row = Bytes.toBytes(rowkey);
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("age"), Bytes.toBytes(age));
        try {
            tb.put(put);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // ack so the tuple is not replayed
        collector.ack(input);
    }

    public void cleanup() {
        // release the HBase table and connection when the bolt shuts down
        try {
            if (tb != null) {
                tb.close();
            }
            if (conn != null) {
                conn.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
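The bolt assumes the namespace it1002 and the table logs with column family f1 already exist. A minimal sketch of creating them from the HBase shell (run on any node where the hbase client is configured):

hbase shell <<'EOF'
create_namespace 'it1002'
create 'it1002:logs', 'f1'
EOF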
III. Start the cluster component processes
- Start Hadoop: hadoop/sbin/start-all.sh
- Start ZooKeeper: zkServer.sh start (run on every node that has ZooKeeper installed; at least two must be up)
- Start HBase: hbase/bin/start-hbase.sh
- Start Kafka: kafka-server-start.sh (on each broker, passing that broker's server.properties)
- Start Storm:
  nimbus: storm nimbus &
  supervisor: storm supervisor &
- Start Flume: flume/flume-ng agent -f flume/conf/file_memory_kafka.conf (the configuration file above) -n agent (the agent name used in the configuration file)
After everything is started, check with jps on each node that the expected processes are running.
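With all components up, a quick end-to-end check is to drop a file into the spooling directory and then look for the rows in HBase. A minimal sketch (the file name and sample records are placeholders; each line must match the "name age" format expected by SplitBolt):

# 1. feed Flume: the file must be complete before it lands in the spool directory
printf 'tom 20\njack 22\n' > /tmp/demo.log
mv /tmp/demo.log /home/centos/soft/flume/flumeSpool/

# 2. once the topology has processed the events, the rows appear in HBase
hbase shell <<'EOF'
scan 'it1002:logs'
EOF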