Flume-Kafka-Storm-HBase Integration Demo

This document describes in detail how to integrate Flume, Kafka, Storm, and HBase to collect, process, and store log data in real time. Flume monitors a file directory and sends the data to Kafka, Storm consumes the data from Kafka and performs real-time computation, and the results are finally stored in HBase. The document covers the configuration of each component, the code, and the startup steps.


  • Flume: log collection component
  • Kafka: high-throughput streaming message component
  • Storm: real-time stream computation system
  • HBase: column-oriented database with random, real-time read/write access

A classic application scenario: Flume collects data from different types of sources, the Flume sink forwards the data to a Kafka topic, Storm pulls the data buffered in that topic and processes it, and the results are written to HBase as part of the bolt computation.

This demo uses four nodes, s1 - s4, with the IP addresses 192.168.10.201 - 192.168.10.204.

I. Prepare the Flume configuration file

Here I use a spooling-directory source, which monitors a file directory for new files.

agent.sources = r1
agent.channels = c1
agent.sinks = k1

# source: spooling directory
agent.sources.r1.type = spooldir
# directory to monitor
agent.sources.r1.spoolDir = /home/centos/soft/flume/flumeSpool
agent.sources.r1.fileHeader = true

# Kafka sink
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka topic name (created automatically if it has not been created manually)
agent.sinks.k1.kafka.topic = test
agent.sinks.k1.kafka.bootstrap.servers = s1:9092
agent.sinks.k1.kafka.flumeBatchSize = 20
agent.sinks.k1.kafka.producer.acks = 1
agent.sinks.k1.kafka.producer.linger.ms = 1
agent.sinks.k1.kafka.producer.compression.type = snappy

# memory channel
agent.channels.c1.type = memory

# bind the source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
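
If you prefer to create the topic manually rather than rely on auto-creation, a command along these lines should work with the Kafka 0.9 command-line tools (the partition and replication values below are only an example):

kafka-topics.sh --create --zookeeper s1:2181 --replication-factor 1 --partitions 1 --topic test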

II. Prepare the Storm code

1. Add the Maven dependencies

    <dependencies>
        <!-- kafka -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>0.9.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.9.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- storm-kafka -->
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka</artifactId>
            <version>1.2.2</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka-client</artifactId>
            <version>1.2.2</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- storm -->
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.2.2</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- hbase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>2.1.3</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
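
Note that storm-core is left at the default compile scope here because the topology is run with LocalCluster inside the same JVM. If you later package the topology and submit it to a real Storm cluster, storm-core is usually marked as provided, roughly like this:

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.2.2</version>
            <scope>provided</scope>
        </dependency>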

2. App.java

package top.it1002.stormhbase;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.*;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class App {
    public static void main(String[] args){
        // prepare the Kafka spout (the old storm-kafka spout, which tracks offsets in ZooKeeper)
        String zkStr = "192.168.10.201:2181,192.168.10.202:2181,192.168.10.203:2181";
        BrokerHosts brokerHosts = new ZkHosts(zkStr);
        String topic = "test";
        SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, "/" + topic, UUID.randomUUID().toString());
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

        // wire the topology: Kafka spout -> split bolt -> HBase bolt
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout);
        builder.setBolt("split-bolt", new SplitBolt()).shuffleGrouping("kafka-spout");
        builder.setBolt("hbase-bolt", new MyHBaseBolt()).shuffleGrouping("split-bolt");

        // run the topology in local mode
        LocalCluster cluster = new LocalCluster();
        Config config = new Config();
        Map<String, Object> hbaseConf = new HashMap<String, Object>();
        hbaseConf.put("hbase.rootdir", "hdfs://192.168.10.201:9000/hbase");
        hbaseConf.put("hbase.zookeeper.quorum", "192.168.10.201:2181,192.168.10.202:2181,192.168.10.203:2181");
        config.put("hbase.conf", hbaseConf);
        cluster.submitTopology("kafka-storm-hbase", config, builder.createTopology());
    }
}
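
The demo submits the topology to a LocalCluster, so it runs inside the current JVM until the process is killed. To deploy the same topology to a real Storm cluster, you would package the project as a jar and submit it with StormSubmitter instead; a minimal sketch, reusing the builder and config created above:

        // replaces the LocalCluster block when deploying to a cluster; run the jar with `storm jar`
        try {
            org.apache.storm.StormSubmitter.submitTopology("kafka-storm-hbase", config, builder.createTopology());
        } catch (Exception e) {
            e.printStackTrace();
        }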

3. SplitBolt.java

package top.it1002.stormhbase;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;

public class SplitBolt implements IRichBolt {
    private TopologyContext context;
    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.context = context;
        this.collector = collector;
    }

    public void execute(Tuple input) {
        // each line is expected to look like "name age", separated by a single space
        String s = input.getString(0);
        String[] arr = s.split(" ");
        try {
            String name = arr[0];
            String age = arr[1];
            collector.emit(new Values(name, age));
        } catch (Exception e) {
            // malformed lines are skipped
        }
        // ack so the Kafka spout does not replay the tuple
        collector.ack(input);
    }

    public void cleanup() {

    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("name", "age"));
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

4. MyHBaseBolt.java

package top.it1002.stormhbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;

import java.util.Map;

public class MyHBaseBolt implements IRichBolt {
    private TopologyContext context;
    private OutputCollector collector;
    private Connection conn;
    private Table tb;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.context = context;
        this.collector = collector;

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://192.168.10.201:9000/hbase");
        // the node of zookeeper
        conf.set("hbase.zookeeper.quorum", "192.168.10.201:2181,192.168.10.202:2181");
        try {
            // one connection and table handle per bolt instance, reused for every tuple
            conn = ConnectionFactory.createConnection(conf);
            TableName tableName = TableName.valueOf("it1002:logs");
            tb = conn.getTable(tableName);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void execute(Tuple input) {
        String name = input.getString(0);
        String age = input.getString(1);
        // row key is "name:age"; both fields are also written to column family f1
        String rowkey = name + ":" + age;
        byte[] row = Bytes.toBytes(rowkey);
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("age"), Bytes.toBytes(age));
        try {
            tb.put(put);
            collector.ack(input);
        } catch (Exception e) {
            e.printStackTrace();
            collector.fail(input);
        }
    }

    public void cleanup() {
        // release HBase resources when the topology is shut down
        try {
            if (tb != null) {
                tb.close();
            }
            if (conn != null) {
                conn.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {

    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

III. Start the cluster component processes

  • Start Hadoop: hadoop/sbin/start-all.sh
  • Start ZooKeeper: zkServer.sh start (run on every machine where ZooKeeper is installed; at least two of the nodes must be up)
  • Start HBase: hbase/bin/start-hbase.sh
  • Start Kafka: kafka-server-start.sh config/server.properties (on each broker)
  • Start Storm:
    nimbus: storm nimbus &
    supervisor: storm supervisor &
  • Start Flume: flume/bin/flume-ng agent -f flume/conf/file_memory_kafka.conf (the Flume configuration file) -n agent (the agent name used in the configuration file)
    After startup the process list looks like the following:
    (screenshot of the process list omitted)
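
Once the Flume agent is running, you can verify that data actually reaches the Kafka topic with the console consumer that ships with Kafka 0.9 (the ZooKeeper address is the one used throughout this demo):

kafka-console-consumer.sh --zookeeper s1:2181 --topic test --from-beginning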

IV. Run the program (the App.java written in Step II)

V. Create some test content and move it into the directory monitored by the Flume configuration

(screenshot of the test file content omitted)
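
For example (the file name and values here are made up purely for illustration), each line must follow the "name age" format that SplitBolt expects:

echo "tom 20" >> /home/centos/test.log
echo "jack 25" >> /home/centos/test.log
mv /home/centos/test.log /home/centos/soft/flume/flumeSpool/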

VI. View the test table data in HBase (the table must be created manually before running the program)

(screenshot of the HBase scan result omitted)
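
For reference, the namespace and table used by MyHBaseBolt (it1002:logs with column family f1) can be created and inspected from the HBase shell:

create_namespace 'it1002'
create 'it1002:logs', 'f1'
scan 'it1002:logs'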
