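PiflowX TopN Component

TopN Component

Component description

Returns the N smallest or largest rows, ordered by the specified columns.

Boundedness

batch, streaming

Compute engine

flink

Component group

common

Ports

Inport: default port

Outport: default port

Component properties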

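| Name | Display name | Default | Allowed values | Required | Description | Example |
|---|---|---|---|---|---|---|
| column_list | column_list | "*" | | | Columns to query | name,age |
| partition_list | partition_list | | | | Partition columns | name,age |
| order_list | order_list | | | | Sort columns | name->asc,age->desc |
| tableName | tableName | | | | Table name | test |
| topNum | topNum | | | | Number of TopN entries | 10 |
| conditions | conditions | | | | Query conditions | age > 10 and name = 'test' |
| isWindow | isWindow | false | Set("true", "false") | | Whether to use window TopN | false |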
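As an illustration of how these properties could translate into a Flink SQL Top-N query, here is a minimal Python sketch. The helper names `build_topn_sql` and `parse_order_list` are hypothetical and not part of PiflowX. The sketch assumes that `order_list` entries use the `column->direction` form, that `conditions` filter rows before ranking, and it ignores `isWindow`; the SQL the component actually generates may differ.

```python
def parse_order_list(order_list: str) -> str:
    """Turn 'name->asc,age->desc' into 'name ASC, age DESC'."""
    parts = []
    for item in order_list.split(","):
        column, _, direction = item.strip().partition("->")
        # Assumption: a missing direction defaults to ascending.
        direction = (direction.strip() or "asc").upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unsupported sort direction: {direction!r}")
        parts.append(f"{column.strip()} {direction}")
    return ", ".join(parts)


def build_topn_sql(props: dict) -> str:
    """Build a ROW_NUMBER()-based Top-N query from TopN-style properties."""
    columns = props.get("column_list") or "*"
    partition = props.get("partition_list", "").strip()

    over = f"ORDER BY {parse_order_list(props['order_list'])}"
    if partition:
        over = f"PARTITION BY {partition} {over}"

    inner = (
        f"SELECT {columns}, ROW_NUMBER() OVER ({over}) AS rownum "
        f"FROM {props['tableName']}"
    )
    # Assumption: conditions are applied before ranking.
    conditions = props.get("conditions", "").strip()
    if conditions:
        inner += f" WHERE {conditions}"

    return f"SELECT {columns} FROM ({inner}) WHERE rownum <= {int(props['topNum'])}"


example_props = {
    "partition_list": "key",
    "conditions": "",
    "order_list": "search_cnt->desc",
    "column_list": "key, name, search_cnt, row_time",
    "tableName": "source_table",
    "topNum": "10",
}
print(build_topn_sql(example_props))
```

The example properties are the ones used for the TopN stop in the example configuration, so the printed query has the same shape as the hand-written Top-N statement in the reference example.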

TopN example configuration

{
    "flow":{
        "name":"topDemo",
        "uuid":"59e583d074b44985aee2ad70a2547a77",
        "runMode":"DEBUG",
        "paths":[
            {
                "inport":"",
                "from":"JDBCRead",
                "to":"TopN",
                "outport":""
            },
            {
                "inport":"",
                "from":"TopN",
                "to":"ShowChangeLogData",
                "outport":""
            }
        ],
        "environment":{
            "runtimeMode":"batch"
        },
        "engineType":"flink",
        "stops":[
            {
                "name":"JDBCRead",
                "bundle":"cn.piflow.bundle.flink.jdbc.JDBCRead",
                "uuid":"31998ca1fbea45b5a6799a6150300cf6",
                "properties":{
                    "url":"jdbc:mysql://192.168.186.100:3306/test",
                    "username":"root",
                    "fetchSize":"100",
                    "driver":"",
                    "properties":{

                    },
                    "tableName":"source_table",
                    "tableDefinition":{
                        "tableBaseInfo":{
                            "registerTableName":"source_table"
                        },
                        "physicalColumnDefinition":[
                            {
                                "columnName":"name",
                                "columnType":"STRING"
                            },
                            {
                                "columnName":"search_cnt",
                                "columnType":"BIGINT"
                            },
                            {
                                "columnName":"key",
                                "columnType":"STRING"
                            },
                            {
                                "columnName":"row_time",
                                "columnType":"TIMESTAMP"
                            }
                        ],
                        "asSelectStatement":{

                        },
                        "likeStatement":{

                        }
                    },
                    "password":"123456"
                },
                "customizedProperties":{

                }
            },
            {
                "name":"TopN",
                "bundle":"cn.piflow.bundle.flink.common.TopN",
                "uuid":"08726631225d4dcca00dc532a83c7344",
                "properties":{
                    "partition_list":"key",
                    "conditions":"",
                    "order_list":"search_cnt->desc",
                    "column_list":"key, name, search_cnt, row_time",
                    "tableName":"source_table",
                    "topNum":"10",
                    "isWindow":"false"
                },
                "customizedProperties":{

                }
            },
            {
                "name":"ShowChangeLogData",
                "bundle":"cn.piflow.bundle.flink.common.ShowChangeLogData",
                "uuid":"6225e5ac18704ed0aa1c95e3fbe1ce28",
                "properties":{
                    "showNumber":"100"
                },
                "customizedProperties":{

                }
            }
        ]
    }
}
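Before submitting a flow like the one above, it can help to check that every path refers to a stop that is actually defined. The sketch below is a standalone Python check, not a PiflowX API: `undefined_stops` and `stop_order` are hypothetical helpers, and the embedded JSON is a trimmed copy of the example with most properties omitted.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Trimmed copy of the example flow; only the fields the check needs.
FLOW_JSON = """
{
  "flow": {
    "name": "topDemo",
    "paths": [
      {"inport": "", "from": "JDBCRead", "to": "TopN", "outport": ""},
      {"inport": "", "from": "TopN", "to": "ShowChangeLogData", "outport": ""}
    ],
    "stops": [
      {"name": "JDBCRead", "bundle": "cn.piflow.bundle.flink.jdbc.JDBCRead"},
      {"name": "TopN", "bundle": "cn.piflow.bundle.flink.common.TopN"},
      {"name": "ShowChangeLogData",
       "bundle": "cn.piflow.bundle.flink.common.ShowChangeLogData"}
    ]
  }
}
"""


def undefined_stops(flow: dict) -> list:
    """Return stop names used in paths but missing from the stops list."""
    defined = {stop["name"] for stop in flow["stops"]}
    missing = []
    for path in flow["paths"]:
        for end in ("from", "to"):
            name = path[end]
            if name not in defined and name not in missing:
                missing.append(name)
    return missing


def stop_order(flow: dict) -> list:
    """Return stop names in an order where each stop follows its inputs."""
    graph = {stop["name"]: set() for stop in flow["stops"]}
    for path in flow["paths"]:
        graph.setdefault(path["to"], set()).add(path["from"])
    return list(TopologicalSorter(graph).static_order())


flow = json.loads(FLOW_JSON)["flow"]
print("undefined stops:", undefined_stops(flow))
print("order:", " -> ".join(stop_order(flow)))
```

For this flow the check reports no undefined stops, and the order follows the two paths: JDBCRead feeds TopN, which feeds ShowChangeLogData.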
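Reference example

Flink SQL TopN statements explained - Zhihu (zhihu.com) (https://zhuanlan.zhihu.com/p/665480015)

Practical case: for a given search keyword, take the top 10 entries by search popularity.

The input is search-popularity data for search entries. Whenever an entry's popularity changes, the updated record is written to the source Kafka topic: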

Source schema:

-- Field name   Note
-- key          search keyword
-- name         name of the trending search entry
-- search_cnt   search popularity (e.g. 3000)
-- timestamp    timestamp of the entry
CREATE TABLE source_table (
 name STRING NOT NULL,
 search_cnt BIGINT NOT NULL,
 key STRING NOT NULL,
 row_time timestamp(3),
 WATERMARK FOR row_time AS row_time
) WITH (
 'connector' = 'filesystem', 
 'path' = 'file:///Users/hhx/Desktop/source_table.csv',
 'format' = 'csv'
);
-- Sample rows in source_table.csv (name, search_cnt, key, row_time):
A,100,a,2021-11-01 00:01:03
A,200,a,2021-11-02 00:01:03
A,300,a,2021-11-03 00:01:03
B,200,b,2021-11-01 00:01:03
B,300,b,2021-11-02 00:01:03
B,400,b,2021-11-03 00:01:03
C,300,c,2021-11-01 00:01:03
C,400,c,2021-11-02 00:01:03
C,500,c,2021-11-03 00:01:03
D,400,d,2021-11-01 00:01:03
D,500,d,2021-11-02 00:01:03
D,600,d,2021-11-03 00:01:03

-- Sink schema:
-- key          search keyword
-- name         name of the trending search entry
-- search_cnt   search popularity (e.g. 3000)
-- timestamp    timestamp of the entry
CREATE TABLE sink_table (
 key STRING,
 name STRING,
 search_cnt BIGINT,
 `timestamp` TIMESTAMP(3)
) WITH (
 ...
);

-- DML logic
INSERT INTO sink_table
SELECT key, name, search_cnt, row_time as `timestamp`
FROM (
 SELECT key, name, search_cnt, row_time, 
 -- Partition by the search keyword (key), then rank by search_cnt descending and keep the top 2
 ROW_NUMBER() OVER (PARTITION BY key ORDER BY search_cnt desc) AS rownum
 FROM source_table)
WHERE rownum <= 2
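To see what this query returns for the sample rows, the ranking can be emulated in plain Python. This is only a sketch of the `ROW_NUMBER()` semantics in batch mode, not how Flink executes the query. `top_n_per_key` is a hypothetical helper, and ties (there are none in this data) would be broken by input order here, whereas SQL leaves that order unspecified.

```python
from collections import defaultdict

# Sample rows in source_table column order: name, search_cnt, key, row_time.
ROWS = [
    ("A", 100, "a", "2021-11-01 00:01:03"),
    ("A", 200, "a", "2021-11-02 00:01:03"),
    ("A", 300, "a", "2021-11-03 00:01:03"),
    ("B", 200, "b", "2021-11-01 00:01:03"),
    ("B", 300, "b", "2021-11-02 00:01:03"),
    ("B", 400, "b", "2021-11-03 00:01:03"),
    ("C", 300, "c", "2021-11-01 00:01:03"),
    ("C", 400, "c", "2021-11-02 00:01:03"),
    ("C", 500, "c", "2021-11-03 00:01:03"),
    ("D", 400, "d", "2021-11-01 00:01:03"),
    ("D", 500, "d", "2021-11-02 00:01:03"),
    ("D", 600, "d", "2021-11-03 00:01:03"),
]


def top_n_per_key(rows, n):
    """Emulate PARTITION BY key ORDER BY search_cnt DESC with rownum <= n."""
    partitions = defaultdict(list)
    for name, search_cnt, key, row_time in rows:
        # Reorder to the sink column order: key, name, search_cnt, timestamp.
        partitions[key].append((key, name, search_cnt, row_time))

    result = []
    for key in sorted(partitions):
        ranked = sorted(partitions[key], key=lambda row: row[2], reverse=True)
        result.extend(ranked[:n])  # keep rows whose rank is at most n
    return result


for row in top_n_per_key(ROWS, 2):
    print(row)
```

With n = 2, each of the four keys keeps its two highest search_cnt rows, so 8 of the 12 sample rows remain (for key a, the rows with search_cnt 300 and 200).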

Demo video

PiflowX TopN Component (Bilibili)
