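Background
I have recently been looking at open-source metadata management projects. The better-known ones are open-metadata, datahub, OpenLineage, and atlas.
open-metadata has over a thousand contributors and 4.8K stars, and its community is currently fairly active. It supports a wide range of database types, covering most of the common ones on the market, and the project iterates quickly. Since I happen to be working on data asset governance myself, I decided to deploy it locally and explore it.
According to the official site, Docker is the simplest way to deploy, so that is the approach I took. However, for various reasons that are awkward to spell out, deploying applications through Docker from within China has become extremely inconvenient (as one netizen put it: where there is no difficulty, we create one and then rise to meet it!).
- open-metadata official site: https://open-metadata.org/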
Environment setup
I set up a CentOS 7 virtual machine for the deployment.
- Switch the CentOS 7 yum repository to the Aliyun mirror
sudo mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
sudo curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
sudo yum clean all
sudo yum makecache
Docker installation
- Install the base dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2 --skip-broken
- Add the docker-ce yum repository (Aliyun mirror)
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sed -i 's/download.docker.com/mirrors.aliyun.com\/docker-ce/g' /etc/yum.repos.d/docker-ce.repo
yum makecache fast
- Install docker
yum install -y docker-ce
## Start docker
sudo systemctl start docker
## Show the docker version
docker -v
## Show basic docker information
docker info
docker-compose installation
docker-compose is Docker's service orchestration tool. Installing it through pip requires Python 3.
- Install Python 3
yum install -y python3 python3-devel python3-libs python3-tools
python3 -m ensurepip
python3 -m pip install --upgrade pip
python3 -V
Python 3.6.8
- Install docker-compose
python3 -m pip install docker-compose
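Before moving on, it can help to confirm that the tools installed so far are actually on the PATH. The following is a minimal sketch; `require_cmd` and `check_tools` are hypothetical helper names made up for illustration and are not part of docker or docker-compose.

```shell
# Hypothetical helper: report whether a single command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Hypothetical helper: check several commands and fail if any one is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        require_cmd "$tool" || status=1
    done
    return "$status"
}
```

Running `check_tools docker docker-compose python3` prints one line per tool and returns a non-zero status if any of them is missing.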
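Preparing the images
Nearly all public Docker registry mirrors in China have gone offline. Aliyun still offers a free private image registry (much appreciated) that lets you manage up to 300 images, which is plenty for personal use. A GitHub pipeline (CI/CD) can automatically pull images from Docker Hub and push them to the Aliyun private registry.
Aliyun private docker image service:
https://cr.console.aliyun.com/
See this fellow's blog post for reference: https://blog.csdn.net/weixin_59164654/article/details/139601846.
There is also a ready-made open-source project on GitHub that you can fork into your own account and configure.
- Project URL
https://github.com/tech-shrimp/docker_image_pusher
- Tutorial video for the project
https://www.bilibili.com/video/BV1Py411877t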
Downloading the open-metadata docker-compose.yml
The latest release can be found on the open-metadata GitHub page.
https://github.com/open-metadata/OpenMetadata/releases/tag/1.4.3-release
The docker-compose.yml references four images:
- image: docker.getcollate.io/openmetadata/db:1.4.3
- image: docker.elastic.co/elasticsearch/elasticsearch:8.10.2
- image: docker.getcollate.io/openmetadata/server:1.4.3
- image: docker.getcollate.io/openmetadata/ingestion:1.4.3
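Instead of reading through the file by eye, the image list can also be extracted with a one-line filter. This is only a sketch: `list_images` is a hypothetical function name, and it assumes each image reference sits on its own `image:` line without quotes or trailing comments.

```shell
# Hypothetical helper: print the distinct image references declared in a compose file.
list_images() {
    # Strip leading whitespace, an optional "-" bullet, and the "image:" key,
    # print only the lines where that substitution matched, then de-duplicate.
    sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$1" | sort -u
}
```

For example, `list_images docker-compose.yml` prints one image reference per line, which is a convenient input for the mirroring step.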
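Copy these four images into your own Aliyun private registry using the docker_image_pusher approach described above.
Then log in to the Aliyun private registry from the virtual machine and pull the mirrored images.
## Log in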
docker login --username=username@163.com registry.cn-hangzhou.aliyuncs.com
## Pull the images
docker pull registry.cn-hangzhou.aliyuncs.com/itclj/db:1.4.3
docker pull registry.cn-hangzhou.aliyuncs.com/itclj/elasticsearch:8.10.2
docker pull registry.cn-hangzhou.aliyuncs.com/itclj/server:1.4.3
docker pull registry.cn-hangzhou.aliyuncs.com/itclj/ingestion:1.4.3
Pulling from the Aliyun private registry is fast; all four images finished within a few minutes.
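The pull commands above follow a simple pattern: the mirrored name is the registry namespace plus the last path component of the upstream reference. A rough sketch of that mapping is below. `MIRROR_PREFIX`, `mirror_name`, and `print_pull_commands` are hypothetical names, and the prefix is simply the namespace used in this post, so replace it with your own.

```shell
# Assumed value copied from the pull commands in this post; change it to your own namespace.
MIRROR_PREFIX="registry.cn-hangzhou.aliyuncs.com/itclj"

# Hypothetical helper: keep only the part after the last "/" and prepend the mirror prefix.
mirror_name() {
    echo "$MIRROR_PREFIX/${1##*/}"
}

# Hypothetical helper: print one "docker pull" command per upstream image.
# Nothing is executed here; pipe the output to sh to actually run the pulls.
print_pull_commands() {
    for image in "$@"; do
        echo "docker pull $(mirror_name "$image")"
    done
}
```

Calling `print_pull_commands docker.getcollate.io/openmetadata/db:1.4.3 | sh` would then run the corresponding pull, provided you are already logged in to the registry.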
- Edit the open-metadata docker-compose.yml and replace the image names with the mirrored ones.
Original image name | Mirrored image name |
---|---|
docker.getcollate.io/openmetadata/db:1.4.3 | registry.cn-hangzhou.aliyuncs.com/itclj/db:1.4.3 |
docker.elastic.co/elasticsearch/elasticsearch:8.10.2 | registry.cn-hangzhou.aliyuncs.com/itclj/elasticsearch:8.10.2 |
docker.getcollate.io/openmetadata/server:1.4.3 | registry.cn-hangzhou.aliyuncs.com/itclj/server:1.4.3 |
docker.getcollate.io/openmetadata/ingestion:1.4.3 | registry.cn-hangzhou.aliyuncs.com/itclj/ingestion:1.4.3 |
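The renaming in the table can also be applied mechanically rather than by hand. The sketch below writes a modified copy of the compose file and leaves the original untouched. `rewrite_images` is a hypothetical helper, and it assumes every `image:` value is unquoted and that the mirror keeps only the last path component, as in the table.

```shell
# Hypothetical helper: copy a compose file, pointing every image at the mirror prefix.
# Usage: rewrite_images <source file> <destination file> <mirror prefix>
rewrite_images() {
    src=$1
    dst=$2
    prefix=$3
    # On each "image:" line, drop everything up to the last "/" of the reference
    # and put the mirror prefix in front; all other lines pass through unchanged.
    sed -E "s#^([[:space:]]*image:[[:space:]]*)([^[:space:]]*/)?([^[:space:]/]+)#\1${prefix}/\3#" "$src" > "$dst"
}
```

For instance, `rewrite_images docker-compose.yml itclj-docker-compose.yml registry.cn-hangzhou.aliyuncs.com/itclj` would produce a file with the four image lines rewritten. It is worth diffing the result against the original before using it.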
- Comparison of docker-compose.yml before and after the change.
- The modified itclj-docker-compose.yml
# Copyright 2021 Collate
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
version: "3.9"
volumes:
  ingestion-volume-dag-airflow:
  ingestion-volume-dags:
  ingestion-volume-tmp:
  es-data:
services:
  mysql:
    container_name: openmetadata_mysql
    image: registry.cn-hangzhou.aliyuncs.com/itclj/db:1.4.3
    command: "--sort_buffer_size=10M"
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: itclj123456
    expose:
      - 3306
    ports:
      - "3306:3306"
    volumes:
      - ./docker-volume/db-data:/var/lib/mysql
    networks:
      - app_net
    healthcheck:
      test: mysql --user=root --password=$$MYSQL_ROOT_PASSWORD --silent --execute "use openmetadata_db"
      interval: 15s
      timeout: 10s
      retries: 10
  elasticsearch:
    container_name: openmetadata_elasticsearch
    image: registry.cn-hangzhou.aliyuncs.com/itclj/elasticsearch:8.10.2
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms1024m -Xmx1024m
      - xpack.security.enabled=false
    networks:
      - app_net
    ports:
      - "9200:9200"
      - "9300:9300"
    healthcheck:
      test: "curl -s http://localhost:9200/_cluster/health?pretty | grep status | grep -qE 'green|yellow' || exit 1"
      interval: 15s
      timeout: 10s
      retries: 10
    volumes:
      - es-data:/usr/share/elasticsearch/data
  execute-migrate-all:
    container_name: execute_migrate_all
    image: registry.cn-hangzhou.aliyuncs.com/itclj/server:1.4.3
    command: "./bootstrap/openmetadata-ops.sh migrate"
    environment:
      OPENMETADATA_CLUSTER_NAME: ${OPENMETADATA_CLUSTER_NAME:-openmetadata}
      SERVER_PORT: ${SERVER_PORT:-8585}
      SERVER_ADMIN_PORT: ${SERVER_ADMIN_PORT:-8586}
      LOG_LEVEL: ${LOG_LEVEL:-INFO}
      # Migration
      MIGRATION_LIMIT_PARAM: ${MIGRATION_LIMIT_PARAM:-1200}
      # OpenMetadata Server Authentication Configuration
      AUTHORIZER_CLASS_NAME: ${AUTHORIZER_CLASS_NAME:-org.openmetadata.service.security.DefaultAuthorizer}
      AUTHORIZER_REQUEST_FILTER: ${AUTHORIZER_REQUEST_FILTER:-org.openmetadata.service.security.JwtFilter}
      AUTHORIZER_ADMIN_PRINCIPALS: ${AUTHORIZER_ADMIN_PRINCIPALS:-[admin]}
      AUTHORIZER_ALLOWED_REGISTRATION_DOMAIN: ${AUTHORIZER_ALLOWED_REGISTRATION_DOMAIN:-["all"]}
      AUTHORIZER_INGESTION_PRINCIPALS: ${AUTHORIZER_INGESTION_PRINCIPALS:-[ingestion-bot]}
      AUTHORIZER_PRINCIPAL_DOMAIN: ${AUTHORIZER_PRINCIPAL_DOMAIN:-"openmetadata.org"}
      AUTHORIZER_ENFORCE_PRINCIPAL_DOMAIN: ${AUTHORIZER_ENFORCE_PRINCIPAL_DOMAIN:-false}
      AUTHORIZER_ENABLE_SECURE_SOCKET: ${AUTHORIZER_ENABLE_SECURE_SOCKET:-false}
      AUTHENTICATION_PROVIDER: ${AUTHENTICATION_PROVIDER:-basic}
      AUTHENTICATION_RESPONSE_TYPE: ${AUTHENTICATION_RESPONSE_TYPE:-id_token}
      CUSTOM_OIDC_AUTHENTICATION_PROVIDER_NAME: ${CUSTOM_OIDC_AUTHENTICATION_PROVIDER_NAME:-""}
      AUTHENTICATION_PUBLIC_KEYS: ${AUTHENTICATION_PUBLIC_KEYS:-[http://localhost:8585/api/v1/system/config/jwks]}
      AUTHENTICATION_AUTHORITY: ${AUTHENTICATION_AUTHORITY:-https://accounts.google.com}
      AUTHENTICATION_CLIENT_ID: ${AUTHENTICATION_CLIENT_ID:-""}
      AUTHENTICATION_CALLBACK_URL: ${AUTHENTICATION_CALLBACK_URL:-""}
      AUTHENTICATION_JWT_PRINCIPAL_CLAIMS: ${AUTHENTICATION_JWT_PRINCIPAL_CLAIMS:-[email,preferred_username,sub]}
      AUTHENTICATION_ENABLE_SELF_SIGNUP: ${AUTHENTICATION_ENABLE_SELF_SIGNUP:-true}
      AUTHENTICATION_CLIENT_TYPE: ${AUTHENTICATION_CLIENT_TYPE:-public}
      #For OIDC Authentication, when client is confidential
      OIDC_CLIENT_ID: ${OIDC_CLIENT_ID:-