Because Celery executes tasks concurrently, when all worker processes log to the same file the log lines from different processes end up interleaved, and tracing a problem back through the log becomes very difficult.
If every log line carried the ID of the current task, this would be much easier.
Core code:
from celery._state import get_current_task

task = get_current_task()
if task and task.request:
    task_id = task.request.id
    task_name = task.name
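For example, any function running inside a worker could attach these two values to a log message by hand; the helper below is a minimal sketch (the name log_with_task_info is just for illustration):

import logging

from celery._state import get_current_task

def log_with_task_info(msg):
    # Minimal sketch: prefix a message with the current task's name and id.
    # Inside a worker, get_current_task() returns the running task; elsewhere it is None.
    task = get_current_task()
    if task and task.request:
        logging.info('[%s %s] %s', task.name, task.request.id, msg)
    else:
        logging.info(msg)

Doing this by hand at every call site is tedious, which is why the rest of this post moves the logic into a logging.Formatter instead.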
1. Prepare the environment
It is recommended to do everything as root, to avoid permission problems while installing, configuring, and running.
- OS: a Debian-family distribution is recommended (or any other Linux; Celery dropped Windows support starting with 4.0)
- redis-server: 4.0.8 (sudo apt-get install redis-server, sudo service redis-server start)
- Python: 3.6.5
- celery: 4.2.0 (install with pip3 install https://github.com/celery/celery/tarball/v4.2.0-136-gc1d0bfe; the package installed by pip install celery contains an async module whose name clashes with the Python keyword)
- redis (Python client): 3.0.1 (pip install redis)
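A quick sanity check of the installed Python packages before moving on (a minimal sketch):

import celery
import redis

# Print the installed celery and redis-py versions; expect 4.2.x and 3.0.1.
print(celery.__version__)
print(redis.__version__)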
2. Write the code
import logging

from celery._state import get_current_task


class Formatter(logging.Formatter):
    """Formatter for tasks, adding the task name and id."""

    def format(self, record):
        task = get_current_task()
        if task and task.request:
            record.__dict__.update(task_id='%s ' % task.request.id,
                                   task_name='%s ' % task.name)
        else:
            record.__dict__.setdefault('task_name', '')
            record.__dict__.setdefault('task_id', '')
        return logging.Formatter.format(self, record)
root_logger = logging.getLogger()  # returns logging.root
root_logger.setLevel(logging.DEBUG)

# Send logs to a file.
# Note: do not use TimedRotatingFileHandler here; every celery worker process
# would rotate the file on its own, and log records would be lost.
fh = logging.FileHandler('celery_worker.log')
formatter = Formatter('[%(task_name)s%(task_id)s%(process)s %(thread)s %(asctime)s %(pathname)s:%(lineno)s] %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
fh.setFormatter(formatter)
fh.setLevel(logging.DEBUG)
root_logger.addHandler(fh)

# Send logs to the console.
sh = logging.StreamHandler()
formatter = Formatter('[%(task_name)s%(task_id)s%(process)s %(thread)s %(asctime)s %(pathname)s:%(lineno)s] %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
sh.setFormatter(formatter)
sh.setLevel(logging.INFO)
root_logger.addHandler(sh)
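Outside a worker, get_current_task() returns None, so the task_name and task_id fields collapse to empty strings. A quick way to see the formatter in action, as a minimal sketch with a hand-built record:

# Minimal sketch: format a hand-built record outside of any task;
# task_name and task_id come out empty here.
record = logging.LogRecord(name='demo', level=logging.INFO, pathname=__file__,
                           lineno=1, msg='hello', args=(), exc_info=None)
print(formatter.format(record))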
class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERY_TASK_SERIALIZER = 'pickle'  # default is json since 4.0; earlier versions defaulted to pickle (which can carry binary objects)
    CELERY_RESULT_SERIALIZER = 'pickle'
    CELERY_ACCEPT_CONTENT = ['json', 'pickle']
    CELERY_ENABLE_UTC = True  # use UTC internally
    CELERY_TIMEZONE = 'Asia/Shanghai'  # Shanghai timezone
    CELERYD_HIJACK_ROOT_LOGGER = False  # keep celery from hijacking the root logger configuration
    CELERYD_MAX_TASKS_PER_CHILD = 1  # replace each worker process after one task (the next task gets a fresh process, which works around memory leaks)
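The uppercase names above still work in Celery 4, which also accepts the newer lowercase setting names; with the same values the class could be written roughly as this sketch:

class CeleryConfig(object):
    # Sketch using Celery 4's lowercase setting names; values mirror the class above.
    broker_url = 'redis://localhost:6379/0'
    result_backend = 'redis://localhost:6379/0'
    task_serializer = 'pickle'
    result_serializer = 'pickle'
    accept_content = ['json', 'pickle']
    enable_utc = True
    timezone = 'Asia/Shanghai'
    worker_hijack_root_logger = False
    worker_max_tasks_per_child = 1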
import logging

from celery import Celery, platforms

import celeryconfig
import detail

platforms.C_FORCE_ROOT = True  # the config uses pickle serialization, and celery refuses to run as root with pickle unless this is forced

app = Celery(__name__)
app.config_from_object(celeryconfig.CeleryConfig)


@app.task(bind=True)
def heavy_task(self, seconds=1):
    logging.info("I'm heavy_task")  # uses logging.root by default
    return detail.process_heavy_task(seconds)
import logging
import time


def process_heavy_task(seconds=1):
    logging.info("I'm process_heavy_task")  # uses logging.root by default
    time.sleep(seconds)
    return True
3. Start and test
- Open a new shell window and start the celery worker
# ls
celeryconfig.py detail.py tasks.py
# celery worker -A tasks -l info
- Open a new shell window and follow the log file
# ls
celeryconfig.py celery_worker.log detail.py tasks.py
# tail -f celery_worker.log
- Open a new shell window and call the celery task
# ls
celeryconfig.py celery_worker.log detail.py tasks.py
# python3
>>> import tasks
>>> t = tasks.heavy_task.delay(3)
>>> t.result
True
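The task's id can also be read off the AsyncResult; searching for that value in celery_worker.log pulls out exactly the lines belonging to this task (the id is a fresh UUID on every run):

>>> t.id   # a UUID string; grep for it in celery_worker.log to isolate this task's log lines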
4. Result screenshots