I have long known that mixing threads and processes in Python can deadlock, because a forked child process copies the parent's lock state. Below is the reproduction code quoted from https://stackoverflow.com/questions/24509650/deadlock-with-logging-multiprocess-multithread-python-script:
import multiprocessing
import threading
import logging
import time

SLEEP_TIME = 0.000001

logger = logging.getLogger()
ch = logging.StreamHandler()
ch.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
ch.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(ch)


class LoggingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        while True:
            logger.debug('LoggingThread: {}'.format(self))
            time.sleep(SLEEP_TIME)


def action(i):
    logger.debug('action: {}'.format(i))


def do_parallel_job():
    processes = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=processes)
    for i in range(20):
        pool.apply_async(action, args=(i,))
    pool.close()
    pool.join()


if __name__ == '__main__':
    logger.debug('START')

    #
    # multithread part
    #
    for _ in range(10):
        lt = LoggingThread()
        lt.setDaemon(True)
        lt.start()

    #
    # multiprocess part
    #
    do_parallel_job()

    logger.debug('FINISH')
The explanation given there is:

This is probably bug 6721.

The problem is common in any situation where you have locks, threads and forks. If thread 1 had a lock while thread 2 calls fork, in the forked process, there will only be thread 2 and the lock will be held forever. In your case, that is logging.StreamHandler.lock. A fix can be found here (permalink) for the logging module. Note that you need to take care of any other locks, too.
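Beyond patching the logging module itself, one common mitigation is to avoid fork() for the worker pool altogether. A minimal sketch, assuming Python 3.4+ (the 'spawn' start method launches each worker as a fresh interpreter, so no fork-copied lock state is inherited):

import multiprocessing


def action(i):
    print('action: {}'.format(i))


if __name__ == '__main__':
    # 'spawn' starts workers as new interpreters instead of fork()ing,
    # so handler locks held by other threads are never copied into children.
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(processes=4) as pool:
        pool.map(action, range(20))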
Today I also ran into a case where logging hung for no obvious reason. The setup was one script launching another script via subprocess:
import subprocess


def main():
    # stderr is redirected into a pipe, but the parent never reads from it
    sb_p = subprocess.Popen(
        ['python', 'test_load.py'],
        shell=False,
        stdin=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    sb_p.wait()
The launched script hung while writing its logs.

After trying all sorts of things, I eventually stumbled onto a pattern: the launched script test_load.py always hung once its total log output reached a fixed size, 4096 bytes.

Tracing it down step by step, I found that if no stream is specified when a StreamHandler is created, it writes the log output to sys.stderr by default:
class StreamHandler(Handler):
    """
    A handler class which writes logging records, appropriately formatted,
    to a stream. Note that this class does not close the stream, as
    sys.stdout or sys.stderr may be used.
    """

    terminator = '\n'

    def __init__(self, stream=None):
        """
        Initialize the handler.

        If stream is not specified, sys.stderr is used.
        """
        Handler.__init__(self)
        if stream is None:
            stream = sys.stderr
        self.stream = stream
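For reference, here is a minimal pair of scripts that reproduces the same symptom (the contents of the child are an assumption, since the original test_load.py is not shown): the child logs to the default sys.stderr, the parent captures stderr into a pipe but only calls wait(), and the child stalls once roughly one pipe buffer of output has accumulated.

# child.py -- hypothetical stand-in for test_load.py; log output goes to the
# default StreamHandler target, sys.stderr.
import logging

logging.basicConfig(level=logging.DEBUG)
for i in range(1000):
    logging.debug('line %d: %s', i, 'x' * 80)

# parent.py -- captures the child's stderr into a pipe but never reads it,
# so the child blocks in write() once the pipe buffer is full and wait()
# never returns.
import subprocess

p = subprocess.Popen(['python', 'child.py'], stderr=subprocess.PIPE)
p.wait()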
When the script is launched via subprocess, the stderr=subprocess.PIPE argument sends all of the log output into a pipe buffer. On Linux, the default capacity of an anonymous pipe is 4 KB or 64 KB; from pipe(7):

In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a system with a page size of 4096 bytes). Since Linux 2.6.35, the default pipe capacity is 16 pages, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations.

I could not find a documented figure for Windows 10; judging from the test result, it appears to be 4 KB.
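On Linux the capacity can also be checked directly from Python; a minimal sketch (F_GETPIPE_SZ is only exported by the fcntl module on newer Python versions, so the Linux-specific raw command value 1032 is used as a fallback):

import fcntl
import os

# F_GETPIPE_SZ is a Linux-specific fcntl command (value 1032); the fcntl
# module only exposes the named constant on Python 3.10+.
F_GETPIPE_SZ = getattr(fcntl, 'F_GETPIPE_SZ', 1032)

r, w = os.pipe()
print(fcntl.fcntl(r, F_GETPIPE_SZ))  # typically 65536 on a modern Linux kernel
os.close(r)
os.close(w)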
Once the pipe buffer is full, every subsequent write blocks, which is exactly the hang observed above.
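Two straightforward ways to avoid the hang in the launcher script, as a sketch (keeping the same test_load.py child as above):

import subprocess

# Option 1: actually drain the pipe. communicate() reads stderr until EOF,
# so the child can never fill the buffer.
p = subprocess.Popen(
    ['python', 'test_load.py'],
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
_, err = p.communicate()

# Option 2: if the output is not needed, do not capture it into a pipe at all.
subprocess.run(['python', 'test_load.py'], stderr=subprocess.DEVNULL)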
It is worth mentioning that the write-up below on GitHub led me down the wrong path while locating this problem, but following its line of reasoning did deepen my understanding of the logging module:

https://github.com/evanj/app-engine-log-deadlock
Thread/Request A:

- Calls logging.info, which acquires the logging.Handler lock.
- logging.info eventually calls the App Engine log hook: AppLogsHandler.emit.
- AppLogsHandler.emit calls logservice.write_record, which decides to flush the log.
- logservice.py:382 (in _LogsDequeBuffer._flush) is: bytes_left -= 1 + group.lengthString(line.ByteSize()). The call to group.lengthString is a Cython function on App Engine Standard. For some reason, it requires the Python module import lock so it blocks.

Thread/Request B:

- Calls import x: acquiring the Python import lock and calling the module code.
- Module x (or something transitively executed by it) calls logging.info().
- This tries to acquire the lock on the logging.Handler installed by App Engine and blocks.
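The underlying mechanism is a classic lock-ordering cycle between the handler lock and the import lock. A stripped-down sketch with two plain threading.Lock objects standing in for them (names and timing purely illustrative) shows the shape of the deadlock:

import threading
import time

handler_lock = threading.Lock()  # stands in for logging.Handler.lock
import_lock = threading.Lock()   # stands in for the module import lock


def thread_a():
    with handler_lock:           # logging.info acquires the handler lock...
        time.sleep(0.1)
        with import_lock:        # ...and then ends up needing the import lock
            pass


def thread_b():
    with import_lock:            # import x acquires the import lock...
        time.sleep(0.1)
        with handler_lock:       # ...and module-level code calls logging.info
            pass


a = threading.Thread(target=thread_a)
b = threading.Thread(target=thread_b)
a.start(); b.start()
a.join(); b.join()               # never returns: each thread waits on the other's lock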