Description
Distribution Name | Ubuntu Focal Fossa
Distribution Version | 20.04
Linux Kernel | 5.4.0
Architecture | AMD64
ZFS Version | 0.8.3-1ubuntu12
SPL Version | 0.8.3-1ubuntu12
As per this question on Ask Ubuntu: after a series of large file manipulations, ZFS reports sizable memory usage for cache purposes as "green" kernel "memory in use" in htop, in this case 20GB (after a fresh boot it was 5GB):
... and on smem:
tbrowne@RyVe:~$ smem -tw
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory      20762532     435044   20327488
userspace memory            2290448     519736    1770712
free memory                42823220   42823220          0
----------------------------------------------------------
                           65876200   43778000   22098200
This large memory usage on a 64GB system is not a problem since, as I understand it, the memory is mainly being used for cache. Indeed, on the same system (which, as htop shows, has only 2GB of swap), the following userspace Python 3 script can allocate close to the full 64GB (first run pip3 install numpy):
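import sys
import numpy as np

# One 10000 x 12500 float64 array
xx = np.random.rand(10000, 12500)
sys.getsizeof(xx)
# 1000000112
# that's about 1 GB

# Keep allocating 1 GB arrays, printing how many have been allocated so far
ll = []
index = 1
while True:
    print(index)
    ll.append(np.random.rand(10000, 12500))
    index = index + 1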
In other words, ZFS is not really using up all that RAM, because the RAM is freed promptly for userspace programs when needed. The problem is that this is not reported accurately by htop, the system monitor, or most other memory usage tools. One has to go as far as summing the RSS column of ps aux to get a real picture of userspace usage (ps aux | awk '{print $6/1024}' | paste -s -d+ - | bc).
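For readers who would rather not chain awk, paste and bc, the same sum can be sketched in Python. This is a minimal sketch, not part of the original report: the function names total_rss_mib and read_ps_aux are made up for illustration, and it assumes the usual ps aux layout in which the sixth column is RSS in KiB. As with the shell pipeline, summing RSS counts shared pages once per process, so the total is an upper bound on userspace usage.

```python
import subprocess


def total_rss_mib(ps_output: str) -> float:
    """Sum the RSS column (6th field, KiB) of `ps aux` output, returned in MiB."""
    total_kib = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields so spaces in COMMAND stay in the last one.
        fields = line.split(None, 10)
        if len(fields) > 5 and fields[5].isdigit():
            total_kib += int(fields[5])
    return total_kib / 1024


def read_ps_aux() -> str:
    """Hypothetical helper: capture the output of `ps aux` on the local machine."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling total_rss_mib(read_ps_aux()) on the affected machine should give roughly the same figure as the shell one-liner.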
Given that ZFS provides a very good experience on the new Ubuntu 20.04, which is a mass-market distribution, and that many people will use it notwithstanding the "experimental" tag, would it be worth ensuring that this cache usage is reported correctly as cache (yellow bars on htop's memory graph) or buffers (blue bars)?
Describe how to reproduce the problem
First reboot the system and note the memory use in htop. Then run this Python 3 script, ensuring numpy and pandas are installed:
import numpy as np
import pandas as pd
pd.DataFrame(np.random.rand(10000, 10000)).to_csv("bigrand.csv")
This will create a 1.8GB CSV file full of random numbers. At a bash prompt, append it 40 times to a new file to create a 72GB file:
for i in {1..40}; do cat bigrand.csv >> biggest.csv; done
Creating a huge file in this way causes ZFS to ramp up its RAM usage as described above, and htop shows it as green bars (kernel memory in use). If you then run the earlier Python script, which allocates 1GB at a time, you'll see that this ZFS cache RAM gets freed in favour of the userspace script. That is exactly how a cache should behave, so it should be shown as cache in htop and similar tools.
The risk is that ZFS gains a reputation as a RAM hog, which is not really deserved, as I think I've shown here.
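To see how much of the "kernel memory in use" is really ZFS cache, the ARC size can be read from the kstat file that ZFS on Linux exposes. The following is a rough sketch under stated assumptions: the path /proc/spl/kstat/zfs/arcstats and the "size" row (ARC size in bytes) are what I would expect on a ZFS on Linux system, and the helper names parse_arcstats, arc_size_gib and report are made up for illustration.

```python
from pathlib import Path

# Assumed location of the ARC statistics on ZFS on Linux.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text: str) -> dict:
    """Parse arcstats text into {name: value}, ignoring the header rows."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "<name> <type> <value>"; header rows do not match.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_size_gib(text: str) -> float:
    """Return the current ARC size in GiB, taken from the 'size' row."""
    return parse_arcstats(text)["size"] / 2**30


def report() -> None:
    """Print the ARC size if the kstat file is present on this machine."""
    if ARCSTATS.exists():
        print(f"ZFS ARC size: {arc_size_gib(ARCSTATS.read_text()):.1f} GiB")
    else:
        print(f"{ARCSTATS} not found; is the ZFS module loaded?")
```

Running report() before and after creating biggest.csv should show the ARC growing by roughly the amount that htop attributes to kernel memory.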
