Move nativeFree calls outside of the memory_mutex to avoid deadlock#2124
Move nativeFree calls outside of the memory_mutex to avoid deadlock#2124umar456 merged 1 commit intoarrayfire:masterfrom
Conversation
34259d2 to
0f95fc4
Compare
There was a problem hiding this comment.
Thank you for the code.
This is a little bit off-topic, sorry, but I have a design pattern question. Backend-specific MemoryManagers inherit from the common MemoryManager<T> and pass it their own type as the template parameter. This is done because you prefer to call backend-specific methods like nativeFree using static_cast:
inline void nativeFree(void *ptr)
{
static_cast<T*>(this)->nativeFree(ptr);
}instead of using a virtual nativeFree function.
I thought you wanted to avoid virtual table lookup for performance (I may be wrong, though, considering that in this PR you e.g., store the call to nativeFree in std::function that is performing a similar type of type erasure anyway.). However, the common MemoryManager has a virtual destructor, is there a reason for this? Because you cannot use it purely polymorphically anyway, it always has to be parametrized using the backend-specific manager type.
I am asking mostly out of curiosity.
src/backend/common/MemoryManager.hpp
Outdated
|
|
||
| for (auto &kv : current.free_map) { | ||
| size_t num_ptrs = kv.second.size(); | ||
| //Free memory by popping the last element |
There was a problem hiding this comment.
This comment is not true anymore.
src/backend/common/MemoryManager.hpp
Outdated
| for(auto& p : kv.second) { | ||
| free_ptrs.push_back(p); | ||
| } | ||
| current.total_bytes -= num_ptrs * kv.first; |
There was a problem hiding this comment.
(OCD - two spaces after -=)
src/backend/common/MemoryManager.hpp
Outdated
| // Frees the pointer outside the lock. | ||
| std::unique_ptr<void, std::function<void(void*)>> freed_ptr(nullptr, [this](void* p) { | ||
| this->nativeFree(p); | ||
| }); |
There was a problem hiding this comment.
(OCD - this lambda seems misindented)
| // Pointer not found in locked map | ||
| if (iter == current.locked_map.end()) { | ||
| // Probably came from user, just free it | ||
| freed_ptr.reset(ptr); |
There was a problem hiding this comment.
This is a nice trick with unique_ptr. 👍
src/backend/common/MemoryManager.hpp
Outdated
| std::vector<void *> ptrs; | ||
| ptrs.push_back(ptr); | ||
| current.free_map[bytes] = ptrs; | ||
| } |
There was a problem hiding this comment.
I believe this whole "regular mode" branch (L291-L301) can be replaced with this line:
current.free_map[bytes].push_back(ptr);because operator[] on mapping containers constructs the empty vector (default constructor) for you if the key does not exist.
There was a problem hiding this comment.
I agree with this ^ I was not aware of this quirk of C++ standard containers when I wrote this code.
0f95fc4 to
d04d0e0
Compare
src/backend/common/MemoryManager.hpp
Outdated
| unique_ptr<void, function<void(void*)>> freed_ptr(nullptr, | ||
| [this](void* p) { | ||
| this->nativeFree(p); | ||
| }); |
There was a problem hiding this comment.
Move this into the class definition of MemoryManager ? It could be useful elsewhere.
| // Pointer not found in locked map | ||
| if (iter == current.locked_map.end()) { | ||
| // Probably came from user, just free it | ||
| freed_ptr.reset(ptr); |
src/backend/common/MemoryManager.hpp
Outdated
| std::vector<void *> ptrs; | ||
| ptrs.push_back(ptr); | ||
| current.free_map[bytes] = ptrs; | ||
| } |
There was a problem hiding this comment.
I agree with this ^ I was not aware of this quirk of C++ standard containers when I wrote this code.
src/backend/common/MemoryManager.hpp
Outdated
| bool checkMemoryLimit() | ||
| { | ||
| const memory_info& current = this->getCurrentMemoryInfo(); | ||
| lock_guard_t lock(this->memory_mutex); |
There was a problem hiding this comment.
Why lock here ? Was getCurrentMemoryInfo locking previously ?
There was a problem hiding this comment.
It wasn't locking but you are accessing the memory_info struct and some of the tools were throwing warnings for these accesses. I can remove it if you think its unnecessary but I don't think it hurts.
This is a result of MemoryManager starting off as a polymorphic base class and then being converted to the templated version because we could not guarantee the order of destruction of certain objects which resulted in sgefaults on exits (I don't recall the specifics). I guess the virtual destructor is something that was missed when the conversion was done. |
1810a93 to
031b3e2
Compare
Narrows the scope of the memory_mutex locks to just around the modification and reading of the internal memory data members and elements. This means that actual allocations happen outside of this mutex's lock. This was done because the CPU backend requires that the free operation performs a sync before the pointer is freed. This caused deadlocks when the main thread was waiting on the worker thread and the worker thread called nativeFree. This scenario happens when a buffer node is only referenced in the worker thread and it is removed.
031b3e2 to
6105b16
Compare
…rrayfire#2124) Narrows the scope of the memory_mutex locks to just around the modification and reading of the internal memory data members and elements. This means that actual allocations happen outside of this mutex's lock. This was done because the CPU backend requires that the free operation performs a sync before the pointer is freed. This caused deadlocks when the main thread was waiting on the worker thread and the worker thread called nativeFree. This scenario happens when a buffer node is only referenced in the worker thread and it is removed.
Narrows the scope of the memory_mutex locks to just around the
modification and reading of the internal memory data members and
elements. This means that actual allocations happen outside of this
mutex's lock.
This was done because the CPU backend requires that the free operation
performs a sync before the pointer is freed. This caused deadlocks
when the main thread was waiting on the worker thread and the
worker thread called nativeFree. This scenario happens when a buffer
node is only referenced in the worker thread and it is removed.
Fixes #2115