The initial issue was “fixed” with a workaround using a NuttX feature (a hook) that allows a function to be called during task exit (when CONFIG_SCHED_ONEXIT is enabled). More info: https://github.com/apache/incubator-nuttx/pull/4092/
Unfortunately, this workaround stopped working a few days ago, when the on_exit() function was moved from the kernel side to the user side: https://github.com/apache/incubator-nuttx/pull/6197
Then my colleague Gustavo decided to fix this issue with a proper solution: https://github.com/apache/incubator-nuttx/pull/6322
He also gave a detailed explanation about the root causes of the issue:
The Wi-Fi library creates a new semaphore for every thread that performs connection operations, so we cannot have a global pointer.
Since these semaphores are thread-local, this motivated the initial implementation based on pthread_key_t, so that the semaphores were stored in Thread Local Storage and could then be destroyed on thread termination. The solution here was based on the one implemented for ESP-IDF, which works as expected.
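To make the pthread_key_t approach described above more concrete, here is a minimal sketch in plain POSIX C. It is not the actual ESP-IDF or NuttX adapter code: the names g_sem_key, thread_sem_get, sem_destructor and run_workers are hypothetical, and the counter exists only to make the destructor calls visible.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* All names below are hypothetical, chosen for illustration only. */
static pthread_key_t g_sem_key;
static pthread_once_t g_sem_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t g_count_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_destroyed_count;  /* number of destructor runs so far */

/* Runs at thread exit for every thread holding a non-NULL value. */
static void sem_destructor(void *arg)
{
  sem_t *sem = arg;

  sem_destroy(sem);
  free(sem);

  pthread_mutex_lock(&g_count_lock);
  g_destroyed_count++;
  pthread_mutex_unlock(&g_count_lock);
}

static void create_key(void)
{
  pthread_key_create(&g_sem_key, sem_destructor);
}

/* Return the calling thread's semaphore, creating it on first use. */
static sem_t *thread_sem_get(void)
{
  sem_t *sem;

  pthread_once(&g_sem_key_once, create_key);

  sem = pthread_getspecific(g_sem_key);
  if (sem == NULL)
    {
      sem = malloc(sizeof(*sem));
      if (sem == NULL)
        {
          return NULL;
        }

      if (sem_init(sem, 0, 0) != 0)
        {
          free(sem);
          return NULL;
        }

      if (pthread_setspecific(g_sem_key, sem) != 0)
        {
          sem_destroy(sem);
          free(sem);
          return NULL;
        }
    }

  return sem;
}

static void *worker(void *arg)
{
  sem_t *sem = thread_sem_get();

  (void)arg;
  if (sem != NULL)
    {
      sem_post(sem);  /* stand-in for a real connection operation */
      sem_wait(sem);
    }

  return NULL;        /* the destructor runs as the thread exits */
}

/* Start up to 8 workers, join them, and return the total number of
 * semaphores destroyed so far. */
int run_workers(int n)
{
  pthread_t tids[8];
  int started = 0;
  int count;
  int i;

  for (i = 0; i < n && i < 8; i++)
    {
      if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
        {
          started++;
        }
    }

  for (i = 0; i < started; i++)
    {
      pthread_join(tids[i], NULL);
    }

  pthread_mutex_lock(&g_count_lock);
  count = g_destroyed_count;
  pthread_mutex_unlock(&g_count_lock);
  return count;
}
```

On a system where every thread shares the same key namespace, each worker that created a semaphore should have it destroyed by the time pthread_join() returns, so no explicit cleanup call is needed in the worker itself.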
But on NuttX it resulted in the semaphores not being destroyed. I'll try to explain why.
The Wi-Fi library operates in a dedicated Kernel Thread, named wifi. But the pthread_key_t and the destructor for the semaphores were allocated to the Thread Local Storage of the init thread.
Every network-related request from the application will be handled by the wifi kernel thread and its child pthreads. The issue is that those child pthreads do not belong to the same Task Group as the init thread, which is the one whose TLS area contains the semaphore destructor.
So the catch here is that NuttX provides this process-like abstraction which segregates pthreads created from different tasks. So a pthread created from Task B won't be able to share keys of type pthread_key_t with another pthread from Task A.
That's it, now the leak is fixed!