LWN's Latest Articles (No Paywall) About Linux Kernel
-
Patching until the COWs come home (part 2)
Part 1 of this series described the copy-on-write (COW) mechanism used to avoid unnecessary copying of pages in memory, then went into the details of a bug in that mechanism that could result in the disclosure of sensitive data. A patch written by Linus Torvalds and merged for the 5.8 kernel appeared to fix that problem without unfortunate side effects elsewhere in the system. But COW is a complicated beast and surprises are not uncommon; this particular story was nowhere near as close to an end as had been thought.
Torvalds's expectations quickly turned out to be overly optimistic. In August 2020, a bug was reported by Peter Xu; it affected userfaultfd(), which is a subsystem for handling page faults in a user-space process. This mechanism allows the handling process to (among other things) write-protect ranges of memory and be notified of attempts to write to that range. One use case for this feature is to prevent pages from being modified while the monitoring process writes their contents to secondary storage. That write can, however, result in a read-only get_user_pages() (GUP) call on the write-protected pages, which should be fine. Remember, though, that Torvalds's fix worked by changing read-only get_user_pages() calls to look like calls for write access; this was done to force the breaking of COW references on the pages in question. In the userfaultfd() case, that generates an unexpected write fault in the monitoring process, with the result that this process hangs.
The initial version of Xu's fix went in the direction of more fine-grained rules for breaking COW by GUP, as had been anticipated in the original fix, and added some userfaultfd()-specific handling. But during the discussion, Torvalds instead proposed a completely different approach, which resulted in another patch set from Xu. These patches essentially revert Torvalds's change and abandon the approach of always breaking COW for GUP calls. Instead, do_wp_page(), which handles write faults to a write-protected page, is modified by commit 09854ba94c6a ("mm: do_wp_page() simplification") to more strictly check if the page is shared by multiple processes.
-
Lockless patterns: some final topics
So far, this series has covered five common lockless patterns in the Linux kernel; those are probably the five that you will most likely encounter when working on Linux. Throughout this series, some details have been left out and some simplifications were made in the name of clarity. In this final installment, I will sort out some of these loose ends and try to answer what is arguably the most important question of all: when should you use the lockless patterns that have been described here?
[...]
ions are. In these cases, applying lockless techniques to the fast path can be valuable.For example, you could give each thread a queue of requests from other threads and manage them through single-consumer linked lists. Perhaps you can trigger the processing of requests using the cross-thread notification pattern from the article on full memory barriers. However, these techniques only make sense because the design of the whole system supports them. In other words, in a system that is designed to avoid scalability bottlenecks, common sub-problems tend to arise and can often be solved efficiently using the patterns that were presented here.
When seeking to improve the scalability of a system with lockless techniques, it is also important to distinguish between lock-free and wait-free algorithms. Lock-free algorithms guarantee that the system as a whole will progress, but do not guarantee that each thread will progress; lock-free algorithms are rarely fair, and if the number of operations per second exceeds a certain threshold, some threads might end up failing so often that the result is a livelock. Wait-free algorithms additionally ensure per-thread progress. Usually this comes with a significant price in terms of complexity, though not always; for example message passing and cross-thread notification are both wait-free.
Looking at the Linux llist primitives, llist_add() is lock-free; on the consumer side, llist_del_first() is lock-free, while llist_del_all() is wait-free. Therefore, llist may not be a good choice if many producers are expected to contend on calls to llist_add(); and using llist_del_all() is likely better than llist_del_first() unless constant-time consumption is an absolute requirement. For some architectures, the instruction set does not allow read-modify-write operations to be written as wait-free code; if that is the case, llist_del_all() will only be lock-free (but still preferable, because it lets the consumer perform fewer accesses to the shared data structure).
In any case, the definitive way to check the performance characteristics of your code is to benchmark it. Intuition and knowledge of some well-known patterns can guide you in both the design and the implementation phase, but be ready to be proven wrong by the numbers.
-
GDB and io_uring
A problem reported when attaching GDB to programs that use io_uring has led to a flurry of potential solutions, and one that was merged into Linux 5.12-rc5. The problem stemmed from a change made in the 5.12 merge window to how the threads used by io_uring were created, such that they became associated with the process using io_uring. Those "I/O threads" were treated specially in the kernel, but that led to the problem with GDB (and likely other ptrace()-using programs). The solution is to treat them like other threads because it turned out that trying to make them special caused more problems than it solved.
Stefan Metzmacher reported the problem to the io-uring mailing list on March 20. He tried to attach GDB to the process of a program using io_uring, but the debugger went "into an endless loop because it can't attach to the io_threads". PF_IO_WORKER threads are used by io_uring for operations that might block; he followed up the bug report with two patch sets that would hide these threads in various ways. The idea behind hiding them is that if GDB cannot see the threads, it will not attempt to attach to them. Prior to 5.12, the threads existed but were not associated with the io_uring-using process, so GDB would not see them.
It is, of course, less than desirable for developers to be unable to run a debugger on code that uses io_uring, especially since io_uring support in their application is likely to be relatively new, thus it may need more in the way of debugging. The maintainer of the io_uring subsystem, Jens Axboe, quickly stepped in to help Metzmacher solve the problem. Axboe posted a patch set that included a way to hide the PF_IO_WORKER threads, along with some tweaks to the signal handling for these threads; in particular, he removed the ability for them to receive signals at all.
- Login or register to post comments
- Printer-friendly version
- 2670 reads
- PDF version
More in Tux Machines
- Highlights
- Front Page
- Latest Headlines
- Archive
- Recent comments
- All-Time Popular Stories
- Hot Topics
- New Members
digiKam 7.7.0 is releasedAfter three months of active maintenance and another bug triage, the digiKam team is proud to present version 7.7.0 of its open source digital photo manager. See below the list of most important features coming with this release. |
Dilution and Misuse of the "Linux" Brand
|
Samsung, Red Hat to Work on Linux Drivers for Future TechThe metaverse is expected to uproot system design as we know it, and Samsung is one of many hardware vendors re-imagining data center infrastructure in preparation for a parallel 3D world. Samsung is working on new memory technologies that provide faster bandwidth inside hardware for data to travel between CPUs, storage and other computing resources. The company also announced it was partnering with Red Hat to ensure these technologies have Linux compatibility. |
today's howtos
|
Recent comments
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago