Linux Kernel: GPU Blobs, 5.8, 5.9 and ARM32 in Action

  • Netgpu and the hazards of proprietary kernel modules

    On its face, the netgpu patch set appears to add a useful feature: the ability to copy network data directly between a network adapter and a GPU without moving it through the host CPU. This patch set has quickly become an example of how not to get work into the kernel, though; it has no chance of being merged in anything like its current form and has created a backlash designed to keep modules like it from ever working in mainline kernels. It all comes down to one fundamental mistake: basing kernel work on a proprietary kernel module.
    The use case for netgpu appears to be machine-learning applications that consume large amounts of data. The processing of this data is offloaded to a GPU for performance reasons. That GPU must be fed a stream of data, though, that comes from elsewhere on the network; this data follows the usual path of first being read into main memory, then written out to the GPU. The extra copy hurts, as does the memory-bus traffic and the CPU time needed to manage this data movement.

    This overhead could be significantly reduced if the network adapter were to write the data directly into the GPU's memory, which is accessible via the PCI bus. A suitably capable network adapter could place packet data in GPU memory while writing packet headers to normal host memory; that allows the kernel's network stack to do the protocol processing as usual. The netgpu patch exists to support this mode of operation, seemingly yielding improved performance at the cost of losing some functionality; anything that requires looking at the packet payload is going to be hard to support if that data is routed directly to GPU memory.
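
    The header/payload split described above can be pictured with a toy model. This is only a sketch under loud assumptions: it uses no real netgpu, driver, or GPU API, the fixed `HEADER_LEN` and the `receive` helper are hypothetical, and a `bytearray` stands in for GPU memory.

```python
HEADER_LEN = 8  # hypothetical fixed header size, chosen only for this toy model


def receive(packets, host_headers, gpu_memory):
    """Split each packet: header to host memory, payload to 'GPU' memory."""
    for packet in packets:
        header, payload = packet[:HEADER_LEN], packet[HEADER_LEN:]
        offset = len(gpu_memory)          # where this payload lands
        gpu_memory.extend(payload)        # payload never touches host_headers
        # The host side keeps only the header and a reference to the payload.
        host_headers.append((header, offset, len(payload)))


host_headers = []          # stand-in for normal host memory
gpu_memory = bytearray()   # stand-in for GPU memory reachable over PCI
receive([b"HDR00001" + b"abc", b"HDR00002" + b"defgh"], host_headers, gpu_memory)

for header, offset, length in host_headers:
    print(header, offset, length)
```

    In this model the protocol code sees only `host_headers`, which is why anything that needs to inspect the payload becomes harder to support.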

  • Some statistics from the 5.8 kernel cycle

    Linus Torvalds released the 5.8 kernel on August 2, concluding another nine-week development cycle. By the time the work was done, 16,306 non-merge changesets had been pulled into the mainline repository for this release. That happens to be a record, beating the previous record holder (4.9, released in December 2016) by 92 changesets. It was, in other words, a busy development cycle. It's time for our traditional look into where that work came from to see what might be learned.

    A total of 1,991 developers contributed to 5.8, which is another record; 304 of those developers appeared for the first time in this cycle. The community added over 924,000 lines of code and removed around 371,000 for a net growth of over 553,000 lines of code.
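
    As a quick check of the arithmetic behind these figures, the short sketch below recomputes the derived numbers. The inputs are the rounded values quoted above ("over 924,000", "around 371,000"), so the results are approximate; the variable names are only illustrative.

```python
# Figures as quoted in the text; the line counts are rounded there.
changesets_5_8 = 16_306
record_margin = 92          # 5.8 beat the 4.9 record by this many changesets
developers = 1_991
first_time = 304
lines_added = 924_000       # "over 924,000"
lines_removed = 371_000     # "around 371,000"

previous_record = changesets_5_8 - record_margin
net_growth = lines_added - lines_removed
first_time_share = first_time / developers

print(f"4.9 changesets implied by the margin: {previous_record}")
print(f"net growth: about {net_growth} lines")
print(f"first-time contributors: {first_time_share:.1%}")
```

    The margin implies 16,214 changesets for 4.9, the net growth matches the quoted 553,000 lines, and roughly 15% of the contributors were first-timers.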

  • FUSE Read/Write Passthrough Updated For Much Better File-System Performance

    Of the various criticisms of FUSE for implementing file systems in user space, one of the most common is that performance is generally much lower than with a proper in-kernel file-system driver. But with the FUSE passthrough functionality that continues to be worked on, there is the potential for much better FUSE file-system performance.

    The ongoing FUSE passthrough work lets file reads and writes bypass the user-space FUSE daemon, avoiding overhead that is at times unnecessary. When operating in FUSE_PASSTHROUGH mode, the daemon can choose, on a per-file basis, to open a file in passthrough mode, in which all read and write operations are forwarded by the kernel directly to the lower file system rather than to the FUSE daemon running in user space.
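
    The per-file dispatch can be illustrated with a small model. This is a conceptual sketch only: it does not use the real FUSE protocol or any libfuse call, and the `LowerFile`, `Daemon`, and `Kernel` classes and their methods are hypothetical stand-ins invented for the example.

```python
class LowerFile:
    """Stand-in for a file on the lower (backing) file system."""

    def __init__(self, data):
        self.data = bytes(data)

    def read(self, offset, size):
        return self.data[offset:offset + size]


class Daemon:
    """Stand-in for the user-space FUSE daemon."""

    def __init__(self, passthrough_paths):
        self.passthrough_paths = set(passthrough_paths)
        self.reads_handled = 0

    def open(self, path):
        # The daemon decides, per file, whether to enable passthrough.
        return path in self.passthrough_paths

    def read(self, lower, offset, size):
        self.reads_handled += 1  # models a round trip through user space
        return lower.read(offset, size)


class Kernel:
    """Stand-in for the kernel side that routes read requests."""

    def __init__(self, daemon):
        self.daemon = daemon
        self.open_files = {}

    def open(self, path, lower):
        self.open_files[path] = (lower, self.daemon.open(path))

    def read(self, path, offset, size):
        lower, passthrough = self.open_files[path]
        if passthrough:
            return lower.read(offset, size)   # daemon is bypassed
        return self.daemon.read(lower, offset, size)


daemon = Daemon(passthrough_paths={"/fast"})
kernel = Kernel(daemon)
kernel.open("/fast", LowerFile(b"passthrough data"))
kernel.open("/slow", LowerFile(b"daemon data"))
fast = kernel.read("/fast", 0, 11)
slow = kernel.read("/slow", 0, 6)
print(fast, slow, daemon.reads_handled)
```

    Only the read of `/slow` reaches the daemon in this model; the open call still goes through it, since that is where the per-file decision is made.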

  • Navi 2 Fixes, Other Graphics/Display Fixes Sent In For Linux 5.9

    Following all of the feature updates to the open-source GPU/DRM drivers for Linux 5.9, which included a lot of new material, the first batch of fixes has now been sent in for mainline to address early fallout from these many changes.

    Ahead of the Linux 5.9-rc1 release this weekend, an initial batch of Direct Rendering Manager fixes was sent out on Thursday.

  • How the ARM32 Linux kernel decompresses

    This is intended as a comprehensive rundown of how the Linux kernel self-decompresses on ARM 32-bit legacy systems. All machines under arch/arm/* use this method if they are booted using a compressed kernel, and most of them are using compressed kernels.

  • Walleij: How the ARM32 Linux kernel decompresses

    For those who are into the details: here is a step-by-step guide from Linus Walleij through the process of decompressing an Arm kernel and getting it ready to boot.

More in Tux Machines

KDE Plasma 5.18.6 LTS Brings WireGuard VPN, Wayland, and HiDPI Improvements

The KDE Project announced today the general availability of KDE Plasma 5.18.6 LTS as the sixth maintenance update to the long-term supported KDE Plasma 5.18 LTS desktop environment series. KDE Plasma 5.18.6 LTS is here almost five months after the KDE Plasma 5.18.5 update and brings a total of 36 changes that add various improvements to some of the core components and apps of the desktop environment in an attempt to keep the Plasma 5.18 LTS series stable, secure and reliable.

Mesa 20.2.0

  • mesa 20.2.0
    Hi list,
    
    After a long wait, mesa 20.2.0 is now available. This is the first stable
    release of the series, but it's also been a very long time since the last
    release, and as such I'd like to reiterate that those looking for the most
    stable experience will likely want to wait for mesa 20.2.1.
    
    I'm back to the office and finally getting back in the swing of things after a
    long vacation, so expect more regular releases for the 20.2 series from here on
    out.
    
    shortlog
    ========
    
    Alyssa Rosenzweig (1):
          pan/bit: Set d3d=true for CMP tests
    
    Andrey Vostrikov (1):
          egl/x11: Free memory allocated for reply structures on error
    
    Bas Nieuwenhuizen (7):
          radv: Fix threading issue with submission refcounts.
          radv: Avoid deadlock on bo_list.
          spirv: Deal with glslang not setting NonUniform on constructors.
          radeonsi: Work around Wasteland 2 bug.
          spirv: Deal with glslang bug not setting the decoration for stores.
          ac/surface: Fix depth import on GFX6-GFX8.
          st/mesa: Deal with empty textures/buffers in semaphore wait/signal.
    
    Boris Brezillon (1):
          spirv: Add a vtn_get_mem_operands() helper
    
    Danylo Piliaiev (5):
          intel/compiler: Fix pointer arithmetic when reading shader assembly
          glsl: Eliminate assigments to out-of-bounds elements of vector
          nir/lower_io: Eliminate oob writes and return zero for oob reads
          nir/large_constants: Eliminate out-of-bounds writes to large constants
          nir/lower_samplers: Clamp out-of-bounds access to array of samplers
    
    Dave Airlie (2):
          llvmpipe: include gallivm perf flags in shader cache.
          gallivm: disable brilinear for lod bias and explicit lod.
    
    Dylan Baker (7):
          .pick_status.json: Update to ef980ac0c1cd65993ba0c1d20e1c09b45bfef99d
          fix: gallivm: disable brilenear for lod bias and explicit lod.
          .pick_status.json: Update to a1f46d7b6943699e5efb60fbcfdd1450db85adb1
          amd/ac_surface: convert tabs to 3 spaces
          .pick_status.json: Update to 90b98c06493f8a9759e5496d5ec91fb60edf7b92
          .pick_status.json: Update to 472a20c5fc0feda0f074b4ff95fd7c7a6305c8cd
          VERSION: bump for 20.2.0 release
    
    Eric Anholt (4):
          gallium/tgsi_exec: Fix up NumOutputs counting
          freedreno: Make the pack struct have a .qword for wide addresses.
          turnip: Fix truncation of CS shader iovas to 32 bits.
          turnip: Fix truncation of iovas to 32 bits in queries.
    
    Eric Engestrom (1):
          meson: drop leftover PTHREAD_SETAFFINITY_IN_NP_HEADER
    
    Erik Faye-Lund (1):
          mesa: handle GL_FRONT after translating to it
    
    Icecream95 (1):
          pan/mdg: Fix spilling of non-32-bit types
    
    Jason Ekstrand (6):
          intel/fs: Don't copy-propagate stride=0 sources into ddx/ddy
          iris: Re-emit push constants if we have a varying workgroup size
          spirv: Run repair_ssa if there are discard instructions
          nir: More NIR_MAX_VEC_COMPONENTS fixes
          intel/fs/swsb: SCHEDULING_FENCE only emits SYNC_NOP
          radeonsi: Only call nir_lower_var_copies at the end of the opt loop
    
    Jesse Natalie (2):
          nir: More NIR_MAX_VEC_COMPONENTS fixes
          glsl_type: Add packed to structure type comparison for hash map
    
    Jonathan Gray (6):
          anv: use os_get_total_physical_memory()
          util/os_misc: add os_get_available_system_memory()
          anv: use os_get_available_system_memory()
          util/os_misc: os_get_available_system_memory() for OpenBSD
          radv: remove seccomp includes
          vulkan: make VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT conditional
    
    Jordan Justen (1):
          anv, iris: Set MediaSamplerDOPClockGateEnable for gen12+
    
    Karol Herbst (1):
          spirv: extract switch parsing into its own function
    
    Lionel Landwerlin (3):
          intel/perf: store query symbol name
          intel/perf: fix raw query kernel metric selection
          intel/compiler: fixup Gen12 workaround for array sizes
    
    Marcin Ślusarz (4):
          anv: refresh cached current batch bo after emitting some commands
          anv: fix minor gen_ioctl(I915_PERF_IOCTL_CONFIG) error handling issue
          intel/perf: split load_oa_metrics
          intel/perf: export performance counters sorted by [group|set] and name
    
    Marek Olšák (2):
          ac/llvm: fix unaligned VS input loads on gfx10.3
          Revert "ac: generate FMA for inexact instructions for radeonsi"
    
    Mauro Rossi (1):
          android: freedreno/common: add libmesa_git_sha1 static dependency
    
    Michel Dänzer (1):
          ci: Use ignore_scheduled_pipelines anchor in .radeonsi-rules
    
    Michel Zou (1):
          swr: fix build with mingw
    
    Mike Blumenkrantz (1):
          zink: reorder create_stream_output_target to fix failure case leak
    
    Nanley Chery (2):
          iris: Fix aux assertion in resource_get_handle
          blorp: Fix alignment test for HIZ_CCS_WT fast-clears
    
    Pierre-Eric Pelloux-Prayer (9):
          mesa/st: introduce PIPE_CAP_NO_CLIP_ON_COPY_TEX
          radeonsi: enable PIPE_CAP_NO_CLIP_ON_COPY_TEX
          ac/llvm: add option to clamp division by zero
          radeonsi,driconf: add clamp_div_by_zero option
          radeonsi: use radeonsi_clamp_div_by_zero for SPECviewperf13, Road Redemption
          glsl: fix per_vertex_accumulator::fields size
          r600/uvd: set dec->bs_ptr = NULL on unmap
          radeon/vcn: set dec->bs_ptr = NULL on unmap
          mesa: fix glUniform* when a struct contains a bindless sampler
    
    Pierre-Loup A. Griffais (2):
          radv: fix null descriptor for dynamic buffers
          radv: fix vertex buffer null descriptors
    
    Qiang Yu (4):
          radeonsi: fix syncobj wait timeout
          radeonsi: fix user fence space when MCBP is enabled
          radeonsi: fix max syncobj wait timeout
          radeonsi: fix user fence GPU address
    
    Rhys Perry (7):
          aco: fix byte_align_scalar for 3 dword vectors
          aco: fix one-off error in Operand(uint16_t)
          nir/opt_if: fix opt_if_merge when destination branch has a jump
          aco: fix v_writelane_b32 with two sgprs
          aco: don't apply constant to SDWA on GFX8
          radv: initialize with expanded cmask if the destination layout needs it
          radv,aco: fix reading primitive ID in FS after TES
    
    Samuel Pitoiset (3):
          aco: handle unaligned loads on GFX10.3
          spirv: fix emitting switch cases that directly jump to the merge block
          radv: fix transform feedback crashes if pCounterBufferOffsets is NULL
    
    Timur Kristóf (1):
          aco: Fix emit_boolean_exclusive_scan in wave32 mode.
    
    Tony Wasserka (3):
          radv: Fix various non-critical integer overflows
          aco: Fix integer overflows when emitting parallel copies during RA
          amd/common: Fix various non-critical integer overflows
    
    Vinson Lee (4):
          freedreno: Fix file descriptor leak.
          svga: Fix unused printf argument.
          freedreno: Check file descriptor before write.
          panfrost: Delete debug allocated syncobj.
    
    
    git tag: mesa-20.2.0
    
  • Mesa 20.2 Released With RADV ACO By Default, Initial RDNA2 Graphics Support

    Mesa 20.2 has managed to release just before the end of the quarter. This Mesa Q3'2020 graphics driver update is coming out about one month behind schedule, but the wait is worthwhile given the many open-source OpenGL and Vulkan driver updates. There is new GPU support, RADV now uses the ACO shader compiler by default, LLVMpipe has much better OpenGL support, and there are new Vulkan extensions and much more.

  • Open source graphics drivers get a boost with Mesa 20.2.0 out now

    The latest and greatest in open source graphics drivers has been released with Mesa 20.2.0, although you should wait on it if you're after a stable experience. As always, the Mesa team suggests waiting for at least the first bug-fix release, Mesa 20.2.1, which is usually out within a few weeks. Developer Dylan Baker, who announced the new release, said to expect more regular releases for the 20.2 series now that they're back from a long vacation. What's new? Lots, as always: support for new Vulkan extensions, support for new GPUs including initial work for AMD's upcoming RDNA 2 (noted as "gfx10.3"), expanded GLES 3.2 and OpenGL 4.5 support for LLVMpipe, and lots of work on the Panfrost driver for Mali GPUs. You can find some release notes for Mesa 20.2.0 here.

Present Slides in Linux Terminal With This Nifty Python Tool

There is so much amusing and fun stuff you can do in the terminal. Making and presenting slides is just one example.
