Planet Debian - https://planet.debian.org/
Updated: 4 hours 24 min ago

Dirk Eddelbuettel: Announcing ‘Introductions to Emacs Speaks Statistics’

Friday 16th of April 2021 12:49:00 AM

A new website containing introductory videos and slide decks is now available for your perusal at ess-intro.github.io. It provides a series of introductions to the excellent Emacs Speaks Statistics (ESS) mode for the Emacs editor.

This effort started following my little tips, tricks, tools and toys series of short videos and slide decks “for the command-line and R, broadly-speaking”, which I had mentioned to friends curious about Emacs and on the ess-help mailing list. And lo and behold, over the fall and winter sixteen of us came together in one GitHub org and are now proud to present the initial batch of videos about first steps, installing, using it with Spacemacs, customizing, and org-mode with ESS. More may hopefully follow; the group is open and you too can join: see the main repo and its wiki.

This is in fact the initial announcement post, yet it is flattering that we have already received over 350 views, four comments and twenty-one likes.

We hope it proves to be a useful starting point for some of you. The Emacs editor is quite uniquely powerful, and coupled with ESS makes for a rather nice environment for programming with data, or analysing, visualising, exploring, … data. But we are not zealots: there are many editors and environments under the sun, and most people are perfectly happy with their choice, which is wonderful. We also like ours, and sometimes someone asks ‘tell me more’ or ‘how do I start’. We hope this series satisfies this initial curiosity and takes it from here.

With that, my thanks to Frédéric, Alex, Tyler and Greg for the initial batch, and to everybody else in the org who chipped in with comments and suggestions. We hope it grows from here, so happy Emacsing with R from us!

Ian Jackson: Dreamwidth blocking many RSS readers and aggregators

Thursday 15th of April 2021 11:08:44 PM
There is a serious problem with Dreamwidth, which is impeding access for many RSS reader tools.

This started at around 0500 UTC on Wednesday morning, according to my own RSS reader cron job. A friend found #43443 in the DW ticket tracker, where a user of a minority web browser found they were blocked.

Local tests demonstrated that Dreamwidth was blocking based on the HTTP User-Agent header, rejecting all user-agents not specifically permitted. Today this rule has been relaxed and unknown user-agents are permitted, but user-agents for general HTTP client libraries are still blocked.

I'm aware of three unresolved tickets about this: #43444 #43445 #43447

We're told there by a volunteer member of Dreamwidth's support staff that this has been done deliberately for "blocking automated traffic". I'm sure the volunteer is just relaying what they've been told by whoever is struggling to deal with what I suppose is probably a spam problem. But it's still rather unsatisfactory.

I have suggested in my own ticket that a good solution might be to apply the new block only to posting and commenting (eg, maybe, by applying it only to HTTP POST requests). If the problem is indeed spam then that ought to be good enough, and would still let RSS readers work properly.

I'm told that this new blocking has been done by "implementing" (actually, configuring or enabling) "some AWS rules for blocking automated traffic". I don't know what facilities AWS provides. This kind of helplessness is of course precisely the kind of thing that the Free Software movement is against and precisely the kind of thing that proprietary services like AWS produce.

I don't know if this blog entry will appear on planet.debian.org and in other people's readers and aggregators. I think it will at least be seen by other Dreamwidth users. I thought I would post here in the hope that other Dreamwidth users might be able to help get this fixed. At the very least, other Dreamwidth blog owners need to know that many of their readers may not be seeing their posts at all.

If this problem is not fixed I will have to move my blog. One of the main points of having a blog is publishing it via RSS. RSS readers are of course based on general http client libraries and many if not most RSS readers have not bothered to customise their user-agent. Those are currently blocked.
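
For illustration, a reader built on a general-purpose HTTP library can at least send a descriptive User-Agent header. A sketch with curl against a hypothetical journal feed (the reader name and URL are made up, and whether Dreamwidth's current rules would accept such an agent is a separate question):

$ curl -A "MyFeedReader/1.0 (+https://example.org/feedreader)" https://example.dreamwidth.org/data/rss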




Russell Coker: Basics of Linux Kernel Debugging

Wednesday 14th of April 2021 02:27:57 PM

Firstly a disclaimer, I’m not an expert on this and I’m not trying to instruct anyone who is aiming to become an expert. The aim of this blog post is to help someone who has a single kernel issue they want to debug as part of doing something that’s mostly not kernel coding. I welcome comments about the second step to kernel debugging for the benefit of people who need more than this (which might include me next week). Also suggestions for people who can’t use a kvm/qemu debugger would be good.

Below is a command to run qemu with a GDB server. It should be run from the Linux kernel source directory. You can add other qemu options for a block device and virtual networking if necessary, but the bug I encountered gave an oops from the initrd so I didn’t need to go further. The “nokaslr” is to avoid address space randomisation, which deliberately makes debugging tasks harder (from a certain perspective debugging a kernel and compromising a kernel are fairly similar). Loading the bzImage is fine; gdb can map that to the different file it looks at later on.

qemu-system-x86_64 -kernel arch/x86/boot/bzImage -initrd ../initrd-$KERN_VER -curses -m 2000 -append "root=/dev/vda ro nokaslr" -gdb tcp::1200

The command to run GDB is “gdb vmlinux”; at the GDB prompt you can run the command “target remote localhost:1200” to connect to the GDB server on port 1200. Note that there is nothing special about port 1200, it was given in an example I saw and is as good as any other port. It is important that you run GDB against the “vmlinux” file in the main directory, not any of the several stripped and packaged files; GDB can’t handle a bzImage file, but that’s OK, it ends up much the same in RAM.

When the “target remote” command is processed the kernel will be suspended by the debugger; if you are looking for a bug early in the boot you may need to be quick about this. Using “qemu-system-x86_64” instead of “kvm” slows things down and can help in that regard. The bug I was hunting happened 1.6 seconds after kernel load with KVM and 7.8 seconds after kernel load with qemu. I am not aware of all the implications of the kvm vs qemu decision on debugging. If your bug is a race condition then trying both would be a good strategy.

After the “target remote” command you can debug the kernel just like any other program.

If you put a breakpoint on print_modules() that will catch the operation of printing an Oops which can be handy.
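
Putting those pieces together, a minimal session could look like the sketch below (the port matches the qemu command above; print_modules() is the function the kernel calls while printing an Oops):

gdb vmlinux
(gdb) target remote localhost:1200
(gdb) break print_modules
(gdb) continue
... the guest runs until the Oops is printed, then gdb stops in print_modules() ...
(gdb) bt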

Related posts:

  1. Finding Corrupt Files that cause a Kernel Error There is a BTRFS bug in kernel 3.13 which is...
  2. Kernel Security vs Uptime For best system security you want to apply kernel security...
  3. Debugging as a Demonstration Sport I was watching So You Think You Can Dance [1]...

Dirk Eddelbuettel: RcppArmadillo 0.10.4.0.0 on CRAN: New Upstream ‘Plus’

Wednesday 14th of April 2021 12:22:00 AM

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language–and is widely used by (currently) 852 other packages on CRAN.

This new release brings us the just-released Armadillo 10.4.0. Upstream moves at a speed that is a little faster than the cadence CRAN likes. We released RcppArmadillo 0.10.2.2.0 on March 9, and upstream 10.3.0 came out shortly thereafter. We aim to accommodate CRAN with (roughly) monthly (or less frequent) releases, so by the time we were ready 10.4.0 had just come out.

As it turns out, the full testing had a benefit. Among the (currently) 852 CRAN packages using RcppArmadillo, two were failing tests. This is due to a subtle but important point. Early on we realized that it would be beneficial if the standard R control over random-number creation and seeding affected Armadillo too, which Conrad accommodated kindly with an optional RNG interface—which RcppArmadillo supplies. With recent changes he made, the normally-distributed draws the R side sees (via the Armadillo interface) changed, which led to the two failures. All hail unit tests. So I mentioned this to Conrad, and with the usual Chicago-Brisbane time difference, late my evening a fix was in my inbox. The CRAN upload was then halted as I had missed that, due to other changes he had made, random draws from a Gamma would now call std::rand(), which CRAN flags. Another email to Brisbane, another late (one-line) fix back, and all was good. We still encountered one package with an error but flagged this as internal to that package’s setup, so Uwe let RcppArmadillo onto CRAN. I contacted that package’s maintainer, who was very receptive, and a change should be forthcoming. So with all that we have 0.10.4.0.0 on CRAN, giving us Armadillo 10.4.0.

The full set of changes follows. As the interim RcppArmadillo 0.10.3.0.0 release (containing Armadillo 10.3.0) was not uploaded to CRAN, its changes are included too.

Changes in RcppArmadillo version 0.10.4.0.0 (2021-04-12)
  • Upgraded to Armadillo release 10.4.0 (Pressure Cooker)

    • faster handling of triangular matrices by log_det()

    • added log_det_sympd() for log determinant of symmetric positive definite matrices

    • added ARMA_WARN_LEVEL configuration option, to control the degree of emitted warning messages

    • reduced the default degree of warning messages, so that failed decompositions, failed saving/loading, etc, no longer emit warnings

  • Apply one upstream correction for arma::randn draws when using the alternative (here R) generator, and for arma::randg.

Changes in RcppArmadillo version 0.10.3.0.0 (2021-03-10)
  • Upgraded to Armadillo release 10.3 (Sunrise Chaos)

    • faster handling of symmetric positive definite matrices by pinv()

    • expanded .save() / .load() for dense matrices to handle coord_ascii format

    • for out of bounds access, element accessors now throw the more nuanced std::out_of_range exception, instead of only std::logic_error

    • improved quality of random numbers

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

François Marier: Deleting non-decryptable restic snapshots

Tuesday 13th of April 2021 03:19:03 AM

Due to what I suspect is disk corruption caused by a faulty RAM module or network interface on my GnuBee, my restic backup failed with the following error:

$ restic check
using temporary cache in /var/tmp/restic-tmp/restic-check-cache-854484247
repository b0b0516c opened successfully, password is correct
created new cache in /var/tmp/restic-tmp/restic-check-cache-854484247
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
error for tree 4645312b: decrypting blob 4645312b443338d57295550f2f4c135c34bda7b17865c4153c9b99d634ae641c failed: ciphertext verification failed
error for tree 2c3248ce: decrypting blob 2c3248ce5dc7a4bc77f03f7475936041b6b03e0202439154a249cd28ef4018b6 failed: ciphertext verification failed
Fatal: repository contains errors

I started by locating the snapshots which make use of these corrupt trees:

$ restic find --tree 4645312b
repository b0b0516c opened successfully, password is correct
Found tree 4645312b443338d57295550f2f4c135c34bda7b17865c4153c9b99d634ae641c
 ... path /usr/include/boost/spirit/home/support/auxiliary
 ... in snapshot 41e138c8 (2021-01-31 08:35:16)
Found tree 4645312b443338d57295550f2f4c135c34bda7b17865c4153c9b99d634ae641c
 ... path /usr/include/boost/spirit/home/support/auxiliary
 ... in snapshot e75876ed (2021-02-28 08:35:29)

$ restic find --tree 2c3248ce
repository b0b0516c opened successfully, password is correct
Found tree 2c3248ce5dc7a4bc77f03f7475936041b6b03e0202439154a249cd28ef4018b6
 ... path /usr/include/boost/spirit/home/support/char_encoding
 ... in snapshot 41e138c8 (2021-01-31 08:35:16)
Found tree 2c3248ce5dc7a4bc77f03f7475936041b6b03e0202439154a249cd28ef4018b6
 ... path /usr/include/boost/spirit/home/support/char_encoding
 ... in snapshot e75876ed (2021-02-28 08:35:29)

and then deleted them:

$ restic forget 41e138c8 e75876ed
repository b0b0516c opened successfully, password is correct
[0:00] 100.00%  2 / 2 files deleted

$ restic prune
repository b0b0516c opened successfully, password is correct
counting files in repo
building new index for repo
[13:23] 100.00%  58964 / 58964 packs
repository contains 58964 packs (1417910 blobs) with 278.913 GiB
processed 1417910 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 20 snapshots
[1:15] 100.00%  20 / 20 snapshots
found 1364852 of 1417910 data blobs still in use, removing 53058 blobs
will remove 0 invalid files
will delete 942 packs and rewrite 1358 packs, this frees 6.741 GiB
[10:50] 31.96%  434 / 1358 packs rewritten
hash does not match id: want 9ec955794534be06356655cfee6abe73cb181f88bb86b0cd769cf8699f9f9e57, got 95d90aa48ffb18e6d149731a8542acd6eb0e4c26449a4d4c8266009697fd1904
github.com/restic/restic/internal/repository.Repack
        github.com/restic/restic/internal/repository/repack.go:37
main.pruneRepository
        github.com/restic/restic/cmd/restic/cmd_prune.go:242
main.runPrune
        github.com/restic/restic/cmd/restic/cmd_prune.go:62
main.glob..func19
        github.com/restic/restic/cmd/restic/cmd_prune.go:27
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra/command.go:960
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra/command.go:897
main.main
        github.com/restic/restic/cmd/restic/main.go:98
runtime.main
        runtime/proc.go:204
runtime.goexit
        runtime/asm_amd64.s:1374

As you can see above, the prune command failed due to a corrupt pack and so I followed the process I previously wrote about and identified the affected snapshots using:

$ restic find --pack 9ec955794534be06356655cfee6abe73cb181f88bb86b0cd769cf8699f9f9e57

before deleting them with:

$ restic forget 031ab8f1 1672a9e1 1f23fb5b 2c58ea3a 331c7231 5e0e1936 735c6744 94f74bdb b11df023 dfa17ba8 e3f78133 eefbd0b0 fe88aeb5
repository b0b0516c opened successfully, password is correct
[0:00] 100.00%  13 / 13 files deleted

$ restic prune
repository b0b0516c opened successfully, password is correct
counting files in repo
building new index for repo
[13:37] 100.00%  60020 / 60020 packs
repository contains 60020 packs (1548315 blobs) with 283.466 GiB
processed 1548315 blobs: 129812 duplicate blobs, 4.331 GiB duplicate
load all snapshots
find data that is still in use for 8 snapshots
[0:53] 100.00%  8 / 8 snapshots
found 1219895 of 1548315 data blobs still in use, removing 328420 blobs
will remove 0 invalid files
will delete 6232 packs and rewrite 1275 packs, this frees 36.302 GiB
[23:37] 100.00%  1275 / 1275 packs rewritten
counting files in repo
[11:45] 100.00%  52822 / 52822 packs
finding old index files
saved new indexes as [a31b0fc3 9f5aa9b5 db19be6f 4fd9f1d8 941e710b 528489d9 fb46b04a 6662cd78 4b3f5aad 0f6f3e07 26ae96b2 2de7b89f 78222bea 47e1a063 5abf5c2d d4b1d1c3 f8616415 3b0ebbaa]
remove 23 old index files
[0:00] 100.00%  23 / 23 files deleted
remove 7507 old packs
[0:08] 100.00%  7507 / 7507 files deleted
done

And with 13 of my 21 snapshots deleted, the checks now pass:

$ restic check
using temporary cache in /var/tmp/restic-tmp/restic-check-cache-407999210
repository b0b0516c opened successfully, password is correct
created new cache in /var/tmp/restic-tmp/restic-check-cache-407999210
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
no errors were found

This represents a significant amount of lost backup history, but at least it's not all of it.

Shirish Agarwal: what to write

Tuesday 13th of April 2021 03:14:35 AM

First up, I am alive and well. I have been receiving calls from friends for quite some time, but now that I have become deaf it is a pain, and the hearing aids aren’t all that useful. Moreover, with us finding ourselves sinking lower and lower each and every day, it feels absurd to decide what to write and not write about India. Thankfully, I ran across this piece, which does tell it in far more detail than I ever could. The only interesting and somewhat positive news I had is from the south of India; otherwise these are sad days, especially for the poor. The saddest story is that this time Covid has reached alarming proportions in India and, surprise, surprise, this time the villain for many is my state of Maharashtra, even though it hasn’t received its share of GST proceeds for the last two years. And this was Kerala’s perspective: a different state, a different party, a different political ideology altogether.

Kerala Finance Minister Thomas Issac views on GST, October 22, 2020 Indian Express.

I also briefly note the death of the somewhat liberal film censorship in India, unlike Italy, which abolished film censorship altogether. I don’t really want to spend too much time on how we have become No. 2 in Covid cases in the world, and perhaps in deaths also. Many people still believe in herd immunity but don’t really know what it means. So without taking too much time and effort, I bid adieu. I may post when I’m hopefully feeling emotionally better and stronger.

Russell Coker: Yama

Monday 12th of April 2021 11:00:59 PM

I’ve just set up the Yama LSM module on some of my Linux systems. Yama controls ptrace, which is the debugging and tracing API for Unix systems. The aim is to prevent a compromised process from using ptrace to compromise other processes and cause more damage. In most cases a process which can ptrace another process (which usually means having capability SYS_PTRACE (IE being root) or having the same UID as the target process) can interfere with that process in other ways, such as modifying its configuration and data files. But even so I think it has the potential for making things more difficult for attackers without making the system more difficult to use.

If you put “kernel.yama.ptrace_scope = 1” in sysctl.conf (or write “1” to /proc/sys/kernel/yama/ptrace_scope) then a user process can only trace its child processes. This means that “strace -p” and “gdb -p” will fail when run as non-root, but apart from that everything else will work. Generally “strace -p” (tracing the system calls of another process) is of most use to the sysadmin, who can do it as root. The command “gdb -p” and variants of it are commonly used by developers, so Yama wouldn’t be a good thing on a system that is primarily used for software development.

Another option is “kernel.yama.ptrace_scope = 3” which means no-one can trace and it can’t be disabled without a reboot. This could be a good option for production servers that have no need for software development. It wouldn’t work well for a small server where the sysadmin needs to debug everything, but when dozens or hundreds of servers have their configuration rolled out via a provisioning tool this would be a good setting to include.

See Documentation/admin-guide/LSM/Yama.rst in the kernel source for the details.

When running with capability SYS_PTRACE (IE root shell) you can ptrace anything else and if necessary disable Yama by writing “0” to /proc/sys/kernel/yama/ptrace_scope .

I am enabling mode 1 on all my systems because I think it will make things harder for attackers while not making things more difficult for me.

Also note that SE Linux restricts SYS_PTRACE and also restricts cross-domain ptrace access, so the combination with Yama makes things extra difficult for an attacker.

Yama is enabled in the Debian kernels by default, so it’s very easy to set up for Debian users: just edit /etc/sysctl.d/whatever.conf and it will be enabled on boot, as sketched below.
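
A minimal sketch of that for mode 1 (the file name 99-yama.conf is arbitrary; the second command applies the setting immediately without a reboot):

echo "kernel.yama.ptrace_scope = 1" | sudo tee /etc/sysctl.d/99-yama.conf
sudo sysctl kernel.yama.ptrace_scope=1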

Related posts:

  1. Defense in Depth and Sudo My blog post about logging in as root and whether...
  2. SE Linux and Decrypted Data There is currently a discussion on the Debian-security mailing list...
  3. Secure Boot and Protecting Against Root There has been a lot of discussion recently about the...

Steinar H. Gunderson: Squirrel!

Monday 12th of April 2021 10:30:22 PM

“All comments on this article will now be moderated. The bar to pass moderation will be high, it's really time to think about something else. Did you all see that we have an exciting article on spinlocks?” Poor LWN <3

SPINLOCK!

Clint Adams: Unrivaled technical acumen

Monday 12th of April 2021 07:53:20 PM

Who are the voting members of the Free Software Foundation?

Posted on 2021-04-12 Tags: barks

Russell Coker: Riverdale

Monday 12th of April 2021 12:35:51 PM

I’ve been watching the show Riverdale on Netflix recently. It’s an interesting modern take on the Archie comics. Having watched Josie and the Pussycats in Outer Space when I was younger I was anticipating something aimed towards a similar audience. As solving mysteries and crimes was apparently a major theme of the show I anticipated something along similar lines to Scooby Doo, some suspense and some spooky things, but then a happy ending where criminals get arrested and no-one gets hurt or killed while the vast majority of people are nice. Instead the first episode has a teen being murdered and Ms Grundy being obsessed with 15yo boys and sleeping with Archie (who’s supposed to be 15 but played by a 20yo actor).

Everyone in the show has some dark secret. The filming has a dark theme, the sky is usually overcast and it’s generally gloomy. This is a significant contrast to Veronica Mars which has some similarities in having a young cast, a sassy female sleuth, and some similar plot elements. Veronica Mars has a bright theme and a significant comedy element in spite of dealing with some dark issues (murder, rape, child sex abuse, and more). But Riverdale is just dark. Anyone who watches this with their kids expecting something like Scooby Doo is in for a big surprise.

There are lots of interesting stylistic elements in the show. Lots of clothing and uniform designs that seem to date from the 1940s. It seems like some alternate universe where kids have smartphones and laptops while dressing in the style of the 1940s. One thing that annoyed me was construction workers using tools like sledge-hammers instead of excavators. A society that has smart phones but no earth-moving equipment isn’t plausible.

On the upside there is a racial mix in the show that more accurately reflects American society than the original Archie comics and homophobia is much less common than in most parts of our society. For both race issues and gay/lesbian issues the show treats them in an accurate way (portraying some bigotry) while the main characters aren’t racist or homophobic.

I think it’s generally an OK show and recommend it to people who want a dark show. It’s a good show to watch while doing something on a laptop so you can check Wikipedia for the references to 1940s stuff (like when Bikinis were invented). I’m halfway through season 3, which isn’t as good as the first two; I don’t know if it will get better later in the season or whether I should have stopped after season 2.

I don’t usually review fiction, but the interesting aesthetics of the show made it deserve a review.

Related posts:

  1. Android Screen Saving Just over a year ago I bought a Samsung Galaxy...

Russell Coker: Storage Trends 2021

Monday 12th of April 2021 10:01:43 AM
The Viability of Small Disks

Less than a year ago I wrote a blog post about storage trends [1]. My main point in that post was that disks smaller than 2TB weren’t viable then and 2TB disks wouldn’t be economically viable in the near future.

Now MSY has 2TB disks for $72 and 2TB SSD for $245, saving $173 if you get a hard drive (compared to saving $240 10 months ago). Given the difference in performance and noise 2TB hard drives won’t be worth using for most applications nowadays.

NVMe vs SSD

Last year NVMe prices were very comparable to SSD prices, and I was hoping that trend would continue and SATA SSDs would go away. Now for sizes 1TB and smaller NVMe and SSD prices are very similar, but for 2TB the NVMe prices are twice that of SSD – presumably partly due to poor demand for 2TB NVMe. There are also no NVMe devices larger than 2TB on sale at MSY (a store which caters to home stuff not special server equipment) but SSDs go up to 8TB.

It seems that NVMe is only really suitable for workstation storage and for cache etc on a server. So SATA SSDs will be around for a while.

Small Servers

There are a range of low end servers which support a limited number of disks. Dell has 2 disk servers and 4 disk servers. If one of those had 8TB SSDs you could have 8TB of RAID-1 or 24TB of RAID-Z storage in a low end server. That covers the vast majority of servers (small business or workgroup servers tend to have less than 8TB of storage).

Larger Servers

Anandtech has an article on Seagate’s roadmap to 120TB disks [2]. They currently sell 20TB disks using HAMR technology.

Currently the biggest disks that MSY sells are 10TB for $395, which was also the biggest disk they were selling last year. Last year MSY only sold SSDs up to 2TB in size (larger ones were available from other companies at much higher prices), now they sell 8TB SSDs for $949 (4* capacity increase in less than a year). Seagate is planning 30TB disks for 2023, if SSDs continue to increase in capacity by 4* per year we could have 128TB SSDs in 2023. If you needed a server with 100TB of storage then having 2 or 3 SSDs in a RAID array would be much easier to manage and faster than 4*30TB disks in an array.

When you have a server with many disks you can expect to have more disk failures due to vibration. One time I built a server with 18 disks and took disks from 2 smaller servers that had 4 and 5 disks. The 9 disks which had been working reliably for years started having problems within weeks of running in the bigger server. This is one of the many reasons for paying extra for SSD storage.

Seagate is apparently planning 50TB disks for 2026 and 100TB disks for 2030. If that’s the best they can do then SSD vendors should be able to sell larger products sooner at prices that are competitive. Matching hard drive prices is not required, getting to less than 4* the price should be enough for most customers.

The Anandtech article is worth reading, it mentions some interesting features that Seagate are developing such as having 2 actuators (which they call Mach.2) so the drive can access 2 different tracks at the same time. That can double the performance of a disk, but that doesn’t change things much when SSDs are more than 100* faster. Presumably the Mach.2 disks will be SAS and incredibly expensive while providing significantly less performance than affordable SATA SSDs.

Computer Cases

In my last post I speculated on the appearance of smaller cases designed to not have DVD drives or 3.5″ hard drives. Such cases still haven’t appeared apart from special purpose machines like the NUC that were available last year.

It would be nice if we could get a new industry standard for smaller power supplies. Currently power supplies are expected to be almost 5 inches wide (due to the expectation of a 5.25″ DVD drive mounted horizontally). We need some industry standards for smaller PCs that aren’t like the NUC; the NUC is very nice, but most people who build their own PC need more space than that. I still think that planning on USB DVD drives is the right way to go. I’ve got 4 PCs in my home that are regularly used, and CDs and DVDs are used so rarely that sharing a single DVD drive among all 4 wouldn’t be a problem.

Conclusion

I’m tempted to get a couple of 4TB SSDs for my home server which cost $487 each, it currently has 2*500G SSDs and 3*4TB disks. I would have to remove some unused files but that’s probably not too hard to do as I have lots of old backups etc on there. Another possibility is to use 2*4TB SSDs for most stuff and 2*4TB disks for backups.

I’m recommending that all my clients only use SSDs for their storage. I only have one client with enough storage that disks are the only option (100TB of storage) but they moved all the functions of that server to AWS and use S3 for the storage. Now I don’t have any clients doing anything with storage that can’t be done in a better way on SSD for a price difference that’s easy for them to afford.

Affordable SSD also makes RAID-1 in workstations more viable. 2 disks in a PC is noisy if you have an office full of them and produces enough waste heat to be a reliability issue (most people don’t cool their offices adequately on weekends). 2 SSDs in a PC is no problem at all. As 500G SSDs are available for $73 it’s not a significant cost to install 2 of them in every PC in the office (more cost for my time than hardware). I generally won’t recommend that hard drives be replaced with SSDs in systems that are working well. But if a machine runs out of space then replacing it with SSDs in a RAID-1 is a good choice.

Moore’s law might cover SSDs, but it definitely doesn’t cover hard drives. Hard drives have fallen way behind developments of most other parts of computers over the last 30 years, hopefully they will go away soon.

Related posts:

  1. Storage Trends In considering storage trends for the consumer side I’m looking...
  2. New Storage Developments Eweek has an article on a new 1TB Seagate drive....
  3. Finding Storage Performance Problems Here are some basic things to do when debugging storage...

Jelmer Vernooij: The upstream ontologist

Sunday 11th of April 2021 10:40:00 PM

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor.

The upstream ontologist is a project that extracts metadata about upstream projects in a consistent format. It does this with a combination of heuristics and reading ecosystem-specific metadata files, such as Python’s setup.py and Rust’s Cargo.toml, as well as e.g. scanning README files.

Supported Data Sources

It will extract information from a wide variety of sources, including:

Supported Fields

Fields that it currently provides include:

  • Homepage: homepage URL
  • Name: name of the upstream project
  • Contact: contact address of some sort of the upstream (e-mail, mailing list URL)
  • Repository: VCS URL
  • Repository-Browse: Web URL for viewing the VCS
  • Bug-Database: Bug database URL (for web viewing, generally)
  • Bug-Submit: URL to use to submit new bugs (either on the web or an e-mail address)
  • Screenshots: List of URLs with screenshots
  • Archive: Archive used - e.g. SourceForge
  • Security-Contact: e-mail or URL with instructions for reporting security issues
  • Documentation: Link to documentation on the web
  • Wiki: Wiki URL
  • Summary: one-line description of the project
  • Description: longer description of the project
  • License: Single line license description (e.g. "GPL 2.0") as declared in the metadata[1]
  • Copyright: List of copyright holders
  • Version: Current upstream version
  • Security-MD: URL to markdown file with security policy

All data fields have a “certainty” associated with them (“certain”, “confident”, “likely” or “possible”), which gets set depending on how the data was derived or where it was found. If multiple possible values were found for a specific field, then the value with the highest certainty is taken.

Interface

The ontologist provides a high-level Python API as well as two command-line tools that can write output in two different formats:

For example, running guess-upstream-metadata on dulwich:

% guess-upstream-metadata
<string>:2: (INFO/1) Duplicate implicit target name: "contributing".
Name: dulwich
Repository: https://www.dulwich.io/code/
X-Security-MD: https://github.com/dulwich/dulwich/tree/HEAD/SECURITY.md
X-Version: 0.20.21
Bug-Database: https://github.com/dulwich/dulwich/issues
X-Summary: Python Git Library
X-Description: |
  This is the Dulwich project.
  It aims to provide an interface to git repos (both local and remote) that doesn't call out to git directly but instead uses pure Python.
X-License: Apache License, version 2 or GNU General Public License, version 2 or later.
Bug-Submit: https://github.com/dulwich/dulwich/issues/new

Lintian-Brush

lintian-brush can update DEP-12-style debian/upstream/metadata files that hold information about the upstream project that is packaged, as well as the Homepage field in the debian/control file, based on information provided by the upstream ontologist. By default, it only imports data with the highest certainty; you can override this by specifying the --uncertain command-line flag.
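
For example, to also import metadata that the ontologist is less certain about (a minimal sketch; this assumes it is run from inside an unpacked, Git-maintained Debian source package):

$ lintian-brush --uncertain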

[1] Obviously this won't be able to describe the full licensing situation for many projects. Projects like scancode-toolkit are more appropriate for that.

Vishal Gupta: Sikkim 101 for Backpackers

Sunday 11th of April 2021 02:04:00 PM

Host to Kanchenjunga, the world’s third-highest mountain peak, and the endangered Red Panda, Sikkim is a state in northeastern India. Nestled between Nepal, Tibet (China), Bhutan and West Bengal (India), the state offers a smorgasbord of cultures and cuisines. Given its location, it’s hardly surprising that the old spice route meanders through western Sikkim, connecting Lhasa with the ports of Bengal. Although the latter could also be attributed to cardamom (kali elaichi), a perennial herb native to Sikkim, of which the state is the second-largest producer globally. Lastly, having travelled across and lived in India all my life, I can confidently say Sikkim is one of the cleanest & safest regions in India, making it ideal for first-time backpackers.

Brief History
  • 17th century: The Kingdom of Sikkim is founded by the Namgyal dynasty and ruled by Buddhist priest-kings known as the Chogyal.
  • 1890: Sikkim becomes a princely state of British India.
  • 1947: Sikkim continues its protectorate status with the Union of India, post-Indian-independence.
  • 1973: Anti-royalist riots take place in front of the Chogyal's palace, by Nepalis seeking greater representation.
  • 1975: Referendum leads to the deposition of the monarchy and Sikkim joins India as its 22nd state.
Languages

  • Official: English, Nepali, Sikkimese/Bhotia and Lepcha
  • Though Hindi and Nepali share the same script (Devanagari), they are not mutually intelligible. Yet, most people in Sikkim can understand and speak Hindi.

Ethnicity

  • Nepalis: Migrated in large numbers (from Nepal) and soon became the dominant community
  • Bhutias: People of Tibetan origin. Major inhabitants in Northern Sikkim.
  • Lepchas: Original inhabitants of Sikkim

Food
  • Tibetan/Nepali dishes (mostly consumed during winter)
    • Thukpa: Noodle soup, rich in spices and vegetables. Usually contains some form of meat. Common variations: Thenthuk and Gyathuk
    • Momos: Steamed or fried dumplings, usually with a meat filling.
    • Saadheko: Spicy marinated chicken salad.
    • Gundruk Soup: A soup made from Gundruk, a fermented leafy green vegetable.
    • Sinki : A fermented radish tap-root product, traditionally consumed as a base for soup and as a pickle. Eerily similar to Kimchi.
  • While pork and beef are pretty common, finding vegetarian dishes is equally easy.
  • Staple: Dal-Bhat with Subzi. Rice is a lot more common than wheat, possibly due to greater carb content and proximity to West Bengal, India’s largest producer of rice.
  • Good places to eat in Gangtok
    • Hamro Bhansa Ghar, Nimtho (Nepali)
    • Taste of Tibet
    • Dragon Wok (Chinese & Japanese)

Buddhism in Sikkim
  • Bayul Demojong (Sikkim), is the most sacred Land in the Himalayas as per the belief of the Northern Buddhists and various religious texts.
  • Sikkim was blessed by Guru Padmasambhava, the great Buddhist saint who visited Sikkim in the 8th century and consecrated the land.
  • However, Buddhism is said to have reached Sikkim only in the 17th century with the arrival of three Tibetan monks viz. Rigdzin Goedki Demthruchen, Mon Kathok Sonam Gyaltshen & Rigdzin Legden Je at Yuksom. Together, they established a Buddhist monastery.
  • In 1642 they crowned Phuntsog Namgyal as the first monarch of Sikkim and gave him the title of Chogyal, or Dharma Raja.
  • The faith became popular through its royal patronage and soon many villages had their own monastery.
  • Today Sikkim has over 200 monasteries.
Major monasteries
  • Rumtek Monastery, 20Km from Gangtok
  • Lingdum/Ranka Monastery, 17Km from Gangtok
  • Phodong Monastery, 28Km from Gangtok
  • Ralang Monastery, 10Km from Ravangla
  • Tsuklakhang Monastery, Royal Palace, Gangtok
  • Enchey Monastery, Gangtok
  • Tashiding Monastery, 35Km from Ravangla

Reaching Sikkim
  • Gangtok, being the capital, is easiest to reach amongst other regions, by public transport and shared cabs.
  • By Air:
    • Pakyong (PYG) :
      • Nearest airport from Gangtok (about 1 hour away)
      • Tabletop airport
      • Reserved cabs cost around INR 1200.
      • As of Apr 2021, the only flights to PYG are from IGI (Delhi) and CCU (Kolkata).
    • Bagdogra (IXB) :
      • About 20 minutes from Siliguri and 4 hours from Gangtok.
      • Larger airport with flights to most major Indian cities.
      • Reserved cabs cost about INR 3000. Shared cabs cost about INR 350.
  • By Train:
    • New Jalpaiguri (NJP) :
      • About 20 minutes from Siliguri and 4 hours from Gangtok.
      • Reserved cabs cost about INR 3000. Shared cabs from INR 350.
  • By Road:
    • NH10 connects Siliguri to Gangtok
    • If you can’t find buses plying to Gangtok directly, reach Siliguri and then take a cab to Gangtok.
  • Sikkim Nationalised Transport Div. also runs hourly buses between Siliguri and Gangtok and daily buses on other common routes. They’re cheaper than shared cabs.
  • Wizzride also operates shared cabs between Siliguri/Bagdogra/NJP, Gangtok and Darjeeling. They cost about the same as shared cabs but pack in half as many people in “luxury cars” (Innova, Xylo, etc.) and are hence more comfortable.

Gangtok
  • Time needed: 1D/1N
  • Places to visit:
    • Hanuman Tok
    • Ganesh Tok
    • Tashi View Point [6,800ft]
    • MG Marg
    • Sikkim Zoo
    • Gangtok Ropeway
    • Enchey Monastery
    • Tsuklakhang Palace & Monastery
  • Hostels: Tagalong Backpackers (would strongly recommend), Zostel Gangtok
  • Places to chill: Travel Cafe, Café Live & Loud and Gangtok Groove
  • Places to shop: Lal Market and MG Marg
Getting Around
  • Taxis operate on a reserved or shared basis. In case of the latter, you pool with other commuters whom your taxi will pick up and drop en route.
  • Naturally shared taxis only operate on popular routes. The easiest way to get around Gangtok is to catch a shared cab from MG Marg.
  • Reserved taxis for Gangtok sightseeing cost around INR 1000-1500, depending upon the spots you’d like to see
  • Key taxi/bus stands :
    • Deorali stand: For Darjeeling, Siliguri, Kalimpong
    • Vajra stand: For North & East Sikkim (Tsomgo Lake & Nathula)
    • Rumtek taxi: For Ravangla, Pelling, Namchi, Geyzing, Jorethang and Singtam.

Exploring Gangtok on an MTB

North Sikkim
  • The easiest & most economical way to explore North Sikkim is the 3D/2N package offered by shared-cab drivers.
  • This includes food, permits, cab rides and accommodation (1N in Lachen and 1N in Lachung)
  • The accommodation on both nights are at homestays with bare necessities, so keep your hopes low.
  • In the spirit of sustainable tourism, you’ll be asked to discard single-use plastic bottles, so please carry a bottle that you can refill along the way.
  • Zero Point and Gurdongmer Lake are snow-capped throughout the year

3D/2N Shared-cab Package Itinerary
  • Day 1
    • Gangtok (10am) - Chungthang - Lachung (stay)
  • Day 2
    • Pre-lunch : Lachung (6am) - Yumthang Valley [12,139ft] - Zero Point [15,300ft] - Lachung
    • Post-lunch : Lachung - Chungthang - Lachen (stay)
  • Day 3
    • Pre-lunch : Lachen (5am) - Kala Patthar - Gurdongmer Lake [16,910ft] - Lachen
    • Post-lunch : Lachen - Chungthang - Gangtok (7pm)
  • This itinerary is idealistic and depends on the level of snowfall.
  • Some drivers might switch up Day 2 and 3 itineraries by visiting Lachen and then Lachung, depending upon the weather.
  • Areas beyond Lachen & Lachung are heavily militarized since the Indo-China border is only a few miles away.

East Sikkim Zuluk and Silk Route
  • Time needed: 2D/1N
  • Zuluk [9,400ft] is a small hamlet with an excellent view of the eastern Himalayan range including the Kanchenjunga.
  • Was once a transit point to the historic Silk Route from Tibet (Lhasa) to India (West Bengal).
  • The drive from Gangtok to Zuluk takes at least four hours. Hence, it makes sense to spend the night at a homestay and space out your trip to Zuluk

Tsomgo Lake and Nathula
  • Time Needed : 1D
  • A Protected Area Permit is required to visit these places, due to their proximity to the Chinese border
  • Tsomgo/Chhangu Lake [12,313ft]
    • Glacial lake, 40 km from Gangtok.
    • Remains frozen during the winter season.
    • You can also ride on the back of a Yak for INR 300
  • Baba Mandir
    • An old temple dedicated to Baba Harbhajan Singh, a Sepoy in the 23rd Regiment, who died in 1962 near the Nathu La during Indo – China war.
  • Nathula Pass [14,450ft]
    • Located on the Indo-Tibetan border crossing of the Old Silk Route, it is one of the three open trading posts between India and China.
    • Plays a key role in the Sino-Indian Trade and also serves as an official Border Personnel Meeting(BPM) Point.
    • May get cordoned off by the Indian Army in event of heavy snowfall or for other security reasons.

West Sikkim
  • Time needed: 3D/2N
  • Hostels at Pelling : Mochilerro Ostillo
Itinerary Day 1: Gangtok - Ravangla - Pelling
  • Leave Gangtok early, for Ravangla through the Temi Tea Estate route.
  • Spend some time at the tea garden and then visit Buddha Park at Ravangla
  • Head to Pelling from Ravangla
Day 2: Pelling sightseeing
  • Hire a cab and visit Skywalk, Pemayangtse Monastery, Rabdentse Ruins, Kecheopalri Lake, Kanchenjunga Falls.
Day 3: Pelling - Gangtok/Siliguri
  • Wake up early to catch a glimpse of Kanchenjunga at the Pelling Helipad around sunrise
  • Head back to Gangtok on a shared-cab
  • You could take a bus/taxi back to Siliguri if Pelling is your last stop.

Darjeeling
  • In my opinion, Darjeeling is lovely for a two-day detour on your way back to Bagdogra/Siliguri and not any longer (unless you’re a Bengali couple on a honeymoon)
  • Once a part of Sikkim, Darjeeling was ceded to the East India Company after a series of wars, with Sikkim briefly receiving a grant from EIC for “gifting” Darjeeling to the latter
  • Post-independence, Darjeeling was merged with the state of West Bengal.
Itinerary Day 1 :
  • Take a cab from Gangtok to Darjeeling (shared-cabs cost INR 300 per seat)
  • Reach Darjeeling by noon and check in to your Hostel. I stayed at Hideout.
  • Spend the evening visiting either a monastery (or the Batasia Loop), Nehru Road and Mall Road.
  • Grab dinner at Glenary whilst listening to live music.

Day 2:
  • Wake up early to catch the sunrise and a glimpse of Kanchenjunga at Tiger Hill. Since Tiger Hill is 10km from Darjeeling and requires a permit, book your taxi in advance.
  • Alternatively, if you don’t want to get up at 4am or shell out INR 1500 on the cab to Tiger Hill, walk to the Kanchenjunga View Point down Mall Road
  • Next, queue up outside Keventers for breakfast with a view in a century-old cafe
  • Get a cab at Gandhi Road and visit a tea garden (Happy Valley is the closest) and the Ropeway. I was lucky to meet 6 other backpackers at my hostel and we ended up pooling the cab at INR 200 per person, with INR 1400 being on the expensive side, but you could bargain.
  • Get lunch, buy some tea at Golden Tips, pack your bags and hop on a shared-cab back to Siliguri. It took us about 4hrs to reach Siliguri, with an hour to spare before my train.
  • If you’ve still got time on your hands, then check out the Peace Pagoda and the Darjeeling Himalayan Railway (Toy Train). At INR 1500, I found the latter to be too expensive and skipped it.

Tips and hacks
  • Download offline maps, especially when you’re exploring Northern Sikkim.
  • Food and booze are the cheapest in Gangtok. Stash up before heading to other regions.
  • Keep your Aadhar/Passport handy since you need permits to travel to North & East Sikkim.
  • In rural areas and some cafes, you may get to try Rhododendron Wine, made from Rhododendron arboreum a.k.a Gurans. Its production is a little hush-hush since the flower is considered holy and is also the National Flower of Nepal.
  • If you don’t want to invest in a new jacket, boots or a pair of gloves, you can always rent them at nominal rates from your hotel or little stores around tourist sites.
  • Check the weather of a region before heading there. Low visibility and precipitation can quite literally dampen your experience.
  • Keep your itinerary flexible to accommodate for rest and impromptu plans.
  • Shops and restaurants close by 8pm in Sikkim and Darjeeling. Plan for the same.
Carry…
  • a couple of extra pairs of socks (woollen, if possible)
  • a pair of slippers to wear indoors
  • a reusable water bottle
  • an umbrella
  • a power bank
  • a couple of tablets of Diamox. Helps deal with altitude sickness
  • extra clothes and wet bags since you may not get a chance to wash/dry your clothes
  • a few passport size photographs
Shared-cab hacks
  • Intercity rides can be exhausting. If you can afford it, pay for an additional seat.
  • Call shotgun on the drives beyond Lachen and Lachung. The views are breathtaking.
  • Return cabs tend to be cheaper (WB cabs travelling from SK and vice-versa)
Cost
  • My median daily expenditure (back when I went to Sikkim in early March 2021) was INR 1350.
  • This includes stay (bunk bed), food, wine and transit (shared cabs)
  • In my defence, I splurged on food, wine and extra seats in shared cabs, but if you’re on a budget, you could easily get by on INR 1 - 1.2k per day.
  • For a 9-day trip, I ended up shelling out nearly INR 15k, including 2AC trains to & from Kolkata
  • Note : Summer (March to May) and Autumn (October to December) are peak seasons, and thereby more expensive to travel around.
Souvenirs and things you should buy

Buddhist souvenirs :
  • Colourful Prayer Flags (great for tying on bikes or behind car windshields)
  • Miniature Prayer/Mani Wheels
  • Lucky Charms, Pendants and Key Chains
  • Cham Dance masks and robes
  • Singing Bowls
  • Common symbols: Om mani padme hum, Ashtamangala, Zodiac signs
Handicrafts & Handlooms
  • Tibetan Yak Wool shawls, scarfs and carpets
  • Sikkimese Ceramic cups
  • Thangka Paintings
Edibles
  • Darjeeling Tea (usually brewed and not boiled)
  • Wine (Arucha Peach & Rhododendron)
  • Dalle Khursani (Chilli) Paste and Pickle

Header Icon made by Freepik from www.flaticon.com is licensed by CC 3.0 BY

Jonathan Dowland: 2020 in short fiction

Sunday 11th of April 2021 10:53:33 AM

Following on from 2020 in Fiction: In 2020 I read a couple of collections of short fiction from some of my favourite authors.

I started the year with Christopher Priest's Episodes. The stories within are collected from throughout his long career, and vary in style and tone. Priest wrote new little prologues and epilogues for each of the stories, explaining the context in which they were written. I really enjoyed this additional view into their construction.

By contrast, Adam Roberts' Adam Robots presents the stories on their own terms. Each of the stories is written in a different mode: one as golden-age SF, another as a kind of Cyberpunk, for example, although they all blend or confound sub-genres to some degree. I'm not clever enough to have decoded all their secrets on a first read, and I would have appreciated some "Cliff's Notes" on any deeper meaning or intent.

Ted Chiang's Exhalation was up to the fantastic standard of his earlier collection and had some extremely thoughtful explorations of philosophical ideas. All the stories are strong but one stuck in my mind the longest: Omphalos…

With my daughter I finished three of Terry Pratchett's short story collections aimed at children: Dragon at Crumbling Castle; The Witch's Vacuum Cleaner and The Time-Travelling Caveman. If you are a Pratchett fan and you've overlooked these because they're aimed at children, take another look. The quality varies, but there are some true gems in these. Several stories take place in common settings, either the town of Blackbury, in Gritshire (and the adjacent Even Moor), or the Welsh border-town of Llandanffwnfafegettupagogo. The sad thing was knowing that once I'd finished them (and the fourth, Father Christmas's Fake Beard) that was it: there will be no more.

8/31 of the "books" I read in 2020 were issues of Interzone. Counting them as "books" for my annual reading goal has encouraged me to read full issues, whereas before I would likely have only read a couple of stories from each issue. Reading full issues has rekindled the enjoyment I got out of it when I first discovered the magazine at the turn of the century. I am starting to recognise stories by authors that have written stories in other issues, as well as common themes from the current era weaving their way into the work (Trump, Brexit, etc.). No doubt the Pandemic will leave its mark on 2021's stories.

Junichi Uekawa: Wrote a timezone checker page.

Sunday 11th of April 2021 02:06:56 AM
Wrote a timezone checker page: timezone. It shows the current time as a blue line. I haven't made anything configurable but will think about it later.

Charles Plessy: Debian Bullseye: more open

Saturday 10th of April 2021 10:21:40 PM

Debian Bullseye will provide the command /usr/bin/open for your greatest comfort at the command line. On a system with a graphical desktop environment, the command should have a similar result as when opening a document from a mouse-and-click file browser.

Technically, /usr/bin/open is a symbolic link managed by update-alternatives to point towards xdg-open if available and otherwise run-mailcap.
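
For instance (a minimal sketch; this assumes the alternatives group is simply called "open", and report.pdf is just a placeholder file):

$ update-alternatives --display open
$ open report.pdf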

Kentaro Hayashi: Grow your ideas for Debian Project

Saturday 10th of April 2021 01:13:01 PM

There may be some "if only it could be ..." ideas for the Debian Project. If an idea is concrete and worth pursuing, it should be turned into a proposal for Project Funding.

salsa.debian.org

But if it remains just an idea, or nobody can afford to take on the executor role, that idea will not be achieved.

I thought that it needs an incubator, a complementary project.

salsa.debian.org

I've salvaged an idea from a closed MR: Add proposal about "Formalize reimbursement process" (!5) · Merge Requests · Freexian SARL / Project Funding · GitLab

I'm not confident whether this mechanism will work, but Debian needs change.

Michael Prokop: A Ceph war story

Friday 9th of April 2021 12:39:55 PM

It all started with the big bang! We nearly lost 33 of 36 disks on a Proxmox/Ceph Cluster; this is the story of how we recovered them.

At the end of 2020, we eventually had a long outstanding maintenance window for taking care of system upgrades at a customer. During this maintenance window, which involved reboots of server systems, the involved Ceph cluster unexpectedly went into a critical state. What was planned to be a few hours of checklist work in the early evening turned out to be an emergency case; let’s call it a nightmare (not only because it included a big part of the night). Since we have learned a few things from our post mortem and RCA, it’s worth sharing those with others. But first things first, let’s step back and clarify what we had to deal with.

The system and its upgrade

One part of the upgrade included 3 Debian servers (we’re calling them server1, server2 and server3 here), running on Proxmox v5 + Debian/stretch with 12 Ceph OSDs each (65.45TB in total), a so-called Proxmox Hyper-Converged Ceph Cluster.

First, we went for upgrading the Proxmox v5/stretch system to Proxmox v6/buster, before updating Ceph Luminous v12.2.13 to the latest v14.2 release, supported by Proxmox v6/buster. The Proxmox upgrade included updating corosync from v2 to v3. As part of this upgrade, we had to apply some configuration changes, like adjust ring0 + ring1 address settings and add a mon_host configuration to the Ceph configuration.
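
For illustration, the mon_host addition amounts to a one-line entry in ceph.conf along these lines (a sketch using documentation-range addresses rather than the customer's real ones):

[global]
mon_host = 192.0.2.1 192.0.2.2 192.0.2.3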

During the first two servers’ reboots, we noticed configuration glitches. After fixing those, we went for a reboot of the third server as well. Then we noticed that several Ceph OSDs were unexpectedly down. The NTP service wasn’t working as expected after the upgrade. The underlying issue is a race condition of ntp with systemd-timesyncd (see #889290). As a result, we had clock skew problems with Ceph, indicating that the Ceph monitors’ clocks aren’t running in sync (which is essential for proper Ceph operation). We initially assumed that our Ceph OSD failure derived from this clock skew problem, so we took care of it. After yet another round of reboots, to ensure the systems are running all with identical and sane configurations and services, we noticed lots of failing OSDs. This time all but three OSDs (19, 21 and 22) were down:

% sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       65.44138 root default
-2       21.81310     host server1
 0   hdd  1.08989         osd.0      down  1.00000 1.00000
 1   hdd  1.08989         osd.1      down  1.00000 1.00000
 2   hdd  1.63539         osd.2      down  1.00000 1.00000
 3   hdd  1.63539         osd.3      down  1.00000 1.00000
 4   hdd  1.63539         osd.4      down  1.00000 1.00000
 5   hdd  1.63539         osd.5      down  1.00000 1.00000
18   hdd  2.18279         osd.18     down  1.00000 1.00000
20   hdd  2.18179         osd.20     down  1.00000 1.00000
28   hdd  2.18179         osd.28     down  1.00000 1.00000
29   hdd  2.18179         osd.29     down  1.00000 1.00000
30   hdd  2.18179         osd.30     down  1.00000 1.00000
31   hdd  2.18179         osd.31     down  1.00000 1.00000
-4       21.81409     host server2
 6   hdd  1.08989         osd.6      down  1.00000 1.00000
 7   hdd  1.08989         osd.7      down  1.00000 1.00000
 8   hdd  1.63539         osd.8      down  1.00000 1.00000
 9   hdd  1.63539         osd.9      down  1.00000 1.00000
10   hdd  1.63539         osd.10     down  1.00000 1.00000
11   hdd  1.63539         osd.11     down  1.00000 1.00000
19   hdd  2.18179         osd.19     up    1.00000 1.00000
21   hdd  2.18279         osd.21     up    1.00000 1.00000
22   hdd  2.18279         osd.22     up    1.00000 1.00000
32   hdd  2.18179         osd.32     down  1.00000 1.00000
33   hdd  2.18179         osd.33     down  1.00000 1.00000
34   hdd  2.18179         osd.34     down  1.00000 1.00000
-3       21.81419     host server3
12   hdd  1.08989         osd.12     down  1.00000 1.00000
13   hdd  1.08989         osd.13     down  1.00000 1.00000
14   hdd  1.63539         osd.14     down  1.00000 1.00000
15   hdd  1.63539         osd.15     down  1.00000 1.00000
16   hdd  1.63539         osd.16     down  1.00000 1.00000
17   hdd  1.63539         osd.17     down  1.00000 1.00000
23   hdd  2.18190         osd.23     down  1.00000 1.00000
24   hdd  2.18279         osd.24     down  1.00000 1.00000
25   hdd  2.18279         osd.25     down  1.00000 1.00000
35   hdd  2.18179         osd.35     down  1.00000 1.00000
36   hdd  2.18179         osd.36     down  1.00000 1.00000
37   hdd  2.18179         osd.37     down  1.00000 1.00000

Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back?

We stumbled upon this beauty in our logs:

kernel: [   73.697957] XFS (sdl1): SB stripe unit sanity check failed
kernel: [   73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff
kernel: [   73.698799] XFS (sdl1): Unmount and run xfs_repair
kernel: [   73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer:
kernel: [   73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
kernel: [   73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: [   73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8  bD+.."@..=..e...
kernel: [   73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
kernel: [   73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning.
ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5 ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32
kernel: [   73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
kernel: [   73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
kernel: [   73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
kernel: [   73.703373] XFS (sdl1): SB validate failed with error -117.

The same issue was present for the other failing OSDs. We hoped that the data itself was still there, and that only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2, with the OSDs on filestore (nowadays a legacy approach to storing objects in Ceph). We have since migrated the disks to bluestore (using ceph-disk, not yet ceph-volume, which is what’s used nowadays). ceph-disk introduces these 100MB XFS partitions, which hold basic metadata for each OSD.

Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share what working XFS partitions looked like on their systems. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:

synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=128 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 cache_node_purge: refcount was 1, not zero (node=0x1d3c400) SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000 SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000 SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000 releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!found dirty buffer (bulk) on free list!bad magic number bad magic number Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 releasing dirty buffer (bulk) to free list!mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.

Ouch. This looked very much related to the actual issue we were seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using ‘-d sunit=512 -d swidth=512’ at least worked, so we decided to force those values when creating our OSD XFS partitions. This gave us a working XFS partition. Please note: sunit must not be larger than swidth (more on that later!).
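
Putting the pieces together, re-creating a single metadata partition boiled down to something like the following sketch. The device, backup filename and UUID are taken from the examples in this post and are purely illustrative; always take the dd backup before running mkfs.xfs:

# back up the existing 100MB metadata partition first (backup filename is illustrative)
sudo dd if=/dev/sdc1 of=/srv/backup/server1-osd0-sdc1.dd bs=1M

# re-create the XFS filesystem, forcing sane stripe settings and keeping the OSD's UUID
sudo mkfs.xfs -f -i size=2048 -d sunit=512 -d swidth=512 \
  -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1

# verify that the new filesystem can actually be mounted
sudo mount /dev/sdc1 /mnt/ceph-recovery
sudo umount /mnt/ceph-recovery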

Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami). To identify the UUID, we can read the data from ‘ceph --format json osd dump‘, like this for all our OSDs (Zsh syntax ftw!):

synpromika@server1 ~ % for f in {0..37} ; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$f) | .uuid")"
osd-0: 4568c300-ad83-4288-963e-badcd99bf54f
osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f
osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e
osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c
[...]

Identifying the corresponding raw device for each OSD UUID is possible via:

synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f"
synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"${UUID}"
/dev/sdc1

The OSD’s key ID can be retrieved via:

synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key'
AQCKFpZdm0We[...]

Now we also need to identify the underlying block device:

synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path'
/dev/sdc2

With all of this, we reconstructed the keyring, fsid, whoami, block + block_uuid files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD – hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure.
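
To give an idea of what placing and adjusting the metadata means in practice, here is a rough sketch for OSD 0 on server1, using the values gathered above. The template directory for the files that are identical on every OSD is illustrative (we took them from a surviving OSD), the key is truncated here, and ownership/permissions need to match what Ceph expects:

OSD_ID=0
UUID="4568c300-ad83-4288-963e-badcd99bf54f"   # from 'ceph osd dump', see above
OSD_KEY="AQCKFpZdm0We[...]"                   # from 'ceph auth get', see above (truncated)
BLOCK_DEV=/dev/sdc2                           # from 'ceph osd metadata', see above
MNT=/var/lib/ceph/osd/ceph-${OSD_ID}

sudo mount /dev/sdc1 "${MNT}"

# files identical on every OSD, copied from a surviving OSD (template path is illustrative)
sudo cp /srv/backup/osd-template/{activate.monmap,active,bluefs,ceph_fsid,kv_backend,magic,mkfs_done,ready,require_osd_release,systemd,type} "${MNT}"/

# OSD-specific files
echo "${OSD_ID}" | sudo tee "${MNT}"/whoami
echo "${UUID}"   | sudo tee "${MNT}"/fsid
BLOCK_UUID=$(sudo blkid -o value -s PARTUUID "${BLOCK_DEV}")
echo "${BLOCK_UUID}" | sudo tee "${MNT}"/block_uuid
sudo ln -s /dev/disk/by-partuuid/"${BLOCK_UUID}" "${MNT}"/block
printf '[osd.%s]\n\tkey = %s\n' "${OSD_ID}" "${OSD_KEY}" | sudo tee "${MNT}"/keyring > /dev/null

sudo chown -R ceph:ceph "${MNT}"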

We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with a single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server.
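
In spirit, the script was a loop over the failing OSD IDs of the respective server, combining the lookups and re-creation steps sketched earlier. Roughly like this (shown with server3’s failing OSD IDs; treat it as an illustration, the script referenced above is the authoritative version):

for OSD_ID in 12 13 14 15 16 17 23 24 25 35 36 37 ; do
  UUID=$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==${OSD_ID}) | .uuid")
  DEV=$(readlink -f /dev/disk/by-partuuid/"${UUID}")

  # backup of the existing metadata partition goes to the network share
  sudo dd if="${DEV}" of=/srv/backup/"$(hostname)-osd-${OSD_ID}.dd" bs=1M

  # re-create the XFS metadata partition with forced stripe settings
  sudo mkfs.xfs -f -i size=2048 -d sunit=512 -d swidth=512 -m uuid="${UUID}" "${DEV}"

  # ... then mount it and restore the metadata files as sketched above ...
done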

Thanks to this, we managed to get the Ceph cluster up and running again. We didn’t want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA!

Root Cause Analysis

So every OSD had failed except three on server2, and the problem seemed to be related to XFS. Therefore, our starting point for the RCA was to identify what was different on server2 compared to server1 + server3. My initial assumption was that this was related to some firmware issue with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:

synpromika@server1 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156

synpromika@server2 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.21.0-0112
Firmware Version = 4.680.00-8489

synpromika@server3 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156

This looked very promising, as server2 indeed runs with a different firmware version on the controller. But how so? Well, the motherboard of server2 got replaced by a Lenovo/IBM technician in January 2020, as we had a failing memory slot during a memory upgrade. As part of this procedure, the Lenovo/IBM technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore->bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller. All the other OSDs were created with an older controller firmware. But what difference does this make?

Now let’s check firmware changelogs. For the 24.21.0-0097 release we found this:

- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20 (SCGCQ02027889)
- xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0 (SCGCQ02056038)

Our XFS problem certainly was related to the controller’s firmware. We also recalled that our monitoring system reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:

WARN server2.example.org Mount options of /var/lib/ceph/osd/ceph-21 WARN - Missing: sunit=1024, Exceeding: sunit=512

We compared the new OSD 21 with an existing one (OSD 25 on server3):

synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota

synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota

Thanks to our documentation, we could compare execution logs of their creation:

% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log
-synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25
+synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21
[...]
-command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1
-meta-data=/dev/sdj1      isize=2048   agcount=4, agsize=6272 blks
[...]
+command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
+meta-data=/dev/sdi1      isize=2048   agcount=4, agsize=6336 blks
          =               sectsz=4096  attr=2, projid32bit=1
          =               crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
-data     =               bsize=4096   blocks=25088, imaxpct=25
-         =               sunit=128    swidth=64 blks
+data     =               bsize=4096   blocks=25344, imaxpct=25
+         =               sunit=64     swidth=64 blks
 naming   =version 2      bsize=4096   ascii-ci=0 ftype=1
 log      =internal log   bsize=4096   blocks=1608, version=2
          =               sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none           extsz=4096   blocks=0, rtextents=0
[...]

So back then we had even tried to track this down, but couldn’t make sense of it at the time. Now, however, this sounded very much related to the problem we were seeing with this Ceph/XFS failure. Following Occam’s razor (the simplest explanation is usually the right one), let’s check the disk properties and see what differs:

synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
524288
262144

synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
262144
262144

See the difference between server1 and server2 for identical disks? The getiomin option now reports something different for them:

synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk
524288
synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size
524288

synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk
262144
synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size
262144

It doesn’t make sense that the minimum I/O size (iomin, AKA BLKIOMIN) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT). This led us to Bug 202127 – cannot mount or create xfs on a 597T device, which matches our findings here. But why did this XFS partition work in the past, yet fail now with a newer kernel version?

The XFS behaviour change

Now, given that we had backups of all the XFS partitions, we wanted to track down a) when this XFS behaviour was introduced, and b) whether, and if so how, it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you had no working Ceph OSD or backups left).

Let’s look at such a failing XFS partition with the Grml live system:

root@grml ~ # grml-version grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24] root@grml ~ # uname -a Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux root@grml ~ # grml-hostname grml-2020-06 Setting hostname to grml-2020-06: done root@grml ~ # exec zsh root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description +++-==============-============-============-========================================= ii util-linux 2.35.2-4 amd64 miscellaneous system utilities ii xfsprogs 5.6.0-1+b2 amd64 Utilities for managing the XFS filesystem

There it fails, no matter which mount options we try:

root@grml-2020-06 ~ # mount ./sdd1.dd /mnt mount: /mnt: mount(2) system call failed: Structure needs cleaning. root@grml-2020-06 ~ # dmesg | tail -30 [...] [ 64.788640] XFS (loop1): SB stripe unit sanity check failed [ 64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff [ 64.788671] XFS (loop1): Unmount and run xfs_repair [ 64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer: [ 64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00 XFSB..........b. [ 64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36 2..5S.D..c0..+h6 [ 64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00 ......@......... [ 64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02 ................ [ 64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00 ................ [ 64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00 ...H............ [ 64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19 ................ [ 64.788679] XFS (loop1): SB validate failed with error -117. root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/ mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error. 32 root@grml-2020-06 ~ # dmesg | tail -1 [ 66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024) root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/ mount: /mnt: mount(2) system call failed: Structure needs cleaning. 32 root@grml-2020-06 ~ # dmesg | tail -14 [ 66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024) [ 80.751277] XFS (loop1): SB stripe unit sanity check failed [ 80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff [ 80.751324] XFS (loop1): Unmount and run xfs_repair [ 80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer: [ 80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00 XFSB..........b. [ 80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36 2..5S.D..c0..+h6 [ 80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00 ......@......... [ 80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02 ................ [ 80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00 ................ [ 80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00 ...H............ [ 80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19 ................ [ 80.751338] XFS (loop1): SB validate failed with error -117.

xfs_repair doesn’t help either:

root@grml-2020-06 ~ # xfs_info ./sdd1.dd meta-data=./sdd1.dd isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=0, rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=128 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 root@grml-2020-06 ~ # xfs_repair ./sdd1.dd Phase 1 - find and verify superblock... bad primary superblock - bad stripe width in superblock !!! attempting to find secondary superblock... ..............................................................................................Sorry, could not find valid secondary superblock Exiting now.

With the “SB stripe unit sanity check failed” message, we could easily track this down to the following commit fa4ca9c:

% git show fa4ca9c5574605d1e48b7e617705230a0640b6da | cat commit fa4ca9c5574605d1e48b7e617705230a0640b6da Author: Dave Chinner <dchinner@redhat.com> Date: Tue Jun 5 10:06:16 2018 -0700 xfs: catch bad stripe alignment configurations When stripe alignments are invalid, data alignment algorithms in the allocator may not work correctly. Ensure we catch superblocks with invalid stripe alignment setups at mount time. These data alignment mismatches are now detected at mount time like this: XFS (loop0): SB stripe unit sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00 XFSB............ 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2 .27...F=...3.... 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0 ................ 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2 ................ 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00 ................ 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00 ...`.4.......... 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19 ................ XFS (loop0): SB validate failed with error -117. And the mount fails. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c index b5dca3c8c84d..c06b6fc92966 100644 --- fs/xfs/libxfs/xfs_sb.c +++ fs/xfs/libxfs/xfs_sb.c @@ -278,6 +278,22 @@ xfs_mount_validate_sb( return -EFSCORRUPTED; } + if (sbp->sb_unit) { + if (!xfs_sb_version_hasdalign(sbp) || + sbp->sb_unit > sbp->sb_width || + (sbp->sb_width % sbp->sb_unit) != 0) { + xfs_notice(mp, "SB stripe unit sanity check failed"); + return -EFSCORRUPTED; + } + } else if (xfs_sb_version_hasdalign(sbp)) { + xfs_notice(mp, "SB stripe alignment sanity check failed"); + return -EFSCORRUPTED; + } else if (sbp->sb_width) { + xfs_notice(mp, "SB stripe width sanity check failed"); + return -EFSCORRUPTED; + } + + if (xfs_sb_version_hascrc(&mp->m_sb) && sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) { xfs_notice(mp, "v5 SB sanity check failed");

This change is included in kernel versions 4.18-rc1 and newer:

% git describe --contains fa4ca9c5574605d1e48
v4.18-rc1~37^2~14

Now let’s try with an older kernel version (4.9.0), using the old Grml 2017.05 release:

root@grml ~ # grml-version grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31] root@grml ~ # uname -a Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux root@grml ~ # lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 9.0 (stretch) Release: 9.0 Codename: stretch root@grml ~ # grml-hostname grml-2017-05 Setting hostname to grml-2017-05: done root@grml ~ # exec zsh root@grml-2017-05 ~ # root@grml-2017-05 ~ # xfs_info ./sdd1.dd xfs_info: ./sdd1.dd is not a mounted XFS filesystem 1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd Phase 1 - find and verify superblock... bad primary superblock - bad stripe width in superblock !!! attempting to find secondary superblock... ..............................................................................................Sorry, could not find valid secondary superblock Exiting now. 1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt root@grml-2017-05 ~ # mount -t xfs /root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota) root@grml-2017-05 ~ # ls /mnt activate.monmap active block block_uuid bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done ready require_osd_release systemd type whoami root@grml-2017-05 ~ # xfs_info /mnt meta-data=/dev/loop1 isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1 spinodes=0 rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=128 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0

Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:

root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/
root@grml-2017-05 ~ # umount /mnt/

And indeed, mounting this rewritten filesystem then also works with newer kernels:

root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/ root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten meta-data=/dev/loop1 isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=0, rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=64 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 root@grml-2020-06 ~ # mount -t xfs /root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)

FTR: The ‘sunit=512,swidth=512’ from the xfs mount options is identical to xfs_info’s output ‘sunit=64,swidth=64’, because mount.xfs’s sunit value is given in 512-byte units (see man 5 xfs), while the xfs_info output reported here is in blocks with a block size (bsize) of 4096, and 512 × 512 bytes = 64 × 4096 bytes.
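
Both notations describe the same 256 KiB stripe unit; a quick arithmetic check:

echo $(( 512 * 512 ))    # sunit=512 mount option, in 512-byte units   -> 262144 bytes
echo $(( 64 * 4096 ))    # sunit=64 blks in xfs_info, with bsize=4096  -> 262144 bytes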

mkfs.xfs derives stripe unit and stripe width from the device’s reported minimum and optimal I/O sizes; you can check these e.g. via the following (note that server2, with the fixed firmware version, reports proper values, whereas server3, with the broken controller firmware, reports nonsense):

synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 262144 262144
/sys/block/sdd/queue/: 262144 262144
/sys/block/sde/queue/: 262144 262144
/sys/block/sdf/queue/: 262144 262144
/sys/block/sdg/queue/: 262144 262144
/sys/block/sdh/queue/: 262144 262144
/sys/block/sdi/queue/: 262144 262144
/sys/block/sdj/queue/: 262144 262144
/sys/block/sdk/queue/: 262144 262144
/sys/block/sdl/queue/: 262144 262144
/sys/block/sdm/queue/: 262144 262144
/sys/block/sdn/queue/: 262144 262144
[...]

synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 524288 262144
/sys/block/sdd/queue/: 524288 262144
/sys/block/sde/queue/: 524288 262144
/sys/block/sdf/queue/: 524288 262144
/sys/block/sdg/queue/: 524288 262144
/sys/block/sdh/queue/: 524288 262144
/sys/block/sdi/queue/: 524288 262144
/sys/block/sdj/queue/: 524288 262144
/sys/block/sdk/queue/: 524288 262144
/sys/block/sdl/queue/: 524288 262144
/sys/block/sdm/queue/: 524288 262144
/sys/block/sdn/queue/: 524288 262144
[...]

This is the underlying reason why the initially created XFS partitions ended up with incorrect sunit/swidth settings: the broken firmware on server1 and server3 reported bogus I/O sizes. Old(er) XFS/kernel versions ignored the resulting invalid stripe configuration, but newer ones treat it as an error.
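
Plugging the numbers in makes the chain of events clear: the stripe unit comes from minimum_io_size and the stripe width from optimal_io_size (as noted above), so the bogus values from the broken firmware translate directly into the invalid sunit > swidth combination seen earlier:

# values reported by the broken firmware (server1/server3):
echo $(( 524288 / 4096 ))   # minimum_io_size -> sunit  = 128 blocks (as seen in xfs_info)
echo $(( 262144 / 4096 ))   # optimal_io_size -> swidth =  64 blocks (as seen in xfs_info)
echo $(( 524288 / 512 ))    # the same sunit  in mount-option units: 1024
echo $(( 262144 / 512 ))    # the same swidth in mount-option units:  512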

Make sure to also read the XFS FAQ regarding “How to calculate the correct sunit,swidth values for optimal performance”. We also stumbled upon two interesting reads in RedHat’s knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947.

Am I affected? How to work around it?

To check whether your XFS mount points are affected by this issue, the following command line should be useful:

awk '$3 == "xfs"{print $2}' /proc/self/mounts | while read mount ; do echo -n "$mount " ; xfs_info $mount | awk '$0 ~ "swidth"{gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' | awk '{ if ($1 > $2) print "impacted"; else print "OK"}' ; done

If you run into the above situation, the only known way to get your original XFS partition working again is to boot into an older kernel (4.17 or older), mount the XFS partition with correct sunit/swidth settings (which rewrites them on disk), and then boot back into your new system (kernel-version-wise).
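
In shell terms, the workaround from an older live system (such as the Grml 2017.05 release used above) looks roughly like this. /dev/sdX1 is a placeholder for your affected partition, and sunit=512,swidth=512 is what matched our setup, so verify the right values for your storage first:

# booted into a kernel <= 4.17 (e.g. an older live system)
mount -t xfs -o sunit=512,swidth=512 /dev/sdX1 /mnt
umount /mnt
# reboot into the current kernel; the superblock now carries consistent stripe settings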

Lessons learned
  • document everything and ensure all relevant information is available (including actual times of changes and the kernel/package/firmware/… versions in use); thorough documentation was our most significant asset in this case, because we had all the data and information we needed during the emergency handling as well as for the post-mortem/RCA
  • if something changes unexpectedly, dig deeper
  • know who to ask, a network of experts pays off
  • including timestamps in your shell makes reconstruction easier (the more people and documentation involved, the harder it gets to wade through it)
  • keep an eye on changelogs/release notes
  • apply regular updates and don’t forget invisible layers (e.g. BIOS, controller/disk firmware, IPMI/OOB (ILO/RAC/IMM/…) firmware)
  • apply regular reboots, to avoid a possible delta becoming bigger (which makes debugging harder)

Thanks: Darshaka Pathirana, Chris Hofstaedtler and Michael Hanscho.

Looking for help with your IT infrastructure? Let us know!

Sean Whitton: consfigurator-live-build

Thursday 8th of April 2021 11:35:00 PM

One of my goals for Consfigurator is to make it capable of installing Debian to my laptop, so that I can stop booting to GRML and manually partitioning and debootstrapping a basic system, only to then turn to configuration management to set everything else up. My configuration management should be able to handle the partitioning and debootstrapping, too.

The first stage was to make Consfigurator capable of debootstrapping a basic system, chrooting into it, and applying other arbitrary configuration, such as installing packages. That’s been in place for some weeks now. It’s sophisticated enough to avoid starting up newly installed services, but I still need to add some bind mounting.
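
The bind mounting in question is the usual set of API filesystems that tools running inside a chroot expect. Expressed as plain shell, just to illustrate what Consfigurator will need to arrange (the target path here is illustrative, not Consfigurator's actual code):

TARGET=/srv/chroot/target
for fs in /dev /dev/pts /proc /sys ; do
  mount --bind "$fs" "${TARGET}${fs}"
done
# ... run commands inside the chroot ...
for fs in /sys /proc /dev/pts /dev ; do
  umount "${TARGET}${fs}"
done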

Another significant piece is teaching Consfigurator how to partition block devices. That’s quite tricky to do in a sufficiently general way – I want to cleanly support various combinations of LUKS, LVM and regular partitions, including populating /etc/crypttab and /etc/fstab. I have some ideas about how to do it, but it’ll probably take a few tries to get the abstractions right.

Let’s imagine that code is all in place, such that Consfigurator can be pointed at a block device and it will install a bootable Debian system to it. Then to install Debian to my laptop I’d just need to take my laptop’s disk drive out and plug it into another system, and run Consfigurator on that system, as root, pointed at the block device representing my laptop’s disk drive. For virtual machines, it would be easy to write code which loop-mounts an empty disk image, and then Consfigurator could be pointed at the loop-mounted block device, thereby making the disk image file bootable.
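
The loop-mounting part is indeed straightforward; a sketch of the shell equivalent (file name and size are arbitrary):

# create a sparse disk image and expose it as a block device
truncate -s 20G debian.img
losetup --find --show --partscan debian.img   # prints the allocated /dev/loopN device

# ... point the deployment at that /dev/loopN device ...

# detach it again once the image is built
losetup --detach /dev/loopN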

This is adequate for virtual machines, or small single-board computers with tiny storage devices (not that I actually use any of those, but I want Consfigurator to be able to make disk images for them!). But it’s not much good for my laptop. I casually referred to taking out my laptop’s disk drive and connecting it to another computer, but this would void my laptop’s warranty. And Consfigurator would not be able to update my laptop’s NVRAM, as is needed on UEFI systems.

What’s wanted here is a live system which can run Consfigurator directly on the laptop, pointed at the block device representing its physical disk drive. Ideally this live system comes with a chroot with the root filesystem for the new Debian install already built, so that network access is not required, and all Consfigurator has to do is partition the drive and copy in the contents of the chroot. The live system could be set up to automatically start doing that upon boot, but another option is to just make Consfigurator itself available to be used interactively. The user boots the live system, starts up Emacs, starts up Lisp, and executes a Consfigurator deployment, supplying the block device representing the laptop’s disk drive as an argument to the deployment. Consfigurator goes off and partitions that drive, copies in the contents of the chroot, and executes grub-install to make the laptop bootable. This is also much easier to debug than a live system which tries to start partitioning upon boot. It would look something like this:

;; melete.silentflame.com is a Consfigurator host object representing the ;; laptop, including information about the partitions it should have (deploy-these :local ... (chroot:partitioned-and-installed melete.silentflame.com "/srv/chroot/melete" "/dev/nvme0n1"))

Now, building live systems is a fair bit more involved than installing Debian to a disk drive and making it bootable, it turns out. While I want Consfigurator to be able to completely replace the Debian Installer, I decided that it is not worth trying to reimplement the relevant parts of the Debian Live tool suite, because I do not need to make arbitrary customisations to any live systems. I just need to have some packages installed and some files in place. Nevertheless, it is worth teaching Consfigurator how to invoke Debian Live, so that the customisation of the chroot which isn’t just a matter of passing options to lb_config(1) can be done with Consfigurator. This is what I’ve ended up with – in Consfigurator’s source code:

(defpropspec image-built :lisp (config dir properties) "Build an image under DIR using live-build(7), where the resulting live system has PROPERTIES, which should contain, at a minimum, a property from CONSFIGURATOR.PROPERTY.OS setting the Debian suite and architecture. CONFIG is a list of arguments to pass to lb_config(1), not including the '-a' and '-d' options, which Consfigurator will supply based on PROPERTIES. This property runs the lb_config(1), lb_bootstrap(1), lb_chroot(1) and lb_binary(1) commands to build or rebuild the image. Rebuilding occurs only when changes to CONFIG or PROPERTIES mean that the image is potentially out-of-date; e.g. if you just add some new items to PROPERTIES then in most cases only lb_chroot(1) and lb_binary(1) will be re-run. Note that lb_chroot(1) and lb_binary(1) both run after applying PROPERTIES, and might undo some of their effects. For example, to configure /etc/apt/sources.list, you will need to use CONFIG not PROPERTIES." (:desc (declare (ignore config properties)) #?"Debian Live image built in ${dir}") (let* (...) ;; ... `(eseqprops ;; ... (on-change (eseqprops (on-change (file:has-content ,auto/config ,(auto/config config) :mode #o755) (file:does-not-exist ,@clean) (%lbconfig ,dir) (%lbbootstrap t ,dir)) (%lbbootstrap nil ,dir) (deploys ((:chroot :into ,chroot)) ,host)) (%lbchroot ,dir) (%lbbinary ,dir)))))

Here, %lbconfig is a property running lb_config(1), %lbbootstrap one which runs lb_bootstrap(1), etc. Those properties all just change directory to the right place and run the command, essentially, with a little extra code to handle failed debootstraps and the like.

The ON-CHANGE and ESEQPROPS combinators work together to sequence the interaction of the Debian Live suite and Consfigurator.

  • In the innermost ON-CHANGE expression: create the file auto/config and populate it with the call to lb_config(1) that we need to make, as described in the Debian Live manual, chapter 6.

    • If doing so resulted in a change to the auto/config file – e.g. the user added some more options – ensure that lb_config(1) and lb_bootstrap(1) both get rerun.
  • Now in the inner ESEQPROPS expression, use DEPLOYS to configure the chroot, essentially by forking into the chroot and recursively reinvoking Consfigurator.

  • Finally, if any of the above resulted in a change being made, call lb_chroot(1) and lb_binary(1).

This way, we only rebuild the chroot if the configuration changed, and we only rebuild the image if the chroot changed.

Now over in my personal consfig:

(try-register-data-source :git-snapshot :name "consfig" :repo #P"src/cl/consfig/" ...) (defproplist hybrid-live-iso-built :lisp () "Build a Debian Live system in /srv/live/spw. Typically this property is not applied in a DEFHOST form, but rather run as needed at the REPL. The reason for this is that otherwise the whole image will get rebuilt each time a commit is made to my dotfiles repo or to my consfig." (:desc "Sean's Debian Live system image built") (live-build:image-built. '("--archive-areas" "main contrib non-free" ...) "/srv/live/spw" (os:debian-stable "buster" :amd64) (basic-props) (apt:installed "whatever" "you" "want") (git:snapshot-extracted "/etc/skel/src" "dotfiles") (file:is-copy-of "/etc/skel/.bashrc" "/etc/skel/src/dotfiles/.bashrc") (git:snapshot-extracted "/root/src/cl" "consfig")))

The first argument to LIVE-BUILD:IMAGE-BUILT. is additional arguments to lb_config(1). The third argument onwards are the properties for the live system. The cool thing is GIT:SNAPSHOT-EXTRACTED – the calls to this ensure that a copy of my Emacs configuration and my consfig end up in the live image, ready to be used interactively to install Debian, as described above. I’ll need to add something like (chroot:host-chroot-bootstrapped melete.silentflame.com "/srv/chroot/melete") too.

As with everything Consfigurator-related, Joey Hess’s Propellor is the giant upon whose shoulders I’m standing.

Thorsten Alteholz: My Debian Activities in March 2021

Thursday 8th of April 2021 10:11:59 AM

FTP master

Things never turn out the way you expect, so this month I was only able to accept 38 packages and rejected none. Due to the freeze, the overall number of packages that got accepted was 88.

Debian LTS

This was the eighty-first month in which I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

This month my overall workload was 30h. During that time I did LTS and normal security uploads of:

  • [DLA 2606-1] lxml security update for one CVE
  • [DSA 4880-1] lxml security update for one CVE
  • [DLA 2611-1] ldb security update for two CVEs
  • [DLA 2612-1] leptonlib security update for four CVEs

I also prepared debdiffs for unstable and/or buster for leptonlib and libebml, which for one reason or another did not result in an upload yet.

Last but not least I did some days of frontdesk duties.

Debian ELTS

This month was the thirty-third ELTS month.

During my allocated time I uploaded:

  • ELA-388-1 for zeromq3
  • ELA-390-1 for lxml
  • ELA-391-1 for jasper
  • ELA-393-1 for ldb
  • ELA-394-1 for leptonlib

Last but not least I did some days of frontdesk duties.

Other stuff

On my never-ending golang challenge I uploaded (or sponsored for thola dependencies):
golang-github-tombuildsstuff-giovanni, golang-github-apparentlymart-go-userdirs, golang-github-apparentlymart-go-shquot, golang-github-likexian-gokit, golang-gopkg-mail.v2, golang-gopkg-redis.v5, golang-github-facette-natsort, golang-github-opentracing-contrib-go-grpc, golang-github-felixge-fgprof, golang-github-gogo-status, golang-github-leanovate-gopter, golang-github-opentracing-basictracer-go, golang-github-lightstep-lightstep-tracer-common, golang-github-go-sourcemap-sourcemap, golang-github-igm-pubsub, golang-github-igm-sockjs-go, golang-github-centrifugal-protocol, golang-github-mna-redisc, golang-github-fzambia-eagle, golang-github-centrifugal-centrifuge, golang-github-chromedp-sysutil, golang-github-client9-misspell, golang-github-knq-snaker, cdproto-gen, golang-github-mattermost-xml-roundtrip-validator, golang-github-crewjam-saml, ssllabs-scan, golang-uber-automaxprocs, golang-uber-goleak, golang-github-k0kubun-go-ansi, golang-github-schollz-progressbar, golang-github-komkom-toml, golang-github-labstack-echo, golang-github-inexio-go-monitoringplugin

More in Tux Machines

Noise With Blanket

Videos/Audiocasts/Shows: Linux Journal Expats, Linux Experiment, and Krita Artwork

  • You Should Open Source Now, Ask Me How!

    Katherine Druckman chats with Petros Koutoupis and Kyle Rankin about FOSS (Free and Open Source Software), the benefits of contributing to the projects you use, and why you should be a FOSS fan as well.

  • System76 starts their own desktop environment, Arch goes the easy route - Linux & Open Source news

    This time, we have System76 working on their own desktop environment based on GNOME, Arch Linux adding a guided installer, Google winning its court case against Oracle on the use of Java in Android, and Facebook leaking data online, again. Become a channel member to get access to a weekly patroncast and vote on the next topics I'll cover.

  • Timelapse: inking a comic page in Krita (uncommented)

    An uncommented timelapse of inking page 6 of episode 34 of my webcomic Pepper&Carrot ( https://www.peppercarrot.com/ ). During the process, I thought about activating the recorder and I even put up a webcam so you can see what I'm doing on the tablet too. I'm not doing this for every page, because you can imagine the disk space needed to store around 10h of video like this, and also how it is not multi-tasking: when I record, you don't see me open the door to get the mail from the postman, you don't see me cleaning up the occasional accident of a cat bringing a mouse back home, and you don't see me typing to solve a merge request issue to merge a translation of Pepper&Carrot.

Kernel Leftovers

  • [Intel-gfx] [RFC 00/28] Old platform/gen kconfig options series
  • Patches Resubmitted For Linux With Selectable Intel Graphics Platform Support

    Back in early 2018, patches were proposed for selectable platform support when building Intel's kernel graphics driver, so that users/distributions could, if desired, disable extremely old hardware support and/or tailor kernel builds to specific Intel graphics generations. Three years later, those patches have been re-proposed. The patches, then and now, are about allowing selectable Intel graphics "Gen" support at kernel configure/build time, so that e.g. the i8xx support could be removed, or other specific generations of Intel graphics handled by the i915 kernel driver. This disabling could be done when phasing out older hardware support, seeking smaller kernel images, or for other similar purposes. The patches don't change any default support levels; they leave things as-is and simply provide the knobs for disabling select generations of hardware.

  • Linux Kernel Runtime Guard 0.9.0 Is Released

    Linux Kernel Runtime Guard (LKRG) is a security module for the Linux kernel developed by Openwall. The latest release adds compatibility with Linux kernels up to the soon-to-be-released 5.12, support for building LKRG into kernel images, support for old 32-bit x86 machines, and more. Loading the LKRG 0.9.0 module will cause a kernel panic and a complete halt if SELinux is enabled.

  • Hans de Goede: Logitech G15 and Z-10 LCD-screen support under Linux

    A while ago I worked on improving Logitech G15 LCD-screen support under Linux. I recently got an email from someone who wanted to add support for the LCD panel in the Logitech Z-10 speakers to lcdproc, asking me to describe the process I went through to improve G15 support in lcdproc and how I made it work without requiring the unmaintained g15daemon code.

Devuan 4.0 Alpha Builds Begin For Debian 11 Without Systemd

Debian 11 continues inching closer to release, and it looks like the developers maintaining the "Devuan" fork won't be far behind with their re-base of the distribution focused on init system freedom. The Devuan fork of Debian remains focused on providing Debian GNU/Linux without systemd. Devuan Beowulf 3.1 is their latest release based on Debian 10, while Devuan Chimaera is in the works as their re-base for Debian 11. Read more