Sci/Tech

Accurate Conclusions from Bogus Data: Methodological Issues in “Collaboration in the open-source arena: The WebKit case”

Filed under
Sci/Tech

Nearly five years ago, when I was in grad school, I stumbled across the paper Collaboration in the open-source arena: The WebKit case when trying to figure out what I would do for a course project in network theory (i.e. graph theory, not computer networking; I’ll use the words “graph” and “network” interchangeably). The paper evaluates collaboration networks, which are graphs where collaborators are represented by nodes and relationships between collaborators are represented by edges. Our professor had used collaboration networks as examples during lecture, so it seemed at least mildly relevant to our class, and I wound up writing a critique on this paper for the class project. In this paper, the authors construct collaboration networks for WebKit by examining the project’s changelog files to define relationships between developers. They perform “community detection” to visually group developers who work closely together into separate clusters in the graphs. Then, the authors use those graphs to arrive at various conclusions about WebKit (e.g. “[e]ven if Samsung and Apple are involved in expensive patent wars in the courts and stopped collaborating on hardware components, their contributions remained strong and central within the WebKit open source project,” regarding the period from 2008 to 2013).
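
To make the construction concrete, here is a minimal Python sketch of the general idea. It is not the authors' actual script or data: the developer names are made up, and the rule used here (two developers are linked if they touched the same file) is just one plausible way to define collaboration edges from changelog-style data.

```python
# A minimal sketch (not the paper's actual pipeline) of building a
# collaboration network and clustering it with community detection.
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical changelog-derived data: which developers touched which file.
changes = {
    "Source/WebCore/page/Frame.cpp": ["alice@apple.example", "bob@samsung.example"],
    "Source/WebKit/UIProcess/WebPageProxy.cpp": ["bob@samsung.example", "carol@google.example"],
    "Tools/Scripts/webkitpy/common/checkout.py": ["alice@apple.example", "dave@igalia.example"],
}

G = nx.Graph()
for path, authors in changes.items():
    # Treat two developers as collaborators if they edited the same file;
    # weight the edge by how many files they share.
    for a, b in itertools.combinations(sorted(set(authors)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# "Community detection": group developers who work closely together.
communities = greedy_modularity_communities(G, weight="weight")
for i, community in enumerate(communities):
    print(f"cluster {i}: {sorted(community)}")
```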

At the time, I contacted the authors to let them know about some serious problems I found with their work. Then I left the paper in a short-term to-do pile on my desk, where it has sat since Obama was president, waiting for me to finally write this blog post. Unfortunately, nearly five years later, the authors' email addresses no longer work, which is not very surprising after so long (since I'm no longer a student, the email I originally used to contact them doesn't work anymore either), so I was unable to reach them again to let them know that I was finally going to publish this blog post. Anyway, suffice it to say that the conclusions of the paper were all correct; however, the networks used to arrive at those conclusions suffered from three different mistakes, each of which was, on its own, serious enough to invalidate the entire work.

So if the analysis of the networks was bogus, how did the authors arrive at correct conclusions anyway? The answer is confirmation bias. The study was performed by visually inspecting the networks and drawing non-rigorous conclusions from them, and by researching the WebKit community to learn what was going on with the major companies involved in the project. The authors arrived at correct conclusions because they did a good job at the latter, and then saw what they wanted to see in the graphs.

I don’t want to be too harsh on the authors of this paper, though, because they decided to publish their raw data and methodology on the internet. They even published the Python scripts they used to convert WebKit changelogs into collaboration graphs. Had they not done so, there is no way I would have noticed the third (and most important) mistake that I’ll discuss below, and I wouldn’t have been able to confirm my suspicions about the second mistake. You would not be reading this right now, and likely nobody would ever have realized the problems with the paper. The authors of most scientific papers are not nearly so transparent: many researchers today consider their source code and raw data to be either proprietary secrets to be guarded, or simply not important enough to merit publication. The authors of this paper deserve to be commended, not penalized, for their openness. Mistakes are normal in research papers, and open data is by far the best way for us to be able to detect mistakes when they happen.

Read more

LabPlot 2.8.1 released

Filed under
KDE
Software
Sci/Tech

We’re happy to announce the availability of the first minor patch release for the big release we made two months ago. This release contains minor improvements and bug fixes only.

In the plot, it is now possible to change the background color of axis labels. This is useful if you place the axis labels on top of the axis line and don’t want the underlying line to show through the label’s bounding box. By default, the background remains transparent.

For the cursor, the tool used to measure positions and distances in the plots, we now allow you to copy the values in the result window to the clipboard.

When pasting new values into LabPlot’s spreadsheet, the auto-detection of the datetime format has been improved. We now better recognize the different formats produced by external programs when their data is pasted into LabPlot.

Many smaller improvements were made in the dialog for creating live-data sources, particularly around handling errors coming from remote servers such as MQTT brokers. Besides the more stable behavior, the user now also gets clearer notifications about what went wrong. Furthermore, when reading live data, LabPlot can now generate a timestamp column for the incoming data from TCP and UDP network sources as well; in the past this was only possible for MQTT sources.

Read more

JASP: A Less Complicated Free Open-source SPSS Alternative for Advanced Statistics

Filed under
Software
Sci/Tech

I have tried many open-source statistics programs and packages, but JASP is truly unique among them.

JASP is a free, open-source, full-featured statistical package supported by the University of Amsterdam. It's a multi-platform program that runs on Windows, Linux, and macOS.

It's designed for users who want to do statistical work without having to deal with programming or dive deep into learning complex statistical software. It's a recommended option for students and researchers.

Read more

Senaite: An Open-source Enterprise-grade Laboratory Information Management System (LIMS)

Filed under
OSS
Sci/Tech

Senaite is a free, open-source, self-hosted laboratory information management system (LIMS) built for the enterprise. It offers several cost- and resource-effective features, a rich set of add-ons, and a strong, supportive community of developers behind it.

In this article, we demonstrate Senaite's features and how it helps enterprises manage labs and lab equipment efficiently and reduce turnaround time.

Read more

NASA ROSES-20 Amendment 64: Release of Final text of E.8 Supplemental Open Source Software Awards

Filed under
OSS
Sci/Tech
Legal

Supplemental open source software awards are used to encourage the conversion of legacy software into modern code to be released under a generally accepted, open source license (e.g., Apache-2, BSD-2-clause, GPL). The supplement would add a software component to the proposer's previously selected "parent" research and analysis award.

ROSES-2020 Amendment 64 releases the final text for E.8 Supplemental Open Source Software Awards. Notices of Intent are not requested. Proposals will be accepted on a rolling basis with a final due date of April 14, 2021.

Read more

Chemtool: Open-source Chemical Structure drawing program

Filed under
Software
Sci/Tech

Chemtool is a lightweight application for drawing chemical structures such as organic molecules. It was originally written by Thomas Volk from Germany. Later on, more developers joined to help with development and code maintenance.

[...]

The program was created for Linux systems running X; it does not work on Windows or macOS.

License

Chemtool is released under the GNU General Public License.

Read more

Stellarium 0.20.3 Released with Tons of Changes [Ubuntu PPA]

Filed under
Software
Sci/Tech
SciFi

The free-software planetarium Stellarium 0.20.3 was released a day ago with numerous changes. Here’s how to install it in Ubuntu 18.04 and Ubuntu 20.04 via PPA.

Stellarium 0.20.3 fixed nutation and, with it, the times at which the seasons begin; included many changes to the AstroCalc tool and the Oculars and Satellites plugins; and updated the DSO catalog.

Read more

LabPlot 2.8 Released

Filed under
KDE
Software
Sci/Tech

In 2.8 we made it easier to access many online resources that provide data sets for educational purposes. These data sets cover a variety of different areas, such as physics, statistics, medicine, etc., and are usually organized in collections.

Read more

“It Just Works”: An Interview with Dexai Robotics

Filed under
Linux
Interviews
Sci/Tech

The simulators wind up using a lot of computational power, which is one of the reasons why we use System76. Portability is another. I really like the fact that I can run the full software stack on a laptop that I can always have with me. Previously, we had desktops sitting around in a lab environment, and people were often having to sign into them and borrow them. We needed a solution for new hires to have a computer they can rely on at all times.

A co-worker mentioned that she bought a machine from you guys back in 2019. After she recommended it, I did a little bit of digging online for the best Linux laptops available, and you all were named a fair amount in those searches—so I ordered one. I was pleasantly surprised with how it just worked right out of the box. I wasn’t fiddling with drivers, I wasn’t dealing with bootloader problems and figuring out how to get a working desktop environment up; I just opened it up and installed a bunch of software and I was ready to go.

Read more

CAELinux 2020: Linux for engineering

Filed under
GNU
Linux
Sci/Tech

CAELinux is a distribution focused on computer-aided engineering (CAE) maintained by Joël Cugnoni. Designed with students and academics in mind, the distribution is loaded with open-source software that can be used to model everything from pig livers to airfoils. Cugnoni's latest release, CAELinux 2020, was made on August 11; readers with engineering interests may want to take a look.

CAELinux's first stable version was released in 2007 and was based on PCLinuxOS 2007. The distribution was created to make the GPL-licensed finite element analysis tool Salome-Meca easier to obtain. CAELinux 2020 is now the eighth release of the distribution, which is based on Xubuntu 18.04 LTS, and has expanded its focus over the years into an impressive array of open-source CAE-related tools.

The minimum requirements for CAELinux 2020 are an x86-64 platform with 4GB of RAM for "simple analysis." For professional use, the project recommends 8GB of RAM or more with a "modern AMD/NVidia graphic card." The entire distribution can be run from an 8GB USB memory drive, with the option to install it to disk (35GB minimum). For those users (like me) who wanted to run the distribution as a virtual machine, the project recommends the commercial VMware Player over the open-source VirtualBox project due to "some graphical limitations" of VirtualBox.

There are too many different software packages unique to the CAELinux distribution to cover them all in a single article. Since the distribution is built on top of Xubuntu, CAELinux comes with all of the standard tools available in the base distribution. In addition to the standard packages, however, CAELinux bundles CAE pre/post processors, CAD and CAM software, finite element solvers, computational fluid dynamics applications, circuit board design tools, biomedical image processing software, and a large array of programming language packages. A review of the release announcement provides a full list of the specific open-source projects available, including a few web-based tools that merely launch the included browser to the appropriate URL.

It would be impossible for me to claim familiarity with the full range of tools provided, but I was familiar with many. For example, FreeCAD has been written about at LWN, and CAMLab was used in our article on open-source CNC manufacturing. I have personally used other bundled packages like FlatCAM for isolation routing of homemade circuit boards and Cura to slice 3D models for printing. What was particularly neat about exploring the distribution was getting introduced to new open-source software that matched my interests. I discovered KiCad EDA's PCB Calculator utility (simple, but handy), and I am looking forward to checking out CAMotics as another CAM alternative for my CNC router.

Read more

More in Tux Machines

EasyOS Dunfell 2.6.1 released for x86_64 PC

Yesterday I announced EasyOS Dunfell 2.6.1 aarch64 for the Raspberry Pi4: https://bkhome.org/news/202101/easyos-dunfell-261-released-for-the-raspberry-pi4.html Today it is the turn of the EasyOS Dunfell-series 2.6.1, 64-bit, for the PC. This is the first official release in this series. It has the same packages compiled in OpenEmbedded, the latest SeaMonkey 2.53.6, and a different kernel for the PC build, 5.10.11. Read all about it here: http://distro.ibiblio.org/easyos/amd64/releases/dunfell/2.6.1/release-notes-2.6.1.htm As stated in the release notes, all three streams are being synced to the same version number. The Buster-series 2.6.1 will probably be uploaded tomorrow; I have to compile the latest 5.4.x kernel and SeaMonkey 2.53.6. As to which you would choose for the PC, it is like asking "which is better, strawberry ice cream or chocolate ice cream?"

Read more

Top 20 Uses of Linux

The Linux OS and its related distros and flavors have transformed it from hardcore software into an industrial brand. Even if you are not a fan of it, the Linux OS might be as common as the air you breathe if you closely analyze your day-to-day interactive activities. Almost all the modern technologies that transform and innovate the tech industry have Linux OS DNA imprinted on them. Those that are yet to be branded with its innovative uniqueness and recognition are waiting in line for the famed chance. Therefore, you might boldly claim that the Linux OS does not run your life, but the world around you cannot avoid the flirty pursuits of this open-source and free software. Nowadays, almost anything that can be described as cool is either pursuing Linux or being pursued by Linux. It is the perfect symbiotic relationship in a world that tries to find a balance between technology and innovation.

This article explores the awesomeness and outreach of the Linux OS in the world around us. It might even be an eye-opener for some of us to start taking our Linux skills to the next level. Top500 cites Linux as the powerhouse or engine behind the five hundred fastest computers worldwide. I do not know the speed of the computer composing this article or whether it qualifies to be among those five hundred fastest computers, but one thing is certain: it is 100% Linux DNA. On this note, let us start parading the top 20 uses of Linux.

Read more

parted-3.4 released [stable]

Parted 3.4 has been released.  This release includes many bug fixes and new features. 
Here is Parted's home page: 
    http://www.gnu.org/software/parted/ 
For a summary of all changes and contributors, see: 
  https://git.savannah.gnu.org/cgit/parted.git/log/?h=v3.4 
or run this command from a git-cloned parted directory: 
  git shortlog v3.3..v3.4 (appended below) 
Here are the compressed sources and a GPG detached signature[*]: 
  http://ftp.gnu.org/gnu/parted/parted-3.4.tar.xz 
  http://ftp.gnu.org/gnu/parted/parted-3.4.tar.xz.sig 
Use a mirror for higher download bandwidth: 
  https://www.gnu.org/order/ftp.html 
[*] Use a .sig file to verify that the corresponding file (without the 
.sig suffix) is intact.  First, be sure to download both the .sig file 
and the corresponding tarball.  Then, run a command like this: 
  gpg --verify parted-3.4.tar.xz.sig 
If that command fails because you don't have the required public key, 
then run this command to import it: 
  gpg --keyserver keys.gnupg.net --recv-keys 117E8C168EFE3A7F 
and rerun the 'gpg --verify' command. 
This release was bootstrapped with the following tools: 
  Autoconf 2.69 
  Automake 1.16.1 
  Gettext 0.21 
  Gnulib v0.1-4131-g252c4d944a 
  Gperf 3.1 
Read more

Kernel: LWN's Latest and IO_uring Patches

  • Resource limits in user namespaces

    User namespaces provide a number of interesting challenges for the kernel. They give a user the illusion of owning the system, but must still operate within the restrictions that apply outside of the namespace. Resource limits represent one type of restriction that, it seems, is proving too restrictive for some users. This patch set from Alexey Gladkov attempts to address the problem by way of a not-entirely-obvious approach.

    Consider the following use case, as stated in the patch series. Some user wants to run a service that is known not to fork within a container. As a way of constraining that service, the user sets the resource limit for the number of processes to one, explicitly preventing the process from forking. That limit is global, though, so if this user tries to run two containers with that service, the second one will exceed the limit and fail to start. As a result, our user becomes depressed and considers a career change to goat farming. Clearly, what is needed is a way to make at least some resource limits apply on a per-container basis; then each container could run its service with the process limit set to one and everybody will be happy (except perhaps the goats). A minimal sketch of this single-process constraint appears after this list.

  • Fast commits for ext4

    The Linux 5.10 release included a change that is expected to significantly increase the performance of the ext4 filesystem; it goes by the name "fast commits" and introduces a new, lighter-weight journaling method. Let us look into how the feature works, who can benefit from it, and when its use may be appropriate.

    Ext4 is a journaling filesystem, designed to ensure that filesystem structures appear consistent on disk at all times. A single filesystem operation (from the user's point of view) may require multiple changes in the filesystem, which will only be coherent after all of those changes are present on the disk. If a power failure or a system crash happens in the middle of those operations, corruption of the data and filesystem structure (including unrelated files) is possible. Journaling prevents corruption by maintaining a log of transactions in a separate journal on disk. In case of a power failure, the recovery procedure can replay the journal and restore the filesystem to a consistent state.

    The ext4 journal includes the metadata changes associated with an operation, but not necessarily the related data changes. Mount options can be used to select one of three journaling modes, as described in the ext4 kernel documentation. data=ordered, the default, causes ext4 to write all data before committing the associated metadata to the journal. It does not put the data itself into the journal. The data=journal option, instead, causes all data to be written to the journal before it is put into the main filesystem; as a side effect, it disables delayed allocation and direct-I/O support. Finally, data=writeback relaxes the constraints, allowing data to be written to the filesystem after the metadata has been committed to the journal.

    Another important ext4 feature is delayed allocation, where the filesystem defers the allocation of blocks on disk for data written by applications until that data is actually written to disk. The idea is to wait until the application finishes its operations on the file, then allocate the actual number of data blocks needed on the disk at once. This optimization limits unneeded operations related to short-lived, small files, batches large writes, and helps ensure that data space is allocated contiguously. On the other hand, the writing of data to disk might be delayed (with the default settings) by a minute or so. In the default data=ordered mode, where the journal entry is written only after flushing all pending data, delayed allocation might thus delay the writing of the journal. To ensure data is actually written to disk, applications use the fsync() or fdatasync() system calls, causing the data (and the journal) to be written immediately. A minimal sketch of this pattern appears after this list.

  • MAINTAINERS truth and fiction

    Since the release of the 5.5 kernel in January 2020, there have been almost 87,000 patches from just short of 4,600 developers merged into the mainline repository. Reviewing all of those patches would be a tall order for even the most prolific of kernel developers, so decisions on patch acceptance are delegated to a long list of subsystem maintainers, each of whom takes partial or full responsibility for a specific portion of the kernel. These maintainers are documented in a file called, surprisingly, MAINTAINERS. But the MAINTAINERS file, too, must be maintained; how well does it reflect reality?

    The MAINTAINERS file doesn't exist just to give credit to maintainers; developers make use of it to know where to send patches. The get_maintainer.pl script automates this process by looking at the files modified by a patch and generating a list of email addresses to send it to (a simplified sketch of this kind of lookup appears after this list). Given that misinformation in this file can send patches astray, one would expect it to be kept up-to-date. Recently, your editor received a suggestion from Jakub Kicinski that there may be insights to be gleaned from comparing MAINTAINERS entries against activity in the real world. A bit of Python bashing later, a new analysis script was born.

  • Experimental Patches Allow For New Ioctls To Be Built Over IO_uring

    IO_uring continues to be one of the most exciting technical innovations in the Linux kernel in recent years not only for more performant I/O but also opening up other doors for new Linux innovations. IO_uring has continued adding features since being mainlined in 2019 and now the newest proposed feature is the ability to build new ioctls / kernel interfaces atop IO_uring. The idea of supporting kernel ioctls over IO_uring has been brought up in the past and today lead IO_uring developer Jens Axboe sent out his initial patches. These initial patches are considered experimental and sent out as "request for comments" - they provide the infrastructure to provide a file private command type with IO_uring handling the passing of the arbitrary data.