Summary: Network/traffic analytics for Tux Machines
ULTIMATELY, here in Tux Machines we strive to include every bit of relevant news (standalone pages for more important news, clusters of links for the rest, grouped by topic). We rarely blog although sometimes we add an opinion (marked "Ed", shorthand for "Editor").
It has been a long time since we last wrote about statistics. As readers may know by now, we only retain logs for up to 4 weeks (for security/diagnostics purposes); they are then deleted for good so as to maintain privacy (we cannot be compelled to hand over data). Those logs show only direct hits; they don't include pages served through the cache* (Varnish). Here are the latest, where the date stands for "week ending":
-rw-r--r--. 1 root root 224439408 Aug  7 03:17 access.log-20160807
-rw-r--r--. 1 root root 310050330 Aug 14 03:22 access.log-20160814
-rw-r--r--. 1 root root 343901488 Aug 21 03:17 access.log-20160821
-rw-r--r--. 1 root root 344256886 Aug 28 03:15 access.log-20160828
The above indicates that, judging by the back end (not the cache), traffic continues to increase. Over the past week the site was sometimes unbearably slow, if not inaccessible. In the worst case we'll upgrade the server for extra capacity, assuring decent speed. Worth noting is that in the latest log (ending August 28th) fewer than 1,000 hits came from Edge, so very few among our visitors use the latest and 'greatest' from Microsoft. █
* The cache server serves several domains, notably Tux Machines and Techrights, and it averages around 1.5 GB of traffic per hour.
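The week-over-week growth implied by the four log files listed above can be worked out with a few lines of Python. This is only a rough sketch: it treats file size as a proxy for the number of hits (an assumption, since request lines vary in length), and the function name `weekly_growth` is made up for illustration.

```python
# Sizes in bytes, copied from the directory listing above.
# Keys are the "week ending" dates taken from the file names.
log_sizes = {
    "2016-08-07": 224439408,
    "2016-08-14": 310050330,
    "2016-08-21": 343901488,
    "2016-08-28": 344256886,
}


def weekly_growth(sizes):
    """Return (week, percent change versus the previous week) pairs."""
    weeks = sorted(sizes)  # ISO dates sort chronologically
    growth = []
    for prev, cur in zip(weeks, weeks[1:]):
        change = (sizes[cur] - sizes[prev]) / sizes[prev] * 100
        growth.append((cur, round(change, 1)))
    return growth


for week, pct in weekly_growth(log_sizes):
    print(f"week ending {week}: {pct:+.1f}% vs. previous week")
```

On these figures the back-end logs grew by roughly 38%, then 11%, then 0.1% from one week to the next, so most of the increase happened early in the month.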
THE past few weeks were exceptionally busy for the site as readership grew considerably and the site turned 12. Originally, the site did not share Linux news but had various other sections. Years later Susan Linton made it the success story it is today and in 2013-2014 we only modernised the theme and kept the old tradition, format, etc. We hope this pleases longtime readers of the site. Comments on how the site is run are always taken into account. █
FOR those who may be wondering, we didn't get compromised or anything. We never had any such incidents. What happened earlier is that some spammer, who had created an account before we limited account creation (due to spam), made the spam expandable to the whole screen, covering many of the pages with that spam (overlay). We are working on code to help prevent such spamming so that legitimate users can post comments etc. without spammers ruining the experience for everyone else. █
Summary: A 2-hour investigation reveals that Tux Machines is now the victim of an arrogant, out-of-control Baidu
TUX MACHINES has been mostly offline since late this morning. It has evidently become the victim of Baidu's lawlessness, having fallen under huge dumps of requests from IP addresses which can be traced back to Baidu and whose requests say Baidu as well (we tried blocking these, but it's not easy to do by IP because they have so many). They don't obey robots.txt rules; not even close! It turns out that others suffer from this as well. These A-holes have been causing a lot of problems to the site as of late (slowdowns were among those problems), including damage to the underlying framework. Should we report them? To whom exactly? Looking around the Web, there are no contact details (in English anyway) by which to reach them.
Baidu can be very evil towards Web sites. Evil. Just remember that. █
Update: 3 major DDOS attacks (so far today) led to a lot of problems, and they also revealed that it was not Baidu at fault but botmasters who used the name "Baidu" to disguise themselves, hiding among some real and legitimate requests from Baidu (with Baidu-owned IP addresses). We have changed our firewall accordingly. We don't know who's behind these attacks or what the motivations may be.
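One common way to tell a real crawler from a client that merely claims its name is a forward-confirmed reverse DNS lookup: resolve the IP address to a hostname, check the hostname's domain, then resolve the hostname back and confirm it yields the same IP. The Python sketch below illustrates the idea; it is not the firewall change we made. The function name `is_genuine_baidu` is hypothetical, and the two domain suffixes are an assumption based on Baidu's published webmaster guidance, so verify them before relying on this.

```python
import socket

# Assumption: genuine Baiduspider hosts reverse-resolve to names under these
# domains (per Baidu's webmaster guidance); check before relying on it.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_genuine_baidu(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a client claiming to be Baidu.

    `reverse` and `forward` default to the real resolver calls and can be
    swapped for stubs when testing. The forward lookup is IPv4 only.
    """
    try:
        hostname = reverse(ip)[0]          # IP -> hostname
    except OSError:
        return False                       # no PTR record: treat as fake
    if not hostname.lower().endswith(BAIDU_SUFFIXES):
        return False                       # hostname is outside Baidu's domains
    try:
        addresses = forward(hostname)[2]   # hostname -> list of IPs
    except OSError:
        return False
    return ip in addresses                 # must map back to the same IP
```

A request whose user agent says "Baidu" but whose address fails this check can be treated as an impostor, while addresses that pass are left alone. Because DNS lookups are slow, results would normally be cached rather than computed per request.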
QUIETLY but surely, last week marked an important milestone, with traffic at the back end (not the cache layer*) exceeding 1.8 million hits, thus establishing a new record. So far this week it looks as though we are going to break this record again. We hope that the new format, which places emphasis on high-importance links (as standalone nodes) and puts less important links in topical groups (such as games or howtos), makes reading the site more convenient and keeping abreast of the news easier, without overloading readers (links inside groups are typically less important, as intended). We're open to any suggestions readers may have to ensure we remain a leading syndicator of GNU/Linux and Free/Open Source software news. Any feedback can improve the site. █
* It is difficult to measure what happens at the Varnish layer as it's shared among several domains, including Techrights.
IN CASE it's not already obvious, we have been posting fewer links since the 14th of this month because we are both away and we catch up with some news only when time permits. Today's hot weather (38 degrees) will probably keep us indoors longer than usual, letting us post some more links (from Rianne's laptop), but a week from now is when we'll properly catch up with everything that was missed and gradually get back to normal, hopefully for a long time to come.
Please bear with us while we enjoy our last chance to have a summer vacation. It's already cold back home in Manchester. █
Earlier this month my husband and I needed a replacement for the Chromebook that I had installed Linux on after Christmas, because its keyboard had developed a fault. This was a good opportunity to get an upgrade and to connect the 28-inch monitor to it, allowing us to watch Wimbledon over the Internet (we don't watch TV).
Setting up the machine:
It comes with Chrome OS, but I don't want that:
Switch to developer mode:
Setting it up to not be so locked down:
With Roy's help, installing Ubuntu LTS:
Running KDE/Plasma (my favourite):
Running Unity (which I still try to use on a daily basis after using KDE for years):
We have since bought a cabinet for the external screen and Roy finished building it 2 days ago, so now we can watch shows while we work (4 screens combined using Synergy). █
Summary: Some numbers to show what goes on in sites that do not share information about their visitors (unlike Windows-centric sites which target non-technical audiences)
THE common perception of GNU/Linux is that it is scarcely used, based on statistics gathered from privacy-hostile Web sites that share (or sell) access log data, embed spyware in all of their pages, and so on. Our sites are inherently different because of a reasonable -- if not sometimes fanatic -- appreciation of privacy at both ends (server and client). People who read technical sites know how to block ads, impede spurious scripts etc. These sites also actively avoid anything which is privacy-infringing, such as interactive 'social' media buttons (these let third parties spy on all visitors in all pages).
Techrights and Tux Machines attract the lion's share of our traffic (and server capacity). They both have dedicated servers. These are truly popular and some of the leaders in their respective areas. Techrights deals with threats to software freedom, whereas Tux Machines is about real-time news discovery and organisation (pertaining to Free software and GNU/Linux).
The Varnish layer, which protects both of these large sites (nearly 100,000 pages in each, necessitating a very large cache pool), handles somewhere between 1 and 2.5 gigabytes of data per hour (depending on the time of day; on average it is usually somewhere in the middle of this range).
The Apache layer, which now boasts 32 GB of RAM and sports many CPU cores, handled 1,324,232 hits for Techrights (ranked 6636th for traffic in Netcraft) in this past week and 1,065,606 for Tux Machines (ranked 6214th for traffic in Netcraft).
Based on VISITORS Web Log Analyzer, this is what we've had in Techrights:
Unknown (e.g. bots/spiders): 23.0%
As a graph (charted with LibreOffice):
Tux Machines reveals a somewhat different pattern. The following is based on grepping/filtering the past month's log at the Apache back end (not Varnish, which would have been a more sensible but harder thing to do), presenting the top 3 only:
One month is as far as retention goes, so it's not possible to show long-term trends (as before, based on Susan's summary of data). Logs older than that are automatically deleted, as promised, for both sites -- forever! We just need a small tail of data (temporarily) for DDOS prevention. █
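For readers curious about what grepping/filtering a log for a "top 3" can look like, here is a minimal Python sketch. It is not the exact commands we ran. It assumes the Apache "combined" log format (user agent as the last quoted field), and the substring-to-label table, the function names and the sample lines are all invented for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" format, where the user agent is the
# last double-quoted field on each line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Hypothetical labels, checked in order. "Android" must come before
# "Linux" because Android user agents also contain the word "Linux".
FAMILIES = [
    ("Windows", "Windows"),
    ("Android", "Android"),
    ("Linux", "GNU/Linux"),
    ("Mac OS X", "Mac OS X"),
]


def classify(user_agent, families=FAMILIES):
    """Map a user-agent string to the first matching label."""
    for needle, label in families:
        if needle in user_agent:
            return label
    return "Unknown"  # e.g. bots/spiders


def top_families(lines, n=3, families=FAMILIES):
    """Count labels over an iterable of log lines and return the top n."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[classify(match.group(1), families)] += 1
    return counts.most_common(n)


# Made-up sample lines, only to show the call; real input would be a log file.
sample = [
    '192.0.2.1 - - [28/Aug/2016:03:15:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '192.0.2.2 - - [28/Aug/2016:03:15:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '192.0.2.3 - - [28/Aug/2016:03:15:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(top_families(sample))
```

Since `top_families` accepts any iterable of lines, an open file object can be passed to it directly, which avoids loading a log of several hundred megabytes into memory.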
IN the coming days we will prioritise very recent news and of course important news, but at the same time we shall be catching up with some older but important news that we missed. This means that some older items (one or two weeks old) may occasionally appear. In line with requests from readers we will also stop abbreviating long summaries of news, such as today's leftovers and howto roundups. █
THIS COMING WEEK, starting Tuesday in particular, will be a lot less busy than usual because Rianne and I are flying away and will be absent for a couple of weeks. Depending on availability of Wi-Fi, we ought to be able to still post some links, just not the usual volume of links.
We kindly ask anyone who is interested and willing to submit links highlighting relevant news, as every registered user can do that. It will greatly help us run the site while we are very far away in East Asia. █