How to use Spark SQL: A hands-on tutorial

In the first part of this series, we looked at advances in leveraging the power of relational databases "at scale" using Apache Spark SQL and DataFrames. We will now work through a simple tutorial, based on a real-world dataset, to see how to use Spark SQL. We will be using Spark DataFrames, but the focus will be more on using SQL. In a separate article, I will discuss Spark DataFrames and their common operations in detail.

I love using cloud services for my machine learning, deep learning, and even big data analytics needs, instead of painfully setting up my own Spark cluster. I will be using the Databricks Platform for my Spark needs. Databricks is a company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark.

Also: Scaling relational databases with Apache Spark SQL and DataFrames
