Databricks brings its Delta Lake project to the Linux Foundation

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation and placing it under an open governance model. The company announced the launch of Delta Lake earlier this year and, even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data, putting it into data lakes and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose and not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”

The LF's press release

  • The Delta Lake Project Turns to Linux Foundation to Become the Open Standard for Data Lakes

    Amsterdam and San Francisco, October 16, 2019 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that it will host Delta Lake, a project focusing on improving the reliability, quality and performance of data lakes. Delta Lake, announced by Databricks earlier this year, has been adopted by thousands of organizations and has a thriving ecosystem of supporters, including Intel, Alibaba and Booz Allen Hamilton. To further drive adoption and contributions, Delta Lake will become a Linux Foundation project and use an open governance model.

    Every organization aspires to get more value from data through data science, machine learning and analytics, but they are massively hindered by the lack of data reliability within data lakes. Delta Lake addresses data reliability challenges by making transactions ACID compliant, which enables concurrent reads and writes. Its schema enforcement capability helps to ensure that the data lake is free of corrupt and non-conformant data. Since its launch in October 2017, Delta Lake has been adopted by over 4,000 organizations and processes over two exabytes of data each month.
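
To make the schema enforcement idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta Lake's API: the `ToyTable` class, the `SchemaMismatchError` exception and the column names are all hypothetical, chosen only to illustrate a write that is validated against a declared schema and rejected as a whole if any row does not conform.

```python
from dataclasses import dataclass, field


class SchemaMismatchError(ValueError):
    """Raised when a batch does not match the table's declared schema."""


@dataclass
class ToyTable:
    """Hypothetical in-memory table; NOT the Delta Lake API."""
    schema: dict  # column name -> expected Python type
    rows: list = field(default_factory=list)

    def append(self, batch):
        # Validate every row before touching self.rows, so a single bad
        # row rejects the whole batch (all-or-nothing write).
        for row in batch:
            if set(row) != set(self.schema):
                raise SchemaMismatchError(
                    f"columns {sorted(row)} do not match {sorted(self.schema)}"
                )
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise SchemaMismatchError(
                        f"column {name!r} expects {expected.__name__}, "
                        f"got {type(row[name]).__name__}"
                    )
        self.rows.extend(batch)


events = ToyTable(schema={"user_id": int, "action": str})
events.append([{"user_id": 1, "action": "login"}])

rejected = None
try:
    # The second row carries user_id as a string, so the whole batch is refused.
    events.append([
        {"user_id": 2, "action": "click"},
        {"user_id": "3", "action": "click"},
    ])
except SchemaMismatchError as err:
    rejected = str(err)
```

After the failed write the table still holds only the first, valid row. A real storage layer would enforce this through its transaction log rather than an in-memory list.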

  • Delta Lake finds new home at Linux Foundation

    Databricks used the ongoing Spark + AI Summit Europe to announce a change in the governance of Delta Lake.

    The storage layer was introduced to the public in April 2019 and is now in the process of moving to the Linux Foundation, which also fosters software projects such as the Linux kernel and Kubernetes.

    The new home is meant to drive the adoption of Delta Lake and establish it as a standard for managing big data. Databricks cofounder Ali Ghodsi commented on the move in a prepared statement. “To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

  • Open source Delta Lake project moves to the Linux Foundation

    Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.

    Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by thousands of organizations, including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.

    Data lakes are popularly used by large enterprises as they provide a reliable way of ensuring that data can be accessed by anyone within an organization. They can be used to store any kind of data, including both structured and unstructured information in its native format, and also support analysis of data that helps provide real-time insights on business matters.

  • Databricks contributes Delta Lake to the Linux Foundation

    The Databricks-led open source Delta Lake project is getting a new home and a new governance model at the Linux Foundation.

    In April, the San Francisco-based data science and analytics vendor open sourced the Delta Lake project, in an attempt to create an open community around its data lake technology. After months of usage and feedback from a community of users, Databricks decided that a more open model for development, contribution and governance was needed and the Linux Foundation was the right place for that.

Databricks’ Delta Lake Moves To Linux Foundation

  • Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

    Going cloud for your storage needs comes with some baggage. On the one hand, it's cheap, elastic, and convenient - it just works. On the other hand, it's messy, especially if you are used to working with data management systems like databases and data warehouses.

    Unlike those systems, cloud storage was not designed with things such as transactional support or metadata in mind. If you work with data at scale, these are pretty important features. This is why Databricks introduced Delta Lake to add those features on top of cloud storage back in 2017.

SDxCentral coverage

More in Tux Machines

digiKam 7.7.0 is released

After three months of active maintenance and another bug triage, the digiKam team is proud to present version 7.7.0 of its open source digital photo manager. See below the list of most important features coming with this release. Read more

Dilution and Misuse of the "Linux" Brand

Samsung, Red Hat to Work on Linux Drivers for Future Tech

The metaverse is expected to uproot system design as we know it, and Samsung is one of many hardware vendors re-imagining data center infrastructure in preparation for a parallel 3D world. Samsung is working on new memory technologies that provide faster bandwidth inside hardware for data to travel between CPUs, storage and other computing resources. The company also announced it was partnering with Red Hat to ensure these technologies have Linux compatibility. Read more

today's howtos

  • How to install go1.19beta on Ubuntu 22.04 – NextGenTips

    In this tutorial, we are going to explore how to install Go on Ubuntu 22.04. Golang is an open-source programming language that is easy to learn and use. It has built-in concurrency and a robust standard library. It is reliable, builds fast, and produces efficient software that scales. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel type system enables flexible and modular program construction. Go compiles quickly to machine code and has the convenience of garbage collection and the power of run-time reflection. In this guide, we are going to learn how to install golang 1.19beta on Ubuntu 22.04. Go 1.19beta1 is not yet released. There is so much work in progress with all the documentation.

  • molecule test: failed to connect to bus in systemd container - openQA bites

    Ansible Molecule is a project to help you test your ansible roles. I’m using molecule for automatically testing the ansible roles of geekoops.

  • How To Install MongoDB on AlmaLinux 9 - idroot

    In this tutorial, we will show you how to install MongoDB on AlmaLinux 9. For those of you who didn’t know, MongoDB is a high-performance, highly scalable document-oriented NoSQL database. Unlike in SQL databases where data is stored in rows and columns inside tables, in MongoDB, data is structured in JSON-like format inside records which are referred to as documents. The open-source attribute of MongoDB as a database software makes it an ideal candidate for almost any database-related project. This article assumes you have at least basic knowledge of Linux, know how to use the shell, and most importantly, you host your site on your own VPS. The installation is quite simple and assumes you are running in the root account, if not you may need to add ‘sudo‘ to the commands to get root privileges. I will show you the step-by-step installation of the MongoDB NoSQL database on AlmaLinux 9. You can follow the same instructions for CentOS and Rocky Linux.

  • An introduction (and how-to) to Plugin Loader for the Steam Deck. - Invidious
  • Self-host a Ghost Blog With Traefik

    Ghost is a very popular open-source content management system. It started as an alternative to WordPress and went on to become an alternative to Substack by focusing on membership and newsletters. The creators of Ghost offer managed Pro hosting but it may not fit everyone's budget. Alternatively, you can self-host it on your own cloud servers. On Linux Handbook, we already have a guide on deploying Ghost with Docker in a reverse proxy setup. Instead of an Nginx reverse proxy, you can also use another piece of software called Traefik with Docker. It is a popular open-source cloud-native application proxy, API gateway, edge router, and more. I use Traefik to secure my websites using an SSL certificate obtained from Let's Encrypt. Once deployed, Traefik can automatically manage your certificates and their renewals. In this tutorial, I'll share the necessary steps for deploying a Ghost blog with Docker and Traefik.