
Databricks brings its Delta Lake project to the Linux Foundation

Filed under
Linux

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation and under an open governance model. The company announced the launch of Delta Lake earlier this year and even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data, putting it into data lakes and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose and not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”

Read more

The Linux Foundation's press release

  • The Delta Lake Project Turns to Linux Foundation to Become the Open Standard for Data Lakes

    Amsterdam and San Francisco, October 16, 2019 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that it will host Delta Lake, a project focusing on improving the reliability, quality and performance of data lakes. Delta Lake, announced by Databricks earlier this year, has been adopted by thousands of organizations and has a thriving ecosystem of supporters, including Intel, Alibaba and Booz Allen Hamilton. To further drive adoption and contributions, Delta Lake will become a Linux Foundation project and use an open governance model.

    Every organization aspires to get more value from data through data science, machine learning and analytics, but they are massively hindered by the lack of data reliability within data lakes. Delta Lake addresses data reliability challenges by making transactions ACID compliant, enabling concurrent reads and writes. Its schema enforcement capability helps to ensure that the data lake is free of corrupt and non-conformant data. Since its launch in October 2017, Delta Lake has been adopted by over 4,000 organizations and processes over two exabytes of data each month.
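The ACID guarantees mentioned above come from Delta Lake's ordered transaction log, a directory of numbered JSON commit files that writers race to create and readers replay in order. The sketch below is a toy illustration of that idea only, not Delta Lake's actual code; the `ToyDeltaLog` class and its methods are invented for this example, and it keeps rows inline rather than in Parquet data files as a real table would.

```python
import json
import os
import tempfile


class ToyDeltaLog:
    """Toy sketch of a Delta-style transaction log (not the real API).

    Each commit is a JSON file named by version (00000000.json, ...).
    os.O_CREAT | os.O_EXCL makes commit creation atomic: if two writers
    race for the same version, exactly one succeeds and the other must
    retry -- the optimistic concurrency idea behind _delta_log.
    """

    def __init__(self, path, schema):
        self.log_dir = os.path.join(path, "_delta_log")
        os.makedirs(self.log_dir, exist_ok=True)
        self.schema = schema  # {column_name: python_type}

    def _validate(self, rows):
        # Schema enforcement: reject rows that do not conform.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} != {sorted(self.schema)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"{col!r} must be {typ.__name__}")

    def _next_version(self):
        return len([f for f in os.listdir(self.log_dir) if f.endswith(".json")])

    def commit(self, rows):
        self._validate(rows)
        version = self._next_version()
        commit_path = os.path.join(self.log_dir, f"{version:08d}.json")
        # O_EXCL fails if the file already exists, so a losing writer
        # never silently overwrites another writer's commit.
        fd = os.open(commit_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        with os.fdopen(fd, "w") as f:
            json.dump({"version": version, "add": rows}, f)
        return version

    def snapshot(self):
        # Readers replay committed files in order for a consistent view;
        # a half-finished write is simply a commit file that never appears.
        rows = []
        for name in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, name)) as f:
                rows.extend(json.load(f)["add"])
        return rows
```

Because a commit file either exists in full or not at all, readers never observe a partial write, which is the property plain object storage lacks on its own.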

Delta Lake finds new home at Linux Foundation

  • Delta Lake finds new home at Linux Foundation

    Databricks used the ongoing Spark + AI Summit Europe to announce a change in the governance of Delta Lake.

    The storage layer was introduced to the public in April 2019 and is now in the process of moving to the Linux Foundation, which also fosters software projects such as the Linux kernel and Kubernetes.

    The new home is meant to drive the adoption of Delta Lake and establish it as a standard for managing big data. Databricks cofounder Ali Ghodsi commented on the move in a canned statement: “To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

Open source Delta Lake project moves to the Linux Foundation

  • Open source Delta Lake project moves to the Linux Foundation

    Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.

    Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by thousands of organizations, including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.

    Data lakes are widely used by large enterprises because they provide a reliable way of ensuring that data can be accessed by anyone within an organization. They can store any kind of data, both structured and unstructured, in its native format, and also support analysis that helps provide real-time insights on business matters.

Databricks contributes Delta Lake to the Linux Foundation

  • Databricks contributes Delta Lake to the Linux Foundation

    The Databricks-led open source Delta Lake project is getting a new home and a new governance model at the Linux Foundation.

    In April, the San Francisco-based data science and analytics vendor open sourced the Delta Lake project, in an attempt to create an open community around its data lake technology. After months of usage and feedback from a community of users, Databricks decided that a more open model for development, contribution and governance was needed and the Linux Foundation was the right place for that.

Databricks’ Delta Lake Moves To Linux Foundation

Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

  • Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

    Going cloud for your storage needs comes with some baggage. On the one hand, it's cheap, elastic, and convenient - it just works. On the other hand, it's messy, especially if you are used to working with data management systems like databases and data warehouses.

    Unlike those systems, cloud storage was not designed with things such as transactional support or metadata in mind. If you work with data at scale, these are pretty important features. This is why Databricks introduced Delta Lake to add those features on top of cloud storage back in 2017.

SDxCentral coverage

