Databricks brings its Delta Lake project to the Linux Foundation

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation and under an open governance model. The company announced the launch of Delta Lake earlier this year and even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data and putting it into data lakes and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose and not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”

The LF's press release

  • The Delta Lake Project Turns to Linux Foundation to Become the Open Standard for Data Lakes

    Amsterdam and San Francisco, October 16, 2019 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that it will host Delta Lake, a project focusing on improving the reliability, quality and performance of data lakes. Delta Lake, announced by Databricks earlier this year, has been adopted by thousands of organizations and has a thriving ecosystem of supporters, including Intel, Alibaba and Booz Allen Hamilton. To further drive adoption and contributions, Delta Lake will become a Linux Foundation project and use an open governance model.

    Every organization aspires to get more value from data through data science, machine learning and analytics, but they are massively hindered by the lack of data reliability within data lakes. Delta Lake addresses data reliability challenges by making transactions ACID compliant enabling concurrent reads and writes. Its schema enforcement capability helps to ensure that the data lake is free of corrupt and not-conformant data. Since its launch in October 2017, Delta Lake has been adopted by over 4,000 organizations and processes over two exabytes of data each month.

Delta Lake finds new home at Linux Foundation

  • Delta Lake finds new home at Linux Foundation

    Databricks used the ongoing Spark + AI Summit Europe to announce a change in the governance of Delta Lake.

    The storage layer was introduced to the public in April 2019 and is now in the process of moving to the Linux Foundation, which also fosters software projects such as the Linux kernel and Kubernetes.

    The new home is meant to drive the adoption of Delta Lake and establish it as a standard for managing big data. Databricks’ cofounder Ali Ghodsi commented on the move in a canned statement. “To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

Open source Delta Lake project moves to the Linux Foundation

  • Open source Delta Lake project moves to the Linux Foundation

    Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.

    Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by thousands of organizations, including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.

    Data lakes are popularly used by large enterprises as they provide a reliable way of ensuring that data can be accessed by anyone within an organization. They can be used to store any kind of data, including both structured and unstructured information in its native format, and also support analysis of data that helps provide real-time insights on business matters.

Databricks contributes Delta Lake to the Linux Foundation

  • Databricks contributes Delta Lake to the Linux Foundation

    The Databricks-led open source Delta Lake project is getting a new home and a new governance model at the Linux Foundation.

    In April, the San Francisco-based data science and analytics vendor open sourced the Delta Lake project, in an attempt to create an open community around its data lake technology. After months of usage and feedback from a community of users, Databricks decided that a more open model for development, contribution and governance was needed and the Linux Foundation was the right place for that.

Databricks’ Delta Lake Moves To Linux Foundation

Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

  • Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

    Going cloud for your storage needs comes with some baggage. On the one hand, it's cheap, elastic, and convenient - it just works. On the other hand, it's messy, especially if you are used to working with data management systems like databases and data warehouses.

    Unlike those systems, cloud storage was not designed with things such as transactional support or metadata in mind. If you work with data at scale, these are pretty important features. This is why Databricks introduced Delta Lake to add those features on top of cloud storage back in 2017.

SDxCentral coverage

More in Tux Machines

8 great podcasts for open source enthusiasts

Where I live, almost everything is a 20- or 30-minute drive from my home, and I'm always looking for ways to use my car time productively. One way is by listening to podcasts on topics that interest me, so as an open source enthusiast, I subscribe to a variety of open source-related podcasts. Here are eight Linux and open source podcasts that I look forward to every week.

Leftovers: Certifications, KDE, Ubuntu and Security

  • Top 5 options for Linux certifications

    Linux certifications present an interesting mix of distribution- and brand-agnostic credentials, as well as vendor-specific ones. Many of these offerings provide data center professionals with defined pathways to learn, use and master Linux OS management, features and potential Linux use cases. Other programs are more ad hoc and specific to certain IT roles, such as systems engineers or IT administrators, but they go beyond self-taught curriculums and forums. Each program includes coursework and an exam. Depending on the certification, admins can buy everything as a bundle or pay separately for study materials and exams.

  • SimpleMailQt v2.0.0-beta1

    In my last post I talked about the new async simplemail-qt API that I wanted to add; yesterday I finished the work required to have that. SMTP (Simple Mail Transfer Protocol), as the name says, is a very simple but strict protocol: you send a command and MUST wait for the reply, which is rather inefficient, but it’s the way it is. Having it async means I don’t need an extra thread (plus some locking) just to send an email, so no more GUI freezes or a stalled HTTP server. The new Server class has a state machine that knows which reply we are waiting for and which status code is the successful one. Modern SMTP servers have PIPELINING support, but it’s rather different from HTTP pipelining, because you still have to wait for several commands before you send another command. In fact, it only allows you to send the FROM, the RECIPIENTS email list and DATA commands at once, parse each reply and then send the mail data; if you send the email data before you are allowed to by the DATA command, the server will just close the connection.

  • Plasma 5 for Slackware – November ktown release

    Dear all, today I released KDE-5_19.11 and it comes with some upgrades to official Slackware packages. Don’t worry – Pat Volkerding kindly added the shared libraries of the official Slackware packages to aaa_elflibs, so if you have been updating your Slackware-current installation properly then nothing will break when you update Slackware’s exiv2 and LibRaw packages to the newer versions contained in the November release of ‘ktown‘. Official Slackware package updates for exiv2 and LibRaw will come sometime soon, but it will require Pat to recompile several other packages as well that depend on exiv2 and/or LibRaw. I needed the new exiv2 to compile the latest digikam, so I was pleased with Pat’s cooperation to make this a smooth ‘ktown‘ upgrade for you.

  • Ubuntu Weekly Newsletter Issue 604
  • Ubuntu-ready Apollo Lake mini-PC features Myriad X AI accelerator

    IEI’s rugged, “ITG-100AI” DIN-rail PC runs on an Apollo Lake SoC and a new “Mustang-MPCIE-MX2” mini-PCIe card with dual Myriad X VPUs. The system ships with 8GB RAM and a 128GB SATA SSD plus GbE, serial, USB, and M.2. IEI has launched a compact, Intel Apollo Lake based “ITG-100AI” computer for industrial AI that showcases its Mustang-MPCIE-MX2 AI acceleration card. The fanless, 137 x 102.8 x 49.4mm ITG-100AI supports DIN-rail or desktop mounting and offers a 0 to 50°C range with airflow, as well as 5G shock resistance compliant with IEC68-2-27 and vibration resistance per MIL-STD-810G 514.6C-1.

  • Vulnerability Values Fluctuate Between White, Grey and Black Hats

    A black hat selling vulnerabilities can make as much money as a white hat researcher using bug bounty programs, or a grey hat working for a nation state doing reverse engineering. Speaking at a Tenable conference in London last week, director of research Oliver Rochford said that to have people do vulnerability research is expensive, and all of the white, black and grey markets are symbiotic, as despite the difference between being legal and illegal, the different factors “mirror each other as it starts with vulnerability discovery.” Rochford said that this “shows how professional cybercrime has become,” pointing to the fact that the main difference between criminal and legal sides is ethics. In one slide, Rochford pointed out vulnerability discovery, exploit research and development are the same for both offense and defensive sides, while the differences fall at the "operationalization" side, where offensive sides look at espionage, sabotage and fraud, while defense sides look at threat intelligence and compensating control adaptation. In his research, Rochford showed that in some cases you can earn more as a white hat vulnerability manager than as a black hat, with a black hat able to earn around $75,000 in this sort of work. Rochford said this “is achievable and attractive” and while it was more lucrative to do it legally, if it is not “it is a way to make a living.”

  • Name That Toon: Endpoint Protection

Slow Connections Discriminated Against: Google Stadia and Google Chrome

  • Google reveal Stadia will only have 12 games available at launch, more later in the year

    With the Stadia streaming service from Google launching on November 19th for those with the Founder's Edition or Premiere Edition, they're finally revealing what will be available. It will only have 12, yes 12, titles at launch and a few of them are sequels. They are: Assassin's Creed Odyssey, Destiny 2, GYLT, Just Dance 2020, Kine, Mortal Kombat 11, Red Dead Redemption 2, Thumper, Tomb Raider + Rise + Shadow and lastly Samurai Shodown. The only title you will get included in the Stadia Pro subscription (three months free with the Founder's/Premiere Edition) is Destiny 2; all others you have to pay for. If you stop paying for Stadia Pro, you lose access to any free games claimed and only keep those you've paid for normally.

  • Google Chrome To Begin Marking Sites That Are Slow / Fast

    Chrome has successfully shamed websites not supporting HTTPS, and now they are looking to call out websites that do not typically load fast. Google announced today that they will begin marking websites that often load either slowly or quickly. Chrome developers are experimenting with ways to show whether a website typically loads fast or slow so the user is aware even before they navigate to a given web page or web app. The changes will be rolled out in future Chrome updates.

Shows and Screencasts: Linux Headlines, Frank Karlitschek, Linux Action News and OpenIndiana 2019.10 Run Through

  • 2019-11-11 | Linux Headlines 46

    Steam gets support for Linux namespaces, some distributions are struggling with the shift from Python 2, Arch Linux supports reproducible builds, and GNOME has a new app in beta.

  • Will Europe Succeed At Democratizing The Cloud?

    Europe (led by Germany and France) is contemplating Gaia-X, its own cloud infrastructure to create interoperability among clouds and also allow local companies to compete in the cloud market dominated by US companies like AWS, Microsoft and Google. It’s an ambitious effort, but will it work? We sat down with Frank Karlitschek, founder of Nextcloud to discuss.

  • Linux Action News 131

    Google steps up support for older Chromebooks, Microsoft Edge is coming to Linux, and the App Defense Alliance teams up to fight Android malware. Plus Google Cardboard goes open source, and a neat machine-learning tool to pull songs apart.

  • OpenIndiana 2019.10 Run Through

    In this video, we are looking at OpenIndiana 2019.10. Enjoy!