Agenda

Wednesday, April 3

9:00AM – 9:10AM | Welcome Remarks & Commitment to Open Source for Meta Compute | Ali LeClerc, Community Chair at IBM; Amit Purohit, Director at Meta
9:10AM – 9:55AM | Velox and Composable Data Management | Pedro Pedreira, Velox Lead & Software Engineer at Meta; Manos Karpathiotakis and Deblina Gupta, Software Engineers at Meta
10:00AM – 10:45AM | Prestissimo Batch Efficiency at Meta | Amit Dutta, Software Engineer at Meta
10:45AM – 11:15AM | Break
11:15AM – 11:30AM | Velox at IBM | Remus Lazar, VP Software Development, Data & AI at IBM
11:30AM – 12:00PM | Prestissimo at IBM | Aditi Pandit, Software Engineer at IBM
12:00PM – 12:15PM | Parquet & Iceberg 2.0 Support | Ying Su, Software Engineer at IBM
12:15PM – 1:30PM | Lunch
1:30PM – 2:00PM | What’s new in Velox? Overview of Optimizations, Features and Reliability | Jimmy Lu, Software Engineer at Meta
2:00PM – 2:30PM | An update on the Apache Gluten project (incubator) and its use of Velox | Binwei Yang, Founder and Technical Lead of the Gluten project at Intel
2:30PM – 2:45PM | Unlocking Data Query Performance @ Pinterest: Integrating Spark SQL with Gluten and Velox | Zaheen Aziz, Software Engineer at Pinterest
2:45PM – 3:00PM | Accelerating Spark at Microsoft using Gluten & Velox | Zhen Li & Swinky Mann, Software Engineers at Microsoft
3:00PM – 3:30PM | Break
3:30PM – 4:00PM | Velox Memory Management | Xiaoxuan Meng, Software Engineer at Meta
4:00PM – 4:15PM | Simple Aggregation Function Interface | Wei He, Software Engineer at Meta
4:30PM – 6:30PM | Conference Reception

Session Details

Commitment to Open Source for Meta Compute

Amit Purohit
Director at Meta

Velox and Composable Data Management

In this talk, Pedro will discuss the concept of composability in data management, which led to Velox, along with other recent developments including Velox<->Arrow alignment. He’ll also discuss Velox’s current usage inside Meta (going beyond traditional SQL analytics) and will be joined by guest speakers Manos Karpathiotakis and Deblina Gupta from the Scribe and ODS teams at Meta.

Pedro Pedreira
Velox Lead & Software Engineer at Meta

Manos Karpathiotakis
Software Engineer at Meta

Deblina Gupta
Software Engineer at Meta

Prestissimo Batch Efficiency at Meta

Amit Dutta
Software Engineer at Meta

Velox at IBM

Learn more about IBM’s work and vision for Velox, including key contributions and focus areas. Remus will cover the work done over the last year and what’s ahead for Velox at IBM.

Remus Lazar
VP Software Development, Data & AI at IBM

Prestissimo at IBM

In this talk, we will give an overview of all Prestissimo-related activity at IBM since the last VeloxCon. This includes: i) feature enhancements for the Prestissimo tech preview on IBM watsonx.data; ii) TPC-DS updates; iii) Presto 2.0 plans; iv) the Connector SPI.

Aditi Pandit
Software Engineer at IBM

Parquet and Iceberg 2.0 Support

Ying Su
Software Engineer at IBM

What’s new in Velox? Overview of Optimizations, Features and Reliability

Jimmy Lu
Software Engineer at Meta

An update on the Apache Gluten project (incubator) and its use of Velox

This talk will provide a technical overview of the project, with an emphasis on experiences working with customers across the globe to get their Spark workloads up and running with Gluten and Velox. The talk will also cover Gluten’s recent acceptance as an Apache incubator project and will close with some details on what’s next.

Binwei Yang
Founder and Technical Lead of the Gluten project at Intel

Unlocking Data Query Performance @ Pinterest: Integrating Spark SQL with Gluten and Velox

In this talk, we will delve into the technical design of integrating Spark SQL with Gluten and Velox at Pinterest. We will explore the background, motivation, and goals behind this project, as well as the high-level and detailed design considerations. From the ad hoc query flow to the production query flow, we will outline the implementation, the challenges, and the solutions we adopted to integrate Gluten and Velox. Additionally, we will discuss the rollout plan and considerations for security, privacy, cost, and production readiness. Join us to discover how Gluten and Velox are transforming data query performance at Pinterest.

Zaheen Aziz
Software Engineer at Pinterest

Accelerating Spark at Microsoft using Gluten & Velox

Microsoft Fabric is a cornerstone big data solution that runs Spark workloads. To improve Spark performance, we have invested substantially in query optimization and execution to meet our customers’ needs. While exploring faster query execution engines, we evaluated existing solutions such as Weld. In this presentation, we explain our decision to adopt the Velox and Gluten stack as our native query execution engine for Spark. We’ll cover how we integrated it within the Azure Fabric ecosystem, including features such as ABFS support and integration with the read cache. Our efforts have yielded performance gains of up to 2x on TPC-DS benchmarks. The gains are not limited to industry benchmarks; they are also evident in testing done with internal customers. Join us as we share insights, lessons learned, and the impact of using the Velox and Gluten stack within the Microsoft Fabric environment.

Zhen Li
Software Engineer at Microsoft

Swinky Mann
Software Engineer at Microsoft

Velox Memory Management

The Velox memory system is designed to safely run highly variable query workloads within a fixed memory budget. It provides query execution with all the required memory allocation functions and optimizes both physical memory allocation and query memory allocation patterns. It provides fair memory sharing among queries through memory arbitration and disk spilling. It enforces the total memory capacity by managing physical memory on its own.

Xiaoxuan Meng
Software Engineer at Meta

Simple Aggregation Function Interface

An introduction to the new simple function interface for user-defined aggregation functions (UDAFs). Compared to the existing vector-based interface, this interface lets UDAF authors write less code, in a row-based style, with minimal to zero performance degradation.

Wei He
Software Engineer at Meta