Wednesday, April 3
9:00AM – 9:10AM | Welcome Remarks & Commitment to Open Source for Meta Compute | Ali LeClerc, Community Chair at IBM; Amit Purohit, Director at Meta |
9:10AM – 9:55AM | Velox and Composable Data Management | Pedro Pedreira, Velox Lead & Software Engineer at Meta; Manos Karpathiotakis and Deblina Gupta, Software Engineers at Meta |
10:00AM – 10:45AM | Prestissimo Batch Efficiency at Meta | Amit Dutta, Software Engineer at Meta |
10:45AM – 11:15AM | Break | |
11:15AM – 11:30AM | Velox at IBM | Remus Lazar, VP Software Development, Data & AI at IBM |
11:30AM – 12:00PM | Prestissimo at IBM | Aditi Pandit, Software Engineer at IBM |
12:00PM – 12:15PM | Parquet & Iceberg 2.0 Support | Ying Su, Software Engineer at IBM |
12:15PM – 1:30PM | Lunch | |
1:30PM – 2:00PM | What’s new in Velox? Overview of Optimizations, Features and Reliability. | Jimmy Lu, Software Engineer at Meta |
2:00PM – 2:30PM | An update on the Apache Gluten project (incubator) and its use of Velox | Binwei Yang, Founder and Technical Lead of the Gluten project at Intel |
2:30PM – 2:45PM | Unlocking Data Query Performance @ Pinterest: Integrating Spark SQL with Gluten and Velox | Zaheen Aziz, Software Engineer at Pinterest |
2:45PM – 3:00PM | Accelerating Spark at Microsoft using Gluten & Velox | Zhen Li & Swinky Mann, Software Engineers at Microsoft |
3:00PM – 3:30PM | Break | |
3:30PM – 4:00PM | Velox Memory Management | Xiaoxuan Meng, Software Engineer at Meta |
4:00PM – 4:15PM | Simple Aggregation Function Interface | Wei He, Software Engineer at Meta |
4:30PM – 6:30PM | Conference Reception | |
Session Details
Commitment to Open Source for Meta Compute
Amit Purohit
Director at Meta
Velox and Composable Data Management
In this talk, Pedro will discuss the concept of composability in data management, which gave rise to Velox, along with other recent developments, including Velox<->Arrow alignment. He’ll also discuss Velox’s current usage inside Meta (going beyond traditional SQL analytics) and will be joined by guest speakers Manos Karpathiotakis and Deblina Gupta from the Scribe and ODS teams at Meta.
Pedro Pedreira
Velox Lead & Software Engineer at Meta
Manos Karpathiotakis
Software Engineer at Meta
Deblina Gupta
Software Engineer at Meta
Prestissimo Batch Efficiency at Meta
Amit Dutta
Software Engineer at Meta
Velox at IBM
Learn more about IBM’s work and vision for Velox, including key contributions and focus areas. Remus will cover the work done over the last year and what’s ahead for Velox at IBM.
Remus Lazar
VP Software Development, Data & AI at IBM
Prestissimo at IBM
In this talk we will give an overview of all Prestissimo-related activity at IBM since the last VeloxCon. This includes: i) feature enhancements for the Prestissimo tech preview on IBM watsonx.data; ii) TPC-DS updates; iii) Presto 2.0 plans; iv) Connector SPI.
Aditi Pandit
Software Engineer at IBM
Parquet and Iceberg 2.0 Support
Ying Su
Software Engineer at IBM
What’s new in Velox? Overview of Optimizations, Features and Reliability
Jimmy Lu
Software Engineer at Meta
An update on the Apache Gluten project (incubator) and its use of Velox
This talk will provide a technical overview of the project. An emphasis will be on experiences working with customers from across the globe on enabling them to get their Spark workloads up and running with Gluten and Velox. The talk will also cover Gluten’s recent acceptance as an Apache incubator project. The talk will close with some details on what’s next.
Binwei Yang
Founder and Technical Lead of the Gluten project at Intel
Unlocking Data Query Performance @ Pinterest: Integrating Spark SQL with Gluten and Velox
In this talk we will delve into the technical design of integrating Spark SQL with Gluten and Velox at Pinterest. We will explore the background, motivation, and goals behind this project, as well as the high-level and detailed design considerations. From the ad hoc query flow to the production query flow, we will outline the implementation, the challenges, and the solutions we adopted to integrate Gluten and Velox. Additionally, we will discuss the rollout plan and considerations for security, privacy, cost, and production readiness. Join us to discover how Gluten and Velox are transforming data query performance at Pinterest.
Zaheen Aziz
Software Engineer at Pinterest
Accelerating Spark at Microsoft using Gluten & Velox
Microsoft Fabric emerges as a cornerstone big data solution, proficient in executing Spark workloads. In our quest to enhance Spark performance, we’ve made substantial investments in query optimization and execution to cater to our customers’ needs. While exploring faster query execution engines, we evaluated existing solutions such as Weld. In this presentation, we explain our decision to adopt the Velox and Gluten stack as our native query execution engine for Spark. We’ll cover how we integrated it within the Azure Fabric ecosystem, including features like ABFS support and integration with the read cache. Our efforts have yielded performance gains of up to 2x on TPC-DS benchmarks. The gains are not limited to industry benchmarks; they are also evident in testing done with internal customers. Join us as we share insights, lessons learned, and the impact of leveraging the Velox and Gluten stack within the Microsoft Fabric environment.
Zhen Li
Software Engineer at Microsoft
Swinky Mann
Software Engineer at Microsoft
Velox Memory Management
The Velox memory system is designed for safely running highly variable query workloads within a fixed memory resource. It provides query execution with all the required memory allocation functions and optimizes both physical memory allocation and query memory allocation patterns. It provides fair memory sharing among queries through memory arbitration and disk spilling techniques. It enforces total memory capacity by managing physical memory on its own.
Xiaoxuan Meng
Software Engineer at Meta
Simple Aggregation Function Interface
An introduction to the new simple function interface for user-defined aggregation functions (UDAFs). Compared to the existing vector-based interface, this interface allows UDAF authors to write less code, in a row-based style, when implementing a UDAF, with minimal to zero performance degradation.
Wei He
Software Engineer at Meta