VeloxCon 2025 Agenda Day 2


Wednesday, April 16

8:00AM – 9:00AM	Registration & Breakfast
9:00AM – 9:10AM	Day 2 Welcome & Opening Remarks	Ali LeClerc, Open Source at IBM
9:10AM – 9:40AM	Velox in the Accelerated Age	Orri Erling, Co-creator of Velox & Software Engineer at Meta
9:45AM – 10:15AM	Accelerating Velox with RAPIDS cuDF	Greg Kimball, SWE Manager at NVIDIA; Karthikeyan Natarajan, Sr. Software Engineer at NVIDIA
10:20AM – 10:40AM	Breeze and Wave: Multi-architecture acceleration for Velox	David Reveman, Principal Member of Technical Staff at Rivos
10:40AM – 11:10AM	Break
11:10AM – 11:30AM	Velox Memory System Improvements	Xiaoxuan Meng, Software Engineer at Meta
11:35AM – 11:55AM	Introducing HW-Accelerated Velox with CXL Computational Memory	Harry Kim, Chief Product Officer at XCENA
12:00PM – 12:20PM	Data in Motion: Adapting ETL to Real-World Hardware Chaos	Felipe Aramburu, Distinguished Architect & Co-Founder, Voltron Data
12:20PM – 1:30PM	Lunch
1:30PM – 1:50PM	Real-time ML: Accelerating Velox & Python for inference (< 10ms) at scale	Nathan Fenner, Software Engineer at Chalk
1:55PM – 2:05PM	Velox Reading Column Encrypted Parquet Files for Column Level Access Control	Yunfei Chen, Software Engineer at Uber
2:10PM – 2:30PM	Next Generation Data Processing Architecture for heterogeneous computing infrastructure	Rajan Goyal, Co-Founder & CEO at DataPelago
2:35PM – 3:20PM	Keynote Panel – Hardware Accelerators: The Next 10x for Data Management?	Orri Erling, Co-creator of Velox & Software Engineer at Meta; Felipe Aramburu, Distinguished Architect & Co-Founder, Voltron Data; Greg Kimball, SWE Manager at NVIDIA; Zoltan Arnold Nagy, Technical Lead at IBM Research; Rajan Goyal, Co-Founder & CEO at DataPelago
3:25PM – 3:30PM	Closing

Session Details

Velox in the Accelerated Age

Orri Erling
Co-creator of Velox & Software Engineer at Meta

Accelerating Velox with RAPIDS cuDF

RAPIDS cuDF is an open source library for accelerating database and dataframe operations on NVIDIA GPUs. The query plan, pipeline, and driver components in Velox are a great match for the composable, device-wide algorithms in cuDF. We’ve added cuDF-based GPU Operators for TableScan, HashJoin, LocalAggregation, and more, and have demonstrated efficient GPU execution using Velox’s TPC-H-derived query plans. Please stay tuned to “velox/experimental” for upcoming cuDF content.

Greg Kimball
SWE Manager at NVIDIA

Karthikeyan Natarajan
Software Engineer at NVIDIA

Breeze and Wave: Multi-architecture acceleration for Velox

Breeze is a header-only library that provides a portable implementation of algorithms for data parallel processing. It provides an abstraction that enables vendor and architecture specialization for optimal performance, and integrates easily into existing CUDA, HIP, SYCL, and OpenCL projects. This talk will give an overview of how Breeze is being used by Wave to provide acceleration on NVIDIA and Rivos architectures and how it can be used to support more architectures in the future.

David Reveman
Principal Member of Technical Staff at Rivos

Velox Memory System Improvements

This talk provides an update on the major improvements made to the Velox memory system over the past year, as part of Meta’s Prestissimo migration effort. We’ve made three key enhancements to improve the overall performance and reliability of the system:

Improved Performance: We optimized memory arbitration, query spilling, and memory allocation to improve performance under highly concurrent workloads and across hardware platforms.
Query Out-of-Memory Prevention: We increased query spilling coverage and built memory-pressure-based throttling mechanisms to prevent query out-of-memory failures.
Server Out-of-Memory Prevention: We introduced server memory pushback and fine-tuned memory system configurations in production to prevent server crashes.

Xiaoxuan Meng
Software Engineer at Meta

Introducing HW-Accelerated Velox with CXL Computational Memory

Meeting the ever-increasing demands of data analytics requires more memory and larger computing clusters. However, scaling these infrastructures is complex, costly, and yields diminishing returns. CXL is an advanced interconnect technology that expands memory while maintaining cache coherency, enabling unprecedented memory-centric computing—unlike traditional PCIe-based hardware accelerators (e.g., GPUs). This session introduces XCENA’s CXL computing hardware and software stack and presents a roadmap for implementing Velox integration and Velox-Gluten-Spark applications.

XCENA has a many-core parallel architecture optimized for big data processing and supports software development in C++/Rust using a distributed framework similar to MapReduce. Additionally, XFLARE, a query processing engine optimized for XCENA hardware, lets you execute queries directly by connecting to the data analytics engines you primarily use, such as Velox, Presto, or Spark (via Gluten).

Harry Kim
Chief Product Officer at XCENA

Data in Motion: Adapting ETL to Real-World Hardware Chaos

In this talk, we explore a variety of hardware configurations encountered in real-world scenarios to highlight the necessity of flexible, distributed execution systems. These systems must be capable of adapting to diverse storage solutions, varying network capacities, and heterogeneous GPU environments. The focus is on flexible system design, particularly on addressing performance bottlenecks within ETL (Extract, Transform, Load) pipelines.

We will examine how data flows through distributed systems and present practical strategies for maintaining efficiency amidst changing hardware conditions. Topics include leveraging compression, tuning parallelism, and selecting appropriate algorithms to maximize throughput. By understanding and addressing system-level constraints, we aim to offer actionable insights for building resilient and adaptable data processing pipelines.

Felipe Aramburu
Distinguished Architect and Co-Founder of Voltron Data

Real-time ML: Accelerating Velox & Python for inference (< 10ms) at scale

Executing complex online data pipelines (< 10ms) end-to-end requires different tradeoffs than scale-out analytical workloads. Velox enables us to have a unified compute platform to support both online and offline analytical workloads. In this talk we will delve into how we built a symbolic Python interpreter and accelerated various Velox internals. Join us to discover how Chalk leverages Velox to power inference-time machine learning models!

Nathan Fenner
Software Engineer at Chalk

Velox Reading Column Encrypted Parquet Files for Column Level Access Control

In the realm of data security, ensuring fine-grained access control is paramount, especially when dealing with sensitive information stored in columnar formats like Apache Parquet. We have introduced capabilities to read column-encrypted Parquet files into Velox, thereby facilitating column-level access control. This advancement leverages Parquet’s modular encryption framework, which allows for the encryption of individual columns using distinct keys, enabling selective data access without compromising the integrity of unencrypted columns. By integrating this functionality, Velox not only enhances data security but also maintains efficient query performance, ensuring that encryption overhead is minimized. This talk will delve into the technical implementation of reading column-encrypted Parquet files within Velox, discuss the challenges encountered, and present performance benchmarks that underscore the efficacy of this approach in real-world scenarios.

Yunfei Chen
Software Engineer at Uber

Next Generation Data Processing Architecture for heterogeneous computing infrastructure

Interest in using accelerated computing hardware for data processing has grown rapidly in recent years. As the industry moves quickly to adopt various acceleration options such as CPU/SIMD, FPGA, GPU, TPU, and XPU, driven by open-source innovations such as Gluten and Velox, there’s an urgent need to define the next-generation architecture, using a virtualization approach that abstracts the underlying heterogeneous hardware. In this session, DataPelago will share our vision for the future data stack, one that processes all types of data (unstructured, structured, and semi-structured) across any hardware platform, regardless of the vendor. We’ll also explore the need to establish a new industry-standard benchmark to evaluate the capabilities of heterogeneous processing elements for data processing and of the emerging accelerated data processing stack.

Rajan Goyal
Co-founder and CEO, DataPelago

Keynote Panel – Hardware Accelerators: The Next 10x for Data Management?

Hardware accelerators present a unique opportunity for disruption in the cost efficiency of data systems. How quickly is this opportunity becoming reality? What are the challenges hindering adoption? What role does Velox play in this landscape?

This panel brings together experts from NVIDIA, Meta, IBM Research, DataPelago, and Voltron Data to discuss the opportunities and challenges of running high-performance workloads on specialized hardware. We’ll cover practical considerations such as UDFs, compatibility, and optimizing for both AI and data processing use cases.

Orri Erling
Co-creator of Velox & Software Engineer at Meta

Felipe Aramburu
Distinguished Architect and Co-Founder of Voltron Data

Greg Kimball
SWE Manager at NVIDIA

Zoltan Arnold Nagy
Technical Lead at IBM Research

Rajan Goyal
Co-founder and CEO, DataPelago

Pedro Pedreira
Moderator
Software Engineer at Meta