
Wednesday, April 16
| Time | Session | Speaker |
| --- | --- | --- |
| 8:00AM – 9:00AM | Registration & Breakfast | |
| 9:00AM – 9:10AM | Day 2 Welcome & Opening Remarks | Ali LeClerc, Open Source at IBM |
| 9:10AM – 9:40AM | Velox in the Accelerated Age | Orri Erling, Co-creator of Velox & Software Engineer at Meta |
| 9:45AM – 10:15AM | Accelerating Velox with RAPIDS cuDF | Greg Kimball, SWE Manager at NVIDIA; Karthikeyan Natarajan, Sr. Software Engineer at NVIDIA |
| 10:20AM – 10:40AM | Breeze and Wave: Multi-architecture acceleration for Velox | David Reveman, Principal Member of Technical Staff at Rivos |
| 10:40AM – 11:10AM | Break | |
| 11:10AM – 11:30AM | Velox Memory System Improvements | Xiaoxuan Meng, Software Engineer at Meta |
| 11:35AM – 11:55AM | Introducing HW-Accelerated Velox with CXL Computational Memory | Harry Kim, Chief Product Officer at XCENA |
| 12:00PM – 12:20PM | Data in Motion: Adapting ETL to Real-World Hardware Chaos | Felipe Aramburu, Distinguished Architect & Co-Founder, Voltron Data |
| 12:20PM – 1:30PM | Lunch | |
| 1:30PM – 1:50PM | Real-time ML: Accelerating Velox & Python for inference (< 10ms) at scale | Nathan Fenner, Software Engineer at Chalk |
| 1:55PM – 2:05PM | Velox Reading Column Encrypted Parquet Files for Column Level Access Control | Yunfei Chen, Software Engineer at Uber |
| 2:10PM – 2:30PM | Next Generation Data Processing Architecture for heterogeneous computing infrastructure | Rajan Goyal, Co-Founder & CEO at DataPelago |
| 2:35PM – 3:20PM | Keynote Panel – Hardware Accelerators: The Next 10x for Data Management? | Orri Erling, Co-creator of Velox & Software Engineer at Meta; Felipe Aramburu, Distinguished Architect & Co-Founder, Voltron Data; Greg Kimball, SWE Manager at NVIDIA; Zoltan Arnold Nagy, Technical Lead at IBM Research; Rajan Goyal, Co-Founder & CEO at DataPelago |
| 3:25PM – 3:30PM | Closing | |
Session Details
Velox in the Accelerated Age

Orri Erling
Co-creator of Velox & Software Engineer at Meta
Accelerating Velox with RAPIDS cuDF
RAPIDS cuDF is an open source library for accelerating database and dataframe operations on NVIDIA GPUs. The query plan, pipeline, and driver components in Velox are a great match for the composable, device-wide algorithms in cuDF. We’ve added cuDF-based GPU Operators for TableScan, HashJoin, LocalAggregation, and more, and have demonstrated efficient GPU execution using Velox’s TPC-H-derived query plans. Please stay tuned to “velox/experimental” for upcoming cuDF content.

Greg Kimball
SWE Manager at NVIDIA

Karthikeyan Natarajan
Software Engineer at NVIDIA
Breeze and Wave: Multi-architecture acceleration for Velox
Breeze is a header-only library that provides a portable implementation of algorithms for data parallel processing. It provides an abstraction that enables vendor and architecture specialization for optimal performance, and integrates easily into existing CUDA, HIP, SYCL, and OpenCL projects. This talk will give an overview of how Breeze is being used by Wave to provide acceleration on NVIDIA and Rivos architectures and how it can be used to support more architectures in the future.

David Reveman
Principal Member of Technical Staff at Rivos
Velox Memory System Improvements
This talk provides an update on the major improvements made to the Velox memory system over the past year, as part of Meta’s Prestissimo migration effort. We’ve made three key enhancements to improve the overall performance and reliability of the system:
- Improved Performance: We optimized memory arbitration, query spilling, and memory allocation to enhance performance under highly concurrent workloads and across hardware platforms.
- Query Out-of-Memory Prevention: We increased query spilling coverage and built memory-pressure-based throttling mechanisms to prevent query out-of-memory failures.
- Server Out-of-Memory Prevention: We introduced server memory pushback and fine-tuned memory system configurations in production to prevent server crashes.

Xiaoxuan Meng
Software Engineer at Meta
Introducing HW-Accelerated Velox with CXL Computational Memory
Meeting the ever-increasing demands of data analytics requires more memory and larger computing clusters. However, scaling these infrastructures is complex, costly, and yields diminishing returns. CXL is an advanced interconnect technology that expands memory while maintaining cache coherency, enabling unprecedented memory-centric computing—unlike traditional PCIe-based hardware accelerators (e.g., GPUs). This session introduces XCENA’s CXL computing hardware and software stack and presents a roadmap for implementing Velox integration and Velox-Gluten-Spark applications.
XCENA has a many-core parallel architecture optimized for big data processing and supports software development in C++ and Rust using a distributed framework similar to MapReduce. Additionally, through XFLARE, a query processing engine optimized for XCENA hardware, you can execute queries directly by connecting to the data analytics engines you primarily use, such as Velox, Presto, or Spark (via Gluten).

Harry Kim
Chief Product Officer at XCENA
Data in Motion: Adapting ETL to Real-World Hardware Chaos
In this talk, we explore a variety of hardware configurations encountered in real-world scenarios to highlight the necessity of flexible, distributed execution systems. These systems must be capable of adapting to diverse storage solutions, varying network capacities, and heterogeneous GPU environments. The focus is on flexible system design, particularly on resolving performance bottlenecks within ETL (Extract, Transform, Load) pipelines.
We will examine how data flows through distributed systems and present practical strategies for maintaining efficiency amidst changing hardware conditions. Topics include leveraging compression, tuning parallelism, and selecting appropriate algorithms to maximize throughput. By understanding and addressing system-level constraints, we aim to offer actionable insights for building resilient and adaptable data processing pipelines.

Felipe Aramburu
Distinguished Architect and Co-Founder of Voltron Data
Real-time ML: Accelerating Velox & Python for inference (< 10ms) at scale
Executing complex online data pipelines (< 10ms) end-to-end requires different tradeoffs than scale-out analytical workloads. Velox enables us to have a unified compute platform to support both online and offline analytical workloads. In this talk we will delve into how we built a symbolic Python interpreter and accelerated various Velox internals. Join us to discover how Chalk leverages Velox to power inference-time machine learning models!

Nathan Fenner
Software Engineer at Chalk
Velox Reading Column Encrypted Parquet Files for Column Level Access Control
In the realm of data security, ensuring fine-grained access control is paramount, especially when dealing with sensitive information stored in columnar formats like Apache Parquet. We have introduced capabilities to read column-encrypted Parquet files into Velox, thereby facilitating column-level access control. This advancement leverages Parquet’s modular encryption framework, which allows for the encryption of individual columns using distinct keys, enabling selective data access without compromising the integrity of unencrypted columns. By integrating this functionality, Velox not only enhances data security but also maintains efficient query performance, ensuring that encryption overhead is minimized. This talk will delve into the technical implementation of reading column-encrypted Parquet files within Velox, discuss the challenges encountered, and present performance benchmarks that underscore the efficacy of this approach in real-world scenarios.

Yunfei Chen
Software Engineer at Uber
Next Generation Data Processing Architecture for heterogeneous computing infrastructure
Interest in using accelerated computing hardware for data processing has grown rapidly in recent years. As the industry moves quickly to adopt various acceleration options such as CPU/SIMD, FPGA, GPU, TPU, XPU, and more, driven by open-source innovations such as Gluten and Velox, there’s an urgent need to define the next-generation architecture, using a virtualization approach that abstracts the underlying heterogeneous hardware. In this session, DataPelago will share our vision for the future data stack, one that processes all types of data (unstructured, structured, and semi-structured) across any hardware platform, regardless of the vendor. We’ll also explore the need to establish a new industry-standard benchmark to evaluate the capabilities of heterogeneous processing elements for data processing and of the emerging accelerated data processing stack.

Rajan Goyal
Co-founder and CEO, DataPelago
Keynote Panel – Hardware Accelerators: The Next 10x for Data Management?
Hardware accelerators present a unique opportunity for disruption in the cost efficiency of data systems. How quickly is this opportunity becoming reality? What are the challenges hindering adoption? What role does Velox play in this landscape?
This panel brings together experts from NVIDIA, Meta, IBM Research, DataPelago, and Voltron Data to discuss the opportunities and challenges of running high-performance workloads on specialized hardware. We’ll cover practical considerations such as UDFs, compatibility, and optimizing for both AI and data processing use cases.

Orri Erling
Co-creator of Velox & Software Engineer at Meta

Felipe Aramburu
Distinguished Architect and Co-Founder of Voltron Data

Greg Kimball
SWE Manager at NVIDIA

Zoltan Arnold Nagy
Technical Lead at IBM Research

Rajan Goyal
Co-founder and CEO, DataPelago

Pedro Pedreira
Moderator
Software Engineer at Meta