Agenda

April 29-30 • Meta HQ, Menlo Park

Day 1 • April 29

8:00 AM

9:00 AM

Welcome and Opening Remarks for VeloxCon 2026, where we’ll look at how far the community has come in the last year and what’s in store.

Aakash Deep
Software Engineering Manager

Ali LeClerc
Head of Open Source Strategy

9:15 AM

As AI becomes data’s biggest customer, the components of the Lakehouse must evolve. This talk explores how Lakehouse workloads evolved to power AI, from batch ETL and interactive analytics to training data preparation, agentic AI workloads, and feeding data-hungry GPU trainers at scale. The talk presents how Velox is enabling this new era, and discusses new open source projects that are also becoming foundational parts of the AI Lakehouse, including Nimble, A11, Axiom, and Collagen.

Pedro Pedreira
Software Engineer

9:50 AM

Masha will provide an update on the Axiom project, including its vision, architecture, and what the future holds at Meta.

Maria Basmanova
Software Engineer & Co-creator of Velox

10:10 AM

10:40 AM

Hear more about IBM’s work with open-source Velox and their vision for the project.

Volkmar Uhlig
VP and CTO Data Platform & Engineering

11:00 AM

Adding custom functions to a query engine traditionally requires code changes and full release cycles. In this talk, we present Presto’s JSON file-based function namespace manager – an extensible framework that decouples function registration from the Presto release process. Functions are defined in a simple JSON configuration file specifying signatures and script paths, then loaded by the coordinator at startup.
We’ll dive into the optimizer that bridges the gap between these JSON-registered functions and Velox’s execution backend. This optimizer rewrites function calls, performs type casting, and handles new functions automatically – no code changes required.

On the execution side, Prestissimo offers two complementary paths for Python functions. Lightweight, vetted functions run in-process on the Velox worker as vector functions, leveraging pybind11 and free-threaded Python for multi-threaded execution with minimal overhead. Complex scripts with custom dependencies are offloaded to a remote Python co-processor service, providing isolation and flexibility. This dual execution model lets teams balance performance and isolation per function.
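As a rough illustration of what “in-process on the Velox worker” means, a native function registered through Velox’s simple-function API looks roughly like the sketch below; the function name and logic are invented for the example, and the talk’s lightweight Python path wraps comparable registration via pybind11 rather than hand-written C++.

```cpp
#include "velox/functions/Macros.h"
#include "velox/functions/Registerer.h"

namespace facebook::velox::example {

// A trivial, vetted scalar function: convert integer cents to dollars.
// Name and logic are illustrative only.
template <typename TExec>
struct CentsToDollarsFunction {
  VELOX_DEFINE_FUNCTION_TYPES(TExec);

  void call(double& result, const int64_t& cents) {
    result = static_cast<double>(cents) / 100.0;
  }
};

// Registration makes the function callable from SQL as
// cents_to_dollars(bigint) -> double on every worker that runs this at startup.
void registerExampleFunctions() {
  registerFunction<CentsToDollarsFunction, double, int64_t>({"cents_to_dollars"});
}

} // namespace facebook::velox::example
```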

We demonstrate two production use cases built on this framework:

  1. User-defined Python functions – users provide a function signature and a Python script, both in external files, with execution routed to in-process or remote paths based on configuration.
  2. AI-powered SQL functions – calling large language model services through Python APIs, making LLM capabilities accessible directly from SQL queries.

Attendees will learn how to leverage this architecture to rapidly extend their Presto/Prestissimo deployment with custom and AI-powered functions.

Feilong Liu
Software Engineer

Sebastiano Peluso
Software Engineer

11:25 AM

A growing class of AI workloads—clustering, graph algorithms, and agentic AI pipelines—share a common requirement: the ability to repeat a computation until convergence. K-means iterates until centroids stabilize. Connected components and label propagation iterate until labels stop changing. Agentic workflows iterate through retrieve–reason–refine cycles. Today, these workloads must leave the compute engine for external systems because Velox has no iteration primitive.

We explore what it would take to bring iteration into Velox as a first-class capability. The opportunity is significant: Velox already has the right operators for each individual step—vector search for retrieval, aggregation for reduction, exchange for communication between workers. The missing piece is a way to feed output back as input and repeat.
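As a purely conceptual sketch of that missing piece, an iteration primitive could look like the loop below. None of these helpers exist in Velox today; the names are invented for illustration.

```cpp
#include "velox/core/PlanNode.h"
#include "velox/vector/ComplexVector.h"

using namespace facebook::velox;

// Hypothetical helpers, declared only so the control flow below is well-formed.
RowVectorPtr executePlan(const core::PlanNodePtr& body, const RowVectorPtr& input);
bool converged(const RowVectorPtr& previous, const RowVectorPtr& current);

// Run an ordinary plan tree repeatedly, feeding its output back as input
// until a convergence check passes (e.g. centroid movement below epsilon).
RowVectorPtr iterateUntilConvergence(
    const core::PlanNodePtr& iterationBody,
    RowVectorPtr state) {
  RowVectorPtr previous;
  do {
    previous = state;
    // e.g. the k-means assign + re-average step expressed as a plan tree.
    state = executePlan(iterationBody, previous);
  } while (!converged(previous, state));
  return state;
}
```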
We examine how established distributed computation models—including the patterns behind systems like Spark GraphX and Pregel-style graph engines—map onto Velox’s existing architecture. Velox’s exchange and partitioning infrastructure, for example, provides functionality analogous to the message-passing phase in graph computation frameworks.

The question is whether Velox can serve as a general compute substrate for these workloads, rather than requiring purpose-built engines for each one.
We discuss the key challenges—distributed coordination across iterations, memory management for iteration state, and the tension between Velox’s columnar model and graph-structured data—and outline design directions that stay within Velox’s existing architectural patterns. We also explore how iteration composes with Velox’s GPU acceleration layer, since iteration bodies that are standard plan trees could potentially benefit from hardware offload with no special-casing.

Practical impact: workloads that today require purpose-built external systems could run entirely inside the engine, composing with existing operators and benefiting from Velox’s distributed execution and hardware acceleration.

Zhichen Xu
Software Engineer

Junjie Qi
Software Engineer

Jingfang Liu
Software Engineer

11:50 AM

Vector search is increasingly important for AI workloads — retrieval-augmented generation, semantic search, recommendation, and entity resolution all depend on finding similar vectors at scale. The common approach is to build specialized infrastructure separate from the data processing engine.

We show how vector search can be implemented in Presto and Velox using existing relational operators, providing composability with standard data operations and optimizer-driven distributed execution.
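As a rough sketch of the idea (not the talk’s actual implementation), a brute-force top-k similarity query can be composed from existing operators with Velox’s PlanBuilder test utility; the column and function names below are illustrative assumptions.

```cpp
#include "velox/exec/tests/utils/PlanBuilder.h"

using namespace facebook::velox;

// corpusType is assumed to contain: id BIGINT, embedding ARRAY(REAL), and
// query_vec ARRAY(REAL) (the query vector carried as a constant column).
// "cosine_similarity" is assumed to be registered as a scalar function.
core::PlanNodePtr makeBruteForceTopK(const RowTypePtr& corpusType) {
  return exec::test::PlanBuilder()
      .tableScan(corpusType)                                // scan the corpus vectors
      .project({"id",
                "cosine_similarity(embedding, query_vec) AS score"})
      .orderBy({"score DESC"}, /*isPartial=*/false)         // rank by similarity
      .limit(0, /*count=*/10, /*isPartial=*/false)          // keep the top 10
      .planNode();
}
```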

We validated this approach on the public DEEP1B benchmark (1B vectors, 96 dimensions), achieving 96.6% Recall@1 in under 2 minutes.

We present the architecture and lessons learned.

Zhichen Xu
Software Engineer

Aakash Deep
Software Engineering Manager

12:10 PM

1:10 PM

This talk presents recent advancements in Velox’s IO layer across three fronts. First, we introduce a Nimble Serializer that encodes SST file values in a hybrid Nimble file format, improving serving efficiency for existing normalized training data systems as a near-term solution. Second, we present cluster key and dense index support for the Nimble file format, designed to optimize normalized training data ingestion and storage as a long-term solution. Finally, we discuss extending Velox connectors to support the Iceberg and Paimon open table formats, expanding Velox’s interoperability with the broader data lakehouse ecosystem.

Xiaoxuan Meng
Software Engineer

1:35 PM

Meta’s recommendation models consume massive volumes of training data and require last-mile data transformations before tensor conversion. To meet this demand, we replaced our legacy data processing engine with Velox’s vectorized execution engine — a project we call Veloski. We share how we rolled out Velox across tens of thousands of production training jobs per day, and the key challenges we resolved — especially correctness validation in the presence of black-box UDFs that run arbitrary mini-engines or access external data, and that interface with legacy in-memory representations. Finally, we look ahead at what Velox unlocks: better debuggability, richer query capabilities, and more for ML engineers.

Elodie Li
Software Engineer

2:00 PM

A look at Uber’s journey and next steps with Velox.

Jay Narale
Software Engineer

Hitarth Trivedi
Software Engineer

2:15 PM

This talk covers the last mile of Meta’s migration from Presto Java to Prestissimo, powered by Velox – completing the full transition of our Presto infrastructure. We dive into the key differences between the two engines, the challenges of migrating long-tail workloads, and the strategies used to close out the final stretch. The session also highlights the capacity and cost efficiencies gained through Velox.

Amit Dutta
Software Engineer

2:40 PM

Roadmap for Iceberg format version 3 feature support in Presto and Velox.

Naveen Kumar Mahadevuni
Software Engineer

Apurva Kumar
Software Engineer

3:00 PM

3:30 PM

GeoVelox is Meta’s initiative to bring native geospatial capabilities to Velox, closing the final major feature gap blocking deprecation of the Presto Java execution stack. This talk covers the project vision, the architectural challenges of adding geospatial to a composable execution engine, and the road ahead.

James Gill
Software Engineer

3:55 PM

This talk presents the design, implementation, and production deployment of Velox’s Expression Optimizer—a powerful framework that transforms and simplifies expression trees through constant folding and logical rewrites. We’ll explore how this capability unlocks consistent expression evaluation semantics between heterogeneous engines like Presto’s Java coordinator and C++ workers.

Pramod Satya
Software Engineer

4:20 PM

This talk covers the probabilistic data structures in Velox — HyperLogLog, P4HLL, KHyperLogLog, TDigest, QDigest, and SetDigest — when to use each, lessons from porting Java Presto’s sketch functions to C++, and fuzzer-driven correctness testing.
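For context, these sketches surface in SQL through Presto-compatible aggregates. The sketch below (with assumed column names) shows how such aggregates compose into an ordinary Velox plan; which sketch backs which function is covered in the talk.

```cpp
#include "velox/exec/tests/utils/PlanBuilder.h"

using namespace facebook::velox;

// rowType is assumed to contain: user_id BIGINT, latency_ms DOUBLE.
core::PlanNodePtr makeSketchAggregation(const RowTypePtr& rowType) {
  return exec::test::PlanBuilder()
      .tableScan(rowType)
      .singleAggregation(
          /*groupingKeys=*/{},
          {"approx_distinct(user_id)",              // HyperLogLog-backed cardinality
           "approx_percentile(latency_ms, 0.99)"})  // quantile-sketch-backed percentile
      .planNode();
}
```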

Natasha Sehgal
Software Engineer

4:45 PM

Polymorphic Table Functions (PTFs) are specialized user-defined functions that can be invoked in the FROM clause of a SQL query. Unlike standard table-valued functions, their return schema is dynamic and determined at runtime based on the arguments or input tables passed to them. PTFs allow for complex logic—such as dynamic pivots, row replication, or custom data transformations—that goes beyond standard SQL capabilities. 

We added PTFs to Presto and Presto C++ this year. This talk will introduce the functionality and APIs, with examples.

Aditi Pandit
Software Engineer

5:10 PM

Closing Remarks for VeloxCon 2026

Pedro Pedreira
Software Engineer

Ali LeClerc
Head of Open Source Strategy

5:15 – 7:00 PM

Day 2 • April 30

8:00 AM

9:00 AM

Day 2 Opening Remarks

Aakash Deep
Software Engineering Manager

Ali LeClerc
Head of Open Source Strategy

9:15 AM

Hear about Nvidia’s work with Velox and GPUs, including work on Presto and Spark/Apache Gluten, and the contributions going back into open source.

Greg Kimball
Software Engineer

Shruti Shivakumar
Software Engineer

Karthikeyan Natarajan
Sr. Software Engineer

9:50 AM

Orri Erling
Software Engineer & Co-creator of Velox

10:20 AM

10:50 AM

In this presentation, I will discuss how we can push the limits of data movement in Presto/Velox by optimizing how data is fetched, cached, and exchanged across a distributed system.

I will introduce cuDF Exchange, a GPU-native data exchange mechanism that enables direct GPU-to-GPU transfers over UCX without staging through host memory. CuDF Exchange allows Velox workers to transfer data directly between GPUs, fully utilizing high-speed interconnects such as NVLink and RDMA, and thereby unlocking the full bandwidth of modern hardware.

I will also present a distributed asynchronous data cache layer that is built on top of Velox’s asynchronous data cache, enabling data to be fetched and cached in parallel. The system is designed to scale elastically in cloud environments, allowing us to increase cache capacity and control parallel data fetching in cold-start scenarios. By moving data closer to GPUs and overlapping I/O with computation, we significantly reduce the gap between cold and warm query execution, making I/O no longer the dominant bottleneck.

Together, these innovations redefine how data flows through a distributed query engine, resulting in a system where data movement is no longer the limiting factor, enabling analytical workloads to fully exploit the performance potential of modern GPU infrastructure.

Marios Angelis
Software Engineer

11:15 AM

DataPelago Nucleus is the core of DataPelago’s universal data processing engine. It features an accelerator-centric virtual machine with a data processing domain-specific instruction set architecture and a smart acceleration planner that dynamically optimizes across multiple execution backends. It is purpose-built to unlock the full potential of the hybrid CPU and GPU infrastructure that enterprises have already invested in. Its architecture extends Velox’s capabilities and performance on heterogeneous infrastructure, including CPU-only and GPU-enabled servers.

This talk will discuss these two features of DataPelago Nucleus:
1) We describe how our implementation of the accelerator-centric virtual machine for GPUs innovates around technical hurdles that have stymied earlier implementations. We illustrate how these innovations enable DataPelago Nucleus to outperform competing implementations on TPC-DS 1TB queries running on Apache Spark.
2) We describe our smart acceleration planner, which dynamically evaluates the acceleration benefit and cost of executing the operators in the logical plan on the available mix of compute elements. This evaluation drives the optimization of the overall physical plan, judiciously targeting sections of the plan to the appropriate acceleration compute backend. We illustrate how Apache Gluten and Velox’s built-in fallback mechanism can degrade performance on real-world Apache Spark applications and how our smart acceleration planner for Apache Spark alleviates this.

John Janakiraman
VP Engineering

Satyanarayana Lakshmipathi Billa
Technical Fellow

11:40 AM

Modern analytical engines demand massive memory capacity and bandwidth for large-scale scans, aggregations, and joins, yet conventional server architectures face hard limits in slot count, density, and cost. CXL offers a compelling alternative by enabling cache-coherent access to device-attached memory over PCIe. Memory expansion allows more datasets to reside in memory, memory pooling shares capacity across multiple hosts to improve utilization and reduce inter-host data transfer overhead, and near-data processing pushes query operations closer to where data resides, reducing unnecessary movement.

In this talk, we introduce the MX1, a CXL computational memory device that goes beyond expansion by offloading columnar query operations — decompression, filtering, aggregation, and string search — directly at the memory controller. We present microbenchmark results showing up to 5× throughput and 19× energy efficiency improvements over host CPU execution with CXL memory, and demonstrate how these kernels compose into TPC-H query plans with end-to-end performance gains.

We then share our experience integrating with Velox, describing how we leveraged its extensibility interfaces to offload query operators to XFLARE, our Rust-based OLAP query engine built for accelerating MX1. We discuss what worked, the extensibility challenges we encountered, and future directions, including a proposed contribution for CXL-aware memory allocation.

Yongil Jung
Software Engineer

Sungwoo Chang
Software Engineer

12:00 PM

1:00 PM

With Project Flare, we’re using Gluten to veloxify Spark workloads. This session covers our motivation, journey, roadmap, and learnings.

Ankur Pathela
Staff Software Engineer

1:25 PM

Meta’s graph data — social graphs, knowledge graphs, data lineage — is spread across multiple storage backends with no unified query interface. Teams building RAG pipelines, knowledge graph applications, and data lineage tools write bespoke integration code against each backend. We built a Cypher/ISO GQL Query Layer that lets developers write a single declarative graph query and execute it across all of these backends, powered by Velox as the execution engine.

To our knowledge, this is the first system to repurpose Velox’s connector and plan infrastructure for federated graph query execution. The key insight is that graph pattern matching — MATCH (a)-[r]->(b) — maps naturally to relational operations: table scans, hash joins, and filters. By targeting Velox’s logical plan representation, we inherit its vectorized columnar execution, memory management, and connector abstraction without building a custom graph engine. Velox’s extensible connector interface lets us federate queries across TAO (Meta’s distributed graph store), ZippyDB, relational databases, search infrastructure, and Hive warehouse tables through a single execution framework.
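As an illustrative sketch (with assumed table and column names, not our production lowering), a single-hop pattern such as MATCH (a)-[r:FRIENDS_WITH]->(b) could map onto Velox’s plan primitives as an edge scan plus a hash join against the node table:

```cpp
#include "velox/exec/tests/utils/PlanBuilder.h"

using namespace facebook::velox;
using exec::test::PlanBuilder;

core::PlanNodePtr lowerSingleHop(
    const RowTypePtr& nodeType,   // assumed: id BIGINT, label VARCHAR
    const RowTypePtr& edgeType) { // assumed: src BIGINT, dst BIGINT, edge_type VARCHAR
  auto idGen = std::make_shared<core::PlanNodeIdGenerator>();

  // Build side: candidate bindings for (b), served by whichever backend's
  // connector exposes the node "table".
  auto nodes = PlanBuilder(idGen).tableScan(nodeType).planNode();

  // Probe side: scan edges, apply the [r] edge-type predicate, then join
  // edge.dst = node.id to bind (b).
  return PlanBuilder(idGen)
      .tableScan(edgeType)
      .filter("edge_type = 'FRIENDS_WITH'")
      .hashJoin({"dst"}, {"id"}, nodes, /*filter=*/"", {"src", "dst", "label"})
      .planNode();
}
```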

The same connector abstraction that federates queries across storage backends also provides a natural path to AI-native retrieval. The architecture supports pushing specialized operations — BM25 text search, vector similarity and recency — down to dedicated indexes before Velox handles post-processing joins and filters. This enables hybrid search patterns for RAG: find semantically similar entities via vector search, then traverse the knowledge graph to retrieve source documents, all expressed in a single Cypher query. We’ll discuss the architecture of this multi-retrieval path with plug-in backends.

Whether you’re building query systems on top of Velox, working on RAG pipelines, or exploring graph query capabilities on columnar engines, you’ll walk away with a practical understanding of how to layer graph semantics onto Velox’s execution model and connector framework.

Abdullah Ozturk
Software Engineer

Vishal Gandhi
Software Engineering Manager

1:40 PM

Apache Gluten has officially graduated from the Apache Incubator to become a Top-Level Project (TLP), marking a significant milestone for the project and its community. As a middle-layer plugin designed to dramatically accelerate Apache Spark workloads, Gluten’s success is deeply intertwined with the high-performance native execution of engines like Velox.
In this session, we will celebrate the project’s graduation and share the journey of building a diverse and mature open-source community around The Apache Way. We will then dive deep into the latest advancements in Gluten’s Velox backend. Attendees will learn about the performance and stability improvements from recent releases, including enhanced support for Spark SQL operators, optimized columnar processing, and expanded data type coverage. We will also discuss the project’s roadmap, focusing on deeper integration with Velox, plans for supporting GPUs, and how Gluten is paving the way for a more efficient and scalable future for Spark.

Yuan Zhou
Software Engineering Manager

2:05 PM

Meta’s journey to validate and run Presto-on-Spark with Velox for analytics workloads, covering the motivation, initial setup, verification, and benchmarking that led to the current state of more than a third of queries running stably in production with latency and efficiency wins, plus the plan for full adoption.

Chandrashekhar Singh
Software Engineer

2:30 PM

Apache Gluten has grown from an incubating project into an Apache Software Foundation Top‑Level Project by building on a strong foundation of native execution powered by Velox. This session walks through that journey, starting with how Gluten evolved architecturally to offload Spark SQL execution to Velox, enabling vectorized native execution while preserving Spark semantics, APIs, and fault tolerance.

We ground this architecture in real‑world practice by sharing how Microsoft uses Gluten and Velox in large‑scale production Spark environments, including why native execution was introduced, which workload patterns benefit most, and what challenges emerged around correctness, fallback behavior, security integration, and operational stability at scale. These production experiences directly informed upstream improvements, ensuring that lessons learned in practice fed back into open‑source design rather than remaining platform‑specific.

Beyond adoption, we highlight what Microsoft has contributed back to the Velox and Gluten communities as part of this journey, including expanded Spark operator and expression coverage, improved plan translation and fallback behavior, strengthened compatibility with newer Spark runtimes, and Velox enhancements driven by large‑scale execution requirements. The talk concludes with lessons from Gluten’s ASF graduation and a look ahead at its 2026 roadmap, emphasizing continued collaboration across the broader Velox ecosystem.

William Chen
Engineering Manager | Apache Gluten PMC Chair

2:50 PM

3:20 PM

AI is rapidly becoming part of the day-to-day workflow for engineers building systems like Velox, but it’s not all upside. In this panel, we’ll go beyond the hype and talk about what actually works: where AI meaningfully speeds up development, where it still falls short on complex, performance-critical code, and what we’ve lost (or maybe gained) in the shift.

Panelists will share real experiences using AI in production codebases including what they rely on, what they don’t trust, and how they think about correctness, debugging, and maintainability in an AI-assisted world.

Maria Basmanova
Co-creator of Velox

Muhammad Haseeb
Sr. Software Engineer

Aditi Pandit
Software Engineer

Eric Liu
Software Engineer

Amit Dutta
Software Engineer

4:10 PM

Sapphire (Presto-on-Spark) is Presto’s C++ engine scheduled on the Spark runtime with Velox execution, designed to deliver significant performance improvements for large-scale analytical workloads at Meta. In this talk, we present Sapphire’s architecture and how it deeply integrates with the Velox execution engine.

We dive into three key areas where Sapphire extends and optimizes Velox’s capabilities: shuffle integration, detailing our shuffle operators and performance enhancements through tight coupling with Velox’s memory and serialization layers; broadcast join with hash table caching, showing how reusing pre-built hash tables across tasks eliminates redundant work; and sorted-shuffle and sort-merge join support, extending Velox’s operator model for merge-based join strategies critical at scale.

We close with production results demonstrating Sapphire’s gains over Presto’s Java-based execution engine across key workloads at Meta.

Shrinidhi Joshi
Software Engineer

4:35 PM

Oracle AI Data Platform is a unified, governed cloud platform that brings together enterprise data, analytics, and AI into a single, cohesive environment for building and scaling AI applications. It unifies structured and unstructured data, open lakehouse storage, AI models, and developer tooling under a common catalog, security, and governance layer—enabling teams to move from data ingestion to AI-driven insights and applications on one consistent foundation.
As data volumes and AI workloads grow, the JVM-based execution layer in Apache Spark increasingly becomes a bottleneck, especially for large-scale analytical queries over OCI Object Storage.
In this talk, we share our experience accelerating Spark on Oracle Cloud Infrastructure using Gluten and Velox, alongside a custom native connector for OCI Object Storage. By eliminating JVM overhead and enabling end-to-end columnar native execution, from storage to compute, we achieved 2×–7× performance improvements on representative TPC-DS and production ETL workloads, with no changes to user-facing Spark SQL.
We will cover the system architecture, execution plan transformations, fallback mechanisms, key challenges, and lessons learned from real-world deployments. A live or recorded demo may also be included.

Koushik Kumar Mondal
Consulting Member of Technical Staff

4:50 PM

Every contributor has stared at a red CI build wondering why — and every maintainer has fielded the follow-up question. Over the past quarter we set out to fix both on Velox. The pipeline is now faster (feedback in minutes, not hours), greener (flaky retries, sanitizer cleanups, clearer failure reports), and smarter (an LLM reviews each PR, and a second one reads failed CI logs and writes plain-English diagnostics back on the PR). Come for the AI bots — stay for the field report on how LLMs are integrated into Velox’s open-source CI today, and what that’s meant for contributors and maintainers.

Krishna Pai
Software Engineer

5:05 PM

At Meta, with 1,100+ Hive/Spark Java UDFs and ~60% lacking Velox equivalents, manual migration to Velox is impractical. We present a closed-loop workflow where an AI agent generates Velox C++ UDF implementations and iteratively validates them using a differential fuzz testing framework we built. The framework executes the original Spark/Hive Java UDF and the new Velox C++ implementation side-by-side in-process via JNI, comparing outputs, null semantics, and exception behavior across thousands of randomized inputs to guarantee behavioral parity. We’ve been able to veloxify 10+ UDFs/UDAFs in one week with high correctness confidence — 80%+ faster than manual approaches.
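As a simplified sketch of the differential check (the real framework and its JNI bridge are internal, so most names below are invented; only VectorFuzzer is an actual Velox test utility):

```cpp
#include "velox/vector/fuzzer/VectorFuzzer.h"

using namespace facebook::velox;

// Hypothetical bridges with invented names: evaluate the original Java UDF via
// JNI and the generated Velox C++ UDF, then compare outputs in detail.
RowVectorPtr evaluateJavaUdfViaJni(const std::string& udf, const RowVectorPtr& input);
RowVectorPtr evaluateVeloxUdf(const std::string& udf, const RowVectorPtr& input);
bool sameValuesNullsAndErrors(const RowVectorPtr& a, const RowVectorPtr& b);

// Run both implementations side by side on randomized batches and require
// behavioral parity before the generated C++ UDF is accepted.
bool checkParity(
    const std::string& udf,
    const RowTypePtr& inputType,
    VectorFuzzer& fuzzer,
    int32_t iterations) {
  for (int32_t i = 0; i < iterations; ++i) {
    auto input = fuzzer.fuzzInputRow(inputType); // randomized rows, incl. nulls
    auto javaResult = evaluateJavaUdfViaJni(udf, input);
    auto veloxResult = evaluateVeloxUdf(udf, input);
    if (!sameValuesNullsAndErrors(javaResult, veloxResult)) {
      return false; // mismatch in values, null semantics, or exception behavior
    }
  }
  return true;
}
```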

MJ Deng
Software Engineer

Sandeep Thandassery
Software Engineering Manager

5:20 PM

5-minute closing remarks