
Saturday, December 13
Session Details
Opening Remarks and Open Source
General welcome, opening remarks, and an update on Velox open source.

Pedro Pedreira
Meta, Software Engineer
Composability driving faster innovation at EMR Spark
Composability enables experts from different domains to collaborate on building high-quality systems. This session shares how Alibaba Cloud EMR Serverless Spark leverages components such as Apache Celeborn, Velox, Apache Gluten (incubating), and Apache Paimon to build a high-performance Spark engine, and covers our contributions back to the community.

Keyong Zhou
Alibaba Cloud, Software Engineer
Axiom: Composable Compute Frontend

Masha Basmanova
Co-creator of Velox & Software Engineer at Meta
Accelerating Training Data Normalization with Velox at Meta
In this talk, I will present how Velox accelerates training data normalization for AI at Meta, enabling scalable model development with raw user-sequence features. Velox powers training data loading with high-performance operators such as index join and streaming aggregation for fast user-sequence injection, optimizes the performance of the sequence preparation pipeline on PySpark-Velox, and supports advanced user-sequence exploration on Prestissimo and Sapphire-Velox.
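As a rough illustration of why streaming aggregation is a good fit for assembling user sequences, consider input that arrives already clustered by user id: each user's sequence can be built and emitted immediately, without a hash table over all users. A minimal, hypothetical Python sketch (not Meta's implementation):

```python
from itertools import groupby
from operator import itemgetter

def streaming_aggregate(rows):
    """Assemble per-user event sequences from input pre-sorted by user_id.

    Because rows arrive clustered by key, each group is reduced and emitted
    immediately, keeping memory proportional to one group rather than to the
    number of distinct users (unlike a hash aggregation).
    """
    for user_id, group in groupby(rows, key=itemgetter(0)):
        events = [event for _, event in group]
        yield user_id, events  # the user's sequence, ready for injection

# Toy usage: rows are (user_id, event) tuples sorted by user_id.
rows = [(1, "click"), (1, "view"), (2, "view"), (3, "click"), (3, "buy")]
for uid, seq in streaming_aggregate(rows):
    print(uid, seq)
```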

Xiaoxuan Meng
Software Engineer at Meta
Accelerating Spark in AntGroup with Gluten & Velox
This talk shares our production experience accelerating Spark with Gluten and Velox. We will also cover improvements we made to Gluten and Velox based on our workloads and running environment.

Yewei Huang
Software Engineer at AntGroup
Apache Gluten: Delivering Continuous Innovation in Big Data Analytics
Apache Gluten is an open-source project that brings the power of native execution to the modern data lakehouse. By offloading query execution from the JVM to high-performance C++ backends like Velox — and leveraging GPU acceleration — Gluten delivers major speedups for compute-intensive workloads while remaining fully compatible with popular engines such as Apache Spark and Flink.
In this session, we’ll introduce what Gluten is, how it works, and why it matters. You’ll learn about the architecture behind native and GPU-accelerated query execution, how Gluten integrates seamlessly into existing Spark and Delta Lake pipelines, and the performance gains users have seen in production.
Whether you’re running analytics at scale, optimizing Delta Lake workloads, or just curious about the future of high-performance query engines, this talk will help you understand where Gluten fits in the open data ecosystem and how to get started.

Yuan Zhou
Software Engineer at IBM

Rui Mo
Software Engineer at IBM
Optimization and Practice of Velox for Accelerating Lakehouse Analytics with Iceberg in WeChat
This session primarily presents the deployment of the Apache Gluten plugin (with Velox as the backend) in WeChat business scenarios, alongside our related engineering practices.
For canary releases, we independently developed a canary release system that allows for fine-grained management at both the business and cluster levels. By performing experimental comparisons on business SQL queries, we ensured query result consistency and performance improvements before gradually enabling the Gluten-Velox vectorized engine for specific SQL workloads. This approach effectively guarantees the smooth rollout of the engine.
Regarding feature support, we have implemented vectorized write capabilities for various table types, including non-partitioned, partitioned, and bucketed tables, significantly improving data write efficiency. In addition, we have developed and enhanced multiple Spark SQL functions within Velox, further expanding the engine's functionality. To address execution instability, we investigated and resolved memory management issues and enabled disk overflow protection within Kubernetes clusters.
On the operations and management side, we built a Spark SQL Listener to collect execution metrics, as well as a reporting platform that provides daily statistics. These metrics include the proportion of SQL statements leveraging the vectorized engine, acceleration ratios, failure rates, and total core-hours saved. Through comprehensive monitoring and data analysis, we have provided robust support for both the stable operation and benefit evaluation of the Gluten-Velox solution.
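The canary flow described above (run the same SQL on both engines, check result consistency and speedup before promoting) can be sketched in a few lines of Python. This is a hypothetical illustration, not WeChat's system: run_baseline and run_candidate stand in for executing the query on vanilla Spark and on Gluten-Velox, and a real system would also handle row ordering, floating-point tolerance, sampling, and rollback policy.

```python
import time

def compare_engines(sql, run_baseline, run_candidate):
    """Run the same SQL on two engines; compare results and latency.

    `run_baseline` and `run_candidate` are hypothetical callables that
    execute `sql` and return rows as a list of tuples.
    """
    t0 = time.monotonic()
    expected = sorted(run_baseline(sql))   # baseline engine result
    t1 = time.monotonic()
    actual = sorted(run_candidate(sql))    # candidate (vectorized) result
    t2 = time.monotonic()

    return {
        "sql": sql,
        "consistent": expected == actual,              # gate on correctness
        "speedup": (t1 - t0) / max(t2 - t1, 1e-9),     # gate on performance
    }
```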

Jinhai Chen
Big Data Development Engineer at Tencent WeChat
Technical Evolution and Production Deployment of Xiaohongshu’s Native Engine
Xiaohongshu’s Native Engine is built upon the Gluten and Velox technology frameworks, leveraging vectorized execution to accelerate data processing and computation. By integrating the Native Engine into the Apache Spark ecosystem, Xiaohongshu has achieved substantial performance gains and reductions in resource consumption.
Through continuous technical iteration and optimization—including enhancements to query execution planning, I/O efficiency, and other core components—the Native Engine has achieved up to a 1.45× performance boost compared to the original Spark engine. It is now being fully deployed in production across key business domains such as AI Platform, Search & Recommendation, Applied Algorithm Platforms, and Offline Data Warehousing, resulting in an overall 30%+ reduction in computing resource costs and delivering a more efficient, scalable, and high-performance data processing infrastructure.

Xiuli Wei
Tech Lead at Xiaohongshu
Accelerating Spark with Gluten and Velox: Xiaomi’s Engineering Practice
This presentation outlines Xiaomi’s production-level experience in accelerating Apache Spark through the adoption of the Gluten and Velox technology stack:
1. The rationale behind Xiaomi’s decision to adopt Gluten + Velox for Spark acceleration;
2. Key challenges encountered during deployment and our solutions, including data consistency and memory stability;
3. Measurable benefits achieved in performance and efficiency;
4. Future directions for Spark acceleration and native execution at scale.

Shengjie Wang
Computing Engine R&D Engineer at Xiaomi
Scaling Spark at Meta with Velox: Our Journey and Future
Meta’s data infrastructure faces astronomical scale and complexity, driven by analytics and cutting-edge AI/ML workloads that far exceed the capabilities of open source Spark. As GenAI and recommendation systems push the boundaries of scalability and efficiency, Meta doubled down on Velox as the high-performance execution engine to supercharge Spark and unlock new possibilities for AI and ML innovation.
This talk will share how Meta executed a multi-stage plan to enable Spark on Velox, including complementary product offerings such as PySpark/Velox and Gluten-based Veloxification. We’ll walk through our journey so far, highlight key learnings, and discuss how these efforts are helping Meta lead the industry in large-scale data processing for AI and ML.

Stanley Yao
Software Engineering Manager at Meta
GPU-accelerated data processing on Velox and Presto
Over the past year, NVIDIA and IBM engineers have been working to bring GPU acceleration to Velox and Presto. Since April's VeloxCon, a lot of engineering effort has gone into stabilizing the approach. In this talk I'll give an overview of where we are, what we are working on, and the next challenges, along with benchmark data from various platforms. I'm also going to cover GPU deployment challenges and options, both on-prem and on AWS.

Zoltan Arnold Nagy
Sr. Software Engineer at IBM Research
Native-Presto-on-Spark: Prestissimo at Scale
Following the success of Prestissimo (Presto with Velox), Meta has taken a significant step forward in large-scale data processing by integrating Presto-on-Spark with the Velox engine. This integration combines Velox's high-performance execution with Spark's robust large-scale scheduling and fault tolerance, enabling us to reliably scale Prestissimo workloads by several orders of magnitude.
In this talk, we will share the high-level technical architecture of the new Native Presto-on-Spark system. We’ll discuss how we architected Presto-on-Spark to leverage Spark’s distributed scheduler while harnessing Velox’s efficiency, resulting in dramatic improvements in scalability and performance. We will also compare this new approach with the previous generation of Presto-on-Spark, highlighting key improvements, challenges overcome, and lessons learned.
Attendees will gain insights into:
* The motivation and design principles behind Native-Presto-on-Spark
* How Velox and Spark complement each other in this architecture
* The technical challenges faced and solutions implemented
* Quantitative and qualitative improvements over the previous generation
* Future directions for large-scale data processing at Meta

Jialiang Tan
Software Engineer at Meta
Presto and Velox in Alibaba Cloud's Log Analytics Service and the Data × AI Trend
(1) In service since 2017: 20K+ daily active users, 20K+ nodes, 5B+ queries/day
(2) System architecture: inverted index, column store, compute-storage separation, horizontal scaling, affinity scheduling, load balancing
(3) Presto & Velox mixed deployment: protocol migration, whitelist restriction, functional correctness guarantees; most Java workers have gone offline
(4) Incremental materialized views: predicate compensation, aggregation rewrite, union compensation, one-click acceleration, automatic materialized views
(5) Expression pushdown: an asynchronous multi-stage I/O evaluation framework that pushes computation down to TableScan, supporting Filter-Project pushdown, Join dynamic filters, and Top-N pushdown to eliminate redundant I/O (a toy sketch of Top-N pushdown follows this list)
(6) The Data × AI trend: reshaping computing and storage in the AI era
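As a rough illustration of item (5), here is a minimal, hypothetical Python sketch of Top-N pushdown at the scan, not Alibaba Cloud's framework: instead of returning every row for a later sort-limit, the scan maintains a size-N heap, so memory stays O(N), and in a real engine pages whose best possible score falls below the current threshold can be skipped entirely.

```python
import heapq

def scan_with_topn_pushdown(row_iter, score, n):
    """Keep only the N best rows while scanning, rather than returning
    every row and sorting afterwards."""
    heap = []  # min-heap of (score, row); the worst survivor sits on top
    for row in row_iter:
        s = score(row)
        if len(heap) < n:
            heapq.heappush(heap, (s, row))
        elif s > heap[0][0]:
            # Row beats the current worst survivor; replace it.
            heapq.heapreplace(heap, (s, row))
    return [row for _, row in sorted(heap, reverse=True)]

# Toy usage: top 3 rows by value.
rows = [("a", 5), ("b", 1), ("c", 9), ("d", 7), ("e", 3)]
print(scan_with_topn_pushdown(iter(rows), score=lambda r: r[1], n=3))
```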

Yunlei Ma
Staff Engineer at Alibaba Cloud

Adong Fan
Sr. Software Engineer at Alibaba Cloud

Bin Wang
Sr. Software Engineer at Alibaba Cloud
Building a Cloud-Native Time-Series Database on Velox: Insights and Futures
This talk will share our practical experience building a cloud-native time-series database based on Velox, the Meta open-source execution engine. It will cover system architecture design, key technical challenges, and the in-depth customization and extensions we’ve implemented to adapt to time-series scenarios. Our core work on Velox includes:
1. Supporting conversion from InfluxQL to Velox Logical Plan, enabling compatibility with traditional time-series query languages.
2. Addressing the performance and memory consumption issues of window operators in deduplication scenarios, we designed and introduced a new Deduplicate operator, which improved performance by two orders of magnitude in real-world scenarios (see the sketch after this list).
3. Expanding TableWrite capabilities from supporting only partition/bucket writes to supporting batch writes constrained by file size, better suited to the high-frequency, small-batch write model of time-series data.
4. Implementing support for inverted indexes and page indexes, significantly improving the performance of filter queries based on tags and time ranges.
5. Implementing support for fill, last/first, and group-by operations. Native support for time-series functions such as time() simplifies filling missing data and time alignment for users.
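For item 2, the intuition behind replacing a window operator with a dedicated Deduplicate operator can be shown in a few lines of Python. This is a hypothetical sketch, not the operator described in the talk: a window rewrite (row_number() ... WHERE rn = 1) must buffer and rank rows per partition, while on input sorted by the dedup key the operator only has to remember the previous key.

```python
def deduplicate(rows, key):
    """Emit one row per key from input sorted by the dedup key.

    Only O(1) state (the previous key) is kept, which is plausibly where
    the large memory and speed wins over a window operator come from.
    """
    prev = object()  # sentinel that never equals a real key
    for row in rows:
        k = key(row)
        if k != prev:
            yield row  # first row wins for this key (e.g. latest point)
            prev = k

# Toy usage: rows sorted by (series, timestamp descending).
rows = [
    ("cpu,host=a", 30, 0.9),
    ("cpu,host=a", 20, 0.7),
    ("cpu,host=b", 30, 0.4),
]
print(list(deduplicate(rows, key=lambda r: r[0])))
```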
We will also compare with Apache DataFusion as used in time-series databases such as InfluxDB 3.0, analyzing Velox's strengths and weaknesses in time-series scenarios, such as gap-filling support, as-of join (aligned by time) capabilities, and flexible time-sharding write strategies.
Finally, we will share our outlook on Velox’s future development in the time series database space, including areas for community contribution and the technical evolution path we aim to drive.

Zhaolong Li
Sr. Engineer at Tencent
Supercharging Iceberg Writes with Velox
We present our recent work on enabling high-performance Iceberg table writes through Presto and Gluten, powered by Velox. By leveraging Velox’s vectorized execution and native compute engine, we achieved up to 11× faster write performance compared to the Java-based implementation.
This capability has been fully integrated into both Prestissimo and Gluten, extending native Iceberg write support across multiple compute engines and delivering a consistent, high-speed data ingestion experience.
In this session, we’ll share the design and optimization of the native Iceberg writer, discuss key technical challenges, and present benchmark results that highlight how Velox enables next-generation performance for Iceberg workloads.

Ping Liu
Software Engineer at IBM
Deep Integration of Velox with Paimon C++: Accelerating Lakehouse Analytics at Alibaba
This talk presents the end-to-end optimization of lakehouse analytics built on the Gluten and Velox native vectorized engine for Spark and the Apache Paimon data lake format. We begin with an overview of the modern SQL execution stack, comprising Apache Spark, Gluten, Velox, and Paimon, and explain how they interoperate. We then detail Gluten's unified framework for integrating diverse lake formats (e.g., Iceberg, Hudi, Delta, Paimon) via Spark DataSource APIs. The core of the talk focuses on our native C++ integration of Paimon with Velox. Finally, we share real-world adoption within Alibaba, including performance benchmarks, resource savings, and lessons learned from running this stack at scale on cloud-native infrastructure.

Tao Zhou
Engineer at Alibaba

Yan Bi
Cloud Engineer at Alibaba
Flex: Unified Stream and Batch Vectorized Engine Built on Velox
While Velox excels at CPU-bound, in-memory computation, a significant challenge remains: how can this power be harnessed within a distributed streaming system like Flink without a complete architectural rewrite?
Flex is designed as a unified engine that supports both stream and batch processing, leveraging Velox to maximize performance. Its goal is to seamlessly handle diverse data workloads, whether real-time streams or batch jobs, with high efficiency and scalability.
In this talk, we will detail our architecture and present benchmarks that isolate the significant performance gains achieved by offloading computation to Velox.

Jacky Lau
Expert in Distributed Computing Engines at Ant Financial, Calcite Committer and Flink Contributor
Hardening Velox-Powered Engines for AI/ML: Systematic Cross-Engine Testing at Scale
At Meta, Velox powers a diverse set of compute engines that drive innovation across AI/ML lifecycles, analytics, and large-scale data processing. From high-performance query engines like Prestissimo, to advanced integrations such as PySpark on Velox and Gluten, Velox is at the core of our most ambitious AI and data infrastructure initiatives.
Operating at Meta scale brings unique challenges in ensuring data reliability, correctness, and performance across heterogeneous engines and AI/ML-optimized storage systems. In this talk, we'll share how we systematically validate and harden these platforms to meet the demands of modern AI/ML workloads.
Join us to learn:
* How to efficiently validate end-to-end data paths across multiple Velox-powered engines and advanced storage features.
* How synthetic, production-like data enables efficient, high-signal, privacy-compliant testing.
* Advanced techniques such as data fuzzing and comprehensive testing matrices to proactively catch regressions in reliability, correctness, and performance (a toy differential-fuzzing sketch follows this list).
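To make the fuzzing idea concrete, here is a minimal, hypothetical Python sketch of differential fuzzing: random inputs are generated and evaluated by two independent evaluators, and any disagreement is a bug in one of them. Real engine fuzzers generate typed columnar data and SQL expressions rather than toy arithmetic, but the principle is the same.

```python
import ast
import operator
import random

# Two independent evaluators for the same tiny expression language:
# Python's built-in eval() and a hand-written interpreter over the AST.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def interp(node):
    """Second, independent evaluator: walk the parsed AST recursively."""
    if isinstance(node, ast.Expression):
        return interp(node.body)
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](interp(node.left), interp(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -interp(node.operand)
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unexpected node: {node!r}")

def random_expr(depth=3):
    """Generate a small random arithmetic expression as a string."""
    if depth == 0 or random.random() < 0.3:
        return str(random.randint(-10, 10))
    op = random.choice("+-*")
    return f"({random_expr(depth - 1)} {op} {random_expr(depth - 1)})"

for _ in range(1000):
    expr = random_expr()
    # Any mismatch means one of the two evaluators has a bug.
    assert eval(expr) == interp(ast.parse(expr, mode="eval")), expr
```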
Whether you’re building AI/ML platforms or scaling data infrastructure, this session will equip you with practical strategies and insights to enhance reliability and performance in your own Velox-powered systems.

Eric Liu
Software Engineer at Meta
Solving Window Operator OOM in Gluten Using Velox
This talk presents how we addressed window operator OOM issues in Gluten by adopting a streaming-based approach with Velox. By processing data incrementally and managing memory more efficiently, we resolved stability challenges in real-world customer scenarios.
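As a rough illustration of the streaming idea (hypothetical Python, not Velox's window operator): when input is already sorted by the partition key, window functions like row_number() can be computed and emitted row by row, so memory no longer grows with the whole input as it does with a fully blocking window operator.

```python
def streaming_row_number(rows, part_key):
    """row_number() over (partition by part_key) on input already sorted
    by the partition key.

    A blocking window operator buffers all input before emitting anything;
    streaming over pre-sorted input keeps O(1) state per row, which is how
    incremental processing avoids the OOM described above.
    """
    prev, rn = object(), 0  # sentinel never equals a real key
    for row in rows:
        k = part_key(row)
        rn = rn + 1 if k == prev else 1  # restart numbering on new partition
        prev = k
        yield (*row, rn)

# Toy usage: rows sorted by partition key.
rows = [("a", 10), ("a", 20), ("b", 5)]
print(list(streaming_row_number(rows, part_key=lambda r: r[0])))
```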

Ke Jia
Software Engineer at Intel Shanghai