23–28 Feb 2025
IBS
Asia/Seoul timezone

A Tree-to-Token Paradigm for Jet Physics: Bridging High-Energy Phenomenology and Language Modeling

26 Feb 2025, 13:30
1h
IBS

Speaker

Masahiro Morinaga (ICEPP, The University of Tokyo)

Description

We propose a new approach to the analysis of jets, which are crucial objects of study in high-energy particle physics experiments. Whereas conventional methods often treat a jet as a point cloud of its constituents, our work focuses on the binary tree produced during jet clustering and explores ways to process the constituents (such as tracks) with language models from natural language processing. Specifically, we convert the binary clustering tree into a bracketed representation, serialize it into a one-dimensional sequence, and then tokenize (quantize) it to obtain a data format suitable for training Transformer models. In this presentation, we discuss how the tree structure is generated and how the tokenization is performed. If time permits, we also present results of Transformer-based training, demonstrating the potential of this new perspective on jet analysis.
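
The tree-to-token pipeline described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the constituents are reclustered with the Cambridge/Aachen algorithm using E-scheme (four-momentum sum) recombination, the harder subtree is always placed on the left, and each leaf becomes one token formed by uniformly binning the log pT fraction and the (η, φ) offsets from the jet axis, with hypothetical ranges for a jet radius R = 0.4. All function names (`cluster`, `serialize`, `build_vocab`, ...) are hypothetical and are not the speaker's implementation. The pair search is a naive O(n³) loop chosen for clarity. A real analysis would use a dedicated clustering library such as FastJet.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

P4 = Tuple[float, float, float, float]  # (px, py, pz, E)

@dataclass
class Leaf:
    """A jet constituent (e.g. a track), stored as a four-momentum."""
    p4: P4

@dataclass
class Node:
    """One clustering step: the merger of two subtrees (harder one on the left)."""
    left: "Tree"
    right: "Tree"
    p4: P4

Tree = Union[Leaf, Node]

def make_leaf(pt: float, eta: float, phi: float) -> Leaf:
    """Massless constituent from (pt, eta, phi); pt must be > 0."""
    return Leaf((pt * math.cos(phi), pt * math.sin(phi),
                 pt * math.sinh(eta), pt * math.cosh(eta)))

def kinematics(t: Tree) -> Tuple[float, float, float]:
    """(pt, pseudorapidity, phi) of a subtree's summed four-momentum."""
    px, py, pz, _ = t.p4
    pt = math.hypot(px, py)
    return pt, math.asinh(pz / pt), math.atan2(py, px)

def wrap_phi(dphi: float) -> float:
    """Map an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2 * math.pi) - math.pi

def delta_r2(a: Tree, b: Tree) -> float:
    _, eta_a, phi_a = kinematics(a)
    _, eta_b, phi_b = kinematics(b)
    return (eta_a - eta_b) ** 2 + wrap_phi(phi_a - phi_b) ** 2

def cluster(constituents: List[Tuple[float, float, float]]) -> Tree:
    """Naive O(n^3) Cambridge/Aachen reclustering into a single binary tree."""
    nodes: List[Tree] = [make_leaf(*c) for c in constituents]
    if not nodes:
        raise ValueError("a jet needs at least one constituent")
    while len(nodes) > 1:
        # C/A merges the pair that is closest in (eta, phi), ignoring pt.
        i, j = min(((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
                   key=lambda ij: delta_r2(nodes[ij[0]], nodes[ij[1]]))
        a, b = nodes[i], nodes[j]
        if kinematics(a)[0] < kinematics(b)[0]:
            a, b = b, a  # canonical child order: harder subtree first
        merged = Node(a, b, tuple(x + y for x, y in zip(a.p4, b.p4)))  # E-scheme sum
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]

def quantize(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Clip x to [lo, hi] and return a uniform bin index in [0, n_bins - 1]."""
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)

def leaf_token(t: Leaf, jet: Tree, n_bins: int) -> str:
    # Hypothetical features and ranges: log pt fraction in [-6, 0] and
    # (eta, phi) offsets from the jet axis in [-0.4, 0.4] (jet radius R = 0.4).
    pt_j, eta_j, phi_j = kinematics(jet)
    pt_c, eta_c, phi_c = kinematics(t)
    zb = quantize(math.log(pt_c / pt_j), -6.0, 0.0, n_bins)
    eb = quantize(eta_c - eta_j, -0.4, 0.4, n_bins)
    pb = quantize(wrap_phi(phi_c - phi_j), -0.4, 0.4, n_bins)
    return f"z{zb}_e{eb}_p{pb}"

def serialize(t: Tree, jet: Tree, n_bins: int = 8) -> List[str]:
    """Bracketed serialization: '(' left right ')' for nodes, one token per leaf."""
    if isinstance(t, Leaf):
        return [leaf_token(t, jet, n_bins)]
    return ["("] + serialize(t.left, jet, n_bins) + serialize(t.right, jet, n_bins) + [")"]

def build_vocab(n_bins: int = 8) -> Dict[str, int]:
    """Integer IDs for the two brackets and every possible quantized leaf token."""
    vocab = {"(": 0, ")": 1}
    for zb in range(n_bins):
        for eb in range(n_bins):
            for pb in range(n_bins):
                vocab[f"z{zb}_e{eb}_p{pb}"] = len(vocab)
    return vocab

# Toy jet with three constituents given as (pt [GeV], eta, phi).
jet = cluster([(100.0, 0.0, 0.0), (50.0, 0.1, 0.05), (10.0, -0.2, 0.3)])
tokens = serialize(jet, jet)
vocab = build_vocab()
ids = [vocab[tok] for tok in tokens]  # integer sequence for a Transformer
```

In the toy jet, the two angularly closest constituents merge first, so the sequence has the shape `( ( a b ) c )`. Each internal node contributes exactly one opening and one closing bracket, so a jet with n constituents always serializes to 3n - 2 tokens, and the bracket structure alone is enough to rebuild the tree topology from the flat sequence.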
