I have to admit that a PhD is really harder than I expected... 😰

Amirshayan Nasirimajd

Hello! I am Shayan, and I am currently a PhD student at the Tübingen AI Center in Germany, supervised by Dr. Almut Sophia Koepke and co-supervised by Dr. Hilde Kuehne. Previously, I was a research fellow at Politecnico di Milano in Italy, where I worked on the EU-funded projects ARISE and ENFIELD. I received my master's degree in Data Science and Engineering from the Polytechnic University of Turin (Politecnico di Torino), where I worked on domain adaptation and generalization for egocentric videos under the supervision of Dr. Giuseppe Averta and Dr. Chiara Plizzari.

Email CV Scholar Twitter GitHub

News

Jan, 2026 I am starting my PhD at the Tübingen AI Center.
Oct, 2025 SeqDG accepted at Pattern Recognition Letters.
Apr, 2024 Completed my MSc in Data Science & Engineering at Politecnico di Torino.
Jun, 2023 Won the EPIC@CVPR (EPIC-KITCHENS-100 UDA) challenge, with an oral presentation at CVPR 2023.

Research

My research interests currently lie in video understanding, multimodal learning, and robotics. Most of my work focuses on how modalities beyond vision and language can help us better understand long videos.

Sequential Domain Generalisation for Egocentric Action Recognition

Amirshayan Nasirimajd, Chiara Plizzari, Simone Alberto Peirone, Marco Ciccone, Giuseppe Averta, Barbara Caputo

Accepted at Pattern Recognition Letters, 2025

Paper Project Website

Recognizing human activities from visual inputs, particularly from a first-person viewpoint, is essential for enabling robots to replicate human behavior. Egocentric vision, captured by cameras worn by the observer, exhibits large variations in illumination, viewpoint, and environment. This variability causes a notable drop in the performance of Egocentric Action Recognition models when they are tested in environments not seen during training. In this paper, we tackle this challenge by proposing a domain generalization approach for Egocentric Action Recognition. Our insight is that action sequences often reflect consistent user intent across visual domains. By leveraging action sequences, we aim to improve the model's ability to generalize to unseen environments. Our proposed method, SeqDG, introduces a visual-text sequence reconstruction objective (SeqRec) that uses contextual cues from both text and visual inputs to reconstruct the central action of the sequence. Additionally, we improve the model's robustness by training it on mixed sequences of actions from different domains (SeqMix). We validate SeqDG on the EGTEA and EPIC-KITCHENS-100 datasets. On EPIC-KITCHENS-100, SeqDG yields a +2.4% relative average improvement in cross-domain action recognition in unseen environments, and on EGTEA it improves Top-1 accuracy by +0.6% over the state of the art in intra-domain action recognition.
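To make the SeqMix idea more concrete, here is a minimal sketch of mixing two action sequences from different domains. It is an illustration under stated assumptions, not the SeqDG implementation: the `Clip` record, the `seq_mix` function, and the per-position swap probability `swap_prob` are all hypothetical, and real training would operate on learned clip features rather than toy tuples.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Clip:
    """One action clip (hypothetical record): domain tag, feature vector, label."""
    domain: str
    feature: tuple
    label: str


def seq_mix(seq_a: Sequence[Clip], seq_b: Sequence[Clip],
            swap_prob: float, rng: random.Random) -> List[Clip]:
    """Hypothetical SeqMix-style augmentation.

    Walks two equally long action sequences from different domains and, at
    each position, takes the clip from seq_b with probability swap_prob,
    otherwise keeps the clip from seq_a. The result is a single sequence
    whose clips come from both domains.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    mixed = []
    for a, b in zip(seq_a, seq_b):
        # rng.random() is in [0, 1): swap_prob=0 never swaps, 1 always swaps.
        mixed.append(b if rng.random() < swap_prob else a)
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sequences; labels and features are made up for illustration.
    src = [Clip("D1", (0.1,), a) for a in ["take knife", "cut onion", "put knife"]]
    tgt = [Clip("D2", (0.9,), a) for a in ["open fridge", "take milk", "close fridge"]]
    mixed = seq_mix(src, tgt, swap_prob=0.5, rng=rng)
    print([(c.domain, c.label) for c in mixed])
```

Mixing position by position keeps the sequence length fixed, so a mixed sequence can be fed to the same sequence model as an unmixed one.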

Sequential Domain Generalisation for Egocentric Action Recognition

Amirshayan Nasirimajd

Master's thesis, 2024

Webthesis · PoliTo

In this thesis, we present Sequential Domain Generalisation (SeqDG), a reconstruction-based architecture that improves the generalization of action recognition models. It uses a language model and a dual encoder-decoder to refine the feature representation.
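A reconstruction objective of this kind needs training examples in which part of the sequence is hidden. The sketch below shows one plausible way to build such an example: it masks the central action in aligned visual and text streams and returns the hidden feature and label as targets, plus a mean squared error for scoring a reconstruction. The zero-vector visual mask, the `<mask>` text token, and the function names are assumptions for illustration; the encoder-decoder that would produce the prediction is not shown.

```python
from typing import List, Sequence, Tuple

MASK_TEXT = "<mask>"  # hypothetical placeholder token for the hidden action


def make_reconstruction_example(
    visual: Sequence[Sequence[float]],
    text: Sequence[str],
) -> Tuple[List[List[float]], List[str], List[float], str]:
    """Hide the central action of an odd-length, aligned visual/text window.

    Returns the masked visual context, the masked text context, and the
    central visual feature and text label, which serve as reconstruction targets.
    """
    if len(visual) != len(text):
        raise ValueError("visual and text streams must be aligned")
    if len(visual) % 2 == 0:
        raise ValueError("window length must be odd so it has a center")
    center = len(visual) // 2
    dim = len(visual[center])
    masked_visual = [list(v) for v in visual]  # copy so the input is untouched
    masked_visual[center] = [0.0] * dim        # zero vector as the visual mask
    masked_text = list(text)
    masked_text[center] = MASK_TEXT
    return masked_visual, masked_text, list(visual[center]), text[center]


def reconstruction_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a reconstructed and the true central feature."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


if __name__ == "__main__":
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    t = ["open drawer", "take spoon", "close drawer"]
    mv, mt, target_v, target_t = make_reconstruction_example(v, t)
    print(mt, target_t, reconstruction_loss([4.0, 6.0], target_v))
```

Requiring an odd window keeps the "central action" well defined; the context on both sides is what a model would use to infer the masked step.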


EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge: Mixed Sequences Prediction

Amirshayan Nasirimajd, Simone Alberto Peirone, Chiara Plizzari, Barbara Caputo

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023

CVPR Oral Presentation arXiv

Winning entry of the EPIC-KITCHENS-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition. Our approach is based on the idea that the order in which actions are performed is similar between the source and target domains. Based on this, we generate a modified sequence by randomly combining actions from the source and target domains. Since only unlabelled target data are available in the UDA setting, we use a standard pseudo-labeling strategy to obtain action labels for the target. We then ask the network to predict the resulting action sequence. This allows us to integrate information from both domains during training and to achieve better transfer results on the target domain. Additionally, to better incorporate sequence information, we use a language model to filter out unlikely sequences.
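The pipeline described above (pseudo-label the target, randomly mix source and target actions, discard implausible sequences) can be sketched roughly as follows. This is a simplified stand-in, not the challenge code: confidence-threshold pseudo-labeling is one common "standard" strategy, and a tiny bigram model over action labels with add-one smoothing replaces the language model the entry actually used. All function names, the 0.5 mixing rate, and the threshold values are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def pseudo_label(probs: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most likely action if its confidence clears the threshold."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None


class BigramActionModel:
    """Toy bigram model over action labels with add-one smoothing
    (a stand-in for the language model, not the one used in the entry)."""

    def __init__(self, sequences: Sequence[Sequence[str]]):
        self.vocab = {a for seq in sequences for a in seq}
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, nxt)] += 1

    def log_prob(self, seq: Sequence[str]) -> float:
        """Sum of smoothed log P(next | prev) over consecutive action pairs."""
        v = max(len(self.vocab), 1)
        total = 0.0
        for prev, nxt in zip(seq, seq[1:]):
            total += math.log((self.bigrams[(prev, nxt)] + 1) /
                              (self.unigrams[prev] + v))
        return total


def mix_source_target(source: Sequence[str], target: Sequence[Optional[str]],
                      rng: random.Random) -> List[str]:
    """Per position, randomly take the target pseudo-label or the source label.
    Target positions without a confident pseudo-label fall back to the source."""
    return [t if (t is not None and rng.random() < 0.5) else s
            for s, t in zip(source, target)]


def keep_plausible(seq: Sequence[str], model: BigramActionModel,
                   min_avg_log_prob: float) -> bool:
    """Keep a mixed sequence only if its average transition log-probability is high enough."""
    steps = max(len(seq) - 1, 1)
    return model.log_prob(seq) / steps >= min_avg_log_prob


if __name__ == "__main__":
    lm = BigramActionModel([["take", "cut", "put"], ["open", "take", "close"]])
    target = [pseudo_label({"open": 0.8, "take": 0.2}, 0.5),
              pseudo_label({"take": 0.4, "cut": 0.35, "put": 0.25}, 0.5),
              pseudo_label({"close": 0.9, "put": 0.1}, 0.5)]
    mixed = mix_source_target(["take", "cut", "put"], target, random.Random(0))
    print(mixed, keep_plausible(mixed, lm, min_avg_log_prob=-2.0))
```

Averaging the log-probability over transitions keeps the filtering threshold comparable across sequences of different lengths.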