Guy Yariv - Generative AI Researcher

I am a Senior Researcher at Wix and a PhD candidate in Computer Science at the Hebrew University of Jerusalem, supervised by Yossi Adi and Sagie Benaim.

I spent the winter of 2025 and the summer of 2024 as a Research Scientist Intern at Meta (Meta Superintelligence Labs / GenAI) and worked as an AI Researcher at Spot by NetApp from 2022 to 2024.

My research focuses on generative AI, pushing the boundaries of what generative models can create and how they can be controlled.

Email Google Scholar Semantic Scholar GitHub Twitter LinkedIn

Publications

DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

arXiv Preprint

Noam Issachar*, Guy Yariv*, Sagie Benaim, Yossi Adi, Dani Lischinski, and Raanan Fattal

DyPE lets pre-trained diffusion transformers generate ultra-high-resolution images (16M+ pixels) without retraining or extra cost, by matching positional encoding extrapolation to diffusion's shift from low-frequency structures to high-frequency details.

Project Page arXiv Code

RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling

LoViF @ CVPR 2026

Itay Chachy, Guy Yariv, Sagie Benaim

We introduce RewardSDS, a text-to-3D score distillation method that enhances SDS by using reward-weighted sampling to prioritize noise samples based on alignment scores, achieving fine-grained user alignment.

Project Page arXiv

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

CVPR 2025

Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, and Adam Polyak

We propose Through-The-Mask, a two-stage framework for Image-to-Video generation that uses mask-based motion trajectories to enhance object-specific motion accuracy and consistency.

Project Page arXiv

vLMIG's method.

Improving Visual Commonsense in Language Models via Multiple Image Generation

arXiv Preprint

Guy Yariv, Idan Schwartz, Yossi Adi*, and Sagie Benaim*

We improve large language models' visual commonsense by generating multiple images from text prompts and integrating them into decision-making via late fusion, boosting performance on visual commonsense reasoning and NLP tasks.

Project Page arXiv Code

TempoTokens.

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

AAAI 2024

Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz*, and Yossi Adi*

We propose a method to generate realistic, audio-aligned videos by adapting a text-to-video model with a lightweight adaptor.

Project Page arXiv Code

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

InterSpeech 2023

Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi*, and Idan Schwartz*

We adapt text-conditioned diffusion models for audio-to-image generation by encoding audio into a token compatible with text representations.

Project Page arXiv Code