Showcases
| Effect | Reference Video | Generated Video 1 | Generated Video 2 | Generated Video 3 |
| --- | --- | --- | --- | --- |
| Butterfly | | | | |
| Angel Wings | | | | |
| Artistic Clay | | | | |
| Baby Me | | | | |
| Anime Couple | | | | |
| Venom | | | | |
| Blaze | | | | |
| Disintegration | | | | |
| Flow into Minecraft | | | | |
| Freezing | | | | |
| Invisible | | | | |
| Jellycat | | | | |
| Medusa | | | | |
| Poke | | | | |
| Soul Jump | | | | |
| Thunder God | | | | |
| Crush | | | | |
| Earth Fly Away | | | | |
| Garden Bloom | | | | |
| Judge | | | | |
Generalization to Out-of-Domain Data
| Effect | Reference Video | Generated Video 1 | Generated Video 2 | Generated Video 3 |
| --- | --- | --- | --- | --- |
| Boxing Punch | | | | |
| The Flash | | | | |
| Tiger Snuggle | | | | |
| Fire Breathe | | | | |
| Jelly Drift | | | | |
| Floral Eyes | | | | |
| Magic Hair | | | | |
| Shark | | | | |
| Burst Into Tears | | | | |
Comparisons with Baseline Methods
| Method | Crumble | Dissolve | Harley | Squish |
| --- | --- | --- | --- | --- |
| Ours | | | | |
| Omni-Effects | | | | |
| VFXCreator | | | | |
How Does it Work?
VFXMaster is a unified, reference-based framework for cinematic visual effect (VFX) generation that reproduces the intricate dynamics and transformations of a reference video on a user-provided image. It not only delivers outstanding performance on in-domain effects but also generalizes strongly to out-of-domain effects.
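The in-context idea is easiest to see in code. Below is a minimal, hypothetical sketch, in plain PyTorch, of how an attention mask over a concatenated [reference | target] token sequence can let the target read effect information from the reference while blocking leakage in the other direction. The function name, token layout, and block pattern are illustrative assumptions, not VFXMaster's released implementation.

```python
import torch

def build_in_context_mask(n_ref: int, n_tgt: int) -> torch.Tensor:
    """Boolean attention mask over a [reference | target] token sequence.

    True = attention allowed, False = blocked.
    - Reference tokens attend only among themselves, so they act as a
      read-only "prompt" and never absorb target content.
    - Target tokens attend to themselves and to the reference tokens,
      so effect information flows reference -> target but not back.
    """
    n = n_ref + n_tgt
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_ref, :n_ref] = True   # reference attends to reference
    mask[n_ref:, :] = True        # target attends to everything
    return mask

# Example with 16 reference and 16 target tokens; a mask like this can be
# passed as `attn_mask` to torch.nn.functional.scaled_dot_product_attention.
mask = build_in_context_mask(16, 16)
print(mask.shape)  # torch.Size([32, 32])
```

The actual in-context attention mask additionally has to separate the reference's effect attributes from its content attributes so the target keeps its own subject; the block pattern above only illustrates the basic mechanics of one-directional in-context attention.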
Abstract
Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on a one-LoRA-per-effect paradigm, which is resource-intensive and fundamentally incapable of generalizing to unseen effects, limiting both scalability and creativity. To address this challenge, we introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content, and it demonstrates remarkable generalization to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example. An in-context attention mask precisely decouples and injects the essential effect attributes, allowing a single unified model to master effect imitation without information leakage. In addition, we propose an efficient one-shot effect adaptation mechanism that rapidly boosts generalization on difficult unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates diverse categories of effects and generalizes well to out-of-domain effects. To foster future research, we will release our code, models, and a comprehensive dataset to the community.
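To make the one-shot adaptation idea concrete, here is a hedged sketch of a low-rank (LoRA-style) adapter, one standard way to fine-tune a large frozen model from a single example. The `LoRALinear` class and the choice of LoRA itself are illustrative assumptions for this page, not the paper's actual adaptation code.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base model stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus a small learned update fitted to the new effect.
        return self.base(x) + self.scale * self.up(self.down(x))
```

In a setup like this, only the adapter weights (`down`, `up`) are optimized against the usual diffusion loss on the single user-provided effect video, which keeps adaptation fast and leaves the unified base model intact.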
BibTeX