Replica Overview

An overview of Vendor.com's Replica offerings: Stock Replicas and Personal Replicas, both powered by the Phoenix AI model. Get tips on how to create the perfect replica and how to get high-quality output.

A Replica is a realistic video model of a human created using the Phoenix model. The Phoenix model is a fully synthetic, 3D-based model that generates realistic replica videos from just a script, complete with natural face (lip, cheek, nose, chin) movements and expressions synchronized with your script and generated voice. Developed by our team, the model uses a novel approach that bypasses traditional methods and constructs dynamic, three-dimensional facial scenes using neural radiance fields (NeRFs).

Replicas are created using just 2 minutes of training data and are designed to learn how someone speaks and sounds, how they look, and how they move their face while speaking. Using a Replica, you can generate hyper-realistic videos that look and sound just like you, from just text, in up to 30 languages.

It's important to provide a high-quality input video in order to get great outputs from a Replica. Your Replica will attempt to mimic your gestures and movements, as well as your accent, even if you generate a video in a different language.

Here's an example of an output from one of our Stock Replicas:

Stock Replicas

High-quality, diverse selection
Available immediately
Can be used for the majority of use cases

Discover Our Stock Replicas

Arjun - Courtyard

Arjun - Office

Destiny - Courtyard

Destiny - Home Office

Jimmy - Office

Lucy Women

Nathan - Conference Room

Aaliyah - Office

Laura - Office

Personal Replicas

High-quality clone of a person's voice and face
Train once, and re-use endlessly without having to record again

Personal Replicas allow you to train a new Replica of a human using the Phoenix model, from just 2 minutes of training data. Personal Replicas take 4-6 hours to train. You can only train Replicas using training data that includes a verbal consent statement. Personal Replicas go through Voice and Face ID checks to ensure consent is present.

Learn how to create a high-quality personal replica with just a few minutes of training data.

Getting Started with Your Personal Replica

Personal Replicas allow you to train a new Replica of a human using the Phoenix model, from just 2 minutes of training data. Personal Replicas take 4-6 hours to train and are available on all plans except Starter.

Create a Replica via the UI (Avatar dashboard)

You can create a Replica via the Avatar dashboard. Navigate to the Replicas tab in our portal. Here, you'll be able to record in app or upload footage to create a new Replica.

Recording Your Training Footage

Your journey to creating a personal Replica begins with a simple requirement: a two-minute video of you engaging with the camera. There is no predefined script beyond the consent statement, so you can discuss anything that showcases your natural speaking style and expertise.

Tips for Success

Our platform simplifies the first step. Use your webcam through the developer portal to capture the essence of your persona. Achieving the best possible Replica involves attention to detail. Here's how:

Do: Utilize high-definition recording equipment, ensure proper lighting, and maintain focus on your face and upper body. Aim for a quiet, well-lit setting, and speak naturally. See more in Best Practices & Examples.
Don't: Wear clothes that blend with the background, bulky accessories, or any headwear that obscures your face. Don't let your gaze wander, leave distractions in the background, or move excessively.

Here's an example of high quality training footage:

Consent and Others

An integral part of the process involves reading a specific authorization phrase. This step confirms your consent and kicks off the Replica creation process.

“I, [FULL NAME], am currently speaking and give consent to Tavus to create an AI clone of me by using the audio and video samples I provide. I understand that this AI clone can be used to create videos that look and sound like me.”

We currently accept consent statements in any of our supported languages. You can see the supported languages here.

How to Act

Gaze: Stay at eye level with the camera and maintain relatively stable eye contact.
Gesturing: Avoid crossing your hands in front of your face and limit gestures.
Tone: Aim for an upbeat tone to keep the content positive and engaging.
Mistakes: Perfection in reading the script isn't required. Continue naturally if you stumble.
Lips: Close your lips during pauses (the script will remind you of this).

Recording Format

If you are uploading training footage, it's important that it is in the correct format:

Format and Quality: MP4 format is required, with a resolution up to 4K and a size limit of 750 MB. NOTE: Tavus accepts resolutions up to 4K; however, more common webcam resolutions (such as 720p/1080p) are also known to produce excellent replicas.
Content Authenticity: Provide unedited, raw footage for the most genuine Replica creation.
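
As a rough illustration of the format requirements above, the following sketch checks a file's extension and size before you upload it. It is not part of any Tavus tooling: the function name precheck_upload is hypothetical, and reading "750 MB" as decimal megabytes is an assumption.

```python
from pathlib import Path

# Assumption: "750 MB" is read as decimal megabytes (750,000,000 bytes).
MAX_UPLOAD_BYTES = 750 * 1000 * 1000


def precheck_upload(path: str) -> list[str]:
    """Return the problems found before uploading; an empty list means none."""
    file = Path(path)
    problems = []
    # The upload guidance asks for MP4, so flag any other extension.
    if file.suffix.lower() != ".mp4":
        problems.append(f"expected an .mp4 file, got '{file.suffix or 'no extension'}'")
    # Compare the size on disk with the documented limit.
    size = file.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size / 1e6:.0f} MB, over the 750 MB limit")
    return problems
```

Calling precheck_upload("training.mp4") on a small MP4 file should return an empty list. The check only looks at the file name and size, so it cannot tell whether the footage has been edited or re-encoded.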

Train in Chosen Language

We highly recommend doing the full training in the language you are most likely to use for the generated videos. This does not prevent you from creating future videos in a different language if desired!

Training Time & Next Steps

Your replica will be processed in the background upon submission. This process will take around 4-6 hours. If you're not happy with your personal replica, be sure to contact us.

Language Support

Vendor.com enables the creation of videos in a multitude of languages, expanding the reach of content globally. When you input a script in any of the supported languages, the resulting video features your replica articulating the message in that specific language.

For example, by providing a script in Spanish, as shown in the example below, your replica will deliver the content in Spanish, mirroring natural language nuances and expressions. You can even mix and match languages in the same script.

Please note that the voice cloning model attempts to maintain your accent even whilst speaking a different language. This can sometimes result in, for example, an American accent while speaking Spanish.

Languages We Support

🇺🇸 English (USA)
🇬🇧 English (UK)
🇦🇺 English (Australia)
🇨🇦 English (Canada)
🇯🇵 Japanese
🇨🇳 Chinese
🇩🇪 German
🇮🇳 Hindi
🇫🇷 French (France)
🇨🇦 French (Canada)
🇰🇷 Korean
🇧🇷 Portuguese (Brazil)
🇵🇹 Portuguese (Portugal)
🇮🇹 Italian
🇪🇸 Spanish (Spain)
🇲🇽 Spanish (Mexico)
🇮🇩 Indonesian
🇳🇱 Dutch
🇹🇷 Turkish
🇵🇭 Filipino
🇵🇱 Polish
🇸🇪 Swedish
🇧🇬 Bulgarian
🇷🇴 Romanian
🇸🇦 Arabic (Saudi Arabia)
🇦🇪 Arabic (UAE)
🇨🇿 Czech
🇬🇷 Greek
🇫🇮 Finnish
🇭🇷 Croatian
🇲🇾 Malay
🇷🇺 Russian
🇸🇰 Slovak
🇩🇰 Danish
🇮🇳 Tamil
🇺🇦 Ukrainian

Best Practices & Examples

Set Up: Environment

🌞 Lighting

Ensure your face is evenly lit with no shadows.

Example: If a window casts shadows on your face, change your orientation or use a ring light to even it out.
A large, diffuse light will work best, providing consistent, even, and neutral lighting for the entire face.
This helps Phoenix to properly map your face, resulting in a better-looking video overall.

🔊 Noise

Your space should be silent or almost silent.

Avoid noise from air conditioning, construction, traffic, refrigerators, and conversations.
Choose rooms with minimal reverb to prevent sound amplification.
Clean audio, free from background noises, will produce the best audio output for your replica.

🌆 Background

Keep your background clear.

Remove moving objects.
Ensure no other people are visible in the video.

Set Up: Equipment

📷 Camera & Placement

Use a high-quality camera with at least 2K resolution.

Examples: DSLR, newer laptops, iPhones, Samsung Galaxy, or Google Pixel.
Frames per second: Optimal FPS is 30, but 24-60 FPS is acceptable.
Distance: Maintain a distance of 3ft-6ft (or 0.9m-1.8m) from the camera.
Level: The camera should be at eye level.
Lens: Ensure the camera lens is clean of smudges.

🎙️ Microphone

Start with your phone or computer's microphone.

For external USB or XLR mics:
- Place the mic 1ft (0.3m) from your mouth, not exceeding 2-3ft (0.6-0.9m).
- Position the mic at least 1 inch below your chin to avoid blocking your mouth.
Wireless earbuds, like Apple AirPods or Samsung Galaxy Buds, are not recommended due to poor mic quality.

👾 Software

Disable any software-based audio enhancements.

Turn off compressors, equalizers, noise suppression, etc., as we perform our own sound processing post-recording.

Set Up: Yourself

👀 Gaze

Maintain eye level with the camera and act naturally.

🗣️ Speaking Vibe & Pace

Be yourself and relax.

Pace: Take your time, don't rush.
Pausing: Close your lips during pauses (the script will remind you).
Tone: Aim for an upbeat tone to keep content positive and engaging. Keep continuous eye contact with the camera. Be animated in your mouth, eyes, and cheeks.
Gestures: Keep hand gestures to a minimum and avoid blocking your face.
Mistakes: If you stumble, continue speaking. Perfection isn't necessary.

🎅 Accessories & Beards

If possible, avoid beards, glasses, and accessories.

Our model is still being refined to better process these elements.

This comprehensive guide ensures you capture the highest quality footage for your replica, leading to a more authentic and engaging digital representation.

Quality Checklist

Review the following checklist to ensure your video recording is optimized for use as training footage for a Tavus digital replica.

Environment

Face is evenly lit with no shadows.
Space is silent or almost silent. No background noise, echo, or reverb.
Background is clear. No moving objects, no other people.
Good lighting contrast.

Video File

Format is one of the following:
- mp4 with h264 video codec and aac audio codec
- webm
Maximum size is 750MB.
Duration is at least 1 minute. (Between 1.5 and 2 minutes is optimal.)
Consent clause is included.
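
The container, codec, and duration items above can be checked mechanically. The sketch below is one possible approach, not an official validator: it assumes the metadata has already been extracted with a tool such as ffprobe and loaded into a dict, and check_video_file is a hypothetical name. Matching the container through ffprobe's format_name field is a heuristic (Matroska files also report "webm" there).

```python
# The metadata can be produced with, for example:
#   ffprobe -v error -show_format -show_streams -print_format json input.mp4
# and loaded with json.loads(); the dict passed in mirrors that layout.

MIN_SECONDS = 60  # checklist minimum: "at least 1 minute"


def check_video_file(probe: dict) -> list[str]:
    """Compare ffprobe-style metadata against the Video File checklist."""
    problems = []
    # ffprobe reports a comma-separated list of container names.
    containers = probe["format"]["format_name"].split(",")
    # Map stream type to codec name; a later stream of the same type wins.
    codecs = {s["codec_type"]: s["codec_name"] for s in probe["streams"]}

    if "mp4" in containers:
        if codecs.get("video") != "h264":
            problems.append("mp4 video codec should be h264")
        if codecs.get("audio") != "aac":
            problems.append("mp4 audio codec should be aac")
    elif "webm" not in containers:
        problems.append("container should be mp4 or webm")

    # ffprobe reports the duration in seconds as a string.
    duration = float(probe["format"]["duration"])
    if duration < MIN_SECONDS:
        problems.append(f"duration {duration:.0f}s is under 1 minute")
    return problems
```

An empty result means only that these three items passed. The consent clause still has to be verified by listening to the recording.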

Camera

FPS is 30 for optimal result. 24-60 FPS is acceptable.
Resolution is 720p at minimum.
Video appears clear.
User takes up a minimum of 25% of the screen, and the full head is in frame.
The camera is at eye level.
No visual artifacts. Camera lens is clean of smudges.
Not recorded from Zoom app or other video call apps.
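
The numeric items in the camera checklist (frame rate, resolution, and how much of the frame you occupy) can be sketched as simple arithmetic. The function names here are hypothetical, and two readings are assumptions: "25% of the screen" is treated as a share of frame area, and NTSC-style rates such as 24000/1001 (about 23.976 FPS) are accepted as 24 through a small tolerance.

```python
from fractions import Fraction

FPS_TOLERANCE = 0.05  # assumption: lets 23.976 count as 24 FPS


def check_camera(width: int, height: int, frame_rate: str) -> list[str]:
    """Check resolution and frame rate against the Camera checklist.

    frame_rate is a ratio string such as "30/1" or "30000/1001",
    the form ffprobe reports in its r_frame_rate field.
    """
    problems = []
    # Use the shorter side so portrait recordings are judged the same way.
    if min(width, height) < 720:
        problems.append(f"resolution {width}x{height} is below 720p")
    fps = float(Fraction(frame_rate))
    if not (24 - FPS_TOLERANCE) <= fps <= (60 + FPS_TOLERANCE):
        problems.append(f"{fps:.2f} FPS is outside the 24-60 range")
    return problems


def subject_coverage(box_w: int, box_h: int, frame_w: int, frame_h: int) -> float:
    """Fraction of the frame area covered by the subject's bounding box."""
    return (box_w * box_h) / (frame_w * frame_h)
```

For instance, a subject whose bounding box is 960x540 in a 1920x1080 frame covers exactly one quarter of the frame area, which sits right at the 25% minimum under the area reading.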

Microphone

No mic blocking the user's mouth.
Clearly audible audio.
Not recorded with wireless earbuds, like Apple AirPods or Samsung Galaxy Buds.

User

Face looks directly at the camera throughout.
No rushing; natural speaking and pace.
User pauses for 0.5 seconds and closes lips fully at the end of each sentence.
No hand gestures or blocking the face.
Minimal accessories, e.g., no glasses or hat.
No hair blocking the face.
Minimal movement. Avoid head movement, jolts, turns, etc.
No high-collar shirt covering the neck (e.g., turtleneck).