Some time ago, the "Ant Hey" effect was all the rage: a person's speech, movements, and expressions are transferred onto a static picture, so that the face in the still image performs the given actions and expressions. It was based mainly on FOMM (First Order Motion Model) technology. That was already two years ago, and in some scenes the results were not ideal. Recently, a Tsinghua University team published the latest facial motion transfer paper at CVPR 2022, Thin-Plate Spline Motion Model for Image Animation. This article does not go into the principles of the paper; instead, it uses the open-source model directly. The effect looks like this:
The first picture is the still photo, the second is the driving GIF animation, and the third is the generated result.
Purpose of this article: wrap the open-source model into a single interface, so the reader only needs to pass in a picture and an animation (gif or mp4) to generate a facial expression transfer animation (mp4).
The reader needs a PyTorch environment; go to https://pytorch.org/get-started/locally/ and choose the GPU or CPU build according to your actual hardware. Also install the imageio and imageio-ffmpeg libraries (pip install imageio imageio-ffmpeg), which are used to read and write mp4 files. The code below additionally relies on numpy, scipy, scikit-image, and Pillow.
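Optionally, a quick sanity check like the short sketch below (my own suggestion, not part of the original project) confirms that PyTorch and the imageio/imageio-ffmpeg stack import correctly and whether a CUDA device is available; adjust it as needed for your setup:
# Optional environment check: confirm the libraries import and whether CUDA is usable.
import torch
import imageio
import imageio_ffmpeg

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("imageio:", imageio.__version__)
print("ffmpeg executable:", imageio_ffmpeg.get_ffmpeg_exe())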
Readers can skip to the end to get the source code, download the resource package, replace the picture in the folder with your own, and generate the "Ant Hey" effect with one click.
After exporting the model as .pt files, create a Model class and encapsulate the interface in an infer function. The specific code is as follows:
import numpy as np
import torch
from scipy.spatial import ConvexHull


class Model():
    def __init__(self, kp="models/kp.pt", aio="models/aio.pt",
                 device=torch.device('cpu')):
        self.device = device
        # TorchScript models exported from the open-source project:
        # kp is the key-point detector, aio generates the animated frame.
        self.kp = torch.jit.load(kp, map_location=device).eval()
        self.aio = torch.jit.load(aio, map_location=device).eval()

    def relative_kp(self, kp_source, kp_driving, kp_driving_initial):
        # Scale the driving motion by the ratio of the face areas
        # (convex hulls of the key points) so it adapts to the source face.
        source_area = ConvexHull(kp_source[0].data.cpu().numpy()).volume
        driving_area = ConvexHull(
            kp_driving_initial[0].data.cpu().numpy()).volume
        adapt_movement_scale = np.sqrt(source_area) / np.sqrt(driving_area)

        # Apply the relative motion of the driving frame to the source key points.
        kp_value_diff = (kp_driving - kp_driving_initial)
        kp_value_diff *= adapt_movement_scale
        kp_new = kp_value_diff + kp_source
        return kp_new

    def get_kp(self, src):
        # HxWx3 float image in [0, 1] -> 1x3xHxW tensor -> key points.
        src = np.expand_dims(src, 0).transpose(0, 3, 1, 2)
        src = torch.from_numpy(src).float().to(self.device)
        return self.kp(src)

    def infer(self, src, driving, src_kp, init_kp):
        src = np.expand_dims(src, 0).transpose(0, 3, 1, 2)
        src = torch.from_numpy(src).float().to(self.device)
        driving = np.expand_dims(driving, 0).transpose(0, 3, 1, 2)
        driving = torch.from_numpy(driving).float().to(self.device)

        kp_driving = self.kp(driving)
        kp_norm = self.relative_kp(kp_source=src_kp,
                                   kp_driving=kp_driving,
                                   kp_driving_initial=init_kp)
        with torch.no_grad():
            out = self.aio(src, src_kp, kp_norm)
        out = out[0].cpu().numpy()
        out = out.transpose(1, 2, 0)  # back to HxWx3
        return out
Here, get_kp obtains the facial key-point data. In infer, src is the static image, driving is a single frame of the driving animation, src_kp holds the key points of the static image, and init_kp holds the key points of the first frame of the driving animation.
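To make the interface concrete, here is a minimal single-frame sketch. It is only an illustration: it assumes the exported models sit under models/ as in the constructor defaults and uses the sample assets from the pipeline at the end of this article:
import imageio
from skimage.transform import resize

model = Model()                                    # CPU by default
src = resize(imageio.imread('assets/source.png'), (256, 256))[..., :3]
drv0 = resize(imageio.mimread('assets/driving2.gif')[0], (256, 256))[..., :3]

src_kp = model.get_kp(src)                         # key points of the static image
init_kp = model.get_kp(drv0)                       # key points of the first driving frame
frame = model.infer(src, drv0, src_kp, init_kp)    # generated HxWx3 frame in [0, 1]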
The whole calling process can be split into four steps: create the model object, read every frame of the driving animation, call the model, and assemble the generated frames into an mp4 file.
To create the Model object defined above, the reader needs to specify whether to use the GPU or the CPU, depending on which PyTorch build is installed. The specific code is as follows.
def create_model(use_gpu):
    # Pick the device according to the installed PyTorch build.
    if use_gpu:
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')
    model = Model(device=device)
    return model
In the code above, use_gpu is a boolean that decides whether to use the GPU build. Readers can set it according to their own environment.
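As a small convenience (not required by the code above), you can also let PyTorch decide at run time instead of hard-coding the flag:
import torch

# Use the GPU only when a CUDA-enabled PyTorch build and a GPU are actually available.
model = create_model(use_gpu=torch.cuda.is_available())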
Next, call the imageio library (backed by imageio-ffmpeg) to read every frame of an mp4 or gif file. The specific code is as follows; the function returns the list of video frames together with the frame rate (fps):
import imageio
from PIL import Image
from skimage.transform import resize


def read_mp4_or_gif(path):
    reader = imageio.get_reader(path)
    # Get the frame rate: mp4 metadata carries fps directly,
    # while gif stores the per-frame duration in milliseconds.
    if path.lower().endswith('.mp4'):
        fps = reader.get_meta_data().get('fps')
    elif path.lower().endswith('.gif'):
        fps = 1000 / Image.open(path).info['duration']
    else:
        raise ValueError('only .mp4 and .gif files are supported')

    driving_video = []
    try:
        for im in reader:
            # The model expects 256x256 RGB frames with values in [0, 1].
            im = resize(im, (256, 256))[..., :3]
            driving_video.append(im)
    except RuntimeError:
        pass
    reader.close()
    return driving_video, fps
Because of the constraints of the model, each frame is resized to 256×256 here.
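As a quick aside, skimage's resize also converts frames to float arrays in [0, 1], which is the range the model expects and the reason img_as_ubyte is applied later to the generated frames. A small check (using the source image from the pipeline at the end of the article) would look roughly like this:
import imageio
from skimage.transform import resize

im = resize(imageio.imread('assets/source.png'), (256, 256))[..., :3]
print(im.shape, im.dtype, im.min(), im.max())   # (256, 256, 3), float, values in [0, 1]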
Calling the model is straightforward: read the static image and every frame of the driving animation, then call the Model class's get_kp function on the static image and on the first driving frame to obtain their key points. Next, traverse every frame of the driving animation, passing the driving frame, the static image, the static image's key points, and the first driving frame's key points to Model's infer function to obtain the generated frame. The specific code is as follows.
from skimage import img_as_ubyte


def run(use_gpu, src_path, driving_path):
    # Read and normalize the static source image.
    src = imageio.imread(src_path)
    src = resize(src, (256, 256))[..., :3]

    # Read every frame of the driving mp4/gif.
    driving_video, fps = read_mp4_or_gif(driving_path)

    model = create_model(use_gpu)
    src_kp = model.get_kp(src)                  # key points of the source image
    init_kp = model.get_kp(driving_video[0])    # key points of the first driving frame

    outs = []
    for driving in driving_video:
        out = model.infer(src, driving, src_kp, init_kp)
        outs.append(img_as_ubyte(out))          # float [0, 1] -> uint8
    return outs, fps
Finally, continue using the imageio-ffmpeg library to assemble the generated frames into an mp4 file. The code is as follows:
def write_mp4(out_path, frames, fps):
    # Write the frames to an mp4 file at the original frame rate.
    imageio.mimsave(out_path, frames, fps=fps)
The complete calling pipeline looks like this:
src_path = 'assets/source.png'
driving_path = 'assets/driving2.gif'
frames, fps = run(True, src_path, driving_path)
write_mp4("out.mp4", frames, fps)
Follow my official account "Python Learning from actual combat" and reply "Expression transfer" to get the full source code and resource package. If you found this article helpful, please give it a like; your support keeps me writing. New articles are published on the official account first.