CodeMiko Done Cheap - DIY real-time performance capture

A DIY motion capture suit with facial tracking for livestreaming a 3D character (to zero viewers)

UPDATE: I have taken this project much further.

The Idea

Deep in COVID lockdown (December 2020), I stumbled upon a trending clip of a fascinating livestreamer named “CodeMiko”:

I could not believe what I was seeing. Real-time motion capture with face-tracking and live audience interactivity? I had no idea this was possible. I needed to know how she was doing it. And more importantly, I wanted my own.

The Tools

The Process

I began watching every CodeMiko stream I could. This quickly became an obsession - CodeMiko was creating the most innovative and hilarious content I had seen in years. After the chaotic streams were over, CodeMiko’s creator “Technician” would talk about her struggles with the CodeMiko development process. This provided key details on how she was creating this content:

That is a lot of work and equipment, but I can see how it might be achievable on a budget. I’ve released a mediocre VR game built with UE4, so that engine is a possibility for the 3D character and environment - but I’m also familiar with Blender and 3dsmax as alternatives. An iPhone with FaceID is easy enough to get. Crypto is booming so 3090s are hard to get, but I’ve overclocked my GTX 1080 and I’m confident I can squeeze enough performance out of that to make it work. But there is one very big problem:

I’m not willing to spend $20,000 USD on a motion capture suit for a fun side-project.

This left me no choice: I had to make one.

1. DIY Inertial Motion Capture Suit

A quick search reveals that many vTubers achieve full-body motion-tracking by strapping Vive trackers (at least 3, up to 7) onto their bodies and using Vive Lighthouses for outside-in positional tracking. This certainly works (and I already have a Vive system with a tracker) but the limited number of tracking points is very apparent in the jitteriness of the character animation:

I wanted something better.

Another option is using a live-feed from a webcam (sometimes also a Kinect) to estimate pose and position. This suffers from even lower fidelity, and has problems with occlusion:

I wanted something better.

The most complex option is an inertial motion capture suit. Second-hand ones are quite rare (and expensive!) when they do pop up. An exhaustive search of Hackaday, GitHub, Chinese tech forums, and obscure corners of the internet reveals a few dead-end DIY projects. But one project shows hope: Chordata Motion - a 15-point tracking system which utilizes small off-the-shelf IMU sensors, all run from a Raspberry Pi:

The team has shipped assembled units to early backers, the project has a small but active community forum full of technical discussion and support, and everything about the project is open-source. With the gerber files, bill of materials, 3D models (for enclosures), and firmware/software openly published, we have everything needed to build our own inertial motion capture suit.

1.1 PCBs

The Chordata Motion system is built from two components: a central hub to receive and process positional data, and as many “kceptor” motion trackers as you would like to feed into the hub (15 for full-body tracking).

This means there are two unique PCB layouts we will need to get made. The project team have shared gerber files, but one of the benefits of an open-source project is that the community can make further improvements. In this case, a member of the community (@valor) has made a more compact version of the hub and kceptor:

In addition to 3D-printable enclosures:

Before sending the gerber files off for PCB fabrication (EasyEDA is my preferred vendor), I checked the forums for any outstanding issues. A couple of members had confirmed the designs work, so as a final check I used KiCad to ensure the PCBs would pass EasyEDA's DRC (Design Rule Check - each PCB vendor has specific tolerances and limits that their machinery requires). I modified the PCB layouts slightly to pass DRC, and the finalized gerber files were sent off to EasyEDA.

A few weeks later, I had the PCBs:

As well as a stencil to aid with applying solder paste (highly recommended):

1.2 Electrical Components

The COVID-era electronics shortage was in full effect worldwide, meaning many of the components specified in the Bill of Materials would need to be replaced with equivalents. Luckily, the Chordata system uses mostly standard parts with easy substitutions (resistors, capacitors, LEDs, voltage regulators, multiplexers), but there is one extremely important component which presents a significant problem:

This tiny IMU (inertial measurement unit) chip contains a magnetometer, accelerometer, and gyroscope and is used for orientation and motion detection. This chip is present on every single kceptor tracker, and is the key to the entire operation of the motion capture suit. And it was sold out across the planet, with no known restocking date.

There are no equivalent chips which can be substituted. Somehow I needed to get my hands on at least 15 of these chips (ideally more than 20), so I would have some flexibility during assembly, testing, and future use.

I found a way: salvage.

There are pre-assembled 9DOF sensor modules for use with Arduino that use the LSM9DS1TR as their IMU. These assembled boards are only slightly more expensive than buying the IMU chip separately, but removing the IMU chip presented two challenges:

  1. They are in a 24-LGA (24-pin land grid array) package, which means the solder pads are underneath the chip
  2. They are MEMS (micro-electromechanical systems) devices, which means there are sensitive moving parts inside the IMU chip - extreme temperature changes can damage them

To deal with problem 1, I used an infrared preheater to warm the boards from below. To deal with problem 2, I bought plenty of spare boards and worked as quickly as possible when desoldering.

The workstation for desoldering the surface-mount chips:

After a few failed attempts, the method used by Louis Rossmann gave reliable results (the secret being obscene amounts of flux):

I harvested enough IMU chips to proceed.

1.3 Other Components

The enclosures for the Chordata hub and kceptors were easily printed on an Ender-3:

The Raspberry Pi 3B+ was readily available. An enclosure for it was also printed:

I needed to make cabling of various lengths, some for pin headers, others terminated with RJ45 for communication from the kceptors to the hub. This was easy to achieve with cheap-ish crimping tools:

Everything was powered in a self-contained fashion, using a 5V 2.1A 26800mAh powerbank in a waistbelt:

The Chordata kceptors and hub needed to be attached to my body, with the kceptors kept in the same position along a bone (to ensure the tracking of each bone remained stable). This was achieved by supergluing 3D-printed clips to polyester webbing with anti-slip backing. Velcro sewn to the straps allowed easy attachment/removal:

I moved on to assembly of the electronics.

1.4. Electronics Assembly

The PCBs were designed for the use of SMD (surface mount) components, which are typically soldered using a reflow technique in an oven. Luckily I had a reflow oven, but even without one it would have been possible to solder everything using tweezers and a hot air gun.

The reflow temperature profile was entered manually to match the solder paste being used. Solder paste was applied to the PCB through the matching metal stencil, and the electrical components were placed by hand with tweezers. To increase airflow during reflow, the PCB being reflowed was raised on spare PCBs.

The end result was assembled hubs and kceptors:

For redundancy, a total of 2 hubs and 18 kceptors were assembled.

1.5. Suit Assembly

The intended location of each kceptor is given by this diagram:

The full suit was assembled using RJ45 cables to connect all kceptors to the hub:

The hub and Raspberry Pi were velcro’d to the central webbing that wraps around the waist:

The full suit:

The suit is time-consuming to put on, but once everything is strapped in the kceptors remain rigidly in place:

The Chordata team has developed the notochord software which runs on the Raspberry Pi, receiving all of the raw data from the kceptors and performing the sensor fusion required to determine the relative position and rotation of each kceptor. It then transmits this information over WiFi to a receiving computer. Getting this software onto the Raspberry Pi was a simple process.

After many painful hours of troubleshooting communication errors, intermittent connections, failed calibrations, undocumented software bugs, and even more intermittent connections, I finally had the motion capture suit sending tracking data over WiFi using the OSC protocol:
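To sanity-check the stream independently of Blender, a few lines of python-osc are enough to dump whatever the suit is sending. This is just a sketch - the address pattern and port below are assumptions and must match however notochord is actually configured:

```python
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def dump(address, *args):
    # Print every message so the address scheme and value ranges can be inspected
    print(address, args)

dispatcher = Dispatcher()
dispatcher.map("/Chordata/*", dump)   # placeholder pattern - check the notochord config
dispatcher.set_default_handler(dump)  # catch anything that doesn't match the pattern

# The port must match whatever notochord is configured to transmit to.
server = BlockingOSCUDPServer(("0.0.0.0", 6565), dispatcher)
print("Listening for OSC from the suit...")
server.serve_forever()
```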

With body tracking solved, I moved on to facial tracking.

2. Facial Tracking

Facial tracking was a much simpler problem to solve, thanks to Apple’s inclusion of a front-facing laser dot projector on their newer iPhones to enable the FaceID authentication feature. By projecting a grid of dots on a face using infrared light, it is possible to track facial expressions.

The app Face Cap has a simple interface and makes it easy to transmit facial blendshape data using OSC. With an iPhone pointed at my face at all times, I had continuous (and expressive) facial tracking.
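On the receiving side, driving a character’s expressions from this data is conceptually simple: each ARKit-style blendshape weight gets copied onto a shape key of the same name on the face mesh. A minimal Blender-side sketch of that mapping (the object name, shape key naming, and the dict of weights are assumptions; Face Cap’s own documentation defines the actual OSC message layout):

```python
import bpy

# Hypothetical dict of ARKit-style blendshape weights (0.0 - 1.0), as they
# might arrive from the facial tracking app over OSC.
incoming_weights = {"jawOpen": 0.42, "mouthSmileLeft": 0.8, "eyeBlinkRight": 1.0}

face = bpy.data.objects["Face"]               # assumed object name
key_blocks = face.data.shape_keys.key_blocks  # shape keys named after the blendshapes

def apply_blendshapes(weights):
    """Copy tracked blendshape weights onto matching shape keys."""
    for name, value in weights.items():
        key = key_blocks.get(name)
        if key is not None:
            key.value = value

apply_blendshapes(incoming_weights)
```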

The next problem was figuring out how to keep the iPhone pointed at my face. One solution was quite simple: a ski helmet with various GoPro-mount accessories cobbled together.

It’s heavy, but it worked (for an hour or two, before the neck pain kicked in):

If I didn’t expect to perform much movement during a motion capture session, I could get by with a desk-mounted iPhone holder (as long as I’m facing forwards):

With the hardware and software on the transmitting side (motion capture suit, iPhone) figured out, I moved onto the software on the receiving side (a desktop PC).

3. Software

My ultimate goal was to have a motion-captured 3D character in a 3D environment, streamed live to the internet in real time.

Unreal Engine was the obvious first choice to achieve this - it is free, extremely flexible, and is what CodeMiko herself uses. There was a significant problem, however: no existing plugin supported receiving tracking data from the Chordata suit, so I would have to develop my own. That would be a substantial effort, which explains why the community had not yet produced a solution.

There was an alternative option: the Chordata team maintains a Blender plugin for receiving tracking data from the suit. Blender is fully featured and free; however, it presented one major problem: it is not a real-time rendering engine like Unreal. Instead, it is intended for a traditional pre-rendered 3D still/animation pipeline.

What if I could come up with a way to use Blender as a real-time engine?

3.1 Blender

It was relatively straightforward to get the Chordata plugin up-and-running in Blender. OSC messages are sent over WiFi from the Raspberry Pi on the suit, intercepted by Blender, and translated to bone positions and rotations:
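Conceptually, what the plugin does with each incoming packet boils down to writing an orientation onto the matching pose bone of the character’s armature. A stripped-down sketch of that idea (the armature and bone names are assumptions, and the real add-on also handles calibration offsets and axis remapping):

```python
import bpy
from mathutils import Quaternion

armature = bpy.data.objects["Armature"]   # assumed object name

def apply_kceptor_rotation(bone_name, w, x, y, z):
    """Write one sensor's fused orientation onto the matching pose bone."""
    bone = armature.pose.bones[bone_name]
    bone.rotation_mode = 'QUATERNION'
    # The real add-on also applies calibration offsets and axis remapping here.
    bone.rotation_quaternion = Quaternion((w, x, y, z))

# e.g. a packet for the left forearm kceptor (values are purely illustrative)
apply_kceptor_rotation("forearm_L", 0.98, 0.02, 0.17, -0.05)
```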

After calibrating the suit (a procedure that establishes the baseline position, rotation, and environmental factors for each kceptor), I had a 3D character directly controlled by the motion capture suit:

The intention was to record animations within Blender using the motion capture suit and then later play back these animations during a render. But this wasn’t required - even without recording an animation, the Blender viewport updates in real-time with the motion capture suit’s pose. By making the Blender viewport look as good as possible within its limitations (simplified lighting, no motion blur, limited anti-aliasing, lower resolution), I was able to use the viewport itself as a real-time renderer:

In theory this solution allowed me to build a full 3D environment around the character, but achieving decent results for the environment required a different viewport configuration than the character did:

I took a hybrid approach to solve this problem: pre-render the environment, then composite the real-time character on top of the pre-rendered backdrop (an approach familiar to anyone who remembers the PlayStation 1 era).

I rendered the environment in a looping animation:

And then modified the viewport to work as a green screen to enable easy masking of the character model:
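One way to get a flat, keyable green behind the character is to disable world nodes, set a pure green world color, and strip the viewport of its overlays. A rough sketch of those settings, run from Blender’s Python console (this shows the general idea rather than the exact setup used here):

```python
import bpy

# Use a flat, pure-green world background so OBS can chroma-key it out.
world = bpy.context.scene.world
world.use_nodes = False
world.color = (0.0, 1.0, 0.0)

# Strip the 3D viewport down to just the shaded character:
# no overlays (grid, origins), no gizmos, rendered shading.
for area in bpy.context.screen.areas:
    if area.type == 'VIEW_3D':
        space = area.spaces.active
        space.shading.type = 'RENDERED'     # Eevee in the viewport
        space.overlay.show_overlays = False
        space.show_gizmo = False
```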

With these two video layers figured out, I worked on compositing them in real-time.

3.2 OBS

I had two video streams (a pre-rendered video and a screen-captured portion of the Blender viewport), and planned on having multiple audio streams (microphone audio, background music) in the future. I needed to composite all of these streams together, potentially apply effects to particular elements (audio or visual), and stream the results on the internet. Every livestreamer solves this type of problem the same way: OBS.

OBS is the Swiss Army knife of video and audio production. It composites scenes from multiple sources and records or streams the result live to online platforms. Anything it can’t do out-of-the-box can be achieved with one of the countless plugins developed by the community. It is also free.

The animated background was a simple looping video as the bottom layer:

For the top layer, I captured the Blender viewport and masked out the character model:

And when put together:

I have accomplished my goal: a real-time motion-captured 3D character in a 3D environment, ready to be streamed or recorded.

Improvements

While this is a satisfying starting point, there are a number of enhancements required to make this into a system capable of producing compelling content.

1. Walking Around (Global Positioning)

If you’re familiar with motion capture systems, you might be surprised to hear that an open-source DIY inertial motion capture suit can determine the translation of the root bone in space. Your surprise would be warranted: Chordata is not able to do this (yet). As a result, the root bone (the hip) stays anchored in place for the most part (with limited vertical translation made possible by anchoring the foot bones to the floor and using IK). The practical implication is that this motion capture system cannot track a character walking around an environment - it can only capture the “inner pose” (all rotations relative to the root bone at the hip). This limitation makes sense in the context of the Chordata system, which was originally intended for capturing dance poses.

One idea I am exploring is attaching a Vive tracker to the waist of the suit, and using the Vive outside-in tracking system for global positioning of the root bone of the character model.
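A rough sketch of what that bridge might look like on the PC, polling the tracker with the pyopenvr bindings and forwarding its position over OSC (the address, port, and coordinate handling are all assumptions - SteamVR’s Y-up, metre-based coordinates would still need remapping into Blender’s space):

```python
import time
import openvr
from pythonosc.udp_client import SimpleUDPClient

osc = SimpleUDPClient("127.0.0.1", 9000)   # assumed port for a Blender-side listener
vr = openvr.init(openvr.VRApplication_Other)

def find_tracker():
    """Return the device index of the first generic Vive tracker found."""
    for i in range(openvr.k_unMaxTrackedDeviceCount):
        if vr.getTrackedDeviceClass(i) == openvr.TrackedDeviceClass_GenericTracker:
            return i
    return None

tracker = find_tracker()
while tracker is not None:
    poses = vr.getDeviceToAbsoluteTrackingPose(
        openvr.TrackingUniverseStanding, 0, openvr.k_unMaxTrackedDeviceCount)
    pose = poses[tracker]
    if pose.bPoseIsValid:
        m = pose.mDeviceToAbsoluteTracking          # 3x4 row-major pose matrix
        x, y, z = m[0][3], m[1][3], m[2][3]         # translation column
        osc.send_message("/root/position", [x, y, z])   # hypothetical address
    time.sleep(1 / 60)
```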

2. Multiple Camera Angles

With only one pre-rendered background video, I am stuck with a front view of the character. I could render views of the environment from multiple angles and adjust the viewport angle of the character model to match.
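On the Blender side, matching a new background angle would mostly mean locking the captured viewport to a different camera object (while the corresponding background video is swapped in OBS). A sketch, with assumed camera names:

```python
import bpy

# One camera object per pre-rendered background angle (assumed names).
CAMERAS = {"front": "Cam.Front", "side": "Cam.Side", "threequarter": "Cam.34"}

def switch_angle(name):
    """Point the captured viewport at the camera matching a background video."""
    bpy.context.scene.camera = bpy.data.objects[CAMERAS[name]]
    for area in bpy.context.screen.areas:
        if area.type == 'VIEW_3D':
            area.spaces.active.region_3d.view_perspective = 'CAMERA'  # lock view to camera

switch_angle("side")
```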

3. Remote Control

Once I introduce a controllable camera, I need a way for the performer to control it in real time. This could be achieved with a wireless numpad in the hand of the performer, firing off hotkeys in OBS. This would also allow the performer to trigger effects and other interactive elements.
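If binding the numpad directly to OBS hotkeys ever proves limiting, the same control could be scripted against the obs-websocket plugin instead. A sketch using the obs-websocket-py client and the keyboard package (the scene names, port, and password are placeholders, and the request name targets the older v4 protocol):

```python
import keyboard                             # pip install keyboard
from obswebsocket import obsws, requests    # pip install obs-websocket-py

ws = obsws("localhost", 4444, "secret")     # host/port/password are placeholders
ws.connect()

# Map keys to OBS scenes (numpad keys may register as plain digits,
# depending on OS and keyboard layout).
SCENES = {"1": "Front View", "2": "Side View", "3": "Dance Floor"}

def switch(scene_name):
    ws.call(requests.SetCurrentScene(scene_name))   # v4 protocol request

for key, scene in SCENES.items():
    keyboard.add_hotkey(key, switch, args=(scene,))

keyboard.wait()   # block forever, reacting to hotkeys
```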

4. Visual Effects

I could add “filters” and other effects to the video layers in OBS (background and character) to spice things up.

5. Additional Characters

I could use the character armature to control multiple characters within Blender, or make duplicates of the existing character in OBS and treat the duplicates differently than the main character.

The Result

These clips incorporate the improvements mentioned above, as well as others described in detail in the follow-up to this post.

To show the flexibility of my approach: by swapping the background video, changing the camera framing, and switching out the character model, I can achieve an entirely different type of scene: