Creating highly specialized imagery without access to $1,000,000 in scientific equipment
UPDATE: I now have a scanning electron microscope in my garage.
I have been fascinated by electron microscopy since I was a child flipping through Popular Science and stumbling upon an image like this:
I love using things for things they were never intended for. So, what if I used an electron microscope to make landscape images?
My personal scanning electron microscope was still in transit across the Atlantic Ocean, so until it arrived I had to get creative. Without an electron microscope handy, I instead trained a Stable Diffusion LoRA on scanning electron microscope images and used that LoRA to perform style transfer on landscape photographs.
My process was:
I started by getting as close as possible to a typical scanning electron microscope image using the base SDXL model. This established the prompting style I would use during LoRA training and, ultimately, in image generation.
The base model had evidently seen some scanning electron microscope imagery during training, because the following prompting style produced a decent approximation:
scanning electron microscope image of the head of a fly, smooth surface, grooves, sharp edges, striated surface, wide angle, high resolution, 4k, rule of thirds, masterpiece, monochrome
Not great, but a good enough starting point for LoRA training.
The success of LoRA training depends on the quality of the training data. High resolution scanning electron microscope imagery is readily available:
However, captions are non-existent. I needed to create highly detailed captions for each of the sample images. In addition, I planned to use this LoRA in an unconventional way; instead of generating typical scanning electron microscopy images (i.e. close-ups of surfaces or small objects), I would be generating landscape photographs. To best achieve this goal, I needed to “trick” the LoRA during training by convincing it that it was seeing landscape images.
To do this I wrote deceptive captions for the training images, for example:
wide angle landscape photo of a sparse forest with three tall bare tree trunks and numerous smaller bare tree trunks in between, rocky surface beneath, top down view, black and white, monochrome, scanning electron microscope style
This steered the LoRA in the unusual direction I wanted: macroscopic compositions in a microscopic context.
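The captioning step can be sketched in a few lines of Python. This is only an illustration, assuming the trainer reads each caption from a text file that shares the image's base name (Kohya's trainers support this, with a configurable caption extension). The helper names `build_caption` and `write_caption` and the fixed `STYLE_SUFFIX` are hypothetical; the suffix is simply the tail of the example caption above.

```python
from pathlib import Path

# Assumption: every caption ends with the same style tags, copied from the
# example caption in the article.
STYLE_SUFFIX = "black and white, monochrome, scanning electron microscope style"


def build_caption(landscape_description: str) -> str:
    """Describe the micrograph as if it were a landscape, then append the style tags."""
    description = landscape_description.strip().rstrip(",")
    return f"{description}, {STYLE_SUFFIX}"


def write_caption(image_path: Path, landscape_description: str) -> Path:
    """Write the caption next to the image, e.g. sample.png -> sample.txt."""
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(build_caption(landscape_description) + "\n", encoding="utf-8")
    return caption_path
```

Keeping the style tags in one constant means the same trigger phrase appears in every training caption and can be reused verbatim at generation time.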
With the training data prepared, I used Kohya’s GUI to generate the LoRA. After considerable experimentation, the following settings worked best:
With 50 training images, 20 repeats, and a batch size of 1, each epoch ran for 1,000 steps. Over 15 epochs, with a checkpoint saved at the end of each, that came to 15,000 steps in total. This gave a wide cross-section of checkpoints with varying degrees of training. More training does not necessarily produce a better result; overtraining is always a risk.
Once I had all 15 epoch checkpoints, I needed to determine which one worked best. To do this I generated a grid of test images using the prompting style developed earlier:
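The step count follows directly from the dataset size, so it is easy to check before committing to a long training run. The sketch below is plain arithmetic rather than anything taken from Kohya's code; the function name `training_steps` is made up for illustration.

```python
import math


def training_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1):
    """Return (steps_per_epoch, total_steps) for a repeat-based training run."""
    # Each image is seen `repeats` times per epoch, grouped into batches.
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total = training_steps(num_images=50, repeats=20, epochs=15, batch_size=1)
print(steps_per_epoch, total)  # 1000 15000
```

With these numbers, saving once per epoch and saving every 1,000 steps are the same thing.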
wide angle landscape photo of a hill silhouetted against a black background, sharp edges, striated surface, side view, high resolution, 4k, rule of thirds, masterpiece, black and white, monochrome, scanning electron microscope style
Overtraining artifacts were first visible at epoch 13. Epoch 12, the last checkpoint before those artifacts appeared, produced the best results.
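A comparison like this is only fair if everything except the checkpoint stays fixed. The sketch below plans such a grid: one row per epoch, one column per seed, and the same test prompt throughout. It does not generate any images. The checkpoint file names, the `sem_landscape` output name, and the seed values are all hypothetical placeholders.

```python
from dataclasses import dataclass

# The test prompt from the article, reused unchanged for every cell.
TEST_PROMPT = (
    "wide angle landscape photo of a hill silhouetted against a black background, "
    "sharp edges, striated surface, side view, high resolution, 4k, rule of thirds, "
    "masterpiece, black and white, monochrome, scanning electron microscope style"
)


@dataclass(frozen=True)
class GridCell:
    epoch: int
    checkpoint: str
    prompt: str
    seed: int


def plan_grid(num_epochs: int, seeds: list[int], output_name: str = "sem_landscape"):
    """List every (checkpoint, seed) pair; only the checkpoint varies down a column."""
    cells = []
    for epoch in range(1, num_epochs + 1):
        # Hypothetical naming scheme; use whatever names your trainer actually wrote.
        checkpoint = f"{output_name}-{epoch:06d}.safetensors"
        for seed in seeds:
            cells.append(GridCell(epoch, checkpoint, TEST_PROMPT, seed))
    return cells


grid = plan_grid(num_epochs=15, seeds=[1, 2, 3])
```

Reusing the same seeds across rows makes it easier to tell changes caused by further training apart from ordinary sampling variation.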
Instead of generating landscape images using text prompts, I wanted to use famous landscape photographs to guide image generation.
I started with perhaps the most famous landscape photo of all time, Windows XP’s default wallpaper, bliss.jpg:
I could imagine an electron microscope image of a surface with a topography similar to this, but the clouds in the sky would be hard to deal with. I simplified the image in Photoshop:
I fed this simplified image through ControlNet to guide image generation. In this case I used the Canny preprocessor to isolate the edges of the reference image:
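To make the idea of an edge map concrete, here is a deliberately simplified stand-in in pure Python. It is not the Canny algorithm, which also smooths the image, thins edges, and applies two thresholds with hysteresis. It only thresholds the local brightness gradient, which is enough to show what kind of image ControlNet receives as guidance: white lines where the reference changes sharply, black everywhere else. The function name `edge_map` and the threshold value are assumptions made for this example.

```python
def edge_map(gray, threshold=64):
    """Return a binary edge map (0 or 255) from a 2D list of 0-255 grey values."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Skip the outer border so every pixel has neighbours on all four sides.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal brightness change
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical brightness change
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges


# Synthetic "horizon": dark sky in the top three rows, bright hill below.
image = [[0] * 6 for _ in range(3)] + [[200] * 6 for _ in range(3)]
edges = edge_map(image)
```

In this toy image only the rows on either side of the horizon light up, which is why simplifying the reference photo first (removing the clouds) leaves the generator with a clean outline to follow.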
Not bad, but it was missing some characteristics of true electron microscopy imagery: diffuse glow, noise, and that iconic black bar with the measurement details along the bottom. I added all of this in Photoshop:
I applied this workflow to photos of other famous locations.
Half Dome, as photographed by Ansel Adams:
Laila Peak:
This workflow is not limited to photos. Here is Hokusai’s “The Great Wave off Kanagawa” woodcut:
When visiting Napa Valley in California I felt obligated to track down the actual location of bliss.jpg.
Here is what it looked like when I visited in the summer of 2016: