The fuse algorithm was inspired by GNU Emacs's Dissociated Press mode and the movie Society. There are several variant forms, but they share this basic idea: start with a set of scanned source images, break them into many small tiles (typically 25x25 pixels), and then build an output image by assembling the tiles in a way that is locally visually coherent.
All the variants are built from the same fundamental operation: associative image matching. This match function takes a key image tile and searches the source images for a matching tile, minimizing the sum of squared pixel differences. How matching works is described in more detail below.
So far I have three variants. Most of the high-quality images on my web page are made by growing the output image from the center out. The output starts all black, then one seed tile is blitted to the center. The image grows by picking tile positions on its edge, half inside the painted region (the overlap) and half in blackness. The source tile that best matches the overlap is blitted into place, thus extending the image.
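The growth step can be sketched in miniature. This is an illustrative Python/NumPy toy, not the actual fuse code: it grows a grayscale strip rightward by half-tile steps, using a mask so that only the overlap region is scored. All names and sizes here are made up for the example.

```python
import numpy as np

TILE = 8                  # the article uses 25x25; smaller here for speed
HALF = TILE // 2
rng = np.random.default_rng(0)
# Stand-in for a scanned source image (grayscale for brevity).
source = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

def random_tile():
    y = int(rng.integers(0, source.shape[0] - TILE))
    x = int(rng.integers(0, source.shape[1] - TILE))
    return source[y:y+TILE, x:x+TILE]

def best_match(key, mask, candidates=200):
    """Return the random source tile minimizing SSD against key,
    counting only pixels where mask is True (the overlap); the rest
    of the key is still blackness and is ignored."""
    best, best_cost = None, np.inf
    for _ in range(candidates):
        t = random_tile()
        cost = np.sum(((t - key) * mask) ** 2)
        if cost < best_cost:
            best, best_cost = t, cost
    return best

# Grow a strip rightward from a seed: each new tile overlaps the
# already-painted image by half a tile and extends it by the other half.
out = np.zeros((TILE, TILE * 5))
out[:, :TILE] = random_tile()     # seed tile
x = HALF
while x + TILE <= out.shape[1]:
    key = out[:, x:x+TILE]
    mask = np.zeros(key.shape, dtype=bool)
    mask[:, :HALF] = True         # left half is the painted overlap
    out[:, x:x+TILE] = best_match(key, mask)
    x += HALF
```

The real algorithm grows in two dimensions from a central seed, but the overlap-then-blit logic per step is the same.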
The second variant tries to reconstruct a template image from the source tiles. Here the match function tries to minimize both the difference between the tile and its context and the difference between the tile and the template; a blending function combines these two optimization criteria. The faces made out of parts images were made this way.
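The blended criterion can be written as a single cost function. The following is a sketch, not fuse's actual code; the function names and the linear blend are my own assumptions. A candidate tile is scored against both the already-painted context and the corresponding tile of the template, with a blend parameter trading one term off against the other.

```python
import numpy as np

def blended_cost(tile, context, context_mask, template_tile, blend=0.5):
    """Combined criterion for the template variant: match the already-
    painted context AND the template image underneath.  blend=0 ignores
    the template; blend=1 ignores the context."""
    # SSD against the context, counting only already-painted pixels.
    context_term = np.sum(((tile - context) * context_mask) ** 2)
    # SSD against the template tile at this position.
    template_term = np.sum((tile - template_tile) ** 2)
    return (1.0 - blend) * context_term + blend * template_term
```

The candidate with the lowest blended cost is the one blitted into place, just as in the first variant.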
The third variant is used by the bomb program. It doesn't grow from a seed, but uses whatever is in the framebuffer and continuously modifies it. It picks a random tile on the screen, finds the best match for it, and blits the match to the screen. The result has a bit of positive feedback to it, and has a tendency to go all white or all black. Basically, it still needs a lot of work.
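The bomb variant's inner loop is easy to sketch. This toy version (grayscale, NumPy, invented sizes, random noise standing in for the framebuffer) repeatedly picks a random screen tile, finds its best match in the source, and blits the match back; feeding the screen's own contents back through the matcher is where the positive feedback comes from.

```python
import numpy as np

TILE = 25
rng = np.random.default_rng(0)
source = rng.integers(0, 256, size=(100, 100)).astype(np.float64)
# "Whatever is in the framebuffer" -- here just noise to start with.
screen = rng.integers(0, 256, size=(100, 100)).astype(np.float64)

def best_match(key, n=200):
    """Random-candidate SSD search, as in the other variants."""
    best, best_cost = None, np.inf
    for _ in range(n):
        y = int(rng.integers(0, source.shape[0] - TILE))
        x = int(rng.integers(0, source.shape[1] - TILE))
        cand = source[y:y+TILE, x:x+TILE]
        cost = float(np.sum((cand - key) ** 2))
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# Each step: random screen tile -> best source match -> blit it back.
for _ in range(50):
    y = int(rng.integers(0, screen.shape[0] - TILE))
    x = int(rng.integers(0, screen.shape[1] - TILE))
    screen[y:y+TILE, x:x+TILE] = best_match(screen[y:y+TILE, x:x+TILE])
```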
The fuse images are all built by associative image matching. This match function takes a key image tile and searches the source images for a matching tile (minimizing sum of squares of pixel difference).
The match for a key tile is found by picking a collection of random candidate tiles (say 10000) from the source images, comparing each of them to the key, and returning the most similar tile. This allows an easy speed/quality trade-off. Similarity is measured by subtracting the two tiles and summing the squares of the differences, computed in RGB space (perhaps better would be to weight by intensity). This measure is usually called the sum of squared differences (SSD).
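A minimal version of the match function might look like this. It is a sketch assuming RGB source images stored as NumPy arrays; the names, sizes, and candidate count are illustrative, not taken from fuse.

```python
import numpy as np

TILE = 25
rng = np.random.default_rng(0)
# Stand-in for a scanned source image (H x W x 3, RGB).
source = rng.integers(0, 256, size=(200, 200, 3)).astype(np.float64)

def ssd(a, b):
    """Sum of squared differences over all pixels and RGB channels."""
    d = a - b
    return float(np.sum(d * d))

def match(key, n_candidates=1000):
    """Compare n_candidates random source tiles to the key and return
    the most similar one with its cost.  More candidates means a
    better match at the cost of a slower search."""
    h, w = source.shape[:2]
    best, best_cost = None, np.inf
    for _ in range(n_candidates):
        y = int(rng.integers(0, h - TILE))
        x = int(rng.integers(0, w - TILE))
        cand = source[y:y+TILE, x:x+TILE]
        cost = ssd(cand, key)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```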
To improve the memory access pattern, the candidate tiles are compared in scan-line order. Typically the number of candidates is larger than the height of the source images (say, n times larger), so we still get a good distribution just by looping over the scan lines and picking n tiles on each line.
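A candidate generator following that scheme might look like the sketch below (the loop structure in the real code may differ):

```python
import numpy as np

TILE = 25
rng = np.random.default_rng(0)
source = rng.integers(0, 256, size=(100, 300)).astype(np.float64)

def candidates_in_scanline_order(n_total):
    """Yield candidate tile origins line by line.  When n_total exceeds
    the number of scan lines, picking n_total // n_lines tiles per line
    keeps a good spatial spread while reading the image near-sequentially,
    which is friendlier to the cache than fully random access."""
    n_lines = source.shape[0] - TILE          # valid top rows for a tile
    per_line = max(1, n_total // n_lines)
    for y in range(n_lines):
        for x in rng.integers(0, source.shape[1] - TILE, size=per_line):
            yield y, int(x)
```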
To run much faster, the interactive version uses a multi-level algorithm: all the source images are prefiltered down by a factor of 5, and the search takes place with 5x5 tiles instead of 25x25 ones. The top few matches are saved and compared again at full resolution to find the best. With 10000 candidates, it currently saves 10000/25 = 400 tiles for comparison at full resolution.
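A two-level search along these lines might look like the following sketch. The box downfilter and the bookkeeping are my own simplifications; the point is scoring cheaply on 5x5 prefiltered tiles, keeping the top n/25, and re-ranking those survivors at full 25x25 resolution.

```python
import numpy as np

TILE, FACTOR = 25, 5
SMALL = TILE // FACTOR            # 5x5 tiles at the coarse level
rng = np.random.default_rng(0)
# Stand-in for a scanned source image (grayscale for brevity).
source = rng.integers(0, 256, size=(150, 150)).astype(np.float64)

def downfilter(img, f=FACTOR):
    """Filter the image down by f (average of each f x f block)."""
    h, w = img.shape[0] // f * f, img.shape[1] // f * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

small = downfilter(source)        # prefiltered copy, 30x30 here

def ssd(a, b):
    return float(np.sum((a - b) ** 2))

def match(key, n_candidates=2500, keep=None):
    """Score candidates on the coarse level, keep the top
    n_candidates // FACTOR**2 (e.g. 10000/25 = 400), then re-rank
    the survivors at full resolution."""
    if keep is None:
        keep = max(1, n_candidates // FACTOR**2)
    small_key = downfilter(key)
    scored = []
    for _ in range(n_candidates):
        # Coarse tile origin; times FACTOR gives the full-res origin.
        y = int(rng.integers(0, small.shape[0] - SMALL))
        x = int(rng.integers(0, small.shape[1] - SMALL))
        scored.append((ssd(small[y:y+SMALL, x:x+SMALL], small_key), y, x))
    scored.sort()
    best, best_cost = None, np.inf
    for _, y, x in scored[:keep]:
        cand = source[y*FACTOR:y*FACTOR+TILE, x*FACTOR:x*FACTOR+TILE]
        cost = ssd(cand, key)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best
```

Because the coarse comparison touches 25 times fewer pixels, most of the work happens at the cheap level and only the shortlist pays full price.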