It might look like a very bad GAN, but I am quite astonished that it worked at all:

These are two GANs that try to invent their own "language". Both of them learn only from each other: the center image is part of how model A encodes the input, and on the right is model B's output.
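The data flow described above can be sketched as a toy. To be clear, everything here is my own illustration: the names `model_A`/`model_B`, the sizes, and the linear maps are stand-ins (the real models are convolutional), meant only to show how the two models talk to each other through an intermediate image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two models: model A turns an input image into an
# intermediate "message" image, and model B has to reconstruct the input
# from that message alone.  Linear maps instead of real networks, just to
# show the data flow.
W_A = rng.standard_normal((16, 64)) * 0.1   # model A: 64-dim input -> 16-dim message
W_B = rng.standard_normal((64, 16)) * 0.1   # model B: 16-dim message -> 64-dim output

def model_A(x):
    return W_A @ x

def model_B(message):
    return W_B @ message

x = rng.standard_normal(64)
message = model_A(x)          # the center image in the screenshots
output = model_B(message)     # the right-hand image

# Each model only ever gets a training signal through the other one:
# B is scored on reconstructing x from the message, and A on producing
# messages that B can actually use.
recon_loss = float(np.mean((output - x) ** 2))
```

The key point is the bottleneck: neither model sees the ground truth of the "language", only what the other model makes of it.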

But admittedly, it is not really what I was hoping the intermediate image would look like. The plan was that model A learns some kind of semantic map in an unsupervised fashion, but it seems it has learned to encode the data in a more "compressed" form instead.
So I think this is a bit like that Facebook experiment where they "had to turn it off". Well, I will let it train for a bit longer, maybe they will learn some meaningful labels after all.
Both models are shallow ResNets derived from #Pix2PixHD. I first tried #pix2pix UNets, but with those the models learned to cheat very quickly and just abused the first skip connection to pass the information through almost uncompressed.
Oops, I just noticed a logic error in my code. So the intermediate representation in this case was more like "some-lukewarm" than "one-hot". 😳
Let's try this again, this time making sure that the representation is binary and adding some blur to prevent the models from passing information in the pixels. It will take some time, but this already looks more like what I had in mind:
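A minimal sketch of those two countermeasures, under my own assumptions about the details: a hard per-pixel one-hot over the channel axis (in real training you would pair this with a straight-through estimator so gradients still flow; this is only the forward pass), plus a simple box blur on the message so the encoder cannot hide extra bits in fine pixel patterns. The function names and the 32×8×8 message size are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def binarize(code):
    # Hard one-hot over the channel axis: the max channel per pixel -> 1,
    # everything else -> 0.  Forward pass only; gradients would need a
    # straight-through trick in actual training.
    return (code == code.max(axis=0, keepdims=True)).astype(code.dtype)

def box_blur(img, k=3):
    # Blur the message before handing it to the decoder, so the encoder
    # cannot smuggle information in sub-pixel patterns.
    pad = k // 2
    padded = np.pad(img, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(img)
    H, W = img.shape[1], img.shape[2]
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + H, dx:dx + W]
    return out / (k * k)

code = rng.standard_normal((32, 8, 8))   # 32 channels, 8x8 "message" image
msg = box_blur(binarize(code))           # what the decoder actually sees
```

The blur is the important half: without it, a binary map can still carry high-frequency side channels along the class boundaries.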

The encoder model did not make use of the 32 dimensions of the one-hot vector and put pretty much the same information in all of them. Now I am trying to discourage that with what I call "VarietyLoss" (it probably already has a proper name).
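One possible reading of that "VarietyLoss" idea, sketched here as a guess rather than the actual implementation: average how often each of the 32 channels fires over a batch, and penalize the squared distance of that usage from the uniform distribution. If the encoder keeps picking the same channels, the loss grows; if it spreads the codes out, it drops to zero.

```python
import numpy as np

def variety_loss(codes):
    # codes: (batch, 32) soft one-hot vectors from the encoder.
    # usage[i] is how much of the batch's probability mass lands on
    # channel i; we push that toward the uniform distribution so the
    # encoder is discouraged from collapsing onto a few channels.
    usage = codes.mean(axis=0)
    uniform = np.full_like(usage, 1.0 / usage.size)
    return float(np.sum((usage - uniform) ** 2))

collapsed = np.zeros((32, 32))
collapsed[:, 0] = 1.0            # every sample picks channel 0 -> high loss
spread = np.eye(32)              # each sample picks a different channel -> zero loss
```

Similar batch-level "use all the slots" penalties show up elsewhere, e.g. as load-balancing or entropy-regularization terms, which is probably the different name it already has.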

Seems to work somewhat.

@Quasimondo how are you initiating the training? with random data or grey?