Artificial intelligence (AI), right now, is pretty darn good at two things: making logical decisions and recognizing patterns. Through machine learning, in which software repeatedly teaches itself from a set of training inputs, we now have systems that can infer personality traits just by observing someone’s eye movements.
It also brings us delights such as novel fiction, including Harry Potter and the Portrait of What Looked Like a Large Pile of Ash. Now, thanks to another research group, we have a system wherein text you type in transmogrifies into a series of images. The objective is noble; the results are, well, a freaky work-in-progress with surprisingly deep implications.
You can try it out for yourself first. Click here, and you’ll be taken to a prototypical example of the AI. Type in a basic sentence describing an object, like “a bunny with a purple ear and a missing eye,” and as you do, you’ll see the AI attempt to come up with a version of that image. This particular attempt brings up a cauliflower-like collection of fluff that looks like a rabbit inside a wormhole.
In fact, while all entries turn out quite haphazard, animals generate particularly strange outcomes. “A cat that has an eye patch and a blue mouth” just throws out a cat paw covered in fragments of cat mouth.
This AI has been uploaded to the web courtesy of Cristobal Valenzuela, a researcher at New York University who builds free-to-use machine learning tools for anyone who may put their hand to them. The AI software itself, though, was based on the work of a team led by researchers at Lehigh University, who hoped to build an algorithm that could improve the ability of machine learning programs to recognize and understand images.
This Attentional Generative Adversarial Network, or AttnGAN, has some impressive results displayed in its arXiv paper. It demonstrates how an evolving sentence, like “this bird is red with white and has a very short beak,” can produce a series of sequential images that match each part of the sentence as it’s typed out.
GANs are all the rage in the world of AI. Rather than have a single network learn to identify or manufacture images over time, GANs use two networks: one generates a fake image, and the other tries to tell whether the images it sees are real or fake. This sets up a fluid “conversation” between the two that accelerates learning and makes it more precise.
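To make that two-network tug-of-war concrete, here is a minimal sketch in NumPy, not the AttnGAN itself: the “real” data are just numbers drawn around 4, the generator is a tiny affine map, and the discriminator is a logistic classifier. All names, sizes, and learning rates here are illustrative choices, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" data: scalars drawn from a normal distribution centered at 4.
def real_batch(n):
    return rng.normal(4.0, 0.5, size=n)

# Generator: maps noise z to a sample via g(z) = a*z + b.
gen = {"a": 0.1, "b": 0.0}
# Discriminator: d(x) = sigmoid(w*x + c), its output is the estimated P(real).
disc = {"w": 0.0, "c": 0.0}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate(n):
    z = rng.normal(size=n)
    return gen["a"] * z + gen["b"], z

lr = 0.05
for step in range(2000):
    x_real = real_batch(64)
    x_fake, z = generate(64)

    # Discriminator update: push d(real) toward 1 and d(fake) toward 0.
    d_real = sigmoid(disc["w"] * x_real + disc["c"])
    d_fake = sigmoid(disc["w"] * x_fake + disc["c"])
    grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c

    # Generator update: push d(fake) toward 1, i.e. fool the discriminator.
    x_fake, z = generate(64)
    d_fake = sigmoid(disc["w"] * x_fake + disc["c"])
    upstream = (d_fake - 1.0) * disc["w"]  # gradient through the discriminator
    gen["a"] -= lr * np.mean(upstream * z)
    gen["b"] -= lr * np.mean(upstream)

samples, _ = generate(1000)
print("generated sample mean:", round(float(samples.mean()), 2))
```

The point is the loop structure: each side’s loss is defined in terms of the other’s current behavior, which is the “conversation” described above. Real image GANs replace these one-parameter maps with deep convolutional networks.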
As the team itself notes, “automatically generating images according to natural language descriptions is a fundamental problem.” The authors explain that a common approach for GANs is to encode the entire sentence into a single description of the image being asked for, which has so far produced mixed results.
They decided to take a different approach, analyzing each sliver of the sentence on its own so that individual words can shape individual parts of the image. This is enormously complex: essentially replicating what the human mind does as it reads over a description of something requires an elaborate suite of algorithms.
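The word-level idea can be sketched with a few lines of NumPy. This is only an illustration of attention over words, not the paper’s implementation: the word count, region count, and feature size below are made-up, and real models would learn these vectors rather than draw them at random. Each image region scores every word and blends the word features it cares most about.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: a 5-word caption and 4 image sub-regions,
# each represented by an 8-dimensional feature vector.
words = rng.normal(size=(5, 8))    # one embedding per word
regions = rng.normal(size=(4, 8))  # one feature vector per image region

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Each region scores every word, then forms a word-weighted context vector;
# intuitively, the region drawing the beak attends mostly to "beak".
scores = regions @ words.T         # (4 regions, 5 words)
weights = softmax(scores, axis=1)  # attention weights per region sum to 1
context = weights @ words          # (4, 8): per-region blend of word features

print(weights.shape, context.shape)
```

The per-region context vectors are what let different words drive different parts of the generated picture, instead of one sentence vector driving everything at once.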
Yes, their product isn’t exactly perfect, but it’s an impressive step up. Thanks to their dense forest of mathematics, the software appears to seriously outperform pre-existing similar image-generation AIs, particularly when it comes to fine-grained details.
The general idea here is that recognizing images is one thing. Being able to generate your own? That demonstrates a rudimentary understanding of what makes an image in the first place.
Clearly, there's still a long path to perfection.