Wut?
Jokes aside, the way this thing learns is really only by being trained on a dataset. So basically, to train the AI you take hundreds or thousands of labeled images and compile them into a dataset. Then, for each label, it looks at the features that are common across all the images in the dataset that share that label.
When shown a new image that isn't in the dataset, it reports a score for how well it matches: 0 being no match, 0.5 being a partial match, and 1 being an exact match.
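That train-then-score idea can be sketched in a few lines. This is a toy version, not how a real network works under the hood: the feature vectors and labels here are made up, and the "training" is just averaging each label's examples into a prototype, then scoring a new image by similarity to each prototype.

```python
# Toy sketch: boil each label's examples down to an average "prototype"
# vector, then score a new image by cosine similarity to each prototype.
# Feature vectors are hypothetical stand-ins for what a real model extracts.

def average(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def match_score(a, b):
    # 1.0 = exact match, 0.0 = no match (cosine similarity, clamped at 0)
    dot = sum(x * y for x, y in zip(a, b))
    mag = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return max(0.0, dot / mag) if mag else 0.0

# "Training": many labeled examples reduced to one prototype per label
dataset = {
    "hamburger": [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.85, 0.15, 0.85]],
    "face":      [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.15, 0.85, 0.15]],
}
prototypes = {label: average(vs) for label, vs in dataset.items()}

# A new image not in the dataset: report a 0-to-1 score per label
new_image = [0.7, 0.3, 0.75]
for label, proto in prototypes.items():
    print(label, round(match_score(new_image, proto), 2))
```

The point is just that the score is a similarity, not a yes/no answer, which is why partial matches land somewhere in between.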
It gets more complicated when there are multiple subjects within the image; then it'll try to match sub-features within the composite image.
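One rough way to picture that sub-feature matching: instead of scoring the whole image at once, scan smaller windows and score each crop on its own. Everything below is a made-up stand-in (a 4x4 "image" of brightness values and a fake detector), just to show why a small subject can still be found even when the whole-image score is low.

```python
# Toy sketch: score every small window of the image, not just the whole thing.

def crops(grid, size):
    # yield every size-by-size window of a 2D grid of "pixel" values
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield [row[c:c + size] for row in grid[r:r + size]]

def subject_score(window):
    # hypothetical detector: fraction of bright pixels in the window
    flat = [p for row in window for p in row]
    return sum(1 for p in flat if p > 0.5) / len(flat)

image = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]

# the best window beats the whole-image score, because the subject
# only fills a small part of the frame
whole = subject_score(image)
best = max(subject_score(w) for w in crops(image, 2))
print(whole, best)
```

Real detectors are way fancier than a sliding window, but the takeaway is the same: a subject that's small in the frame only scores well if you look at the right crop.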
So for example, the hamburger ball gag image went viral, so it must have been trained on it specifically. But images where the gag is obscured, or not large in the frame, don't score well and don't get identified.
This is sort of the problem with AI and machine learning. Or perhaps it explains why we see faces in tree bark, or get scared by laundry piled on a chair in a darkened room.