Just take a picture, upload it to Google Lens, and the tool will find you the perfect match. For example, if you see a nice dog bed while visiting your neighbor, the same product will immediately appear in the SERP based on the picture.
But Google went even further and introduced video-based search in 2024. How does it work?
You record a short video.
You ask your question aloud while recording.
You upload the video to Google Lens and receive an answer.
Let's say your coffee maker isn't working. You record a short video of it, asking whether it can be repaired. Google's advanced features analyze the video and give you an answer right away.
So-called vision-language models (VLMs) are becoming increasingly widespread, and alongside them further modes of multimodal search are being added (see the code sketch after this list):
text-to-image search,
image-to-text search,
search among similar images.
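To make these modes concrete, here is a minimal sketch of how a shared text-image embedding space supports all three. It assumes the open-source sentence-transformers library and its clip-ViT-B-32 checkpoint as stand-ins; Google's production models are not public, and the file names below are hypothetical.

```python
# Cross-modal retrieval in a shared embedding space (illustrative only).
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Index a tiny catalog: images and text land in the same vector space.
image_index = model.encode(
    [Image.open("dog_bed_1.jpg"), Image.open("dog_bed_2.jpg")]
)

# Text-to-image search: a text query scored against image vectors.
text_query = model.encode("plush round dog bed")
print(util.cos_sim(text_query, image_index))

# Image-to-text search: a photo scored against text descriptions.
descriptions = model.encode(["orthopedic dog bed", "floral summer dress"])
photo_query = model.encode(Image.open("neighbors_dog_bed.jpg"))
print(util.cos_sim(photo_query, descriptions))

# Search among similar images: the same photo against the image index.
print(util.cos_sim(photo_query, image_index))
```

Because every modality maps into one vector space, all three modes reduce to the same nearest-neighbor lookup; only the query type changes.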
Imagine you sell women's clothing. A user takes a photo of a floral pattern and adds a text query to the image: "floral dress in this style with blue cornflowers". If you have well-crafted, clear visuals with optimized product descriptions, your products will be displayed to the customer during this type of multimodal visual search.
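One naive way to approximate such a combined image-plus-text query, again assuming the hypothetical setup above, is to average the two embeddings and score the catalog against the result; real composed-retrieval systems use dedicated fusion models, so treat this only as an illustration of the idea.

```python
# Simplified image + text query fusion (illustrative, not production).
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

pattern = model.encode(Image.open("floral_pattern.jpg"))   # user's photo
refinement = model.encode("dress with blue cornflowers")   # added text
combined = (pattern + refinement) / 2                      # naive fusion

catalog = model.encode(
    [Image.open("dress_a.jpg"), Image.open("dress_b.jpg")]
)
print(util.cos_sim(combined, catalog))  # best score = "this pattern, as a dress"
```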
Visual search is nothing new these days.