With the renewed interest in AI, vector search is becoming popular again. Both LLMs and vector search rely on text embeddings: numerical vectors that encode the meaning of a piece of text.
Instead of keyword matching, vector search finds relevant content by vector similarity. Documents are represented as points in an N-dimensional vector space: texts with similar meanings end up close together, so a query can match documents even when they share no exact words. This allows searching for concepts and phrases instead of exact keywords. Models like GPT-3 use large embeddings (12,288 dimensions)!
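To make "vector similarity" concrete, here's a minimal sketch of cosine similarity using NumPy. The three tiny 3-dimensional vectors are made up purely for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for illustration only.
king  = np.array([0.9, 0.1, 0.4])
queen = np.array([0.85, 0.15, 0.45])
apple = np.array([0.1, 0.9, 0.2])

print(cosine_similarity(king, queen))  # high: nearby points, related concepts
print(cosine_similarity(king, apple))  # lower: distant points, unrelated concepts
```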
You can use vector search for:
- text similarity (clustering)
- text search (see the sketch after this list)
- code search
- image search
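For the text-search case, a rough sketch might look like the following. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model (384 dimensions), which are just one possible choice of embedding model; the documents and query are made up.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: sentence-transformers is installed and the model can be downloaded.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to reset a forgotten password",
    "Best practices for database indexing",
    "Troubleshooting login failures",
]

doc_vecs = model.encode(docs, normalize_embeddings=True)               # shape: (3, 384)
query_vec = model.encode("I can't sign in", normalize_embeddings=True)

# With normalized vectors, the dot product is the cosine similarity.
scores = doc_vecs @ query_vec
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

Note that the query shares no keywords with the documents; ranking by embedding similarity is what lets it surface the relevant one.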
Most search engines (like Google) incorporate vector search somewhere in the pipeline, but you usually still need traditional keyword-based methods alongside it. Still, it's been interesting to watch vector search advance as embedding models have grown larger and higher-dimensional.
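As a rough sketch of how the two can sit in one pipeline, here's a hedged example of a hybrid ranker that blends a lexical score with a vector score. The keyword_score function and the 0.5 weight are made-up placeholders for illustration, not anything a particular engine uses.

```python
import numpy as np

def keyword_score(query: str, doc: str) -> float:
    # Placeholder lexical score: fraction of query terms that appear in the document.
    # Real systems use something like BM25 instead.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def hybrid_score(query: str, doc: str,
                 query_vec: np.ndarray, doc_vec: np.ndarray,
                 alpha: float = 0.5) -> float:
    # Blend lexical and vector similarity; alpha is a tuning knob, 0.5 is arbitrary.
    cosine = float(np.dot(query_vec, doc_vec) /
                   (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)))
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine
```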
There are a lot of new (and old) startups in the space: