The previous article I shared mentioned that unstructured data is usually stored in a vector database. I did a quick Google search for an open-source database to see what a vector database looks like, and I found Pinecone.
In a vector database, data is stored in an index, unlike a conventional database, where data is stored in tables.
Following the example given, I managed to create one index and load some sentences into it.
As we can see, the search results list the sentences I loaded into the index, and each sentence has a score. According to ChatGPT, the sentences (data) are converted into vectors (embeddings).
- When you search, your query is also turned into a vector
- The system compares your query vector with stored vectors
👉 The score is the result of that comparison.
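To make this flow concrete, here is a minimal sketch in plain Python. It is not how Pinecone works internally. The `embed` function is a hypothetical toy that just counts words from a tiny fixed vocabulary, whereas a real system would use a trained embedding model. The names `VOCAB`, `embed`, `cosine` and `search` are all made up for illustration.

```python
import math

# Hypothetical toy vocabulary; a real embedding model learns its own dimensions.
VOCAB = ["cat", "dog", "pet", "car", "engine"]

def embed(text):
    """Toy embedding: count how often each vocabulary word appears."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(index, query, top_k=3):
    """Embed the query, score it against every stored vector, sort by score."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

sentences = [
    "a cat is a small pet",
    "a dog is a loyal pet",
    "the car engine is loud",
]
# The "index" here is just a list of (sentence, vector) pairs.
index = [(s, embed(s)) for s in sentences]

results = search(index, "which pet is a cat")
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The sentence about the cat comes back first with the highest score, the one about the dog shares only the word "pet" and scores lower, and the one about the car shares nothing with the query and scores zero.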
The score depends on the similarity metric used:
Examples:
1. Cosine Similarity (most common)
2. Dot Product
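The difference between these two metrics can be shown with a short sketch. The vectors below are made-up numbers, not real embeddings, and the helper names are my own. Cosine similarity only looks at the angle between two vectors, while dot product also grows with their length.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by both vector lengths (assumes non-zero vectors)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

# Illustrative 2-dimensional vectors; real embeddings have many more dimensions.
query = [1.0, 2.0]
candidates = {
    "same direction": [1.0, 2.0],
    "same direction, 3x longer": [3.0, 6.0],
    "different direction": [2.0, 1.0],
}

for name, vec in candidates.items():
    d = dot_product(query, vec)
    c = cosine_similarity(query, vec)
    print(f"{name}: dot={d:.1f} cosine={c:.2f}")
```

The two vectors pointing the same way as the query get the same cosine score, but the longer one gets a dot product three times larger. So the same data can rank differently depending on which metric the index uses.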
A vector database score is simply a numerical measure of how close your query is to the stored data in vector space.