Entity/Vector Embeddings
Overview
The Entity Embeddings feature of the MemberJunction framework automatically generates vector embeddings for every entity in the system. It integrates an existing embeddings model, OpenAI's text-embedding-ada-002 (commonly called ada-002), to generate and store a vector embedding for each record in each entity.
Vector Embeddings
Vector embeddings are numerical representations of text that capture both its semantic and syntactic features. Because texts with similar meaning map to nearby vectors, embeddings support a wide range of applications, including mathematical modelling, comparison, similarity search, duplicate detection, and clustering.
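Similarity between two embeddings is most often measured with cosine similarity, the cosine of the angle between the two vectors. The sketch below computes it in plain Python. The three-dimensional vectors are made-up toy values for illustration only (real ada-002 embeddings have 1,536 dimensions), and nothing here is taken from MemberJunction's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
invoice = [0.9, 0.1, 0.0]
bill = [0.8, 0.2, 0.1]
holiday = [0.0, 0.2, 0.9]

print(round(cosine_similarity(invoice, bill), 3))
print(round(cosine_similarity(invoice, holiday), 3))
```

The invoice/bill pair scores close to 1 while the invoice/holiday pair scores close to 0; that gap is the property similarity search, duplicate detection, and clustering rely on.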
The key concepts involved in vector embeddings are described below:
Vector Database
The vector database stores and queries vector embeddings. MemberJunction uses Pinecone, a managed vector database designed for fast similarity queries at scale. The database is integrated with the MemberJunction framework and lets users perform operations on the embeddings such as similarity search, duplicate detection, and clustering.
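To illustrate the core operations a vector database provides (storing vectors by record ID, returning the nearest matches for a query vector, and flagging near-duplicates), here is a hypothetical brute-force, in-memory index. It is not Pinecone's client API; the class name, method names, record IDs, and the 0.99 duplicate threshold are assumptions chosen for this example. Production vector databases typically use approximate nearest-neighbour indexes rather than scanning every stored vector.

```python
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    # Assumes equal-length vectors; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database: stores vectors by record ID
    and answers nearest-neighbour queries by brute-force cosine similarity."""
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def upsert(self, record_id: str, vector: list[float]) -> None:
        # Insert a new vector, or overwrite the existing one for this ID.
        self.vectors[record_id] = vector

    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector against the query and keep the best top_k.
        scored = [(rid, _cosine(vector, v)) for rid, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def find_duplicates(self, threshold: float = 0.95) -> list[tuple[str, str, float]]:
        # Compare every pair of records and flag those at or above the threshold.
        ids = sorted(self.vectors)
        pairs = []
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                score = _cosine(self.vectors[a], self.vectors[b])
                if score >= threshold:
                    pairs.append((a, b, score))
        return pairs


index = InMemoryVectorIndex()
index.upsert("contact-1", [0.90, 0.10, 0.00])
index.upsert("contact-2", [0.89, 0.11, 0.01])  # near-duplicate of contact-1
index.upsert("contact-3", [0.00, 0.20, 0.90])

print(index.query([1.0, 0.0, 0.0], top_k=2))
print(index.find_duplicates(threshold=0.99))
```

With these toy vectors, the query returns contact-1 and contact-2 as the closest matches, and find_duplicates flags that same pair. A brute-force scan costs O(n) per query and O(n²) for the duplicate check, which is why a dedicated vector database becomes worthwhile as the number of records grows.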
Entity Documents
Entity Documents in MemberJunction are templates that define how to generate a text document for each record in a given entity. A template specifies which fields and values are included in the document, as well as the document's format and style. Templates are written in a simple domain-specific language.
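The idea of an Entity Document template can be sketched with Python's standard string.Template. The ${Field} placeholder syntax, the hypothetical Contacts fields and sample record, and the render_entity_document helper are assumptions for illustration only; they do not reflect MemberJunction's actual template language.

```python
from collections import defaultdict
from string import Template

# Hypothetical template for a "Contacts" entity. ${...} placeholders name the
# record fields to include. This is Python's string.Template syntax, not
# MemberJunction's own template language.
CONTACT_TEMPLATE = Template(
    "Contact: ${FirstName} ${LastName}\n"
    "Company: ${Company}\n"
    "Notes: ${Notes}"
)


def render_entity_document(template: Template, record: dict[str, object]) -> str:
    """Render one record into the text document that will be embedded.

    None values and fields absent from the record both render as "".
    """
    # defaultdict(str) returns "" for any placeholder missing from the record.
    values = defaultdict(str, {k: "" if v is None else str(v) for k, v in record.items()})
    return template.substitute(values)


# Hypothetical sample record.
record = {"FirstName": "Ada", "LastName": "Lovelace", "Company": "Analytical Engines Ltd", "Notes": None}
print(render_entity_document(CONTACT_TEMPLATE, record))
```

Treating missing or null fields as empty strings keeps every record renderable, so each record in the entity still yields a document to embed.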
Embeddings Model
The Embeddings Model is a pre-trained neural network that converts text documents into vector embeddings. It is selected from a list of available models, such as OpenAI's text-embedding-ada-002, that produce high-quality embeddings across a variety of domains and tasks. The model can be hosted on a cloud service or on a local server, depending on the user's preferences and budget.
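Because the model may run in the cloud or on a local server, it helps for the rest of the pipeline to depend only on a small interface. The sketch below assumes a hypothetical EmbeddingModel protocol with a single embed method, plus a toy HashingEmbedder that runs locally without any pre-trained weights. The toy model captures word overlap only, not meaning, so it stands in for a real model rather than replacing one. A cloud-hosted model such as text-embedding-ada-002 would be wrapped in a class exposing the same embed method that calls the provider's API. None of these names come from MemberJunction.

```python
import hashlib
import math
from typing import Protocol


class EmbeddingModel(Protocol):
    """Hypothetical interface: anything that turns text into a fixed-length vector."""
    dimensions: int

    def embed(self, text: str) -> list[float]: ...


class HashingEmbedder:
    """Toy local model: hashes each lowercase word into one of `dimensions`
    buckets and L2-normalises the counts. It reflects word overlap, not meaning."""

    def __init__(self, dimensions: int = 64) -> None:
        self.dimensions = dimensions

    def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for word in text.lower().split():
            # Stable hash: Python's built-in hash() is randomised per process.
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % self.dimensions
            vector[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else vector


def embed_documents(model: EmbeddingModel, documents: dict[str, str]) -> dict[str, list[float]]:
    """Embed each record's document with any model that satisfies EmbeddingModel."""
    return {record_id: model.embed(text) for record_id, text in documents.items()}


# Hypothetical rendered documents keyed by record ID.
docs = {
    "contact-1": "Ada Lovelace analytical engine notes",
    "contact-2": "Charles Babbage analytical engine designs",
}
vectors = embed_documents(HashingEmbedder(dimensions=64), docs)
print({rid: len(vec) for rid, vec in vectors.items()})
```

Swapping providers then means passing a different object to embed_documents; the code that renders documents and writes vectors to the database does not change.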