Create Entity Template


To manage multidimensional (vector) data and put embedding models into operation, the following elements are required for Entity Documents:

1. Entity Document Types

Entity documents can be created for tables as well as for individual entities. Each Entity Document Type must contain:

ID

Name

Description

CreatedAt, UpdatedAt

Entity Documents

Table: EntityDocument

Entity: Entity Documents

ID

Name (unique key)

EntityID

TypeID (Entity Document Type ID)

Status – Pending, Active, Disabled

Template (nvarchar(max)) – contains the actual template text

CreatedAt

UpdatedAt

The template uses the following format.

It is a document with markup for fields in the entity, like this: ${fieldName}. It can also contain markup that pulls in other entity documents, like this: ${EntityDocument("Document Name", LinkedField, FieldValue)}

For example:

${EntityDocument("Account_Persons", "AccountID", record.ID)}
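The placeholder syntax above can be illustrated with a small rendering helper. This is only a sketch under assumptions: the names renderTemplate, EntityRecord and NestedResolver are hypothetical, string arguments are assumed to use straight double quotes, and the third argument of EntityDocument(...) is assumed to always have the form record.FieldName. How a nested entity document is actually produced is left to a caller-supplied function.

```typescript
// Hypothetical helper names; none of these are part of a published MemberJunction API.
type EntityRecord = Record<string, unknown>;

// Produces the text for a nested ${EntityDocument("name", "linkedField", record.Field)} call.
type NestedResolver = (documentName: string, linkedField: string, value: unknown) => string;

// One pattern for both placeholder kinds, so text produced by a nested
// document is never scanned again for placeholders.
const PLACEHOLDER =
  /\$\{(?:EntityDocument\(\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*record\.(\w+)\s*\)|(\w+))\}/g;

function renderTemplate(
  template: string,
  record: EntityRecord,
  resolveNested: NestedResolver
): string {
  return template.replace(
    PLACEHOLDER,
    (_match: string, docName?: string, linkedField?: string, valueField?: string, field?: string) => {
      if (docName !== undefined && linkedField !== undefined && valueField !== undefined) {
        // Nested entity document: delegate to the caller-supplied resolver.
        return resolveNested(docName, linkedField, record[valueField]);
      }
      // Plain ${fieldName}: missing or null fields render as an empty string (an assumption).
      return String(record[field as string] ?? "");
    }
  );
}
```

A caller would pass the entity record plus a resolver that looks up the linked records and renders the named document for them; the sketch deliberately does not decide how that lookup works.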

We plan to extend AI Models with a new category for embeddings, making one or more embedding models available for selection. We also need to introduce a new interface, IEmbeddings, which AI Models can optionally implement.
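One possible shape for such an interface is sketched below. The document only names IEmbeddings; the method names (embedText, getEmbeddingModels), the EmbedTextResult type and the ToyEmbeddings class are assumptions made for illustration, and the toy class is not a real embedding model.

```typescript
// Hypothetical shape for IEmbeddings; method and field names are assumptions.
interface EmbedTextResult {
  model: string;    // which embedding model produced the vector
  vector: number[]; // the embedding itself
}

interface IEmbeddings {
  getEmbeddingModels(): string[];
  embedText(text: string, model?: string): Promise<EmbedTextResult>;
}

// Toy implementation used only to exercise the interface: it sums character
// codes into a fixed-size vector and is NOT a real embedding model.
class ToyEmbeddings implements IEmbeddings {
  private readonly dimensions = 4;

  getEmbeddingModels(): string[] {
    return ["toy-4d"];
  }

  async embedText(text: string, model: string = "toy-4d"): Promise<EmbedTextResult> {
    if (!this.getEmbeddingModels().includes(model)) {
      throw new Error("Unknown embedding model: " + model);
    }
    const vector = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[i % this.dimensions] += text.charCodeAt(i);
    }
    return { model, vector };
  }
}

// Because implementing IEmbeddings is optional, callers can check for it at runtime.
function supportsEmbeddings(model: object): model is IEmbeddings {
  return typeof (model as Partial<IEmbeddings>).embedText === "function";
}
```

The runtime check reflects the "optionally implement" wording: code that needs embeddings can skip AI Models that do not provide them.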

Vector Databases

Table: EntityDocumentType

Entity: Entity Document Types

Table: VectorDatabase

Entity: Vector Databases

ID

Name

Description

DefaultURL – base URL for the API

ClassKey – key to use for CreateInstance in Class Factory

CreatedAt

UpdatedAt

Vector Indexes

ID

Name

Description

VectorDatabaseID

EmbeddingModelID – a link to AI Models.ID for the embedding model used for this particular vector index

CreatedAt

UpdatedAt

Entity Record Documents

ID

EntityID

RecordID

Document (nvarchar(max)) – the actual output of rendering the Entity Document template, stored here

VectorIndexID

VectorID (nvarchar) – key value for the vector in the vector database

VectorJSON – JSON representation of the actual vector (an array of numbers)

RecordUpdatedAt – timestamp that is equal to the UpdatedAt of the Entity Record when the vector was created

CreatedAt

UpdatedAt
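As a rough illustration, an Entity Record Documents row and its VectorJSON field could be handled as follows. The column types (numeric IDs, string RecordID) are assumptions, and serializeVector and parseVector are hypothetical helpers, not existing functions.

```typescript
// Field names follow the list above; the TypeScript types are assumptions.
interface EntityRecordDocument {
  ID: number;
  EntityID: number;
  RecordID: string;
  Document: string;      // rendered output of the Entity Document template
  VectorIndexID: number;
  VectorID: string;      // key of the vector in the vector database
  VectorJSON: string;    // JSON array of numbers
  RecordUpdatedAt: Date; // UpdatedAt of the source record when the vector was created
  CreatedAt: Date;
  UpdatedAt: Date;
}

// Serialize a vector for storage in VectorJSON, rejecting NaN and Infinity,
// which JSON.stringify would otherwise silently turn into null.
function serializeVector(vector: number[]): string {
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new Error("Vector contains a non-finite value");
  }
  return JSON.stringify(vector);
}

// Parse VectorJSON back into a vector, validating that it is an array of finite numbers.
function parseVector(vectorJson: string): number[] {
  const parsed: unknown = JSON.parse(vectorJson);
  if (
    !Array.isArray(parsed) ||
    !parsed.every((x) => typeof x === "number" && Number.isFinite(x))
  ) {
    throw new Error("VectorJSON must be a JSON array of finite numbers");
  }
  return parsed as number[];
}
```

Keeping the vector in VectorJSON means a stored vector can be read back from our own database without querying the vector database.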

Entity Document Runs

ID

EntityDocumentID

StartedAt

EndedAt

Status (Pending, In Progress, Completed, Failed)

CreatedAt

UpdatedAt

This entity holds the data that tracks the “Runs” of each entity’s vectorization.
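The run statuses listed above suggest a simple lifecycle. The sketch below assumes one: Pending moves to In Progress, which ends as Completed or Failed. These transition rules, and the transitionRun helper itself, are assumptions for illustration; the document only lists the status values.

```typescript
type RunStatus = "Pending" | "In Progress" | "Completed" | "Failed";

// Field names follow the list above; the TypeScript types are assumptions.
interface EntityDocumentRun {
  ID: number;
  EntityDocumentID: number;
  StartedAt: Date | null;
  EndedAt: Date | null;
  Status: RunStatus;
  CreatedAt: Date;
  UpdatedAt: Date;
}

// Assumed lifecycle: a run starts once and then finishes exactly once.
const ALLOWED_TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  "Pending": ["In Progress"],
  "In Progress": ["Completed", "Failed"],
  "Completed": [],
  "Failed": [],
};

// Returns an updated copy of the run; the input object is left unchanged.
function transitionRun(
  run: EntityDocumentRun,
  next: RunStatus,
  now: Date = new Date()
): EntityDocumentRun {
  if (!ALLOWED_TRANSITIONS[run.Status].includes(next)) {
    throw new Error("Cannot move run from " + run.Status + " to " + next);
  }
  const finished = next === "Completed" || next === "Failed";
  return {
    ...run,
    Status: next,
    StartedAt: next === "In Progress" ? now : run.StartedAt,
    EndedAt: finished ? now : run.EndedAt,
    UpdatedAt: now,
  };
}
```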

Vector DB Wrapper

We will build a VectorDatabaseBase class that provides the fundamental implementation needed to work with any vector database. The primary goal is an abstraction layer over the common operations performed in a vector DB, such as listing and adding indexes, so that these operations can be carried out programmatically.

Additionally, we'll create a VectorIndexBase class with basic functionalities essential for managing a specific vector index. These functionalities will include operations such as upsert (update + insert), query, fetch, update, and delete.

We can effectively replicate the necessary operations from Pinecone v and proceed to implement the base class methods, which will serve as placeholders for subclasses to implement. Opting for a base class rather than an interface allows for the inclusion of higher-order methods at the base level. Subsequent subclasses can then focus on API-specific tasks, while the base class encapsulates reusable higher-order functionalities. This approach ensures modularity and reusability across different implementations.
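The split between API-specific methods and reusable higher-order methods might look like the sketch below. Only the class names VectorDatabaseBase and VectorIndexBase come from this document. The method signatures, the VectorRecord and QueryMatch types, the upsertInBatches and getOrCreateIndex helpers and the in-memory subclasses are all assumptions, and the update operation is left out for brevity. The in-memory classes stand in for a real provider such as Pinecone and call no external API.

```typescript
interface VectorRecord { id: string; values: number[]; }
interface QueryMatch { id: string; score: number; }

abstract class VectorIndexBase {
  // API-specific primitives that each provider subclass implements.
  abstract upsert(records: VectorRecord[]): Promise<void>;
  abstract fetch(ids: string[]): Promise<VectorRecord[]>;
  abstract delete(ids: string[]): Promise<void>;
  abstract query(vector: number[], topK: number): Promise<QueryMatch[]>;

  // Higher-order method written once in the base class: upsert in chunks.
  async upsertInBatches(records: VectorRecord[], batchSize: number = 100): Promise<number> {
    let batches = 0;
    for (let i = 0; i < records.length; i += batchSize) {
      await this.upsert(records.slice(i, i + batchSize));
      batches++;
    }
    return batches;
  }
}

abstract class VectorDatabaseBase {
  abstract listIndexes(): Promise<string[]>;
  abstract getIndex(name: string): Promise<VectorIndexBase | undefined>;
  abstract createIndex(name: string): Promise<VectorIndexBase>;

  // Higher-order method shared by every provider.
  async getOrCreateIndex(name: string): Promise<VectorIndexBase> {
    return (await this.getIndex(name)) ?? (await this.createIndex(name));
  }
}

// In-memory stand-in for a provider-specific subclass.
class InMemoryIndex extends VectorIndexBase {
  private readonly store = new Map<string, number[]>();

  async upsert(records: VectorRecord[]): Promise<void> {
    records.forEach((r) => this.store.set(r.id, r.values));
  }
  async fetch(ids: string[]): Promise<VectorRecord[]> {
    const found: VectorRecord[] = [];
    ids.forEach((id) => {
      const values = this.store.get(id);
      if (values) found.push({ id, values });
    });
    return found;
  }
  async delete(ids: string[]): Promise<void> {
    ids.forEach((id) => this.store.delete(id));
  }
  async query(vector: number[], topK: number): Promise<QueryMatch[]> {
    // Scores by dot product; a real vector database would use its configured metric.
    const matches = Array.from(this.store.entries()).map(([id, values]) => ({
      id,
      score: values.reduce((sum, v, i) => sum + v * (vector[i] ?? 0), 0),
    }));
    return matches.sort((a, b) => b.score - a.score).slice(0, topK);
  }
}

class InMemoryDatabase extends VectorDatabaseBase {
  private readonly indexes = new Map<string, InMemoryIndex>();

  async listIndexes(): Promise<string[]> { return Array.from(this.indexes.keys()); }
  async getIndex(name: string): Promise<VectorIndexBase | undefined> { return this.indexes.get(name); }
  async createIndex(name: string): Promise<VectorIndexBase> {
    const index = new InMemoryIndex();
    this.indexes.set(name, index);
    return index;
  }
}
```

With this layout a provider subclass supplies only the four primitives, while batching and get-or-create logic are written once in the base classes.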

@memberjunction/vectors – We will develop a "vectors" package within MJ, housing the base classes for vector-related functionalities.

@memberjunction/vectors-pinecone – We will implement the Pinecone subclasses in this package.

(Note, in MJAI I want to break out our various company-specific classes into similar separate packages so people only import what they want)

We will employ the @RegisterClass decorator to associate specific classes with implementations of vector databases.
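The idea of mapping a key such as the ClassKey column to a concrete class can be sketched with a minimal registry. This is a simplified stand-in written for illustration and is not the actual @RegisterClass decorator or Class Factory: registerClass, createInstance and PineconeLikeDatabase are hypothetical names, and the example URL is a placeholder.

```typescript
type Constructor<T> = new (...args: any[]) => T;

// Maps a key (for example, the ClassKey stored on a Vector Databases row) to a class.
const classRegistry = new Map<string, Constructor<object>>();

// Returns a function that registers the class it is applied to. With decorator
// syntax enabled this could be written as @registerClass("key") above the class.
function registerClass(key: string) {
  return function <T extends Constructor<object>>(ctor: T): T {
    classRegistry.set(key, ctor);
    return ctor;
  };
}

// Looks up the class registered under the key and instantiates it.
function createInstance<T extends object>(key: string, ...args: unknown[]): T {
  const ctor = classRegistry.get(key);
  if (!ctor) {
    throw new Error('No class registered for key "' + key + '"');
  }
  return new ctor(...args) as T;
}

// Hypothetical provider class, registered by calling the function form directly
// so that the sketch does not depend on decorator compiler settings.
class PineconeLikeDatabase {
  readonly baseUrl: string;
  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
  }
}
registerClass("PineconeDatabase")(PineconeLikeDatabase);
```

A caller holding a Vector Databases row could then pass its ClassKey and DefaultURL to createInstance without importing the provider class directly.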

We must establish a batch process tasked with identifying new, modified, or deleted records within a specified entity and subsequently updating the corresponding vector. Addressing this involves several pivotal requirements:

To support vectors, an entity must have CreatedAt and UpdatedAt fields.

When we want to run the EntityVectorSync process (another package called @memberjunction/vectors-sync) we will do the following:

Identify all active Entity Documents across the Entities.

For every active Entity Document, generate an Entity Document Run.

This metadata will monitor the advancement of vectorization within an entity.

We will query the base view for the given entity and only include records where UpdatedAt for the entity record is > the Entity Record Document.RecordUpdatedAt field (or where there is NO EntityRecordDocument matching the EntityID and RecordID)

For each record in the resulting set, we will:

Generate a document

Get embeddings from the doc

Save a record in Entity Record Documents containing the document and its corresponding vector (formatted as JSON).

Upsert the new vector into the vector database through the Vector DB wrapper described above

Store the key value from the vector index alongside the document in Entity Record Documents
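The selection rule and the per-record steps above can be sketched as follows. All names are hypothetical and only mirror the fields named in this document. Rendering, embedding and storage are injected as functions so that the sketch does not assume any particular API. It also makes two simplifying assumptions: the record ID is reused as the vector key, and the Entity Record Documents row is written once, after the upsert, instead of being saved first and updated with the key afterwards.

```typescript
interface SourceRecord { ID: string; UpdatedAt: Date; }

interface RecordDocumentRow {
  RecordID: string;
  Document: string;
  VectorID: string;
  VectorJSON: string;
  RecordUpdatedAt: Date;
}

// Select records with no Entity Record Document yet, or changed since it was built.
function findRecordsToVectorize<T extends SourceRecord>(
  records: T[],
  existing: Pick<RecordDocumentRow, "RecordID" | "RecordUpdatedAt">[]
): T[] {
  const lastSynced = new Map<string, number>();
  existing.forEach((doc) => lastSynced.set(doc.RecordID, doc.RecordUpdatedAt.getTime()));
  return records.filter((record) => {
    const stamp = lastSynced.get(record.ID);
    return stamp === undefined || record.UpdatedAt.getTime() > stamp;
  });
}

// Injected steps, so the sketch stays independent of any template, model or database API.
interface SyncSteps<T> {
  renderDocument(record: T): string;
  embed(document: string): Promise<number[]>;
  upsertVector(vectorId: string, vector: number[]): Promise<void>;
  saveRecordDocument(row: RecordDocumentRow): Promise<void>;
}

async function vectorizeRecord<T extends SourceRecord>(
  record: T,
  steps: SyncSteps<T>
): Promise<RecordDocumentRow> {
  const document = steps.renderDocument(record);   // generate a document
  const vector = await steps.embed(document);      // get embeddings from the document
  const vectorId = record.ID;                      // assumption: record ID doubles as vector key
  await steps.upsertVector(vectorId, vector);      // upsert into the vector database
  const row: RecordDocumentRow = {
    RecordID: record.ID,
    Document: document,
    VectorID: vectorId,
    VectorJSON: JSON.stringify(vector),
    RecordUpdatedAt: record.UpdatedAt,             // lets the next run detect later changes
  };
  await steps.saveRecordDocument(row);             // store document, vector and key together
  return row;
}
```

Copying the source record's UpdatedAt into RecordUpdatedAt is what allows the next run to skip records that have not changed.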

Additionally, we will search for deleted records by:

Querying the Entity Record Documents for existing records. For each one whose RecordID no longer exists in the source entity, we will:

Delete the entry from the vector database

Delete the entries for entity record documents from our database to synchronize the data.
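The deleted-record check can be sketched as a pure function that compares stored Entity Record Documents with the IDs still present in the source entity. The names StoredRecordDocument, CleanupPlan and planDeletedRecordCleanup are hypothetical, and the function only computes what to delete; performing the deletions is left to the caller.

```typescript
interface StoredRecordDocument {
  ID: number;        // Entity Record Documents.ID
  RecordID: string;  // ID of the source record
  VectorID: string;  // key of the vector in the vector database
}

interface CleanupPlan {
  vectorIdsToDelete: string[];          // delete these from the vector database
  recordDocumentIdsToDelete: number[];  // then delete these rows from our database
}

// Finds Entity Record Documents whose source record no longer exists.
function planDeletedRecordCleanup(
  storedDocs: StoredRecordDocument[],
  liveRecordIds: string[]
): CleanupPlan {
  const live = new Set(liveRecordIds);
  const orphans = storedDocs.filter((doc) => !live.has(doc.RecordID));
  return {
    vectorIdsToDelete: orphans.map((doc) => doc.VectorID),
    recordDocumentIdsToDelete: orphans.map((doc) => doc.ID),
  };
}
```

Deleting from the vector database first and from our own tables second means an interrupted cleanup still leaves a row to retry on the next run. That ordering follows the list above but remains a design assumption.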

The outlined process will lead to a synchronized vector store corresponding to any specified Entity or Entity Document.

Note: It is possible to have more than one Entity Document per entity. This flexibility proves useful when employing different documents for distinct purposes, such as duplicate checking versus professional networking, among others.