📌 What You’ll Need

  • A Postgres database with data you’d like to embed (you can use Supabase for a free account).
  • An account with Pinecone (you can sign up for a free account here).
  • Postgres credentials including host, database name, port, username, and password.
  • A Pinecone API key.

Note: Make sure you have all the necessary permissions to connect to your databases and generate API keys.

🚀 Getting Started

Step 1: Access the New Sync Form

Sign up for a free (or paid) account on Embedding Sync.

Navigate to the “New Sync” tab in the Embedding Sync dashboard to get started.

Step 2: Database Connection Details

Fill in the following fields related to your Postgres database:

  1. Host: The address of your database server.
  2. Database: The name of your Postgres database.
  3. Port: The port number on which your database is running.
  4. Username: Your Postgres username.
  5. Password: Your Postgres password. (Note: We store this securely using industry-standard AES-256 encryption.)

If you’re using Supabase, you can find these details under Project Settings in the sidebar > Database > Connection Info. (To verify them before submitting, see the connection check sketched below.)
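
If you’d like to sanity-check these connection details before submitting the form, a quick script works well. Below is a minimal sketch using psycopg2; the host, database name, and credentials are placeholders for your own values and are not specific to Embedding Sync.

```python
# pip install psycopg2-binary
import psycopg2

# Placeholder connection details -- replace with your own values.
conn = psycopg2.connect(
    host="db.your-project.supabase.co",  # Host
    dbname="postgres",                   # Database
    port=5432,                           # Port
    user="postgres",                     # Username
    password="your-password",            # Password
)

with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # prints the server version if the connection works

conn.close()
```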

Step 3: Embedding Details

Now, specify the data you want to embed:

  1. Table Name: The database table containing the data to be embedded.
  2. Embedding Column Name: The specific column you’d like to use for generating embeddings.
  3. Primary Key Column Name: The column that serves as the unique identifier for each row.
  4. Updated At Column Name: The column used to detect newly added or updated rows that need syncing. If a previously embedded row is updated, its new value is re-embedded and overwrites the previous value in the vector store (see the incremental-query sketch after this list).
  5. Pinecone Key: Your Pinecone API key. Once logged into Pinecone, go to API Keys in the sidebar > + Create API Key.
  6. Embedding Model: Choose the embedding model you’d like to use. Options include OpenAI’s text-embedding-ada-002 and Hugging Face’s sentence-transformers/all-MiniLM-L6-v2.
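
To make the Updated At column concrete, the sketch below shows the kind of incremental query such a column enables: only rows added or modified since the last sync are fetched for (re)embedding. The table and column names (documents, id, content, updated_at) and the connection details are hypothetical placeholders, and this is an illustration of the idea rather than Embedding Sync’s internal implementation.

```python
# pip install psycopg2-binary
import psycopg2
from datetime import datetime, timezone

# Hypothetical names matching the form fields above:
#   Table Name: documents, Primary Key Column: id,
#   Embedding Column: content, Updated At Column: updated_at
last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)  # timestamp of the previous sync

conn = psycopg2.connect(
    host="db.your-project.supabase.co",
    dbname="postgres",
    port=5432,
    user="postgres",
    password="your-password",
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT id, content FROM documents WHERE updated_at > %s",
        (last_sync,),
    )
    rows = cur.fetchall()  # only new or changed rows need (re)embedding
conn.close()

print(f"{len(rows)} rows to embed on this pass")
```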

Choosing an Embedding Model: Understanding the trade-offs can help you pick the right model for your needs:

  • text-embedding-ada-002: This is a more powerful, larger model by OpenAI. It generates embeddings with 1536 dimensions. While it offers better performance, it will cost more to store these embeddings in Pinecone.

  • sentence-transformers/all-MiniLM-L6-v2: This is a smaller, faster model that’s good for most simple use cases. It produces embeddings with 384 dimensions, roughly a quarter the size of OpenAI’s, so storing them in Pinecone costs correspondingly less. (The sketch below shows how to check each model’s dimensionality.)
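
If you want to confirm the size difference yourself, the sketch below generates one embedding with each model and prints its dimensionality. It assumes the openai (v1-style client) and sentence-transformers packages are installed and that your OpenAI API key is available in the OPENAI_API_KEY environment variable.

```python
# pip install openai sentence-transformers
from openai import OpenAI
from sentence_transformers import SentenceTransformer

text = "Postgres rows become vectors."

# OpenAI text-embedding-ada-002: 1536-dimensional embeddings
client = OpenAI()  # reads OPENAI_API_KEY from the environment
ada = client.embeddings.create(model="text-embedding-ada-002", input=text)
print(len(ada.data[0].embedding))  # 1536

# sentence-transformers/all-MiniLM-L6-v2: 384-dimensional embeddings, runs locally
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
mini = model.encode(text)
print(len(mini))  # 384
```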

Query Path Considerations: If you’re using OpenAI, generating an embedding for your query string to query Pinecone is straightforward using their API. For the sentence-transformers model, you can either self-host the model to generate the query embedding or use our query API, which wraps the query embedding and the Pinecone call for you.
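
As a rough sketch of the OpenAI query path, the example below embeds a query string and uses it to search a Pinecone index. It assumes the openai (v1-style) and Pinecone Python clients (the v3-style Pinecone class), a placeholder index name, and API keys supplied as shown; our hosted query API is a separate endpoint and is not shown here.

```python
# pip install openai "pinecone-client>=3"
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                        # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="your-pinecone-api-key")  # placeholder Pinecone key
index = pc.Index("your-index-name")             # placeholder index name

# 1. Embed the query string with the same model used for the synced rows.
query = "How do I reset my password?"
resp = openai_client.embeddings.create(model="text-embedding-ada-002", input=query)
query_vector = resp.data[0].embedding

# 2. Query Pinecone for the nearest stored embeddings.
results = index.query(vector=query_vector, top_k=5, include_metadata=True)
for match in results.matches:
    print(match.id, match.score)
```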

Step 4: Submit

Click the “Submit” button to initiate the sync. Your data will be synchronized with your Pinecone vector database; note that it may take a few minutes for the sync to begin processing.

🛠 Troubleshooting