Create a Sync
Quickstart guide to getting your data and embeddings in sync.
📌 What You’ll Need
- A Postgres database with data you’d like to embed (you can use Supabase for a free account).
- An account with Pinecone (you can sign up for a free account).
- Postgres credentials including host, database name, port, username, and password.
- A Pinecone API key.
Note: Make sure you have all the necessary permissions to connect to your databases and generate API keys.
🚀 Getting Started
Step 1: Access the New Sync Form
Sign up for a free (or paid) account on Embedding Sync.
Navigate to the “New Sync” tab in the Embedding Sync dashboard to get started.
Step 2: Database Connection Details
Fill in the following fields related to your Postgres database:
- Host: The address of your database server.
- Database: The name of your Postgres database.
- Port: The port number on which your database is running.
- Username: Your Postgres username.
- Password: Your Postgres password. (Note: We store this securely using industry-standard AES-256 encryption.)
If you’re using Supabase, you can find this info in Project Settings (in the sidebar) > Database > Connection Info.
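Before creating the sync, you may want to sanity-check these credentials. Here is a minimal sketch that does so, assuming the psycopg2 driver; every value shown is a placeholder for your own connection details:

```python
# Quick connection check before creating the sync (psycopg2 assumed;
# all values below are placeholders, replace them with your own).
import psycopg2

conn = psycopg2.connect(
    host="db.xxxxxxxx.supabase.co",  # Host
    dbname="postgres",               # Database
    port=5432,                       # Port
    user="postgres",                 # Username
    password="YOUR_PASSWORD",        # Password
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # prints the Postgres server version
conn.close()
```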
Step 3: Embedding Details
Now, specify the data you want to embed:
- Table Name: The database table containing the data to be embedded.
- Embedding Column Name: The specific column you’d like to use for generating embeddings.
- Primary Key Column Name: The column that serves as the unique identifier for each row.
- Updated At Column Name: The column used to determine which newly added or updated rows need syncing. If a previously embedded row is updated, its new value is embedded and overwrites the previous value in the vector store. If your table doesn’t have such a column yet, see the sketch after this list.
- Pinecone Key: Your Pinecone API key. Once logged into Pinecone, go to API Keys in the sidebar > + Create API Key.
- Embedding Model: Choose the embedding model you’d like to use. Options include OpenAI’s text-embedding-ada-002 and Hugging Face’s sentence-transformers/all-MiniLM-L6-v2.
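If your table doesn’t already have an “updated at” column, one common approach is a timestamp column kept current by a trigger. The sketch below shows one way to set this up; it is not part of Embedding Sync itself, assumes a source table named documents and Postgres 11+ (Supabase qualifies), and reuses the placeholder credentials from the connection check above:

```python
# A sketch of adding an updated_at column maintained by a trigger, so the
# sync can detect changed rows. "documents" is a placeholder table name;
# EXECUTE FUNCTION requires Postgres 11+.
import psycopg2

conn = psycopg2.connect(
    host="db.xxxxxxxx.supabase.co", dbname="postgres", port=5432,
    user="postgres", password="YOUR_PASSWORD",
)

ddl = """
ALTER TABLE documents
    ADD COLUMN IF NOT EXISTS updated_at timestamptz NOT NULL DEFAULT now();

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS documents_set_updated_at ON documents;
CREATE TRIGGER documents_set_updated_at
    BEFORE UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)  # runs all three statements in one round trip
conn.close()
```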
Choosing an Embedding Model: Understanding the trade-offs can help you pick the right model for your needs (a small comparison sketch follows this list):
- text-embedding-ada-002: A larger, more powerful model from OpenAI that generates embeddings with 1536 dimensions. It offers better performance, but its embeddings cost more to store in Pinecone.
- sentence-transformers/all-MiniLM-L6-v2: A smaller, faster model that works well for most simple use cases. It produces embeddings with 384 dimensions, about a quarter the size of OpenAI’s, so they cost correspondingly less to store in Pinecone.
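To see the size difference concretely, here is a small comparison sketch. It assumes the sentence-transformers package and a pre-1.0 version of OpenAI’s Python library; the API key and sample text are placeholders:

```python
# Compare embedding sizes from the two model options.
import openai
from sentence_transformers import SentenceTransformer

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
text = "The quick brown fox jumps over the lazy dog."

ada = openai.Embedding.create(model="text-embedding-ada-002", input=text)
print(len(ada["data"][0]["embedding"]))  # 1536 dimensions

minilm = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print(len(minilm.encode(text)))  # 384 dimensions
```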
Query Path Considerations: If you’re using OpenAI, generating an embedding for your query string to query Pinecone is straightforward using their API. For the sentence-transformers model, you can either self-host to generate the query embedding or use our query API, which wraps the query embedding and Pinecone API call for you.
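For example, the self-hosted path might look roughly like the sketch below. It assumes the sentence-transformers package and the v2 Pinecone Python client; the index name, environment, and query text are placeholders:

```python
# Self-hosted query path: embed the query locally with MiniLM, then
# search the synced vectors in Pinecone. Index name, environment, and
# query text are placeholders; assumes pinecone-client v2.
import pinecone
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("your-index-name")

# 384-dimensional query embedding, matching the synced vectors.
query_embedding = model.encode("how do I reset my password?").tolist()

results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
for match in results.matches:
    print(match.id, match.score)
```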
Step 4: Submit
Click the “Submit” button to initiate the sync. Your data will be synchronized with your Pinecone vector database; it may take a few minutes for the sync to begin processing.
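Once the sync is running, one way to confirm vectors are landing in your index is to check its stats. This sketch assumes the v2 Pinecone Python client; the index name and environment are placeholders:

```python
# Check that the index is filling up as the sync processes rows.
import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("your-index-name")

stats = index.describe_index_stats()
print(stats)  # total_vector_count should grow as rows are embedded
```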