Create a Sync
Quickstart guide to getting your data and embeddings in sync.
📌 What You’ll Need
- A Postgres database with data you’d like to embed (you can use Supabase for a free account).
- An account with Pinecone (you can sign up for a free account).
- Postgres credentials including host, database name, port, username, and password.
- A Pinecone API key.
Note: Make sure you have all the necessary permissions to connect to your databases and generate API keys.
🚀 Getting Started
Step 1: Access the New Sync Form
Sign up for a free (or paid) account on Embedding Sync.
Navigate to the “New Sync” tab in the Embedding Sync dashboard to get started.
Step 2: Database Connection Details
Fill in the following fields related to your Postgres database:
- Host: The address of your database server.
- Database: The name of your Postgres database.
- Port: The port number on which your database is running.
- Username: Your Postgres username.
- Password: Your Postgres password. (Note: We store this securely using industry-standard AES-256 encryption.)
If you’re using Supabase, you can find this info in Project Settings (in the sidebar) > Database > Connection Info.
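Before creating the sync, you may want to sanity-check these credentials. Here is a minimal sketch that does so, assuming the psycopg2 driver; every value shown is a placeholder for your own connection details:

```python
# Quick connection check before creating the sync (psycopg2 assumed;
# all values below are placeholders, replace them with your own).
import psycopg2

conn = psycopg2.connect(
    host="db.xxxxxxxx.supabase.co",  # Host
    dbname="postgres",               # Database
    port=5432,                       # Port
    user="postgres",                 # Username
    password="YOUR_PASSWORD",        # Password
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # prints the Postgres server version
conn.close()
```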
Step 3: Embedding Details
Now, specify the data you want to embed:
- Table Name: The database table containing the data to be embedded.
- Embedding Column Name: The specific column you’d like to use for generating embeddings.
- Primary Key Column Name: The column that serves as the unique identifier for each row.
- Updated At Column Name: The column used to determine which newly added or updated rows need syncing. If a previously embedded row is updated, its new value is embedded and overwrites the previous value in the vector store. If your table doesn’t have such a column yet, see the sketch after this list.
- Pinecone Key: Your Pinecone API key. Once logged into Pinecone, go to API Keys in the sidebar > + Create API Key.
- Embedding Model: Choose the embedding model you’d like to use. Options include OpenAI’s text-embedding-ada-002 and Hugging Face’s sentence-transformers/all-MiniLM-L6-v2.
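If your table doesn’t already have an “updated at” column, one common approach is a timestamp column kept current by a trigger. The sketch below shows one way to set this up; it is not part of Embedding Sync itself, assumes a source table named documents and Postgres 11+ (Supabase qualifies), and reuses the placeholder credentials from the connection check above:

```python
# A sketch of adding an updated_at column maintained by a trigger, so the
# sync can detect changed rows. "documents" is a placeholder table name;
# EXECUTE FUNCTION requires Postgres 11+.
import psycopg2

conn = psycopg2.connect(
    host="db.xxxxxxxx.supabase.co", dbname="postgres", port=5432,
    user="postgres", password="YOUR_PASSWORD",
)

ddl = """
ALTER TABLE documents
    ADD COLUMN IF NOT EXISTS updated_at timestamptz NOT NULL DEFAULT now();

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS documents_set_updated_at ON documents;
CREATE TRIGGER documents_set_updated_at
    BEFORE UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)  # runs all three statements in one round trip
conn.close()
```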
Choosing an Embedding Model: Understanding the trade-offs can help you pick the right model for your needs (a small comparison sketch follows this list):
- text-embedding-ada-002: A larger, more powerful model from OpenAI that generates embeddings with 1536 dimensions. It offers better performance, but its embeddings cost more to store in Pinecone.
- sentence-transformers/all-MiniLM-L6-v2: A smaller, faster model that works well for most simple use cases. It produces embeddings with 384 dimensions, about a quarter the size of OpenAI’s, so they cost correspondingly less to store in Pinecone.
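To see the size difference concretely, here is a small comparison sketch. It assumes the sentence-transformers package and a pre-1.0 version of OpenAI’s Python library; the API key and sample text are placeholders:

```python
# Compare embedding sizes from the two model options.
import openai
from sentence_transformers import SentenceTransformer

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
text = "The quick brown fox jumps over the lazy dog."

ada = openai.Embedding.create(model="text-embedding-ada-002", input=text)
print(len(ada["data"][0]["embedding"]))  # 1536 dimensions

minilm = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print(len(minilm.encode(text)))  # 384 dimensions
```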
Query Path Considerations: If you’re using OpenAI, generating an embedding for your query string to query Pinecone is straightforward using their API. For the sentence-transformers model, you can either self-host to generate the query embedding or use our query API, which wraps the query embedding and Pinecone API call for you.
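For example, the self-hosted path might look roughly like the sketch below. It assumes the sentence-transformers package and the v2 Pinecone Python client; the index name, environment, and query text are placeholders:

```python
# Self-hosted query path: embed the query locally with MiniLM, then
# search the synced vectors in Pinecone. Index name, environment, and
# query text are placeholders; assumes pinecone-client v2.
import pinecone
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("your-index-name")

# 384-dimensional query embedding, matching the synced vectors.
query_embedding = model.encode("how do I reset my password?").tolist()

results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
for match in results.matches:
    print(match.id, match.score)
```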
Step 4: Submit
Click the “Submit” button to initiate the sync. Your data will be synchronized with your Pinecone vector database; it may take a few minutes for the sync to begin processing.
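Once the sync is running, one way to confirm vectors are landing in your index is to check its stats. This sketch assumes the v2 Pinecone Python client; the index name and environment are placeholders:

```python
# Check that the index is filling up as the sync processes rows.
import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("your-index-name")

stats = index.describe_index_stats()
print(stats)  # total_vector_count should grow as rows are embedded
```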