Create a Sync
Quickstart guide to getting your data and embeddings in sync.
📌 What You’ll Need
- A Postgres database with data you’d like to embed (you can use Supabase for a free account).
- An account with Pinecone (a free account is available).
- Postgres credentials including host, database name, port, username, and password.
- A Pinecone API key.
Note: Make sure you have all the necessary permissions to connect to your databases and generate API keys.
🚀 Getting Started
Step 1: Access the New Sync Form
Sign up for a free (or paid) account on Embedding Sync.
Navigate to the “New Sync” tab in the Embedding Sync dashboard to get started.
Step 2: Database Connection Details
Fill in the following fields related to your Postgres database:
- Host: The address of your database server.
- Database: The name of your Postgres database.
- Port: The port number on which your database is running.
- Username: Your Postgres username.
- Password: Your Postgres password. (Note: We store this securely using AES256 industry-standard encryption.)
If you’re using Supabase, you can find this info in Project Settings (in the sidebar) > Database > Connection Info.
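Before creating a sync, it can save time to confirm the credentials work on their own. A minimal sketch of assembling those five fields into a standard Postgres connection string (all values here are placeholders, and the host is a made-up Supabase-style example):

```python
# Placeholder credentials matching the five fields above.
conn_info = {
    "host": "db.example.supabase.co",
    "dbname": "postgres",
    "port": 5432,
    "user": "postgres",
    "password": "your-password",
}

# Standard libpq-style connection URL built from those fields.
dsn = "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(**conn_info)

# With psycopg2 installed, you could then verify connectivity:
#   import psycopg2; psycopg2.connect(dsn).close()
print(dsn)
```

If the `psycopg2.connect` call succeeds from your machine, the same credentials should work in the sync form (assuming your database also allows connections from Embedding Sync's servers).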
Step 3: Embedding Details
Now, specify the data you want to embed:
- Table Name: The database table containing the data to be embedded.
- Embedding Column Name: The specific column you’d like to use for generating embeddings.
- Primary Key Column Name: The column that serves as the unique identifier for each row.
- Updated At Column Name: The column used to determine newly added or updated rows to sync. If a previously embedded row is updated, the new value will be updated in the vector store overriding the previous value.
- Pinecone Key: Your Pinecone API key. Once logged into Pinecone, go to API Keys in the sidebar > + Create API Key.
- Embedding Model: Choose the embedding model you’d like to use. Options include OpenAI’s text-embedding-ada-002 and Hugging Face’s sentence-transformers/all-MiniLM-L6-v2.
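To see how these fields fit together, here is a hypothetical sketch of the incremental query a sync like this might run to find rows that are new or changed since the last run. All table and column names are placeholder examples, not values from your database:

```python
# Placeholder identifiers standing in for the form fields above.
table = "documents"        # Table Name
pk_col = "id"              # Primary Key Column Name
text_col = "body"          # Embedding Column Name
updated_col = "updated_at" # Updated At Column Name

# %s is the psycopg2-style parameter placeholder for the timestamp of the
# last successful sync; rows updated after it get (re-)embedded.
sql = (
    f"SELECT {pk_col}, {text_col} FROM {table} "
    f"WHERE {updated_col} > %s ORDER BY {updated_col}"
)
print(sql)
```

This is why the Updated At column matters: without it, a sync would have no cheap way to tell which rows need re-embedding.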
Choosing an Embedding Model: Understanding the trade-offs can help you pick the right model for your needs:
- text-embedding-ada-002: A larger, more powerful model from OpenAI that generates 1536-dimensional embeddings. It offers better performance, but the larger vectors cost more to store in Pinecone.
- sentence-transformers/all-MiniLM-L6-v2: A smaller, faster model that’s good for most simple use cases. It produces 384-dimensional embeddings, a quarter the size of OpenAI’s, so they cost roughly a quarter as much to store in Pinecone.
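To make the storage difference concrete, a quick back-of-the-envelope comparison, assuming 4 bytes per float32 vector component:

```python
# Dimensions of the two supported models.
DIMS = {"text-embedding-ada-002": 1536, "all-MiniLM-L6-v2": 384}

# Raw footprint per vector at 4 bytes per float32 component.
bytes_per_vector = {model: d * 4 for model, d in DIMS.items()}

ratio = DIMS["all-MiniLM-L6-v2"] / DIMS["text-embedding-ada-002"]
print(bytes_per_vector)  # {'text-embedding-ada-002': 6144, 'all-MiniLM-L6-v2': 1536}
print(ratio)             # 0.25
```

So for a million rows, MiniLM vectors take about 1.5 GB of raw vector data versus roughly 6 GB for ada-002 (metadata and index overhead not included).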
Query Path Considerations: If you’re using OpenAI, generating an embedding for your query string to query Pinecone is straightforward using their API. For the sentence-transformers model, you can either self-host to generate the query embedding or use our query API, which wraps the query embedding and Pinecone API call for you.
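A sketch of the self-hosted query path: embed the query string locally with sentence-transformers, then send it to Pinecone. The index name and environment here are assumptions; adjust them to your project.

```python
def search(query, top_k=5):
    # Imports live inside the function so this sketch only needs the heavy
    # dependencies (sentence-transformers, pinecone-client) when called.
    from sentence_transformers import SentenceTransformer
    import pinecone

    # Must match the model used to embed your rows, or the vectors
    # won't be comparable.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    query_vec = model.encode(query).tolist()  # 384-dimensional vector

    pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-west1-gcp")
    index = pinecone.Index("your-index")  # assumption: your index name
    return index.query(vector=query_vec, top_k=top_k, include_metadata=True)
```

If you’d rather not host the model yourself, the query API mentioned above performs both steps for you.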
Step 4: Submit
Click the “Submit” button to initiate the sync. The process will now begin, and your data will be synchronized with your Pinecone vector database. It may take a few minutes for the sync to begin processing.
🛠 Troubleshooting
Sync not working
Check your database credentials and Pinecone API key. Make sure all details are correctly entered.
If your Pinecone project is not in the us-west1-gcp environment, please [contact us](mailto:vimota@embeddingsync.com).
Getting an unknown error
Please [contact us](mailto:vimota@embeddingsync.com) with the error details, and we’ll help you debug as soon as possible.