Backlogs grow fast. Finding the themes behind requests, bugs, and "nice to haves" shouldn't require spelunking. In our previous post, we built semantic issue search with PostgreSQL + pgvector and OpenAI — great if you want to run it yourself and keep everything in your database. This article shows the fully managed path on Google Cloud: BigQuery stores and searches your embeddings, Vertex AI generates them and answers with Gemini. Same outcome; far less ops, elastic scale, and pay‑per‑query simplicity. If you prefer the self‑hosted route, check out the pgvector/OpenAI version here: Improve GitHub Issues search with CloudQuery, PgVector and OpenAI.
What we’ll build #
Sync: CloudQuery pulls open issues from the cloudquery/cloudquery GitHub repository into BigQuery.
Embed: As part of the sync, BigQuery calls Vertex AI through a remote model to create text embeddings and stores them alongside the text chunks in a separate table.
Ask: A small Python script takes a question, embeds it, searches against existing embeddings with VECTOR_SEARCH in BigQuery, and has Gemini write the answer based on retrieved snippets.
Prerequisites #
CloudQuery CLI: Install from the site (cloudquery.io). Usage beyond the free tier requires a CLOUDQUERY_API_KEY.
GitHub Access Token: PAT with repository read access for fetching issues.
GCP project & dataset: We’ll call the project ai-playground. You’ll also need a BigQuery dataset named github_issues (a gcloud/bq sketch for the dataset and the APIs below follows this list).
APIs enabled in the project:
BigQuery API
BigQuery Connection API
Vertex AI API
IAM permissions:
For you (the auth’d user) within the project:
roles/bigquery.admin
roles/resourcemanager.projectIamAdmin
For the BigQuery connection’s service account (we'll create this soon): Vertex AI User role
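If you prefer the command line, here’s a rough sketch of the dataset and API setup using gcloud and bq (assuming the triple-shift-469512-k6 project ID introduced below and a US location; adjust both to your setup):

```bash
# Enable the required APIs (one-time, per project).
gcloud services enable bigquery.googleapis.com \
  bigqueryconnection.googleapis.com \
  aiplatform.googleapis.com \
  --project triple-shift-469512-k6

# Create the dataset. The location must match the Vertex AI
# connection and remote model you create in Step 1.
bq mk --location=US --dataset triple-shift-469512-k6:github_issues
```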
See also: BigQuery ML remote models (reference), remote model tutorial (guide), text embeddings overview (guide), and embeddings API reference (guide).
Step 1 — Create a BigQuery remote model to Vertex AI #
It’s a bit of setup, but once you get here the rest is trivial. We need a BigQuery dataset containing a remote model that points at Vertex AI’s embeddings endpoint.
Open your GCP project
Create a BigQuery dataset (or pick an existing one, e.g., github_issues)
Fill out the dataset form (choose location and defaults)
Enable required APIs
In BigQuery, click "+ Add data" to start creating a connection
Add a Vertex AI connection (Business Applications → Vertex AI Models: BigQuery Federation)
Grant the generated service account Vertex AI User in IAM
Create the remote model in your dataset (name it textembedding; a SQL sketch follows these steps)
Note that the project ID is not the project name (ai-playground in this case): GCP generated triple-shift-469512-k6 as the ID for this project.
If this succeeds, you’re ready to generate embeddings directly from SQL. Note that the model name (textembedding) is a friendly name for the remote model in your BigQuery dataset, not the name of the underlying Vertex AI model.
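For reference, creating the remote model from the SQL editor looks roughly like this. The connection ID (us.vertex-ai) and the embedding endpoint (text-embedding-004) are assumptions; substitute your own connection and whichever Vertex AI embedding model you prefer:

```sql
-- Point a BigQuery remote model at a Vertex AI text-embedding endpoint.
-- `us.vertex-ai` and `text-embedding-004` are assumptions; use your own
-- connection ID and embedding model.
CREATE OR REPLACE MODEL `github_issues.textembedding`
  REMOTE WITH CONNECTION `us.vertex-ai`
  OPTIONS (ENDPOINT = 'text-embedding-004');

-- Smoke test: embed a single string directly from SQL.
SELECT ml_generate_embedding_result
FROM ML.GENERATE_EMBEDDING(
  MODEL `github_issues.textembedding`,
  (SELECT 'hello world' AS content)
);
```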
Step 2 — CloudQuery config #
Create github_to_bigquery.yaml with the following content:
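A minimal sketch of that file, assuming the GitHub source and BigQuery destination plugins from the hub (the vX.Y.Z versions are placeholders; grab the current ones from hub.cloudquery.io):

```yaml
kind: source
spec:
  name: github
  path: cloudquery/github
  version: "vX.Y.Z" # placeholder: use the latest version from hub.cloudquery.io
  tables: ["github_issues"]
  destinations: ["bigquery"]
  spec:
    access_token: "${GITHUB_TOKEN}" # expanded from your environment at sync time
    repos: ["cloudquery/cloudquery"]
---
kind: destination
spec:
  name: bigquery
  path: cloudquery/bigquery
  version: "vX.Y.Z" # placeholder: use the latest version from hub.cloudquery.io
  spec:
    project_id: "triple-shift-469512-k6" # the project ID, not the project name
    dataset_id: "github_issues"
```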
Notes:
GITHUB_TOKEN and any other environment variables will be expanded by CloudQuery when the sync is run.
Your environment must be authenticated against the GCP project. Instructions for this can be found in the BigQuery destination plugin docs.
Run the sync:
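```bash
# Needs GITHUB_TOKEN in the environment (and CLOUDQUERY_API_KEY above the free tier).
cloudquery sync github_to_bigquery.yaml
```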
Example successful output:
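Once the sync completes, you can materialize the embeddings table. Here’s a rough sketch in plain SQL, with chunking omitted for brevity (the issues_embeddings table name and the use of title/body from the synced github_issues table are assumptions; adapt to your schema):

```sql
-- Build a separate table of issue text plus Vertex AI embeddings.
-- Table and column names here are illustrative, not prescriptive.
CREATE OR REPLACE TABLE `github_issues.issues_embeddings` AS
SELECT
  content,
  ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `github_issues.textembedding`,
  (
    SELECT CONCAT(title, '\n\n', IFNULL(body, '')) AS content
    FROM `github_issues.github_issues`
    WHERE state = 'open'
  )
);
```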
Step 3 — Ask questions (RAG) with a tiny Python script #
The script below embeds your query via ML.GENERATE_EMBEDDING, searches your embeddings table with VECTOR_SEARCH (cosine distance), and asks Gemini to answer using the retrieved chunks.
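A sketch of such a script, assuming the issues_embeddings table from the previous step plus the google-cloud-bigquery and google-cloud-aiplatform packages (the Gemini model name and region are assumptions; pick whatever is available in your project):

```python
import sys

import vertexai
from google.cloud import bigquery
from vertexai.generative_models import GenerativeModel

PROJECT_ID = "triple-shift-469512-k6"
LOCATION = "us-central1"  # assumption: use the region that matches your setup
DATASET = "github_issues"

# Embed the question and find the nearest issue chunks in one query.
SEARCH_SQL = f"""
SELECT base.content AS content, distance
FROM VECTOR_SEARCH(
  TABLE `{DATASET}.issues_embeddings`, 'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `{DATASET}.textembedding`,
      (SELECT @question AS content)
    )
  ),
  top_k => 5, distance_type => 'COSINE'
)
ORDER BY distance
"""


def ask(question: str) -> str:
    bq = bigquery.Client(project=PROJECT_ID)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("question", "STRING", question)
        ]
    )
    snippets = [row.content for row in bq.query(SEARCH_SQL, job_config=job_config).result()]

    # Ask Gemini to answer using only the retrieved snippets as context.
    vertexai.init(project=PROJECT_ID, location=LOCATION)
    model = GenerativeModel("gemini-1.5-flash")  # assumption: any available Gemini model works
    prompt = (
        "Answer the question using only the GitHub issue snippets below.\n\n"
        + "\n\n---\n\n".join(snippets)
        + f"\n\nQuestion: {question}"
    )
    return model.generate_content(prompt).text


if __name__ == "__main__":
    print(ask(" ".join(sys.argv[1:])))
```

Run it as, say, `python ask.py "why are my syncs slow?"` (the question is just an example).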
Let's test it to see how our BigQuery plugin is doing!
Err… but at least we have text-embeddings support now! :|
Notes, gotchas, and tips #
Project ID: The name that appears in the GCP console is the project name, NOT the project ID. Use the ID (e.g., triple-shift-469512-k6) wherever configs and SQL expect a project.
Remote model name: This is the friendly name you give the remote model inside your BigQuery dataset; it is not the name of the underlying Vertex AI model. You create it by pointing the remote model at your Vertex AI connection and choosing whatever name you like.
Regions must match: Dataset, connection, remote model, and Vertex AI location should align.
Cost and quotas: Embeddings and vector search consume resources. Monitor usage in GCP.
Why BigQuery + Vertex AI? At scale, embedding and search directly in BigQuery lets you rely on Google’s infra for parallelism and resource management rather than operating your own vector database. In theory it should be cheaper, faster, and simpler to manage.
Where to go next #
GitHub source plugin docs: hub.cloudquery.io
BigQuery destination plugin docs: hub.cloudquery.io
Text embeddings: how-to, API reference
Vector search in BigQuery: overview
With a single config and a small script, you get high-quality semantic search over GitHub issues—backed by BigQuery and Vertex AI.
No clusters to size, no indices to tune, and no vector database to run. You keep governance and cost controls in BigQuery, while Vertex AI handles embeddings and answers.
Start with issues today, then join them with product analytics, support tickets, or docs you already have in BigQuery to power calmer triage, faster prioritization, and sharper roadmaps. If your team runs on Google Cloud, this is the most frictionless way to add RAG to your workflow: sync with CloudQuery, search with BigQuery, and answer with Gemini.
Ready to turn your backlog into insight? Install the CLI, run the sync, and start asking smarter questions.