Hi HN, we’re Anant and Atishay, the co-founders of Zingle, an AI code reviewer for data teams.
It automatically checks SQL, dbt, Airflow, and Spark code changes in GitHub PRs for cost regressions, logic issues, data-quality gaps, and downstream breakages before they merge into production.
Here's a demo - https://youtu.be/dS0NnBjG2p4
You can try it free on your first 100 PRs at: https://getzingle.com
We built this after managing 60+ dbt PRs per week for an enterprise client. Senior data engineers had very limited time to review PRs, and with AI-assisted coding the amount of code being written each day grew a lot. This left teams choosing between two costly outcomes: let PRs through with minimal review and risk warehouse cost spikes or broken pipelines, or slow everything down with long review cycles.
Either way the cost was real, in dollars or in lost engineering time. While rushing to keep up with the volume, we ourselves shipped a PR that triggered repeated full refreshes on a large model, and it turned into a $50k Snowflake bill.
We realized that AI code reviewers exist for software engineers, but nothing existed for data teams, whose PRs carry very different risks.
A SQL or dbt change is not just about correctness. You have to understand billing behavior, table sizes, lineage, cardinality, governance rules, and how the change interacts with real data. A SQL diff can look fine in code review but become wrong or expensive when it runs at scale.
What Zingle does on every PR:
* Predicts how the change will affect warehouse cost
* Detects full refreshes, missing predicates, exploding joins, and row-growth risks
* Runs new SQL in a safe sandbox and analyzes real data diffs
* Traces lineage to see which dashboards or models break downstream and notifies owners
* Flags missing data-quality checks (nulls, uniqueness, business tests) and redundant tests
* Enforces governance rules (PII rules, documentation, ownership, merge-key requirements)
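As a concrete (hypothetical) illustration of the "full refresh / missing predicate" class of issue above: an incremental dbt model that omits its `is_incremental()` filter looks harmless in a diff, but silently reprocesses the entire source table on every run. Model and column names here are made up for the sketch.

```sql
-- Hypothetical dbt model. The diff reads fine, but dropping the
-- is_incremental() block below turns every scheduled run into a
-- full-table scan of stg_orders.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    amount,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Only process rows newer than what's already in the target table.
  -- Without this predicate, the model reprocesses all history each run.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```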
Nothing leaves the customer's warehouse. We do not store SQL, data, metadata, or queries.
What Zingle has caught so far:
* A repeated full refresh that would have cost tens of thousands
* Duplicate rows introduced in a fact table that would distort revenue
* Missing filters that would have doubled table sizes and slowed pipelines
* A column rename that would have broken 14 downstream dashboards
* Exploding joins from low-cardinality dimensions
* Undocumented models feeding finance metrics
* Incremental models missing merge-key dedupe logic
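The last item deserves a sketch: an incremental model without merge-key dedupe lets late-arriving updates insert duplicate rows, exactly the fact-table duplication that distorts revenue. One common fix is a `row_number()` dedupe keyed on the merge key (names below are hypothetical).

```sql
-- Hypothetical incremental model with merge-key dedupe. unique_key tells
-- dbt to merge on event_id; the row_number() step keeps only the latest
-- version of each event, so reprocessed batches can't duplicate rows.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    loaded_at
from (
    select
        event_id,
        user_id,
        event_type,
        loaded_at,
        row_number() over (
            partition by event_id       -- the merge key
            order by loaded_at desc     -- latest record wins
        ) as rn
    from {{ ref('stg_events') }}
)
where rn = 1
```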
Across our user base, Zingle has already saved more than $2M in avoided warehouse costs and broken pipelines.
Impact users have reported:
* 37% drop in warehouse cost
* 75% fewer data incidents
* SQL correctness confidence: 65% → 95%
* Model test coverage: 45% → 90%
* Governance coverage: 50% → 95%
* Review cycle time: 4 days → 1.5 days
* Mean time to resolve: 10h → 3h
Who we are: We’re Anant (PhD in AI from UIUC; published multiple AI papers) and Atishay (ex-Lead Data Engineer at Goldman Sachs; 8 years in data engineering, previously worked on text-to-SQL). We did our undergrad together.
We believe most data teams think they have strong best practices, but in reality the entire discipline - governance, testing, lineage, observability - is still evolving. The learning curve is costly: bad reviews waste senior engineers’ time, and missed issues cost teams money.
You can try Zingle here: https://getzingle.com
We’d love feedback - especially around false positives, rules you think should exist, and cases where Zingle should alert but doesn’t.