LLM Code Review vs. Deterministic SAST Security Tools


Oct 2, 2025

How do the latest models stack up against traditional code scanners?


Many complaints about LLM-based tools today revolve around the results being non-deterministic. This is especially true in security, where consistency and reproducibility are paramount. We’ve spent the last few months building Fraim to explore how we can use AI to tangibly increase a team’s security posture.

Fraim is an open source toolkit for security teams that includes built-in AI workflows to help supercharge your security practices.

For both appsec and cloudsec, SAST scanners like Semgrep (appsec) and Checkov (cloudsec) are the de facto standard to catch issues before they get merged and deployed. Add the scanner to your CI pipeline, customize the rules if you’re feeling daring, and profit.

Sadly, the reality is not quite that clean. Many security policies and best practices are hard to encode as deterministic rules. It’s easy for a security engineer to “know it when they see it”, but not to describe precisely. You get stuck with either an overly broad rule that leads to false positives (teaching developers to ignore all of the findings) or a narrow rule that misses many violations.


If a security engineer can “know it when they see it”, why not an LLM? It’s been trained on the same set of best practices — passed down through books, blog posts, and forum discussions — as the human. Turns out, LLM evaluation is a great candidate for these subjective, arguably under-specified, policies.

Let’s explore some specific examples.

Cloud Compliance Controls

Imagine you are a cloud security engineer trying to enforce the security controls from CIS AWS Foundations or C5 frameworks across your Terraform IaC (infrastructure-as-code).

Each has a control stating that admin ports should not be exposed to the public internet. How would you implement a rule to enforce that?

You get Checkov’s rules CKV_AWS_24 and CKV_AWS_25 for free, but those only cover ports 22 (SSH) and 3389 (RDP). What about databases, Docker/Kubernetes control planes, and so on?

Like most engineers today, you could ask an LLM to write a better rule. We tried this by prompting GPT-5 with “Write a Checkov rule that disables admin ports on security groups. Make sure to include them all and account for edge cases.” It generated the following ports, IP addresses, and resources to filter (full rule here).

PUBLIC_V4 = "0.0.0.0/0"
PUBLIC_V6 = "::/0"

# A conservative, extensible set of sensitive/admin ports.
# (Covers remote admin, DBs, common admin consoles, k8s, docker, etc.)
SENSITIVE_PORTS: List[int] = sorted(set([
    22,            # SSH
    3389,          # RDP
    5985, 5986,    # WinRM
    2375, 2376,    # Docker daemon API
    10250, 10255,  # Kubelet
    6443,          # Kubernetes API server
    2379, 2380,    # etcd
    5432,          # PostgreSQL
    1433,          # MS SQL Server
    1521,          # Oracle
    3306,          # MySQL
    27017,         # MongoDB
    6379,          # Redis
    11211,         # Memcached
    9200, 9201,    # Elasticsearch
    5601,          # Kibana
    5900, 5901,    # VNC
    8080, 8081,    # Common admin UIs / proxies
    8443,          # TLS admin UI
    8888,          # Jupyter / dev consoles
    9000,          # SonarQube et al.
    9090,          # Prometheus
    4505, 4506,    # SaltStack
]))

supported_resources = ["aws_security_group", "aws_security_group_rule"]
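
For context, constants like these plug into a Checkov custom check roughly as follows. This is a minimal sketch rather than the generated rule itself (linked above); the check ID is made up, and the ingress parsing is simplified, since Checkov wraps parsed HCL values in single-element lists and real ingress blocks can take several shapes.

from typing import List

from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck

PUBLIC_V4 = "0.0.0.0/0"
PUBLIC_V6 = "::/0"
SENSITIVE_PORTS: List[int] = [22, 3389, 5432, 6379]  # abbreviated; see the full list above


class PublicAdminPortsCheck(BaseResourceCheck):
    def __init__(self) -> None:
        super().__init__(
            name="Ensure admin ports are not exposed to the public internet",
            id="CKV_CUSTOM_ADMIN_PORTS",  # hypothetical ID for this sketch
            categories=[CheckCategories.NETWORKING],
            supported_resources=["aws_security_group"],
        )

    def scan_resource_conf(self, conf) -> CheckResult:
        for rule in conf.get("ingress", []) or []:
            if not isinstance(rule, dict):
                continue
            cidrs = (rule.get("cidr_blocks") or [[]])[0] or []
            cidrs_v6 = (rule.get("ipv6_cidr_blocks") or [[]])[0] or []
            if PUBLIC_V4 not in cidrs and PUBLIC_V6 not in cidrs_v6:
                continue
            from_port = (rule.get("from_port") or [0])[0]
            to_port = (rule.get("to_port") or [0])[0]
            # Fail if any sensitive port falls inside the publicly exposed range.
            if any(from_port <= port <= to_port for port in SENSITIVE_PORTS):
                return CheckResult.FAILED
        return CheckResult.PASSED


check = PublicAdminPortsCheck()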

At first glance this rule appears to be more robust. Many more ports are covered, and it includes the obvious public IPv4 and IPv6 strings. However, someone intimately familiar with security might notice a few shortcomings.

  • Is this an exhaustive list of admin ports?
  • What if someone configures an app to use a non-default port?
  • What if an ingress rule allows 0.0.0.0/1 and/or 128.0.0.0/1, essentially the same as 0.0.0.0/0? (The CIS AWS control itself codifies this flaw.)
  • What happens when an engineer decides to use CloudFormation, not Terraform? CDK? Pulumi?
  • What happens when a project uses Google Cloud or Azure?

Since the rule only checks precisely what is stated, your static analysis could now happily approve a dangerous configuration. The rule author would need to envision and handle every possible case ahead of time.

A Better Way - AI Evaluation

Now let’s flip it: instead of writing a rule (or iteratively asking the LLM to “try extra hard to be exhaustive”), we ask AI to evaluate the code directly against the intent of the control.

“Disallow admin ports being open to the internet in security groups. Cover any edge cases.”

We’ll use the Fraim risk_flagger workflow, which uses an LLM to analyze a PR (aka git diff) for a customizable set of “risks”, passing our control as the custom risk. (Here is the full system prompt.)

fraim run risk_flagger <snip> \
  --custom-risk-list-json '{ "Disallow Public Admin Ports on Security Groups": "Disallow admin ports being open to the internet in security groups. Cover any edge cases." }'

Follow Along - Instructions

The code for these tests is published in branches of the fraim repo. To follow along, clone the repo and run the commands in the “Follow Along” blocks below.

Let’s see if this AI evaluation will solve the shortcomings from the original rule.

Is this an exhaustive list of admin ports?

No. Here’s a PR that opens port 5938, commonly used by TeamViewer, a remote management tool. While this software isn’t as common as something like Postgres, it represents a popular service with a bespoke admin port. Notably, one that wasn’t included in the generated rule.

Follow Along - Run This Example

You can follow along by running Fraim in its own repo:

fraim run risk_flagger <snip> \
  --base demos/admin-ports-detect-5938/before \
  --head demos/admin-ports-detect-5938/after \
  --custom-risk-list-json '{ "Disallow Public Admin Ports on Security Groups": "Disallow admin ports being open to the internet in security groups. Cover any edge cases." }'


But the AI detects this with no problem.

> fraim run risk_flagger ...
Security group exposes admin port 5938 (and all protocols) to the internet over IPv4 and IPv6.
Severity: high
Location: demos/admin-ports-detect-5938/main.tf:6

Using AI to evaluate this case takes advantage of the full potential of the LLM. It sees that port 5938 is open and knows that 5938 is usually used by TeamViewer, remote administration software.

Why didn’t it include that port in the original hardcoded rule? Think about a human doing this. Is it easier to generate a truly exhaustive list of ALL possible admin ports a priori? Or to simply take one port and decide whether it is possibly an admin port? The latter is much easier for humans and also for LLMs. The connection from 5938 to “admin port” is much stronger than the connection from “possible admin ports” to 5938.

What if someone configures an app to use a non-default port?

The advantages of analyzing the actual open port go a step further. The hardcoded rule detects default ports only. What if someone runs a service on a non-default port?

Let’s take a scenario where a Redis cluster is already running on custom port 7000, instead of the default 6379. Here’s a PR that opens port 7000 to the public internet.

Follow Along - Custom Port Detection Example

You can follow along by running Fraim in its own repo:

fraim run risk_flagger <snip> \
  --base demos/admin-ports-non-default-port/before \
  --head demos/admin-ports-non-default-port/after \
  --custom-risk-list-json '{"Disallow Public Admin Ports on Security Groups": "Disallow admin ports being open to the internet in security groups. Cover any edge cases and cover any non-default ports used in the codebase"}'

The initial context for the LLM contains only the diff exposing port 7000, nothing about the pre-existing Redis cluster behind it. Can the LLM use its tools to query the full code base and determine that the open port is mapped to the Redis cluster and thus is an admin port?

> fraim run risk_flagger ...
Publicly exposed admin port 7000 for Redis service.
Severity: critical
Location: `demos/admin-ports-non-default-port/main.tf:6`
Explanation:
* The code change introduces a new AWS Security Group that exposes port 7000 to the public internet (`0.0.0.0/0` and `::/0`).
* A search of the codebase reveals that port 7000 is used for a Redis service.
* Exposing a Redis database to the public internet is a serious security risk, as it can lead to unauthorized access, data exfiltration, or denial-of-service attacks.
* This is a non-standard port for Redis, but the risk is the same.

Yes, but in full transparency, we did have to coerce the prompt a bit to get this result to work correctly. The LLM did not want to call tools with the original prompt, so we changed it to be

Disallow admin ports being open to the internet in security groups. Cover any edge cases and cover any non-default ports used in the codebase.

which then works as expected. Luckily, Fraim makes it easy to customize your prompt based on feedback.

Even though we did need to slightly modify the prompt, that is much easier than trying to write a rule that would cover this use case (and you’d need to do it for EVERY resource that has a configurable admin port).

What if both 0.0.0.0/1 and 128.0.0.0/1 are opened, essentially the same as 0.0.0.0/0?

0.0.0.0/0 (or ::/0 for IPv6) is often used as an alias for the public internet. The CIS AWS Foundations Benchmark uses it. The corresponding built-in Checkov rules use it. And so did the AI-generated rule above.

It’s easy to (accidentally or maliciously) write a firewall rule that bypasses such a check. Instead of explicitly listing out 0.0.0.0/0, list out slightly smaller CIDRs that cover all (or just many) IP addresses. For example, 0.0.0.0/1 and 128.0.0.0/1. You’ve violated the intent of the policy, but not the letter codified in the scanner rule.
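
As a quick sanity check, Python’s standard ipaddress module confirms that the two halves together are exactly the public internet:

import ipaddress

halves = [ipaddress.ip_network("0.0.0.0/1"), ipaddress.ip_network("128.0.0.0/1")]
everything = ipaddress.ip_network("0.0.0.0/0")

# The two /1 blocks are disjoint subnets of 0.0.0.0/0 and together contain
# exactly as many addresses as 0.0.0.0/0 itself (2**32).
assert all(net.subnet_of(everything) for net in halves)
assert sum(net.num_addresses for net in halves) == everything.num_addresses
print("0.0.0.0/1 + 128.0.0.0/1 covers the same addresses as 0.0.0.0/0")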

Follow Along - CIDR Edge Case Example

You can follow along by running Fraim in its own repo:

fraim run risk_flagger <snip> \
  --base demos/admin-ports-cidr-edge-case/before \
  --head demos/admin-ports-cidr-edge-case/after \
  --custom-risk-list-json '{"Disallow Public Admin Ports on Security Groups": "Disallow admin ports being open to the internet in security groups. Cover any edge cases."}'

The following PR showcases a malicious attempt at this workaround. Is the LLM fooled by this sneaky change?

> fraim run risk_flagger ...
Security group opens PostgreSQL (5432) to the public internet using split CIDR blocks (0.0.0.0/1 + 128.0.0.0/1), an edge case equivalent to 0.0.0.0/0.
Severity: high
Location: `demos/admin-ports-cidr-edge-case/main.tf:6`
Explanation:
* The change adds an aws_security_group with an ingress rule allowing TCP port 5432 (PostgreSQL) from two CIDR blocks: 0.0.0.0/1 and 128.0.0.0/1.
* These two CIDRs together cover the entire IPv4 internet (equivalent to 0.0.0.0/0), which is an edge-case pattern used to bypass simple 0.0.0.0/0 checks.
* Opening a database administration/service port to the public internet violates the policy to disallow public admin ports on security groups.
* The security team should investigate and restrict access to trusted ranges or remove public exposure.

Nope. No more sneaky (or just misguided) developers trying to bypass standard procedures!

You could write a better static rule. Define the ranges considered public (i.e., not RFC1918, not the VPC CIDR, etc.) and then check whether the exposed CIDR overlaps those ranges. Almost no one does, though; it’s too much work.
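
For illustration, a sketch of that better rule might look something like the following, again using Python’s ipaddress module. The private-range list is illustrative; a real check would also include your VPC CIDRs and any other internal ranges.

import ipaddress

# Ranges treated as "not public" (illustrative, not exhaustive).
PRIVATE_RANGES = [ipaddress.ip_network(c) for c in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
    "127.0.0.0/8", "169.254.0.0/16",                  # loopback, link-local
)]

def exposes_public_addresses(cidr: str) -> bool:
    """True if the CIDR is not fully contained in a known-private range."""
    net = ipaddress.ip_network(cidr, strict=False)
    return not any(net.subnet_of(private) for private in PRIVATE_RANGES)

print(exposes_public_addresses("0.0.0.0/0"))    # True
print(exposes_public_addresses("0.0.0.0/1"))    # True - half the internet is still the internet
print(exposes_public_addresses("128.0.0.0/1"))  # True
print(exposes_public_addresses("10.1.2.0/24"))  # False - contained in 10.0.0.0/8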

The LLM naturally interprets the intent of the policy, so it doesn’t require any extra work to do the right thing.

Does this cover CloudFormation? CDK? Pulumi?

In any reasonably large organization, there’s often more than just one standardized IaC language being used. So what happens? Well, the rule needs to work for every language used within the org.

Say one team starts using CDK for a new project. Your scanner won’t cover that infra until you’ve updated every rule to work with CDK. Or imagine a company-wide initiative to migrate from Terraform to CDK. Again, you’d have to update all the rules, adding more time and risk to the migration. Having the AI evaluate the rules, however, is IaC-agnostic.

Follow Along - Pulumi Example

You can follow along by running Fraim in its own repo:

fraim run risk_flagger <snip> \
  --base demos/admin-ports-pulumi/before \
  --head demos/admin-ports-pulumi/after \
  --custom-risk-list-json '{"Disallow Public Admin Ports on Security Groups": "Disallow admin ports being open to the internet in security groups. Cover any edge cases."}'

This PR shows an example with the same risky configuration using Pulumi.

> fraim run risk_flagger ...
Security group ingress opens PostgreSQL admin port 5432 to the internet over both IPv4 and IPv6.
Severity: critical
Location: `demos/admin-ports-pulumi/main.ts:12`
Explanation:
* This change adds two aws.vpc.SecurityGroupIngressRule resources that allow inbound TCP traffic on port 5432 (PostgreSQL) from 0.0.0.0/0 (lines 12-18) and ::/0 (lines 19-25).
* Port 5432 is a database administration/service port and opening it to the entire internet violates the policy to disallow public admin ports on security groups.
* The security team should investigate and restrict access to specific trusted CIDRs or security groups, and remove the world-open IPv6 rule if not explicitly required.

Without any changes to the rules, the LLM caught the same issues in the Pulumi example that it did in the Terraform examples above. With the LLM evaluating, you can safely allow a team to try a new IaC language or do that company-wide migration, knowing that the same rules you already wrote will still be enforced.

What happens if we migrate to a different Cloud provider?

AI is cloud agnostic, so the rules will apply no matter which clouds you use (or which clouds you may migrate to in the future). We’re going to skip an example for this case, because we think you get the picture.

Principle of Least Privilege

Let’s look at another cloud control that is even more subjective and hard to codify: IAM policies should follow the principle of least privilege. This principle is often talked about, but difficult to enforce. A lot of context about a use case is required to understand the intent of the policy.

It’s easy to write a static rule for a naive interpretation of this policy. For example, the CIS AWS Foundations Benchmark #1.15 says “Ensure IAM policies that allow full "*:*" administrative privileges are not attached”. More specifically, the recommended audit looks for policies with "Effect": "Allow", "Action": "*", and "Resource": "*". Checkov implements this with rules CKV_AWS_1 and CKV_AWS_62.

These static rules boil down to the bare minimum of disallowing wildcard actions and resources. Easy to write, but too blunt to catch many types of over-permissioning. This is a classic case where an expert human reviewer would be able to spot discrepancies easily, but writing code that catches the same discrepancies is very difficult.
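
To make that concrete, the naive interpretation amounts to little more than the following check over a policy document. This is a rough sketch of what the CIS audit describes, not the actual implementation of CKV_AWS_1 or CKV_AWS_62:

import json

def allows_full_admin(policy_document: str) -> bool:
    """Flag only the blunt CIS 1.15 case: Effect "Allow", Action "*", Resource "*"."""
    statements = json.loads(policy_document).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if stmt.get("Effect") == "Allow" and "*" in actions and "*" in resources:
            return True
    return False

# Caught: literal full-admin. Missed: anything even slightly narrower, however broad.
print(allows_full_admin('{"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}'))     # True
print(allows_full_admin('{"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}'))  # False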

What if the reviewer could write these discrepancies out in plain English and simply have the LLM evaluate them? Below we look at three different examples of this:

Instance Profile vs Container Role

This is an issue we have struggled with ourselves. When using ECS, what is the difference between the instance profile role and the container role, and which permissions belong in each? In the past, we did the opposite of what any security team would want: we granted all the necessary permissions to BOTH roles.

Follow Along - Instance Profile vs Container Role

You can follow along by running Fraim in its own repo:

fraim run risk_flagger <snip> \
  --base demos/instance-profile-vs-container-role/before \
  --head demos/instance-profile-vs-container-role/after \
  --custom-risk-list-json '{"Overly Permissive IAM": "Understand the types of policies that should be in an instance profile vs a container role. And flag when a user is putting policies that should be intended for one into the other."}'

We tested to see if the LLM would catch a case like this (PR) with the prompt:

Understand the types of policies that should be in an instance profile vs a container role. And flag when a user is putting policies that should be intended for one into the other.

And the results:

> fraim run risk_flagger ...
Container role includes policies intended for an instance profile (EC2 Describe* and ECR pull actions).
Severity: medium
Location: `demos/instance-profile-vs-container-role/main.tf:1`
Explanation:
* The container/application role (aws_iam_role.test_role_container) is granted managed policies that include EC2 Describe* (policy_one) and ECR image pull permissions (policy_three).
* EC2 Describe* and ECR image pull permissions are typically required by the host/instance profile or task execution role, not by the application container role.
* Assigning these to the container role broadens its permissions beyond what the application should need and mixes responsibilities between container and instance/execution roles.
---
Instance profile role includes policies intended for a container/application role (broad S3 write on all buckets).
Severity: high
Location: `demos/instance-profile-vs-container-role/main.tf:7`
Explanation:
* The instance profile role (aws_iam_role.test_role_instance_profile) is granted a managed policy (policy_two) that includes "s3:Write*" on Resource "*".
* Broad S3 write access is typically application-specific and should be granted to the container/task role with least privilege to specific buckets, not to the instance profile.
* Placing this on the instance profile unnecessarily elevates the host's privileges and mixes responsibilities between instance and container roles.
---
Both roles are attached to the exact same set of managed policies, indicating role boundary violations.
Severity: medium
Location: `demos/instance-profile-vs-container-role/main.tf:1`
Explanation:
* The container role and the instance profile role are both attached to the same three managed policies.
* This suggests a conflation of responsibilities between an instance/execution role and an application/container role.
* Security should verify and realign policies so that: (1) host/instance/execution responsibilities (e.g., EC2 Describe*, ECR image pull) are on the instance/execution role only, and (2) application data access (e.g., S3 write) is scoped and placed on the container role only.

The results show how familiar the LLM is with the differences between the two types of roles. It was able to flag that the two roles were using the same policies, and also that the instance profile had permissions attached that only made sense in the context of the container role.

We were worried about false positives for this case, so we fixed the issues flagged by Fraim (commit) and reran the same prompt. To our relief, it returned no risks, as expected.

Unnecessary Actions

Sometimes, when looking at the role attached to a resource, it becomes clear that there’s no need for certain policies to be attached. For example, why would a role attached to an EC2 instance need access to create an SQS Queue? Sure, it might need to read or write to that queue, but creating a queue is unnecessary.

Follow Along - Unnecessary Actions

You can follow along by running Fraim in its own repo:

fraim run risk_flagger --model openai/gpt-5 --temperature 1 --diff \
  --base demos/iam-unnecessary-actions/before \
  --head demos/iam-unnecessary-actions/after \
  --custom-risk-list-action replace \
  --custom-risk-list-json '{"Overly Permissive Role": "Look at any role attached to a cloud resource, and flag it if there is no reason a policy in that role should be attached to that cloud resource. ie destructive permissions to other cloud resources."}'

This is yet another case where an expert human reviewer could easily detect something that a static rule might not. So we ran a test on this (PR) with the prompt:

Look at any role attached to a cloud resource, and flag it if there is no reason a policy in that role should be attached to that cloud resource. ie destructive permissions to other cloud resources.

And got the results:

> fraim run risk_flagger ...
EC2 instance attaches an IAM instance profile whose role includes destructive and broad permissions (e.g., SQS DeleteQueue on all resources).
Severity: high
Location: `demos/iam-unnecessary-actions/ec2.tf:5`
Explanation:
* 1) Trigger: A new aws_instance (app_instance) was created and configured with iam_instance_profile = aws_iam_instance_profile.test_role_instance_profile.name.
* The referenced role (aws_iam_role.test_role_instance_profile in main.tf) has managed_policy_arns including policy_three.
* 2) Relation to risk: policy_three grants SQS management actions, including sqs:CreateQueue and sqs:DeleteQueue, with Resource = "*".
* DeleteQueue is a destructive permission on another cloud resource and is broadly scoped.
* Attaching this to an EC2 instance constitutes an overly permissive role.
* 3) Security implications to investigate: Whether this EC2 instance legitimately requires creating/deleting SQS queues; if not, remove or scope down these actions.
* At minimum, restrict to specific queues (by ARN) and drop DeleteQueue if unnecessary.
* Validate that only required ECR read actions remain and that EC2 Describe* is acceptable.

This one actually took a bit of prompt engineering. We tried getting it to work without the example, but the prompt was too vague for the LLM to act on. Once we added the example of what to look for (“destructive permissions to other cloud resources”), that level of detail let it determine that the role should be flagged.

Again we see that with a small amount of prompt engineering, helping the LLM use additional context, we’re able to achieve reasoning beyond what a static rule could do. Just as you would need to train a security professional to know what to look for in certain cases, there is some amount of training you need to do to get appropriate results from the LLM. However, instead of sifting through the code manually, you can now point an LLM at your code and tell it to loop you in only when something needs to be changed. No more worrying about vulnerabilities sneaking in, and no need to write overly broad rules that everyone ends up ignoring anyway.

Overly Permissive Action Statements

Follow Along - Overly Permissive IAM

You can follow along by running Fraim in its own repo:

fraim run risk_flagger <snip> \
  --base demos/overly-permissive-iam-actions/before \
  --head demos/overly-permissive-iam-actions/after \
  --custom-risk-list-json '{"Overly Permissive IAM": "Disallow IAM roles from having a wildcard statement with Write or Create actions."}'

Let’s start with a fairly straightforward example from this (PR). Here we’re creating a policy that allows all “write” actions on S3 buckets. Checkov has a rule, CKV_AWS_111, to “Ensure IAM policies does (sic) not allow write access without constraints”. This rule calls another tool, Cloudsplaining, to find unconstrained cloud actions, which in turn uses a helper function from Policy Sentry to filter the actions.

Our PR is sneaky though. It uses a wildcard s3:Pu* action instead of s3:Put*.

Let’s see how the LLM does with the following policy.

“Disallow IAM roles from having a wildcard statement with Write or Create actions.”

> fraim run risk_flagger ...
IAM policy grants wildcard Write permissions (s3:Pu*) on all S3 resources and is attached to a role.
Severity: high
Location: `demos/overly-permissive-iam-actions/main.tf:29`
Explanation:
* The change introduces aws_iam_policy.policy_two with an Action of "s3:Pu*" and Resource "*" (lines 29-35), which is attached to aws_iam_role.example via managed_policy_arns.
* The wildcard "Pu*" matches S3 Put* actions (write/create operations), granting broad write permissions across all S3 resources.
* This directly falls under the risk of Overly Permissive IAM where wildcard statements grant Write/Create actions.
* Security should investigate and require least-privilege: replace the wildcard with explicit required actions and scope Resource to specific buckets/ARNs, potentially adding conditions.

Again, the LLM catches on.

To be fair, Checkov also catches this case, but only thanks to a decent amount of engineering effort that most security teams simply cannot afford to replicate, especially if they want to customize the rule further.
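
That effort is essentially wildcard expansion: rather than string-matching “s3:Put*”, expand the declared action against the real list of S3 actions and classify what comes out. Here is a rough sketch of the idea, using a hypothetical, heavily abbreviated action list (tools like Cloudsplaining and Policy Sentry work from the full AWS action database):

from fnmatch import fnmatch

# Hypothetical, heavily abbreviated action list for illustration only.
S3_ACTIONS = [
    "s3:PutObject", "s3:PutBucketPolicy", "s3:PutBucketAcl",
    "s3:GetObject", "s3:ListBucket", "s3:CreateBucket", "s3:DeleteObject",
]
WRITE_PREFIXES = ("s3:Put", "s3:Create", "s3:Delete")

def expand(pattern: str) -> list:
    """Expand a wildcard action like "s3:Pu*" against the known action list."""
    return [action for action in S3_ACTIONS if fnmatch(action, pattern)]

def unconstrained_write(pattern: str, resource: str) -> bool:
    """True when a wildcard action resolves to write/create/delete actions on Resource "*"."""
    return resource == "*" and any(a.startswith(WRITE_PREFIXES) for a in expand(pattern))

print(unconstrained_write("s3:Pu*", "*"))   # True - the sneaky pattern still expands to Put* actions
print(unconstrained_write("s3:Put*", "*"))  # True
print(unconstrained_write("s3:Get*", "*"))  # False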

With natural language prompts, this extends easily to other conditional constraints. As an example, you may want to disallow a wildcard Resource when a Write action is attached. Or you may want to disallow a specific suffix (e.g., -prod) from having certain actions applied. You can explain these conditions simply, and the LLM will consider the underlying intent when applying them. You don’t need to explicitly cover every edge case or variant.

Static rules definitely have their place, but denouncing AI altogether because it’s non-deterministic is living in the past. The new flagship models are good, and they can produce consistent results given the right inputs and prompts. We can then use the advantages of both AI and static rules to keep our environments safer than ever.

Deterministic rules are great for common cases that are easy to codify. “Port 3389 (RDP) is not exposed to 0.0.0.0/0” or “IAM policy does not use a wildcard * resource”. These will catch the most egregious violations of the intended policy. But they leave open a long tail of ways to accidentally violate that policy. AI-based scanning won’t catch all of those all of the time, but will catch some of them enough of the time to be worth adding to your SAST toolkit.

If you’d like to use the power of AI without building it yourself (or paying for yet another tool), use Fraim! We are constantly looking for new ways to bring AI to security teams with as little effort as possible.

All of the prompts you see above (and more) are built into Fraim’s risk_flagger workflow. And we even have easy-to-use GitHub and Slack integrations so you can get notified of risks wherever it’s convenient for you.
