AI Frontend Generator tools are surging in popularity and adoption across the industry, with new tools seemingly released and announced every week. Every major WYSIWYG platform provider now appears to be building and launching its own solution in this space.
With the recent surge in attention around Claude Code and similar AI-powered development tools, now is the perfect time to examine these platforms in depth and understand how they truly compare.
I have taken the time to thoroughly examine and test several of these tools, and I want to share my detailed insights and findings with you.
Hypothesis
I am working under the assumptions outlined in my post The AI Future of Frontend Development. I want to understand to what extent the frontend development domain becomes obsolete over time.
My hypothesis is that everything from design and building a design system to production frontend code could be done using AI in the future, obviously guided by experts.
As more developers explore the potential of AI and incorporate it into their workflows, we will undoubtedly see a range of new innovations that will continue to shape the industry.
Well, here we are.
With more tools emerging that can produce user interfaces that look good, perform well, and are accessible, it becomes more likely that designers, business people, and engineers will reach for these tools directly instead of asking another engineer to build interfaces for them.
This represents a fundamental shift in software development—what Andrej Karpathy recently termed "vibe coding". Instead of manually writing code, developers describe their project to an AI, which generates code based on the prompt. The developer then evaluates results and requests improvements without manually reviewing the code. This approach emphasizes iterative experimentation over traditional code structure, allowing amateur programmers to produce software without extensive training, though it raises questions about understanding and accountability.
This experiment focuses purely on the web. I am using a greenfield project and am not looking at complex long-term projects or maintainability. This is a test with one person, so collaborative features are out of scope.
Structured approach
For this experiment, I used a structured approach. I generated a list of features for a greenfield project that I wanted the tools to build.
I then fed each tool the same prompt along with the feature list.
I generated this feature list using Google's Gemini AI, asking it to create a markdown file that works well with LLMs.
Here is the feature list: speakit-feature-list.md
Here is the prompt I used:
Task: Build the Minimum Viable Product (MVP) for the application named Speakit. The complete and definitive feature set, constraints, and scope are provided in the attached document, "Speakit MVP Feature List".
Final Output: Generate the complete, runnable App that fulfills these requirements.
I intentionally did not assign a role to the LLM—I believe the tools should handle this themselves.
After the initial generation, I followed up and asked each tool to verify that all features were implemented.
Task: Go through the list of features one more time thoroughly to verify that the app is feature complete. Implement what is missing.
I then compared the code using the following criteria:
Objective Metrics
- Performance: Does the application perform well by modern standards on mobile and desktop? Measure: Lighthouse scores, run on my local machine (see the sketch after this list)
- Code Quality: Maintainability, readability, and adherence to best practices. Measure using: Maintainability Index, code complexity analysis (cyclomatic complexity), Lines of Code, and manual code review
- Export/Portability: Can you extract clean code and deploy elsewhere? Measure: Ability to run exported code without modifications, dependency management
- Cost: Pricing models and value for money. Measure: Cost per project, feature limits at each tier
- Tech Stack: What frameworks/libraries does it use? Are they modern and well-supported? Measure: Framework versions, community size, maintenance status
- Version Control Integration: Git support and collaboration features. Measure: Native Git support, commit history, branching capabilities
- Accessibility: Was accessibility implemented by the tool? Measure: Lighthouse scores, automated accessibility testing tools
- Error Handling: How well does generated code handle edge cases and errors? Measure: Test coverage, error boundary implementation, validation logic
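To keep the Lighthouse-based numbers reproducible, here is a minimal sketch of how the performance and accessibility scores can be collected programmatically against a locally served build. It uses Lighthouse's Node API together with chrome-launcher; the URL is a placeholder for wherever the generated app happens to run, and the exact option shape may differ between Lighthouse versions.

```ts
// audit.ts - collect Lighthouse performance and accessibility scores for a local build.
// Assumes `npm install lighthouse chrome-launcher` and the generated app running at APP_URL.
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const APP_URL = 'http://localhost:5173'; // placeholder: adjust to the tool's dev/preview server

const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
const result = await lighthouse(APP_URL, {
  port: chrome.port,
  output: 'json',
  onlyCategories: ['performance', 'accessibility'],
  // Lighthouse emulates mobile by default; the CLI's --preset=desktop covers the desktop run.
});

if (result) {
  const { performance, accessibility } = result.lhr.categories;
  console.log(`Performance:   ${Math.round((performance.score ?? 0) * 100)}`);
  console.log(`Accessibility: ${Math.round((accessibility.score ?? 0) * 100)}`);
}

await chrome.kill();
```

Running this once per tool against the same local setup keeps the scores comparable across generators.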
Subjective Metrics
- Developer Experience: Ease of iteration, debugging capabilities, and learning curve. Measure: User surveys, time to first successful deployment, qualitative assessment
- Iteration Speed: How quickly can you make changes and see results? Measure: Time from prompt to preview, number of iterations needed
Tooling Landscape
Here are the tools I looked at:
- Lovable
- Replit
- Vercel v0
- base44
Additionally, I used two editors to generate the app:
- Cursor Editor
- GitHub Copilot through VS Code
Plus, mainly due to the recent hype: Claude Code.
Obviously there are more tools. I left out Locofy.ai, Anima App, and Bolt purely due to the effort involved, and others like Framer, Stitch, UIzard, or Canva because they are more design-leaning.
Comparison
Feature Completeness
Insights
Lovable
Easy to use—backend integration is a huge plus. The app worked only partially but looked decent. The core functionality didn't work, which is a major drawback. With iterations, it seems possible to make the app fully functional.
Replit
The "Design First" approach is smart—it lets you iterate on design without full code integration in the background.
I really liked the overall design of the application. It was clean and functional. The errors seem easily solvable, except perhaps the Firebase backend. Here, built-in functionality could be beneficial.
Overall, creating the application took a very long time, especially compared to the other tools.
Vercel’s v0
v0 was thorough in checking that functionality exists and works, though it missed some integration pieces—specifically Summary and Login. These issues could likely be fixed by explicitly pointing them out again, but that shouldn't be necessary for a tool like this. v0 integrates extremely well with the Vercel platform. Not using Firebase might have made it easier for the tool to leverage Supabase, which integrates seamlessly with Vercel.
base44
I added this tool because I see frequent ads for it on YouTube. That was a mistake.
As an engineer, this tool is a nightmare—there's no code access at all. While it has some functionality, it missed critical features like Guest Mode and Bookmarks. The inability to view or publish the generated code makes it a non-starter for me. This also prevented me from running code quality metrics.
The app itself was unusable since content didn't play reliably. Some functionality felt over-engineered. However, it was the only tool that could play some PDFs—though inconsistently. The pricing is steep. I don't recommend it.
Cursor
I have used Cursor extensively for creating applications on my own, well beyond what I tested here—I'm speaking from long-term use.
This editor is developing its own features remarkably fast. What I really love is its planning mode, where you can first iterate on the architecture of an implementation, refine it, and then—once it's ready—either build it yourself or let the tool handle it. For me, this feels like the architectural discussions I would have with a team or co-worker, except you're doing it with an AI that knows your code.
For the given scenario, the page presentation was well executed and looked nice. However, some features weren't implemented well, which caused failures. For an AI that runs in the editor, it did surprisingly well.
GitHub Copilot
This is the result I expected from Cursor as well, to be honest. Copilot seems to excel at other tasks. The landing page was somewhat of a disaster. The functionality didn't come close to meeting the requirements, and the tool wasn't straightforward in telling me what to do with the code.
This tool is built for a different job—it's more of an AI pair-programmer.
Claude Code
It's unfortunate you can't test the tool without a subscription. I used the API connection and loaded the account with €5 plus taxes. Generating this app cost me $3.34—a significant amount. I'd definitely recommend the subscription instead.
When testing the app, I was surprised by how accurately the features were executed—except for the failing signup/login. But this seems to be a common issue across most tools.
Though it wasn't fast, this tool comes close to what I'd expect from an agentic code generator.
Limitations
This evaluation has several important limitations that should be considered when interpreting the results:
Testing Timeframe Constraints: This review represents a snapshot in time during October 2025. The testing was conducted over a limited period, which means that some tool features or capabilities may have been missed or not fully explored. Each tool received the same initial prompt and one follow-up iteration, which may not fully represent their capabilities in extended development scenarios, e.g. see Plan mode in Cursor.
Evolving Nature of AI Tools: The AI frontend generation space is rapidly evolving, with frequent updates, new features, and model improvements being released weekly. By the time you read this, some of the tools may have significantly improved or changed their capabilities.
Subjective Nature of Some Assessments: Several metrics in this evaluation, particularly in the "Subjective Metrics" section, rely on personal experience and judgment.
Conclusion
Interestingly, all tools defaulted to the same tech stack—even though it wasn't specified. A common tech stack for web apps appears to be emerging. What does this mean for the engineering community?
Here are all the patterns I identified:
Patterns
- All tools defaulted to the same tech stack (React, Vite, Tailwind CSS, Firebase) without being explicitly asked, suggesting an emerging standard for AI-generated web apps
- Authentication and login functionality was a consistent weak point across multiple tools, indicating this remains a challenging area for AI code generation (a minimal Firebase auth sketch follows after this list)
- Tools fell into distinct categories: productivity enhancers (Copilot, Claude Code), refactoring specialists (Cursor), and prototype generators (v0, Lovable) → see Recommendations below
- Editor-based tools (Cursor, Claude Code) provided more control and transparency but required more technical knowledge, while standalone generators (v0, Lovable) were faster but less flexible
- Cost models varied dramatically, from subscription-based to pay-per-generation, with significant implications for different use cases; that said, $20-25 per month seems to be a good middle ground
- Code quality and maintainability varied widely, with some tools producing production-ready code while others generated prototypes requiring significant refactoring
- Most tools struggled with complex feature integration and edge cases, requiring multiple iterations or manual intervention
- Visual design quality was surprisingly high across most tools, but functional completeness often lagged behind aesthetic presentation
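Since authentication was the most common failure point, here is a rough sketch of the email/password wiring a React + Vite + Firebase app of this kind typically needs. It is not taken from any of the generated codebases; the env variable names and the signIn helper are hypothetical placeholders, but missing config values and missing error handling are plausible reasons why logins break out of the box.

```ts
// firebase-auth.ts - minimal email/password sign-in for the React + Vite + Firebase stack.
// Placeholder setup: the VITE_* variables must exist in a local .env file.
import { initializeApp } from 'firebase/app';
import { getAuth, signInWithEmailAndPassword } from 'firebase/auth';

const app = initializeApp({
  apiKey: import.meta.env.VITE_FIREBASE_API_KEY,
  authDomain: import.meta.env.VITE_FIREBASE_AUTH_DOMAIN,
  projectId: import.meta.env.VITE_FIREBASE_PROJECT_ID,
});

const auth = getAuth(app);

// Hypothetical helper: returns the signed-in user or surfaces a readable error to the UI layer.
export async function signIn(email: string, password: string) {
  try {
    const credential = await signInWithEmailAndPassword(auth, email, password);
    return credential.user;
  } catch (err) {
    console.error('Sign-in failed:', err);
    throw new Error('Could not sign in. Please check your credentials and try again.');
  }
}
```

Nothing here is exotic, which is exactly why it is notable that several tools still shipped apps where this flow did not work on the first try.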
Recommendations
What is your primary use case?
For boosting day-to-day productivity: If you need completion, chat, and suggestions to enhance your daily workflow → choose Copilot or Claude Code.
For transforming how you build: If you want to refactor legacy code, automate feature branches, generate test suites, or handle cross-module design → choose Cursor.
For building from scratch or prototyping: If you're starting a new app, launching a business, need inspiration, want to modify an existing codebase without deep knowledge, or need rapid prototyping → choose v0.
Wrapup
So, this is my experience with AI Frontend Generators. What is yours? Did I miss any tools? Did one tool work exceptionally well for you? Let me know on LinkedIn.