
Introduction
The landscape of AI-powered coding assistants has evolved rapidly in 2025, moving beyond simple code completion to fully agentic development experiences. After the GitHub Copilot coding agent reached general availability and the OpenAI GPT-5 Codex integration was announced, I decided to conduct a comprehensive comparison of the leading AI coding tools.
This hands-on evaluation examines six major players in the agentic AI coding space:
- GitHub Copilot – The pioneer with new agentic capabilities
- Claude Code – Anthropic’s terminal-based AI coding agent
- Cursor – The popular AI-first IDE
- Windsurf – The rising challenger with auto-run capabilities
- Kiro – AWS’s spec-driven development IDE
- Gemini CLI – Google’s open-source terminal AI agent
My testing methodology prioritized minimal intervention, allowing each agent to handle implementation autonomously. I used Exercism Rust challenges as a consistent benchmark across all platforms, plus a React-based weird animals quiz app for deeper comparison between Kiro and GitHub Copilot.
GitHub Copilot
The Experience
GitHub Copilot impressed with its proactive approach to gathering context. When implementing Exercism tasks, it recommended adding detailed instructions to improve code quality – a thoughtful touch that shows maturity in the product.
Key Implementations:
- Issue #1: Basic implementation
- Pull Request #2: Context recommendations
- Issue #3 & PR #4: Complex implementations
- Issue #5: Detailed specifications
- Issue #7: Minimal specification test
All solutions demonstrated high-quality code generation with appropriate Rust idioms.
Pragmatic Problem-Solving in React Development
When faced with a non-functional React application generated through Kiro’s spec-driven approach, GitHub Copilot proved its value through rapid iteration and real-time feedback. Working with Copilot allowed for immediate course-correction when issues arose, rather than discovering problems only after completing an entire task sequence.
The key advantage: Copilot’s interactive nature allows you to test frequently and adapt implementation based on actual runtime behavior, making it particularly effective for rapid prototyping and iterative development.
Configuration & Context
GitHub Copilot offers organization-wide instructions that apply across all repositories. The main template allows you to specify:
- Prefer writing <language> if no language is specified.
- Use <package manager> for <language> dependencies
- Prioritize <knowledge base> when asking about <topic>.
- Respond with <bullet points/minimal preamble>.

See more at Copilot Coding Agent Documentation
Claude Code
The Experience
Claude Code operates through the terminal, offering a developer-friendly approach to AI-assisted coding. While you can define configuration files, the tool doesn’t proactively suggest setting them up during initial use.
Notable Behaviors:
- Requires CLI restart or explicit reload when configuration files are modified
- Offers auto-execution for subsequent command runs (a significant time-saver)
- Provides clear options for command automation preferences

Command Automation
The tool lets you decide, command by command, how much it may run on its own. When running tests, it offers three options:
- Yes – Run once
- Yes, and don’t ask again for cargo test commands in this project
- No, and tell Claude what to do differently

This granular control over automation strikes a good balance between safety and efficiency.
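To make the three options concrete, here is a minimal Python sketch of how a per-project "don't ask again" list could behave. It is not how Claude Code is implemented; `ApprovalStore`, `Decision`, and the rule that a remembered command also covers the same command with extra arguments are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    YES_ONCE = "yes"
    YES_ALWAYS = "yes, and don't ask again"
    NO = "no, and tell the agent what to do differently"


@dataclass
class ApprovalStore:
    """Hypothetical per-project memory of commands approved with 'don't ask again'."""

    always_allowed: set = field(default_factory=set)

    def needs_prompt(self, command: str) -> bool:
        # Assumption: a remembered entry such as "cargo test" also covers
        # the same command followed by extra arguments.
        return not any(
            command == allowed or command.startswith(allowed + " ")
            for allowed in self.always_allowed
        )

    def record(self, command_prefix: str, decision: Decision) -> bool:
        """Store the decision and return whether the command may run now."""
        if decision is Decision.YES_ALWAYS:
            self.always_allowed.add(command_prefix)
        return decision is not Decision.NO


store = ApprovalStore()
first_prompt = store.needs_prompt("cargo test")              # nothing remembered yet
may_run = store.record("cargo test", Decision.YES_ALWAYS)    # user picks option two
second_prompt = store.needs_prompt("cargo test --release")   # covered by the remembered entry
other_prompt = store.needs_prompt("cargo publish")           # a different command still prompts
```

The point of the sketch is the asymmetry: approving once changes nothing for the next run, while "don't ask again" only widens trust for that one command family.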
See more at Claude Code Documentation
Cursor
The Experience
Cursor delivered a smooth, frictionless experience with minimal setup required. Simply input your prompt, and the AI handles the implementation effectively.
Key Observations:
- No automatic configuration file proposals
- No discovered option to whitelist safe commands for auto-execution
- Supports cursor-specific configurations, but doesn’t guide users to set them up
- Requires manual confirmation for every command execution
- Update: with Cursor 1.7.28, I was able to configure command allow lists
The IDE’s VS Code heritage shows in its familiar interface, making adoption easy for existing VS Code users. However, in my initial test, the lack of command automation options meant more manual intervention than with some competitors.

See more at Cursor IDE Documentation
Windsurf
The Experience
Windsurf stood out with its flexible automation controls and the Grok coding model. The IDE didn’t propose configuration or context files automatically, but its implementation capabilities impressed.
Code Style Adaptability:
- Initial implementation used iterative coding style
- Successfully adapted to functional programming style upon request
- Demonstrated strong understanding of different paradigms
Automation Features
Windsurf’s command execution modes offer excellent flexibility:
- Off – Manual approval required
- Auto – Smart automation based on context
- Turbo – Maximum automation for trusted operations

This granular control over automation levels addresses a common pain point in AI-assisted development.
See more at Windsurf IDE Documentation
Kiro
The Spec-Driven Revolution
Kiro is an agentic IDE that promises to bridge the gap between prototype and production through spec-driven development, agent hooks, and natural language coding assistance. What sets it apart from other AI coding assistants is its approach to structured development workflows that prioritize planning over improvisation.
Two Development Paradigms
Kiro offers two distinct development approaches:
- Vibe Coding: Traditional chat-based interaction where you collaborate directly with the AI agent to build software iteratively
- Spec-First Development: A structured approach where you define requirements and design in markdown files, then let the agent implement based on these specifications
The Three-Layer Spec Framework
Kiro’s spec-driven approach follows a hierarchical structure:
- Requirements with EARS Syntax: Kiro adopts the Easy Approach to Requirements Syntax, which brings clarity, testability, traceability, and completeness to requirements
- Technical Design Blueprint: The design.md file serves as your system’s technical blueprint, documenting architecture, component interactions, and sequence diagrams
- Implementation Tasks: The tasks layer becomes your implementation roadmap that the Kiro agent uses to generate actual application code
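To show what the requirements layer looks like in practice, here is a small Python sketch that classifies a sentence against the five basic EARS patterns. The regular expressions are a deliberate simplification, `classify` is a hypothetical helper, and the quiz requirements are examples I made up for illustration rather than output from Kiro.

```python
import re
from typing import Optional

# The five basic EARS patterns. The regexes only check the sentence
# skeleton; they are an illustration, not a full EARS parser.
EARS_PATTERNS = {
    "event-driven": re.compile(r"^When .+, the .+ shall .+", re.IGNORECASE),
    "state-driven": re.compile(r"^While .+, the .+ shall .+", re.IGNORECASE),
    "unwanted behaviour": re.compile(r"^If .+, then the .+ shall .+", re.IGNORECASE),
    "optional feature": re.compile(r"^Where .+, the .+ shall .+", re.IGNORECASE),
    "ubiquitous": re.compile(r"^The .+ shall .+", re.IGNORECASE),
}


def classify(requirement: str) -> Optional[str]:
    """Return the name of the first matching EARS pattern, or None."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS.items():
        if pattern.match(text):
            return name
    return None


# Made-up requirements for a quiz app, for illustration only.
examples = [
    "When the player selects an answer, the quiz shall show whether it was correct.",
    "The quiz shall present one question at a time.",
    "Make the quiz look nice.",
]
labels = [classify(e) for e in examples]
```

The third example is the kind of sentence EARS is meant to rule out: it names no trigger, no system, and no testable response, so it matches none of the patterns.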
Real-World Testing: The Weird Animals Quiz Project
For a React-based quiz application featuring unusual animals, Kiro’s structured approach generated comprehensive documentation including requirements specification with clear user stories and acceptance criteria, design documentation with UI definitions and component architecture, and implementation tasks with granular, step-by-step development plans.
However, the execution fell short of the promise. Despite methodically completing each generated task, the final React application wouldn’t start. The disconnect between Kiro’s structured planning and practical implementation became apparent: having great specs doesn’t guarantee working code.
Security-First Agent Control
One of Kiro’s standout features is its approach to AI security through granular command control. The platform offers two execution modes:
- Autopilot Mode: The agent makes autonomous decisions and executes commands without supervision
- Supervised Mode: You maintain control over agent actions, with the ability to trust specific commands at different granularity levels
Kiro’s trust system operates at three levels:
- Base Trust: All flags and subcommands of a specific terminal command (e.g., npm *)
- Partial Trust: All flags and follow-up commands of a specific terminal subcommand (e.g., npm run *)
- Full Trust: The exact terminal command with all subcommands and flags
These trust settings can be configured at both user and workspace levels, providing flexibility for individual preferences and team standards.
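The three levels can be read as three kinds of pattern: a command followed by a wildcard, a subcommand followed by a wildcard, and an exact command. The Python sketch below illustrates that reading. It is not Kiro's actual matcher; `is_trusted` and the token-based comparison are my assumptions.

```python
def is_trusted(command: str, trusted_patterns: list) -> bool:
    """Check a command against trust patterns such as "npm *" or "npm run build".

    Assumption: a trailing "*" trusts everything after the given prefix,
    and a pattern without "*" must match the command exactly.
    Caveat: this naive version does not look for shell operators such as
    "&&" or ";", which a real tool would have to handle.
    """
    tokens = command.split()
    for pattern in trusted_patterns:
        parts = pattern.split()
        if parts and parts[-1] == "*":
            prefix = parts[:-1]
            if tokens[: len(prefix)] == prefix:
                return True
        elif tokens == parts:
            return True
    return False


base = is_trusted("npm install lodash", ["npm *"])                   # base trust
partial = is_trusted("npm run build", ["npm run *"])                 # partial trust
outside_partial = is_trusted("npm install lodash", ["npm run *"])    # not covered
exact = is_trusted("npm run build", ["npm run build"])               # full trust
exact_extra_flag = is_trusted("npm run build --watch", ["npm run build"])
```

Under this reading, user-level and workspace-level settings could simply be two pattern lists that are concatenated before the check.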
Current Limitations:
- Sequential Task Execution: Only one task can run at a time, requiring careful task ordering. When a task gets stuck, the entire workflow halts
- Token Limitations: Complex projects quickly hit token limits, even when switching between different language models
- Development Velocity: Compared to rapid prototyping approaches, the spec-driven methodology requires significantly more upfront time investment
Lessons Learned
One key learning for future spec-driven approaches is to create task templates that contain health checks. For example:
- Implement feature according to specification
- Run automated tests
- Start application locally
- Perform smoke tests on implemented functionality
- Verify integration with existing components
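As a rough illustration of such a template, the Python sketch below runs the five steps in order and stops at the first failed check, so a broken app is caught right after the task that broke it. `Step` and `run_task` are hypothetical names, and the lambdas are stand-ins: in a real project each check would call the test runner or start the dev server.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    check: Callable[[], bool]  # returns True when the step succeeded


def run_task(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first failing check."""
    completed: List[str] = []
    for step in steps:
        if not step.check():
            return False, completed
        completed.append(step.name)
    return True, completed


# Stub checks for illustration; the third one simulates an app that won't start.
steps = [
    Step("implement feature according to specification", lambda: True),
    Step("run automated tests", lambda: True),
    Step("start application locally", lambda: False),
    Step("perform smoke tests", lambda: True),
    Step("verify integration", lambda: True),
]
ok, done = run_task(steps)
```

With the stubs above, the run stops at the third step, which is the signal I would have wanted after the first generated task rather than after the last one.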
Steering and Conventions
Kiro includes a steering feature that allows you to define organizational conventions and coding standards. This could be particularly valuable for teams wanting to maintain consistency across AI-generated code.
Gemini CLI
The Terminal-First Approach
Gemini CLI is an open-source AI agent that brings the power of Gemini directly into your terminal, providing lightweight access with the most direct path from prompt to model.
Built-in Capabilities
Gemini CLI includes built-in tools for Google Search grounding, file operations, shell commands, and web fetching. The integration with MCP (Model Context Protocol) allows for custom tool extensions.
Unique Features
ReAct Loop Implementation: The CLI uses a reason and act (ReAct) loop with built-in tools and MCP servers to complete complex use cases like fixing bugs and creating new features.
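For readers new to the pattern, a ReAct loop alternates between asking the model for the next action and feeding the tool result back in. The Python sketch below shows only that control flow. It does not reflect Gemini CLI internals: the tools, the scripted stand-in for the model, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would expose file, shell, and search tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"contents of {path}",
    "run_tests": lambda _arg: "1 failed",
}


def scripted_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: picks (action, argument) from a fixed script."""
    script = [
        ("read_file", "src/lib.rs"),
        ("run_tests", ""),
        ("finish", "patched src/lib.rs"),
    ]
    return script[min(len(history), len(script) - 1)]


def react_loop(model: Callable[[List[str]], Tuple[str, str]],
               max_steps: int = 5) -> Tuple[str, List[str]]:
    history: List[str] = []
    for _ in range(max_steps):
        action, argument = model(history)              # reason: choose the next action
        if action == "finish":
            return argument, history
        observation = TOOLS[action](argument)          # act: call the chosen tool
        history.append(f"{action} -> {observation}")   # observe: feed the result back
    return "step limit reached", history


answer, trace = react_loop(scripted_model)
```

The `max_steps` cap is a design choice of the sketch: without it, a model that never emits a finishing action would loop forever.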
Cross-Platform Consistency: Available through multiple access points:
- Direct terminal usage via CLI
- VS Code integration through Code Assist
- Cloud Shell Editor with pre-installed setup
- Integration with Firebase for full-stack development
Conclusion
After extensive testing, several key insights emerged:
The Right Tool for the Right Job
This experiment reinforced a fundamental truth: tools should serve our development process, not dictate it. Kiro’s structured approach has merit for complex projects requiring careful planning, while Copilot and other related approaches excel at rapid prototyping and iterative development.
The real value lies in understanding when to apply each approach. For greenfield projects with unclear requirements, rapid prototyping’s flexibility shines. For enterprise applications with strict architectural constraints, GitHub Copilot’s approach would be my go-to solution at the moment.
Common Strengths
- All tested tools delivered usable code on the Exercism Rust tasks; the Kiro-generated React app, which would not start, was the exception
- Results are consistently usable with or without configuration files
Differentiation Points
Development Philosophy:
- Best for Planning: Kiro excels at structured, spec-driven development with comprehensive documentation
- Best for Rapid Iteration: GitHub Copilot for quick prototyping and real-time feedback
- Terminal-First: Gemini CLI and Claude Code for CLI enthusiasts
- IDE-Integrated: Cursor and Windsurf for visual development
- Enterprise-Ready: GitHub Copilot with organization-wide settings
Current Professional Choice
My professional context remains focused on GitHub Copilot due to:
- Deep GitHub integration
- Organization-wide configuration capabilities
- Mature ecosystem and proven track record
- Seamless CI/CD pipeline integration
- Proven ability to deliver working code quickly
However, this is a rapidly evolving space. The introduction of spec-driven development (Kiro), massive free tiers (Gemini CLI), and innovative automation controls (Windsurf) suggests future features may shift the competitive landscape.
Tools Not Tested
Due to time constraints, I didn’t evaluate:
- Devin, OpenDevin, Devika
- SWE-Agent
- Amazon Q Developer
- Tabnine, Cline, Bito Wingman
- Zencoder, Replit AI Agent, Anterion
These and potentially other tools remain valuable candidates for future testing as the agentic AI coding space continues to mature.
Final Thoughts
As AI-assisted development continues to evolve, the winners won’t be the tools that replace human judgment, but those that amplify our ability to make informed decisions throughout the development lifecycle.
AI-assisted development is not the only thing evolving; so are the security reports about it. Watch for reports on the tool of your choice and factor them into your decision. Two recent examples:
- Researchers Disclose Google Gemini AI Flaws Allowing Prompt Injection and Cloud Exploits
- DEFCON 33 – Mind the Data Voids: Hijacking Copilot Trust to Deliver C2 Instructions
What’s your experience with AI coding agents? Which features matter most for your workflow? Share your thoughts and experiences in the comments below.