ai-benchmark/tests/summarization/https___venturebeat.com_infrastructure_claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free.txt
second_constantine 2048e4e40d feat: enhance summarization prompt and improve MongoDB test generation
- Updated summarization prompt to require Russian output and exclude non-textual elements
- Upgraded ollama dependency to v0.6.1
- Enhanced run.sh script to support both single record and file-based ID input for MongoDB test generation
- Updated documentation in scripts/README.md to reflect new functionality
- Added verbose flag to generate_summarization_from_mongo.py for better debugging
2026-01-23 03:49:22 +03:00

Claude Code costs up to $200 a month. Goose does the same thing for free.
Michael Nuñez, January 19, 2026

The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code autonomously, has captured the imagination of software developers worldwide. But its pricing — ranging from $20 to $200 per month depending on usage — has sparked a growing rebellion among the very programmers it aims to serve.

Now, a free alternative is gaining traction. Goose, an open-source AI agent developed by Block (the financial technology company formerly known as Square), offers nearly identical functionality to Claude Code but runs entirely on a user's local machine. No subscription fees. No cloud dependency. No rate limits that reset every five hours.

"Your data stays with you, period," said Parth Sareen, a software engineer who demonstrated the tool during a recent livestream. The comment captures the core appeal: Goose gives developers complete control over their AI-powered workflow, including the ability to work offline — even on an airplane.

The project has exploded in popularity. Goose now boasts more than 26,100 stars on GitHub, the code-sharing platform, with 362 contributors and 102 releases since its launch. The latest version, 1.20.1, shipped on January 19, 2026, reflecting a development pace that rivals commercial products.

For developers frustrated by Claude Code's pricing structure and usage caps, Goose represents something increasingly rare in the AI industry: a genuinely free, no-strings-attached option for serious work.

Anthropic's new rate limits spark a developer revolt

To understand why Goose matters, you need to understand the Claude Code pricing controversy.

Anthropic, the San Francisco artificial intelligence company founded by former OpenAI executives, offers Claude Code as part of its subscription tiers. The free plan provides no access whatsoever. The Pro plan, at $17 per month with annual billing (or $20 monthly), limits users to just 10 to 40 prompts every five hours — a constraint that serious developers exhaust within minutes of intensive work.

The Max plans, at $100 and $200 per month, offer more headroom: 50 to 200 prompts and 200 to 800 prompts respectively, plus access to Anthropic's most powerful model, Claude 4.5 Opus. But even these premium tiers come with restrictions that have inflamed the developer community.

In late July, Anthropic announced new weekly rate limits. Under the system, Pro users receive 40 to 80 hours of Sonnet 4 usage per week. Max users at the $200 tier get 240 to 480 hours of Sonnet 4, plus 24 to 40 hours of Opus 4. Nearly five months later, the frustration has not subsided.

The problem? Those "hours" are not actual hours. They represent token-based limits that vary wildly depending on codebase size, conversation length, and the complexity of the code being processed. Independent analysis suggests the actual per-session limits translate to roughly 44,000 tokens for Pro users and 220,000 tokens for the $200 Max plan.

"It's confusing and vague," one developer wrote in a widely shared analysis. "When they say '24-40 hours of Opus 4,' that doesn't really tell you anything useful about what you're actually getting."

The backlash on Reddit and developer forums has been fierce. Some users report hitting their daily limits within 30 minutes of intensive coding.
Others have canceled their subscriptions entirely, calling the new restrictions "a joke" and "unusable for real work."

Anthropic has defended the changes, stating that the limits affect fewer than five percent of users and target people running Claude Code "continuously in the background, 24/7." But the company has not clarified whether that figure refers to five percent of Max subscribers or five percent of all users — a distinction that matters enormously.

How Block built a free AI coding agent that works offline

Goose takes a radically different approach to the same problem.

Built by Block, the payments company led by Jack Dorsey, Goose is what engineers call an "on-machine AI agent." Unlike Claude Code, which sends your queries to Anthropic's servers for processing, Goose can run entirely on your local computer using open-source language models that you download and control yourself.

The project's documentation describes it as going "beyond code suggestions" to "install, execute, edit, and test with any LLM." That last phrase — "any LLM" — is the key differentiator. Goose is model-agnostic by design.

You can connect Goose to Anthropic's Claude models if you have API access. You can use OpenAI's GPT-5 or Google's Gemini. You can route it through services like Groq or OpenRouter. Or — and this is where things get interesting — you can run it entirely locally using tools like Ollama, which let you download and execute open-source models on your own hardware.

The practical implications are significant. With a local setup, there are no subscription fees, no usage caps, no rate limits, and no concerns about your code being sent to external servers. Your conversations with the AI never leave your machine.

"I use Ollama all the time on planes — it's a lot of fun!" Sareen noted during a demonstration, highlighting how local models free developers from the constraints of internet connectivity.

What Goose can do that traditional code assistants can't

Goose operates as a command-line tool or desktop application that can autonomously perform complex development tasks. It can build entire projects from scratch, write and execute code, debug failures, orchestrate workflows across multiple files, and interact with external APIs — all without constant human oversight.

The architecture relies on what the AI industry calls "tool calling" or "function calling" — the ability for a language model to request specific actions from external systems. When you ask Goose to create a new file, run a test suite, or check the status of a GitHub pull request, it doesn't just generate text describing what should happen. It actually executes those operations.

This capability depends heavily on the underlying language model. Claude 4 models from Anthropic currently perform best at tool calling, according to the Berkeley Function-Calling Leaderboard, which ranks models on their ability to translate natural language requests into executable code and system commands. But newer open-source models are catching up quickly. Goose's documentation highlights several options with strong tool-calling support: Meta's Llama series, Alibaba's Qwen models, Google's Gemma variants, and DeepSeek's reasoning-focused architectures.

The tool also integrates with the Model Context Protocol, or MCP, an emerging standard for connecting AI agents to external services. Through MCP, Goose can access databases, search engines, file systems, and third-party APIs — extending its capabilities far beyond what the base language model provides.
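To make the mechanism concrete, here is a minimal sketch of one tool-calling round trip against a local Ollama server, written in Python. It illustrates the general pattern rather than Goose's actual implementation: the get_current_time helper is hypothetical, and it assumes a tool-capable model such as qwen2.5 is already installed. The /api/chat endpoint and its tools field follow Ollama's published API.

# Minimal sketch of LLM tool calling against a local Ollama server.
# Assumes Ollama is running on its default port with a tool-capable
# model (e.g., qwen2.5) pulled. The get_current_time tool is a
# hypothetical stand-in for the file and shell tools an agent exposes.
from datetime import datetime

import requests

def get_current_time() -> str:
    # The local "tool" the model may ask us to run.
    return datetime.now().isoformat(timespec="seconds")

# Advertise the tool with an OpenAI-style function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Return the current local date and time.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5",
    "messages": [{"role": "user", "content": "What time is it right now?"}],
    "tools": tools,
    "stream": False,
}, timeout=120)
resp.raise_for_status()
message = resp.json()["message"]

# If the model requested the tool, execute it locally and print the result.
for call in message.get("tool_calls", []):
    if call["function"]["name"] == "get_current_time":
        print("model requested get_current_time ->", get_current_time())

An agent like Goose runs this request-and-execute loop repeatedly, feeding each tool result back to the model, with a far richer set of tools for editing files, running shells, and reaching MCP services.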
Setting Up Goose with a Local Model

For developers interested in a completely free, privacy-preserving setup, the process involves three main components: Goose itself, Ollama (a tool for running open-source models locally), and a compatible language model.

Step 1: Install Ollama

Ollama is an open-source project that dramatically simplifies the process of running large language models on personal hardware. It handles the complex work of downloading, optimizing, and serving models through a simple interface.

Download and install Ollama from ollama.com. Once installed, you can pull models with a single command. For coding tasks, Qwen 2.5 offers strong tool-calling support:

ollama run qwen2.5

The model downloads automatically and begins running on your machine.

Step 2: Install Goose

Goose is available as both a desktop application and a command-line interface. The desktop version provides a more visual experience, while the CLI appeals to developers who prefer working entirely in the terminal.

Installation instructions vary by operating system but generally involve downloading from Goose's GitHub releases page or using a package manager. Block provides pre-built binaries for macOS (both Intel and Apple Silicon), Windows, and Linux.

Step 3: Configure the Connection

In Goose Desktop, navigate to Settings, then Configure Provider, and select Ollama. Confirm that the API Host is set to http://localhost:11434 (Ollama's default port) and click Submit.

For the command-line version, run goose configure, select "Configure Providers," choose Ollama, and enter the model name when prompted.

That's it. Goose is now connected to a language model running entirely on your hardware, ready to execute complex coding tasks without any subscription fees or external dependencies.
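Before pointing Goose at that endpoint, it can be worth confirming that Ollama is actually serving on the expected port. The short Python check below is an optional convenience rather than part of the official setup; it queries Ollama's /api/tags endpoint, which lists locally installed models.

# Optional sanity check: confirm Ollama is reachable on its default
# port and at least one model is installed before configuring Goose.
import requests

OLLAMA_HOST = "http://localhost:11434"  # the same API Host Goose expects

try:
    resp = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5)
    resp.raise_for_status()
except requests.RequestException as exc:
    raise SystemExit(f"Ollama does not appear to be running: {exc}")

models = [m["name"] for m in resp.json().get("models", [])]
if models:
    print("Ollama is up. Installed models:", ", ".join(models))
else:
    print("Ollama is up, but no models are installed; try: ollama pull qwen2.5")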
The RAM, processing power, and trade-offs you should know about

The obvious question: what kind of computer do you need?

Running large language models locally requires substantially more computational resources than typical software. The key constraint is memory — specifically, RAM on most systems, or VRAM if using a dedicated graphics card for acceleration.

Block's documentation suggests that 32 gigabytes of RAM provides "a solid baseline for larger models and outputs." For Mac users, this means the computer's unified memory is the primary bottleneck. For Windows and Linux users with discrete NVIDIA graphics cards, GPU memory (VRAM) matters more for acceleration.

But you don't necessarily need expensive hardware to get started. Smaller models with fewer parameters run on much more modest systems. Qwen 2.5, for instance, comes in multiple sizes, and the smaller variants can operate effectively on machines with 16 gigabytes of RAM.

"You don't need to run the largest models to get excellent results," Sareen emphasized. The practical recommendation: start with a smaller model to test your workflow, then scale up as needed.

For context, Apple's entry-level MacBook Air with 8 gigabytes of RAM would struggle with most capable coding models. But a MacBook Pro with 32 gigabytes — increasingly common among professional developers — handles them comfortably.
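A rough rule of thumb explains these numbers: a model's weights occupy roughly its parameter count multiplied by the bytes stored per weight, and most models distributed through Ollama are 4-bit quantized, or about half a byte per parameter, before overhead for activations and the context cache. The Python sketch below encodes that back-of-envelope estimate; the 1.2x overhead factor is an illustrative assumption, not a published figure.

# Back-of-envelope RAM estimate for running a quantized model locally.
# weights ~ parameter count * bytes per weight; the 1.2x factor for
# activations and context cache is a rough assumption, not a measurement.
def estimated_ram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for params in (7, 14, 32, 72):  # common open-model sizes
    print(f"{params}B params at 4-bit: ~{estimated_ram_gb(params):.1f} GB")

By this estimate, a 7-billion-parameter model needs roughly 4 GB and a 32-billion-parameter model roughly 19 GB, which lines up with the guidance above: 16 gigabytes handles the smaller variants, while 32 gigabytes is a comfortable baseline for larger ones.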
Why keeping your code off the cloud matters more than ever

Goose with a local LLM is not a perfect substitute for Claude Code. The comparison involves real trade-offs that developers should understand.

Model Quality: Claude 4.5 Opus, Anthropic's flagship model, remains arguably the most capable AI for software engineering tasks. It excels at understanding complex codebases, following nuanced instructions, and producing high-quality code on the first attempt. Open-source models have improved dramatically, but a gap persists — particularly for the most challenging tasks. One developer who switched to the $200 Claude Code plan described the difference bluntly: "When I say 'make this look modern,' Opus knows what I mean. Other models give me Bootstrap circa 2015."

Context Window: Claude Sonnet 4.5, accessible through the API, offers a massive one-million-token context window — enough to load entire large codebases without chunking or context management issues. Most local models are limited to 4,096 or 8,192 tokens by default, though many can be configured for longer contexts at the cost of increased memory usage and slower processing (a sketch of raising that limit follows this section).

Speed: Cloud-based services like Claude Code run on dedicated server hardware optimized for AI inference. Local models, running on consumer laptops, typically process requests more slowly. The difference matters for iterative workflows where you're making rapid changes and waiting for AI feedback.

Tooling Maturity: Claude Code benefits from Anthropic's dedicated engineering resources. Features like prompt caching (which can reduce costs by up to 90 percent for repeated contexts) and structured outputs are polished and well-documented. Goose, while actively developed with 102 releases to date, relies on community contributions and may lack equivalent refinement in specific areas.
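On the context-window point above, Ollama exposes the limit as a per-request option. The Python sketch below raises the context length for a single chat call through the documented num_ctx option; the 32768 value is arbitrary for illustration, and memory use grows with it.

# Extending a local model's context window per request via Ollama's
# num_ctx option. Larger values increase memory use and slow inference.
import requests

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5",
    "messages": [{"role": "user", "content": "Summarize the design of this codebase."}],
    "options": {"num_ctx": 32768},  # raise from the 4096/8192-token default
    "stream": False,
}, timeout=300)
resp.raise_for_status()
print(resp.json()["message"]["content"])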
How Goose stacks up against Cursor, GitHub Copilot, and the paid AI coding market

Goose enters a crowded market of AI coding tools, but occupies a distinctive position.

Cursor, a popular AI-enhanced code editor, charges $20 per month for its Pro tier and $200 for Ultra — pricing that mirrors Claude Code's Max plans. Cursor provides approximately 4,500 Sonnet 4 requests per month at the Ultra level, a substantially different allocation model than Claude Code's hourly resets.

Cline, Roo Code, and similar open-source projects offer AI coding assistance but with varying levels of autonomy and tool integration. Many focus on code completion rather than the agentic task execution that defines Goose and Claude Code.

Amazon's CodeWhisperer, GitHub Copilot, and enterprise offerings from major cloud providers target large organizations with complex procurement processes and dedicated budgets. They are less relevant to individual developers and small teams seeking lightweight, flexible tools.

Goose's combination of genuine autonomy, model agnosticism, local operation, and zero cost creates a unique value proposition. The tool is not trying to compete with commercial offerings on polish or model quality. It's competing on freedom — both financial and architectural.

The $200-a-month era for AI coding tools may be ending

The AI coding tools market is evolving quickly. Open-source models are improving at a pace that continually narrows the gap with proprietary alternatives. Moonshot AI's Kimi K2 and z.ai's GLM 4.5 now benchmark near Claude Sonnet 4 levels — and they're freely available.

If this trajectory continues, the quality advantage that justifies Claude Code's premium pricing may erode. Anthropic would then face pressure to compete on features, user experience, and integration rather than raw model capability.

For now, developers face a clear choice. Those who need the absolute best model quality, who can afford premium pricing, and who accept usage restrictions may prefer Claude Code. Those who prioritize cost, privacy, offline access, and flexibility have a genuine alternative in Goose.

The fact that a $200-per-month commercial product has a zero-dollar open-source competitor with comparable core functionality is itself remarkable. It reflects both the maturation of open-source AI infrastructure and the appetite among developers for tools that respect their autonomy.

Goose is not perfect. It requires more technical setup than commercial alternatives. It depends on hardware resources that not every developer possesses. Its model options, while improving rapidly, still trail the best proprietary offerings on complex tasks.

But for a growing community of developers, those limitations are acceptable trade-offs for something increasingly rare in the AI landscape: a tool that truly belongs to them.

Goose is available for download at github.com/block/goose. Ollama is available at ollama.com. Both projects are free and open source.
==============