What Are Agent Skills? What Problems Do They Solve?

January 8, 2026

Over the past few months, one of the most discussed topics in the AI engineering field has been Agent Skills, an agentic pattern introduced by Anthropic.

From a high-level perspective, we can outline the following timeline:

  • Anthropic introduced Skills on October 16, 2025 (link)
  • The community began releasing Skills-related open-source projects in late October (link)
  • OpenAI added Agent Skills to ChatGPT and Codex on December 12 (link)
  • Anthropic further introduced a standardized, cross-platform Agent Skills standard on December 18 (link)

From this timeline, we can see that Agent Skills moved from launch to adoption by the AI engineering community in just about two months. This naturally raises the question: what exactly are Agent Skills, what problems do they solve, and why has the community adopted them so rapidly?

In this article, we'll start from the pain points developers face when building and using agents, then progressively address these challenges to help readers understand the value of Agent Skills.

What Are Agent Skills?

In Anthropic's first official presentation on Agent Skills (link), Barry Zhang, who championed Agent Skills, discussed a frustration many developers experience: why, despite having powerful AI models capable of solving PhD-level problems, do these models sometimes fail to execute tasks effectively when asked?

Barry Zhang's perspective is that, much like a capable generalist who lacks domain-specific knowledge and skills might struggle to solve problems in a particular field, AI agents face the same challenge. Agent Skills address this by equipping AI agents with professional skills and knowledge, enabling them to complete tasks more effectively.

Formally, Agent Skills are defined as "folders containing instructions, scripts, and resources." AI agents can discover these folders and choose to use them based on the situations they encounter.

For example, Anthropic's Claude recently added PDF editing capabilities, allowing users to upload PDFs and have Claude edit them. Before continuing, you might pause and think: if you were to implement this yourself, how would you enable an AI agent like Claude to edit PDF files?

Claude's PDF editing functionality is implemented through Agent Skills. Looking at the PDF Skills that the Anthropic team published (link), the structure is as follows:

.
├── SKILL.md
├── forms.md
├── reference.md
└── scripts/
    ├── check_bounding_boxes.py
    ├── check_bounding_boxes_test.py
    └── ...

The folder contains a SKILL.md Markdown file (link) that documents which packages can be used for PDF processing and how to use them. Additionally, the folder includes a scripts/ subdirectory containing various custom scripts for handling PDF files that agents can call directly.

With this folder structure, whenever an AI agent needs to handle PDF processing, it can enter the folder, follow the specifications in SKILL.md, and directly call the scripts in scripts/ to perform specific PDF operations.
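To make the layout concrete, here is a hypothetical sketch of what a SKILL.md might contain. The `name` and `description` frontmatter fields follow the format Anthropic describes; the body text is illustrative and not the actual contents of Anthropic's PDF skill, though it references the real files from the tree above.

```markdown
---
name: pdf-editing
description: Edit, fill, and inspect PDF files using the bundled scripts.
---

# PDF Editing

To fill form fields, follow the steps in forms.md.
To validate layout after editing, run scripts/check_bounding_boxes.py.
For library usage details, see reference.md.
```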

Conceptually, Agent Skills are quite straightforward, as described above. However, this seemingly simple design has considerable depth. Let's discuss it in more detail.

Why Were Agent Skills Introduced? What Problems Do They Solve?

After understanding what Agent Skills are, readers likely have many questions. Let's explore the context in which Agent Skills were introduced and address common questions. We'll examine different scenarios to discuss the problems developers encounter when implementing or using AI agents without Agent Skills, then trace back how Agent Skills solve these problems, helping readers progressively understand their value.

Solving Non-Idempotent Execution Results

Among the many problems Agent Skills solve, perhaps the most significant is "making AI agent execution results idempotent." As we know, large language models serve as the brains of modern AI agents, and they have non-deterministic characteristics that can produce different results each time.

From a technical perspective, this means that without special handling, an AI agent's execution results are often non-idempotent. (Note: an idempotent operation produces the same result no matter how many times it is executed. Idempotence is crucial for stability in software development: it guarantees that an API call or operation has the same effect whether it runs once or many times, so retries never introduce unwanted side effects. We discussed this concept in an article here.)
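The distinction in the note above can be shown with a minimal, purely illustrative Python sketch (the `record` structure and function names are made up for this example): an idempotent operation can be retried safely, while a non-idempotent one accumulates side effects with each retry.

```python
def set_status(record: dict, status: str) -> dict:
    """Idempotent: calling it N times leaves the same result as calling it once."""
    record["status"] = status
    return record


def append_log(record: dict, entry: str) -> dict:
    """Non-idempotent: each retry adds another entry (a side effect)."""
    record.setdefault("log", []).append(entry)
    return record


record = {"id": 1}

# Retrying the idempotent call changes nothing after the first call.
for _ in range(3):
    set_status(record, "done")
assert record["status"] == "done"

# Retrying the non-idempotent call accumulates side effects.
for _ in range(3):
    append_log(record, "processed")
assert len(record["log"]) == 3  # three retries, three entries
```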

Because of this non-determinism, ever since the rise of large language models, engineers have used prompt engineering, supplying personalities, styles, and other settings, to standardize model outputs and make responses align better with expectations.

However, in the agent era, agents no longer just generate text; they take actions and perform operations, so we need methods beyond prompts to standardize agent behavior and make execution results more idempotent. Agent Skills extend prompt engineering by going beyond textual instructions to include scripts and resources, enabling agents to act more closely in line with what users intend.

For example, if an AI agent needs to process a PDF, using only the prompt "Please modify this PDF according to the following requirements (with detailed requirements)" could lead the agent to take various approaches. It might write functions from scratch to parse PDFs, extract text, and convert modified content back to PDF format (but each implementation might differ, with no guarantee of correctness each time). Alternatively, it might choose from various open-source libraries (but since many options exist, it might choose differently each time).

Because of these multiple possibilities, relying solely on instruction-based prompts makes the final results unstable. Even with the same prompt, results may differ each time. However, with Agent Skills, by pre-placing necessary packages and scripts in the skill folder, when an agent needs to process a PDF, it consistently executes the same scripts, producing stable results.
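The stabilizing effect described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Anthropic's implementation: the folder layout mirrors the PDF skill's tree shown earlier, but the `fill_form.py` script and the `OPERATIONS` mapping are assumptions for the example. The point is that dispatch from a task to a pre-written script is deterministic, unlike a model improvising a fresh implementation on each run.

```python
from pathlib import Path

# Hypothetical skill folder, modeled on the PDF skill's directory tree.
SKILL_ROOT = Path("skills/pdf")

# In practice, SKILL.md would tell the agent which script handles which
# operation; here we hard-code that mapping for illustration.
OPERATIONS = {
    "fill_form": SKILL_ROOT / "scripts" / "fill_form.py",
    "check_layout": SKILL_ROOT / "scripts" / "check_bounding_boxes.py",
}


def resolve_script(operation: str) -> Path:
    """Deterministic dispatch: the same operation always maps to the same script."""
    try:
        return OPERATIONS[operation]
    except KeyError:
        raise ValueError(f"No script for operation: {operation}")


# Every run of the same task resolves to the identical script.
first = resolve_script("check_layout")
second = resolve_script("check_layout")
assert first == second == SKILL_ROOT / "scripts" / "check_bounding_boxes.py"
```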

Solving Limited Context Window Issues

When Anthropic released Agent Skills, they provided examples including one where a skill folder contained only a SKILL.md file with instructions. This example confused many in the community. After all, if a Skills folder only contains SKILL.md, how is it different from regular prompts?

After all, Cursor introduced Cursor Rules two years ago (see 2-1 What Are Prompts? How to Set Prompts via .cursor/rules?), and even Anthropic's Claude Code has Slash Commands, which are also just Markdown files.

In Anthropic's official response, they mentioned two key differences. First, slash commands are invoked by humans; if one is not explicitly invoked, the agent won't use it. Agent Skills, by contrast, are selected and used by AI agents themselves based on context.

Seeing this first difference, you might ask, "Why not just let agents automatically use slash commands? Why introduce a new concept called Agent Skills?"

The reason is that when a slash command is loaded, its entire Markdown file is loaded at once. This is true not just of Anthropic's Claude but of many popular AI tools in the community. The problem this creates is that if we want agents to automatically select which command to use, they would need to scan every Markdown file. In other words, before doing any actual work, agents would consume vast numbers of tokens and occupy a significant portion of the context window.

This creates two problems: it increases costs (more tokens consumed means more money spent) and decreases accuracy (model performance tends to degrade as the context window fills up). But if this design causes these problems, why doesn't automatically loading Agent Skills suffer from them?

The answer is that Agent Skills employ progressive disclosure, which is the elegant aspect of Agent Skills' design. Each skill's SKILL.md file must include name and description fields noting the skill's purpose. When agents load skills, they first scan these two fields rather than loading the entire SKILL.md, significantly reducing token consumption and occupying only a tiny portion of the context window.
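A minimal sketch of this first level of progressive disclosure, assuming SKILL.md files start with a `---`-delimited YAML-style frontmatter block: at startup the agent reads only the `name` and `description` fields and never parses the file body. The parsing helper below is illustrative, not Anthropic's implementation.

```python
def read_frontmatter(text: str) -> dict:
    """Extract only name/description from a '---'-delimited frontmatter block."""
    fields = {}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return fields
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body below is never parsed
        key, _, value = line.partition(":")
        if key.strip() in ("name", "description"):
            fields[key.strip()] = value.strip()
    return fields


skill_md = """---
name: pdf-editing
description: Edit and inspect PDF files with the bundled scripts.
---

# PDF Editing
... thousands of tokens of detailed instructions, loaded only on demand ...
"""

meta = read_frontmatter(skill_md)
assert meta == {
    "name": "pdf-editing",
    "description": "Edit and inspect PDF files with the bundled scripts.",
}
```

Only when the agent decides a skill is relevant does it load the full SKILL.md body, and only then the individual scripts it needs.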

In fact, the folder structure itself reinforces progressive disclosure. By splitting scripts into separate files, SKILL.md only needs to indicate which script to use in which situation, rather than embedding complete scripts, which further reduces token consumption.

Progressive Disclosure

As mentioned above, Agent Skills use progressive disclosure as a design pattern to reduce token consumption and use less of the context window. This design pattern appears not just in Agent Skills but in other aspects of agents, so we'll dedicate a section to it.

While progressive disclosure might sound technical, most frontend and backend engineers have encountered this pattern in their work. Common pagination is based on this pattern: instead of loading all data at once, we load a portion first, then load more when users need it by going to the next page.

Beyond Agent Skills, the industry is progressively applying progressive disclosure to MCP loading (note: readers unfamiliar with MCP can review Agent Implementation — Tools and MCP Servers).

For example, MCP-related content has often been the single biggest occupant of an AI agent's context window. MCP clients like Cursor load all tools at startup, frequently spending tens of thousands of tokens on MCP definitions before any tool is even used, wasting valuable context window space.

You might ask: when loading MCPs, couldn't we just load names and descriptions first without loading complete tool specifications? Wouldn't that reduce token consumption?

That intuition isn't wrong, but reality is more complex. Take the popular GitHub MCP as an example: it exposes 35 tools, and their combined names and descriptions alone total 26,000 tokens. This is why Cursor, when it initially supported MCP, limited loading to a maximum of 40 tools to avoid exhausting the context window.

Therefore, one of Cursor's approaches (link) was to stop loading all MCP tool descriptions upfront. Instead, they treat tool descriptions as lazily-loaded files, searching for and loading only the specific tools an agent needs during execution. This load-on-demand approach reduced token usage by up to 46.9%.

Similarly, Anthropic proposed an almost identical concept in their article "Introducing advanced tool use on the Claude Developer Platform" (link): instead of loading all MCPs requiring tens of thousands of tokens upfront, they load a search tool consuming only about 500 tokens initially, then search for needed MCPs during agent execution as required. Each search might find two or three tools, consuming at most a few thousand tokens rather than tens of thousands.
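The search-tool idea can be sketched as follows. This is a hedged illustration, not Anthropic's or Cursor's actual implementation: the tool catalog, descriptions, and matching logic are all made up for the example. The key point is that only the matched tools' full definitions would then be loaded into the context, instead of the whole catalog upfront.

```python
# Hypothetical catalog: name -> short description. In a real system the full
# tool specifications would live outside the context window until needed.
TOOL_CATALOG = {
    "create_issue": "Create a GitHub issue in a repository.",
    "merge_pull_request": "Merge an open pull request.",
    "list_branches": "List branches in a repository.",
    "search_code": "Search code across all repos.",
}


def search_tools(query: str, limit: int = 3) -> list[str]:
    """Return the names of tools whose name or description mentions the query."""
    query = query.lower()
    hits = [
        name
        for name, desc in TOOL_CATALOG.items()
        if query in desc.lower() or query in name.lower()
    ]
    return hits[:limit]


# Each search surfaces only a handful of candidates, so the agent loads a few
# thousand tokens of definitions at most, rather than tens of thousands.
assert search_tools("pull request") == ["merge_pull_request"]
assert search_tools("repository") == ["create_issue", "list_branches"]
```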

This approach also embodies progressive disclosure and provides substantial practical benefits.
