Organizing Content with Astro Content Schemas


With Astro’s Content Collection Schema functionality, you can enforce structured data like uniform tags to keep your content organized and SEO-friendly. In this post, we’ll dive into how you can define and validate tags using Astro’s schema system, ensuring every article follows the same tagging conventions. Whether you’re managing a blog, documentation site, or any content-heavy project, understanding how to leverage Astro’s Content Collections can help you streamline your workflow and maintain consistency effortlessly.

Defining Content Collection Schemas in Astro

I recently decided to do some housekeeping on this site. I upgraded to Astro 5 (specifically Astro 5.4) and refined the tag functionality on this website.

After I created the tags index for this site, I realized there was some duplication within the tags. Depending on the post, the same topic could appear as Git, GitHub, or Git/GitHub:

Git
Git/GitHub
GitHub

In some instances there may also have been inconsistent casing for tags that were otherwise the same, like:

JavaScript
javascript
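Casing differences like these can be spotted with a few lines of code before any schema is tightened. The sketch below is only an illustration: `findCaseVariants` is a hypothetical helper (not part of Astro or Zod) and the input list is made up. It catches spellings that differ only by case; overlapping names such as Git and GitHub still need a manual decision.

```typescript
// Group tags that differ only by letter casing, e.g. "JavaScript" vs "javascript".
// `findCaseVariants` is a hypothetical helper, not part of Astro or Zod.
function findCaseVariants(tags: string[]): string[][] {
  const groups = new Map<string, Set<string>>();
  for (const tag of tags) {
    const key = tag.toLowerCase();
    const spellings = groups.get(key) ?? new Set<string>();
    spellings.add(tag);
    groups.set(key, spellings);
  }
  // Keep only the groups that contain more than one distinct spelling.
  return Array.from(groups.values())
    .filter((spellings) => spellings.size > 1)
    .map((spellings) => Array.from(spellings));
}

const variants = findCaseVariants(["JavaScript", "Git", "javascript", "GitHub", "Git"]);
console.log(variants); // [["JavaScript", "javascript"]]
```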

Zod Validation in Astro Content Schemas

One of the nice features of how Astro manages content is its built-in Zod support, which enforces a consistent schema within a content collection, including collections composed of Markdown or MDX files. For example, a schema can restrict valid content tags to a predefined list. Zod describes itself as “TypeScript-first schema validation with static type inference”.

“Schemas enforce consistent frontmatter or entry data within a collection through Zod validation. A schema guarantees that this data exists in a predictable form when you need to reference or query it. If any file violates its collection schema, Astro will provide a helpful error to let you know.” - Astro Content Collections Documentation

With Astro’s content collections you can define a specific schema to ensure that all of your content contains the necessary data. The configuration for content collections should be stored in src/content.config.ts. This is a special file that Astro uses to determine how your content should be structured.

One of the schemas I have defined for this site is for my bookshelf to make sure that the data for each book is an object containing string values for the title, author, and external link. The external link also should be a string that starts with "https://www.google.com/books/edition/".

// import { defineCollection, z } from "astro:content";
// import { glob } from "astro/loaders";

const bookshelf = defineCollection({
  loader: glob({ pattern: "**/[^_]*.json", base: "./src/content/bookshelf" }),
  schema: z.object({
    title: z.string(),
    author: z.string(),
    external_link: z
      .string()
      .startsWith("https://www.google.com/books/edition/"),
  }),
});

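A sample valid entry would be the object below. The year key is not part of the schema; z.object() drops unrecognized keys by default rather than rejecting them, so the entry still passes.

{
  "title": "The Nature of Code",
  "author": "Daniel Shiffman",
  "year": 2024,
  "external_link": "https://www.google.com/books/edition/The_Nature_of_Code/Iv_REAAAQBAJ?hl=en&gbpv=0"
}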

However, the object below would be invalid because it is missing external_link:

{
  "title": "The Nature of Code",
  "author": "Daniel Shiffman",
  "year": 2024
}

Articles have a different schema than books. Before I normalized the tags, the schema for this site’s articles looked similar to the object below. It tells Astro to expect the tags field in every post to be an array of strings, and it marks tags as optional. Keys without optional() in their value, like title, description, and pubDate, are required for every valid post.

// import { defineCollection, z } from "astro:content";
// import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({ pattern: "**/[^_]*.{md,mdx}", base: "./src/content/blog" }),
  schema: z.object({
    title: z.string(),
    long_title: z.string().optional(),
    description: z.string(),
    featured: z.boolean().optional(),
    pubDate: z
      .string()
      .or(z.date())
      .transform((val) => new Date(val)),
    tags: z.array(z.string()).optional(),
  }),
});

Each key in the schema maps to the frontmatter of my articles. For example, the frontmatter below would be considered valid:

---
title: Organizing Content with Astro Content Schemas
pubDate: 2025-03-06T18:12:21.580Z
description: With Astro's Content Collection Schema functionality, you can enforce structured, uniform tags to keep your content organized and SEO-friendly. In this post, we’ll dive into how you can define and validate tags using Astro’s schema system, ensuring every post follows the same tagging conventions. Whether you’re managing a blog, documentation site, or any content-heavy project, this guide will help you streamline your workflow and maintain consistency effortlessly.
tags: ["Astro", "TypeScript"]
---
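Once frontmatter passes validation, the same keys are available as typed data when querying the collection. The fragment below is a rough sketch of what that might look like in a page or component script. It assumes that src/content.config.ts registers the collections with `export const collections = { blog, bookshelf };` and it only runs inside an Astro project. The variable names are illustrative.

```typescript
// Runs inside an Astro project (for example in a page's frontmatter script).
import { getCollection } from "astro:content";

// Every entry returned here has already been validated against the blog schema.
const posts = await getCollection("blog");

// `data` mirrors the schema keys; tags is optional, hence the `?.` check.
const astroPosts = posts.filter((post) => post.data.tags?.includes("Astro"));

// pubDate was transformed to a Date by the schema, so it can be compared directly.
const newestFirst = [...astroPosts].sort(
  (a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime(),
);
```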

Using Enums in Zod for Predefined Valid Content

The inconsistent tag naming revealed that limiting tags to an array of strings was not restrictive enough, so I created a predefined list of valid topics and codified it as a Zod enum.

The Zod enum looks something like:

// import { z } from "astro:content";

const topics = z.enum([
  "Astro",
  "Animation",
  "Community",
  "Creative Coding",
  "CSS",
  "Developer Productivity",
  "Functional Programming",
  "Git/GitHub",
  "JavaScript",
  "MDX",
  "Netlify",
  "NextJS",
  "p5⁎js",
  "React",
  "VSCode",
]);

Once the topics enum was defined, it could be used in the article schema so that every tag must be one of the values in topics.

- tags: z.array(z.string()).optional()
+ tags: z.array(topics).optional()

With this setup, I can easily view the list of valid tags and extend it as needed.
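The restriction an enum imposes can be illustrated without Zod at all. The sketch below uses plain TypeScript and a shortened topic list; `TOPICS`, `Topic`, and `isTopic` are hypothetical names chosen for this example and are not taken from the site's code.

```typescript
// The same idea as the Zod enum, expressed with plain TypeScript.
// TOPICS is a shortened, illustrative list; Topic and isTopic are hypothetical names.
const TOPICS = ["Astro", "CSS", "Git/GitHub", "JavaScript", "React"] as const;

// Union type derived from the list: "Astro" | "CSS" | "Git/GitHub" | ...
type Topic = (typeof TOPICS)[number];

// Type guard that narrows an arbitrary string to a known topic.
function isTopic(value: string): value is Topic {
  return (TOPICS as readonly string[]).includes(value);
}

console.log(isTopic("Git/GitHub")); // true
console.log(isTopic("Git")); // false
console.log(isTopic("javascript")); // false: the comparison is case-sensitive
```

Zod's z.enum() also accepts a tuple declared with `as const`, so a list like this could be shared between the schema and other code, and `z.infer<typeof topics>` yields the corresponding union type.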

What happens if content is not aligned with the Astro schema?

Now that valid tags are restricted to a predefined list of topics, every new post must match the schema or Astro will throw an error. An invalid tag produces an error that must be resolved before the app will build. The error looks something like:

[ERROR] [UnhandledRejection] Astro detected an unhandled rejection. Here's the stack trace:
InvalidContentEntryDataError: blog → tagging-schema-in-astro data does not match collection schema.
tags.1: Invalid enum value. Expected 'Astro' | 'Animation' | 'Community' | 'Creative Coding' | 'CSS' | 'Developer Productivity' | 'Functional Programming' | 'Git/GitHub' | 'JavaScript' | 'MDX' | 'Netlify' | 'NextJS' | 'p5⁎js' | 'React' | 'VSCode' | 'WebMention', received 'Git'

Astro’s InvalidContentEntryDataError errors help developers quickly identify any content with invalid data, such as tags. In my case, the errors saved me time: only a limited number of articles needed editing, and I was fixing them manually (or with find and replace) to make sure all content was valid under the new schema.
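On a site with more posts, the same cleanup could be scripted instead of done by hand. The sketch below shows one possible approach, assuming a small alias table that maps legacy spellings to canonical topics. `TAG_ALIASES` and `normalizeTags` are hypothetical names, and the alias entries are only the examples mentioned in this post.

```typescript
// Map legacy tag spellings (lowercased) to the canonical topic they should become.
// TAG_ALIASES and normalizeTags are hypothetical; the entries are examples from this post.
const TAG_ALIASES = new Map<string, string>([
  ["git", "Git/GitHub"],
  ["github", "Git/GitHub"],
  ["javascript", "JavaScript"],
]);

function normalizeTags(tags: string[]): string[] {
  const result: string[] = [];
  for (const tag of tags) {
    // Look up by lowercase so "GitHub" and "github" resolve to the same topic.
    const canonical = TAG_ALIASES.get(tag.toLowerCase()) ?? tag;
    // Skip duplicates created by merging, e.g. "Git" and "GitHub" on one post.
    if (!result.includes(canonical)) result.push(canonical);
  }
  return result;
}

console.log(normalizeTags(["Git", "GitHub", "javascript", "Astro"]));
// ["Git/GitHub", "JavaScript", "Astro"]
```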

Enforcing Character Limits with Zod

I am looking forward to further refining the schemas that I use in Astro and exploring more of Zod’s functionality. In addition to normalizing tags, Zod can enforce other constraints. For example, I am considering updating my schema to distinguish a valid long_title from a valid title by using Zod’s .max(chars). When a long_title is defined for a post, it is displayed throughout my site; however, a shorter title might be better for meta images. Every article must have a valid title, but long_title is an optional field that can be used, when space is not limited, to convey more context than the 47-character limit set for titles below.

long_title: z.string().optional(),
title: z.string().max(47),
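Before adding a limit like this, it may help to check which existing titles would fail it. The sketch below is a plain TypeScript illustration with made-up sample data; `PostSummary` and `findLongTitles` are hypothetical names. It compares against the string's `length`, which I believe is the same measure Zod's .max() uses for strings.

```typescript
// Report titles that would exceed a character limit such as z.string().max(47).
// PostSummary and findLongTitles are hypothetical names; the sample data is made up.
interface PostSummary {
  slug: string;
  title: string;
}

function findLongTitles(posts: PostSummary[], limit: number): PostSummary[] {
  // String length counts UTF-16 code units, so some emoji count as 2.
  return posts.filter((post) => post.title.length > limit);
}

const samplePosts: PostSummary[] = [
  { slug: "short", title: "Organizing Content with Astro Content Schemas" },
  { slug: "long", title: "Organizing Content with Astro Content Collection Schemas and Zod" },
];

console.log(findLongTitles(samplePosts, 47).map((post) => post.slug)); // ["long"]
```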

Zod can also be configured to enforce a .min(chars) or an exact .length(chars). I encourage you to explore the Zod documentation and the Astro Content Collection Schema documentation for more details and ideas.
