Generating sitemaps in Next.js with Markdown-powered content
Generating and submitting a sitemap to Google is one of the crucial tasks to complete when you want Google to crawl and ultimately rank your website. Submitting your sitemap to Google directly helps Google's bots discover and index your entire site structure. In theory, this helps Google understand your content and its relevance, and ultimately helps determine your site's search engine ranking.
Next.js doesn't natively provide sitemap support as it has no opinion as to where your content resides and is piped in from. This could be a headless CMS via REST API or GraphQL, a database, static files from AWS S3, or my personal favourite for blogs built with Next.js - Markdown.
Markdown
If you're a developer or someone who has been around in tech for a while you've probably formatted text using Markdown at least once or have heard of it.
Markdown is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber and Aaron Swartz created Markdown in 2004 as a markup language that is appealing to human readers in its source code form. Markdown is widely used in blogging, instant messaging, online forums, collaborative software, documentation pages, and readme files.
Your first experience of Markdown may have been similar to mine: writing up README files in projects for GitHub. I find it easy to remember, the syntax is simple to pick up and as a solution for blogging, it can live completely in the repo, be version controlled and be previewed (with extensions) in VS Code.
Markdown is also now supported in Google Docs, so you can write and format without having to reach for your mouse or touchpad. Awesome.
Next.js configuration overview
I wasn't planning on going too deep into the general Next.js config for handling the sitemap route. Strictly speaking, a sitemap isn't a page, so my inclination is to generate it from an API route. We'll define an API route at /pages/api/sitemap.ts and use either the rewrite ability in next.config.js or Middleware to rewrite incoming requests to this API URL. Within the API route's handler function we can set headers, which gives us full control over the content type and any cache control before we return our valid XML.
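As a rough sketch of the header handling described above, the helper below sets a content type and a cache policy on a response. The applySitemapHeaders name, the minimal HeaderWritable interface and the one-hour cache lifetime are assumptions for illustration, not part of Next.js. The interface only asks for the setHeader() method that Node's response object (and therefore a Next.js API response) provides.

```typescript
// Minimal shape of the response we need: just setHeader().
// Node's http.ServerResponse, which NextApiResponse builds on, satisfies this.
interface HeaderWritable {
  setHeader(name: string, value: string): unknown;
}

// Hypothetical helper: marks the response as XML and lets caches keep it
// for `maxAgeSeconds` (one hour is an assumed default, tune it to your site).
function applySitemapHeaders(res: HeaderWritable, maxAgeSeconds = 3600): void {
  res.setHeader("Content-Type", "text/xml");
  res.setHeader("Cache-Control", `public, max-age=${maxAgeSeconds}`);
}
```

Inside an API route this could be called as applySitemapHeaders(res) just before sending the XML.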
Reading Markdown with Front Matter
You'll need to decide where to house your Markdown content. I typically settle on a content directory at the root and then split this into subdirectories which align with my routes in /pages. This is where we'll read our Markdown files from, using Node fs and gray-matter. Node fs is the standard file system module for Node, allowing you to read, write and access the filesystem. Gray-matter is a library that parses Front Matter from strings or files. It's especially useful when using Markdown for blog posts, as we can define a YAML-formatted block of key/value metadata at the top of our files.
---
title: "Blog Post Title"
excerpt: 'Blog post excerpt...'
meta: "Blog post meta description..."
date: '2022-00-00'
slug: 'blog-post-slug'
status: 'draft'
---
This is the first sentence of the post...
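Since parsed Front Matter is loosely typed, it can help to describe the fields above and check them at runtime. The following is a minimal sketch: the PostFrontMatter interface and isPostFrontMatter guard are hypothetical names, and the field list simply mirrors the example block above. It also assumes the date is quoted in the YAML so that it stays a string; an unquoted date may be parsed into a Date object instead.

```typescript
// Hypothetical shape mirroring the Front Matter example above.
interface PostFrontMatter {
  title: string;
  excerpt: string;
  meta: string;
  date: string;
  slug: string;
  status: string;
}

const requiredKeys = ["title", "excerpt", "meta", "date", "slug", "status"] as const;

// Type guard: true only when every expected key is present as a string.
function isPostFrontMatter(data: unknown): data is PostFrontMatter {
  if (typeof data !== "object" || data === null) return false;
  const record = data as Record<string, unknown>;
  return requiredKeys.every((key) => typeof record[key] === "string");
}
```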
To read our file based blog posts, we'll need to put together an API to query the filesystem and return the values that we want to populate our sitemap with. The general premise is to read all the files from the directory and then return the fields we need from the defined key/values of Front-matter.
To best handle this configuration, you will need to name your blog post files to match their slug, e.g. this-is-my-article.md with slug: "this-is-my-article" defined in its Front Matter.
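One way to keep filenames and slugs in step is to derive the expected slug from the filename and compare it with the Front Matter value. This is only a sketch: slugFromFilename and slugMatchesFilename are hypothetical helpers, and it assumes posts use the .md extension.

```typescript
import * as path from "path";

// Hypothetical helper: "this-is-my-article.md" -> "this-is-my-article"
function slugFromFilename(filename: string): string {
  return path.basename(filename, ".md");
}

// True when the slug declared in Front Matter matches the file's name.
function slugMatchesFilename(filename: string, slug: string): boolean {
  return slugFromFilename(filename) === slug;
}
```

A check like this could be run while reading the posts directory to warn about files whose slug has drifted from the filename.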
First up, we'll create a new file at lib/api.ts in the root of your Next.js application. We'll need to import the libraries we'll be using: fs, path, and gray-matter.
We can make use of path.join() to join our current working directory (cwd) and the directory our posts reside in. The getPostsMeta() function below is the key piece of functionality for reading Markdown files from the filesystem and generating the metadata we need for our sitemap.
It initially generates an array of filenames using fs.readdirSync() on our postsDirectory. It then reduces over the returned array of filenames, calling fs.readFileSync() to return each file's contents. From this point, we run the matter function, imported from the gray-matter library, against the returned file contents.
Gray-matter will parse the file contents and return an object whose data property contains the Front Matter as key/value pairs; this is the metadata we'll need to populate our sitemap. We then check that the data object contains each field we want to return and that the post has the requested status; if it does, we add the field to a new object. Finally, if that object has a slug, we push it into the accumulated value of the reduce call. The function returns an array of objects containing key/value pairs of the requested fields, ready for use in our sitemap.
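// lib/api.ts
import * as fs from "fs";
import * as path from "path";
import matter from "gray-matter";

// Front Matter values picked out for each post, keyed by field name
type PostMeta = Record<string, string>;

// Absolute path to the directory containing the Markdown posts
const postsDirectory = path.join(process.cwd(), "content/posts");

const getPostsMeta = (fields: string[], status: string): PostMeta[] => {
  // List every file in the posts directory
  const names = fs.readdirSync(postsDirectory);
  return names.reduce<PostMeta[]>((acc, filename) => {
    // Read the file and parse its Front Matter
    const fileContents = fs.readFileSync(path.join(postsDirectory, filename), "utf8");
    const { data } = matter(fileContents);
    const next: PostMeta = {};
    // Copy only the requested fields, and only for posts with the requested status
    fields.forEach((field) => {
      if (field in data && data.status === status) {
        next[field] = data[field];
      }
    });
    // Skip posts that did not yield a slug
    if (next["slug"]) acc.push(next);
    return acc;
  }, []);
};

export { getPostsMeta };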
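To see the field selection in isolation, here is a small sketch that applies the same rules to already-parsed Front Matter, with no filesystem access. pickFields is a hypothetical name and the sample posts are made up for illustration.

```typescript
type FrontMatter = Record<string, string>;

// Hypothetical helper mirroring the inner loop of getPostsMeta():
// keep only the requested fields, and only when the status matches.
function pickFields(data: FrontMatter, fields: string[], status: string): FrontMatter | null {
  if (data["status"] !== status) return null;
  const picked: FrontMatter = {};
  for (const field of fields) {
    const value = data[field];
    if (value !== undefined) picked[field] = value;
  }
  // Posts without a slug cannot be linked to, so they are skipped.
  return picked["slug"] ? picked : null;
}

// Made-up sample data standing in for parsed Markdown files.
const samplePosts: FrontMatter[] = [
  { title: "First", slug: "first", date: "2022-01-01", status: "publish" },
  { title: "Second", slug: "second", date: "2022-02-01", status: "draft" },
];

// Only the published post survives, reduced to its date and slug.
const published = samplePosts
  .map((post) => pickFields(post, ["date", "slug"], "publish"))
  .filter((post): post is FrontMatter => post !== null);
```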
Next.js API route and rewrites
API routes in Next.js can be added in the /pages/api/ directory. Any file created here will be available as an endpoint which can be requested.
Any file inside the folder pages/api is mapped to /api/* and will be treated as an API endpoint instead of a page. They are server-side only bundles and won't increase your client-side bundle size.
As API routes are server-side only, any code we add here will not become part of the client-side bundle. There is no performance penalty for importing libraries here, as our client-side JS bundle sizes are unaffected.
With the above in mind, we can go ahead and create /pages/api/sitemap.ts to handle our sitemap generation. To keep the request URL for our sitemap succinct, we can now add a rewrite rule in next.config.js which will route a request from somedomain.com/sitemap.xml to somedomain.com/api/sitemap. In Next.js, rewrites are defined within an async function in the config file which returns an array of rewrite objects.
// next.config.js
module.exports = {
  async rewrites() {
    return [
      {
        source: '/sitemap.xml',
        destination: '/api/sitemap',
      },
    ]
  },
}
With that configured, all incoming requests to the source url will rewrite to the destination and Next.js will do the heavy lifting for us.
Generating the XML
Generating the XML for our sitemap is made easier by leveraging a library called xml2js, which takes the guesswork out of generating valid XML markup. The library provides a Builder class whose buildObject() method takes an object and returns XML markup, perfect for this application.
Sitemap XML specifics
To generate sitemap XML correctly, we need to make sure we define the correct schema namespace, the XML version, and the encoding. Thankfully, xml2js allows you to pass in a settings object when you construct the Builder, where we can define the XML declaration. The namespace is set as an attribute on the urlset root element.
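import * as xml2js from "xml2js";

// `url` and `date` stand in for a post's absolute URL and last-modified date
const sitemapObj = {
  'urlset': {
    // `$` holds the element's attributes, here the sitemap schema namespace
    $: {
      'xmlns': 'http://www.sitemaps.org/schemas/sitemap/0.9'
    },
    'url': {
      'loc': url,
      'lastmod': date,
    }
  }
};

// xmldec sets the XML declaration at the top of the document
const builder = new xml2js.Builder({
  xmldec: {
    'version': '1.0',
    'encoding': 'UTF-8'
  }
})

const xml = builder.buildObject(sitemapObj)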
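For a sense of the markup we're aiming for, the sketch below builds the same structure by hand with template strings instead of xml2js. It's a dependency-free illustration rather than a replacement for the library; escapeXml and buildSitemapXml are hypothetical names and the example.com URL is a placeholder.

```typescript
interface SitemapEntry {
  loc: string;
  lastmod: string;
}

// Escape the five characters that are special in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical builder: one <url> element per entry inside a <urlset> root.
function buildSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries
    .map(
      (entry) =>
        `  <url>\n    <loc>${escapeXml(entry.loc)}</loc>\n    <lastmod>${escapeXml(entry.lastmod)}</lastmod>\n  </url>`
    )
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    "</urlset>",
  ].join("\n");
}

// Placeholder entry; the ampersand shows why escaping matters in <loc>.
const exampleXml = buildSitemapXml([
  { loc: "https://example.com/blog/blog-post-slug?a=1&b=2", lastmod: "2022-01-01" },
]);
```

The hand-rolled version makes the target structure visible, but a library remains the safer choice once the sitemap grows beyond simple loc and lastmod pairs.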
Putting it all together
Now that we have all the pieces in place, we can construct the complete functionality in the Next.js API route. API routes require a handler() function to be defined and set as the default export of the file. The handler function takes request and response parameters, which can be used to read data from the request and to generate the response.
Reviewing the type definition files will give a good overview of NextApiRequest and NextApiResponse and what they provide; you can find the definitions in the Next.js repo.
// file: /pages/api/sitemap.ts
import type { NextApiRequest, NextApiResponse } from "next";
import { getPostsMeta } from "../../lib/api";
import * as xml2js from "xml2js";

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  // get our posts
  const posts = getPostsMeta(["date", "slug"], "publish");

  // abort with a 500 if no posts found; return so no second response is sent
  if (posts.length === 0) {
    res.status(500).send("Error generating sitemap.");
    return;
  }

  // construct sitemap schema
  const sitemapObj = {
    urlset: {
      $: {
        xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9",
      },
      url: posts.map((post) => {
        return {
          loc: `${process.env.POSTS_URL}${post.slug}`,
          lastmod: post.date,
        };
      }),
    },
  };

  // build with xml2js
  const builder = new xml2js.Builder({
    xmldec: {
      version: "1.0",
      encoding: "UTF-8",
    },
  });

  // generate final XML
  const xml = builder.buildObject(sitemapObj);

  // set content type header and send our response
  res.setHeader("Content-Type", "text/xml");
  res.status(200).send(xml);
}
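One thing to watch in the handler above is that joining POSTS_URL and the slug with a template string depends on the environment variable ending in a slash. A slightly more defensive sketch, using the standard URL class, is below; buildPostUrl is a hypothetical helper and the example.com base is a placeholder.

```typescript
// Hypothetical helper: join a base URL and a slug without worrying about
// whether the base has a trailing slash or the slug has a leading one.
function buildPostUrl(base: string, slug: string): string {
  const normalisedBase = base.endsWith("/") ? base : `${base}/`;
  // Strip any leading slashes so the slug resolves relative to the base path.
  const relativeSlug = slug.replace(/^\/+/, "");
  return new URL(relativeSlug, normalisedBase).toString();
}

// Both calls resolve to https://example.com/blog/blog-post-slug
const fromBareBase = buildPostUrl("https://example.com/blog", "blog-post-slug");
const fromSlashedBase = buildPostUrl("https://example.com/blog/", "/blog-post-slug");
```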
Now when you visit /sitemap.xml, Next.js will rewrite this request internally to our /api/sitemap route and the handler function defined above will be executed, returning your complete and valid sitemap!