
Check your websites for broken links

How I use linkchecker to help me keep my links fresh.

Gabriel Augendre

Mar 17, 2024 • 3 min read

Let's say you've created your website, published a few pages or articles some time ago, and included links to other websites. Maybe, in the meantime, you also changed your publishing platform.

The bottom line is that you may have broken links without knowing it. That may be bad for SEO (I don't know, nor do I care), but above all it's a poor experience for your readers! They'll want to click on that shiny link, only to end up on a 404 error page, and they'll be sad 😢

Enter linkchecker. It's a CLI tool that crawls your website recursively and checks the links it finds. With --check-extern, that includes links to other websites. Here's how I run it on my website:

#!/bin/bash

# ignores:
# images because it looks like that's how ghost works
# ovhcloud because of geographical redirect (I want to keep the root link)
# cv because that's the URL I want to share
# youtube because it redirects to a consent page
# ifttt because it redirects to a login page
# webmentions because I don't know what it is

linkchecker --threads=50 --timeout=5 \
  --ignore-url="/content/images/.*" \
  --ignore-url="https://www.ovhcloud.com/" \
  --ignore-url="https://cv-gabriel.augendre.info/" \
  --ignore-url="https://www.youtube.com/.*" \
  --ignore-url="https://ifttt.com/.*" \
  --ignore-url="https://gabnotes.org/webmentions/receive/" \
  --check-extern \
  https://gabnotes.org

By default, linkchecker reports an error for broken links and a warning for redirections, the rationale being that you could rewrite your link to point directly at the redirect target. I've done that for a few links but chose to ignore others, as mentioned in the comments in my script.
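If you want to run the check unattended (cron, CI), the exit status is more convenient than reading the report. Here's a minimal sketch, assuming the exit statuses described in linkchecker's man page (0 when everything is fine, 1 when invalid links or enabled warnings were found, 2 on a program error). The function name is made up for this example and is not part of my script.

```bash
#!/bin/bash

# Translate linkchecker's exit status into a short message.
# Assumption (from the man page): 0 = nothing to report,
# 1 = invalid links found (or warnings, when warnings are enabled),
# 2 = program error.
describe_linkchecker_status() {
  case "$1" in
    0) echo "no broken links" ;;
    1) echo "broken links or warnings found" ;;
    2) echo "linkchecker hit a program error" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Hypothetical usage, right after a linkchecker run:
#   linkchecker --check-extern https://gabnotes.org
#   describe_linkchecker_status "$?"
```

Since warnings count towards exit status 1 when they are enabled, redirections you haven't ignored would make the run "fail" too. The --no-warnings option should turn them off if you only care about hard errors.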

Here's what the output looked like today (2024-03-17):

$ ./linkchecker
LinkChecker 10.0.1
Copyright (C) 2000-2016 Bastian Kleineidam, 2010-2021 LinkChecker Authors
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it under
certain conditions. Look at the file `LICENSE' within this distribution.
Get the newest version at https://linkchecker.github.io/linkchecker/
Write comments and bugs to https://github.com/linkchecker/linkchecker/issues

Start checking at 2024-03-17 10:58:40+002
50 threads active,     5 links queued,   23 links in  78 URLs checked, runtime 1 seconds
50 threads active,    48 links queued,   73 links in 171 URLs checked, runtime 6 seconds
50 threads active,    85 links queued,  100 links in 235 URLs checked, runtime 11 seconds
50 threads active,    73 links queued,  113 links in 236 URLs checked, runtime 16 seconds
50 threads active,    90 links queued,  156 links in 296 URLs checked, runtime 21 seconds
50 threads active,   117 links queued,  209 links in 376 URLs checked, runtime 26 seconds
50 threads active,    65 links queued,  261 links in 376 URLs checked, runtime 31 seconds
50 threads active,    23 links queued,  303 links in 377 URLs checked, runtime 36 seconds
21 threads active,     0 links queued,  355 links in 378 URLs checked, runtime 41 seconds

URL        `https://drewdevault.com/make-a-blog'
Name       `You should make a blog!'
Parent URL https://gabnotes.org/im-starting-a-blog/, line 249, col 188
Real URL   https://drewdevault.com/make-a-blog
Check time 5.273 seconds
Size       20B
Result     Error: 404 Not Found

Statistics:
Downloaded: 1.95MB.
Content types: 18 image, 229 text, 0 video, 0 audio, 14 application, 1 mail and 114 other.
URL lengths: min=15, max=205, avg=54.

That's it. 376 links in 378 URLs checked. 0 warnings found. 1 errors found.
Stopped checking at 2024-03-17 10:59:26+002 (45 seconds)
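With a single error the report is easy to read, but on a bigger site you may want a condensed to-do list. Here's a rough sketch that reads a report in the text format shown above and prints one line per error. The function name is hypothetical, and it assumes the "Parent URL", "Real URL" and "Result" lines look exactly like the ones in this output.

```bash
#!/bin/bash

# Read a linkchecker text report on stdin and print
# "<broken URL> (linked from <parent page>)" for each error block.
list_broken_links() {
  awk '
    # A new block starts with the quoted "URL" line: forget the previous one.
    /^URL /       { real = ""; parent = "" }
    /^Parent URL/ { parent = $3; sub(/,$/, "", parent) }
    /^Real URL/   { real = $3 }
    /^Result/ && $2 == "Error:" { print real " (linked from " parent ")" }
  '
}

# Hypothetical usage, with the script from this article:
#   ./linkchecker | list_broken_links
```

For the run above, this should print the drewdevault.com URL followed by the article that links to it.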

Looks like I have a broken link in one of my early articles. Luckily, someone saved the page on the Internet Archive's Wayback Machine, so I updated the link to point to the archived version. Now visitors to this page will have a working link instead of a bland 404.
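To find out whether a dead page was archived, you don't have to search by hand: the Wayback Machine has an availability API. Below is a small sketch that builds the query URL. The function name is invented for this example, and the URL is passed as-is, so a link containing its own query string would need percent-encoding first.

```bash
#!/bin/bash

# Build the Wayback Machine availability API URL for a page.
# Optional second argument: a YYYYMMDD timestamp, to ask for the
# snapshot closest to that date.
# Assumption: the page URL needs no percent-encoding.
wayback_query_url() {
  local url="$1" timestamp="${2:-}"
  local query="https://archive.org/wayback/available?url=${url}"
  if [ -n "$timestamp" ]; then
    query="${query}&timestamp=${timestamp}"
  fi
  echo "$query"
}

# Usage (needs network access, curl and jq):
#   curl -s "$(wayback_query_url "https://drewdevault.com/make-a-blog")" \
#     | jq -r '.archived_snapshots.closest.url // empty'
```

If a snapshot exists, the jq filter should print its URL, and nothing otherwise.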

Thanks, linkchecker!
