Web Directories, Linking, Rot

10/22/19

Tags: Indieweb

I’ve recently become interested in web directories. I would be remiss not to mention my excellent neighbor, Web-Site-Ring, and there are also the NeoCities districts, which I hope continue to see updates. I also discovered peelo paalu’s directory when they updated recently. There are a TON of great sites hidden on NeoCities, but it is so easy for things to get buried.

How do you find things without search engines? Topically. The more listings there are, the more necessary it becomes to drill down into sub-categories. It takes time to add content, but that’s okay, because a directory provides curated content: a human bypasses all the spam and nonsense that still gets gobbled up by search engine spiders and must later be evaluated by an algorithm.

Maintenance is another issue. Is there a way to handle link rot programmatically? And is there a style convention for dealing with dead links?

Out of curiosity, I ran a quick check with DrLinkCheck: it took about five minutes to crawl over 600 links here at neonaut and turned up plenty of broken ones, most of them related to ugly-url/Hugo issues I haven’t resolved yet. Hugo-style tags/categories make it easier to find content, but they increase the link-management burden considerably. All the internal links need to be fixed, no big deal, but how to handle the external links?
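
For the external ones, a rough first pass could be something like the sketch below: pull the href values out of the generated HTML and ask curl for the status code of each. The public/ directory, the naive grep pattern, and the external.404.log filename are my own assumptions here, not a tested recipe.

# Sketch: list external links in Hugo's output and check each one.
# Assumes the built site lives in public/ and that grepping for
# href="http..." is close enough (it will miss some edge cases).
grep -rhoE 'href="https?://[^"]+"' public/ \
  | sed 's/^href="//; s/"$//' | sort -u \
  | while read -r url; do
      code=$(curl -s -o /dev/null -L -w '%{http_code}' "$url")
      echo "$code $url"
    done | grep -v '^200' > external.404.log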

I’m of two minds about replacing rotted links. You should still give people credit for things even if the thing isn’t up anymore; you don’t necessarily need to remove the dead link, but ideally you add a note and an updated link whenever possible. I try to credit code snippets, for instance, and I don’t remove that credit if the source goes down. The amount of link rot I’ve seen in just a few years on NeoCities does suggest the value of snapshotting valuable sources with archive.is, or of reproducing the source content on your own page in case the original goes down (blockquotes, code blocks, etc.).
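
On the snapshotting side, the Wayback Machine has a “save” endpoint you can hit from a script. I’m using web.archive.org below rather than archive.is only because I’m not sure of archive.is’s submission URL, and the urls.txt list is hypothetical. A sketch, not gospel:

# Sketch: ask the Wayback Machine to snapshot each URL before it rots.
# urls.txt is a hypothetical one-URL-per-line list of sources worth keeping.
while read -r url; do
  curl -s -o /dev/null -w "%{http_code} $url\n" "https://web.archive.org/save/$url"
  sleep 5   # be polite to the archive
done < urls.txt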

As I wrote this post I came across the perfect example. It seems appropriate that my attempt to access https://www.createdbypete.com/simple-way-to-find-broken-links-with-wget/ returns a 404. Peter Rhoades’s page is not archived in the Wayback Machine. However, bmcgurik kindly saved this gist for posterity:

# find_404s_using_wget.sh
# Full credit to Pete here: http://www.createdbypete.com/articles/simple-way-to-find-broken-links-with-wget/
# Crawl the site without downloading anything (--spider), logging every request to ~/wget.log.
wget --spider -nd -o ~/wget.log -e robots=off -w 1 -r http://www.example.com
# Pull the 404s (plus two lines of context, which include the offending URL) into their own log.
grep -B 2 '404' ~/wget.log >> example.com.404.log

So we still have it. If they hadn’t made the gist, I probably wouldn’t have been able to find it. I haven’t tried the method yet; maybe Peter took it down because it doesn’t work well, maybe the site was simply restructured, or maybe he got mad because wget still owes him $20. Whatever. It’s an example.

Anyway, I’m feeling inspired by href.cool and my neighbors.