Writing a lightweight related content function for WordPress using tags to match content

Strangely, WordPress doesn’t ship with its own related content feature, so there are a number available to download from the plugin directory. However, in my experience, most of these are quite resource-hungry, largely because they build the list of related content by string matching. This is a very expensive process and, depending on the number of posts the script has to match against, can slow the database down.

I’ve opted to write my own related content plugin using WordPress tags. Generally, a post should not have more than 5 tags attached to it. This makes the matching process much faster and less intensive on the database, as tags are only ever matched against other tags attached to posts and never against the content of the posts themselves, which could be lengthy.
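To make the idea concrete, here is a minimal, language-neutral sketch in Python rather than the plugin itself. It assumes the tags for each post are already loaded into a dictionary (a hypothetical stand-in for the WordPress tag tables), and the name `related_posts` is made up for illustration. Posts are ranked purely by how many tags they share with the current post.

```python
from collections import Counter


def related_posts(post_id, post_tags, limit=5):
    """Rank other posts by how many tags they share with post_id.

    post_tags maps a post ID to the set of tags attached to it
    (an assumed in-memory stand-in for the real tag tables).
    """
    target = post_tags[post_id]
    shared = Counter()
    for other_id, tags in post_tags.items():
        if other_id == post_id:
            continue
        overlap = len(target & tags)  # set intersection: tags in common
        if overlap:
            shared[other_id] = overlap
    # Most shared tags first; ties broken by post ID for stable output.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [pid for pid, _ in ranked[:limit]]
```

Because only small sets of tags are compared, the cost per post stays low no matter how long the post content is.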

Now, this has the potential to get out of hand, as tags can be easily entered by post authors, leading to duplicates and unnecessary tagging, which can in turn lead to poor matching of related content. However, any professional organisation will have a publishing process, and part of that should be for authors to check whether a suitable tag already exists in the system before adding a new one.

Redirect, don’t 404, when a web page is not found

The ‘Page Not Found’, or 404, error page is the default response for any request made to your website that cannot be completed due to a missing file or resource.

All web servers issue the 404 error, and the website (most often a content management system like WordPress nowadays) acts on that response by displaying a page informing the user that the page requested via the URL cannot be found.

Back in the day, the 404 error page was a plain, white page with some technical information. Worse than the lack of design, the page didn’t provide the user with any options or direction as to what to do next.

This was somewhat remedied when digital designers began creating 404 pages that had attractive design elements as well as information like links to the latest content and a search box to help the user find what they were looking for using keywords.

Many of these 404 pages are works of art in their own right, some going so far as to incorporate interactive games. Many even encourage the user to share the 404 page on social media!

The user-experience- and business-minded among you will immediately see why these attractive, interactive, share-worthy 404 error pages actually exacerbate the problem the user is having: they’ve asked to see a certain page on your site and are instead presented with options that take them further away from the desired resource or waste their valuable time with games. Any frustration your visitor feels on hitting a 404 page is likely to be amplified by these options.

A better solution is to automatically redirect the user to the content they are trying to access, even when they don’t know its correct web address. To illustrate, I’ll use a simple example:

A user wants to see your About Us page and types http://mysite.org/aboutus.html into their browser. However, this page doesn’t exist; the real address of your About Us page is http://mysite.org/about-us. Instead of showing them a 404 page, how about taking the web address they’ve typed in, assessing which page they might actually be trying to reach, and then redirecting them to it? So, even when your user types in /aboutus.html, they are taken to /about-us by the site. This makes for a much better user experience.
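One way to "assess which page they might be trying to reach" is fuzzy string matching against the list of pages that do exist. The following is only a sketch under that assumption, using Python's standard `difflib` module; the function name `closest_path`, the list of known paths and the 0.6 similarity cutoff are illustrative choices, not part of any particular CMS.

```python
import difflib


def closest_path(requested, known_paths, cutoff=0.6):
    """Return the known path most similar to the requested one, or None.

    cutoff is the minimum similarity (0 to 1) required for a match;
    0.6 is an assumed starting point and would need tuning per site.
    """
    matches = difflib.get_close_matches(requested, known_paths, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With known paths of /about-us, /contact and /blog, a request for /aboutus.html resolves to /about-us, while a request with nothing in common with any page returns no match and can fall through to the normal 404.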

Now, I did say the example is a simple one: even after hitting a 404, the user can still use the top-level navigation of your site to find the correct About Us page.

However, say your site has undergone a restructure. It may have been a static site that now sits on a CMS that forces a particular URL structure your site did not previously observe, or the site may have moved from one CMS to another. If you have a well-established online brand, there may be a number of emails and other websites that carry old links to your site, links that still use the old structure. So we want a way to map the old site links to the new ones, so that a request for an old URL is automatically redirected to its newer format and the user never sees a frustrating 404 page.
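At its simplest, such a mapping is a lookup table from old paths to new ones. The sketch below assumes a hand-maintained table; the entries, the name `resolve` and the normalisation rules (lower-casing and dropping a trailing slash) are all hypothetical, for illustration only.

```python
# Assumed old-to-new path map; these entries are illustrative only.
REDIRECTS = {
    "/aboutus.html": "/about-us",
    "/news/archive.html": "/blog",
}


def resolve(path, redirects=REDIRECTS):
    """Return (status, location) for a requested path that was not found."""
    # Normalise so "/AboutUs.html/" and "/aboutus.html" hit the same entry.
    normalized = path.rstrip("/").lower() or "/"
    target = redirects.get(normalized)
    if target is not None:
        return 301, target  # permanent redirect to the new address
    return 404, None  # no mapping known: fall back to the 404 page
```

A permanent (301) redirect is used here on the assumption that the old address is gone for good, so browsers and search engines should use the new address from then on.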

This problem is easily solved when using open source content management systems like WordPress, which has a number of redirect plugins that make adding this functionality to your website simple.

One such plugin, of which I have some experience, is Redirection for WordPress. This is a powerful plugin that allows you to set up custom redirects and to set auto-redirects so the user is sent to the closest matching URL, and it logs 404 errors to help you set up those custom redirects. It’s free and worth checking out if you want to improve your user experience and help visitors find information on your site rather than hitting them with a frustrating 404, no matter how attractive.

Is your WordPress site slow? XML-RPC could be under attack!

Recently, a WordPress site I manage was having serious downtime issues. Calling the site from a browser resulted in a lag time of many minutes!

Looking at the running processes on the server, I found multiple Apache processes, around 20, each using around 20MB. That works out at roughly 400MB of RAM for the Apache processes alone, which was consuming all the allocated server RAM as well as 100% of the CPU. This meant no new client connections were served.

Multiple hard reboots of the server did not solve the problem. The Apache processes were back almost as soon as the server came back up. This was fishy as that meant the connections were sustained in some manner.

On further inspection, it appeared that 18 of the Apache processes were serving connections from a single IP address, and they were all requesting a single file: xmlrpc.php.
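A similar picture can often be recovered from the web server's access log. The sketch below is not how the processes above were inspected; it assumes log lines in the Apache common log format and simply counts requests per IP address and path. The name `top_requesters` and the regular expression are illustrative.

```python
import re
from collections import Counter

# Assumes Apache common log format:
# IP ident user [timestamp] "METHOD path PROTOCOL" status size
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def top_requesters(lines, limit=3):
    """Count requests per (ip, path) pair and return the most frequent."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            counts[match.groups()] += 1  # key is the (ip, path) tuple
    return counts.most_common(limit)
```

A single address dominating the list with repeated requests for /xmlrpc.php would point to the same kind of attack.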

As it happens, this is quite a popular way to attack a website and crash it. Although I’m still looking for a long-term solution, in the short term I’ve blocked that IP. Another way to safeguard your site is to control access to xmlrpc.php via your .htaccess file. However, take care when restricting xmlrpc.php, as it can prove to be quite useful. More here: https://wordpress.org/support/topic/what-is-xml-rpc-good-for

I’ll post more when I have a real solution.

Update: The WordFence site has some more information on XML-RPC as a security risk and how to disable it.