11 Nov, 2007
I've had an instinctive feel for a long time that some URL structures are better than others. URLs which map to the structure of the code on the filesystem are clearly bad, but I thought it would be interesting to think about exactly what makes a good URL structure and what doesn't, from both a usability and a technical point of view. My motivation for doing this today rather than at any other time is that I'm writing a chapter on Routes for the Pylons Book, so I thought it would make useful background reading.
First a definition of the parts of a URL:
http://jimmyg.org:80/some/url?foo=bar#fragment

protocol: http
domain name: jimmyg.org
port: 80
path info: /some/url
query string: foo=bar
fragment: fragment
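As a rough illustration, Python's standard library can split a URL into these parts. This is only a sketch using urllib.parse.urlsplit; the labels in the dictionary are the names used in this post, not the attribute names Python uses. Note that the query string is written before the fragment here, which is the order urlsplit expects.

```python
from urllib.parse import urlsplit

# Example URL with the query string placed before the fragment.
parts = urlsplit("http://jimmyg.org:80/some/url?foo=bar#fragment")

# Map this post's names for each part onto the parsed values.
url_parts = {
    "protocol": parts.scheme,
    "domain name": parts.hostname,
    "port": parts.port,
    "path info": parts.path,
    "query string": parts.query,
    "fragment": parts.fragment,
}

for name, value in url_parts.items():
    print(f"{name}: {value}")
```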
Describe the content
An obvious URL is a great URL. If a user can glance at a link in an email and know what it contains you have done your job. This means choosing URL parts which accurately describe what is contained in each folder and always using a descriptive word rather than an ID in the URL. For example, if you were designing a blog you should try to use apr instead of 04 to represent April and you should use the name of a category rather than its ID. This makes your URLs more intuitive to your users and gives search engines a better chance of understanding what the page is about.
You might think that direct use of URLs is likely to decrease as people use search engines and social bookmarking sites more frequently, but an eyetracking study of search engine use conducted this year by Edward Cutrell and Zhiwei Guan of Microsoft Research found that people spend 24% of their gaze time looking at the URLs in the search results. If your URLs describe their content, users can make a better guess about whether or not your content is what they are after.
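To make the idea concrete, here is a minimal sketch of building a descriptive path from a date and a category name rather than from numbers and IDs. The archive_path function and the /category/year/month/ layout are my own assumptions for illustration, not a recommended structure.

```python
import datetime

# Fixed English abbreviations, so the result doesn't depend on the server locale.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")

def archive_path(date, category_name):
    """Build a path from descriptive words rather than numeric IDs (hypothetical layout)."""
    return f"/{category_name}/{date.year}/{MONTHS[date.month - 1]}/"

print(archive_path(datetime.date(2007, 4, 11), "python"))  # /python/2007/apr/
```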
Keep it short
Try to keep your URLs as short as possible without breaking any of the other tips here. Short URLs are easier to type in or to copy and paste into documents and emails. If possible, keep URLs to fewer than 80 characters so that users can paste them into emails without having to use URL shortening tools like qurl.com or tinyurl.com.
Hyphens separate best
It is best to use single words in each part of a URL but if you have to use multiple words, for example for the title of a blog post, then hyphens are the best characters to use to separate the words, as in /2007/nov/my-blog-post-title/. Unfortunately the - character cannot be used in Python identifiers, so if you intend to use the URL parts as Python controller names or actions you might want to convert them to _ characters first. Incidentally, using hyphens to separate words is also the most readable way of separating terms in CSS styles.
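Both conversions can be sketched in a few lines of Python. The slugify and to_python_name helpers below are hypothetical names of my own, and the sketch assumes plain ASCII titles; accented characters would simply be dropped.

```python
import re

def slugify(title):
    """Turn a title into lowercase words separated by hyphens (ASCII only)."""
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Remove any hyphen left at the start or end.
    return slug.strip("-")

def to_python_name(slug):
    """Swap hyphens for underscores so the URL part can name a controller or action."""
    return slug.replace("-", "_")

slug = slugify("My Blog Post Title!")
print(slug)                  # my-blog-post-title
print(to_python_name(slug))  # my_blog_post_title
```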
Static-looking URLs are best
Regardless of how your content is actually generated it is worth structuring URLs so that they don't contain lots of &, = and ? characters which most visitors won't properly understand. If you can write a URL like ?type=food&category=apple as /food/apple then users can see much more quickly what the page is about.
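The application can still get at the same variables when they live in the path. Here is a minimal sketch, assuming a fixed two-part path; parse_path is a hypothetical helper of my own, not the Routes API.

```python
def parse_path(path):
    """Map a path such as /food/apple onto named variables.

    Returns None when the path doesn't have exactly two parts.
    """
    parts = [part for part in path.split("/") if part]
    if len(parts) != 2:
        return None
    return {"type": parts[0], "category": parts[1]}

print(parse_path("/food/apple"))  # {'type': 'food', 'category': 'apple'}
```

A routing library would normally do this matching for you; the point is only that /food/apple carries the same information as the query string version.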
Keeping URLs lowercase makes your life easier
The protocol and domain name parts of a URL can technically be entered in any case but the part after the # is case sensitive. How a particular server treats anything between the two depends on the server, the operating system and what the URL resolves to. UNIX filesystems are case sensitive while Windows ones generally aren't, so if the URL resolves to a file, Windows servers will generally allow any case whilst UNIX ones won't. Query string parameters are also case sensitive. You can generally save yourself a headache by keeping everything lowercase and issuing a 404 for anything which isn't. Of course if you are writing a wiki where the page names depend on the capitalisation then you'll need to make the URLs case sensitive.
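The lowercase check itself is tiny. A sketch, assuming the application sees the path info as a string and that status_for_path (a hypothetical name) runs before any other routing:

```python
def status_for_path(path):
    """Return 200 for an all-lowercase path and 404 for anything else."""
    # Digits, hyphens and slashes are unchanged by lower(), so they pass.
    return 200 if path == path.lower() else 404

print(status_for_path("/2007/nov/my-blog-post-title/"))  # 200
print(status_for_path("/2007/Nov/my-blog-post-title/"))  # 404
```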
Keep the underlying technology out of the URL
Use singular terms rather than plural
This is a matter of personal preference but rather than having a URL like /people/james use /person/james. It is likely that the last part of a URL will describe one thing, so the previous parts of the URL should describe that thing too. In this case james is a person, not a people so /person/james is more appropriate. You can use this convention throughout your application in naming controllers, database tables etc.
Only use Disambiguated URLs
Any piece of content should have one and only one definitive URL, with any alternatives acting as a permanent redirect to it. In the past features like Apache's DirectoryIndex have meant that if you entered a URL which resolved to a folder, the default document for that folder would be served. This means that two URLs would exist for one resource (as discussed by J Tauber). To make matters worse, servers are often configured so that http://www.example.com/someresource and http://example.com/someresource both point to the same resource. This means there can easily be 4 URLs for the same resource.
There are three good reasons why this is bad:
Browser or server caches will have to cache 4 versions of the page. Put another way this means they can't improve performance if you visit a different version of the same URL the second time.
All versions of the page will be treated by web browsers as different resources so the user's browsing history won't be accurate.
Search engines and social bookmarking sites give pages that are linked to more frequently a higher rank. If you have 4 different URLs for the same page you are effectively dividing your rank by 4.
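One way to collapse the alternatives is to compute a single canonical form and permanently redirect everything else to it. The sketch below assumes the version without www. is the definitive one and that the default document is called index.html; both are assumptions for illustration, as are the function names.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_DOCUMENT = "index.html"  # assumed name of the default document

def canonical_url(url):
    """Return the one definitive form of a URL under the assumptions above."""
    parts = urlsplit(url)
    netloc = parts.netloc
    # Treat www.example.com and example.com as the same site.
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]
    path = parts.path
    # Treat /folder/index.html and /folder/ as the same resource.
    if path.endswith("/" + INDEX_DOCUMENT):
        path = path[:-len(INDEX_DOCUMENT)]
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))

def redirect_for(url):
    """Return (301, target) when a permanent redirect is needed, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)

print(redirect_for("http://www.example.com/someresource/index.html"))
print(redirect_for("http://example.com/someresource/"))
```

With this in place all four variants end up at http://example.com/someresource/, so caches, browser histories and search engines only ever see one URL.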
Never change a URL
Otherwise your users won't be able to find the pages they bookmarked and any page rank you built up in social bookmarking sites or search engines will be lost. If you absolutely have to change a URL, ensure you set up a permanent 301 redirect to the new one so that your users don't get 404 errors. The W3C put it best: Cool URLs don't change.
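If a URL really does have to move, the redirect can be as simple as a lookup table consulted before anything else. A sketch, where the MOVED table, the paths in it and the respond function are all made up for illustration:

```python
# Hypothetical mapping from retired paths to their replacements.
MOVED = {
    "/blog/old-title/": "/2007/nov/my-blog-post-title/",
}

def respond(path, known_paths):
    """Return a (status, headers) pair for the requested path."""
    if path in MOVED:
        # Permanent redirect, so bookmarks and search engines follow the move.
        return "301 Moved Permanently", [("Location", MOVED[path])]
    if path in known_paths:
        return "200 OK", []
    return "404 Not Found", []

pages = {"/2007/nov/my-blog-post-title/"}
print(respond("/blog/old-title/", pages))
```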
Treat the URL as UI.
Navigation links, sidebars and tabs are all well and good but if you have a good URL structure your users should be able to navigate your site by changing parts of the URL. There are a few rules about how best to do this:
Ensure that for every part of the path info a user might remove, a useful page is returned.
For example if the URL /2007/nov/entry gives a blog entry, /2007/nov might give a list of all the November entries and /2007 might give a list of all entries from 2007.
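A quick way to check this rule is to list every shorter URL a user could reach by removing parts from the end, then make sure each one returns a useful page. A sketch, with parent_paths as a hypothetical helper:

```python
def parent_paths(path):
    """List the paths left after removing one or more parts from the end."""
    parts = [part for part in path.split("/") if part]
    # Work from the longest remaining prefix down to the first part alone.
    return ["/" + "/".join(parts[:count]) for count in range(len(parts) - 1, 0, -1)]

print(parent_paths("/2007/nov/entry"))  # ['/2007/nov', '/2007']
```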
Never have a URL on your domain which gives a 500 Error
It doesn't take a genius to realise that you don't want any URL in your website to cause a server error, but developers don't always think about what will happen if a user starts hacking the URL to contain different values. For example if you have a URL /food/apple and a user changes it to /food/pizza when the application is only set up to deal with fruit it should give a 404, not a 500. If you've followed the advice on static-looking URLs above then all the variables your application code uses to generate the content will be part of the path info portion of the URL so you can issue a 404 Page Not Found response if someone enters an invalid value.
Users are much more likely to stop trying to guess URLs if they get a 500 page because they are worried they might be breaking something and the moment they stop hacking the URL it has lost its usefulness as a UI component. Obviously you can't issue 404s if a variable in a query string is incorrect because that would imply that whether or not the resource existed depended on the query string which it shouldn't.
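In code this usually comes down to treating an unknown value as a missing page rather than letting the lookup raise an unhandled error. A sketch, where the FRUIT table and the food_page function are invented for illustration:

```python
# Hypothetical content, keyed by the last part of a /food/... URL.
FRUIT = {
    "apple": "All about apples",
    "pear": "All about pears",
}

def food_page(category):
    """Return a (status, body) pair for /food/<category>."""
    try:
        body = FRUIT[category]
    except KeyError:
        # An unknown value means the page doesn't exist, not that the server broke.
        return 404, "Not Found"
    return 200, body

print(food_page("apple"))  # (200, 'All about apples')
print(food_page("pizza"))  # (404, 'Not Found')
```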
I hope that's useful. If you have any extra tips, feel free to leave them in the comments and if you have any extra evidence to support any of these tips I'd be interested to hear it. I think the most important tip though is that you should use common sense when designing a URL structure and not apply any of the tips too rigidly. After all, you know your application's and users' requirements better than me so you are better placed to make a judgment about what will work best for you.
Copyright James Gardner 1996-2020 All Rights Reserved.