Best Practice for Good URL Structures
++++++++++++++++++++++++++++++++++++++++
:Posted: 2007-11-11 13:20
:Tags: Web
I've long had an instinctive feel that some URL structures are better than
others. URLs which map to the structure of the code on the filesystem are
clearly bad, but I thought it would be interesting to think about exactly what
makes a good URL structure and what doesn't, from both a usability and a
technical point of view. My motivation for doing this today is that I'm writing
a chapter on Routes for the Pylons Book, so I thought it would make useful
background reading.
First a definition of the parts of a URL::

    http://jimmyg.org:80/some/url?foo=bar#fragment
    |      |          | |         |       |
    |      |          | |         |       fragment
    |      |          | |         query string
    |      |          | path info
    |      |          port
    |      domain name
    protocol

1. Describe the content
An obvious URL is a great URL. If a user can glance at a link in an email and
know what it contains you have done your job. This means choosing URL parts
which accurately describe what is contained in each folder and always using a
descriptive word rather than an ID in the URL. For example, if you were
designing a blog you should try to use ``apr`` instead of ``04`` to represent
April and you should use the name of a category rather than its ID. This makes
your URLs more intuitive to your users and gives search engines a better chance
of understanding what the page is about.
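As a rough illustration, here is a minimal Python 3 sketch of building a descriptive path from a date and a category name. The ``/year/month/category/`` layout and the function name ``category_archive_path`` are assumptions made for the example, not something any framework prescribes.

```python
import datetime

# Fixed month names; strftime("%b") would vary with the server's locale
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def category_archive_path(posted, category_name):
    """Build a path from words (apr, web) rather than numbers (04, 7)."""
    month = MONTHS[posted.month - 1]
    return "/%d/%s/%s/" % (posted.year, month, category_name.lower())


print(category_archive_path(datetime.date(2007, 4, 11), "Web"))  # prints /2007/apr/web/
```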
You might think that direct use of URLs is likely to decrease as people use
search engines and social bookmarking sites more frequently, but an eyetracking
study of search engine use conducted this year by Edward Cutrell and Zhiwei Guan
of Microsoft Research found that people spend 24% of their gaze time looking at
the URLs in the search results. If your URLs describe their content, users can
make a better guess about whether or not your content is what they are after.
2. Keep it short
Try to keep your URLs as short as possible without breaking any of the other
tips here. Short URLs are easier to type in or to copy and paste into documents
and emails. If possible, keeping URLs to fewer than 80 characters is ideal so
that users can paste URLs into email without having to use URL shortening tools
like qurl.com or tinyurl.com.
3. Hyphens separate best
It is best to use single words in each part of a URL but if you have to use
multiple words, for example for the title of a blog post, then hyphens are the
best characters to use to separate the words, e.g.
``/2007/nov/my-blog-post-title/``. Unfortunately the ``-`` character cannot be
used in Python identifiers, so if you intend to use parts of the URL as Python
controller names or actions you will need to convert the hyphens to ``_``
characters first. Incidentally, using hyphens to separate words is also the most
readable way of separating terms in CSS styles.
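A minimal Python 3 sketch of both conversions follows. The helper names ``slugify`` and ``to_identifier`` are made up for this example, and the slug rule (keep only ASCII letters and digits) is a simplifying assumption.

```python
import re


def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def to_identifier(url_part):
    """Turn a hyphenated URL part into a name Python accepts as an identifier."""
    return url_part.replace("-", "_")


slug = slugify("My Blog Post Title")
print(slug)                 # prints my-blog-post-title
print(to_identifier(slug))  # prints my_blog_post_title
```

The conversion is only needed at the point where a URL part is looked up as a controller or action name. The URL the user sees keeps its hyphens.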
4. Static-looking URLs are best
Regardless of how your content is actually generated it is worth structuring
URLs so that they don't contain lots of ``&``, ``=`` and ``?`` characters which
most visitors won't properly understand. If you can write a URL like
``?type=food&category=apple`` as ``/food/apple`` then users can see much more
quickly what it is about.
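To make the mapping concrete, here is a small Python 3 sketch that rewrites a query string into path info. The function ``query_to_path`` and its ``keys`` argument (the order in which parameters become path parts) are assumptions for illustration only.

```python
from urllib.parse import parse_qs


def query_to_path(query, keys):
    """Turn ?type=food&category=apple into /food/apple, using keys for the order."""
    params = parse_qs(query.lstrip("?"))
    # parse_qs returns a list of values per key; take the first of each.
    # A missing key raises KeyError, which a real application would handle.
    return "/" + "/".join(params[key][0] for key in keys)


print(query_to_path("?type=food&category=apple", ["type", "category"]))  # prints /food/apple
```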
5. Keeping URLs lowercase makes your life easier
The protocol and domain name parts of a URL can technically be entered in any
case but the fragment (the part after the ``#``) is case sensitive. How a
particular server treats the path info between the two depends on the server,
the operating system and what the URL resolves to. UNIX filesystems are case
sensitive while Windows ones aren't, so if the URL resolves to a file, Windows
servers will generally allow any case whilst UNIX ones won't. Query string
parameters are also case sensitive. You can generally save yourself a headache
by keeping everything lowercase and issuing a 404 for anything which isn't. Of
course, if you are writing a wiki where the page names depend on the
capitalisation then you'll need to make the URLs case sensitive.
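The lowercase-or-404 rule is simple enough to sketch in a couple of lines of Python 3. The function name ``status_for_path`` is hypothetical, and a real application would run this check before its normal URL matching.

```python
def status_for_path(path):
    """Return 404 for any path containing uppercase characters, else 200."""
    if path != path.lower():
        return 404
    return 200


print(status_for_path("/food/apple"))  # prints 200
print(status_for_path("/Food/Apple"))  # prints 404
```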
6. Keep the underlying technology out of the URL
Your users don't care which specific technology you are using to generate your
pages or whether a page is ``.html`` or ``.xhtml``, so the basic rule is: don't
use a file extension for dynamically generated pages unless you are doing
something clever in your application internally, like determining the format in
which to represent the content based on the extension. It is also generally best
to choose names which represent what the URL is rather than its technology, so
you might consider ``style`` and ``script`` to be better choices than ``css``
and ``js`` for your CSS and JavaScript files.
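The "something clever" case might look like the following Python 3 sketch, where the extension selects the representation rather than revealing the technology. The ``FORMATS`` table and the ``split_format`` helper are assumptions for the example.

```python
import posixpath

# Hypothetical mapping from URL extension to the content type to serve;
# no extension at all means the default HTML representation
FORMATS = {
    "": "text/html",
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
}


def split_format(path):
    """Split /person/james.json into the resource path and a content type.

    The content type is None when the extension is not one we serve,
    so the caller can respond with a 404.
    """
    resource, extension = posixpath.splitext(path)
    return resource, FORMATS.get(extension)


print(split_format("/person/james.json"))  # prints ('/person/james', 'application/json')
print(split_format("/person/james"))       # prints ('/person/james', 'text/html')
```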
7. Use singular terms rather than plural
This is a matter of personal preference but rather than having a URL like
``/people/james`` use ``/person/james``. It is likely that the last part of a
URL will describe one thing, so the previous parts of the URL should describe
that thing too. In this case ``james`` is a person, not a people, so
``/person/james`` is more appropriate. You can use this convention throughout
your application in naming controllers, database tables etc.
8. Only use `Disambiguated URLs `_
Any piece of content should have one and only one definitive URL, with any
alternatives acting as a permanent redirect. In the past features like Apache's
``DirectoryIndex`` have meant that if you entered a URL which resolved to a
folder, the default document for that folder would be served. This means that
two URLs would exist for one resource (discussed by `J Tauber here `_). To make
matters worse, servers are often configured so that
http://www.example.com/someresource and http://example.com/someresource both
point to the same resource. Combine the two and there can easily be 4 URLs for
the same resource.
There are three good reasons why this is bad:
* Browser or server caches will have to cache 4 versions of the page. Put
another way this means they can't improve performance if you visit a
different version of the same URL the second time.
* All versions of the page will be treated by web browsers as different
resources so the user's browsing history won't be accurate.
* Search engines and social bookmarking sites give pages that are linked to more
frequently a higher rank. If you have 4 different URLs for the same page you
are effectively dividing your rank by 4.
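One way to collapse the alternatives is to compute a single canonical URL and permanently redirect everything else to it. The Python 3 sketch below assumes the canonical form drops the ``www.`` prefix and the default document name. Both choices, the ``index.html`` name, and the helper names are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_DOCUMENT = "index.html"  # assumed default document name


def canonical_url(url):
    """Return the one definitive URL for a resource."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    if path.endswith("/" + DEFAULT_DOCUMENT):
        # Keep the trailing slash, drop the document name
        path = path[:-len(DEFAULT_DOCUMENT)]
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_for(url):
    """Return (301, target) for an alternative URL, or None if already canonical."""
    target = canonical_url(url)
    if target != url:
        return 301, target
    return None


print(redirect_for("http://www.example.com/someresource/index.html"))
# prints (301, 'http://example.com/someresource/')
```

With this in place the four variants (with and without ``www.``, with and without ``index.html``) all end up at the same address.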
9. Never change a URL
Otherwise your users won't be able to find the pages they bookmarked and any
page rank you built up in social bookmarking sites or search engines will be
lost. If you absolutely have to change a URL, ensure you set up a permanent 301
redirect to the new one so that your users don't get 404 errors. The W3C put
it best: `Cool URLs don't change `_.
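If a URL really does have to move, the redirect can be as simple as a lookup table consulted before giving up with a 404. This is a Python 3 sketch only: the ``MOVED`` and ``PAGES`` tables, the paths in them, and the ``respond`` function are all hypothetical.

```python
# Hypothetical table of paths that have moved, mapping old path -> new path
MOVED = {"/blog/good-urls": "/2007/nov/good-url-structures/"}

# Hypothetical content store keyed by current path
PAGES = {"/2007/nov/good-url-structures/": "Best Practice for Good URL Structures"}


def respond(path):
    """Return (status, value): the body for 200, the new location for 301."""
    if path in PAGES:
        return 200, PAGES[path]
    if path in MOVED:
        # 301 tells browsers and search engines the move is permanent
        return 301, MOVED[path]
    return 404, None


print(respond("/blog/good-urls"))  # prints (301, '/2007/nov/good-url-structures/')
```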
10. Treat the `URL as UI `_.
Navigation links, sidebars and tabs are all well and good but if you have a
good URL structure your users should be able to navigate your site by changing
parts of the URL. There are a few rules about how best to do this:
* Ensure that for every part of the path info a user might remove, a useful page is returned.
For example if the URL ``/2007/nov/entry`` gives a blog entry, ``/2007/nov``
might give a list of all the November entries and ``/2007`` might give a list
of all entries from 2007.
* Never have a URL on your domain which gives a 500 Error
It doesn't take a genius to realise that you don't want any URL in your website
to cause a server error, but developers don't always think about what will
happen if a user starts hacking the URL to contain different values. For
example if you have a URL ``/food/apple`` and a user changes it to
``/food/pizza`` when the application is only set up to deal with fruit it
should give a 404, not a 500. If you've followed rule 4 above then all the
variables your application code uses to generate the content will be part of
the path info portion of the URL so you can issue a ``404 Not Found`` response
if someone enters an invalid value.
Users are much more likely to stop trying to guess URLs if they get a 500 page
because they are worried they might be breaking something, and the moment they
stop hacking the URL it has lost its usefulness as a UI component.
Obviously you can't issue 404s if a variable in a query string is incorrect
because that would imply that whether or not the resource existed depended on
the query string, which it shouldn't.
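Both rules can be sketched together in Python 3: every shortened path returns something useful, and every invalid value produces a 404 rather than an unhandled exception. The in-memory ``ENTRIES`` store and the ``dispatch`` function are assumptions for the example, and a valid year or month with no entries simply returns an empty list here.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Hypothetical store of blog entries keyed by (year, month, slug)
ENTRIES = {(2007, "nov", "entry"): "Entry text"}


def dispatch(path):
    """Return (status, body) for /year, /year/month and /year/month/slug."""
    parts = [part for part in path.split("/") if part]
    # Validate before converting so a hacked URL can never raise an exception
    if not 1 <= len(parts) <= 3 or not re.fullmatch(r"[0-9]{4}", parts[0]):
        return 404, None
    prefix = (int(parts[0]),) + tuple(parts[1:])
    if len(prefix) >= 2 and prefix[1] not in MONTHS:
        return 404, None
    if len(prefix) == 3:
        # A full path names a single entry
        if prefix in ENTRIES:
            return 200, ENTRIES[prefix]
        return 404, None
    # A shortened path lists the slugs of every entry beneath it
    slugs = sorted(key[2] for key in ENTRIES if key[:len(prefix)] == prefix)
    return 200, slugs


print(dispatch("/2007/nov/entry"))  # prints (200, 'Entry text')
print(dispatch("/2007/nov"))        # prints (200, ['entry'])
print(dispatch("/2007/pizza"))      # prints (404, None)
```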
I hope that's useful. If you have any extra tips, feel free to leave them in
the comments, and if you have any extra evidence to support any of these tips
I'd be interested to hear it. I think the most important tip, though, is that
you should use common sense when designing a URL structure and not apply any of
the tips too rigidly. After all, you know your application's and users'
requirements better than me, so you are better placed to make a judgment about
what will work best for you.
Comments
========
What is a URL?
-----------------
:Posted: 2007-11-24 00:56
Aren't there types of redirects if a URL has to change? So why do people always say that URLs should not change? Just use a redirect.
:URL: http://what-is-what.com/what_is/url.html
thejimmyg
------------
:Posted: 2008-01-11 12:21
Keeping the page at the URL is best, using an HTTP redirect is fine (although not all search engines follow them), it is just that removing the page and leaving a 404 error page is not very good, even if that page links to a search engine or similar.
:URL: http://jimmyg.org