----------------------------Original message----------------------------
Phil Hoehn recently commented on the use of special characters in URLs
on the Web and noted that special characters (the issue was tildes and
underscores) require that you use either a numerical representation of
the special character or the name of the special character.  Phil made
his comments to the California Map Librarians' list, but since this is
going to be a rather long explanation and there may be other interested
people, I figured I'd send it to MAPS-L as well.  That said, here goes.
 
There are two basic issues to consider.  Phil mentioned the fact that
the specification of the standard for representing characters on the Web
allows A-Z, a-z, 0-9, and a very small set of certain special characters
(details of which I won't go into here).  The other issue to consider is
that different operating systems (Macs, Windows, Unix, OS/2, OS/390)
assume different character sets.  Because of this, representing a
particular special character becomes quadruply difficult.  So, the very
safest way is to use the names of the characters (which are also
conveniently defined in the standard).
 
Some examples of character names are:
 
&Aring;         197             Uppercase A with ring
&Ntilde;        209             Uppercase N with tilde
&aelig;         230             Lowercase ae ligature
&egrave;        232             Lowercase e with grave accent
&eacute;        233             Lowercase e with acute accent
&ntilde;        241             Lowercase n with tilde
&ouml;          246             Lowercase o with umlaut
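 
The names and numbers in a table like this can be checked against a
standard entity list.  What follows is only a minimal sketch, assuming
Python and its standard-library html.entities module (neither is
mentioned in the original message); the NAMES list and the entity_rows
helper are made up for illustration.

```python
from html.entities import name2codepoint

# Entity names for the characters listed in the table above.
NAMES = ["Aring", "Ntilde", "aelig", "egrave", "eacute", "ntilde", "ouml"]

def entity_rows(names):
    """Return (reference, number, character) for each entity name."""
    return [(f"&{n};", name2codepoint[n], chr(name2codepoint[n])) for n in names]

for ref, number, char in entity_rows(NAMES):
    print(f"{ref:<16}{number:<8}{char}")
```

The numbers it prints are the ISO 8859-1 (Latin-1) code points that the
HTML entity names are defined against.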
 
The numeric representations in the center column are those used by
non-Microsoft products and by products that run on most non-Microsoft
operating systems (basically, Macs, Unix, and OS/2).  But if somebody's
reading your URL (or any other Webness, for that matter) on a Mac or
Unix box, and you used Microsoft's numerical coding scheme when you
coded it, they won't see what you intended.  By the same token, if your
computer is a Mac and you use the non-Microsoft code, a reader on a
Microsoft system won't see what you intended either.
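 
The effect of differing character sets can be seen by decoding the same
byte value several ways.  This is a hedged sketch in Python, not part of
the original message: the codec names are Python's own, decode_byte is a
made-up helper, and treating 145 as a number from the DOS code page 437
is an assumption, since the message does not say which Microsoft table
it comes from.

```python
# The same byte value decodes to different characters under different
# character sets. cp437 (the DOS code page) is an ASSUMPTION about where
# the "Microsoft" number 145 comes from; the message does not say.
CODECS = ["cp437", "cp1252", "latin-1", "mac_roman"]

def decode_byte(value, codec):
    """Decode a single byte value (0-255) with the given codec."""
    return bytes([value]).decode(codec, errors="replace")

for value in (145, 230):
    for codec in CODECS:
        print(value, codec, repr(decode_byte(value, codec)))
```

Under these assumptions, 145 is the ae ligature only in cp437 and 230 is
the ae ligature in latin-1 and cp1252; every other combination gives some
unrelated character.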
 
For example, since I'm in archaeology, I frequently need to use the ae
ligature, which is &aelig; for the lowercase and &AElig; for the
uppercase, because the boss prefers the ae in archaeology to be
ligatured. (One of the things on my list to do for my own web site is to
go back and change all the instances of archaeolog... to
arch&aelig;olog....)
 
As an example of how this problem shows up, make a plain text file
consisting of the five lines below (you can cut them from this email,
paste them into your favorite word processor, and save the result as a
plain text file).  Then load the file into your favorite Web browser (in
Netscape, click on File, then on Open File, and then on the name of the
file you've saved).  What you'll see is two of the three lines with an
actual ligatured ae.  The line that uses the name always shows the
ligature.  Which of the first two also shows it depends on whether
you're using a Microsoft operating system or some other operating system
on your computer; the other one will show up as some strange special
character (definitely not a ligatured ae).
 
<html><body>
First, try it with &#145;, that is, 145 <br>
Second, try it with &#230;, that is, 230 <br>
Third, try it with &aelig;, that is, a name
</body></html>
 
As you can see, the special characters start either with an ampersand
and a pound sign (if they are numerical) or with an ampersand alone if
you use the names.  In all three cases, the specification of the special
characters terminates with a semi-colon.
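 
For readers without a browser handy, the three references can also be
decoded programmatically.  This is a minimal sketch assuming Python's
standard-library html.unescape, which follows the HTML5 parsing rules;
it shows what a current parser does and may not match what any
particular browser or operating system of the time displayed.  The
decode_reference helper is a made-up name.

```python
import html

# The three references used in the test file above.
REFERENCES = ["&#145;", "&#230;", "&aelig;"]

def decode_reference(ref):
    """Decode one character reference using HTML5 rules (html.unescape)."""
    return html.unescape(ref)

for ref in REFERENCES:
    char = decode_reference(ref)
    # Print the code point rather than the character itself, so the
    # result is readable on any terminal.
    print(ref, "->", f"U+{ord(char):04X}")
```

Here the name and the number 230 both decode to the ae ligature
(U+00E6), while 145 decodes to a left single quotation mark (U+2018).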
 
So, even though it's a pain in some extremely strategic part of your
anatomy, the safest thing is to use the character names.  One way to do
this and to make sure you have it right is to write out everything as
you normally would and then use your word processor or text editor's
mass change function to change everything to the way it really needs to
be.  If you use one of the Web page tools such as Page Mill (from Adobe,
and it costs big-time) or AOLPRESS (http://www.aolpress.com -- really
truly free, probably the best thing to come out of AOL), the tool
generates the special characters properly for you.  Also, fortunately,
many of the
special character names are mnemonic!
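 
The mass change described above can also be scripted.  What follows is
a hedged sketch, not the author's method: it assumes Python and the
HTML 4 name table in the standard-library html.entities module, and
to_named_entities is a hypothetical helper name.

```python
from html.entities import codepoint2name

def to_named_entities(text):
    """Replace each non-ASCII character with a named reference when one
    exists, falling back to a numeric reference otherwise."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")
    return "".join(out)

print(to_named_entities("archæology"))
```

The sketch leaves ASCII alone, so it does not escape the ampersand or
the angle brackets; it is only meant for text that is already otherwise
valid HTML.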
 
Another to-do on my list is to take the two tables I have (one for
Microsoft's numbers and one for the numbers used by Unix, Mac, and OS/2)
and blend them into a single document which I'll then put out on my Web
site as a .pdf file.
 
Fortunately for those people who would like to have a copy of the
combined list, I have about 30 weeks of downtime coming up because of
getting a hip replacement, so I should be able to get to the end of my
to-do list.  ;-)
 
I normally teach a sequence of classes for my professional society and
the papers for two of those classes are on my Web site as .pdf files.
(I'm working on the others ;-).  If you'll go to my Web site, click on
extracurricular activities and then on classes, you should be able to
get them properly. (I should mention that .pdf files are read using the
Adobe Acrobat Reader 3.0 or higher which is available for free download
from Adobe -- I left a pointer on my site in case you don't already have
this plug-in for your browser.)
 
If you have questions about this or about issues related to Webness, please
feel free to send them to me directly, because I only read the MAPS-L
digests about every ten days to two weeks.  If you indicate the question
is from a member of MAPS-L, I'll send the answer back to the list if you
like.
 
HTH.
 
vh
--
\ /     Virginia R. Hetrick, here in sunny California
 0      Bellnet:  310.206.7588
 Oo     Email:    [log in to unmask]
        http://www.ioa.ucla.edu/~hetrick
        Site of the month: http://www.cbs.com/prime/murphy/index.htm