Sorry about the repulsive names, but that's the Java standard (or was, when these were written and one was supposed to use COM in all caps).
First, then, you download the program and put it on your classpath, however that's done in your system.
When you've got it installed, go to some directory in which you won't
mind getting new stuff stored by the program. Execute the program by,
You will get a dialogue in the middle of your screen, something like
In the text entry area at the top, enter the full URL for some simple, convenient page, like your home page. Click on the Mirror button to enable the mirroring feature. Leave the rest alone, and click Go.
Two windows will appear, hogging much of your screen. The one on the left, labeled "Error listing", will display any problems found with the page, such as dead pointers, #reference items with no anchor defined, and so on. Since your site naturally is error-free, this window will show nothing until it says "Finished" when the job is done.
The other new window, "Progress log", will show what's currently happening. These messages will change at my whim, but they're likely to show what's being loaded from the Internet at any moment when nothing seems to be happening.
When the job is done, or before then, look at your working directory. It will have a new subdirectory named after your Website host, and if you follow the directory structure, you'll find a copy of the page you called for. Any pages that it incorporates (pages and images, but not hyperlinked pages) will also be copied into their proper place in the directory structure; anything that's on a different host will be in a directory named after the host.
Now try mirroring that page plus everything that it directly links to on the same site. In the Depth field, enter 1. While you're up, click on the "Check all local pages" box, if you want it to check your whole site for bad pointers and the like. Then click Go. (A new pair of result windows will replace the old ones. By the way, it does no harm to close those windows at any time, but of course you'll lose the data.)
It won't load your home page a second time, because you already have a copy. But it will scan it for links, and load the things it points to, provided they're on the same site. Set Depth to 2, and it will go 2 links away; set Breadth to 1, and it will load from other sites that you link to, to the same total Depth.
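One way to picture the Depth and Breadth settings is as a test applied to each link before it is loaded. The sketch below is only that: it assumes Depth counts links followed from the starting page and Breadth counts how many of those links crossed to another host, which is my reading of the description above rather than something taken from the program. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the Depth/Breadth rule; not the program's actual code. */
final class FollowRule {
    private final int maxDepth;    // value typed into the Depth field
    private final int maxBreadth;  // value typed into the Breadth field

    FollowRule(int maxDepth, int maxBreadth) {
        this.maxDepth = maxDepth;
        this.maxBreadth = maxBreadth;
    }

    /**
     * @param linksFromStart number of links followed to reach the target page
     * @param hostChanges    how many of those links left the current host
     *                       (assumed meaning of Breadth)
     */
    boolean shouldLoad(int linksFromStart, int hostChanges) {
        return linksFromStart <= maxDepth && hostChanges <= maxBreadth;
    }
}
```

Under these assumptions, Depth 1 with Breadth 0 loads the starting page and whatever it links to on the same site, and nothing further.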
Want to save the list of problems? Pull down the File menu on a result
window, and browse for a file to save to. If you select an existing
file, you'll get to decide whether to overwrite the file, append to it,
or forget it.
What the Programs Are About
PressMyButtons has two functions, which it can execute independently or at the
It's pretty obvious why you might want a program that checks your site
for links that don't link to anything. This type of function is
available in some commercial web builders, but here's
a standalone version for anyone who wants one. And the program has two
other features that are not found in other programs I know of.
In the text area you can enter either a full URL or a simple filename with or without a directory path. Or you can press the "Or choose a file" button to bring up a file dialog.
Now, having aimed the program at something simple for a first test, hit the GO button. Two scrollable windows will pop up: one shows a string of progress messages so that you know what's going on, while the other shows all the errors that are found. When the job is finished, the Progress window displays "DONE", and the main window gets the GO button re-enabled. You can stop the scan by pushing the STOP button; the result windows will remain, with the progress window showing "STOPPED". You can terminate the whole program at any time with the Cancel button.
The operation of the program, as distinct from its user interface, is described a little more precisely in the section on WebLinkValid.
A local file is one that has a simple directory/file reference rather
than a full URL. For instance,
../foo/bar#baz. If the option "Check absolute refs" is
checked, it will also check pages pointed to by full URLs, so long as
they appear to be on the same host. See Check absolute
refs. The limitations are intended to keep the
program from checking the validity of everything on the World-Wide Web;
we leave that sort of thing to Alta Vista and Google.
If the box is not checked, then only the named file will be fully
checked. Other files may still be scanned for the existence of names in
...#name references; see the next section.
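The distinction between a local reference and a full URL on the same host can be sketched with the standard java.net.URI class. This is an illustration under assumptions, not the program's own logic: the class RefKind and its methods are hypothetical, and "same host" is taken to mean a case-insensitive match on the host name.

```java
import java.net.URI;

/** Hypothetical sketch: deciding which references get checked. */
final class RefKind {
    /** A local reference has no scheme: just a directory/file path, possibly with #name. */
    static boolean isLocal(String href) {
        return !URI.create(href).isAbsolute();
    }

    /** True if href is a full URL whose host matches the starting page's host. */
    static boolean isSameHost(String startUrl, String href) {
        String startHost = URI.create(startUrl).getHost();
        String refHost = URI.create(href).getHost();
        return startHost != null && startHost.equalsIgnoreCase(refHost);
    }

    /** Local references always qualify; full URLs only when the option is on. */
    static boolean shouldCheck(String startUrl, String href, boolean checkAbsoluteRefs) {
        return isLocal(href) || (checkAbsoluteRefs && isSameHost(startUrl, href));
    }
}
```

With this sketch, ../foo/bar#baz counts as local, while a full http: URL is considered only when the checkAbsoluteRefs flag is set and the hosts match.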
#someNameOrOther. It will log an error if
someNameOrOther is not in the file.
If the box is not checked, then files will be scanned only if required by "Check all local pages", above.
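The name check can be pictured roughly as follows. This sketch uses a regular expression as a crude stand-in for a real HTML tokenizer, and it looks only at name attributes on <A> tags; the class AnchorCheck is hypothetical and is not how the program itself scans a file.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: does a page define the anchor named in a #name reference? */
final class AnchorCheck {
    // Matches <a ... name="foo"> or <a ... name=foo>, case-insensitively.
    // Crude: a regex can be fooled by odd attribute values; a tokenizer is safer.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*?\\bname\\s*=\\s*[\"']?([^\"'\\s>]+)",
        Pattern.CASE_INSENSITIVE);

    /** Collects every anchor name defined in the page text. */
    static Set<String> definedNames(String html) {
        Set<String> names = new HashSet<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** True if the page defines the given name (the part after the #). */
    static boolean isDefined(String html, String name) {
        return definedNames(html).contains(name);
    }
}
```

A reference such as page.html#someNameOrOther would be logged as an error whenever isDefined returns false for the text of page.html.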
When this option is on, pages are also checked if the reference is absolute, provided that the host is the same as the one on which the checking started. This can be problematic with large ISPs that host numerous Web sites, but it's unlikely to cause serious trouble, maybe.
If the reference to a file is via
http: , the file will be loaded into a
directory tree named after the host. If it's from
file: , and
it's on a DOS-like file system, the base of the tree will be named after
the unit letter, as in
CCoLoN . There is no option to condense
the directory tree structure and load all files into one directory.
The things to be loaded are found in the checking process. Since it won't look at an off-site page to find references, there's no danger of trying to mirror the entire Web.
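The mapping from a URL to a place in the mirror tree might look something like this. It is a sketch under assumptions: the class MirrorPath is hypothetical, the CCoLoN spelling follows the example above, and the handling of file: URLs on non-DOS systems (simply dropping the leading slash) is a guess, since the text doesn't say.

```java
import java.net.URI;

/** Hypothetical sketch: where a mirrored file lands under the working directory. */
final class MirrorPath {
    static String localPathFor(String url) {
        URI u = URI.create(url);
        String path = (u.getPath() == null) ? "" : u.getPath();
        if ("file".equalsIgnoreCase(u.getScheme())) {
            // DOS-like: "/C:/site/index.html" becomes "CCoLoN/site/index.html"
            if (path.matches("/[A-Za-z]:/.*")) {
                return path.charAt(1) + "CoLoN" + path.substring(3);
            }
            // Assumption: on other systems, just drop the leading slash.
            return path.startsWith("/") ? path.substring(1) : path;
        }
        // http: the tree is rooted at a directory named after the host
        return u.getHost() + path;
    }
}
```

Note that the full directory structure is preserved in both cases, in keeping with the absence of any flattening option.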
As of November, 2000, it also has a mirroring capability. See under the
-m option. See, more relevantly, the PressMyButtons
The names can be proper URLs
file:) or simple file
names. It generates two outputs:
<A> tags (ordinary hyperlinks) and
<AREA> tags (maps). Wherever an
href is of a type that the JDK URL class understands (a local reference or
http:), it tries to validate the reference as follows:
#name spec, and the file is
*.htm, scan it to check that the name is defined.
protocol: prefix), find and test all its references, recursively.
It doesn't know how to check the validity of a reference to a directory
(http://www.nobody.glump/foo/bar/) because JDK 1.0 doesn't
know how, so it ignores them. If you leave off the / it will complain
that the file doesn't exist. It's a worse bummer that the program can't
ftp: links, but that's the way it is.
Suggestions will be appreciated unless they're obvious known problems, like "You should give it a decent interface and maybe offer it as an applet."
-m option on the command line before any URL spec, the program will download every referenced file on the same host. Specifically,
.htm, it examines that file in the same way. The order in which it looks at things is unspecified.
-m option is on and the file is on the same host as the original file. Note that the reference need not be local in the sense just given, and the file may or may not be scanned for contents. The file may be of any type.
file: protocol), the local directory structure is mirrored. On DOS-ish filesystems, names like
C: are rendered as
-c (clobber) option is on.
% sign are considered to be queries of some kind, and are stripped to the part preceding the
%. This is a naïve approach, but works for the moment. Its crudeness may cause problems on Macs. If so, let us reason together.
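The stripping rule amounts to cutting the name at the first % sign. A minimal sketch, with a hypothetical class and method name:

```java
/** Hypothetical sketch of the naive query-stripping rule. */
final class QueryStrip {
    /** Returns the part of the name before the first '%', or the whole name if there is none. */
    static String stripQuery(String name) {
        int i = name.indexOf('%');
        return (i < 0) ? name : name.substring(0, i);
    }
}
```

Any legitimate file name that happens to contain a % is truncated too, which is the crudeness mentioned above.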
adc/parser, an HTML tokenizing package written by Arthur Do, available at http://www-cs-students.stanford.edu/~do/. The classes are © 1997 and are used under the terms of the license.
The WebLinkValid program itself is freeware, but distribution is restricted under the copyright in the preceding paragraph. Enjoy.
If you have Java 2 (that is, JDK 1.2) in a recent release, this program should run. If you have JDK 1.1.7 or higher (or maybe 1.1.5; I happen to use 1.1.8), then all you need is the Swing classes. If you don't have them, get them at http://java.sun.com/products/jfc/#download-swing. They really are pure machine-independent Java.
If you don't have the right JDK level, you can get it for free somewhere.
Anyway, you can download these Java programs and unzip the package; or you can put the zip file in your class path without unzipping. If you do unzip it, you'll get a nice collection of subdirectories--and you'll need a decent non-FAT file system and an unzip program that isn't brain-damaged, namely InfoZip.
Since everything is written in Java version 1.0, it ought to run on anything that supports Java. We'll see. (Make that version 1.1: I haven't tested with 1.0 for a long time.)