Staff Handbook: Skills: File and Directory Management

Basic Terms

A "File" refers to a discrete unit of content recorded on physical media (a hard disk, a floppy disk). Images, textual content, even entire databases are often stored as files. Each file has a name ("filename") which must be unique within its context (i.e. its "path" -- see below). A filename typically includes a three- or four-character "suffix" or "extension", separated from the rest of the name by a period, which describes the type of file (examples: .gif for GIF images, .html for text marked up in HTML format). Valid file (and directory) names are composed of the letters a-z (upper or lower case) and the digits 0 through 9 in any combination. It is also acceptable to use dash (-) or underscore (_) characters in filenames. It is not acceptable for filenames to contain spaces or slashes (slashes are reserved to identify directories within a "path" -- see below).
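The naming rules above can be checked mechanically. The following is a minimal sketch in Python, not part of any existing handbook tooling: the function name is_valid_name is hypothetical, and the pattern assumes that a name may end with a single period followed by an extension of one to four letters or digits.

```python
import re

# Letters, digits, dashes, and underscores, optionally followed by one
# period and a short extension (assumed here: 1 to 4 letters or digits).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]{1,4})?")


def is_valid_name(name: str) -> bool:
    """Return True if name follows the file/directory naming rules above."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # The last two candidates contain a space and a slash, which are not allowed.
    for candidate in ["miscellany.html", "black_history", "my file.html", "a/b.gif"]:
        print(candidate, is_valid_name(candidate))
```

A check like this could be run over a list of new filenames before they are copied to the server, so that spaces and slashes are caught early.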

Sample Filenames:

A "Directory" is an organizational structure, a container for files and other directories. A file that is stored in a directory is completely distinct from any files that are stored in other directories, whether or not the files have the same name. Directory naming conventions are identical to file naming conventions, except that directory names do not usually include extensions.

A "Path" is a combination of slashes and directory names that identifies the specific location of a file. A fully qualified path begins with a top-level (or "root") directory, typically referred to with a single slash (we will use / for all of our examples, as we are dealing with UNIX filesystems). Within a path, we can refer to any subdirectory using its position relative to the root directory (examples: /theses, /theses/available, /theses/submitted). We use genealogical terms to describe the relationships between directories. In the previous examples, the /theses directory would be considered a child of the root directory. The /theses/available and /theses/submitted directories would be considered children of the /theses directory, and could also be referred to as siblings of each other (i.e. /theses/available is a sibling of /theses/submitted). The /theses directory would be considered the parent of the /theses/available directory, and the / directory would be considered the parent of the /theses directory.
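The parent, child, and sibling relationships described above can be explored with Python's standard pathlib module. This is only an illustrative sketch; the helper are_siblings is a hypothetical name introduced for this example.

```python
from pathlib import PurePosixPath

available = PurePosixPath("/theses/available")
submitted = PurePosixPath("/theses/submitted")


def are_siblings(a: PurePosixPath, b: PurePosixPath) -> bool:
    """Two different directories are siblings if they share the same parent."""
    return a != b and a.parent == b.parent


if __name__ == "__main__":
    print(available.parent)                    # /theses
    print(available.parent.parent)             # /
    print(are_siblings(available, submitted))  # True
```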

The path and filename are two critical components of every URL, even if they are implied rather than explicitly specified (as in the case of index.html files -- see below). When choosing a name for a file or directory, you want to be sure the final URL will be descriptive, without being overly wordy or redundant. The following sections give some guidelines for naming files and directories.

Tip 1: Avoid inserting redundant information in the name of each subdirectory and file.

Bad File + Directory Name: /ejournals/JTE/jte-v11n2/jte-v11n2-miscellany.html

Better File + Directory Name: /ejournals/JTE/v11n2/miscellany.html

In the first example above, we've added 14 extra characters of useless information to the path and filename, which will be reflected in the URL users see and may even have to type into their browser. In short: we're not going to put issues of any journal but the JTE in the directory /ejournals/JTE, so we can simply call the directory /ejournals/JTE/v11n2. We're not going to put articles that aren't a part of volume 11, number 2 in the directory /ejournals/JTE/v11n2, so we can simply name the file miscellany.html. So the better path and filename would be /ejournals/JTE/v11n2/miscellany.html.
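The clean-up described above can be approximated with a short script. The sketch below is a rough heuristic under stated assumptions, not an established tool: the function name strip_redundant_prefixes is hypothetical, words are assumed to be separated by dashes or underscores, and a leading word is treated as redundant only when it repeats a word already used higher up in the path (ignoring case).

```python
import re


def strip_redundant_prefixes(path: str) -> str:
    """Drop leading name parts that merely repeat an ancestor directory's name."""
    seen = set()   # lowercase words already used higher up in the path
    cleaned = []
    for component in path.strip("/").split("/"):
        # Split on dashes/underscores but keep the separators,
        # e.g. "jte-v11n2" -> ["jte", "-", "v11n2"].
        parts = re.split(r"([-_])", component)
        # Remove a leading word (and its separator) while it repeats an
        # ancestor, always keeping at least one word.
        while len(parts) > 1 and parts[0].lower() in seen:
            parts = parts[2:]
        cleaned.append("".join(parts))
        seen.update(word.lower() for word in parts[::2])
    return "/" + "/".join(cleaned)


if __name__ == "__main__":
    bad = "/ejournals/JTE/jte-v11n2/jte-v11n2-miscellany.html"
    good = strip_redundant_prefixes(bad)
    print(good)                  # /ejournals/JTE/v11n2/miscellany.html
    print(len(bad) - len(good))  # 14
```

A heuristic like this only flags candidates. Whether a repeated word is truly redundant is still a judgment call for the person naming the files.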

Tip 2: Avoid overly general file and directory names.

Bad File + Directory Name: /site1/section1/file1.html

Better File + Directory Name: /exhibits/spring2001/synopsis.html

The first file and directory name given above gives us no hints as to the larger context for the page we're looking at. A simple, descriptive name that hints at the subject, time period, or broad category of materials we're dealing with is a great help for the next person who has to update and/or rearrange the files you work on.

Tip 3: Avoid overly terse file and directory names.

Bad File + Directory Name: /arch/blhist/timln/syn.html

Better File + Directory Name: /archives/black_history/timeline/synopsis.html

It used to be the case that file and directory names could only be 8 characters long. This is no longer the case, and names that are composed of full English words are easier to remember than odd abbreviations or even acronyms.

Tip 4: Avoid overly complex capitalization.

Bad File + Directory Name: /Exhibits/Fall1999/CulinaryCOLL/Cover.html

Better File + Directory Name: /exhibits/fall1999/culinary/cover.html

File and directory names on our web server are case sensitive. Failing to correctly capitalize even one letter in the first example above results in a "File not found" message. It's better practice to pick a case and stick to it. Lowercase is best for this, as it requires less fiddling with caps lock or shift keys on the part of users who are typing in the URL.
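Because capitalization matters on a case-sensitive server, it can help to list the parts of a path that are not entirely lowercase. The following sketch assumes the all-lowercase convention recommended above; the function name mixed_case_components is hypothetical.

```python
def mixed_case_components(path: str) -> list[str]:
    """Return the path components that contain at least one uppercase letter."""
    return [part for part in path.split("/") if part and part != part.lower()]


if __name__ == "__main__":
    print(mixed_case_components("/Exhibits/Fall1999/CulinaryCOLL/Cover.html"))
    # ['Exhibits', 'Fall1999', 'CulinaryCOLL', 'Cover.html']
    print(mixed_case_components("/exhibits/fall1999/culinary/cover.html"))
    # []
```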

Tip 5: Avoid being too wordy.

Overly Long Directory Name: /manuscripts_and_guides/smithfield_preston_collection

Better Directory Name: /manuscripts/smithfield_preston

These days, it's possible to create very descriptive filenames by placing dashes (-) and underscores (_) between common English words. Avoid the tendency to write small essays when creating files and directories. Stick to no more than two longer words or three short words when naming a file or directory.

Tip 6: "index.html" files and why you should use them

Our web server is configured to recognize a special filename, index.html. When this file is found in a directory, the contents of the file are returned instead of a listing of the directory's contents. The first advantage of using index.html files is that it keeps users from inadvertently browsing through outdated versions of pages, or from trying to make sense of a directory full of images instead of the page that describes the images. The second advantage is that URLs referring to an index.html file can omit the filename index.html at the end, as it is implied. Example: The URL http://scholar.lib.vt.edu/ejournals/index.html is equivalent to the shorter URL http://scholar.lib.vt.edu/ejournals/.
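The equivalence between the long and short forms of a URL can be expressed in a few lines of Python. This is a minimal sketch that assumes the server's index filename is index.html, as described above; the function name shorten_index_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def shorten_index_url(url: str, index_name: str = "index.html") -> str:
    """Return the URL with a trailing index filename removed from its path."""
    parts = urlsplit(url)
    path = parts.path
    # Only strip the filename when it is a whole path component at the end.
    if path.endswith("/" + index_name):
        path = path[: -len(index_name)]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(shorten_index_url("http://scholar.lib.vt.edu/ejournals/index.html"))
    # http://scholar.lib.vt.edu/ejournals/
```

URLs that do not end in index.html are returned unchanged, so the function can be applied to every link on a page without special cases.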

Tip 7: Divide larger groups of files using subdirectories.

It's a good idea to break up larger sites using a directory structure that corresponds to some common mental model that users can follow along with. If you're marking up electronic journals, you'll probably want a subdirectory for each journal, and then a subdirectory for each issue of the journal. A directory containing all the images for every issue, or all of the articles for every issue would be difficult to maintain without reading each and every page of the journal.

Think of the way written text is commonly broken up into paragraphs. A single long paragraph containing dozens of different ideas is hard to make sense of, because there are no stopping points to allow people to digest information. If you have one major idea for a group of pages, it's probably OK if the pages share a directory with a page that covers another (related) concept. If you have one major idea for a group of pages, and more than two or three pages on another topic (whether or not it is related), it may be time to think about putting the pages related to the second topic in their own directory.