Posts Tagged ‘files’

Tagging on Windows

I spent some time recently investigating file tagging on Windows Vista and Windows 7, and I was less than happy with my findings. The problem is that only certain file formats that support metadata can be tagged, for file types without such support it’s simply not possible to do any tagging. This is a major limitation and without ubiquitous tagging support, the feature itself becomes all but useless; as an end-user, I don’t care about file type, if I’m searching for a document should I have to remember whether it’s a Word document or a plain-text file? JPEG or TIFF? Absolutely not. The confusing thing is, at some point, Microsoft seemed to believe this as well, by default Win 2000 and higher hides file extensions for know file types – isn’t it then completely counter-intuitive to present a limitation based on file type. What this ultimately boils down to is the question of why would I bother tagging anything if only a subset of my files, based on file type, an attribute I don’t care much about, would benefit from the additional metadata?

This isn’t to say that there is necessarily an easy solution. I found this blog post to be the best write up on the topic which explains several approaches that could be taken to implement tagging and why storing metadata in the files themselves was chosen as the best approach. However, I don’t think storing metadata in the file system would be such a bad idea, I can understand the issues about losing metadata when copying to another volume, typically one with a older file system, but at some point you have to do what’s best for the future, not the present, by that I mean the only way you’d actually start to see file systems with proper support for metadata (via. NTFS-esque alternative streams or whatever) in use is by providing an imperative for users to buy or format devices with those file systems; as-is, we’ll continue to use systems such as FAT32 and programmers will continue to remain agnostic to metadata. At the very least, a hybrid approach, storing metadata both in the file itself, when supported, and within the file system would seem a worthwhile solution.

Programs and Files

This post is about an idea, a simple idea:

End-users of a PC should be able to transfer their programs between computers and easily as they transfer their documents.

There is of course the issue of OS and hardware dependency for applications, but even from one x86, 32-bit, Windows machine to another, transferring a program is a non-trivial process. Yet, from a technical point of view, there seems to be little justification for things to be this way.

So what is a program really and why is it so difficult to move them around?
(Note: This is primarily dealing with Windows and I’ll be referring to Windows 2000/XP/Vista constructs, however, the situation isn’t any better on OS X or Linux)

Part 1: Back to Basics
A program is just data. Some of it is logic (executable files, stuff executed on the CPU) and some of it is just raw information (images, music, settings, user information, etc.). All of this data, at some level, is contained in files.

Files are great because users understand files, they understand the concept itself (a file is an atomic unit of data) and execution of the concept on almost any operating system is such that working with and manipulating files (moving, copying, etc.) is trivial for all but the most illiterate computer users. This is likely why all user documents are single files. This ease-of-use, I believe, surfaces due to the atomicity of a file; everything about a file is within the file itself, and the metadata for a file (file size, time stamp, etc.) is managed and/or computed dynamically by the file system.

When dealing with multiple files that share a common function or purpose, aggregating files becomes necessary and as such we get to the concepts of directories and tags. While there are ways to make either work for what I’m going to describe, I’ll stick to directories as it’s a more familiar concept in the realm of the desktop.

Like files, users understand directories because of their simplicity in concept and execution, and as with files users are familiar with manipulating them. This simplicity comes about due to the fact that, like files, directories can be seen as atomic units of data. While looking inside a directory and seeing a mishmash of files makes it seem very much non-atomic, it is also a homogeneous container (it stores only files) and therefore, it can be viewed as a single object, a container, and this container can, for the most part, be manipulated as a single object, much like an individual file can. While their atomicity is important for ease-of-use by an end-user, directories are also important because they relate to the coupling of files. As mentioned, files within a directory share a common function or purpose, this is undoubtedly true for programs; each directory for a program contains files which are essential to its function (e.g. all of the files in my \Program Files\XML Notepad 2007\ directory relate to the functionality of XML Notepad). This coupling is important because it allows for easy organization, maintenance, and access to the program and it’s program-specific data.

Now to the crux of the matter, if programs are just files, and files are coupled together in directories, it should follow that we can move programs around by just copying directories. Unfortunately this is not the case because the coupling between a program’s logic and some specific pieces of information belonging to it are not coupled together in the same directory. While Windows users may assume that each subdirectory in \Program Files contains all the files for a program, this is not (in general) true, the files in these subdirectories represent only part of their program. The additional pieces of information are contained in the the system registry and/or somewhere within the Documents and Settings folder. This lack of coupling is the reason transference of programs between PCs is difficult and to tackle the problem tighter coupling must be provided while maintaining the operating system features provided by the registry and the Documents and Settings directory.

Part 2: The Registry
First off, the system registry is a nightmare. I can’t imagine who conceived of the idea that storing application level data and critical system specific data in the same location and allowing free access to it all would be a good thing. So much of our current malware, adware, spyware, etc. headaches have been caused by programs doing bad things to the registry. Additionally, although you can technically backup the registry easily, I’ve encountered errors restoring it which makes the backup useless and the only recourse is to reinstall applications just so they can rewrite their registry information. It should also be noted that most typical users I’ve encountered don’t know what the registry is or have a backup of it, but the more important question is, why should they? It’s a construct that simply appeases the operating system, it’s functional benefits could easily be refactored in more user friendly ways.

So, relative to a program, what’s actually in the registry?

  • Program-specific settings.
  • System-wide settings related to a program (e.g. file associations, uninstallation information, stuff that shows up in the OS shell).
The registry is deprecated for the first issue (so I’ve read, but I can’t actually find a Microsoft link), it is no longer recommended that program settings be stored in the registry. For the second issue, it seems that with the CPU and memory of a modern PC, dynamically reading and applying such settings would be a much more ideal solution. This can be done by specifying a convention for the settings file (it can be XML or whatever), then dynamically scanning each folder in the \Program Files directory for these files and applying the settings specified.

There are, of course, technical benefits (application not messing around in the registry) and tradeoffs (performance cost of scanning program directories) to doing this, but from the end-user’s perspective the ultimate benefit comes from the tighter coupling of a program’s files (program settings that effect the OS are no longer part of the registry file, they now reside in their own file, which is tightly coupled together with other program files in the same directory).

Part 3: Documents and Settings
Tackling the registry is somewhat easy as you don’t have to look far to see the flaws of the registry. The Documents and Settings folder seems a bit trickier, but it’s not. The Documents and Settings folder is a manifestation of the needs of a multiuser operating system. Fundamentally, it stores user-specific settings and program-specific setting for a specific user or all users (the other stuff like My Documents, or the desktop, can be moved; well, to a certain extent, some programs still seem to use an absolute path to C:\Documents and Settings\<user>\My Documents instead of querying the OS for the path). User-specific settings are the domain of the operating system so it can whatever it will with these files. Program settings are different, they’re the domain of the program which owns them (that’s a contract between the user and the program, not the user and the operating system). The policy of having the OS handle delineation and storage of per-user program settings seems to be just that, a policy. There’s no reason applications can’t do it themselves. There are 2 possible solutions I see here:

  • Solution A: The program can be written from a multiuser perspective. Querying all users, creating a directory for every user (within the program’s main directory), setting appropriate permissions for the directories created, and storing user-specific data in these directories.
  • Solution B: Create one \Program Files directorie s for every user. In effect, every user can have a different set of programs installed and/or copies of the same programs. Programs store whatever user-specific settings they want (which will be for the currently logged in user) in their main directory.
Solution A is a pain in the ass for developers. No developer is going to want to have the additional burden on managing data for multiple users when it can be done transparently by the OS. Solution A is also not ideal in how it addresses the overall problem, as to move or copy programs users have to either log in as an admin and copy over all the user data folders or figure out their specific user data folder and only copy over that one.

Solution B is ideal in my opinion, even though it wastes hard drive space, but space is cheap and aside from games most programs don’t use a whole lot. With at one \Program Files directory for every user, applications will still maintain transparency in how they manage user data for multiple users and user data can be stored in the application’s main directory. This solution also provides for greater security and stability as users’ program files are independent and they don’t have to risk running something modified by another user. (i.e. every user is in their own sandbox both for programs as well as documents).

The ultimate benefit here is that copying a program becomes just a matter of copying the program’s main directory.

(For sensitive information, the program is then responsible for determining what user information to keep and what to erase after a copy, but that’s as it should be, that information is a contract between the user and the program, not the user and the operating system).

Part 4: Wishful Thinking
I wrote specifically about Windows, but the central idea here is coupling of data, which can be applied across operating systems. The tighter the coupling, the easier it is for the user to manipulate their data as a whole. There’s no reason the same agility afforded to a user’s documents can’t be given to their programs as well.

Also, although I mentioned one directory for all programs, I see no reason why the system I described couldn’t scale to multiple program files directories (for multiple drives or whatever else).

There is, of course, one issue I didn’t tackle, which is DRM. I spoke about copying and moving without mentioning licenses or copy protection, techniques which greatly benefit from the way things are. However, I think that’s a separate issue, and, simply put, I see something terribly wrong with licensing software to a CPU instead of a user.

Anyways, it’s just an idea. In the end, all of this is wishful thinking, and there’s so much inertia and missteps from the operating system vendors that I’m not holding my breath for any fundamental changes. However, there’s something really sad about the fact that it’s easier to install an application on Facebook than it is to install an application on my desktop.