Posts Tagged ‘file system’

Tagging on Windows

I spent some time recently investigating file tagging on Windows Vista and Windows 7, and I was less than happy with my findings. The problem is that only certain file formats that support metadata can be tagged, for file types without such support it’s simply not possible to do any tagging. This is a major limitation and without ubiquitous tagging support, the feature itself becomes all but useless; as an end-user, I don’t care about file type, if I’m searching for a document should I have to remember whether it’s a Word document or a plain-text file? JPEG or TIFF? Absolutely not. The confusing thing is, at some point, Microsoft seemed to believe this as well, by default Win 2000 and higher hides file extensions for know file types – isn’t it then completely counter-intuitive to present a limitation based on file type. What this ultimately boils down to is the question of why would I bother tagging anything if only a subset of my files, based on file type, an attribute I don’t care much about, would benefit from the additional metadata?

This isn’t to say that there is necessarily an easy solution. I found this blog post to be the best write up on the topic which explains several approaches that could be taken to implement tagging and why storing metadata in the files themselves was chosen as the best approach. However, I don’t think storing metadata in the file system would be such a bad idea, I can understand the issues about losing metadata when copying to another volume, typically one with a older file system, but at some point you have to do what’s best for the future, not the present, by that I mean the only way you’d actually start to see file systems with proper support for metadata (via. NTFS-esque alternative streams or whatever) in use is by providing an imperative for users to buy or format devices with those file systems; as-is, we’ll continue to use systems such as FAT32 and programmers will continue to remain agnostic to metadata. At the very least, a hybrid approach, storing metadata both in the file itself, when supported, and within the file system would seem a worthwhile solution.

Counting the number of files/folders in a directory

I was looking for a fast way to get the number of folders in a directory. I didn’t need anything else, just the number of folders – this was for a folder tree, and to determine whether or not to display an expansion arrow that would pull and list the subfolders when clicked. The brute-force (read: slow) way to accomplish this is just to get an array of folders and get its length. I thought such information would exist within the directory metadata, but it turns out it doesn’t, and the brute-force way is the only way.

folder tree

Quicker (quickest?) way to get number of files in a directory with over 200,000 files

How do I find out how many files are in a directory?

The stackoverflow questions refer to files, but files and folders are seen/enumerated the same way by Windows (FindFirstFile, FindNextFile), so the same limitation exists.

This post on The Old New Thing explains the reason for the lack of such metadata.

The file system doesn’t care how many files there are in the directory. It also doesn’t care how many bytes of disk space are consumed by the files in the directory (and its subdirectories). Since it doesn’t care, it doesn’t bother maintaining that information, and consequently it avoids all the annoying problems that come with attempting to maintain the information.

Another issue most people ignored was security. … If a user does not have permission to see the files in a particular directory, you’d better not include the sizes of those files in the “recursive directory size” value when that user goes asking for it. That would be an information disclosure security vulnerability.

Yet another cost many people failed to take into account is just the amount of disk I/O, particular writes, that would be required. Generating additional write I/O is a bad idea in general…

Completely understandable… but it still sucks on the application developer’s end. One solution I toyed around with is using the folder info to make a cache, instead of just getting the count and tossing the array to the garbage collector. This means, everytime a folder is expanded, the subfolders within the subfolders of the directory being expanded would be pulled and the necessary UI widgets would be created. However, this isn’t always ideal, especially as you go deeper into a folder tree, as you can end us wasting time pulling and creating UI widgets for a lot of folders which will never be seen by the user.