Fun with NTFS
Posted by Trejkaz Thu, 06 Jul 2006 03:43:00 GMT
Jnode is an operating system [almost] entirely written in Java. While that might make some of you feel a little ill, we’re using filesystem code from Jnode to read various filesystems from disk images without having to mount the image. It turns out to work rather well, because Jnode was designed to be easy to debug outside of the running operating system.
Anyway, we ran into one particular NTFS disk image which exhibited two features that Jnode hadn’t implemented yet. Because we want to handle almost anything and the original developer of their NTFS driver has retired from the project, the job fell to me to fill the holes. Luckily, the code for the NTFS driver in Jnode is several times more readable than the code for the NTFS driver in Linux, so I was actually able to implement these features.
Sparse files – this is a fun feature which Windows apps don’t seem to use very often on NTFS. When a data chunk of a file is marked as residing at cluster #0, the data is in fact not stored on disk, and reading it should give back zeroes. It took me a while to figure this out because the implementation was silently returning garbage data instead of zeroes.
The modification to add this support to Jnode was under a dozen lines of code; all the time was spent finding what the actual issue was.
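For the curious, here is a minimal sketch of the idea, not the actual Jnode change. The class and method names are made up for illustration, the "disk" is just a byte array, and it assumes (as described above) that a run pointing at cluster #0 means "nothing stored":

```java
import java.util.Arrays;

// Hypothetical names for illustration only; this is not Jnode's API.
final class SparseRunSketch {

    /** A run of contiguous clusters. Assumption: firstCluster == 0 marks a sparse run. */
    static final class DataRun {
        final long firstCluster;
        final int clusterCount;

        DataRun(long firstCluster, int clusterCount) {
            this.firstCluster = firstCluster;
            this.clusterCount = clusterCount;
        }

        boolean isSparse() {
            return firstCluster == 0;
        }
    }

    /** Copies one run into dst starting at dstOffset and returns the number of bytes written. */
    static int readRun(DataRun run, byte[] disk, int clusterSize, byte[] dst, int dstOffset) {
        int length = run.clusterCount * clusterSize;
        if (run.isSparse()) {
            // Nothing is stored on disk for this run, so hand back zeroes
            // rather than whatever happens to live at cluster #0.
            Arrays.fill(dst, dstOffset, dstOffset + length, (byte) 0);
        } else {
            int srcOffset = (int) (run.firstCluster * clusterSize);
            System.arraycopy(disk, srcOffset, dst, dstOffset, length);
        }
        return length;
    }
}
```

The explicit fill matters when the caller reuses its buffer: skipping the read without zeroing would leave stale bytes behind, which looks a lot like the garbage I was seeing.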
Highly fragmented files – this one’s interesting. The disk in question was “missing” the root directory listing. It turned out the listing was simply being stored in another MFT record, something which allegedly happens quite rarely. The cause is fragmentation: if there are too many chunks of data that need to be pointed to, the list of all the chunks can’t fit in a single MFT record.
Fixing this was a little nastier, because in some situations the attribute data was broken up into multiple data runs in a single attribute, whereas in others it was stored as multiple attributes with a single data run in each. Quite a messy situation. The code I ended up writing isn’t the best: I basically tweaked the resulting attributes so that the later attributes’ data runs were appended onto the first attribute’s runs.
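Roughly, the appending trick looks like the sketch below. This is an illustration under assumptions rather than the code that went into Jnode: the types are hypothetical, and it assumes each piece of the attribute records the virtual cluster number (VCN) at which its runs begin, so the pieces can be put in order before their runs are concatenated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration only; this is not Jnode's API.
final class RunMergeSketch {

    static final class Run {
        final long firstCluster;
        final long clusterCount;

        Run(long firstCluster, long clusterCount) {
            this.firstCluster = firstCluster;
            this.clusterCount = clusterCount;
        }
    }

    /** One piece of a split attribute. Assumption: startVcn is where its runs begin in the file. */
    static final class AttributeFragment {
        final long startVcn;
        final List<Run> runs;

        AttributeFragment(long startVcn, List<Run> runs) {
            this.startVcn = startVcn;
            this.runs = runs;
        }
    }

    /** Returns every fragment's runs as one list in file order, failing on gaps or overlaps. */
    static List<Run> mergeRuns(List<AttributeFragment> fragments) {
        List<AttributeFragment> ordered = new ArrayList<>(fragments);
        ordered.sort(Comparator.comparingLong((AttributeFragment f) -> f.startVcn));

        List<Run> merged = new ArrayList<>();
        long expectedVcn = 0;
        for (AttributeFragment fragment : ordered) {
            if (fragment.startVcn != expectedVcn) {
                throw new IllegalStateException(
                        "Expected fragment at VCN " + expectedVcn + " but found " + fragment.startVcn);
            }
            for (Run run : fragment.runs) {
                merged.add(run);
                expectedVcn += run.clusterCount;
            }
        }
        return merged;
    }
}
```

Because the merge only cares about the final ordered list of runs, it treats "one attribute with many runs" and "many attributes with one run each" the same way, which is the whole point of flattening them.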
So the disk image in question basically works now, and I even managed to unit test these two conditions in some smaller disk images. (The second one was rather hard to reproduce – funny how fragmentation doesn’t happen when you really want it to, yet a disk left unchecked will become hopelessly fragmented in under a month.)
In any case I’m glad it’s almost over, as I’m sure that working on an NTFS driver must slightly increase a person’s chance of committing suicide.