Posted by Trejkaz
Thu, 06 Jul 2006 03:43:00 GMT
Jnode is an operating system [almost] entirely written in Java. Whereas that might make some of you feel a little ill, we’re using filesystem code from Jnode in order to read various filesystems from disk images without having to mount the image. It turns out to work rather well, because Jnode was all designed to be easily debugged outside of the running operating system.
Anyway, we ran into one particular NTFS disk image which exhibited two features that Jnode hadn’t implemented yet. Because we want to handle almost anything, and the original developer of the NTFS driver has retired from the project, the job of filling the holes fell to me. Luckily, the code for the NTFS driver in Jnode is several times more readable than the code for the NTFS driver in Linux, so I was actually able to implement these features.
Sparse files – this is a fun feature which Windows apps don’t seem to use very often on NTFS. When the data chunk of a file is marked as residing at cluster #0, the data is in fact not stored on disk at all, and reads are supposed to return zeroes. It took me a while to figure this out because the implementation was silently returning garbage data instead of zeroes.
The modification to fix this in Jnode was under a dozen lines of code; all the time was spent finding what the actual issue was.
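The sparse-run handling described above can be sketched roughly as follows. This is only an illustration, not Jnode's actual code: the `DataRun` class, its fields, and the `ClusterReader` interface are hypothetical names, and a run is treated as sparse when its cluster number is 0, as described above.

```java
import java.util.Arrays;

/** Hypothetical source of raw cluster data; stands in for the real volume. */
interface ClusterReader {
    void readClusters(long firstCluster, byte[] dst, int dstOffset, int clusterCount);
}

/** Hypothetical model of one data run: a contiguous range of clusters. */
final class DataRun {
    final long firstCluster; // 0 marks a sparse run in this sketch
    final int clusterCount;

    DataRun(long firstCluster, int clusterCount) {
        this.firstCluster = firstCluster;
        this.clusterCount = clusterCount;
    }

    boolean isSparse() {
        return firstCluster == 0;
    }

    /** Reads the whole run into dst at dstOffset, zero-filling sparse runs. */
    void read(ClusterReader reader, int clusterSize, byte[] dst, int dstOffset) {
        if (isSparse()) {
            // Nothing is stored on disk for a sparse run, so the caller must
            // see zeroes rather than whatever was already in the buffer.
            Arrays.fill(dst, dstOffset, dstOffset + clusterCount * clusterSize, (byte) 0);
        } else {
            reader.readClusters(firstCluster, dst, dstOffset, clusterCount);
        }
    }
}
```

The important part is the explicit zero fill: simply skipping the read for a sparse run would leave stale buffer contents in place, which is the "garbage data" symptom.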
Highly fragmented files – this one’s interesting. The disk in question was “missing” the root directory listing. Turned out it was simply being stored in another MFT record, something which allegedly happens quite rarely. The cause for it is fragmentation – if there are too many chunks of data that need to be pointed to, the list of chunks no longer fits in a single MFT record.
Fixing this was a little nastier because in some situations the attribute data was broken up into multiple data runs within a single attribute, whereas in others it was stored as multiple attributes with a single data run in each – quite a messy situation. The code I ended up writing isn’t the best: I basically tweaked the resulting attributes so that the later attributes’ data runs were appended onto the first attribute’s runs.
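A minimal sketch of that "append onto the first attribute" approach might look like this. The `RunListAttribute` and `RunMerger` classes are hypothetical and much simpler than anything in Jnode; the sketch also assumes the attribute fragments arrive already sorted by the part of the file they cover.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical non-resident attribute holding {firstCluster, clusterCount} runs. */
final class RunListAttribute {
    final List<long[]> runs = new ArrayList<>();

    RunListAttribute addRun(long firstCluster, long clusterCount) {
        runs.add(new long[] {firstCluster, clusterCount});
        return this;
    }

    long totalClusters() {
        long total = 0;
        for (long[] run : runs) {
            total += run[1];
        }
        return total;
    }
}

final class RunMerger {
    /**
     * Appends the runs of every later fragment onto the first one, in order,
     * and returns the first fragment. Assumes the list is sorted by file offset.
     */
    static RunListAttribute merge(List<RunListAttribute> fragments) {
        if (fragments.isEmpty()) {
            throw new IllegalArgumentException("no attribute fragments to merge");
        }
        RunListAttribute first = fragments.get(0);
        for (RunListAttribute later : fragments.subList(1, fragments.size())) {
            first.runs.addAll(later.runs);
        }
        return first;
    }
}
```

After merging, the rest of the reading code only ever sees one attribute with one run list, regardless of which of the two on-disk layouts was used.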
So the disk image in question basically works now, and I even managed to unit test these two conditions in some smaller disk images. (The second one was rather hard to reproduce – funny how fragmentation doesn’t happen when you really want it to, yet a disk left unchecked will become hopelessly fragmented in under a month.)
In any case I’m glad it’s almost over, as I’m sure that working on an NTFS driver must slightly increase a person’s chance of committing suicide.
Tags driver, filesystem, java, jnode, ntfs | no comments
Posted by Trejkaz
Sun, 14 May 2006 23:49:00 GMT
In the software I develop at work, we extract data (and metadata) from all manner of different locations: mail messages, office documents, and more recently, disk images. Extraction from disk images is a particularly interesting one, because it starts to become impractical to extract every file to the temp directory and process them from there.
So we went through a redesign such that many of our data handlers can handle arbitrary chunks of data which may not be ordinary files. Sometimes they come from a byte array, sometimes they come as a sub-slice of a larger chunk of data, and sometimes they come by gluing together multiple slices of data.
The data API we use is basically a cross between a RandomAccessFile and a ByteBuffer (in fact we have implementations which use each of these as a means of getting the data from a normal file) but supporting long indexes. Java’s MappedByteBuffer, amazingly, only supports int indexes, which means a single buffer can’t cover a file larger than 2GB. Your 64-bit hardware is useless here – no one mapping gets past 2GB. (Sorry Sun, but disk images are much larger than 2GB, and we still need random access. This is why we have to write our own APIs for this stuff.)
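To give a feel for the shape of such an API, here is a rough sketch with long offsets, a byte array source, a sub-slice, and a "glue two pieces together" wrapper. All of the names (`Data`, `ByteArrayData`, `SliceData`, `ConcatData`) are made up for this illustration and are not the API we actually use.

```java
/** Hypothetical random-access data abstraction addressed with long offsets. */
interface Data {
    long length();

    byte get(long position);
}

/** Data backed by an in-memory byte array. */
final class ByteArrayData implements Data {
    private final byte[] bytes;

    ByteArrayData(byte[] bytes) {
        this.bytes = bytes;
    }

    public long length() {
        return bytes.length;
    }

    public byte get(long position) {
        if (position < 0 || position >= bytes.length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return bytes[(int) position];
    }
}

/** A window onto a larger piece of data; no bytes are copied. */
final class SliceData implements Data {
    private final Data parent;
    private final long offset;
    private final long length;

    SliceData(Data parent, long offset, long length) {
        if (offset < 0 || length < 0 || offset > parent.length() - length) {
            throw new IndexOutOfBoundsException("slice " + offset + "+" + length);
        }
        this.parent = parent;
        this.offset = offset;
        this.length = length;
    }

    public long length() {
        return length;
    }

    public byte get(long position) {
        if (position < 0 || position >= length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return parent.get(offset + position);
    }
}

/** Two pieces of data glued end to end. */
final class ConcatData implements Data {
    private final Data first;
    private final Data second;

    ConcatData(Data first, Data second) {
        this.first = first;
        this.second = second;
    }

    public long length() {
        return first.length() + second.length();
    }

    public byte get(long position) {
        long split = first.length();
        return position < split ? first.get(position) : second.get(position - split);
    }
}
```

Because every offset is a long, a slice or concatenation can describe regions well past the 2GB mark even when each underlying piece is small.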
A few parts of our system are still forced to extract files to disk, and largely there is nothing we can do about these. But I thought I would try to at least do something about handling zip files, as it’s fairly normal to encounter large zip files, and copying the files to the temp directory starts to become a serious overhead as the files become larger.
The ZipFile class in Java is, unfortunately, very limited. It can only take a File, and the data we’re processing is very rarely a real file. Ideally, I would hope to pass in something like a ByteBuffer, and we could make an implementation of a byte buffer backed by our own API.
So the only thing you can do in this situation is create a bug report (well, a feature request) and hope that Sun does something about it. But this is what happens.
It’s bad enough that they won’t even consider improving their API. But what really irks me is that they would assume we don’t need random access, when both of the suggested constructors I put forward allow random access. They didn’t even contact me to confirm this – just closed the bug at their own convenience.
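For comparison, the one stream-based option the JDK does offer is `java.util.zip.ZipInputStream`, which accepts any `InputStream` but can only walk the entries front to back. The sketch below (the `SequentialZipListing` class is a made-up name) shows what that looks like; it illustrates the sequential-only limitation rather than a solution to it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class SequentialZipListing {
    /**
     * Lists entry names by reading the stream from start to finish.
     * There is no way to jump straight to a particular entry.
     */
    static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

Getting at the last entry of a large archive this way means reading through everything before it, which is precisely why random access matters.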
So forget all the fluff about open source philosophy you will find on other sites. The real reason Sun needs to open source Java is so that bugs get fixed. That’s all the reason they need.
Tags java, opensource, sun | 4 comments
Posted by Trejkaz
Wed, 01 Mar 2006 22:35:00 GMT
As most of you who care probably knew already, the Server VM was somewhat faster than the Client VM for JDK 1.5.0. This has been a bit of a bother for us lately, as we have needed to use the Server VM for its speed, yet we’re also a GUI application which would benefit greatly from using the Client VM instead. In fact lately (JDK 1.5.0 updates 5 and 6) there have been some nasty compiler bugs which cause the Server VM to crash on some of our code.
This recent instability of the Server VM has resulted in me running benchmarks of the different Sun JVMs in order to determine exactly what we’d lose by switching back to the Client VM. This became the perfect excuse to test the new JDK 1.6.0 (Mustang) beta to compare it alongside the current version.
I have a little free time while waiting for the final one to finish so I thought I’d do a write-up so that people can see the kind of improvements that Java 6.0 will bring for general processing.
Tags benchmark, java, mustang | no comments
Posted by Trejkaz
Tue, 15 Feb 2005 05:45:00 GMT
(Or, How MVC Ate My Memory)
Part of the problem with using excessive MVC in any language, and particularly with using Swing in Java, is that if you create and destroy a large number of GUI components over the lifetime of your application, you will end up with a lot of deadweight.
Tags java, programming
Posted by Trejkaz
Fri, 28 Jan 2005 02:29:00 GMT
Sometimes the subtlety in Java is almost enough to be annoying.
For instance, let’s take the java.io.File constructors for today. Ignoring the versions of the constructors which take File objects themselves, you basically have a choice between three ways to construct your File path.
- Use an absolute path (almost never used if you want portability, since an absolute path inevitably requires knowing what OS you’re using before you create it);
- Use a relative path, providing the base path to resolve it relative to; or
- Use a relative path, providing no base path.
The last option is where problems will occur. Suppose you create a file:
File file = new File("foo.txt");
As you can probably guess, this is supposed to point at the file called “foo.txt” relative to the current directory. But what you might not expect is that this is not always the case.
Suppose you have a native library loaded at some point in your application. This native library then does the unthinkable, and changes the current working directory. As soon as this happens, everything goes to shit.
System.out.println(System.getProperty("user.dir"));
File file = new File("foo.txt");
System.out.println(file);
System.out.println(file.getAbsoluteFile());
InputStream stream = new FileInputStream(file);
stream.close();
If you run this after some cretinous native library has changed the path, you’ll see something like this:
/home/trejkaz/test
foo.txt
/home/trejkaz/test/foo.txt
java.io.FileNotFoundException: Cannot find the file: foo.txt
As far as the first three lines go, everything looks okay. And looking in the current directory, you’ll see that “foo.txt” is indeed present. Yet, the file doesn’t exist… why?
Well, it turns out that when the file is actually opened, non-absolute file paths are resolved relative to the new current directory. And what’s funny, and perhaps wrong on the part of the Java API, is that calling getAbsoluteFile() on that File object actually results in Java resolving it relative to the value of the “user.dir” property, instead of the real current directory. So it will appear that the file path is right, even though it’s wrong.
So this is why you should always call getAbsoluteFile() on any path you create that isn’t already absolute, and/or construct it relative to some other explicit path.
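That advice can be wrapped up in a small helper. This is just a sketch: `StablePaths` and its methods are names invented here, and it assumes you capture the base directory once, early on, before any native code has had a chance to change the process's working directory.

```java
import java.io.File;

final class StablePaths {
    /** Captures the directory the JVM was started in, as an absolute File. */
    static File startupDir() {
        return new File(System.getProperty("user.dir")).getAbsoluteFile();
    }

    /** Resolves path against an explicit base directory unless it is already absolute. */
    static File resolve(File baseDir, String path) {
        File file = new File(path);
        return file.isAbsolute() ? file : new File(baseDir, path);
    }
}
```

With this, `StablePaths.resolve(StablePaths.startupDir(), "foo.txt")` yields an absolute path, so opening it no longer depends on whatever the current directory happens to be at that moment.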
And I suppose it’s also why native libraries are bad, and why everything should just be implemented in Java in the first place. :-)
Tags java, programming