Posted by Trejkaz
Sun, 14 May 2006 23:49:00 GMT
In the software I develop at work, we extract data (and metadata) from all manner of different locations. Mail messages, office documents, and more recently, disk images. Extraction from disk images is a particularly interesting one, because it starts to become impractical to extract every file to the temp directory and process them from there.
So we went through a redesign such that many of our data handlers can handle arbitrary chunks of data which may not be ordinary files. Sometimes they come from a byte array, sometimes they come as a sub-slice of a larger chunk of data, and sometimes they come by gluing together multiple slices of data.
The data API we use is basically a cross between a RandomAccessFile and a ByteBuffer (in fact we have implementations which use each of these as a means of getting the data from a normal file) but supporting long indexes. Java’s MappedByteBuffer, amazingly, only supports int indexes, so a single mapping can’t cover more than 2GB. Your 64-bit hardware is useless – to get at a larger file you have to juggle multiple mappings yourself. (Sorry Sun, but disk images are much larger than 2GB, and we still need random access. This is why we have to write our own APIs for this stuff.)
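To make the idea concrete, here is a minimal sketch of what a long-indexed data API with sub-slices and glued slices might look like. It is not the code from work: the names LongData, ByteArrayData and ConcatData are invented for illustration, and a real version would need bulk reads and a file-backed implementation as well.

```java
/** Hypothetical read-only chunk of data addressed with long offsets. */
interface LongData {
    long length();

    byte get(long index);

    /** Returns a view of [offset, offset + size) without copying anything. */
    default LongData slice(long offset, long size) {
        if (offset < 0 || size < 0 || size > length() || offset > length() - size) {
            throw new IndexOutOfBoundsException("slice out of range");
        }
        LongData parent = this;
        return new LongData() {
            @Override public long length() { return size; }

            @Override public byte get(long index) {
                if (index < 0 || index >= size) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                return parent.get(offset + index);
            }
        };
    }
}

/** Data backed by an in-memory byte array. */
final class ByteArrayData implements LongData {
    private final byte[] bytes;

    ByteArrayData(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public long length() { return bytes.length; }

    @Override public byte get(long index) {
        if (index < 0 || index >= bytes.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return bytes[(int) index];
    }
}

/** Several chunks glued end to end into one logical chunk. */
final class ConcatData implements LongData {
    private final LongData[] parts;
    private final long[] starts; // starts[i] is where parts[i] begins in the glued view
    private final long total;

    ConcatData(LongData... parts) {
        this.parts = parts.clone();
        this.starts = new long[parts.length];
        long offset = 0;
        for (int i = 0; i < parts.length; i++) {
            starts[i] = offset;
            offset += parts[i].length();
        }
        this.total = offset;
    }

    @Override public long length() { return total; }

    @Override public byte get(long index) {
        if (index < 0 || index >= total) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Linear scan from the end; a binary search would suit many parts better.
        int i = parts.length - 1;
        while (starts[i] > index) {
            i--;
        }
        return parts[i].get(index - starts[i]);
    }
}
```

With this shape, a handler only ever sees a LongData and never needs to know whether the bytes live in memory, in a slice of a disk image, or in several slices stitched together.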
A few parts of our system are still forced to extract files to disk, and largely there is nothing we can do about these. But I thought I would try to at least do something about handling zip files, as it’s fairly normal to encounter large zip files, and copying the files to the temp directory starts to become a serious overhead as the files become larger.
The ZipFile class in Java is, unfortunately, very limited. It can only take a File, and the data we’re processing is very rarely a real file. Ideally, I would hope to pass in something like a ByteBuffer, and we could make an implementation of a byte buffer backed by our own API.
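For contrast, the standard library does offer java.util.zip.ZipInputStream, which accepts any InputStream, but it can only walk the archive front to back. The sketch below (class and method names are made up, and it uses newer Java syntax than was around when this was written) lists the entries of a zip held purely in memory. It shows what is possible without a File, and also why it falls short when you need to jump straight to one entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class StreamedZipListing {
    /** Lists entry names by reading the archive sequentially from the stream. */
    static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() skips whatever remains of the previous entry,
            // so reaching entry N means reading past entries 1..N-1 first.
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny two-entry archive in memory so the example needs no files. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("dir/b.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }
}
```

Calling `StreamedZipListing.entryNames(new java.io.ByteArrayInputStream(StreamedZipListing.sampleZip()))` walks both entries in order. There is no way to ask for the second entry without streaming past the first.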
So the only thing you can do in this situation is create a bug report (well, a feature request) and hope that Sun does something about it. But this is what happens.
It’s bad enough that they won’t even consider improving their API. But what really irks me is that they would assume we don’t need random access, when both of the suggested constructors I put forward allow random access. They didn’t even contact me to confirm this – just closed the bug at their own convenience.
So forget all the fluff about open source philosophy you will find on other sites. The real reason Sun needs to open source Java is so that bugs get fixed. That’s all the reason they need.
Tags java, opensource, sun | 4 comments
Posted by Trejkaz
Mon, 08 May 2006 13:24:00 GMT
Want to send instant messages from your Ruby on Rails app with the minimum amount of code and the maximum amount of testability?
Then say hello to… Action Messenger
Read more...
Tags actionmessenger, jabber, rails, ruby, software | no comments
Posted by Trejkaz
Mon, 01 May 2006 00:11:00 GMT
It occurs to me that now I have a laptop, I finally have a solution for keeping my work and home RSS feeds in sync – use the same computer for both.
So I was just cleaning up my RSS feeds (and Atom, of course, which is cooler) imported over from work, basically putting similar sites grouped together in folders so that there aren’t so many groups to click on. And it seems that in the same time a site like Slashdot or Engadget takes to get 20 stories (of which perhaps 10 are worth reading), Digg manages to amass about 200 stories (of which perhaps 10 are worth reading.)
Digg was actually quite good when I first subscribed to its RSS feed. But it seems that the longer it sticks around, the easier it is for a piece of absolute dogshit to get to the front page. These days, it seems like it’s more of an extension to del.icio.us, only with a bunch of opinionated bullshit tagged under the story title.
I guess it has something to do with moderation being completely controlled by the community. As a site amasses more users, it amasses more idiots, and these idiots digg the wrong stories. There comes a point where there are enough idiots that if they all click on a bad story, the bad story instantly makes it to the front.
So as a result of the crapflood, I’ve had to move Digg out of the “News” group into a new category which for now I’m calling “Link Flood”. So I guess I can resort to that category when there is absolutely nothing else to read.
(While I’m on this topic, I’ve started to wonder why I still subscribe to Engadget. Their comment system clearly no longer wants me to make comments, because their validation emails don’t ever arrive. And their staff never answer my questions when I ask why they’re not sending the validation emails anymore. Perhaps my email address has made it onto a spam list somewhere, I have no idea.)
So in any case, I think it’s time either for Digg to reinvent itself, or for someone to start a new alternative. Perhaps something like Advogato, where people rate each other’s worth, only applied to news submissions, so that people who are worth jack shit don’t get their stories on the front page so easily.
Any takers?
Tags digg, feeds | no comments
Posted by Trejkaz
Sat, 22 Apr 2006 14:36:00 GMT
Phew… that was a hard few hours. :-)
This afternoon, I had decided to give Boot Camp a spin, mainly to see if I could use my laptop as a Windows testing environment for the app I develop for work.
Windows worked relatively well. Problem was, the machine would no longer boot back into OSX. This is a typical problem with Windows – it has a habit of clobbering every other operating system when you install it. I was under the impression that Apple had found a way around that problem, but it seems that they haven’t.
Anyway, a lot of people had the same problem, and a number of potential solutions were found, but none of them actually worked in my case. Most of the solutions involved booting from the OSX install disk, but the system didn’t even want to boot the disk. Eventually out of frustration, I left the machine turned off for a while, and somehow (luck?) on the fifth or sixth attempt the install disk finally booted.
So I had managed to get into Disk Utility, but it couldn’t find an error. Getting it to “repair” the disk just resulted in it doing nothing. So that route was useless, but it did give me a way to at least erase Windows. I erased it and then tried rebooting to see if that worked. Nope, then it just tried to boot from the erased disk, and failed because there was no OS on it.
But there was one way out of it all: the installer had a feature which forces the machine to boot from a given partition. I pointed it at the partition which wouldn’t boot, and it booted up properly into my main OSX install.
Then I just went into the Boot Camp app and returned my disk to a single partition, and everything is back in order again. Close call, but I have to wonder if there might be other untold problems it’s caused on my system.
Tags bootcamp, macosx, windows | no comments
Posted by Trejkaz
Sat, 08 Apr 2006 09:45:00 GMT
The MacBook Pro out of the box is a very nice machine, but there was a lot of crap I had to do to get it up to my usual working environment. I think after almost a week I’m almost done, and I took a few notes on the way… you know, in case they might help others choose whether to go with this system or not.
Read more...
Tags macbookpro, macos | no comments
Posted by Trejkaz
Fri, 07 Apr 2006 04:13:00 GMT
Right about now, I would have been somewhere between Kyoto and Tokyo, had my original holiday date not been postponed (it’s now set for September, on the assumption that I find someone to go with.)
But instead, I’m stuck in Sydney in autumn, in the shitty part of the season where it starts out freezing cold in the morning (so you wear a coat) and then ends up boiling hot in the evening (so you end up carrying the coat home.) Great stuff. I can’t wait for winter, at least the weather will be consistently cold.
The last three weeks have been a bit frantic.
A whole week was eaten by a house cleanup in preparation for a random house inspection. It really drove home a few things.
People should clean their crap up after they’ve finished using it. I generally do this, so as a result I didn’t have as much work to do as everyone else.
Professional cleaners are actually worth paying for. Or at least, they are worth it if the house is big enough. Particularly in a situation where nobody follows the roster past about four weeks, it will be much easier to just have housemates share the cost of the cleaner doing the work.
The current house has way too few places in which to store cleaning stuff. For instance, it wouldn’t have killed the owners to put in a fixed cabinet somewhere for storage of things like brooms and vacuum cleaners.
There is never enough capacity to throw out all the trash generated during a cleanup. This was especially true this time around as we ended up filling the general waste bin, the recycling bin, and generating 2 m³ of bulky trash, and still having about 2 m³ left over. It would be better if the council just asked you how much you had and picked it up in one hit, instead of imposing arbitrary limits which then necessitate rebooking.
Moving house is certainly an option from here but it’s always expensive and almost always means two weeks downtime while the DSL gets connected. Buying a house is starting to become an option too, but I think I’ll wait for a few more months before looking into that sort of thing. I can’t imagine being able to afford a very good place anyway. My limit under my current income is somewhere around $300,000… certainly a lot of small places do fall under that, but not any terribly good ones. Plus I’d have to make sure I can cover the mortgage payments by myself, because the banks don’t seem to care about income from renting out the extra rooms.
Other than that, the past week has been eaten up by setting up my MacBook Pro. Like so many other people who bought this laptop, I’m affected by the buzzing CPU issue, which I’ve temporarily worked around by hacking QuietMBP to use less CPU, but just enough to stop the noise. Hopefully Apple will fix the problem in a software update, because I’d hate for it to be entirely a hardware problem.
Otherwise, the laptop is practically perfect. Mine got delayed, and by the time it arrived they were already into revision E, and all the original problems people were experiencing had been fixed (well, except the buzz.)
Tags macbookpro, meta, trypticon | no comments
Posted by Trejkaz
Mon, 13 Mar 2006 00:53:00 GMT
Is it just me, or has this weekend been particularly heavy with spam attacks?
First, I have my email spam. Somehow, a whole bunch of spams throughout the weekend completely evaded my server-side spam filtering. Thunderbird picked them all up as spam by the time I logged in from work though, so perhaps I can just go and re-teach the filter being used on the server. Or perhaps I can implement something like greylisting and stop a few spammers before they even get the mail into the server.
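For anyone who hasn’t met greylisting: the server temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and only accepts a retry after some minimum delay, on the theory that real mail servers retry and many spam senders don’t. A rough sketch of that decision follows; the Greylist class is hypothetical, keeps everything in memory, and never expires old entries, which a real mail server setup would have to handle.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory greylist keyed on (client IP, sender, recipient). */
final class Greylist {
    private final Duration minDelay;
    private final Map<String, Instant> firstSeen = new HashMap<>();

    Greylist(Duration minDelay) {
        this.minDelay = minDelay;
    }

    /** Returns true to accept the message now, false to answer with a temporary failure. */
    boolean accept(String clientIp, String sender, String recipient, Instant now) {
        String key = clientIp + "|" + sender + "|" + recipient;
        Instant first = firstSeen.putIfAbsent(key, now);
        if (first == null) {
            return false; // never seen before: ask the sender to try again later
        }
        // Accept only once the sender has waited at least minDelay since the first attempt.
        return !now.isBefore(first.plus(minDelay));
    }
}
```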
Next, I had the misfortune of being notified by Jabber of several dozen comment spams being made to my blog (Jabber notification is quite good for this sort of instant notification – I managed to kill said spams in no time at all.)
The first surprising thing about this spam is that I have disabled non-AJAX commenting on this weblog. Therefore, spammers either (a) know how to execute JavaScript in order to submit forms (which is an extremely scary possibility) or (b) have figured out how to detect Typo-based weblogs and submit the spam via a direct POST in the same way that the JavaScript would do it. Either is possible, given the persistence of spammers.
The spams also cut straight through Typo’s spam filter, so either they weren’t from known IP addresses, or they weren’t linking to known spam URLs. And many of them, even though the content was the same, were from many different IP addresses (side-note: if anybody ever tries to tell you that Windows is no good for distributed applications, these world-wide networks of zombied Windows boxes should be proof enough that it works fine for such applications.)
The next annoying thing was a significant amount of trackback spam. Trackback spam is particularly irritating because the entire point of trackbacks is to be automatic. You can’t have something automatic and prevent spambots at the same time. Thankfully though, the trackback spam was performed as a large number of trackbacks on a small number of articles.
In any case, the band-aid measure I’ve taken is to now block comments and trackbacks after 30 days. That way at least I only have to monitor the past 30 days for new trackbacks and comments, which is all on the front page of Typo’s admin interface.
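The cutoff itself is a one-line date comparison. Typo does this in Ruby with its own settings; the snippet below is only an illustration of the rule in Java, with a made-up CommentWindow class.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: feedback is only accepted for a fixed window after publication. */
final class CommentWindow {
    static final Duration WINDOW = Duration.ofDays(30);

    /** True while comments and trackbacks should still be accepted for an article. */
    static boolean isOpen(Instant publishedAt, Instant now) {
        return now.isBefore(publishedAt.plus(WINDOW));
    }
}
```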
The measure I’m probably going to have to take, however, is requiring a CAPTCHA for posting comments. Perhaps I can go with the trivial math problem approach, if spammers haven’t figured that one out already. At least that one is accessible, unlike image-based CAPTCHAs. Another way would be to require OpenID authentication for all comments, but that would only stall spammers until they set up their own OpenID servers.
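The trivial math problem approach could be as small as the sketch below. The MathCaptcha class is invented for illustration and isn’t anything Typo ships; in a real form the operands (or the expected answer) would be kept server-side in the session rather than trusted from the client.

```java
import java.util.Random;

/** Hypothetical text-only CAPTCHA that asks for the sum of two small numbers. */
final class MathCaptcha {
    private final int a;
    private final int b;

    MathCaptcha(int a, int b) {
        this.a = a;
        this.b = b;
    }

    /** Picks two operands between 1 and 9 inclusive. */
    static MathCaptcha random(Random rng) {
        return new MathCaptcha(1 + rng.nextInt(9), 1 + rng.nextInt(9));
    }

    /** Plain-text question, so it works with screen readers. */
    String question() {
        return "What is " + a + " plus " + b + "?";
    }

    /** Accepts the answer if it parses as the correct sum; anything else is rejected. */
    boolean check(String answer) {
        if (answer == null) {
            return false;
        }
        try {
            return Integer.parseInt(answer.trim()) == a + b;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Because the question is plain text it stays accessible, but for the same reason a bot that bothers to parse it will get through, so this only raises the bar slightly.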
For trackbacks, though, I don’t know what I can do except for turning them off… perhaps we just need a better database of known spam URLs.
Tags blog, meta, spam, typo | no comments
Posted by Trejkaz
Wed, 01 Mar 2006 22:35:00 GMT
As most of you who care probably knew already, the Server VM was somewhat faster than the Client VM for JDK 1.5.0. This has been a bit of a bother for us lately, as we have needed to use the Server VM for its speed, yet we’re also a GUI application which would benefit greatly from using the Client VM instead. In fact lately (JDK 1.5.0 updates 5 and 6) there have been some nasty compiler bugs which cause the Server VM to crash on some of our code.
This recent instability of the Server VM has resulted in me running benchmarks of the different Sun JVMs in order to determine exactly what we’d lose by switching back to the Client VM. This became the perfect excuse to test the new JDK 1.6.0 (Mustang) beta to compare it alongside the current version.
I have a little free time while waiting for the final one to finish so I thought I’d do a write-up so that people can see the kind of improvements that Java 6.0 will bring for general processing.
Read more...
Tags benchmark, java, mustang | no comments
Posted by Trejkaz
Mon, 20 Feb 2006 13:00:00 GMT
Well, it took a lot of fannying about, but I eventually got Jabber notifications working in Typo by taking the advice of people on the newsgroups and killing Jabber4R. It’s no longer maintained, so bugs crop up (due to changes in Ruby itself, I guess) and never get fixed.
I made a simple port of the notifications to use XMPP4R instead, which seems to be behaving itself for the time being. I ultimately should be using NetXMPP-Ruby though, because it supports TLS.
Now I just wait and see if Typo trunk accepts my patch and migrates to XMPP4R. Then I can unleash a bunch of other patches I was working on which integrated with XMPP4R. :-)
Tags jabber, typo, xmpp | no comments
Posted by Trejkaz
Wed, 18 Jan 2006 01:33:00 GMT
It finally happened: Google flipped the switch to allow Google Talk to join the public XMPP network.
This means that people who are on servers outside of Google Talk can finally stop being signed into Google Talk’s server, and start subscribing directly to their contacts who are stuck over there.
It also means that the public XMPP network grew quite a bit today, although I have no idea of the actual numbers. :-)
Tags google, jabber