Jul 05

[Updated and moved to my work site.]

May 10

RedPill 1.4.2 is out. Adds Tiger compatibility. I haven’t upgraded to Tiger myself yet, so let me know if you find any problems…

I was quite amused by the guy who wrote saying he was trying to get the source code to work under Tiger, and confessed that he didn’t know any C and could I help him? Right, yeah, I’ll do that.

Also, yes, I know Tiger doesn’t include StuffIt. I didn’t pack it using StuffIt, I packed it as a bzipped disk image with welcome dialog, but the guys at info-mac apparently have a policy that all such things must be unpacked and repacked with StuffIt.

Source code is at info-mac too.

Apr 22

  • Many common tasks take an absurd amount of code. For example, try producing a date in RFC2822 format, in the local time zone. Isn’t Java supposed to be a good language for Internet programming?

  • What’s with the special types that aren’t objects, like int? I just want to have integers, and leave the compiler to determine the appropriate implementation. In the worst case, I don’t mind specifying how wide I want my integers in advance, but please, can we have them act like proper objects (i.e. like everything else)?

    I mean, it’s not like anyone is going to be using Java for systems programming, and need access to raw machine words, especially since there are no unsigned integer types available. (Bytes are signed?!)

    In Java 1.5 they try to make it less ugly by having the compiler automatically wrap primitive types in object wrappers when necessary. Which is still ugly.

  • Why must arrays be unlike every other type? For instance, array objects use variable.size() to find out their upper bounds, but arrays use variable.length. They’re not like the other special types either, because they can’t be converted.

  • A related gripe is that a little more consistency in method naming would be nice. As it is, Strings have .length(), arrays have .length, Array has .getLength(), and ArrayLists have .size().

    The class library is riddled with these kinds of little inconsistencies that make it difficult to remember how to use everyday objects. To pick another example, “HashMap” is camel-cased, but “Hashtable” isn’t–and of course, fussy Java will complain if you capitalize differently.

  • Apparently too little thought has been put into doing things right the first time. That makes for a horribly bloated and messy API. So we have arrays, Array objects, ArrayList objects, and Vector objects, all solving the same problem in subtly different and incompatible ways. Ditto Hashtable and HashMap.

  • No integer overflow detection. For a language designed to be safe that forces me to pick a size of integer in advance, that’s pretty stupid.

  • No abstract local functions. I happen to find map operations very useful at times.

  • Some arguments are passed by value, some are passed by reference; it depends on their type.

  • Similarly, the types that would be passed by value are copied during assignments, the types passed by reference are not. Hence, assuming you remember something has been passed in by reference and want to copy it, you specifically have to clone() it to make a copy–and if it’s in an array, you have to clone each dimension of the array manually.

  • Comparing pass-by-reference variables checks if they’re references to the same thing, rather than checking if they’re the same value. To check if they’re the same value, you have to remember that they’re ‘special’ and that you have to use .equals() instead of ==. Except it doesn’t work on arrays.

  • Constructors have their own special unique syntax. The compiler spots immediately if I don’t use the special syntax, therefore it clearly knows that the method is the constructor anyway, therefore it’s just arbitrarily making me jump through hoops for it.

  • You have to include constructors in subclasses, even if the code is identical to that in the parent class.

  • Importing packages doesn’t import dependencies. For instance, org.xml.sax.XMLReader is no use without org.xml.sax.helpers.XMLReaderFactory, so why make me import them separately?

  • There’s no way to break string literals across lines, so you have to concatenate them at run time.

  • Some of the documentation is really awful. I mean, does anyone understand RuleBasedCollator? Quote: “& Indicates that the next rule follows the position to where the reset text-argument would be sorted”. Is that even English?

    There’s also an unfortunate tendency for people to assume that JavaDoc is documentation. Often, it’s more like an adventure game, where you click around in a maze of twisty turny JavaDoc pages for hundreds of classes, hoping that at some point you’ll find some documentation that actually tells you how to perform some task you need to perform, rather than just telling you what lots of random methods do.

  • Startup times are terrible. They have been improved in Java 5 and improved further in Java 6, but they’re still bad compared with pretty much any other commonly used language, even interpreted ones like Ruby.

  • Exceptions are a pain. The ability to throw and catch exceptions is a good thing in any language; the problem with Java is the insistence that every exception be explicitly dealt with or thrown up the chain.

    In J2EE code, for example, we have the SQLException. It either indicates a syntactic error in the SQL code, or that the connection to the database is broken. There’s really nothing the application can usefully do with SQLException except report an error and die; yet because of the explicit exception rules in Java, code ends up bloated with repeated catching and rethrowing of SQLExceptions.

    It’s not clear to me that forced exceptions do anything to make programmers deal with possible problems anyway; try running Azureus and watch how many stack traces it craps to the console. Good programmers don’t need to be force-fed exceptions, and bad programmers just catch and ignore the exceptions anyway.

    The other side of the problem: if you’re implementing an interface that doesn’t declare any exceptions, you have to eat any exceptions that occur, no matter how deadly.

    I’m not sure what the solution is, but with Java I get the feeling that the cure being tried is of marginal effectiveness, and possibly worse than the disease.

  • Generics were added too late. Now we’ve got a massive class library full of unnecessarily ugly APIs.

  • Convenient iteration syntax was added too late as well (in Java 5). There are all kinds of classes which sound like they ought to be iterable, but aren’t. Like Arrays, for example.

  • String and StringBuffer make straightforward string handling unnecessarily painful. You can split a string, but you can’t modify it. You can find a string within a StringBuffer, but only in case-sensitive fashion. You can’t search a StringBuffer for a regular expression; you have to convert it to a String first–but to modify what you find, you have to convert back to a StringBuffer again, or create a new String (and pay the cleanup costs).

    So in typical input parsing, you end up repeatedly converting between String and StringBuffer objects. Even assuming that’s what the compiler would have to do anyway, why should I be forced to do it all explicitly? And if “replace string” is only available for StringBuffer, how come “replace regexp” is available for String? The whole division between the two has long ceased to make any sense.

  • In Java, null is a primitive value. But, as mentioned above, primitives like ints aren’t objects, and so neither is null. Which, in turn, means you can’t use null for null objects (for example when making objects optional)–you need to clutter the API with more methods, create empty objects, or implement a null object class.

  • IO is ugly. To do the normal “buffered write to a file”, you have to create a FileWriter object, wrap it in a BufferedWriter object to add buffering, then wrap that in a PrintWriter object to get a stream you can print to. That is:

    PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(new File(filename))));

    And don’t forget to explicitly flush and close the PrintWriter, or you’ll lose data. So much for safety! Input is just as bad. Is it too much to ask that the 90% of cases (stream-based reading and writing of text) be optimized with some kind of 1-step BufferedPrintToFileFlushOnDispose class?

  • I find the use of “static” to indicate “class scope” unnecessarily confusing. To put it another way, I find it hard to remember that “final” means “static” (unchanging) and “static” means “class”…

    Of course, the reason why “static” is used for class methods is that Java doesn’t really have class methods; static methods are static in the sense that they’re really global functions that can’t be overridden and are shared between all objects of that class. Perhaps the word “shared” would have been less confusing?

  • The insistence on a separate file for each class fragments the source code and makes it harder to keep track of where things are, or alternatively, encourages the “one huge class” antipattern. The insistence on making the directory structure match the package naming adds the annoyance of having to navigate in and out of directories multiple layers deep even for a trivial project.

  • Date and time handling is broken (it uses POSIX epoch values internally), and the methods and classes are confusingly named. For example, Calendar.getTime() returns a Date, because Date objects are actually date and time objects.
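
Several of the gripes in the list above (the four spellings of “length”, silent integer overflow, shallow clone() on nested arrays, and == versus .equals() versus arrays) can be checked directly. The following is only a minimal sketch: the class and method names are made up for illustration, and it uses generics and the diamond operator, so it assumes Java 7 or later rather than the Java of the time.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the class and method names are not from the post.
final class JavaGripes {

    // Four spellings of "how many elements?" for four everyday types.
    static int[] fourWaysToAskForLength() {
        String s = "abc";
        int[] a = {1, 2, 3};
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        return new int[] {
            s.length(),          // method on String
            a.length,            // field on arrays
            Array.getLength(a),  // static helper in java.lang.reflect.Array
            list.size()          // method on collections
        };
    }

    // int arithmetic wraps around silently instead of reporting overflow.
    static int incrementMaxInt() {
        int n = Integer.MAX_VALUE;
        return n + 1; // wraps to Integer.MIN_VALUE, no exception is thrown
    }

    // == compares references; equals() compares values, except on arrays.
    static boolean[] comparisons() {
        String x = new String("tiger");
        String y = new String("tiger");
        int[] p = {1, 2};
        int[] q = {1, 2};
        return new boolean[] {
            x == y,              // false: two distinct objects
            x.equals(y),         // true: same characters
            p.equals(q),         // false: arrays inherit identity-based Object.equals
            Arrays.equals(p, q)  // true: element-by-element comparison
        };
    }

    // clone() on a two-dimensional array copies only the outer dimension.
    static boolean outerCloneSharesRows() {
        int[][] grid = {{1, 2}, {3, 4}};
        int[][] copy = grid.clone();
        copy[0][0] = 99;          // writes through to the row shared with grid
        return grid[0][0] == 99;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fourWaysToAskForLength())); // [3, 3, 3, 3]
        System.out.println(incrementMaxInt());                         // -2147483648
        System.out.println(Arrays.toString(comparisons()));            // [false, true, false, true]
        System.out.println(outerCloneSharesRows());                    // true
    }
}
```

Arrays.equals only compares one dimension; for nested arrays the standard library offers Arrays.deepEquals, which is one more name to remember.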

Things I really like about Java

  • Unicode everywhere, no special effort required.
  • It’s pretty much portable between Linux and OS X, without having to worry about contortions like autoconf.

I don’t include automatic memory management in the list, because that ought to be a given for any modern programming language.

I should also point out that for all its faults, Java is still better than C++.

Apr 06

When I was hunting around for a free version control system, people recommended Perforce. I wasn’t happy with a non-open-source version control system, however.

BitMover Software have now demonstrated why, far better than I could have hoped to. They encouraged Linus Torvalds to move the Linux kernel source tree into BitKeeper, offering him and other kernel developers free licenses for BitKeeper on Linux.

Then once everyone was using BitKeeper, they decided they didn’t like the fact that a developer at OSDL was reverse-engineering their file transfer protocol in his spare time. In a fit of pique, they have decided to yank BitKeeper for Linux, because OSDL refused to fire the guy. They apparently won’t even give Linus a free license to continue using BitKeeper in the long term, unless he agrees to their demands that he stop working for OSDL. In spite of which, he’s putting a surprisingly positive spin on the whole thing.

As a Mac owner, I’ve obviously seen some pretty user-hostile behavior by computer companies; but BitMover have taken it to a whole new level. So no, I definitely won’t be touching Perforce, or any other proprietary version control system. Perforce may seem reasonable now, but as BitMover have just shown, that’s no guarantee of continued reasonableness.

Apr 01

SQL is a dinosaur of a language, designed for the bad old days when computers enforced a fixed size and format for every kind of data, everything was upper case, and if you didn’t like it you went back to using paper. After all, disk space and CPU time were expensive, so you didn’t want people wasting them with pesky unaltered real-world information.

As such, SQL doesn’t have variable-length strings. Oh, sure, it has VARCHAR as well as CHAR, but VARCHAR only gives you a string that can be any size up to some fixed length. That’s as opposed to CHAR, which is even dumber, padding out all your strings to a fixed length. So all you gain from VARCHAR is the ability to get "John", "Smith" rather than "John            ", "Smith           " as output.

So, what happens if Mr Apu Nahasapeemapetilon suddenly joins the company, and your SURNAME VARCHAR(16) overflows? If you’re running MySQL, the name is silently truncated, and your database outputs garbage. If you’re running PostgreSQL, the database chokes and your system falls over with a run-time error, which is what the SQL standard apparently says should happen. So, would you like your leg amputated with a bread knife, or a rusty hacksaw blade?

Let’s suppose that you want to use PostgreSQL, so that you can have advanced features like, oh, the database actually remaining consistent if there’s a crash, and constraints that actually do something. Let’s also suppose that you don’t care too much about data being truncated, so long as you don’t personally have to deal with a support call every time some idiot decides to get creative with a product name.

Now you have a problem, because the program that’s feeding information into the database needs to know how much data it can put in each field without the database choking. The database knows this, of course; it’s part of the schema you used to set up the tables in the first place. But you don’t want the information in two places, because then if you change it in one place, you need to remember to change it in the other.

So, what’s the solution? Surely there has to be one?
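
One possible direction, offered only as a sketch and not as an answer the post itself arrives at: if the feeding program happens to be written in Java, the JDBC call DatabaseMetaData.getColumns reports each column’s declared size, so the width can be read from the schema instead of being written down twice. The class, method and parameter names below are hypothetical. It also assumes the driver reports a meaningful COLUMN_SIZE for the column type, and that the table and column names are passed in the case the database stores them (lower case for unquoted PostgreSQL identifiers).

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; the class, method and parameter names are illustrative.
final class ColumnWidths {

    // Ask the database for the declared size of one column.
    // Note: getColumns treats the table and column arguments as LIKE-style
    // patterns, so names containing '_' or '%' may match more than intended.
    // Returns -1 if no matching column is found.
    static int declaredWidth(Connection conn, String table, String column)
            throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet rs = meta.getColumns(null, null, table, column)) {
            return rs.next() ? rs.getInt("COLUMN_SIZE") : -1;
        }
    }

    // Truncate a value to at most `width` characters, counting Unicode code
    // points so that a surrogate pair is never cut in half.
    static String fit(String value, int width) {
        if (value == null || width < 0) {
            return value; // nothing to do, or width unknown
        }
        if (value.codePointCount(0, value.length()) <= width) {
            return value;
        }
        return value.substring(0, value.offsetByCodePoints(0, width));
    }
}
```

With a SURNAME VARCHAR(16) column, fit("Nahasapeemapetilon", 16) yields "Nahasapeemapetil": still truncated, but deliberately and in one place, rather than via a run-time error from the database.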

Mar 02

Last night I wrote my first program in Ruby. So far I like it, a lot.

I’d been intending to learn something better than Perl for a long time. Perl is very useful, and with CPAN you can get a great deal done in very little time. (Thanks, Jarkko—how’s it going?) However, it’s a really crufty and syntaxy language, and the way it supports object-oriented programming is so butt-ugly and riddled with pitfalls that I had never bothered to get to grips with it enough to write my own classes. People complain about regular expression code in Perl being write-only, but to my mind it’s the OO code that’s the real problem. Not just that it’s ugly; as the tutorial says, “Yes, this is a lot of work”. Using OO shouldn’t be a lot of work, that’s the whole point of it!

I think that the Perl motto of “There’s more than one way to do it” has become something of a convenient excuse for not making difficult decisions. I don’t need two or three different syntaxes for method invocation, particularly not if they’re all ugly. I don’t need a variety of different types of scoping, and any language where the simplest and cleanest scoping options still require a lengthy tutorial needs to be taken out and shot.

I read the Apocalypses. I wanted to know if Perl 6 was going to become the language I wanted, in much the same way that Perl 5 had become a tolerable version of Perl 4. It’s clear that Larry Wall understands how awful Perl’s OO system is, in that he adds a whole bunch of fundamental problems to my rather superficial ones. Unfortunately, the proposed new clean OO for Perl 6 looks to have a lot of the ugly and arbitrary syntax that is a familiar feature of Perl—like $.x for a public variable and $:y for a private one, *%_ for a variable list of arguments, all the way up to abominations like %::*::($package)::method. And still “There are too many ways to do it”, for example in regular expressions you have a whole new syntax, plus you can use m:p5/ and use all the old Perl 5 syntax. Or hey, why not use both?

And Perl 6 isn’t even out yet. Who knows what kinds of compromises will be made before it becomes real? And it’s clearly going to have a major learning curve for Perl 5 users, so much so that some people are grumpily saying they’ll stick with Perl 5. Me, I’m always ready to learn a new language—but if I’m going to spend a few months of my time learning Yet Another Programming Language, I’d like it to be one that’s (smaller|simpler|easier|more consistent) than the ones I know already, not one that needs a big wall chart to summarize its operator precedence rules.

Ruby isn’t perfect, of course. I wish it had support for variable declaration, with an equivalent of use strict. Unfortunately, the Ruby zealots seem to have an unjustified religious objection to allowing variable declaration, and I was bitten by the lack of declarations twice last night alone. Still, I was writing proper OO code with recursion inside a couple of hours, and that has to count for something.

The other two big problems with Ruby are performance, and Unicode. It kinda lacks both. However, once Parrot has matured a bit, Cardinal should solve both problems at once.

Jun 04

I want a proper version control system.

Options I’m aware of are: CVS, Subversion, Arch, Monotone, Vesta, Aegis, OpenCM, Perforce, BitKeeper and darcs.

As usual, I have some requirements which I would like to think are the bare minimum any system should support:

  • It must handle direct renaming of files. That eliminates CVS and Perforce.
  • It must support copying files while retaining their history. That eliminates Aegis and OpenCM.
  • It must handle filenames with spaces in. That eliminates Arch.
  • It must let me check out part of a repository, not just the whole lot. That eliminates Monotone.
  • It has to work without needing specific kernel versions or other hacks. That eliminates Vesta (which needs NFS and a 2.4 Linux kernel).

So, that leaves me with darcs or Subversion. I don’t like the fact that Subversion stores its data in DB files, but I also don’t like the fact that darcs is written in a language very few people know, is more or less a prototype, and is basically only being worked on by one person. It may also fail some of the above requirements, I don’t know.

Anyone know of any other options?

The frustrating thing is that Arch does almost everything right, except for being utterly broken in its filename handling and mandatory naming conventions.

Mar 24

REXX

Every now and then I read an article about REXX, a scripting language designed at IBM and popularized on the Amiga. The authors of such articles generally enthuse about the language in a low-key kind of way, and I find myself wondering if maybe I should learn it.

Then I go away and find a REXX FAQ and tutorial, and I read for a bit, and I realize that no, I shouldn’t. So for my own benefit (when I later archive and index this part of my journal), here’s a quick list of reasons why I should never go near REXX:

  1. Functions can’t return multiple values, nor can they modify their arguments. If you want to write a function which returns two values (say), you need to use a magic string which you think will never occur in either of them as a separator, concatenate, return the result, and then split it apart again.

  2. Whitespace is significant—it’s interpreted as concatenation. Hence mistyped or syntactically invalid statements are quite likely to be reinterpreted as some kind of concatenation of variables. (And I thought Python was bad.)

  3. Using an undefined variable is not considered an error. Instead, it just defaults to having a value that’s the same as its name, only in upper case. Truly foul, especially when combined with misfeature #2 above.

    So if (for example) you put whitespace between a function name and the brackets surrounding its arguments, it suddenly stops being a function call and becomes a concatenation of strings instead. Pass the barf bag.

  4. REXX normally guesses continuations, by assuming the next line is a continuation of the current line if the current line doesn’t look like a complete statement.

  5. Comma is used both to separate function arguments, and to indicate explicit continuation. So in spite of #4, you can’t just break a long list of function arguments across multiple lines; you have to turn the last comma on each line into a double comma, or you get something completely not what you intended. Ugh.

  6. You’re allowed to use variables that have the same names as words used in the language itself.

  7. Scoping is dynamic. Functions and procedures are just a hack whereby the system temporarily hides all variables except the listed ones, until it next hits a return. Not that you have to; it’s quite possible to write functions with overlapping scope.

  8. Forget about associative storage, REXX doesn’t even have arrays. You can simulate them with ‘compound variables’, but then there’s no type checking or bounds checking. If you want any, you have to write it yourself.

  9. You can’t pass arguments by reference. In fact, you can’t pass them by value either. Instead, you have to pass constants, and have the function or procedure use those constants to calculate the name of the variables it should use.
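
To make point 1 concrete, here is the “magic separator” workaround transliterated into Java. This is purely an illustration of the pattern and is not REXX code. The class name, method names and the choice of "|" as separator are all assumptions.

```java
// Hypothetical illustration of returning two values packed into one string.
final class PackedReturn {

    // The "magic" separator, which we merely hope never occurs in the data.
    static final String SEP = "|";

    // Concatenate two results into a single return value.
    static String pack(String first, String second) {
        return first + SEP + second;
    }

    // Split at the first separator. Assumes the input came from pack(),
    // so at least one separator is present.
    static String[] unpack(String packed) {
        int i = packed.indexOf(SEP);
        return new String[] {
            packed.substring(0, i),
            packed.substring(i + SEP.length())
        };
    }

    public static void main(String[] args) {
        String[] ok = unpack(pack("alpha", "beta"));
        System.out.println(ok[0] + " / " + ok[1]);   // alpha / beta

        // The separator appears inside the first value, so the split point moves.
        String[] bad = unpack(pack("a|b", "c"));
        System.out.println(bad[0] + " / " + bad[1]); // a / b|c
    }
}
```

The second call shows the failure mode: the round trip silently returns different values from the ones that went in.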

Still, I’m sure it’s better than JCL.

Anyone want to convince me of the beauty of REXX, in spite of the above? If so, give it your best shot.

Mar 10

The Eclipse project has developed JFace, an application toolkit for writing Java applications. JFace in turn uses a toolkit called SWT to provide native user interface objects to your Java program. End result: native look and feel, faster applications.

So, if you’re a software developer, go take a look at the Eclipse web site. See if you can spot what’s missing.

Hint: It’s something incredibly important. So important that it ought to be on the main navigator. Yet incredibly, it isn’t.

OK, I’ll tell you what’s missing from the Eclipse project.

There’s no fucking documentation.

Sure, there are a bunch of articles they link to. But an article in Java Programmer Monthly does not constitute an API reference manual.

Nor do the crappy comments extracted from the source code via JavaDoc. Yet as far as I can tell, that’s all the documentation that exists for the entire project inside or outside IBM.

Yes, there are third party books, or at least there will be soon. But third party books aren’t a good substitute for actual documentation either, because (a) they’re usually out of date by the time they’re printed, (b) they’re often written by someone with appalling programming habits, and (c) in this case the authors of the books are likely having to guess what the hell the code does as well.

Lack of documentation is a defect. It indicates crappy development practices. Any time someone suggests that you use a particular toolkit, ask to see the documentation. If it isn’t clear and well-indexed, run like hell in the other direction, because it’s a fairly safe bet the code will be even worse. (Hello, Struts.)

Dec 29

Red Pill 1.4 is out.

It’s now Open Source under the GPL, so you can download the source code and hack away.