The Sorry State of SAAJ

When I started working on Spring Web Services some three years ago, I needed a way to represent a SOAP message. I was a good boy and used J2EE, or more specifically, the SOAP with Attachments API for Java™, also known as SAAJ, also known as the package javax.xml.soap. Founding Spring-WS on a specification seemed like the Proper Thing To Do, and gave me a warm fuzzy feeling inside.

Over time, I found out that SAAJ is quite possibly the worst-implemented piece of J2EE, er, Java EE out there. Now I’m sharing the pain with you in this post because, well, sharing is the first step in acceptance.

A Bit Of History

SAAJ started out as being part of the Java™ API for XML Messaging (JAXM). JAXM actually was quite nice: none of this “look ma, it’s just a method call!” nonsense that JAX-RPC offered. Unfortunately, JAXM was pulled in favor of JAX-RPC. Some of JAXM survived, namely the representation of a SOAP message, and was rebranded SAAJ 1.1.

SAAJ 1.1 basically offered a DOM-like representation of a SOAP message: SOAPEnvelope has a SOAPBody, and an optional SOAPHeader, and so on. All elements extended SOAPElement, which in turn extended Node. javax.xml.soap.Node, that is, not org.w3c.dom.Node, which - coincidentally - was the main problem with SAAJ 1.1: there was no easy way to integrate SAAJ with existing JAXP code. Even though SAAJ was similar to org.w3c.dom, there was no clear conversion path, besides writing the whole message to a buffer and reading that with JAXP. Yay.

The wise folks at the JCP decided to change that in SAAJ 1.2: in this version, all SAAJ classes extend their W3C DOM counterparts. So javax.xml.soap.Node extends org.w3c.dom.Node, SOAPElement extends Element, and so on, and so forth. Not everybody thought this was a clever idea; clearly Hani was not among those in favor. At this point, SAAJ was added to J2EE 1.4. Some J2EE vendors seem to agree with Hani that this was not a good idea; more about that later.

SAAJ 1.3 introduced some more stuff: support for SOAP 1.2, MTOM, and more. As a result of this plethora of features, SAAJ graduated from Java EE to Java SE in version 6. I think this was probably more a side-effect of SUN putting JAX-WS in Java 6, which is subject to a different rant altogether (short rant preview: had I implemented JAX-WS, I would be pretty pissed at SUN. Aren’t there laws against this sort of bundling?)

The SAAJ Saga continues

So, as of SAAJ 1.2, a SOAPElement is a W3C DOM Element. This makes sense, because now you can use JAXP to put XML content into the message. For instance, you can use standard XML-DSIG or XML-ENC libraries to sign or encrypt a SOAP message, which is handy when doing WS-Security. Or, you can read a bit of XML from a file, and dump that into the SOAP body as a payload. The only caveat mentioned in the javax.xml.soap javadoc is that:

an application that starts to use SAAJ APIs on a tree after manipulating it using DOM APIs must assume that the tree has been translated into an all SAAJ tree and that any references to objects within the tree that were obtained using DOM APIs are no longer valid.

No problem, we can do that. Just don’t keep any references dangling around. So we might write something like:

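// createDocument() is a helper (not shown) that builds the W3C DOM payload document
MessageFactory messageFactory = MessageFactory.newInstance();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
SOAPMessage message = messageFactory.createMessage();
Document document = createDocument();
SOAPBody body = message.getSOAPBody();
// append the document's root element to the SOAP body, using plain JAXP
transformer.transform(new DOMSource(document), new DOMResult(body));
// fetch the body again: references obtained before the DOM manipulation may be invalid
body = message.getSOAPBody();
// write the body, and then the whole message, to the console
transformer.transform(new DOMSource(body), new StreamResult(System.out));
message.writeTo(System.out);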

Here, we create a new SAAJ message using the MessageFactory. Next, we create a W3C DOM document and use a javax.xml.transform.Transformer to transform the document into the SOAP body. As a result, the root element of the document should be appended to the SOAP body, where it becomes the payload of the message. Next, we fetch the body again, as the javadoc caveat requires, and write it to the console; finally, we write the whole message to the console. Simple enough, and conforming to the spec.
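The snippet leaves createDocument() undefined, and it needs SAAJ on the classpath (the javax.xml.soap package was removed from the JDK in Java 11). The sketch below fills in a hypothetical createDocument() and runs the same Transformer call against a plain DOM element standing in for the SOAPBody, so it works with JAXP alone. It illustrates the behavior the SAAJ code relies on: a DOMResult whose node is an element gets the transformed root element appended as its last child. The payload namespace, element names, and class name are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomPayloadSketch {

    // Hypothetical payload namespace, made up for this sketch.
    static final String PAYLOAD_NS = "urn:example:payload";

    // Hypothetical createDocument(): a document whose root element is the payload.
    static Document createDocument() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder().newDocument();
        Element content = document.createElementNS(PAYLOAD_NS, "content");
        content.setTextContent("Hello World");
        document.appendChild(content);
        return document;
    }

    // Same JAXP call as in the SAAJ snippet: the transformed root element
    // is appended as the last child of the target node.
    static void appendPayload(Document payload, Element target) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(payload), new DOMResult(target));
    }

    // Builds a plain DOM "Body" element (standing in for SOAPBody) and appends the payload to it.
    static Element demo() throws ParserConfigurationException, TransformerException {
        Document envelope = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element body = envelope.createElement("Body");
        envelope.appendChild(body);
        appendPayload(createDocument(), body);
        return body;
    }

    public static void main(String[] args) throws Exception {
        Element body = demo();
        System.out.println(body.getFirstChild().getNodeName());
    }
}
```

With a real SAAJ implementation, the only difference should be that the target is the SOAPBody returned by the message; the table below shows how often that assumption holds.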

Yet, this simple program does not work on the majority of Java EE servers.

The Acid Test

I created a simple Servlet which basically does the above, and ran it in a wide variety of Java EE application servers. Here is an overview. All servers are J2EE 1.4 or higher, with the exception of Tomcat, of course, which I used as a baseline and to test the Axis 1 SAAJ implementation. I’ve listed the name, the MessageFactory implementation provided by the server, the test result, and the exception given, if any. Finally, I’ve submitted bugs where possible, because I am a Good Citizen.

Application Server | MessageFactory | Result | Exception | Bug
Geronimo 2.1.2 with Jetty | org.apache.geronimo.webservices.saaj.GeronimoMessageFactory | Pass | none | -
Geronimo 2.0.2 with Tomcat | org.apache.geronimo.webservices.saaj.GeronimoMessageFactory | Fail | TransformerException when transforming Document to SOAPBody | GERONIMO-4029, AXIS2-3808
GlassFish v2ur1 | com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl | Pass | none | -
JBoss 4.2.2 | org.jboss.ws.core.soap.MessageFactoryImpl | Fail | IndexOutOfBoundsException when transforming SOAPBody to stream | JBWS-2186
OC4J 10.1.3.1 | oracle.j2ee.ws.saaj.soap.MessageFactoryImpl | Fail | TransformerException when transforming Document to SOAPBody | -
Tomcat 6.0.16 with SAAJ RI | com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl | Pass | none | -
Tomcat 6.0.16 with Axis 1.4 SAAJ | org.apache.axis.soap.MessageFactoryImpl | Pass | none | -
WebLogic 9.2 | weblogic.webservice.core.soap.MessageFactoryImpl | Fail | UnsupportedOperationException when getting SOAPBody | -
WebLogic 10.0 | weblogic.webservice.core.soap.MessageFactoryImpl | Fail | UnsupportedOperationException when getting SOAPBody | -
WebSphere 6.1 | com.ibm.ws.webservices.engine.soap.MessageFactoryImpl | Pass | none | -

Of this bunch, WebLogic is the clear winner. Every time you call SOAPMessage.getSOAPBody(), WebLogic barfs up an UnsupportedOperationException, saying that “This class does not support SAAJ 1.1”! The thing is: it supports SAAJ 1.1 just fine; it just doesn’t do SAAJ 1.2, the version that introduced getSOAPBody(). You know, J2EE 1.4 and all that.

You Can Help!

Obviously, I am missing WebSphere in this overview. Because I run OS X, I can’t install WebSphere. So if you want to be a Good Citizen too, you can run my little test program on WebSphere, or any other app server I have missed. Update: craig has given me the info for WebSphere, and I’ve updated the table accordingly. Thanks! I’m still interested in any other J2EE 1.4+ app servers I’ve missed, though.

Or if you don’t trust me, you can test it yourself. Just deploy the WAR, and let me know in a comment below.

saaj-test.war

If you don’t trust me at all, you can get the sources. It has a build.xml and all:

saaj-test.zip

Let’s hope that this post solves the issues with SAAJ, because I Want To Believe.


On bytes, chars, Strings, XML and Unicode


Strings

What does this print?

byte[] buf = new byte[]{'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd'};
String s = new String(buf);
System.out.println(s);

Obviously, the answer is the infamous “Hello World”. Unless you live in China or Japan. More about that later.

Let’s make it a bit more exciting:

byte[] buf = new byte[]{'H', (byte) 0xE9, 'l', 'l', 'o', ' ', 'W', (byte)0xF8, 'r', 'l', 'd'};
String s = new String(buf);
System.out.println(s);

Before you answer, let me tell you that 0xE9 is the ISO-8859-1 representation of é, and 0xF8 is the representation of ø. ISO-8859-1 is the character encoding used in most Western European countries. So, you would figure that this would print “Héllo Wørld”, right?

Well, it depends. On my Mac, it prints “HÎllo W¯rld”. On my Windows VMware instance, it does print the correct string. What’s up with that?

The issue here is the String constructor being used, and the charset it picks implicitly. According to the documentation of the String(byte[] bytes) constructor, it “constructs a new String by decoding the specified array of bytes using the platform’s default charset.” The default character encoding on OS X is Mac Roman. On Windows, it’s Windows-1252, which is almost, but not quite, entirely unlike ISO-8859-1. Hence the decoding mix-up. The way to make it work is to use the other constructor, which lets you specify a charset:

String s = new String(buf, "ISO-8859-1");
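A sketch of the same fix using the String(byte[], Charset) constructor (Java 6) and the StandardCharsets constants (Java 7). Passing a Charset object instead of a charset name avoids typos in the name as well as the checked UnsupportedEncodingException. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDecoding {

    // The bytes from the example: 0xE9 is é and 0xF8 is ø in ISO-8859-1.
    static final byte[] BUF = {'H', (byte) 0xE9, 'l', 'l', 'o', ' ', 'W', (byte) 0xF8, 'r', 'l', 'd'};

    // Decodes with a charset chosen by the caller, never the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The default varies per platform (and JDK version); shown for comparison only.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(decode(BUF, StandardCharsets.ISO_8859_1));
    }
}
```

Note that since Java 18 the default charset is UTF-8 on all platforms by default (JEP 400). For these bytes, the one-argument constructor then yields replacement characters everywhere, which is one more reason to always pass the charset explicitly.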

After working with Java for more than ten years, I still can’t see why SUN added the byte array “convenience” constructor. It’s not convenient at all. If anything, it’s inconvenient, because it causes many bugs. This is especially true in Enterprise apps, where you really don’t want to depend on the language settings of the underlying operating system to figure out how to decode your bytes. It all works fine in the US on Windows, but when someone deploys your app in, say, Japan, you’re screwed.

InputStreams

There are a whole bunch of these “inconvenience” constructors in Java. Consider this:

byte[] buf = new byte[]{'H', (byte) 0xE9, 'l', 'l', 'o', ' ', 'W', (byte)0xF8, 'r', 'l', 'd'};
BufferedReader rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(buf)));
String s = rdr.readLine();
System.out.println(s);

As it turns out, the InputStreamReader(InputStream in) constructor also uses the default character set. Bad Java! Bad! We should have done:

BufferedReader rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(buf), "ISO-8859-1"));
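The reader variant follows the same pattern. A minimal sketch, with illustrative names, using the InputStreamReader(InputStream, Charset) constructor and closing the reader with try-with-resources:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetReader {

    // Reads the first line of a stream, decoding with the given charset.
    static String firstLine(InputStream in, Charset charset) throws IOException {
        // try-with-resources closes the reader, and with it the underlying stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = {'H', (byte) 0xE9, 'l', 'l', 'o', ' ', 'W', (byte) 0xF8, 'r', 'l', 'd'};
        System.out.println(firstLine(new ByteArrayInputStream(buf), StandardCharsets.ISO_8859_1));
    }
}
```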

By this point, Ruby developers will see this verbosity as further proof that Java is bloated. Well, scr�w you, and don’t come back until your language has proper Unicode support.

Java developers, on the other hand, will think that if they stick to using Strings everywhere, they’re good. After all, Java’s String is Unicode, right? Well, not really. As explained in the String javadoc, a String is made up of UTF-16 encoded chars, which are exposed by toCharArray(), for instance. So the String is still encoded, just as 16-bit chars rather than bytes, and a character outside the Basic Multilingual Plane takes up two chars (a surrogate pair). The only way to properly deal with Unicode in Java is to work with code points, using the Character class, more specifically its codePointAt and related methods.
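A small illustration of the char versus code point distinction, using U+1D11E (MUSICAL SYMBOL G CLEF), picked arbitrarily as a character outside the Basic Multilingual Plane. The hand-written loop uses Character.codePointAt; String.codePointCount gives the same answer. Class and method names are illustrative.

```java
final class CodePoints {

    // Counts Unicode code points by walking the string with Character.codePointAt,
    // which combines a surrogate pair into a single code point.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = Character.codePointAt(s, i);
            i += Character.charCount(codePoint); // 2 for a surrogate pair, else 1
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1D11E needs two chars (a surrogate pair) in UTF-16.
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length());                      // 4 chars
        System.out.println(countCodePoints(s));              // 3 code points
        System.out.println(s.codePointCount(0, s.length())); // 3, via the String API
    }
}
```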

If you’d been a good boy or girl, you would have read Joel’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), and you’d already know this. But wait, there’s more!

XML

If you add XML to the mix, it gets more interesting. You probably know that an XML file typically starts with a declaration, like so:

<?xml version="1.0" encoding="UTF-8"?>

In fact, the encoding part is optional. If you leave it out, an XML parser will default to UTF-8, unless the file begins with a UTF-16 Byte Order Mark, in which case it’s UTF-16. So far, so good.
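A quick check of that default, as a sketch: the same text parsed from UTF-8 bytes with no declaration at all, and from UTF-16 bytes, for which Java’s UTF-16 encoder writes a big-endian Byte Order Mark. It uses the JDK’s built-in DOM parser; the class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class DefaultXmlEncoding {

    // Parses raw bytes and returns the root element's text. There is no XML
    // declaration, so the parser detects the encoding from the bytes themselves.
    static String parseText(byte[] xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return document.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String expected = "H\u00e9llo W\u00f8rld";
        String xml = "<content>" + expected + "</content>";
        // No declaration, no BOM: treated as UTF-8.
        System.out.println(parseText(xml.getBytes(StandardCharsets.UTF_8)).equals(expected));  // true
        // UTF-16 with a BOM written by the encoder: detected as UTF-16.
        System.out.println(parseText(xml.getBytes(StandardCharsets.UTF_16)).equals(expected)); // true
    }
}
```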

So what if I create a String containing XML, like so:

String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><content>Héllo Wørld</content>";

In effect, we have two character encodings at work here: UTF-8 as defined in the declaration, but the String itself is UTF-16, as we just discovered. Doesn’t that confuse an XML parser? Let’s see, by using SAX:

ContentHandler handler = new DefaultHandler() {
    public void characters(char ch[], int start, int length) throws SAXException {
        System.out.println(new String(ch, start, length));
    }
};
XMLReader xmlReader = XMLReaderFactory.createXMLReader();
xmlReader.setContentHandler(handler);
String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><content>Héllo Wørld</content>";
xmlReader.parse(new InputSource(new StringReader(s)));

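which prints the all too familiar “Héllo Wørld”. This is a nice trick, and it works because the InputSource wraps a character stream (the StringReader): according to the InputSource javadoc, when a character stream is available, the parser reads it directly and disregards any encoding declaration found in it. The characters are already decoded, so the UTF-8 in the declaration never comes into play. That doesn’t mean we can’t confuse the parser further, by replacing that last line with: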

xmlReader.parse(new InputSource(new ByteArrayInputStream(s.getBytes())));

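Now the parser receives bytes encoded in the platform’s default charset (Mac Roman on my Mac), while the declaration claims UTF-8. Unless the default happens to be UTF-8, the é and ø come out garbled, or the parser rejects the input as malformed UTF-8. So the XML parser is smart, but not brilliant. I, for one, don’t want to rely on this automagical encoding process, and I would recommend you handle raw XML as bytes, not Strings. It’s the XML parser’s job to turn the bytes into Strings, and it is probably a lot better at it than you are. Incidentally, this is also the reason why, in Spring Web Services, the JMS transport defaults to using a BytesMessage, rather than a TextMessage.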
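A sketch of the bytes-first approach with SAX. It uses SAXParserFactory rather than XMLReaderFactory, which is deprecated since Java 9, and collects the character data in a StringBuilder, because a parser may split text across several characters() calls. The first parse produces the bytes with the same charset the declaration names; the second deliberately mislabels ISO-8859-1 bytes as UTF-8. Names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class XmlAsBytes {

    // Parses raw XML bytes and returns the collected character data.
    // The parser, not our code, decides how to decode the bytes,
    // based on the BOM and the encoding in the XML declaration.
    static String parse(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><content>H\u00e9llo W\u00f8rld</content>";
        // Bytes encoded with the charset the declaration names: decoded correctly.
        System.out.println(parse(s.getBytes(StandardCharsets.UTF_8)));
        // Mismatch: ISO-8859-1 bytes labelled as UTF-8, typically rejected as malformed.
        try {
            System.out.println(parse(s.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (Exception e) {
            System.out.println("Parse failed: " + e.getMessage());
        }
    }
}
```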

Conclusion

There are two simple lessons here, which keep you out of encoding hell:

  1. Never rely on the default encoding when converting bytes to Strings.
  2. Handle raw XML as a series of bytes. Use a parser to turn those bytes into Strings.


I can see my house from here!

Some time ago, I heard a developer say something like:

You know, when developing an application, I really don’t care whether I am exposing services as EJB, SOAP, CORBA, REST, JMS, or any other infrastructure. It should just be a matter of configuration.

To which I replied:

Yeah, these things tend to look the same from up high in an Ivory Tower.


Conforming to Zawinski’s Law

Over at the SpringSource Team Blog, I’ve just written a post about the new features in Spring Web Services 1.5. One of the new features is the email transport, thereby conforming to Zawinski’s Law of Software Envelopment:

Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.

I’m happy to announce that Spring-WS 1.5 is ready to replace some other programs!


REST FAQ

Apparently, the REST FAQ is hosted at:

http://rest.blueoxen.net/cgi-bin/wiki.pl?RestFaq

Quite a RESTful URI! ;)

