Web Death by Strings

Posted by Uncle Bob on 01/04/2007

Communication between web clients and servers is dominated by strings. This leads to complex and horrific problems of coupling and fragility. Where are the rules?

I am in the enviable position of working on two web systems at the same time. One is a Ruby on Rails system for tracking substitute teachers. The other is a JEE system for managing the contents of a library. The point-counterpoint of this happy coincidence has illuminated something that has tickled my subconscious for years. The world of Web programming is a world of pathological string manipulation.

Take, for instance, the library system I am working on. One of the pages in this system manages the books in the library by their ISBN, and by their copy ids. Let’s say we had 3 copies of ISBN 0131857258. The page would have a table row for the ISBN that contained a checkbox for each of the three copies. If the user checks a copy’s checkbox, that copy will be deleted from the library. Another checkbox in that row is named “Delete all”. When the user clicks that checkbox, all the other checkboxes in that row are automatically checked, and all copies of that book are eliminated.

Now, think about this from an HTML point of view. How does the server know which copies should be deleted? That’s easy: the server builds the HTML for the page, so it simply gives a special name to each checkbox. When the form is submitted the names of the checked checkboxes are sent back to the server. So all the server has to do is to give each checkbox a name that identifies the copy it represents. We chose a syntax similar to: “delete_432”, which would be the name of the checkbox that represents the deletion of the copy whose id is 432.
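To make the round trip concrete, here is a minimal sketch of the server-side decoding. It is written in Python for brevity rather than in the project's Java, and it is not the library system's actual code: the function name, the sample field names, and the assumption that the submitted form arrives as a dictionary of field names are all hypothetical.

```python
import re

# Hypothetical pattern for the checkbox names described above:
# the literal prefix "delete_" followed by a numeric copy id.
DELETE_NAME = re.compile(r"delete_(\d+)")

def copy_ids_to_delete(form_fields):
    """Return the copy ids encoded in the names of the submitted checkboxes."""
    ids = []
    for name in form_fields:
        match = DELETE_NAME.fullmatch(name)
        if match:
            ids.append(int(match.group(1)))
    return ids

# Browsers submit only the checked checkboxes, so every "delete_nnn"
# field that arrives names a copy the user wants removed.
submitted = {"delete_432": "on", "delete_435": "on", "page": "2"}
print(copy_ids_to_delete(submitted))  # [432, 435]
```

The whole contract between page and server lives in that one regular expression, which is exactly why a change to the name format is so easy to get wrong.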

Notice the string manipulation? We have encoded server side information in a string that is sent to the client, and we expect that information to come back to the server unchanged. While this makes perfect sense, any good software designer should feel a bit queasy about it. Depending on strings to encode information like this feels just a little bit reckless. It’s manageable, but it’s icky.

Today that ickiness got a lot worse for me. Dean Wampler is working with me on the library project. He was working on the JavaScript to make the “delete all” checkbox work. Now copy ids are globally unique. No two copies, regardless of ISBN, share the same copy id. So when the ‘delete_nnn’ comes back to the server, the server does not need to know which ISBN the book belongs to. It just happily deletes copy ‘nnn’. However, Dean needed to get his client-side JavaScript to set only those checkboxes that correspond to the ISBN of the ‘delete all’ checkbox. The client does not know which copies correspond to which ISBNs. To solve this problem he changed the format of the checkbox name to ‘delete_ssss_nnnn’ where ssss is the ISBN, and nnnn is the copy id. This allowed him to write the JavaScript to look for all the delete checkboxes that corresponded to the appropriate ISBN.

Of course when he made that change, he broke my server code which was looking for ‘delete_nnnn’. Fortunately I had unit tests that detected the problem instantly. (I truly pity those poor programmers whose only means of stumbling across errors like this is to deploy the system to test and work through the pages manually!) This would have been easy for me to repair on the server side; and I was tempted to do so, simply in the name of efficiency; but my conscience wouldn’t let me.

Why should a client-side JavaScript issue have any impact on the server code? Answer: It shouldn’t! This is software design 101. Don’t couple different domains!

So I talked it over with Dean and we quickly realized that he could change the JavaScript to use the ‘id’ attribute of the checkbox tag. The server would construct the page with the ids set correctly, and the checkboxes would retain their normal name of ‘delete_nnn’.

There is a general rule here somewhere. It’s something like: use names to communicate with the server, and use ‘id’ attributes to communicate with the client. Or, rather, don’t break server code to make client-side JavaScript work.
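A sketch of what that separation might look like when the server builds the row, again in Python for illustration only. The id scheme ‘copy_ssss_nnnn’ and the function name are assumptions; the original does not say what format the ids actually took.

```python
from html import escape

def render_copy_checkboxes(isbn, copy_ids):
    """Build one checkbox per copy: `name` is for the server, `id` is for client script."""
    boxes = []
    for copy_id in copy_ids:
        name = f"delete_{copy_id}"             # the only string the server parses on submit
        element_id = f"copy_{isbn}_{copy_id}"  # hypothetical id scheme, read only by JavaScript
        boxes.append(
            f'<input type="checkbox" name="{escape(name)}" id="{escape(element_id)}">'
        )
    return "\n".join(boxes)

print(render_copy_checkboxes("0131857258", [432, 433, 434]))
```

With this split, the client script can group checkboxes by ISBN through the id, and the id format can change freely without touching whatever the server does with ‘delete_nnn’.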

I’ve had similar string issues with the ‘Substitute’ system I’ve been working on in Rails. In this case I am using Ajax to allow users to type the names of substitute teachers and quickly pop up a list of possible teachers. So if you type “B” into the “Substitute” field, you quickly see a menu of all substitutes whose name begins with “B”. As you type more letters the list gets smaller. You can pick a name from the list when it’s convenient for you.

This works great, but has one gaping flaw. The server is looking these names up using SQL statements and is then populating the list in a convenient format. So, for example, it will put “Bob Martin” into the popup list, constructing the name from the first_name and last_name fields of the Substitute record. It is this constructed name that comes back to the server in the form when the submit button is pressed. But the constructed name is not the key of the Substitute record! So how does the server know which substitute has been selected? It could break apart the string “Bob Martin” into “Bob” and “Martin” and then do a query against first_name and last_name, but I hope you share my disgust with that solution! Not only is it inefficient, there are just loads of opportunities for error and fragility. (Just think of honorifics, suffixes, prefixes, middle names, etc.)
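To see how quickly the split-the-name approach falls over, consider this small sketch. It is Python rather than the system's Ruby, the function is hypothetical, and the names other than “Bob Martin” are invented for illustration.

```python
def split_display_name(display_name):
    """Naively reverse 'first_name last_name' by splitting on the first space."""
    first, _, last = display_name.partition(" ")
    return first, last

print(split_display_name("Bob Martin"))      # ('Bob', 'Martin')
print(split_display_name("Mary Ann Smith"))  # ('Mary', 'Ann Smith')
print(split_display_name("Dr. Bob Martin"))  # ('Dr.', 'Bob Martin')
```

Only the first case produces values that would match the first_name and last_name columns. And even when the split is right, two substitutes who share a name remain indistinguishable.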

My solution, which I dislike almost as much, is to encode the id of the substitute along with the name. So the string that actually pops up in the menu is “(384) Bob Martin”. OK, OK, I know this is bad, and I intend to fix it once I learn how to get the JavaScript that pops up the menu to load a hidden field. But I don’t know how to do that yet, and I am aghast that I need to learn it! It seems to me that being able to couple a pretty name to an unambiguous ID is such a common thing to do that I should not have to resort to the deep mysticism of JavaScript to achieve it.
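For the record, the encoding amounts to something like the following sketch. It is in Python rather than Ruby, and the function names and the regular expression are assumptions; the original only gives the “(384) Bob Martin” format, not how the server unpacks it.

```python
import re

# Assumed shape of the menu entry: "(id) display name".
LABEL = re.compile(r"\((\d+)\) (.*)")

def encode_choice(substitute_id, first_name, last_name):
    """Build the menu entry that carries the record id alongside the pretty name."""
    return f"({substitute_id}) {first_name} {last_name}"

def decode_choice(label):
    """Recover the record id (and the display name) from a submitted menu entry."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"no substitute id in {label!r}")
    return int(match.group(1)), match.group(2)

label = encode_choice(384, "Bob", "Martin")
print(label)                 # (384) Bob Martin
print(decode_choice(label))  # (384, 'Bob Martin')
```

The lookup is now unambiguous, but the id is still smuggled through a user-visible string. A user who edits the field by hand, or a change to the display format, breaks the decoding.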

Ah well, the web is hell. That’s all I can really say about this. Web programming is probably the worst programming environment I have ever worked in; and I’ve worked in a lot of programming environments. Not only is it flogged by commercial hype that tries to make it seem much more complicated than it is; but it’s so poorly conceived, and so sloppily put together that it is, frankly, embarrassing.
