Developing Feather-Weight Webservices with JavaScript

Using JS code libraries

Getting the code into the code from outside the code

We have our object code all nice and neat now; see Args() and Query catching for fun and profit. It’s so useful that we can’t wait to use it in all our service scripts. Pasting all that code over and over into every little thing we try to write. Joy!

Reusability changes everything

We have run into a veritable tragedy of JavaScript. It has no way to import code into code.

To use new Query() and new Args() to create our objects, we have to include all the code for them and we have to put it at the top of any service script we write. That means we just pushed our real program code down a couple hundred lines. Worse, we’ve broken good coding standards by cutting and pasting the identical code into a couple dozen places.

Besides the fact that they totally clutter up the scripts they are in, the different copies of Query() and Args() code will now either drift apart or become a maintenance nightmare.

In theory this problem is easily surmounted, but we give up some of the gains we’ve made. JavaScript can’t import JavaScript directly, but HTML has no trouble importing as many scripts as you want. So we have to go back to multiple <script> tags to import libraries.

As a side note, you don’t see this kind of thing getting much treatment even in some of the better books, like JavaScript: The Definitive Guide or JavaScript & DHTML Cookbook. The reason is that, without a way to separate classes out of the code that uses them, they are just a pain to reuse. Maybe we have a way of abstracting out that difficulty.

Lost gains

<script type="text/javascript"
 src="http://elektrum.org/js/QueryObj.js"> </script>

<script type="text/javascript"
 src="http://elektrum.org/js/ArgsObj.js"> </script>

<script type="text/javascript"
 src="http://elektrum.org/js/fakeService.js?some=args;go=here">
</script>

fakeService.js can now use the objects defined in ArgsObj.js and QueryObj.js. The problems are obvious. We went from a single script back to multiple scripts, and not just two anymore but three. This also counts on the client not misusing the scripts: leaving one out, or listing them out of order, since both libraries must load before fakeService.js runs.

To account for the potential misuse, we would have to wrap all our service code in try/catch blocks in case it throws exceptions from missing library code. That’s lame, and not what we want if there’s any way to avoid it.
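For comparison, here is roughly what that defensive wrapping would look like. It is a minimal sketch; runGuarded() and its onMissing callback are hypothetical names, not part of the libraries. It relies on the fact that using a constructor such as Args that was never loaded throws a ReferenceError.

```javascript
// Sketch only: runGuarded() and onMissing are hypothetical names.
// Runs a service body and reports, rather than crashes on, a missing library.
function runGuarded(serviceBody, onMissing) {
  try {
    serviceBody();
    return true;
  } catch (e) {
    // If ArgsObj.js was never loaded, `new Args()` throws a ReferenceError.
    if (e instanceof ReferenceError) {
      if (typeof onMissing === 'function') onMissing(e);
      return false;
    }
    throw e; // an unrelated bug in the service: don't swallow it
  }
}

// Usage inside a service script:
runGuarded(function () {
  var query = new Query();
  var args  = new Args();
  // ...the real service code...
}, function (e) {
  // fail quietly, or give the page author some feedback
  if (typeof console !== 'undefined') console.log('Service disabled: ' + e.message);
});
```

Note that a simple typo in the service code also throws a ReferenceError, so a wrapper like this can mistake a bug for a missing library. That is one more reason to avoid needing it.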

Saved by the D to the HTML

Fortunately, we can squeak by again with a little ingenuity. As we already discussed, we can’t import code directly into code. We can import it into HTML. We can also write our HTML dynamically. Therefore, we can write the scripts we need into the HTML from a single master script.

It adds a layer of complexity for the service code, but it keeps the client code simple. It’s back to a single script call. And the increased implementation complexity isn’t monstrous. We gain much more by being able to keep our library code, ArgsObj.js and QueryObj.js, in their own reusable files.

Now, instead of requiring the client to do this.

<script type="text/javascript"
 src="http://elektrum.org/js/QueryObj.js"> </script>

<script type="text/javascript"
 src="http://elektrum.org/js/ArgsObj.js"> </script>

<script type="text/javascript"
 src="http://elektrum.org/js/fakeService.js?some=args;go=here">
</script>

We can go back to this.

<script
 src="http://elektrum.org/js/fakeService.js?arg1=It%20works;arg2=Yay!"
 type="text/javascript"></script>

We just need an intermediary to write them all out. We’ll call our <script>-writing script fakeService.js and move the original service code to realFakeService.js. That way the client only sees and uses the name fakeService.js.

fakeService.js

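// Write the libraries first and the real service last. The browser runs
// the written <script>s in this order once fakeService.js itself finishes.
document.write('<script type="text/javascript" ' +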
               'src="http://elektrum.org/js/QueryObj.js"></script>');

document.write('<script type="text/javascript" ' +
               'src="http://elektrum.org/js/ArgsObj.js"></script>');

document.write('<script type="text/javascript" ' +
               'src="http://elektrum.org/js/realFakeService.js"></script>');
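Because the order of those document.write() calls is what guarantees the libraries load before the service, it can help to drive them from a single list. This is a sketch; SERVICE_BASE, SERVICE_PARTS, scriptTag() and writeParts() are hypothetical names, and writeParts() takes the document as a parameter only so it can be exercised outside a browser.

```javascript
// Sketch of a data-driven fakeService.js. All names here are hypothetical.
var SERVICE_BASE  = 'http://elektrum.org/js/';
// Order matters: libraries first, the real service last.
var SERVICE_PARTS = ['QueryObj.js', 'ArgsObj.js', 'realFakeService.js'];

function scriptTag(url) {
  // '<\/script>' is the same string as '</script>'; the escaped slash only
  // matters if this code is ever pasted into an inline <script> block, where
  // a literal </script> would end that block early.
  return '<script type="text/javascript" src="' + url + '"><\/script>';
}

function writeParts(doc, base, parts) {
  for (var i = 0; i < parts.length; i++) {
    doc.write(scriptTag(base + parts[i]));
  }
}

// In the browser, while the page is still being parsed:
if (typeof document !== 'undefined') {
  writeParts(document, SERVICE_BASE, SERVICE_PARTS);
}
```

Adding or dropping a library then means editing one array instead of hunting through a run of near-identical document.write() calls.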

And the script written by its third document.write() call starts something like this.

realFakeService.js

// everything is imported already so we're good to go with our custom
// object classes
var query = new Query();
var args  = new Args();
// ...and roll from there

It’s never easy

Every solution spawns a new concern. Our trick for getting the currently executing script just went out the window, because the script we need is now buried in a stack of two or more scripts. In this specific case it’s four scripts deep.

  1. The single script called by the client, fakeService.js, which writes out the <script> tags for the libraries and the actual service script, realFakeService.js. It can’t use the arguments itself, because ArgsObj.js isn’t loaded until fakeService.js is done executing.
  2. The first library <script>, for QueryObj.js, written by #1.
  3. The second library <script>, for ArgsObj.js, written by #1.
  4. The service, realFakeService.js, written by #1, which now needs to get at the query string arguments left behind on #1.

That means that the self-seeking code

var scripts = document.getElementsByTagName('script');
var myScript = scripts[ scripts.length - 1 ];

returns script #4 instead of #1, which is where the query string arguments are. This gets a little involved but isn’t too hard to fix. Here’s one way.

Find by depth

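function findSelf ( depth ) {
  // depth 1 (the default) is the script running now; depth 2 is the
  // script before it, and so on back through the page
  if ( ! ( depth > 0 ) ) depth = 1;
  var scripts = document.getElementsByTagName('script');
  var index = scripts.length - depth;
  var myScript = scripts[index];
  return myScript;
}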

Which we’d then use like so.

Inside realFakeService.js

var myScript = findSelf(4);
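To see the arithmetic, the same lookup can be run against a mock of the page’s script list. This sketch takes the list as a parameter (in the browser you would pass document.getElementsByTagName('script')); findByDepth() and the mock objects are hypothetical stand-ins for findSelf() and real <script> elements.

```javascript
// Hypothetical stand-in for findSelf(depth): the script list is passed in
// so the lookup can be tried without a browser.
function findByDepth(scripts, depth) {
  if (!(depth > 0)) depth = 1;   // default: the script running right now
  return scripts[scripts.length - depth];
}

// The DOM as realFakeService.js sees it: document.write() placed the three
// generated tags right after fakeService.js, and parsing has gone no further.
var page = [
  { src: '' },                                                                 // an earlier inline script
  { src: 'http://elektrum.org/js/fakeService.js?arg1=It%20works;arg2=Yay!' }, // #1
  { src: 'http://elektrum.org/js/QueryObj.js' },                              // #2
  { src: 'http://elektrum.org/js/ArgsObj.js' },                               // #3
  { src: 'http://elektrum.org/js/realFakeService.js' }                        // #4
];

var current = findByDepth(page, 1); // #4, what the self-seeking code finds
var caller  = findByDepth(page, 4); // #1, where the arguments live
```

Counting from the end is what makes earlier, unrelated scripts harmless here; counting the depth correctly is the part that stays fragile.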

That doesn’t really feel satisfactory. We’re spreading complexity across script boundaries, which isn’t good. It also means you must know how many scripts fakeService.js writes out and hard-code that count inside realFakeService.js. That is error-prone, and the two fall out of sync as soon as a library script is added or removed without the depth being updated.

There are always options

Find by searching backwards

function findSelf () {
  var scripts = document.getElementsByTagName('script');
  // walk backwards from the last script; the first src with a query string wins
  for ( var i = scripts.length - 1; i >= 0; --i ) {
    if ( scripts[i].src.match(/^[^?]+\?/) ) return scripts[i];
  }
  // none has a query string, so default to most recently seen
  return scripts[ scripts.length - 1 ];
}

And that’s pretty close to the sweet spot. We could use it just the way we used our original, and we can use it in an arbitrarily deep stack of imported scripts too, no matter how many libraries sit between the caller and the service.

It does have two assumptions built in.

  1. At most one src has a query string or, if the service is invoked several times on a page, every invocation has a query string. If one invocation didn’t and an earlier one did, the search would pick up the earlier one’s arguments. Bad.
  2. There will be no intervening, unrelated scripts with a query string in their src.
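Assumption #1 is easy to break. The sketch below runs the backward search against a mock page on which fakeService.js is included twice, the second time without arguments. findByQuery() is a hypothetical, list-taking stand-in for findSelf(), and the mock objects stand in for <script> elements.

```javascript
// Stand-in for the backward-search findSelf(), taking the script list
// as a parameter instead of reading it from document.
function findByQuery(scripts) {
  for (var i = scripts.length - 1; i >= 0; i--) {
    if (/^[^?]+\?/.test(scripts[i].src)) return scripts[i];
  }
  return scripts[scripts.length - 1]; // no query string anywhere
}

var base = 'http://elektrum.org/js/';
// The page at the moment the SECOND realFakeService.js runs.
var page = [
  { src: base + 'fakeService.js?arg1=first' }, // first call, with arguments
  { src: base + 'QueryObj.js' },
  { src: base + 'ArgsObj.js' },
  { src: base + 'realFakeService.js' },
  { src: base + 'fakeService.js' },            // second call, no arguments
  { src: base + 'QueryObj.js' },
  { src: base + 'ArgsObj.js' },
  { src: base + 'realFakeService.js' }         // running now
];

// The second service should see no arguments, but the search runs straight
// past its own caller and lands on page[0], inheriting arg1=first.
var found = findByQuery(page);
```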

#2 is a safe assumption. We’re using this technique specifically so that our own scripts are never split up, so no unrelated script can land between them.

#1 is not a safe assumption. To fix #1, we have to move to a technique along these lines, which keeps a note of the scripts we’ve already examined as we move along.

Backward search with caching

function findSelf () {
  var scripts = document.getElementsByTagName('script');
  for ( var i = scripts.length - 1; i >= 0; i-- ) {
    if ( scripts[i].src.match(/^[^\?]+\?/) &&
         scripts[i].innerHTML !== '//seen' ) {
      scripts[i].innerHTML = '//seen';
      return scripts[i];
    }
    scripts[i].innerHTML = '//seen';
  }
  // none has a query string, so default to most recently seen
  return scripts[ scripts.length - 1 ];
}

In this case, we mark the scripts we’ve already looked at as //seen so we won’t look at them again. We use innerHTML because we know it’s a real, standard property (making up your own attributes, or using nonstandard ones, can blow up on you) and because a script with a src ignores its innerHTML to start with. Messing with it or resetting it doesn’t change anything.

The problem with the technique is the one that so often derails obvious or elegant JavaScript solutions to problems: browser compliance. At the time of writing, only about half of the major browsers support setting scripts[i].innerHTML. That’s obviously not good enough. The algorithm is the right one, though, and we can use the same approach with a different implementation.

findSelf(), take 4; caching + site checking

// re-declaring with var keeps any existing value if this file loads twice
var _SeenScriptCache;
if ( ! _SeenScriptCache ) _SeenScriptCache = [];

function findSelf () {
  var scripts = document.getElementsByTagName('script');
  for ( var i = scripts.length - 1; i >= 0; i-- ) {
    // skip scripts served from other sites
    if ( ! scripts[i].src.match(/^http:\/\/elektrum\.org\//) ) continue;
    if ( scripts[i].src.match(/^[^?]+\?/) &&
         ! _SeenScriptCache[i] ) {
      _SeenScriptCache[i] = 1;
      return scripts[i];
    }
    _SeenScriptCache[i] = 1; // examined; never pick it up again
  }
  // none has a query string, so default to most recently seen
  return scripts[ scripts.length - 1 ];
}

We use a global cache, the array _SeenScriptCache, to keep track of which scripts have already been checked. This isn’t as good as the previous approach because it’s back in a global. The leading underscore is only a convention that signals the variable is private; it doesn’t actually protect it. It’s much more elegant and robust to keep the note on the script elements themselves, where it can’t collide with anything else. Alas.

We also added a sanity check.

if ( ! scripts[i].src.match(/^http:\/\/elektrum\.org\//) ) continue;

Without that, we might intercept incorrect arguments from services provided by other sites if their call has a query string and ours doesn’t. Remember, a client-side script has no built-in access to the query string it was requested with, the way a server-side handler does for a regular HTTP request. We’re finding it in the DOM, and if we’re not careful we’ll examine the wrong <script>.
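The cache and the site check can be exercised together on a mock page. This sketch reimplements take 4 with the script list and the cache passed in as parameters (a hypothetical signature, chosen so it runs outside a browser); the third-party URL is made up.

```javascript
// Take 4 (cache + site check) with its inputs passed in explicitly.
// `seen` marks script indexes that have already been examined.
function findSelfCached(scripts, seen) {
  var ours = /^http:\/\/elektrum\.org\//i;
  var hasQuery = /^[^?]+\?/;
  for (var i = scripts.length - 1; i >= 0; i--) {
    if (!ours.test(scripts[i].src)) continue; // other sites' scripts
    if (hasQuery.test(scripts[i].src) && !seen[i]) {
      seen[i] = 1;
      return scripts[i];
    }
    seen[i] = 1; // examined; never pick it up again
  }
  return scripts[scripts.length - 1];
}

var base = 'http://elektrum.org/js/';
var page = [
  { src: 'http://thirdparty.example/widget.js?color=red' }, // not ours
  { src: base + 'fakeService.js?arg1=first' },
  { src: base + 'QueryObj.js' },
  { src: base + 'ArgsObj.js' },
  { src: base + 'realFakeService.js' }
];
var seen = [];

var first = findSelfCached(page, seen);  // page[1]: our call with arguments

// Later the page calls the service again, this time without arguments.
page.push({ src: base + 'fakeService.js' },
          { src: base + 'QueryObj.js' },
          { src: base + 'ArgsObj.js' },
          { src: base + 'realFakeService.js' });
var second = findSelfCached(page, seen); // page[8]: the running script, no query string
```

Take out the ours test and the second call walks past everything already seen and returns the third-party widget, handing our service color=red: exactly the interception described above.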

Under Handling client PEBKAC in Pangrams in action, we might want to cross #3 off the list for good. Without arguments, we should fail silently or with some kind of error feedback.

The src check has a hidden benefit as well. You don’t want bandwidth leeches bypassing your service interface and going directly to your libraries. The libraries have to be open to any referrer to be able to work, just like the service. With the src check, the argument handling fails for anyone using the library from scripts hosted outside your own server.

This won’t work for every kind of library, but it’s a nice bit of gravy for this one. To protect other libraries, you would probably have to catch the misuse in your web server logs (e.g., libraries requested by pages that never request the services they belong to) and ban the offending IP addresses.

The sweet spot

This function is really meant to be a method of the Args() object we developed. Once we’re back in a class, we can tuck variables away in the class’s own namespace instead of the global one. We’ll make the cache a class variable, shared by all the Args() objects we create.

Args.prototype._findCaller(), a private method

// Assumes the Args() constructor is already defined above this point.
// A cache for use inside _findCaller(), shared by every Args object.
if ( ! Args._SeenScriptCache ) Args._SeenScriptCache = [];
// -----------------------------------
Args.prototype._findCaller = function () {
  var scripts = document.getElementsByTagName('script');
  var ours = /^http:\/\/elektrum\.org\//i; // scripts served from our site
  for ( var i = scripts.length - 1; i >= 0; i-- ) {
    var src = scripts[i].src;
    if ( ! src.match(ours) ) continue; // ignore other sites' scripts

    if ( src.match(/^[^?]+\?/) && ! Args._SeenScriptCache[i] ) {
      Args._SeenScriptCache[i] = 1;
      return scripts[i];
    }
    Args._SeenScriptCache[i] = 1; // mark it seen anyway
  }
  // none has a query string, so default to most recently seen
  return scripts[ scripts.length - 1 ];
};

See two full versions of the Args() class in the Appendix: Code for the Args() classes.

There is one apparent failure which is actually okay. If there are no arguments in the calling script, _findCaller() will seem to find the wrong script, namely realFakeService.js. Since there are no arguments in that script either (again, the service writer’s responsibility), it’s irrelevant. The script was meant to get no arguments, and though it ultimately looks in the wrong place, it still finds the correct number: none.
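Here is that case played out. ArgsSketch is a hypothetical, cut-down stand-in for the Args() class (not the appendix code), with just enough to show a prototype _findCaller() and a class-level cache at work. It accepts the script list as an optional parameter so it can run outside a browser.

```javascript
// Hypothetical stand-in for Args(): finds its caller, then splits out pairs.
function ArgsSketch(scripts) {
  // In a page, leave `scripts` out and the live script list is used.
  this._scripts = scripts || document.getElementsByTagName('script');
  var src = this._findCaller().src;
  var q = src.indexOf('?');
  // "arg1=x;arg2=y" becomes two pairs; no '?' means no arguments at all.
  this.pairs = q === -1 ? [] : src.substring(q + 1).split(/[;&]/)
    .filter(function (p) { return p !== ''; });
}

// Class-level cache shared by every ArgsSketch object.
ArgsSketch._SeenScriptCache = [];

ArgsSketch.prototype._findCaller = function () {
  var scripts = this._scripts;
  var ours = /^http:\/\/elektrum\.org\//i;
  var hasQuery = /^[^?]+\?/;
  for (var i = scripts.length - 1; i >= 0; i--) {
    var src = scripts[i].src;
    if (!ours.test(src)) continue;
    if (hasQuery.test(src) && !ArgsSketch._SeenScriptCache[i]) {
      ArgsSketch._SeenScriptCache[i] = 1;
      return scripts[i];
    }
    ArgsSketch._SeenScriptCache[i] = 1;
  }
  return scripts[scripts.length - 1];
};

var base = 'http://elektrum.org/js/';
var page = [
  { src: base + 'fakeService.js' },     // called with no arguments
  { src: base + 'QueryObj.js' },
  { src: base + 'ArgsObj.js' },
  { src: base + 'realFakeService.js' }  // running now
];
// _findCaller() returns realFakeService.js, the "wrong" script, but its src
// has no query string either, so args.pairs is empty: still the right answer.
var args = new ArgsSketch(page);
```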

There is one assumption left for this to be reliable, and it’s the service provider’s responsibility again. All services must use this same handler (or at least the same technique and the same cache) or risk their arguments bleeding over into each other. They all need to go through our Args() class. The whole point of abstracting this class out into its own file was to make sure it was the only code used, so we are in the clear at last.
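A sketch of how that bleed-over happens, with made-up service names: services A and C find their callers through the shared cache, while service B uses its own cache-free search and so never marks its caller as seen.

```javascript
// Hypothetical names throughout. Services that follow the rules share one
// cache of examined script indexes, as Args._SeenScriptCache does.
var seenCache = [];

function findWithCache(scripts) {
  for (var i = scripts.length - 1; i >= 0; i--) {
    var src = scripts[i].src;
    if (!/^http:\/\/elektrum\.org\//i.test(src)) continue;
    if (/^[^?]+\?/.test(src) && !seenCache[i]) {
      seenCache[i] = 1;
      return scripts[i];
    }
    seenCache[i] = 1;
  }
  return scripts[scripts.length - 1];
}

// A rogue service that finds its caller its own way, without the cache.
function findWithoutCache(scripts) {
  for (var i = scripts.length - 1; i >= 0; i--) {
    if (/^[^?]+\?/.test(scripts[i].src)) return scripts[i];
  }
  return scripts[scripts.length - 1];
}

var base = 'http://elektrum.org/js/';
var page = [];
// Service A follows the rules and is called with arguments.
page.push({ src: base + 'serviceA.js?x=1' }, { src: base + 'realServiceA.js' });
var a = findWithCache(page);     // page[0], now marked seen
// Service B uses its own finder and is called with arguments.
page.push({ src: base + 'serviceB.js?y=2' }, { src: base + 'realServiceB.js' });
var b = findWithoutCache(page);  // page[2], but never marked seen
// Service C follows the rules but is called WITHOUT arguments.
page.push({ src: base + 'serviceC.js' }, { src: base + 'realServiceC.js' });
var c = findWithCache(page);     // page[2]: C silently inherits B's y=2
```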

Is that clear, class?

Here’s where we stand. We have scripts writing scripts to import scripts that will be used by the last script written, which digs backwards through the DOM’s scripts to find the arguments in the original calling script’s src. If you’re not confused, you’re Don Knuth.

Don’t panic. This is really an exceptionally valuable technique. It’s worth picking up and spreading around. So, let’s take the time to walk through a new and code-complete example of how it can be done: Using JS code libraries, part 2.

This is version 0.57b of this manual. It is a beta version with some gaps. We are grateful for feedback.

The code in the manual has not yet been fully tested against Internet Explorer. Bug reports are welcome.
An Elektrum Press Online