Monday, February 11, 2008

Concretizing Static Typing Metadata

Well, that's a pretentious title, doncha think?

Steve Yegge writes in Portrait of a Noob that static typing is effectively meta-data ("we also know that static types are just metadata"), like comments, and so isn't strictly required for the compilation and execution of software. He's right in a limited context. If static typing is being used for nothing more than ensuring "type matching", and really doesn't add anything beyond that, then it is effectively just a stronger form of commenting, with the compiler acting in the role of object compatibility inspector.

This does get at why the argument for "type safety" has never achieved much success as a compelling reason for using a strongly typed language. Merely making sure your objects are compatible is a good thing, but it does constrain flexibility and extensibility, and it adds type management overhead (for the programmer).

If strong typing is going to be seriously valuable it has to do more than merely ensure type safety, it needs to actually add concrete information to the software.

Take a programming language like Ada, considered one of the paragons of strongly typed programming languages. It does all the type safety stuff, and Ada advocates are more than happy to promote that as one of its great virtues for creating and delivering reliable, safety-critical software. All true, but obviously type safety, accompanied by its supporting syntax and semantics, was not sufficiently compelling to drive any significant adoption outside the defense and aerospace industries (and in those fields, of course, much of the initial impetus was mandate-driven anyway).

What most of the Ada programming language advocates overlooked was the productivity gain made possible by the language's specific implementation of strong typing. When its advocates talked about strong typing aiding productivity, it was nearly always in terms of error avoidance. Again, true, and a good thing, but hardly sexy. After all, how many programmers are going to willingly admit that they write buggy code and that maybe they should look into using a programming language that would help them avoid errors?

I went into some detail about this in The Fundamental Theory of Ada, describing how the specifics of Ada's "type model" allow the Ada programmer to implicitly embed scads of additional information with no effort beyond that of defining a type. The language specifies all the additional programmatic information directly accessible to the programmer pertaining to that type. In a sense, user-defined type definitions implicitly declare an associated class instance with information relevant to that type. Here's an excerpt from Ada:

type Speed_Range is range 0 .. 1000;

With nothing more than a reference to an object of that type:

Speed : Speed_Range;

One can know its minimum value (Speed_Range'First), maximum value (Speed_Range'Last), the minimum number of bits needed to represent all possible values of the type (Speed_Range'Size), the actual number of bits representing a variable of that type (Speed'Size, which is often larger than the type size since objects almost always occupy a whole number of bytes), the number of characters needed to represent the longest possible string representation of values of that type (Speed_Range'Width), etc. You can convert values to and from strings (Speed_Range'Image, Speed_Range'Value), do min/max comparisons (Speed_Range'Min(100, Speed), Speed_Range'Max(Current_Max, Speed)), and use the type as a loop controller ("for S in Speed_Range loop" and "while S in Speed_Range loop"), and more. And none of this information needs to be explicitly programmed by a developer; it is all implicitly provided by the mere definition of the type.

This is where strong typing is far more than disposable metadata, like comments. This "aggressive" approach to strong typing, whether in Ada or a similarly conceived programming language, "concretizes" the metadata into practical use to not merely aid error avoidance, but to actively increase programmer productivity.
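
To make the attribute list above a bit more tangible, here is a minimal sketch of a complete program that exercises several of them. The procedure name Speed_Demo, the initial values, and the output labels are made up for illustration; only Speed_Range, Speed, Current_Max, and the attributes themselves come from the discussion above.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Speed_Demo is
   type Speed_Range is range 0 .. 1000;

   Speed       : Speed_Range := 250;   --  illustrative starting value
   Current_Max : Speed_Range := 100;   --  hypothetical running maximum
begin
   --  Bounds and sizes come straight from the type definition.
   Put_Line ("First       :" & Speed_Range'Image (Speed_Range'First));
   Put_Line ("Last        :" & Speed_Range'Image (Speed_Range'Last));
   Put_Line ("Type bits   :" & Integer'Image (Speed_Range'Size));
   Put_Line ("Object bits :" & Integer'Image (Speed'Size));
   Put_Line ("Width       :" & Integer'Image (Speed_Range'Width));

   --  String conversion in both directions.
   Speed := Speed_Range'Value ("750");
   Put_Line ("Speed       :" & Speed_Range'Image (Speed));

   --  Min/Max without writing a comparison by hand.
   Current_Max := Speed_Range'Max (Current_Max, Speed);
   Put_Line ("Current max :" & Speed_Range'Image (Current_Max));

   --  The type itself drives the loop bounds.
   for S in Speed_Range loop
      if S mod 250 = 0 then
         Put_Line ("Step        :" & Speed_Range'Image (S));
      end if;
   end loop;
end Speed_Demo;
```

Assuming a GNAT toolchain, saving this as speed_demo.adb and running "gnatmake speed_demo.adb" should build it. Note that nothing in the body restates the 0 .. 1000 bounds; change the type definition and every attribute and the loop follow along.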

14 comments:

Gwenhwyfaer said...

"After all, how many programmers are going to willingly admit that they write buggy code and that maybe they should look into using a programming language that would help them avoid errors?"

Most of them, actually, and certainly anyone who's any good. The question is what helps them discover more errors, more quickly? The people who prefer dynamic typing feel that they give up type-safe interfaces and highly optimised machine code to get something more immediately useful for tracking down bugs - flexibility, a short feedback loop, whatever. The people who prefer static typing feel that those advantages break down in distributed, modularised development - the size of environment where knowing where everyone else is becomes more significant a factor than knowing where one is alone, and that effective metadata is the only way to do that.

Which might be a valid point of view - after all, every large organisation falls prey to bureaucracy, so it must be useful for something...

(And as for students who like having the compiler catch their errors before the professor does, or those who hate having to type more than is absolutely necessary? Not exactly representative of the top end, and not yet smart enough to realise what they don't know; we should ignore them.)

It's not about sitting back and claiming "our code doesn't have bugs in". Both sides of the debate sling this accusation at the other, and both are wrong.

(My own prejudices are built on a universal disdain for large systems - I think Tony Hoare was exactly right - and for batch-mode development.)

Anonymous said...

The problem with static typing is that

1. You have to type in a lot of information to tell the compiler about all the types. A typical statically typed program has twice as many tokens as a dynamically typed program. All this extra information has to be consistent, which means that changing your program is labor-intensive and slow. This has a lot of adverse consequences - for example you are less likely to refactor your code.

2. The amount of useful information that is actually produced is very minimal. In fact there is less than 1 bit of entropy (information) per extra token in the program.

So, in summary - lots of work for not much benefit.

Some people claim that static typing allows automatic refactoring. What a load of nonsense - the first refactoring system was for Smalltalk, which has dynamic typing.

Marc said...

@Anonymous:

"1. You have to type in a lot of information to tell the compiler about all the types."

Yes, for Ada I have to type:

type Speed_Range is range 1 .. 1000;

instead of just:

Speed : Speed_Range;

But I don't really consider entering one type definition as having to type "a lot" of information. Especially since that one type definition suffices for all objects of that type.

"2. The amount of useful information that is actually produced is very minimal."

Unless we're totally missing each other's point, I'm completely puzzled by this claim. Appendix K of the Ada Language Reference Manual lists well over a hundred attributes available to the Ada developer that can be used to extract the information about numerous aspects of each type definition (not all attributes are applicable to every "class" of types--some apply only to discrete values, others apply to floating points, still others apply to records and arrays, and some apply to almost everything). There is far more than "1 bit" of information added to the program for each type definition.

But like I said, that claim is so wrong I'm thinking I'm misunderstanding something...

Anonymous said...

That's perhaps an argument in favor of strong typing but not static typing. In dynamically typed languages like Python and Ruby there's all sorts of metadata about types that you can play with. In fact, those languages are actually strongly typed. Just not statically typed.

Michael Roger said...

anonymous's claim about statically typed programs is mostly false; it only applies to a few pre-modern implementations, like Java.

A corrected version reads:
"A statically typed program written in a *non-type-inferring compiler* has twice as many tokens as a dynamically typed program. All this extra information has to be consistent, which means that changing your program is labor-intensive and slow. "


A *type-inferring* language+compiler (Haskell, OCaml, Scala...), which nearly all modern statically typed languages are, requires each type to be documented only once, and again only wherever the programmer wants to put in a compile-time assertion as a test to protect from human error.

Static type declarations + type inference engine = concise, easy to run, *correct* unit tests that provide instructions on how to correct errors they find.

Anonymous said...

"The people who prefer dynamic typing feel that they give up type-safe interfaces"

Not true. I never felt this way. In fact, I am glad that a language does not put many constraints onto my brain. Like my favourite language, Ruby.

At any rate, as long as specific code does what it was supposed to do, and has no bugs, the one thing that matters is speed - be it execution speed, or writing-this-thing speed.

Personally I am happy to trade execution speed if a task takes me only a few minutes to solve, and I continually expand my available scripts. It is like an ecosystem that grows and grows but it doesn't get ugly like Perl.

Gwenhwyfaer said...

"Not true. I never felt this way."

anonymous, if you won't even put your name against your opinion, why should I treat it as statistically significant?

febuiles said...

Some comments:

1) Strong typing is not the same as static typing as someone said earlier.

2) Type inferring compilers are not the panacea. I haven't touched O'Caml or Scala, but I've noticed how Haskell and F# need a hand sometimes to determine the right types (e.g. when playing with Parsec/UU parsers). I find myself writing the types for most non-trivial functions (as most Haskell programmers do for the more-than-one-line functions).

3) Typing Speed : SpeedRange each time you want to create a Speed increases your keystrokes by a nice factor. When you're working in a language like Java with ReallyFreakingLongNamesInCamelCase the "factor" of noise vs. real working code is scary. Are you sure you're getting enough out of your type system? Are you sure you can't do that with duck typed languages?
Maybe your example wasn't clear enough for me, but can't I do that with class variables and one line in the constructor in most OO languages?

4) I had something smart to say...but I forgot :P

Federico Builes said...

I forgot to link this image in the last comment:
http://data.tumblr.com/PDVq61dc523i4o67lFgz56VG_400.gif (not spam, really :)

I can't remember where I found it (maybe Steve Yegge's blog or something), but I found it funny :)

Anonymous said...

The example of interval types doesn't really have anything to do with static typing.

Consider Common Lisp, which has equivalent interval types (integer 0 100), (real 0 (100)), etc.

This is not a static typing system. So, I think that your whole premise is essentially mistaken -- what you mean to talk about is the expressivity of type systems, and that's orthogonal to their static/dynamic nature.

James said...

The points you make about Ada's safety features are, I think, specific to Ada's constraint system, and not necessarily due solely to static typing: If you had

int Speed_Range

for instance, it's "statically typed", but you don't really have any more information or safety than you would if it was dynamically typed.

Of course, there's always the best of both worlds: Common Lisp, where you have dynamic typing by default, but the ability to declare types if you want.

Marc said...

@James:

>int Speed_Range

The equivalent in Ada would be:

Speed_Range : Integer;

And agreed, the only things you know per this declaration are those specific to its Integer type--though that still includes all the relevant "type attributes" that Ada provides for such types.

But "Speed_Range : Integer" is a poor variable declaration, because you (the developer) have asserted nothing about the speeds allowed by the associated entity. (Does the speed really range over +/- 2 billion??) By constraining the variable with a suitably specific type, you've added information to your code, and Ada allows you to access that explicit information and much derivable information.

James said...

@Marc

That's exactly my point: The supposed usefulness of static typing here is really the usefulness of Ada's particular properties: That is, the benefits of safety are really more dependent on the specific language and coding conventions than on something as broad as "type system".

Marc said...

@James:

Which is exactly my point! :-)

With Ada strong and static typing go hand in hand:

"If strong typing is going to be seriously valuable it has to do more than merely ensure type safety, it needs to actually add concrete information to the software."

So I'm agreeing with you, it's not static or strong typing alone, but what you build atop it that adds value. In Ada's case there was a conscientiously designed "type model" that added value.

So dismissing strong/static typing as mere metadata in and of itself is accurate, but uninteresting.