Thursday, May 8, 2008

Binary Search -- Mmmm, tasty!

I stumbled across Tim Bray's article on binary search, and simply felt compelled to pass it along.

It's concise, and even if as a software developer you've got binary search down pat, it's still a pleasure to read a well-written explanatory article about something so fundamental to our craft.

So if you need a little refresher on the details of binary search, and why it's so gosh darn useful, or you need to roll your own because you need to make some small variations to the basic algorithm (which was my situation and why I went looking for some articles on this subject), give it a read.

Good technical writing stands the test of time.


Tuesday, May 6, 2008

Send Your Loved Ones Into Space -- And You Too!

A couple of upcoming missions have announced that the public can submit their names, which will then be attached to the spacecraft that is launched into space. I've done this with a couple of previous space missions, and it's a cute thing; you can even get a certificate :-)

The two missions that I just became aware of that are doing this are:

Lunar Reconnaissance Orbiter - whose purpose is "to [find] safe landing sites, locate potential resources, characterize the radiation environment, and demonstrate new technology." Send your name (and those of your family and friends) to the moon!

Kepler Mission - "specifically designed to survey our region of the Milky Way galaxy to detect and characterize hundreds of Earth-size and smaller planets in or near the habitable zone." Submit your name (and a brief message if you want) to be carried along on the spacecraft that may be the first to detect a possibly habitable extra-solar planet.


Tuesday, April 22, 2008

Why the Chart Wasn't Opening

Here's a not atypical experience when porting a humongous (1.3ish MSLOC) legacy application from one platform to another; in this case from SGI/IRIX to Linux.

During a simulation run the operator can click on a button that opens up a chart displaying some statistics depicted as a line graph. In the working version of the port that chart wasn't opening up.

Why?

Because the chart module never received a "Create Line Graph" message.

Why?

Because that message is sent only when the simulation's "current time counter" is not 0.0, and it was not getting incremented.

Why?

Because no "Timestamp" message had been received.

Why?

(At this point I embarked on a Wild Goose Chase -- Marc)

Because the sending process never sent one.

Why?

Because the file it was reading turned out to be unexpectedly empty, and the process froze up after throwing an "End_Error" exception.

What did the original version do?

Err... I guess the file is empty there too, but it throws an "Out_Of_Data" exception.

Why?

Ah...race condition.

So what's the net effect of the difference between how the two exceptions are handled?

Huh. None. They're both taken to mean "no data", which is a not unexpected condition, and so the exceptions are resolved and processing continues.

-- End of Wild Goose Chase. Backtracking to where I left off:

Why?

Because no "Timestamp" message had been received.

Why?

Because the sending module is experiencing a SEGFAULT.

Why?

Because some data being extracted from a database is getting stomped on with bad, bad values, triggering the segfault.

Why?

Compiler bug.

Really??

Looks that way. Though 99% of the time that I start to think "compiler bug" it turns out to be a programming error, this is one time it looks legit. A procedure gets called that in turn calls additional subprograms to retrieve the data from the DB. Down at the bottom an exception is thrown that propagates back up to the calling routine. This is a "no data found" exception that is perfectly legitimate to have occur and propagate back up. When control returns to the calling procedure, though, some of its local variables have gotten clobbered--even those that are not part of the calling sequence. Everything is fine until that exception is handed up to the calling procedure. So the work-around was to catch the exception within that first called procedure, and change its parameter list to include a "found" flag, which is set according to whether the exception occurred or not. The caller then checks the flag and handles a "not found" result the same way it would have handled the exception.
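
The shape of that work-around can be sketched in a few lines. This is a Python illustration only, not the original Ada: the names NoDataFound, _query_db, fetch_value, and the tiny in-memory "database" are all hypothetical stand-ins. The point is just that the exception is caught inside the first called routine and turned into a "found" flag, so nothing propagates up into the caller.

```python
class NoDataFound(Exception):
    """Hypothetical stand-in for the legitimate "no data found" exception."""


# Hypothetical in-memory stand-in for the real database.
_FAKE_DB = {"unit_7": 42.0}


def _query_db(key):
    """Lowest-level retrieval: raises NoDataFound when the key is missing."""
    try:
        return _FAKE_DB[key]
    except KeyError:
        raise NoDataFound(key)


def fetch_value(key):
    """First called procedure: catch the exception here, report it as a flag.

    Returns (found, value), so no exception ever reaches the caller.
    """
    try:
        return True, _query_db(key)
    except NoDataFound:
        return False, None


def caller(key):
    """Caller checks the flag instead of handling a propagated exception."""
    found, value = fetch_value(key)
    if not found:
        return "no data"
    return f"value={value}"


print(caller("unit_7"))
print(caller("unit_9"))
```

The caller's behavior is unchanged; only the mechanism for reporting "no data" has moved from an exception to a return value.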

And then?

The chart still doesn't open.

Why?

In a color setting function, the name of the color is passed in and checked against a table that maps each color name to some internal data. That function lower-cases the color name parameter, since all the names in the table are lower case. The function, though, is modifying (via tolower()) the color name within the parameter itself, rather than in a local copy. For some reason trying to overwrite the parameter in place is causing another segfault. This is a less-than-desirable thing to be doing anyway, i.e. modifying the passed-in argument that should only be used as a lookup value, so the function was modified to lower-case the value into a local variable, which was then used for the table lookup.

Now?

The chart kinda opens, and then freezes.

Why?

Segfault.

Again??

Yes, down in the Lesstif code a null dereference is occurring.

Why?

Beats the hell out of me on this one. I built Lesstif from source, with debug, so I can find the line of code that's causing the problem but I really don't know what exact sequence of events is leading to this problem (I'm an applications, not a systems, programmer!) It does seem to be happening with the ScrolledList widget, when doing something pertaining to fonts.

What now?

Try exploratory code removal. Comment out the line that sets the font list and see if maybe some sort of default gets used.

And...?

Chart correctly opened and displayed, although the text is not italicized like it is on the original platform.

I can live with that.


Wednesday, March 26, 2008

Decades of Spam

I have some contact-the-author email accounts that are provided in README files that accompany some open source software I've released, and which I knew would sooner or later get scraped up by spambots. They were, and now I get anywhere from 20-100 spams a day on them. The Gmail filter is pretty good so I rarely see more than one every couple weeks that gets through. Still, because every now and then a legitimate email comes in and sometimes gets marked as spam I have to take a quick look at the spam jail pretty much daily and make sure that nothing got caught that should have gotten through.

And I wonder, how long is this going to go on? Is this something that I, and every other email user who needs to provide a publicly accessible email address, are going to have to deal with...for decades?? Geez, that's depressing.

About the only thing more depressing in this matter is the poor slob whose job it is to write those ridiculously lame come-on one-liners for "enlargement" products. That's a soul-killing job if there ever was one.


Monday, March 24, 2008

The Toilevator

One thing I've noticed over the last decade or so is the increasing likelihood that some mundane item that one thinks ought to exist, more often than not does exist. At least in the areas of household items, tools, and home improvement type stuff.

That said, I give you...the Toilevator.


No, it's not for me. But thanks for asking.


Monday, March 17, 2008

Let's see one of you telekinesisists do THIS

Levitating and wirelessly powering a lightbulb...with SCIENCE!

And no small amount of imagination.

I love this stuff.

Check out the movie too.


Thursday, March 13, 2008

Otters Should Not Be Allowed to Design Software

I got nothing against otters.

They're cute, playful, inquisitive, and so on. We should take joy in their simple life, and protect their habitat.

Neither they nor their human analogues, however, should be allowed to design software.

When you're porting a large software system from one platform to another you spend a lot of time dealing with the design and implementation...uh...quirks of the original builders. Sometimes it's just stylistic stuff like typedef'ing (C) or renaming (Ada) every single freakin' standard type name. Other times, though, you'd swear that otters had been tasked with coming up with the design.

So the part I'm porting now has a central control process and a GUI process that communicate via sockets. All well and fine.

The latest issue I've been dealing with has to do with the operator clicking on a field in a table to change it, which pops up a menu of valid entries, one of which is selected and then the "Done" button is clicked. Somewhere, though, in the update processing chain the value was getting trashed, causing the update to fail.

In other words, run-of-the-mill porting issues.

So I traced through the code and verified that the operator's selection was getting properly packaged up into a message and sent out through the socket. I followed it right up to the socket write, so there was no doubt.

I then went to the recipient of the message, the central control process, and verified that the message was being properly received and decoded.

The next thing that's done after receiving the message is calling a function called Update_Table(). The invalid data error is being detected inside this function. However, the data from the message I just read in is not being passed into Update_Table. WTF? Why is it failing then?

So I dig down into Update_Table and see that what it's doing is going out to query for the data in the table row that's about to be updated. But it's not getting this data from a database. Nor is it accessing some internal data structure model. It's sending out a message.

To the GUI process.

And so now I go back to the GUI process and start tracing from where that message is received. So is the GUI process keeping a data store of its own that it both displays to the operator and uses to answer queries from the controlling process? Why, no, no it isn't. Instead it goes and gets the data out of the graphical widget that's displaying it. If the value is a number, it converts the displayed string representation of the number back to a number, otherwise it passes it back as a string.

So not only does the central controlling process not have control of the data, it's outsourced that responsibility to the GUI, and that in turn is using the display widget as its data store.

Whoever came up with this bright idea is a complete, raving, ... otter.

Imagine: in a distributed system that does in fact utilize a database for data storage, you can lose all your active data if the GUI crashes.

(Oh, the problem turned out to be a data alignment mismatch due to the change in word sizes between the different platforms.)
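
The post doesn't give the actual record layout, so the following is only a hedged illustration of how a word-size change can trash a field. It uses Python's struct module with a made-up message record (an int row, a long value, an int col) and two made-up layouts: one where long is 4 bytes, one where long is 8 bytes and naturally aligned. The field names and both layouts are assumptions for the sake of the example.

```python
import struct

# Hypothetical message record: { int row; long value; int col; }
# Layout when "long" is 4 bytes: fields at offsets 0, 4, 8.
LAYOUT_LONG32 = "<iii"
# Layout when "long" is 8 bytes and naturally aligned: offsets 0, 8, 16,
# with explicit padding ("4x") where a C compiler would insert it.
LAYOUT_LONG64 = "<i4xqi4x"


def pack_long64(row, value, col):
    """Pack the record the way the 8-byte-long side would lay it out."""
    return struct.pack(LAYOUT_LONG64, row, value, col)


def unpack_as_long32(buf):
    """Read the same bytes the way the 4-byte-long side expects them."""
    return struct.unpack_from(LAYOUT_LONG32, buf)


msg = pack_long64(3, 7, 5)
print(struct.calcsize(LAYOUT_LONG32), struct.calcsize(LAYOUT_LONG64))
print(unpack_as_long32(msg))
```

Read with the wrong layout, the row survives, the "value" field picks up the padding bytes instead of the data, and the real value lands where the column was expected. Nothing crashes at the point of the read; the garbage only shows up later, when something validates the data.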


Tuesday, March 11, 2008

Solve the First Problem, and Don't Keep Going After It Breaks!

I cannot emphasize enough the truth and importance of Tilton's Law: Solve the First Problem:

Normally Tilton's Law refers to two or three observed issues that seem to be in the same ballpark. The law says pick out the one that seems most firstish and work on that and only that until it is solved. There is a terrific chance the other problems will just go away, and even if not the last thing we need to do while working on one problem is be looking over our shoulders at possible collateral damage from some other problem.

What I've experienced and expressed over my career is that once a bug has acted, you can no longer trust anything the software is doing. You've busted the state of the program, and everything that happens from that point on is suspect.

It drives me nuts when a tester has experienced a problem in the system being tested, logs it, and then keeps on going, writing error reports for every bit of collateral damage that is now cropping up. All of which have to be analyzed, dispositioned, and closed, and other than the first, none of which should have been written at all.

It's a waste of time, effort, and money to have developers trying to uncover and fix bugs that are merely the after-effects of the "first" bug, or testers logging reports of those after-effects.

Report the first bug, fix the first bug, and move on.


Primordial Program Porting Perils

Porting old code can be a pain.

I'm currently working on porting a large (well over a million SLOC of Ada and C) wargaming simulation system from a Silicon Graphics/IRIX platform to PC/Linux.

98.3% of the porting effort has gone pretty smoothly, but there have occasionally been some real showstoppers.

The latest issues I've been working with on the port have to do with the Xbae widget set. The developers of the original SGI version of this app had grabbed a version of Xbae source code and frozen it to be evermore part of the code base.

Well, there were serious problems with that version of Xbae versus the Motif distribution that was installed on the designated Linux platform. Those have pretty much been taken care of, but there's been one thing left: When a particular text edit box is updated, it's supposed to automatically update the corresponding cell in an Xbae-provided matrix table, and that wasn't happening.

Since I'd grabbed all the source code for this stuff I was able to walk through what was going on in the debugger, and I discovered that when calling XmTextFieldSetString() to update the matrix cell, a check made in that function was rejecting the update. The check? Well, a quite reasonable one to make sure that the widget being updated was an XmText widget, so as to ensure that one was actually updating what one thought was being updated.

Okay, waitaminnit. It's checking that the widget is an XmText widget, but the code is expecting it to be an XmTextField widget. And isn't the latter what the XbaeMatrixWidget says is actually used?

Er...no. Or yes. Um, it depends on what you read.

The "What is it?" Xbae Matrix Widget page says: "While XbaeMatrix looks and acts like a grid of XmTextField widgets, it actually contains only one XmTextField."

But the Xbae Matrix Documentation page says: "While XbaeMatrix looks and acts like a grid of XmText widgets, it actually contains only one XmText".

So, the source code says it's an XmText widget. When did that happen?

Spelunking on Google we find:

* Swapped out the XmTextField widget to use the XmText widget to enable multi line rows and the like.

When did this happen? Version 4.7, from mid-1999. What version of Xbae am I using? 4.60. And this portion of the app was written in early 1997, well prior to Xbae's conversion from the XmTextField to the XmText widget.

So, I can't say the information wasn't out there; it was properly published in the release notes back then.

But because this is an old application, a "Mines of Moria" project, much of the code hasn't been looked at in years, and so there was never any incentive, hell, any reason, to keep it current with evolving utilities. I expect I'm going to run into this sort of thing again, but what I've got to get onto right now is locating and fixing any additional expectations of XmTextField widgets being used when interacting with the Xbae matrix.

This is gonna be so cool when it's done!


Friday, February 29, 2008

It's only a millionth of the vehicle's velocity...

The Planetary Society has an article up about the "Flyby Anomaly", wherein scientists at JPL have discovered that space exploration probes that use the earth as a gravitational slingshot are gaining a tiny amount of speed that exceeds the gain expected by the maneuver.

This amount is small, "only about one millionth the velocity of the spacecraft", but that's still detectable.

(That ain't much, right? How much of an effect could that possibly have?)

According to the article, the effect was first detected when the Jupiter probe Galileo got its first gravitational assist from the Earth back in 1990. It was subsequently detected on similar maneuvers by the NEAR and ROSETTA spacecraft, to varying degrees. (Two other spacecraft, Cassini and MESSENGER, also swung by Earth as part of their trajectories, but the anomaly was not detected, though for reasons that are understood.)

So I got to thinking--exactly what does "one millionth [of] the velocity of the spacecraft" actually add up to? And is it really that likely to have any kind of significant effect?

Caveat: I'm not an astronomer or an orbital mechanics guy, so I'm just going by the numbers I found in the article and web searches. I mean, you're only going to get "so" accurate when you're going by "about a millionth" :-)

According to a ROSETTA press release, it made its March 5, 2005 Earth flyby at a speed of around 38,000 kph. I'm going to use that velocity for these calculations, knowing full well that the purpose of the flyby was to increase the speed of the spacecraft, but that number will serve for what I'm trying to show regarding the scale of the anomalous speed gain.

So what is one millionth of 38,000 kph? Well, 0.038 kph, or 38 meters/hour. In non-metric terms this converts to 0.0236 miles per hour.

That may not seem like much, especially when compared with the vast distances a spacecraft, even an interplanetary one, must travel to reach its destination. On the other hand, because of those vast distances, flight times are usually quite lengthy, which gives such small values plenty of time to grow into significant ones.

So the small ROSETTA velocity discrepancy grew within one day to 0.5664 miles. Yikes, that means it's already half a mile further along in its trajectory than it was expected to be. Within a week it's nearly 4 miles further along.

Now the destination of the ROSETTA craft is the comet 67P/Churyumov-Gerasimenko, which is itself only 5 km by 3 km in size (about 3 x 1.8 miles), so a week after the flyby ROSETTA would already be off its expected position by more than a full span of its target's size. And since ROSETTA is on a 10 year trajectory, this discrepancy would keep adding up with each subsequent year, at 206+ miles per year, making the "flyby anomaly" quite a significant factor in mission planning and success.
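
For anyone who wants to check the arithmetic, here is a small Python sketch of the same back-of-the-envelope calculation. It makes the same simplifying assumption as the text above: the extra speed stays constant and the resulting offset simply accumulates along the trajectory. The variable and function names are made up for the example, and because it doesn't round the miles-per-hour figure before multiplying, its results differ slightly in the later decimal places from the hand-rounded numbers above.

```python
KM_PER_MILE = 1.609344  # exact definition of the international mile

flyby_speed_kph = 38_000.0   # ROSETTA flyby speed quoted above
anomaly_fraction = 1e-6      # "about one millionth"

extra_kph = flyby_speed_kph * anomaly_fraction  # extra speed in km/h
extra_mph = extra_kph / KM_PER_MILE             # same, in miles per hour


def drift_miles(hours):
    """Extra distance covered after `hours`, assuming constant extra speed."""
    return extra_mph * hours


print(f"extra speed: {extra_kph:.3f} kph = {extra_mph:.4f} mph")
print(f"after one day:  {drift_miles(24):.2f} miles")
print(f"after one week: {drift_miles(24 * 7):.2f} miles")
print(f"after one year: {drift_miles(24 * 365):.1f} miles")
```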

Obviously the mission planners have taken this all into account now, and we can look forward to a successful ROSETTA flight and mission.

But this certainly does demonstrate that being off by "about a millionth" can have some real consequences, despite how insignificant that sounds.