And we are back on the slow crawl toward eventually explaining what I do, out here in the darker recesses of my lab tucked in the remote Kansai countryside.

Aside from breeding deadly mutant monkeys to serve in my army of evil minions when I kickstart the world-domination part of my plot, that is.

Before I go any further, let me remind the casual reader that: 1) it is most likely nice and sunny out there where you live and you would be considerably better off looking at squirrels running through the trees 2) if you have even the slightest inkling of formal mathematical/computer science training, you will be better served foregoing this edulcorated version in favour of one of the 10 million tutorials and entries on bioinformatics available throughout the internets (Wikipedia being a good place to start). The entry written henceforth is geared at some hypothetical grandparents who would care to know what the fuss with modern Science is all about (for instance mine, were they not already perfectly content in the sole knowledge that the good Lord has put all these tiny amino-acids together in the best possible way of all worlds and that modern genetics is the work of the Devil1).

In last month’s episode, we laboriously learnt that Biology abounds with really, really, tough problems. Two major points were:

1. For all practical matters, NP-Complete problems are all in the same bag: finding a way to solve one efficiently would mean you can solve any other in roughly the same order of time.

2. Once you have proved that a problem is NP-Complete, trying to find an exact solution for a real-life set of data, is about as meaningful as trying to take down the Everest with a toothpick. There are however plenty of ways to find an approximate solution. Proving NP-Completeness is your cue to start looking for approximation algorithms; and thus the fun begins.

Today, instead of going straight onto the myriad fun ways in which mathematicians solve biology problems, and which one of those I am actually connected to, another digression and an illustration everyone has heard of: genome sequencing.

Full genome sequencing (mapping the entire DNA of a given organism) is one of the earliest application of modern bioinformatics techniques, a seminal example: it starts off as a rather straightforward bio-chemistry problem, soon runs into pesky matters of size, complexity and intractability, goes through a difficult phase of alcohol and substance abuse, but is ultimately saved by the power of Love and Mathematics.

Before I go into the gory details, allow me to dissipate a common misconception about DNA sequencing: it is nowhere as easy as you might have been led to believe by your TV (most people’s preferred source of Science™ facts). Hearing of “DNA tests”, “DNA crime database” and other everyday life DNA-related techniques might make it sound like sequencing is as easy as sending your saliva swab to the lab and waiting a couple days for the results. In reality, despite serious advances, actual full genome sequencing is still a multi-year, multi-million-dollar affair. When people talk about DNA in a forensics or medical context, they are usually looking at a single base nucleotide, located at a precise location on one gene, out of the entire genome. Even cases that require a larger sample of such observations (e.g. DNA matching, when it actually uses sequencing altogether) are still somewhere in the lower hundreds (if that). That’s a mere 100 bases to look at, against 100+ million for the first organism fully sequenced, 10 years ago (make that 3 billions for humans). Quite a difference in scale. And, of course, this is one of those problems where solving twice the size requires much more than twice the time (hopefully by now, this does not surprise you, otherwise you might want to go back and read episode 1 again).

OK, let’s start:

  1. I know: one is not supposed to capitalise the name of God’s evil nemesis, but I am going on the assumption that Satan is a vindictive bastard and one can never be too prudent in courting the good graces of major players of the afterworld. []

Two months and countless draft embryos after initially promising it, here is the first part of an unfathomably long rant describing my field of research. I honestly don’t expect anybody to subject themselves to that read, but at least now I have a place to send those who foolishly ask me about it at cocktail parties.

The short answer is that I do research in Bioinformatics, which is where Mathematics (along with Computer Science and a dozen other disciplines) meet with Biology and Genetics in a dark back-alley, and do all sorts of indescribable things to each other in the hope of: creating a better world, curing cancer, breeding the next race of eugenic übermenschen or making a few bucks for Big Pharma… whichever comes first.

But that sort of answer, while technically correct, does not really tell you why such an unnatural coupling of disciplines was warranted in the first place. Allow me to start at the beginning. Way at the beginning.

[open long semi-relevant digression that can be advantageously replaced by a thorough read on Complexity Theory, if you feel up for the more sciencey and truthy version of things]

The scientific problems of this world tend to fall in either of two categories: those you might eventually solve with a good computer and some time… and those you will never solve exactly, no matter how much crazy sci-fi supercomputing power you throw at them.

This “solvable” vs. “not solvable” demarcation might sound like a tautology, until you understand the full meaning of “never” in the above statement: these, are not problems that might be solved one day, when science progresses far enough or computers get ten, a hundred or a million times faster. These are problems whose solutions require calculation of a complexity that is proven to be beyond the reach of any conventional means of computation in any foreseeable future (“unconventional means” would begin with the discovery of heretofore unknown laws of Physics: in other words, unlikely in your lifetime. at best1).

By and large, the mathematical complexity of a problem, is the order of time (or computing power) it will take to solve it, relative to its size.

Without calculating the result of a certain task, it is often possible to predict whether producing this result could or could not be done in a reasonable amount of time (where “reasonable” usually means “in less than the age of the universe, assuming the use of every single computer on earth”, or somesuch).

There are countless examples of tasks falling in the first category, “easy” tasks that can be solved quickly, regardless of how big they are. For example, anybody past kindergarten age can presumably add two numbers of practically any size with a piece of paper and a pen. You just add each digit one by one (and, yes, carry the one) and adding two 100-digit numbers will take barely more time than adding two 3-digit numbers.

Now consider a different task: say you are a traveling salesman who needs to plan their next sales route. You have a map of the region, with the towns you must visit and all the distances between them, given in kilometers. How do you find the absolute shortest route that will take you to each city at least once without wasting gas or time?

More to the point: how difficult do you think finding that route will be?

Sure, it sounds easy enough: pick a starting point, follow every roads that go from that city to another one, then onto the next etc. Keep the shortest distance you’ve found. Can’t be that tough, right?

Let’s say there are five cities: you pick a city to start from, then check all remaining four, and from each four, go onto one of the remaining three etc. etc. In total, that’s 5x4x3x2x1 = 120 different paths to compare (that product can also be written using the factorial function: n! = n x (n-1) x … x 3 x 2 x 1. e.g. 5! = 5x4x3x2x1). Not so bad.

What if there are a few more cities… for instance, two times more: 10 cities. That’s 10! = 10x9x8x7x6x5x4x3x2x1 = 3,628,800 paths to look at. Huh, that might take a bit longer to do by hand. No worries: somebody will write a computer program that gives you the answer in a couple seconds.

Except that, you guessed it, each time I double the number of cities, the difficulty does way more than just double.

For 20 cities, the number of paths to look at is: 20! = 2,432,902,008,176,640,000.

For 70, cities, there are 70! (that’s factorial of 70: 70x69x68x…x3x2x1) possible paths to check one by one. That number has exactly 100 digits. This is (very) roughly the number of particles in the entire universe. Assuming you were to put every single computer in the world to work on this, you likely would not be done by the time the Sun explodes.

  1. Yea, yea, I know… “Quantum Computing”. Let’s wait and see when we get there, ok? []

When my advisors (a combination of current and past ones) suggested that I get on the “2 weeks, 5 cities” tour, I was initially very excited.

As it turns out, however, they were not talking about an all-expenses paid tour of Asia and America’s best nightlife spots.

For mild entertainment and posterity value, a few frackload of random tidbits gleaned over the past 10 days and 25,000 miles (counting):

Boston is a nice city. Somewhat nicer than I imagined (was perhaps one of the only major US city I had never been in). At least in the middle of July, when the sun is warm and rain had apparently stopped pouring, just in time for my arrival there. But weather concerns apart, it feels like one of a rare breed of US cities, where you can live (fine) without a car. Which automatically puts it toward the top of my book. It also has lots of nice tree-lined avenues with cute little houses, and plenty of coffeeshops with semi-witty names and lovely US-style breakfasts (baaaacon…) that nearly make up for the filtered sock juice they call coffee…

Coincidentally, and with no bearing on the above statement of appreciation: Everybody in Boston is a 20-something upper-middle-class white person who only wears pastel polo shirts. Really: everybody. Even Asian people there are white. And they wear pastel polo shirts. On their way to one of the 259 Ivy League universities within walking distance of Fenway park.

I am told there are black people living in Boston too.

Tidbits of daily life in my very own secret underground volcano lair, otherwise known as Kyoto University’s Bioinformatics Center…

  • Our lab building is one of the more modern on campus (remind me one day to capture some of the cool derelict buildings that sit on the edge of the campus: I swear, on sunny days they look straight from some Dr Moreau’s Island movie). The architecture is somewhat reminiscent of California-style 60’s: think John Lautner’s Desert Modern with more wood or Lloyd Wright with more glass… At any rate, it’s considerably nicer (and in better shape) than my last place of work.
  • I spend my days at work barefoot (and I love it). Like most Japanese research labs, ours is a shoe-free zone: you are required to leave your outside-shoes in a locker by the door.
  • Another typical Japanese tradition: food omiyage. Mix this with the fact that, out of a dozen researchers, there will always be one or two freshly returning from a trip abroad and you have a kitchen corner that looks like an international food court, all year round.
  • We have, I kid you not, twelve different recycling bins in our lab. That’s not counting special items like batteries, pens, electronic parts etc., which all have their specific recycling deposit box in another building. Every friday afternoon, all the lab’s researchers have a giant round of Jan-ken-pon to pick who has to bring it all out to the campus’ recycling center.
  • My morning commute from Kyodai’s international residence to the lab is approximately 10 minutes by foot (on a leisurely stroll). I believe this is the closest I’ve ever lived from work to date. Also: since the residence is all the way to the top of a small hill, while Kyodai’s campus is at the very bottom, I could ride my bike the entire way and not have to hit the pedals once (and possibly end up smashed by the Kyoto-Nara express train while crossing the tracks, but beside that). The hike back is way less fun.
  • Beside the ever-impressive Entrance Ceremony and half a dozen redundant mandatory lectures on miscellaneous research-related topics, I am finally getting to the good part of huge layer-cake bureaucratic administrations: welcoming parties! Five of them, to be exact: one for each level of subdivision of each faculty or research institute I officially depend on. Free food (and booze): yay!

Sensei: Blah blah blah… international collaboration project… blah blah blah… grant submission accepted… blah blah blah… Five year budget.

Dave: Great. But, huh, how does it affect me?

Sensei: How would you feel about going to Berlin or Boston for three months? All expenses paid, of course.

Dave: Sure, what’s the schedule and which project would I be working on?

Sensei: Oh, it’s entirely up to you, just pick the faculty and project you’d like to work on and a time during this fiscal year you’d like to go. Where would you prefer: Berlin or Boston?

Dave: Dunno. Both are nice. Does either include daily spa and massage, by any chance?

Sensei: If you want, you can do one country this year and the other next year.

Dave: You don’t say.

Sensei: We’ll also send you to their workshop in Boston this Summer anyway.

Dave: Recession hasn’t hit our lab yet, has it…

a.k.a. The Long Overdue Life-Update

The three people still reading this blog on a regular basis (two of which possibly paid by the Chinese government after some bizarre translation mix-up convinced them I was a dangerous political dissident to be monitored) might have noticed the lack of substantial news on this blog for quite a long time. OK: even less substantial content than usual.

I also realise that the lack of proper context as to my whereabouts made a lot of past blog entries somewhat puzzling. If this can make you feel any better, I am pretty sure that my own genitors have had only the faintest sense of my exact location, occupation or plans, ever since I was last sighted, putting a finishing touch to my grand World Domination Plot Master Thesis.

In fact, it took all that time for the plan set in motion nearly a year ago to finally reach its final stage (tonight).