You know you are in Japan when…
you show up to renew your Kyodai ID and a staff you’ve never met before immediately pulls it out of a stack of 300 identical cards, before you had a chance to give your name.
Yea: not a lot of whities in my faculty.
You know you are in Japan when…
you show up to renew your Kyodai ID and a staff you’ve never met before immediately pulls it out of a stack of 300 identical cards, before you had a chance to give your name.
Yea: not a lot of whities in my faculty.
Today, the entire 3-month research stipend covering the cost of my stay in Berlin (living expenses, plane ticket etc), was deposited on my bank account. All at once.
Am I a bad person for even wondering how many years I could live off that, were I to accidentally end up on some remote beach island instead of Berlin’s Max Planck Institute for Molekular Genetics?
And we are back on the slow crawl toward eventually explaining what I do, out here in the darker recesses of my lab tucked in the remote Kansai countryside.
Aside from breeding deadly mutant monkeys to serve in my army of evil minions when I kickstart the world-domination part of my plot, that is.
Before I go any further, let me remind the casual reader that: 1) it is most likely nice and sunny out there where you live and you would be considerably better off looking at squirrels running through the trees 2) if you have even the slightest inkling of formal mathematical/computer science training, you will be better served foregoing this edulcorated version in favour of one of the 10 million tutorials and entries on bioinformatics available throughout the internets (Wikipedia being a good place to start). The entry written henceforth is geared at some hypothetical grandparents who would care to know what the fuss with modern Science is all about (for instance mine, were they not already perfectly content in the sole knowledge that the good Lord has put all these tiny amino-acids together in the best possible way of all worlds and that modern genetics is the work of the Devil1I know: one is not supposed to capitalise the name of God’s evil nemesis, but I am going on the assumption that Satan is a vindictive bastard and one can never be too prudent in courting the good graces of major players of the afterworld.).
In last month’s episode, we laboriously learnt that Biology abounds with really, really, tough problems. Two major points were:
1. For all practical matters, NP-Complete problems are all in the same bag: finding a way to solve one efficiently would mean you can solve any other in roughly the same order of time.
2. Once you have proved that a problem is NP-Complete, trying to find an exact solution for a real-life set of data, is about as meaningful as trying to take down the Everest with a toothpick. There are however plenty of ways to find an approximate solution. Proving NP-Completeness is your cue to start looking for approximation algorithms; and thus the fun begins.
Today, instead of going straight onto the myriad fun ways in which mathematicians solve biology problems, and which one of those I am actually connected to, another digression and an illustration everyone has heard of: genome sequencing.
Full genome sequencing (mapping the entire DNA of a given organism) is one of the earliest application of modern bioinformatics techniques, a seminal example: it starts off as a rather straightforward bio-chemistry problem, soon runs into pesky matters of size, complexity and intractability, goes through a difficult phase of alcohol and substance abuse, but is ultimately saved by the power of Love and Mathematics.
Before I go into the gory details, allow me to dissipate a common misconception about DNA sequencing: it is nowhere as easy as you might have been led to believe by your TV (most people’s preferred source of Science™ facts). Hearing of “DNA tests”, “DNA crime database” and other everyday life DNA-related techniques might make it sound like sequencing is as easy as sending your saliva swab to the lab and waiting a couple days for the results. In reality, despite serious advances, actual full genome sequencing is still a multi-year, multi-million-dollar affair. When people talk about DNA in a forensics or medical context, they are usually looking at a single base nucleotide, located at a precise location on one gene, out of the entire genome. Even cases that require a larger sample of such observations (e.g. DNA matching, when it actually uses sequencing altogether) are still somewhere in the lower hundreds (if that). That’s a mere 100 bases to look at, against 100+ million for the first organism fully sequenced, 10 years ago (make that 3 billions for humans). Quite a difference in scale. And, of course, this is one of those problems where solving twice the size requires much more than twice the time (hopefully by now, this does not surprise you, otherwise you might want to go back and read episode 1 again).
OK, let’s start:
Two months and countless draft embryos after initially promising it, here is the first part of an unfathomably long rant describing my field of research. I honestly don’t expect anybody to subject themselves to that read, but at least now I have a place to send those who foolishly ask me about it at cocktail parties.
The short answer is that I do research in Bioinformatics, which is where Mathematics (along with Computer Science and a dozen other disciplines) meet with Biology and Genetics in a dark back-alley, and do all sorts of indescribable things to each other in the hope of: creating a better world, curing cancer, breeding the next race of eugenic übermenschen or making a few bucks for Big Pharma… whichever comes first.
But that sort of answer, while technically correct, does not really tell you why such an unnatural coupling of disciplines was warranted in the first place. Allow me to start at the beginning. Way at the beginning.
[open long semi-relevant digression that can be advantageously replaced by a thorough read on Complexity Theory, if you feel up for the more sciencey and truthy version of things]
The scientific problems of this world tend to fall in either of two categories: those you might eventually solve with a good computer and some time… and those you will never solve exactly, no matter how much crazy sci-fi supercomputing power you throw at them.
This “solvable” vs. “not solvable” demarcation might sound like a tautology, until you understand the full meaning of “never” in the above statement: these, are not problems that might be solved one day, when science progresses far enough or computers get ten, a hundred or a million times faster. These are problems whose solutions require calculation of a complexity that is proven to be beyond the reach of any conventional means of computation in any foreseeable future (“unconventional means” would begin with the discovery of heretofore unknown laws of Physics: in other words, unlikely in your lifetime. at best1Yea, yea, I know… “Quantum Computing”. Let’s wait and see when we get there, ok?).
By and large, the mathematical complexity of a problem, is the order of time (or computing power) it will take to solve it, relative to its size.
Without calculating the result of a certain task, it is often possible to predict whether producing this result could or could not be done in a reasonable amount of time (where “reasonable” usually means “in less than the age of the universe, assuming the use of every single computer on earth”, or somesuch).
There are countless examples of tasks falling in the first category, “easy” tasks that can be solved quickly, regardless of how big they are. For example, anybody past kindergarten age can presumably add two numbers of practically any size with a piece of paper and a pen. You just add each digit one by one (and, yes, carry the one) and adding two 100-digit numbers will take barely more time than adding two 3-digit numbers.
Now consider a different task: say you are a traveling salesman who needs to plan their next sales route. You have a map of the region, with the towns you must visit and all the distances between them, given in kilometers. How do you find the absolute shortest route that will take you to each city at least once without wasting gas or time?
More to the point: how difficult do you think finding that route will be?
Sure, it sounds easy enough: pick a starting point, follow every roads that go from that city to another one, then onto the next etc. Keep the shortest distance you’ve found. Can’t be that tough, right?
Let’s say there are five cities: you pick a city to start from, then check all remaining four, and from each four, go onto one of the remaining three etc. etc. In total, that’s 5x4x3x2x1 = 120 different paths to compare (that product can also be written using the factorial function: n! = n x (n-1) x … x 3 x 2 x 1. e.g. 5! = 5x4x3x2x1). Not so bad.
What if there are a few more cities… for instance, two times more: 10 cities. That’s 10! = 10x9x8x7x6x5x4x3x2x1 = 3,628,800 paths to look at. Huh, that might take a bit longer to do by hand. No worries: somebody will write a computer program that gives you the answer in a couple seconds.
Except that, you guessed it, each time I double the number of cities, the difficulty does way more than just double.
For 20 cities, the number of paths to look at is: 20! = 2,432,902,008,176,640,000.
For 70, cities, there are 70! (that’s factorial of 70: 70x69x68x…x3x2x1) possible paths to check one by one. That number has exactly 100 digits. This is (very) roughly the number of particles in the entire universe. Assuming you were to put every single computer in the world to work on this, you likely would not be done by the time the Sun explodes.
When my advisors (a combination of current and past ones) suggested that I get on the “2 weeks, 5 cities” tour, I was initially very excited.
As it turns out, however, they were not talking about an all-expenses paid tour of Asia and America’s best nightlife spots.
For mild entertainment and posterity value, a few frackload of random tidbits gleaned over the past 10 days and 25,000 miles (counting):
Boston is a nice city. Somewhat nicer than I imagined (was perhaps one of the only major US city I had never been in). At least in the middle of July, when the sun is warm and rain had apparently stopped pouring, just in time for my arrival there. But weather concerns apart, it feels like one of a rare breed of US cities, where you can live (fine) without a car. Which automatically puts it toward the top of my book. It also has lots of nice tree-lined avenues with cute little houses, and plenty of coffeeshops with semi-witty names and lovely US-style breakfasts (baaaacon…) that nearly make up for the filtered sock juice they call coffee…
Coincidentally, and with no bearing on the above statement of appreciation: Everybody in Boston is a 20-something upper-middle-class white person who only wears pastel polo shirts. Really: everybody. Even Asian people there are white. And they wear pastel polo shirts. On their way to one of the 259 Ivy League universities within walking distance of Fenway park.
I am told there are black people living in Boston too.
Tidbits of daily life in my very own secret underground volcano lair, otherwise known as Kyoto University’s Bioinformatics Center…
I just spent half a day learning how to handle radio-isotopic material, feed my hypothetical SPF testing mice and properly dispose of dichloromethane (hint: not by flushing it down the drain), all in Japanese.
… Which would all be very useful if, you know, I ever worked with anything else than my computer and a blackboard.
Sensei: Blah blah blah… international collaboration project… blah blah blah… grant submission accepted… blah blah blah… Five year budget.
Dave: Great. But, huh, how does it affect me?
Sensei: How would you feel about going to Berlin or Boston for three months? All expenses paid, of course.
Dave: Sure, what’s the schedule and which project would I be working on?
Sensei: Oh, it’s entirely up to you, just pick the faculty and project you’d like to work on and a time during this fiscal year you’d like to go. Where would you prefer: Berlin or Boston?
Dave: Dunno. Both are nice.
Does either include daily spa and massage, by any chance?Sensei: If you want, you can do one country this year and the other next year.
Dave: You don’t say.
Sensei: We’ll also send you to their workshop in Boston this Summer anyway.
Dave: Recession hasn’t hit our lab yet, has it…
The three people still reading this blog on a regular basis (two of which possibly paid by the Chinese government after some bizarre translation mix-up convinced them I was a dangerous political dissident to be monitored) might have noticed the lack of substantial news on this blog for quite a long time. OK: even less substantial content than usual.
I also realise that the lack of proper context as to my whereabouts made a lot of past blog entries somewhat puzzling. If this can make you feel any better, I am pretty sure that my own genitors have had only the faintest sense of my exact location, occupation or plans, ever since I was last sighted, putting a finishing touch to my grand World Domination Plot Master Thesis.
In fact, it took all that time for the plan set in motion nearly a year ago to finally reach its final stage (tonight).