Category Archives: Bioinformatics

Rosalind MMCH – Maximum Matchings and RNA Secondary Structures Solution

For some time now I’ve been working on the little problems on Rosalind. And I’ve been stuck on one quite easy one for ages.

And I’ve just solved it.  And the problem wasn’t to do with my code at all, it was to do with the differing versions of Python.

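To solve this problem you divide results from the built-in factorial function, and the outcome differs under Python 2.7 and 3. The factorials themselves are exact integers in both versions; the likely culprit is the / operator, which does integer division on ints in 2.7 but floating-point division in 3.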

I am using 3 and, evidently, the people running Rosalind are using 2.7.

So for instance, with this dataset:

>Rosalind_0258
ACCCUCAGCAAUUAUACAUUCGUGAACCUCCAAAUCUGAUGCGUGGCCGUUGGUACACUC
UCGCAAACGCUGUAGGACUCUCCAA

Python 2.7 gives the answer: 124917299331128613009585280018022400000000 which is accepted.

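And Python 3 gives the answer: 124917299331128612574021702489190831226880, which looked more accurate to me but is not accepted. In hindsight the odd trailing digits are probably floating-point rounding, since a product of factorials like this one ends in a run of zeros.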

So I’m just posting this here in case others who are stuck on this one can find the answer here.
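For anyone who wants to check this themselves, here is a minimal sketch of the counting, not my original submission. It assumes the usual MMCH approach (count the ways to pair the scarcer base of each complementary pair with the more common one) and assumes the Python 3 discrepancy comes from using / where // is needed. The function names are made up for illustration.

```python
from math import factorial


def pair_counts(rna):
    """Return (larger, smaller) counts for the A/U pair and the G/C pair."""
    au = sorted((rna.count("A"), rna.count("U")), reverse=True)
    gc = sorted((rna.count("G"), rna.count("C")), reverse=True)
    return au, gc


def max_matchings(rna):
    """Exact count using integer division (//), safe in Python 3."""
    (au_hi, au_lo), (gc_hi, gc_lo) = pair_counts(rna)
    au_ways = factorial(au_hi) // factorial(au_hi - au_lo)
    gc_ways = factorial(gc_hi) // factorial(gc_hi - gc_lo)
    return au_ways * gc_ways


def max_matchings_float(rna):
    """Same formula with true division (/), shown only to illustrate the pitfall."""
    (au_hi, au_lo), (gc_hi, gc_lo) = pair_counts(rna)
    au_ways = factorial(au_hi) / factorial(au_hi - au_lo)  # float in Python 3
    gc_ways = factorial(gc_hi) / factorial(gc_hi - gc_lo)  # float in Python 3
    return int(au_ways * gc_ways)


# The dataset from this post, with the FASTA header line removed.
SEQUENCE = (
    "ACCCUCAGCAAUUAUACAUUCGUGAACCUCCAAAUCUGAUGCGUGGCCGUUGGUACACUC"
    "UCGCAAACGCUGUAGGACUCUCCAA"
)

if __name__ == "__main__":
    print(max_matchings(SEQUENCE))        # exact integer arithmetic
    print(max_matchings_float(SEQUENCE))  # rounded through a float
```

With // everything stays in exact integer arithmetic, so the result should not depend on the Python version.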

Week 10 of Bioinformatics and Course Finished!

Man, I’m exhausted. It’s 10:15pm on a Monday night and I’ve just finished the Bioinformatics Algorithms 1 course offered by UC San Diego!  Here is a link to the course FYI.

Wow, 10 weeks, and around 70 programming assignments later, I reckon I’ve learned heaps.  Heaps about programming in Python, heaps about biology, heaps about algorithms and gained some confidence in myself!

Years ago I used to program at Air NZ, The Optima Corporation, Compaq Sorting and a few other places.  I used all the scientific/engineering tricks I learned at university and picked up many others on those jobs.

And then I got into property in 1996 and have only done bits and pieces of programming since then.  So it was really good to pick this stuff up again and give it a whirl, as they say.  It’s a different arena, it’s Python rather than C++, VB, Fortran, and it’s in the biology sector as opposed to things like airlines.

But there are a lot of crossover areas/skills etc.  So after a very shaky start and many “I want to throw the monitor through the window” moments, I managed to pull through.  Actually week 10 was easier than week 9 because the material was presented more fully by the tutors.  So it was easier to sort it out.

Going to bed now!

9 Weeks of Bioinformatics Down

Sheesh, this week was a big week!

We covered suffix trees and suffix arrays.  These are data structures (a tree and a sorted array of suffix positions, respectively) that are useful when looking for patterns in collections of strings.
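To give a feel for the idea, here is a rough sketch of a suffix array and a pattern search over it. It is not the course's code: the construction is the naive "sort all the suffixes" version, which is fine for short strings but far too slow for real genomes, and the function names are my own.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return every position where pattern occurs, using two binary searches."""
    k = len(pattern)

    def prefix(rank):
        # First k characters of the suffix at this rank in the sorted order.
        start = suffix_array[rank]
        return text[start:start + k]

    # Lower bound: first suffix whose k-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose k-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

All suffixes that start with the pattern sit next to each other in the sorted order, which is why two binary searches are enough to find them.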

I found the material quite challenging.  Not many details were given by the instructors for this week’s material, which meant a lot of it had to be worked out by finding stuff on the net.

I actually got the code challenges finished 3 days early (usually I’m pushing it right to the wire, as is the way with these things) so I’m going to start on the next week’s assignments straight away.

This will be the last week, week 10.  And it’s on something called the Burrows-Wheeler Transform…

For anyone wanting some sort of idea of what bioinformatics is, see the video on this page: https://www.coursera.org/course/bioinfomethods1.  It gives quite a good description.

1 week to go, wahoo!

7 Weeks of Bioinformatics Down

Well, bioinformatics continues to be interesting and challenging.

The last couple of weeks have been on sequence alignment.  Biologists want to align sequences of text (made up from the alphabet of ACGT or the protein alphabet, which is similar to the normal one we use minus a few characters) for many reasons.

They can discover all sorts of things by aligning sequences, like how various functions work, how bits of DNA replicate and so on.

Some bits of sequences might look like this:

 

YAFDLGYTCMFPVLLGGGELHIVQKETYTAPDEIAHYIKEHGITYIKLTPSLFHTIVNTAASFAFDANSL
-AFDVSAGDFARALLTGGQLIVCPNEVKMDPASLYAIIKKYDITIFEATPALVIPLMEYIYEQKLDISQL
IAFDASSWEIYAPLLNGGTVVCIDYYTTIDIKALEAVFKQHHIRGAMLHPALLKQCLVSAPTM---ISSL

 

The values in red have been lined up.  And this can be quite tricky to do but very useful as I said above.

So we’ve been learning various alignment techniques, which rely pretty heavily on graph theory and dynamic programming.

I’ve never really understood dynamic programming before so it was good to do quite a lot with it and get a decent feel for it.  It’s a really powerful technique.
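As an illustration of the dynamic programming idea, here is a small sketch of global alignment of two strings. It is not the course's code, and the scoring values (match +1, mismatch -1, gap -1) are assumptions picked for simplicity; real protein alignments use a scoring matrix instead.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for lining up a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, top, bottom = global_align("ACGT", "AGT")
    print(best)
    print(top)
    print(bottom)
```

The table is filled in once, cell by cell, and each cell reuses the answers for shorter prefixes. That reuse is what makes it dynamic programming rather than brute force.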

Oh well, onto week 8, only 3 or so more weeks left I think, the course finishes on 27th Jan.

 

Bioinformatics 5 Weeks Down

This was a pretty tough week.  Interesting but tricky algorithms to implement.

I had another case of an algorithm that was running really slowly.  In fact I started it before I went to bed and it was still running when I got up in the morning.

So I rewrote it, and it ran in like 20 seconds!  Just goes to show, the choice of data structures and algorithms is soooo important.

Now onto week 6!

Bioinformatics, 4 Weeks Down

So 4 weeks and 26 programming assignments completed now.

Gotta say I’m enjoying it immensely.

It’s quite hard but I seem to eventually muddle my way through the assignments.  Sometimes with a bit of help from others posting on the discussion forums.

Had a fun time optimising one of my programs a couple of days ago.  This is what I posted on the Coursera forum for this course:

By the way, what are people’s runtimes? I spent some time optimising yesterday and got my run time down from 2 mins to under 1 second lol.  I was using a list to store nodes and traversing that list to find the node in the list of 2000 or so with label ‘234’ or whatever. 

So then it occurred to me that I could have an indexing array.  With the node label pointing to an index in the node list.  And then I slapped my forehead, said “Doh!”, not for the first time in this course.  This is of course exactly what a dict is, so instead of having a node list, I have a node dict! 
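A tiny sketch of the difference, with made-up node data (the "label" field and the function names are just for illustration, not my actual assignment code):

```python
def find_node_in_list(nodes, label):
    """Linear scan: looks at every node until the label matches."""
    for node in nodes:
        if node["label"] == label:
            return node
    return None


def build_node_index(nodes):
    """Build a dict keyed by label so each lookup is a single hash probe."""
    return {node["label"]: node for node in nodes}


# Hypothetical graph with 2000 nodes labelled "0" to "1999".
nodes = [{"label": str(i), "edges": []} for i in range(2000)]
node_by_label = build_node_index(nodes)

if __name__ == "__main__":
    slow = find_node_in_list(nodes, "234")  # walks the list
    fast = node_by_label["234"]             # direct lookup
    print(slow is fast)
```

The list scan costs time proportional to the number of nodes on every lookup, while the dict lookup stays roughly constant, which adds up quickly when the lookup sits inside a loop.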

And another thing: profiling.  It’s absolutely wonderful in Python.  Took me about 15 mins to grab some code off the net and add it to my program.  It shows where all the time is spent and the number of calls to each function, AND it runs in near real time, i.e. it didn’t noticeably slow down my program!  All built in too: http://docs.python.org/2/library/profile.html
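Something along these lines is enough to get going. This is a sketch using the standard library's cProfile and pstats modules, not the exact snippet I grabbed; the profile_call wrapper and the sum_of_squares workload are hypothetical names for the example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    # Collect the report in a string instead of printing it straight away.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    return result, buffer.getvalue()


def sum_of_squares(n):
    """Stand-in workload; replace with the function you want to measure."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(sum_of_squares, 100000)
    print(value)
    print(report)
```

The report lists call counts and time per function, which is usually enough to spot the one loop that is eating all the time.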

So onto week 5 now and I’ve done 2 out of the 5 so far but only a few days left for the other 3…  I’ll get them done somehow.

This course is considerably better than many I paid fees for at university.  And this course is free!