Bioinformatic Course – day 2

Today the frontal lectures were no more exciting than yesterday’s. In the first part of the morning we talked about alignments. And by “talked” I mean: we scrolled through a long list of programs that you can use to do alignments, with some details on what each one is best for.

Here they come:

  • Classic solutions:
    • Smith-Waterman (local alignment)
    • Needleman-Wunsch (global alignment)
  • Heuristic solutions:
    • BLAST
    • FASTA
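To make the local vs global distinction concrete, here is a minimal sketch of the two classic dynamic-programming scores. It is not taken from the lecture: the function name `alignment_score` and the toy scoring scheme (match +1, mismatch -1, gap -1) are my own assumptions, and it only returns the score, not the alignment itself.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Score of the best alignment of a and b (toy scoring, assumed values).

    local=True  -> Smith-Waterman style: cells never go below 0 and the
                   answer is the best cell anywhere in the table.
    local=False -> Needleman-Wunsch style: end gaps are paid for and the
                   answer is the bottom-right cell.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps cost something.
        for i in range(rows):
            score[i][0] = i * gap
        for j in range(cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


query = "TTACGTACGTAA"
subject = "GGGGACGTACGTCCCC"
local_score = alignment_score(query, subject, local=True)
global_score = alignment_score(query, subject, local=False)
```

With these two toy sequences the local score only rewards the shared ACGTACGT stretch, while the global score also has to pay for the unrelated flanks.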

What I got best is how BLAST works. Again, all these algorithms always want the same things: an input, a database to search and a cut-off value. BLAST chops both your query and the database into pieces (called words) and builds a lookup table out of them. Then it looks for matches between all these little fragments that exceed a certain score. These matches are the seeds for a collection of high-scoring segment pairs (HSPs). From each seed, it walks in both directions to see how far the sequence match goes, and stops when the running score drops too far below the best score reached so far.

[Figure: How does BLAST work]
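The seed-and-extend idea can be sketched in a few lines. This is a toy version under loud assumptions, not BLAST itself: exact word matches only, ungapped extension, a made-up scoring scheme, and a stopping rule where extension gives up once the running score falls more than `X_DROP` below the best score seen. All names (`index_words`, `extend`, `find_hsps`) and constants are hypothetical.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # assumed toy scores
WORD_SIZE = 3            # assumed word length
X_DROP = 2               # assumed drop-off before extension gives up


def index_words(seq, k=WORD_SIZE):
    """Lookup table: word -> list of start positions in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def walk(query, subject, q, s, step, best):
    """Extend in one direction; return (best score, residues kept)."""
    score, kept, moved = best, 0, 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += MATCH if query[q] == subject[s] else MISMATCH
        moved += 1
        if score > best:
            best, kept = score, moved
        elif best - score > X_DROP:
            break
        q += step
        s += step
    return best, kept


def extend(query, subject, qi, si, k=WORD_SIZE):
    """Ungapped extension of an exact seed in both directions."""
    best, right = walk(query, subject, qi + k, si + k, +1, k * MATCH)
    best, left = walk(query, subject, qi - 1, si - 1, -1, best)
    # (score, query start, subject start, length)
    return best, qi - left, si - left, left + k + right


def find_hsps(query, subject, min_score=5):
    """Seed with shared words, extend each seed, keep the good ones."""
    table = index_words(subject)
    hsps = set()
    for qi in range(len(query) - WORD_SIZE + 1):
        for si in table.get(query[qi:qi + WORD_SIZE], []):
            hsp = extend(query, subject, qi, si)
            if hsp[0] >= min_score:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


hsps = find_hsps("TTACGTACGTAA", "GGGGACGTACGTCCCC")
```

Here the best hit in `hsps` covers the shared ACGTACGT stretch. Several different seeds land on that same stretch, which is why the results are collected in a set.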

The results are a list of hits ranked by their bit score and their E-value (expectation value). Btw, I finally got what the hell an E-value means.

The E-value indicates how many alignments with a score at least as good as this one you would expect to find purely by chance when searching a database of that size. The smaller, the better. OBS: the E-value depends on the size of the search space (query length times database size), so the same alignment gets a different E-value from database to database!
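For the record, the standard Karlin-Altschul formula behind this is E = K · m · n · e^(-λS), where S is the raw score, m the query length and n the database length. Below is a small sketch of it. The λ and K values are illustrative placeholders I picked for the example (in practice they depend on the scoring matrix and gap penalties), and the function names are my own.

```python
import math


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalised score, comparable across scoring systems."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative placeholder parameters, not taken from any real search.
LAM, K = 0.318, 0.134

small_db = e_value(raw_score=60, query_len=250, db_len=1_000_000, lam=LAM, k=K)
big_db = e_value(raw_score=60, query_len=250, db_len=100_000_000, lam=LAM, k=K)
bits = bit_score(60, LAM, K)
```

The same raw score gives an E-value 100 times larger in the database that is 100 times bigger, while the bit score does not change.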

The second part of the lecture was about the “fine art of Phylogeny”, as the teacher put it. Which I kinda liked.

I relaxed from the very beginning of the talk, as the teacher made me notice that phylogeny is all about computation. So, let the computer scientists talk.

Here again, all you need is an input (alignments, or even descriptions such as fossil evidence) and a computational method. The result will be a different binary tree depending on the method chosen.

Binary trees are trees where exactly 2 branches come out of each internal node. Each branch has a length. That length basically represents the amount of evolution that has happened between two nodes.

Apparently, the definition “amount of evolution” is the best way to picture the complexity of phylogeny: that length is in fact a combination of the evolution rate (which depends on the generation frequency, for example) and time.

amount of evolution on a phylogenetic tree = evolution rate × time
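A tiny sketch of a binary tree with branch lengths, just to fix the idea. The `Node` class, the helper names and the rates are all made up for illustration (hypothetical lineages A and B, arbitrary units), not something from the course.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    length: float = 0.0            # length of the branch leading to this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def branch_length(rate, time):
    """Amount of evolution = evolution rate x time."""
    return rate * time


def root_to_tip(node, so_far=0.0):
    """Total amount of evolution from the root to every leaf."""
    total = so_far + node.length
    if node.left is None and node.right is None:
        return {node.name: total}
    distances = {}
    for child in (node.left, node.right):
        if child is not None:
            distances.update(root_to_tip(child, total))
    return distances


# Two contemporary lineages: same time since the split, different rates.
tree = Node(
    "root",
    left=Node("A", branch_length(rate=0.02, time=10)),
    right=Node("B", branch_length(rate=0.005, time=10)),
)
distances = root_to_tip(tree)
```

Both tips are equally old, so any difference between the two values in `distances` comes from the rate alone.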

<biological-meaning spoiler alert>

We can actually draw biologically meaningful conclusions from well-designed trees! If all the organisms (or genes or proteins) we are looking at are contemporary (that is, they are alive today), the time component is the same for all of them, so the reason why some branches are longer than others may be a difference in evolution rate. Longer branches are then expected in quickly proliferating organisms (Arabidopsis among plants, or bacteria), because with short generation times the chances of mutation and fixation of a trait are higher. Organisms with long generation times accumulate fewer changes over the same time and sit on shorter branches. Let’s say: they are more “stuck in time”. Cool!

</biological-meaning spoiler alert>

The two best ways to build a phylogenetic tree are:

  • Maximum likelihood method (ML)
  • Markov Chain Monte Carlo method (MCMC)

Though I appreciate that the teacher tried to explain the concept behind these two mathematical methods, I’m afraid I got lost.

What I retained is the following: the MCMC method is a good one because it creates trees in an “iterative fashion”: it starts by creating two trees and asks: which of these two is best? Which of these two is more probable? Then it discards one and continues.
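The “compare two, keep one, repeat” loop can be sketched with the Metropolis rule, which as far as I understand is the usual core of MCMC. Big assumptions here: the states are just the integers 0 to 4 standing in for candidate trees, the likelihood table is invented, and a worse candidate is not always discarded but kept with probability equal to the likelihood ratio. All names are hypothetical.

```python
import random
from collections import Counter

# Invented likelihoods for five stand-in "trees" (state 2 is the best one).
TOY_LIKELIHOOD = {0: 1.0, 1: 2.0, 2: 8.0, 3: 2.0, 4: 1.0}


def toy_likelihood(state):
    return TOY_LIKELIHOOD[state]


def toy_propose(state, rng):
    """Propose a neighbouring state (symmetric move on a ring)."""
    return (state + rng.choice((-1, 1))) % len(TOY_LIKELIHOOD)


def metropolis(likelihood, propose, start, steps, rng):
    """Count how often each state is visited by the chain."""
    current = start
    visits = Counter()
    for _ in range(steps):
        candidate = propose(current, rng)
        ratio = likelihood(candidate) / likelihood(current)
        # Better candidates are always kept; worse ones only sometimes.
        if rng.random() < ratio:
            current = candidate
        visits[current] += 1
    return visits


visits = metropolis(toy_likelihood, toy_propose, start=0,
                    steps=20000, rng=random.Random(0))
```

After enough steps the chain spends most of its time on the most probable state, and the visit frequencies are what would be read as probabilities.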

Apparently, every (good) tree generated with the ML or MCMC method has a number at each node that indicates the probability that that specific branch is placed that way.

[Figure: MCMC-generated tree]

To conclude, today’s laboratory session was very good! I hope I’ll have time to write about it soon. We worked with different sequence alignment methods and with ChIP-seq data used to identify motifs (using both MEME Suite and CisFinder). We used BLAST from NCBI to get an idea of what our mysterious sequence may actually be, and at last the amazing HMMER, which I am in love with now.

Here’s what happened on day 1
