Posted by mady | Posted in Role of software Engineers and Technology in Biotechnology | Posted on 1:30 AM
critical for studying biology as an informational science. Curiously,
biology is the only science that, at its very heart, employs a digital
language. The grand challenge in biology is to determine how the
digital language of the chromosomes is converted into the 3-D and 4-D
(time-varying) languages of living organisms.
Need for software automation
DNA encodes the information necessary for building and maintaining
life. DNA is a non-branching, double-stranded macromolecule in which
the nucleotide building blocks (A, C, G, T) are linked in a chain, and
the bases on the two strands are paired as A-T and C-G. Small viral
genomes of the order of several thousand bases were the first to be
sequenced, in the 1970s. A few years later, genomes of the order of
40 kilobase pairs represented the limit of what could reasonably be
sequenced. At this stage, the need for automation was recognized, and
automated methods were applied wherever possible. By 1997, the
12-megabase-pair yeast genome had been completed, and in 1998 the
completion of the 100-megabase-pair nematode genome project was
announced. Most recently, the 180-megabase-pair fruit-fly genome was
also completed. All of these projects relied on substantially higher
levels of software automation. We are now in the midst of the most
ambitious project so far: sequencing the 3-gigabase-pair human genome.
For this effort, and those yet to come, software automation lies at
the very core of planning and executing the project.
The need for automation is driven largely by the trend toward handling
ever larger stretches of DNA and the corresponding increase in the
amount of raw data this entails. Mathematical analysis indicates that
the size of a sequencing project is roughly proportional to the size of
the genome, because the amount of information obtained from an
individual sequencing experiment is relatively constant and independent
of genome size. It is estimated that on the order of 10^8 individual
experiments are required to cover the human genome. To meet the
projected goals, modern large-scale sequencing centers have developed
throughput capacities of several million experiments per month, with
data processed on a continuous basis. Managing such large projects
without a high degree of automation would clearly be impossible in
terms of cost and time.
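As a rough illustration of why project size scales with genome size,
the sketch below treats each experiment as one sequencing read of fixed
length and asks how many reads are needed to cover a genome several
times over. The read length (500 bases) and the 10-fold coverage are
assumptions chosen for illustration, not figures from this post, and
`reads_needed` is a hypothetical helper.

```python
# Back-of-envelope read count for a shotgun sequencing project.
# READ_LENGTH and COVERAGE are illustrative assumptions only.

def reads_needed(genome_size_bp, read_length_bp, coverage):
    """Reads required so that reads * read_length = coverage * genome_size."""
    return coverage * genome_size_bp / read_length_bp


READ_LENGTH = 500  # assumed bases obtained per experiment (roughly constant)
COVERAGE = 10      # assumed number of times each base is sequenced

GENOMES = [("yeast", 12e6), ("nematode", 100e6),
           ("fruit fly", 180e6), ("human", 3e9)]

for name, size in GENOMES:
    print(f"{name:>9}: {reads_needed(size, READ_LENGTH, COVERAGE):.1e} reads")
```

With these assumed values the human genome needs about 6 x 10^7 reads,
in line with the order-of-10^8 estimate above, and doubling the genome
size doubles the number of reads because the information per read does
not change.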
DNA, then, is the basic genetic material: it transmits hereditary
characters from one generation to the next. During protein synthesis,
mRNA molecules, which act as messengers carrying the exact genetic
code, are built from DNA, and proteins are then synthesized from these
mRNA molecules. Protein interactions give rise to information pathways
and networks that help build cells identical to their parent cells.
Many cells organized in a defined arrangement form a tissue, an organ
is a combination of tissues, and an organism is an organization of
organs. Refer to figure 3.1.
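The DNA-to-mRNA-to-protein flow described above can be mimicked with
simple string operations. This is a minimal sketch, assuming the input
is the coding (sense) strand, that translation starts at the first base,
and that there is no splicing; `transcribe` and `translate` are
hypothetical names used only for illustration. The codon table itself
is the standard genetic code.

```python
# Standard genetic code: codons enumerated with bases U, C, A, G at each
# of the three positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG
```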
The challenge for computer professionals is to create tools that can
capture and integrate these different levels of biological
information.
Basic Bioinformatics Algorithms
Computers can only execute algorithms. Hence, when we talk of using
computers to process biological information, we have to define
precise mathematical algorithms. Following are a few of the most
basic algorithms in bioinformatics.
1. Database Searching
Database interrogation can take the form of text queries (e.g.,
display all the human adrenergic receptors) or sequence similarity
searches (e.g., given the sequence of a human adrenergic receptor,
display all similar sequences in the database). Sequence similarity
searches are a natural way to query these databases because the data
they hold are mostly in the form of sequences.
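Both kinds of query can be sketched over a toy in-memory database. The
similarity search below simply ranks entries by how many short words
(k-mers) they share with the query, loosely inspired by the
word-matching (seeding) step of heuristic tools such as BLAST; it is
not BLAST. The records, sequences, and the helpers `text_query`,
`kmers`, and `similarity_search` are all made up for illustration.

```python
# Toy in-memory "database": every name and sequence here is made up.
DATABASE = {
    "seqA receptor-like": "MKTLLVAGLLAAG",
    "seqB receptor-like": "MKTLLVSGLLAAG",
    "seqC unrelated": "PPRWQEHHDDCCN",
}


def text_query(db, keyword):
    """Text query: return entry names whose description contains keyword."""
    return [name for name in db if keyword.lower() in name.lower()]


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_search(db, query, k=3):
    """Rank entries by the number of k-mers they share with the query."""
    query_kmers = kmers(query, k)
    hits = [(name, len(query_kmers & kmers(seq, k))) for name, seq in db.items()]
    hits = [hit for hit in hits if hit[1] > 0]  # drop entries with no shared words
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


print(text_query(DATABASE, "receptor"))
print(similarity_search(DATABASE, "MKTLLVAGLLAAG"))
```

Here the query is identical to seqA (11 shared 3-mers), seqB differs by
one residue (8 shared 3-mers), and seqC shares none, so it is omitted.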
2. Comparing Two Sequences
Let us take the case of comparing two protein sequences. The alphabet
size is 20, since a protein is a sequence of amino acids and there are
20 possible amino acids. The naïve approach is to line up the
sequences against each other and insert gap characters to bring the
two strings into vertical alignment. The more positions that match,
the closer the two sequences are.
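Once two sequences have been lined up with gap characters, judging
their closeness comes down to tallying the columns. The sketch below
assumes "-" is the gap symbol and reports identity as matches divided
by alignment length, which is only one of several conventions in use;
`compare_aligned` is a hypothetical helper and the two sequences are
made up.

```python
def compare_aligned(a, b, gap="-"):
    """Tally the columns of two already-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            gaps += 1          # a residue set against a gap
        elif x == y:
            matches += 1       # identical residues in this column
        else:
            mismatches += 1    # different residues in this column
    return {"matches": matches, "mismatches": mismatches, "gaps": gaps,
            "identity": matches / len(a)}


print(compare_aligned("MKTAYIAK-QR", "MKT-YLAKEQR"))
```

For this pair there are 8 matches, 1 mismatch (I against L), and 2 gap
columns, giving an identity of 8/11, or about 0.73.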
The quality of an alignment can be measured in terms of the number of
gaps introduced and the number of mismatches remaining. A metric
combining these quantities represents the distance between two
sequences, usually taken as the minimum over all possible alignments.
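A standard way to find the cheapest alignment is dynamic programming
over prefixes of the two sequences, which gives the distance form of
global (Needleman-Wunsch-style) alignment. The sketch below assumes the
simplest metric, where every gap column and every mismatch costs 1
(the edit, or Levenshtein, distance); practical protein comparison
instead uses substitution matrices such as BLOSUM and separate
gap-opening and gap-extension penalties. `alignment_distance` is a
hypothetical name.

```python
def alignment_distance(s, t, mismatch=1, gap=1):
    """Return (distance, aligned_s, aligned_t) for the cheapest global alignment.

    Cost = mismatch * (mismatched columns) + gap * (gap columns).
    """
    n, m = len(s), len(t)
    # d[i][j] holds the lowest cost of aligning the prefixes s[:i] and t[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap

    def sub_cost(i, j):
        return 0 if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + sub_cost(i, j),  # pair s[i-1] with t[j-1]
                          d[i - 1][j] + gap,                  # s[i-1] against a gap
                          d[i][j - 1] + gap)                  # t[j-1] against a gap

    # Trace back from d[n][m] to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub_cost(i, j):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return d[n][m], "".join(reversed(a)), "".join(reversed(b))


dist, top, bottom = alignment_distance("MKTAYIAKQR", "MKTYLAKEQR")
print(dist)
print(top)
print(bottom)
```

For these two made-up sequences the distance is 3 (two gaps and one
mismatch). When several alignments tie, the traceback returns just one
of them; the table itself takes time proportional to the product of the
two sequence lengths.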
3. Multiple Sequence Alignment
In the previous subsection, we saw pairwise sequence alignment, which
is fundamental to sequence analysis. However, analyzing groups of
sequences that form gene families requires making connections between
more than two members of the group in order to reveal subtle conserved
family characteristics. The goal of multiple sequence alignment is to
generate a concise, information-rich summary of sequence data that
helps decide whether sequences belong to a gene family.
A multiple sequence alignment is a 2D table in which the rows
represent individual sequences and the columns represent residue positions.
The sequences are laid onto this grid in such a manner that (a) the
relative positioning of residues within any one sequence is preserved,
and (b) similar residues in all the sequences are brought into
vertical register.
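The grid description above can be made concrete by storing an
alignment as equal-length gapped strings. The sketch below, assuming
"-" as the gap symbol, checks property (a) by confirming that each row
with its gaps removed is the original sequence, and summarizes the
columns with a simple majority-vote consensus, which is only one of
many possible summaries. `check_msa`, `consensus`, and the three
sequences are made up for illustration.

```python
from collections import Counter


def check_msa(rows, originals, gap="-"):
    """True if all rows share one width and each row minus gaps is its original."""
    same_width = len({len(row) for row in rows}) == 1
    order_kept = all(row.replace(gap, "") == orig
                     for row, orig in zip(rows, originals))
    return same_width and order_kept and len(rows) == len(originals)


def consensus(rows):
    """Most frequent symbol (gaps included) in each column of the alignment."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*rows))


ORIGINALS = ["MKTAYIAKQR", "MKTYLAKEQR", "MKTAYLAKQR"]
ROWS = ["MKTAYIAK-QR",
        "MKT-YLAKEQR",
        "MKTAYLAK-QR"]

print(check_msa(ROWS, ORIGINALS))  # True
print(consensus(ROWS))             # MKTAYLAK-QR
```

If a column has a tie, `Counter.most_common` keeps the symbol seen
first, so a real tool would need an explicit tie-breaking rule.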