A novel programming approach for DNA computing.
DNA computing was introduced by Adleman in 1994 who has explained how to use biological tools to solve NP-Complete problems [1, 8]. DNA computing or more generally, molecular computing having research and development in this area concerns theory, experiments and applications. DNA computing is fundamentally similar to parallel computing in that it takes advantage of the many different molecules of DNA to try many different possibilities at once. DNA consists of polymer chains usually referred to as DNA strands. A chain is composed of nucleotides, also referred to as bases. The four DNA nucleotides or bases are Adenine(A), Guanine(G), Cytosine(C) and Thymine(T). The double helix DNA consists of two strands binding with each other. The binding of two strands occurs by attraction of the bases with each other. Normally A binds with T and C binds with G called complementary base pairs, also referred to a Watson-Crick complementary. An example of a double stranded DNA is as follows
5 ' T A C A G A C T C A 3 ' 3 ' A T G T C T G A G T 5 '
In 1994  Adleman successfully solved the Direct Hamiltonian Path problem HPP (which is an NP-complete problem) that opens a new area, called DNA computation. The research in DNA computation developed a number of molecular operations as found in . Several models have been proposed for DNA operations like Lengthening of DNA strands, Shortening of DNA strands, Multiplication of DNA strands, Modification of nucleotides in DNA, Pasting of DNA strands and Separation of DNA strands . Here, we present algorithms that implements some of the above mentioned operations in Java using threads.
The disadvantage of DNA computing is that until now, molecular computation has been used with 'exact' and 'brute-force' algorithms. It is necessary for DNA computation to expand its algorithmic techniques, incorporate approximative and probabilistic algorithms and heuristics, so that the solution of large instances of NP complete problems will be possible. Without algorithms, DNA computing has linear time, solving NP-complete problems but with exponential space. The proposed work is motivated from the following:
Martyn Amos's thesis is the first Ph.D dissertation in the area of DNA computation
. In his thesis, Martyn says
"It could one day lead to the development of hybrid computer system, in which a silicon based PC generates the code for automated laboratory based operations, carried out in a 'miniature lab in a box' linked to the PC".
Reif describes in his PAM model  that,
"Although our results are of theoritical consequence only, due to the large amount of molecular paralellism required, we believe that our theoritical models and results may be a basis for more practical later work".
We propose to construct a unconventional system using conventional computer based on the DNA computing. Until now the main drawbacks of DNA computing are its error rate, time and cost consuming, due to its manual processing. All the above said problems can be solved by computer based algorithms. Efforts were made to implement biological reactions . This proposed work implements the basic DNA operations needed for any specialized research.
In DNA computing, it is very important to know that instability of DNA strands can cause undesirable reactions. When facing a problem of any kind, one of the most important thing to do is assuring that the data, the problem will use is stable. The input data for DNA computing must be encoded into DNA strand. Many conditions can cause loss of DNA bases or strand breakage, but due to Watson-crick complementarity's, parts of single DNA strands can bind together forming a double stranded DNA sequence. The operation is essential for the process of selection of the fittest genes, and we have to take into account that they are user generated. In our proposed algorithms, each gene has a different amount of A,T,C and G bases, that are used to identify each gene , which are all user generated.
The paper is organized in the following order. Section 2 discuss about the computing paradigms of DNA. The section 3 contains the implementation details of the DNA operations with its complexities being analyzed. Section 4 details on the platform details and its functions, Section 5 discusses the applications of the proposed work and finally followed by the conclusion and future work.
The research in DNA computation develops a number of theoretical models and gives DNA solutions to mathematical problems. DNA Computation have been studied both in theoretical analysis and lab based solutions. Two extremes are possible in describing the tool box of techniques for manipulating DNA; a dictionary - like listing of techniques with a short definitional description of each of them or a detailed description of each technique . We have chosen for a middle ground, that gives simplified descriptions. The advantages of DNA based computation methods are its parallel computing, light weight, low power and fast computation . The operations mentioned below are implemented and complexity is analyzed.
1. Separating DNA strands: The hydrogen bonding between the complementary bases is weaker than the phosphodiester bond between consecutive nucleotides within a strand. This allows us to separate the two strands of a DNA molecule without, breaking the strands. One way of doing this is to heat a DNA solution until the DNA melts, meaning that the two strands come apart, called denaturation. Melting temperature is from 85C to 90C, melting temperature of a DNA molecule is the temperature at which, half of the molecule separates.
2. Fusing DNA Strands: If the heated solution is cooled down, the separated strands fuse within the hydrogen bonds (this cooling down must be done slowly so that the corresponding complementary bases have enough time to find each other), called renaturation. Fusing two single stranded DNA molecules by complementary base pairing is also called annealing. Another term used for fusing is hybridization, although originally it was used for describing the complementary base pairing of single strands of different origin. The denaturation of a double stranded molecule can be also facilitated by exposing it to certain chemicals. A commonly used chemical for this purpose is formamide -the melting temperature in the presence of for-mamide is much lower .
3. Multiplying DNA: One of the central problem of genetic engineering is the amplification of available "small" amount of a specific fragment of DNA. The problem is especially acute if the small amount of the known fragment is lost
in a huge amount of other pieces. The Polymerase Chain Reaction (PCR), is incredibly sensitive and efficient; one can produce millions of copies of a desired DNA molecule within a short period, even begins with only one strand of the sequence. The application of PCR are really enormous; they include areas such as genetic engineering, forensic analysis, genome analysis, archeology, paleontology and clinical diagnosis. PCR has three phases namely Denaturation, Priming and Extension .
4. Cutting DNA: Endonucleases destroy internal phosphodiester bonds in the DNA molecule.
They can be quite specialized as to what they cut, where they cut and how they cut. Restriction endonuclease are much more specific, they cut only double stranded molecules, and more over only at a specific sites. The cut itself can be blunt or staggered.
5. Pasting DNA: DNA molecules may be linked together through a process called ligation which is mediated by enzymes called Ligases. This can be done in several ways. While the hydrogen bonds keep complementary sticky ends together, there is a gap in each of the strands, called a nick. A nick is a lack of a phosphodiester bond between consecutive nucleotides. Such a bond can be established by a ligase. In the blunt end, a DNA ligase will join together the 3' end and the 5' end of one molecule with the 5' end and the 3' end, respectively, of another molecule. Although the term "ligation" means technically just the sealing of a nick, it is often used also to describe the combined process of the annealing of sticky ends and then ligating the nicks.
The above mentioned operations, when treated theoretically it assumes to solve many problems but only few problems are implemented practically. Many theoretical models are found in the literature of DNA computation [2, 12]. In DNA computation, for all the operations the DNA molecules are working par-allely. So, it is possible only if a language supports for parallel processing. Such a language we have selected is Java and implemented the concept by using Java Threads. Each individual operation in the DNA based computers can take minutes or even hours to perform. The cost of the computational step, when compared to that of supercomputers capable of executing a trillion operations per second, looks unimpressive. However, the real power of DNA computers lies in their inherent parallelism. Each operation is performed not on single DNA strand but on every strand simultaneously . Here it is implemented on every nucleotides simultaneously.
Since DNA computation is parallel-like computation, the theoretical foundation is essentially needed. Once a theoretical model is introduced, the next step is to implement practically. In this paper, we propose algorithms for the DNA operations mentioned in the previous section and their time complexities are also analyzed.
Separating DNA strands
Algorithm 1: Separation of DNA Strand Input: double stranded DNA strand Output: two separated single stranded DNA strands begin let s1 [left arrow] dnaStrand ; let m - numberOfThreads ; let n [left arrow] length(s1) ; foreach i [left arrow] 1 to m do break s1 into two single strands ; end end
The nucleotides of the given double stranded DNA strands are separated into single strands parallely depending on the given m threads, giving us two single DNA strands. length() is a function which returns the number of nucleotides in s1. The total number of nucleotides are divided equally among the number of threads and been processed parallely. Let us consider a double stranded DNA with 10 nucleotides and two threads, where the first thread operates on the first half of the strand and at the same time the second thread operates on the second half of the strand and the result is as shown in the Figure 1.
[FIGURE 1 OMITTED]
Fusing DNA Strands
The given DNA single strands s1 and s2 are checked for compatibility of nucleotides parallely using threads and if found compatible they are joined together to get a fused DNA strand, which is shown in the Figure 2.
[FIGURE 2 OMITTED]
Algorithm 2: Fusing of DNA Strands Input: single stranded DNA strands Output: fused DNA strand begin let s1 [left arrow] dnaStrand1 ; let m [left arrow] numberOfThreads ; let n [left arrow] length(s1) ; let s2 [left arrow] dnaStrand2 ; foreach i [left arrow] 1 to m do if compatible nucleotides then fuse s1 and s2 ; end end end
Multiplication of DNA Strand
Algorithm 3: Multiplication of DNA Strand Input: double stranded DNA strand , primer Output: multiplied DNA strands begin let s1 [left arrow] dnaStrand ; let m [left arrow] numberOfThreads ; let n [left arrow] length(s1) ; foreach i [left arrow] 1 to m do call SeperationOfDNA() ; do priming process do extend process ; end end
The nucleotides in the given double stranded DNA s1, is split into two single strands parallely, then priming is done, after which extend process is applied on each single strand to obtain two double stranded DNA.
Cutting of DNA Strands
Algorithm 4: Cutting DNA Strand Input: double stranded DNA strand, nucleotidepairToCut Output: multiple number of cut DNA strands begin let s1 [left arrow] dnaStrand ; let m [left arrow] numberOfThreads ; let n [left arrow] length(s1) ; let site [left arrow] cutplace; foreach i [left arrow] 1 to m do if nucleotidepairToCut then cut s1 at the nucleotide pair; end end end
Each thread checks for the nucleotidepairToCut if finds, cuts the DNA at that place, the resultant DNA strand is cut DNA.
Note: Two methods of cut can be implemented depending on the enzymes used. The method implemented in the algorithm 4 is the blunt end cut using
SmaI enzyme. The EcoRI and XmaI enzymes are used for staggered cut. The enzymes act as shown in the Figures 3 and 4.
[FIGURE 3 OMITTED]
[FIGURE 4 OMITTED]
Pasting of DNA Strands
On every thread the missing elements and Watson-Crick complementary of s1 and s2 are checked and if found compatible then pasting of nucleotides of s1 and s2 are done to obtain pasted DNA s1 and shown in Figure 5.
[FIGURE 5 OMITTED]
On every thread join s1 and s2 at their ends as shown in the Figure 6 to obtain pasted DNA s1.
[FIGURE 6 OMITTED]
Algorithm 5: Pasting of DNA Strands-hybridization method Input: Two double
stranded DNA strands Output: pasted DNA strand begin
let s1 [left arrow] dnaStrand1 ;
let m [left arrow] numberOfThreads ;
let n [left arrow] length(s1) ;
let s2 [left arrow] dnaStrand2 ;
foreach i [left arrow] 1 to m do
if elements missing in s1 then if elements missing in s2 then if compatible nucleotides then
join s1 and s2 ; end end end end end
Algorithm 6: Pasting of DNA Strands-blunt end method Input: Two double stranded DNA strands Output: pasted DNA strand begin let s1 [left arrow] dnaStrand1 ; let m [left arrow] numberOfThreads ; let n [left arrow] length(s1) ; let s2 [left arrow] dnaStrand2 ; foreach i [left arrow] 1 to m do join s1 and s2 at their ends; end end
Note: Pasting of multiple strands are possible, we have to check primers, but practically only two strands are pasted per experiment, so that part is negligible.
Time Complexity: For the algorithms 1 to 6, If m = n then the complexity is at its best case O(1), where the number of threads(m) is equal to the number of nucleoties(n) in the given input DNA strand. If m ranges from 2 to n - 1, the complexity is at its average case O(n/m). If m = 1 the algorithm acts to be a sequential algorithm and the complexity is O(n).
The above discussed algorithms are implemented with the most popular general purpose programming language Java, in Windows XP platform. Java makes concurrency available to the personal computer application programmer through its APIs. Threads are a basic mean for concurrent programming and are widely used in operating systems. At language level, threads offer a way to structure program by decomposing systems in several concurrent components; in this respect, threads are useful for modularity. Threads are sharing the same address space and are executed in turn, one after the other. The processor is allocated to threads by a scheduler which schedules threads execution. The programmer specifies that, application contains threads of execution, here each thread designates a portion of a program that may execute concurrently with other threads. The Java Virtual Machine(JVM) allows an application to have multiple threads of execution running concurrently. The number of threads generated in JVM is platform dependent. Thus, a multi threaded application could behave differently on different Java implementations and platform.
The speed and time efficiency of the DNA computing is the motivation behind this system generation. As the test tube based DNA operations requires massive money and time investment and it is sure a laborious task and it also needs an expert. In this work, the DNA operations are implemented for n elements parallely using Java threads. This can be used as a simulator for the biologists, to analyze their reaction results of their experiments, which involves the above implemented operations. So a test tube based experiment is not necessary. It finds its usefulness as a prototype, for a system to analyze, the real time processing of the DNA operations with real world data, with minimum cost and time.
Problem solving in any discipline of biology and computer science requires, usage of one or more of the above operations implementation, to solve it with more speed and time efficiency. This system can be used to solve a variety of NP complete problems. The proposed work has its importance in solving applications in molecular biology, path-finding in Robotics, routing algorithms design, expert systems, data mining applications, analysis of rule-based systems, genetic algorithms, pattern recognition problems, verify and reengineer the system, trace algorithm behavioral patterns etc, parallely and efficiently. As the system is implemented in Java, it is platform independent and can be used with any operating system.
The operations are implemented seperately as individual functions. They can also be integrated and used as a simulator or can be used seperately as individual operations, according to the research requirements.
Conclusion and future work
DNA computing is fundamentally similar to parallel computing, in that it takes advantage of the many different molecules of DNA to try many different possibilities at once. In this article, we proposed the practical implementations of DNA operations, which extends the possibilities of solving the theoretical problems practically. The complexity of the proposed implementations are also analyzed. In future many NPComplete  problems can be solved using these operations and their time complexity can also be analyzed. In recent years we find a massive growth in the field of parallel computing, because of its performance, response time etc. So parallel computing of DNA - based operations should find its respective place, for the better utilization of resources and research developments. Proper use of the proposed work can provide us low power and fast computing system which is the requirement of today's world. The limitations found by solving the problem in such a way is that it depends on the number of threads allowed by the operating system, its RAM capacity and the speed of the processor used. In future, the proposed work can be extended to distributed computing. It can also be used with multi processor systems assigning each processor a thread, for large volumes of data.
 L. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266:1021- 1024, 1994.
 M. Amos. DNA Computation. PhD thesis, University of Warwick, 1997.
 Deitel and Deitel. Java How to Program. Prentice, US, 1997.
 A. Goni, F.J.Cisneros, P. Cordeno, and J.Castellanos. A dna codification for genetic algorithms simulation. In VI International Conf. on Information Research and Applications. Bulgaria, 2008.
 Alexander J. Hartemink, Tarjei S. Mikkelsen, and David K. Gifford. Simulating biological reactions : A modular approach. DNA 5 :DI-MACS series in Discrete MAthematics and Theoretical Computer Science, 54:111-121, June 2000.
 J.H.Reif. Parallel biomolecular computation: Models and simulations. Algorithmica, 25(2):142-176, 1999.
 L. Kari. DNA computing: Arrival of biological mathematics. The Mathematical Intelligencer, 19:9-22, 1997.
 R. J. Lipton. DNA solution of hard computational problems. Science, 268:542545, 1995.
 R. J. Lipton. Using DNA to solve NP-complete problems. Science, 1995. Tech. Report. Princeton University.
 Swarnendu Mukherjee, Debashis Ganguly, Swar-nendu Bhattacharya, and Partha Mukherjee. A cognitive study on dna based computation. International Journal of Recent Trends in Engineering, 1(2):51-56, May 2009.
 A. Murugan and K.S. Easwarakumar. Transposition based contextual insertion on linear DNA strands with deletion precedence. International Journal of Computer Mathematics, 81(6):647- 660, 2004.
 G. Paun, G. Rozenberg, and A. Salomaa. DNA Computing: New Computing Paradigms. Springer-Verlag, Berlin, 1998.
A. Murugan (a)*, B. Lavanya (b) and K. Shyamala (c)
(a) Department of Computer Science, Dr. Ambedkar Government College, Chennai-600039, India
(b) Department of Computer Science, University of Madras, Chennai-600005, India
(c) Department of Computer Science, Dr. Ambedkar Government College, Chennai-600039, India
Corresponding Author E-mail: email@example.com
|Printer friendly Cite/link Email Feedback|
|Author:||Murugan, A.; Lavanya, B.; Shyamala, K.|
|Publication:||International Journal of Computational Intelligence Research|
|Date:||Apr 1, 2011|
|Previous Article:||EEG signal classification for brain computer interface using SVM for channel selection.|
|Next Article:||A conceptual study of vulnerability detection.|