Super Computers in Pharmacy
Mr. A. M. Kudal
Supercomputers, like any other computer, have two basic parts. The first is the CPU, which executes the commands it is given. The other is the memory, which stores data. The main difference between an ordinary computer and a supercomputer is that a supercomputer's CPU operates at a much faster speed than that of a standard computer.
The length of each CPU cycle determines the speed at which the CPU can work. By building circuits from complex, state-of-the-art materials, supercomputer designers optimize the functions of the machine. They also keep the circuits as short as possible so that information from the memory reaches the CPU in less time.
Supercomputers are designed to perform complex calculations faster than other computers. Their designers use two techniques to enhance performance. The first is called pipelining. It carries out complex operations at the same time by grouping numbers that undergo the same sequence of calculations and passing them to the CPU in an orderly manner. The circuits in the CPU perform the operations continuously while data is being fed in.
The other technique is called parallelism. It performs calculations side by side rather than one after another. Several pieces of data are processed at the same time, and the computation moves ahead step by step. A usual way to achieve this is to connect several CPUs that calculate together. Each of these CPUs carries out the commands it is given on each piece of information it receives.
All supercomputers use parallelism or pipelining, separately or in combination, to increase their processing speed. However, growing demand for computing power led to the creation of massively parallel processing (MPP) supercomputers. These consist of many machines connected together to attain a high level of parallelism. 1
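The parallelism described above can be sketched in a few lines of Python. This is only an illustration of the structure (split the data, apply the same operation to every slice, combine the partial results); the function names are hypothetical, and Python threads stand in for the separate CPUs of a real machine, so no actual speed-up should be expected from this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n_workers):
    """Split data into n_workers nearly equal contiguous slices."""
    size, extra = divmod(len(data), n_workers)
    slices, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def sum_of_squares(values):
    """The same operation that every worker applies to its own slice."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, n_workers=4):
    """Process all slices concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunk(data, n_workers)))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

On a massively parallel machine the slices would be sent to separate processors or nodes rather than to threads, but the split, compute and combine pattern is the same.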
Supercomputers are used for highly calculation-intensive tasks such as problems involving quantum mechanical physics, weather forecasting, climate research (including research into global warming), molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), physical simulations (such as simulation of airplanes in wind tunnels, simulation of the detonation of nuclear weapons, and research into nuclear fusion), cryptanalysis, and the like. Major universities, military agencies and scientific research laboratories are heavy users.
A particular class of problems, known as Grand Challenge problems, are problems whose full solution requires semi-infinite computing resources.
Supercomputers using custom CPUs traditionally gained their speed over conventional computers through the use of innovative designs that allow them to perform many tasks in parallel, as well as complex detail engineering. They tend to be specialized for certain types of computation, usually numerical calculations, and perform poorly at more general computing tasks. Their memory hierarchy is very carefully designed to ensure the processor is kept fed with data and instructions at all times; in fact, much of the performance difference between slower computers and supercomputers is due to the memory hierarchy. Their I/O systems tend to be designed to support high bandwidth, with latency less of an issue, because supercomputers are not used for transaction processing.
Technologies developed for supercomputers include:
- Vector processing
- Liquid cooling
- Non-Uniform Memory Access (NUMA)
- Striped disks (the first instance of what was later called RAID)
- Parallel filesystems
Processing techniques
Vector processing techniques were first developed for supercomputers and continue to be used in specialist high-performance applications. Vector processing techniques have trickled down to the mass market in DSP architectures and SIMD processing instructions for general-purpose computers.
Modern video game consoles in particular use SIMD extensively, and this is the basis for some manufacturers' claim that their game machines are themselves supercomputers. Indeed, some graphics cards have the computing power of several TeraFLOPS. The applications to which this power could be applied were limited by the special-purpose nature of early video processing. As video processing has become more sophisticated, graphics processing units (GPUs) have evolved to become more useful as general-purpose vector processors, and an entire computer science sub-discipline has arisen to exploit this capability: General-Purpose Computing on Graphics Processing Units (GPGPU).
Operating systems
Supercomputers predominantly run some variant of Linux or UNIX. Linux has been the most popular since 2004.
Supercomputer operating systems, today most often variants of Linux or UNIX, are every bit as complex as those for smaller machines, if not more so. Their user interfaces tend to be less developed, however, as the OS developers have limited programming resources to spend on non-essential parts of the OS (i.e., parts not directly contributing to the optimal utilization of the machine's hardware). This is because these computers, often priced at millions of dollars, are sold to a very small market, so their R&D budgets are often limited. (The advent of Unix and Linux allows reuse of conventional desktop software and user interfaces.)
Interestingly, this has been a continuing trend throughout the supercomputer industry, with former technology leaders such as Silicon Graphics taking a back seat to such companies as NVIDIA, who have been able to produce cheap, feature-rich, high-performance, and innovative products due to the vast number of consumers driving their R&D.
Historically, until the early-to-mid-1980s, supercomputers usually sacrificed instruction set compatibility and code portability for performance (processing and memory access speed). For the most part, supercomputers to this time (unlike high-end mainframes) had vastly different operating systems. The Cray-1 alone had at least six different proprietary OSs largely unknown to the general computing community. Similarly, different and incompatible vectorizing and parallelizing compilers for Fortran existed. This trend would have continued with the ETA-10 were it not for the initial instruction set compatibility between the Cray-1 and the Cray X-MP, and the adoption of UNIX operating system variants (such as Cray's Unicos and today's Linux).
For this reason, in the future, the highest performance systems are likely to have a UNIX flavor but with incompatible system-unique features (especially for the highest-end systems at secure facilities).
Programming
The parallel architectures of supercomputers often dictate the use of special programming techniques to exploit their speed. Special-purpose Fortran compilers can often generate faster code than C or C++ compilers, so Fortran remains the language of choice for scientific programming, and hence for most programs run on supercomputers. To exploit the parallelism of supercomputers, programming environments such as PVM and MPI for loosely connected clusters and OpenMP for tightly coordinated shared-memory machines are used.
Special-purpose supercomputers
Special-purpose supercomputers are high-performance computing devices with a hardware architecture dedicated to a single problem. This allows the use of specially programmed FPGA chips or even custom VLSI chips, giving better price/performance ratios by sacrificing generality. They are used for applications such as astrophysics computation and brute-force codebreaking.
Examples of special-purpose supercomputers are
1) Deep Blue, for playing chess
2) Reconfigurable computing machines or parts of machines
3) GRAPE, for astrophysics and molecular dynamics
4) Deep Crack, for breaking the DES cipher
The fastest supercomputers today
Current fastest supercomputer system
A Blue Gene/L cabinet. IBM's Blue Gene/L is the fastest supercomputer in the world.
On March 25, 2005, IBM's Blue Gene/L prototype became the fastest supercomputer in a single installation, using its 65,536 nodes to run at 135.5 TFLOPS (1 TFLOPS = 10^12 FLOPS). The Blue Gene/L is a cluster of nodes, each based on a customized version of IBM's PowerPC 440 processor with 512 MiB of local memory. The prototype was developed at IBM's Rochester, Minnesota facility, but production versions were rolled out to various sites, including Lawrence Livermore National Laboratory (LLNL). 2
Measuring supercomputer speed
The speed of a supercomputer is generally measured in "FLOPS" (FLoating-point Operations Per Second) or TFLOPS (10^12 FLOPS); this measurement is based on a particular benchmark which does LU decomposition of a large matrix. This mimics a class of real-world problems, but is significantly easier to compute than a majority of actual real-world problems.
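The idea behind such a benchmark can be sketched as follows: factor a matrix into lower and upper triangular parts, time the factorization, and divide a nominal operation count by the elapsed time. This is a toy illustration in pure Python, not the benchmark itself; the function names are hypothetical, pivoting is omitted for brevity, and the operation count of roughly 2n^3/3 is the standard estimate for LU factorization of an n-by-n matrix.

```python
import time


def lu_decompose(a):
    """Doolittle LU factorization without pivoting: returns (L, U) with A = L * U."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            # Eliminate the entry below the pivot and update the rest of the row.
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def matmul(x, y):
    """Plain square matrix product, used only to check the factorization."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def estimate_flops(n):
    """Time one factorization of a diagonally dominant n x n test matrix."""
    a = [[float(n + 1) if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)]
         for i in range(n)]
    start = time.perf_counter()
    lu_decompose(a)
    elapsed = time.perf_counter() - start
    nominal_ops = 2.0 * n ** 3 / 3.0  # standard estimate, not an exact count
    return nominal_ops / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(f"about {estimate_flops(100):.3e} FLOPS in pure Python")
```

The test matrix is made diagonally dominant so that the factorization is safe without pivoting; a production benchmark uses pivoting and highly optimized numerical libraries.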
| Year | Supercomputer | Peak speed | Location |
|------|---------------|------------|----------|
|  |  | 7.226 TFLOPS |  |
|  |  | 35.86 TFLOPS |  |
|  |  | 64 TFLOPS | National Astronomical Observatory / University of Tokyo, Japan |
|  |  | 1 TFLOPS |  |
|  |  | 42.7 TFLOPS | NASA Advanced Supercomputing facility at NASA Ames Research Center, California, USA |
|  |  | 70.72 TFLOPS |  |
|  |  | 136.8 TFLOPS | U.S. Department of Energy / U.S. National Nuclear Security Administration |
|  |  | 280.6 TFLOPS |  |
MareNostrum is a supercomputer based on PowerPC processors, the BladeCenter architecture, a Linux operating system and a Myrinet interconnect. These four technologies form the basis of an architecture and design that will have a big impact on the future of supercomputing.
- Peak performance of 42.35 teraflops
- 4,812 IBM PowerPC 970FX processors at 2.2 GHz (2,406 dual 64-bit processor blade nodes)
- 9.6 TB of main memory
- 236 TB of disk storage
Interconnection networks: Myrinet, Gigabit Ethernet, 10/100 Ethernet 3
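The MareNostrum figures above are consistent with one another, as a short calculation shows. The sketch assumes that each PowerPC 970FX completes 4 floating-point operations per clock cycle; that value is not given in the text and is an assumption made here, and the function name is hypothetical.

```python
def peak_tflops(processors, clock_ghz, flops_per_cycle):
    """Theoretical peak = processors x clock rate x operations per cycle."""
    gflops = processors * clock_ghz * flops_per_cycle
    return gflops / 1000.0


nodes = 2406                 # dual-processor blade nodes, from the list above
processors = nodes * 2       # 4812 processors
flops_per_cycle = 4          # assumption, not stated in the article

peak = peak_tflops(processors, 2.2, flops_per_cycle)

if __name__ == "__main__":
    print(processors, round(peak, 2))
```

Under that assumption the result rounds to the quoted peak of about 42.35 teraflops. A theoretical peak of this kind is an upper bound, not a measured benchmark speed.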
Drug Design - Molecular Modelling for Drug Design on P2P Grid
The Virtual Laboratory project in Melbourne is engaged in the research, design, and development of Grid technologies that help solve large-scale compute- and data-intensive science applications in the area of molecular biology. The virtual laboratory environment provides software tools and resource brokers that facilitate large-scale molecular studies on geographically distributed computational and data grid resources. 4
| EnterTheGrid description | Type | Country |
|--------------------------|------|---------|
|  | Research Grids | Netherlands |
|  | Research Grids | France |
|  | Research Grids | Italy |
|  | Research Grids | United Kingdom |
|  | Research Grids | European Union |
|  | Research Grids | Sweden |
|  | Research Grids | United States |
|  | Research Grids | United Kingdom |
|  | Research Grids | United Kingdom |
The Parsytec supercomputer makes it possible to simulate more than a hundred substance interactions on various targets at the same time.
Modern pharmacology needs new structures that may serve as prototypes of medicines. Fullerenes have emerged as one of the most promising of these structures. Their derivatives show increased selectivity in their interaction with biological targets. For example, atherosclerosis is a leading cause of death today, which is why the application by the St. Petersburg scientists (Foundation for Intellectual Collaboration) for a "Sorbent for removing atherogenic lipoproteins from the blood and the method of its preparation" is a major event for experimental medicine.
The Center of Fullerenes and Nanostructures of the Institute for High-performance Computing and Data Bases of the Ministry of Science and Technologies of the Russian Federation uses the large resources of a supercomputer center, previously inaccessible to the institute's biologists and physicians, to simulate biologically active derivatives of fullerenes and their action on living organisms. New approaches to the computer simulation of medicines have been developed. First of all, this concerns the interaction of active molecules with the lipids of biomembranes, which play a key role in penetration through tissue barriers and so determine many aspects of metabolism and of interaction with receptors and enzymes. This allows the centre to become a base for pharmacology built on modern methods of molecular simulation. 5
Bioinformatics
The advent of numerous genome-sequencing projects like the Human Genome Project has led to millions of sequence residues flooding into genome sequence databases like GenBank and EMBL. This biological data is being analyzed to provide structural and functional information on unknown genes or proteins, to reconstruct metabolic pathways for detecting drug targets and so on, using various computational tools, comparative genomics methods and microarray data analysis. The Bioinformatics Team at C-DAC deals with the development, porting and optimization of codes on PARAM (a parallel supercomputer developed by C-DAC) in the above areas and for mining large genomic databases, large molecular dynamics simulations, comparative genomic studies and gene expression data analysis.
Molecular Modeling
Molecular Dynamics (MD) simulation is a deterministic molecular modeling method that derives sequential sets of atomic positions by solving the differential equations embodied in Newton's laws of motion. Molecular modeling programs like AMBER, CHARMM and GROMACS are widely used to carry out MD simulations. Carrying out large, realistic simulations of biomolecules necessitates the use of high-performance computing machines. Codes like AMBER, CHARMM and GROMACS have been ported and optimized on PARAM Padma. Using such codes on the PARAM system, it is possible to carry out large simulations of biomolecules for structural studies.
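The core step of an MD simulation, advancing positions and velocities under Newton's equations, can be sketched with the velocity Verlet scheme, a commonly used integrator. The example below is a toy, assuming a single particle on a harmonic spring in one dimension with arbitrary units; the function names are hypothetical, and real MD codes apply the same idea to thousands or millions of atoms with far more elaborate force fields.

```python
def harmonic_force(x, k=1.0):
    """Restoring force of a spring with stiffness k (a toy force field)."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, steps=1000):
    """Integrate Newton's equation m * a = F(x); returns (positions, final velocity)."""
    trajectory = [x]
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
        trajectory.append(x)
    return trajectory, v


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy, used to check the integration."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    positions, velocity = velocity_verlet(1.0, 0.0, harmonic_force)
    print(positions[-1], total_energy(positions[-1], velocity))
```

The total energy should stay close to its starting value of 0.5 throughout the run, which is the usual sanity check for an integrator of this kind. Each step needs a fresh force evaluation for every atom, which is what makes large simulations so demanding of computing power.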
Biomolecular Docking
Biomolecular interactions are at the core of all the regulatory and metabolic processes that together constitute the process of life. To enable computer-aided analysis of these interactions as well as automated prediction of molecular interactions, biomolecular docking codes like FTDock and DARWIN have been ported and optimized on the PARAM Padma. This has tremendous application in the rational drug design process.
Ab-initio Methods
Work is under way in the area of quantum chemistry to obtain stable structures and partial charges for various modified nucleotides, modified amino acids or any other drug molecules used in classical molecular dynamics simulation. The study plays an important role in finding the active sites of drugs and stable and alternative conformations of proteins, nucleic acids or drug molecules, and also assists in obtaining parameters for molecular-mechanics-type potential energy functions. The study uses codes like MOPAC and NWChem, which have been ported on PARAM Padma.
Genome Sequence Analysis
Genome sequence analysis deals with a range of popular tools beginning with dynamic programming methods like Smith-Waterman and heuristic methods like BLAST and FASTA, to multiple sequence alignment tools like CLUSTAL. Popular sequence analysis codes like BLAST, FASTA, Smith-Waterman and CLUSTAL have been ported on the PARAM Padma. Such a high throughput environment can be useful for large comparative genomic studies and rapid drug target identification.
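The dynamic programming idea behind Smith-Waterman can be sketched briefly. The version below computes only the best local alignment score, with a linear gap penalty; the scoring values (match, mismatch, gap) are illustrative assumptions, not those of any particular tool, and production codes add affine gap penalties, substitution matrices and traceback.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in seq_b
            left = score[i][j - 1] + gap    # gap in seq_a
            # A local alignment may restart anywhere, so scores never go below 0.
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

The table has one cell for every pair of positions, so the work grows with the product of the two sequence lengths. That cost is why heuristic methods such as BLAST and FASTA are preferred for database searches, and why exact methods benefit from parallel machines.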
Gene Finding
Tools to analyze and annotate genomic DNA sequences of model organisms are being used extensively to identify coding regions so as to deduce the structure of genes and the resulting proteins. Gene finders for eukaryotic genomes like HMMgene and Genscan, and Glimmer for microbial gene prediction, are the most widely used and have been ported on PARAM Padma.
Comparative Genomics
A significant goal in the post-genome era is to relate the annotated genome sequence to the physiological functions of a cell. Pathway analysis tools have been ported on PARAM 10000 to reconstruct metabolic pathways, derived from annotated genome sequence as well as biochemical and physiological information. In silico metabolic pathway reconstruction, metabolic pathway comparison, pathway based analysis of expression data, using software such as Pathway Tools and KEGG system, and metabolic pathway engineering are the major goals of porting the codes on PARAM. Efforts in this direction can help in the validation of functional annotation, identification of novel pathways, identification of probable drug targets and metabolic pathway engineering for better processes.
Microarray Data Analysis
Microarray analysis, a high-throughput expression technique, monitors and analyzes the expression profiles of thousands of genes simultaneously and is of great significance in novel gene identification, disease diagnosis, drug discovery and toxicogenomics. MEME (Multiple EM (Expectation Maximization) for Motif Elicitation) is one of the tools available to detect motifs in a set of DNA or protein sequences. The parallel version of MEME has been ported on PARAM Padma, while clustering tools like Genesis are being used to cluster the gene expression data of genes with a similar motif in their upstream regions.
Problem Solving Environment (PSE)
A Problem Solving Environment (PSE) is software that enables the use of high-performance computing resources by providing users with a complete, integrated environment for a specific application. Its main advantage is that it makes advanced hardware resources, software tools and assistance available in a friendly environment that allows the user to concentrate on the domain research problems. Other relevant features of the PSEs, which are built using the J2EE three-tier architecture, include multiple session handling, state persistence and visualization capabilities. At present, PSEs for molecular modeling codes like AMBER and CHARMM, and sequence analysis codes like Smith-Waterman, FASTA and BLAST, have been developed for PARAM. 6
High Performance Computing and Bioinformatics
The advent of high-throughput technologies like genome sequencing, microarrays and proteomics has transformed biology into a data-rich information science. The huge volume of data generated needs to be organized in a structured manner to facilitate the use of data-mining tools for extracting knowledge. The ultimate objective of these efforts is to improve our understanding of human health and thereby provide rational solutions to overcome diseases. This is possible only when there is a complete understanding of life and its processes. The grand challenge areas are: a precise overview of the molecular details of speciation, a mechanistic understanding of protein evolution, determination of effective protein-DNA, protein-RNA and protein-protein recognition codes, accurate ab-initio protein structure prediction, and the ability to predict cellular responses to external stimuli. Analysis of the phenomenally large amount of data representing different strata of bio-complexity necessitates the use of High Performance Computing (HPC).
The heterogeneity of biological data in terms of sequence, structure and microarrays increases the complexity of analysis. Algorithms developed to analyze such enormous and heterogeneous data are compute-intensive and are often NP-complete problems, thus posing challenges for HPC. Database searching algorithms used in functional annotation; comparative genomics analysis; microarray data analysis; reconstruction of metabolic pathways; homology modeling; molecular mechanics, molecular dynamics and quantum mechanics simulations; and docking studies to predict probable interaction partners are areas that demand the use of HPC. 7
Recent Advances
Researchers looking for new anticancer drugs are hoping to harness the unused capacity of personal computers through the launch last week of software linked to a screensaver that uses free computer capacity to assess the anticancer activity of millions of molecules.
The project is a collaboration between the University of Oxford and United Devices, a US technology company. Under the scheme, users download a free screensaver and software that uses spare capacity on the computer. The system, which uses what is known as peer-to-peer technology, creates a “virtual supercomputer” and makes enormous quantities of processing power available. Each computer joining the scheme will be sent over the internet an initial package of models of 100 molecules together with a drug design software application called Think, which was developed at the University of Oxford, and a model of a protein implicated in cancer. The target molecules being tested initially include superoxide dismutase, an enzyme that protects a Ras protein; and vascular endothelial growth factor. Think will create three-dimensional models of the trial molecules and test their interactions with the target protein. If a molecule interacts successfully, it will be sent back to the central server for further investigation. 8
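The division of work described above can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, molecules are represented by plain integer identifiers, and `interacts` is a stand-in for the actual interaction test performed by the drug design software, whose interface is not described in the text.

```python
def make_packages(molecule_ids, package_size=100):
    """Server side: split the molecule library into packages of 100."""
    return [molecule_ids[i:i + package_size]
            for i in range(0, len(molecule_ids), package_size)]


def screen_package(package, interacts):
    """Client side: test every molecule, keep only those that interact."""
    return [molecule for molecule in package if interacts(molecule)]


def collect_hits(packages, interacts):
    """Server side: gather the hits returned by all participating computers."""
    hits = []
    for package in packages:  # in the real scheme, each package goes to a different PC
        hits.extend(screen_package(package, interacts))
    return hits


if __name__ == "__main__":
    library = list(range(250))
    packages = make_packages(library)
    # Placeholder test: pretend every 50th molecule interacts with the target.
    print(collect_hits(packages, lambda m: m % 50 == 0))
```

Because each package can be screened independently of all the others, the work spreads naturally over as many computers as join the scheme.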
Conclusion:
Supercomputers are used for calculations such as problems involving quantum mechanical physics, weather forecasting, climate research, molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), physical simulations (such as simulation of airplanes in wind tunnels, simulation of the detonation of nuclear weapons, and research into nuclear fusion), bioinformatics, etc. Furthermore, a supercomputer makes it possible to simulate more than a hundred substance interactions at the same time. Supercomputers are definitely gaining importance day by day.
References:
1. http://wawa.essortment.com/supercomputersw_ppk.htm
2. en.wikipedia.org/wiki/Supercomputer
3. http://mmb.pcb.ub.es/MODEL/
4. http://enterthegrid.com/vmp/articles/EnterTheGrid/AE-ETG-profile-174.html
5. www.spbcas.ru/mmbs/drug.html
6. www.cdac.in/HTmL/secg/bi.asp
7. http://bioinfo-portal.cdac.in/index.htm
8. http://www.bmj.com/cgi/reprint/322/7291/882/b.pdf
About Authors
Mr. Anand M. Kudal
Working as Lecturer at MAEER’s, Maharastra Institute of Pharmacy, MIT Campus, Pune. He completed his M. Pharm in Medicinal and Pharmaceutical Chemistry at the Department of Pharmacy, SGSITS, Indore, RGPV, Bhopal. He is a Life member of APTI.
Dr. S. R. Parakh
Working as Principal and Professor in Pharmaceutics at MAEER’s, Maharastra Institute of Pharmacy, MIT Campus, Pune-411038
Email: srparakh@rediffmail.com
