Extracting Relevant Information from FDA Drug Files to Create a Structurally Diverse Drug Database Using KnowItAll®


Each Food and Drug Administration (FDA) consumer drug information file contains a large amount of useful chemical, pharmaceutical, and pharmacological data. These files profile approved drugs by chemical structure, solubility, absorption, distribution, metabolism, elimination, and toxicity (ADME/Tox), and by possible adverse reactions. Using these data in the classroom is a new approach to connecting theory, technology, and reality. The KnowItAll® Informatics System, available through Bio-Rad Laboratories, Philadelphia, PA, offers fully integrated software and/or database desktop solutions. It holds a large collection of in silico ADME/Tox predictors and is a chemical informatics platform used to record experimental data. This project had three goals: (1) extract relevant information for 75 drugs from their freely available FDA drug files (limited to orally administered drugs and prodrugs having a chemical structure); (2) build a database in which the extracted FDA information is indexed for search and analysis; and (3) on completion, leave the undergraduates involved in such a project capable of harvesting useful chemical, pharmaceutical, and pharmacological information, adept with computational chemistry software tools, and equipped with an enhanced vocabulary and new insights into organic chemistry, molecular biology, and physiology.

Keywords: FDA consumer drug database©, KnowItAll®, ADME/Tox, (quantitative) structure activity relationship (Q)SAR, predictor tools, chemical informatics.


FDA consumer drug information files [1] provide open access to clinical data, including data from negative trials and unpublished results. These portable document format (PDF) files profile drugs by chemical structure, solubility, metabolic pathways, absorption, distribution, elimination, carcinogenesis, mutagenesis, impairment of fertility, possible adverse reactions, and other useful pharmacokinetic and toxicological data. Additionally, the searchable drug information database Drugs@FDA [2] serves as the most comprehensive resource for product-specific information on drugs approved in the USA, and it can be navigated easily using online text queries. However, these documents are wordy and not consumer friendly. Furthermore, the often complex drug structures reported in the PDF drug files cannot be exported into another application. The FDA website also includes findings from documents submitted voluntarily to the FDA Safety Information and Adverse Reporting program (MedWatch) [3] by drug and biologic manufacturers, consumers, distributors, packers, and sponsors, and from investigators with studies [3, 4] under Investigational New Drug (IND) applications. Another recent and exhaustive resource serving the life science community is the University of Alberta’s DrugBank database [5]. All of these browsable, high-quality data can be mined, analyzed, and interpreted to develop statistical models of predictive quantitative and qualitative structure activity relationships (QSARs, SARs) [6-8]. One goal is therefore to provide science majors with a process that overcomes the initial fear they may experience when attempting to extract useful information from this mass of files.

Bio-Rad’s KnowItAll® Informatics System Desktop Solutions [9] offers fully integrated software and/or database desktop solutions for multiple aspects of research, including in silico ADME/Tox profiling, spectroscopy, cheminformatics, and medicinal chemistry [10]. The company offers several KnowItAll® “editions” that combine the appropriate set of software tools for specific user groups and needs [9-12]. In this article we detail steps that combine information freely accessed through the web with our KnowItAll® Cheminformatics Edition [9] and our KnowItAll® ADME/Tox Edition [9], both purchased through Bio-Rad Laboratories. The project teaches students within an established Directed Research Program how to access, extract, document, and manage relevant data from FDA drug files in order to create a searchable consumer drug database.


In order to create a mindset that would entail both qualitative and quantitative analyses, we undertook our project in two stages: (1) extract information for 75 consumer drugs from the FDA Drug Information Websites (limited to orally administered drugs and prodrugs having a chemical structure) [1-4]; and (2) build a pharmaceutical database using our ADME/Tox edition and our Cheminformatics edition of the KnowItAll® System [9].

Here, data were extracted from the freely available FDA consumer drug information files [1-4] for the following 75 randomly chosen, unrelated consumer drugs: Iressa®, Levitra®, Strattera®, Abilify®, Inspra®, Hepsera®, Namenda®, Alinia®, Emtriva®, Emend®, Tindamax®, Sensipar®, Pletal®, Viagra®, Zavesca®, Orfadin®, Zyvox®, Uroxatral®, Zelnorm®, Avodart®, Frova®, Provigil®, Aromasin®, Detrol®, Thalomid®, Atacand®, Zonegran®, Micardis®, Maxalt®, Xeloda®, Arava®, Gleevec®, Cialis®, Reyataz®, 5-FU (active metabolite of Xeloda®), Candesartan (active metabolite of Atacand®), Hectorol®, Singulair®, Spectracef®, Cefditoren (active metabolite of Spectracef®), Sanctura®, Colazal®, active metabolite of Arava®, Zetia®, Hepsera® M1, Spiriva®, Starlix®, Tasmar®, Xifaxan®, Crestor®, Nolvadex®, Detrol® M1, Mobic®, Sustiva®, Aciphex®, Ziagen®, Protonix®, Celebrex®, Agenerase®, Trileptal®, Oxcarbazepine MHD (active metabolite of Trileptal®), Exelon®, Temodar®, Keppra®, Tequin®, Avandia®, Actos®, Avelox®, Tamiflu®, Oseltamivir Carboxylate (active metabolite of Tamiflu®), Lopinavir®, Ritonavir®, Benicar®, Rapamune®, and Ketek®. The three-dimensional chemical structure for each of the 75 drugs was drawn using the DrawIt™ drawing application available in the KnowItAll® Cheminformatics Edition [9]. For prodrugs, the chemical structures of both the prodrug and its corresponding active drug were drawn and dealt with separately (for example, Detrol® and Detrol® M1).

KnowItAll® Cheminformatics solutions [9] include tools to draw, modify, store, search, name, and retrieve chemical structures. Notably, their structure drawing and reporting tools are based on the well-respected ChemWindow technology. They are designed to recognize stereochemistry and E/Z isomers, and they include chemistry-aware features such as hot keys, a chemical syntax checker, and tools to calculate mass and formula. A recent comparison [13] of freely available chemical structure drawing and reporting tools gave the applications from the KnowItAll® Academic Edition a very high rating for quality, flexibility, and ease of use.

Figure 1.   Chemical structure of Tamiflu® obtained using the DrawIt™ application in the KnowItAll® Cheminformatics Edition


Figure 1 shows the chemical structure of the drug Tamiflu® drawn using the DrawIt™ application in the KnowItAll® Cheminformatics Edition. This structure is documented in the Tamiflu® FDA drug profile [1], where it is reported with the non-systematic International Union of Pure and Applied Chemistry (IUPAC) name (3R,4R,5S)-4-acetylamino-5-amino-3-(1-ethylpropoxy)-1-cyclohexene-1-carboxylic acid, ethyl ester, phosphate (1:1). An advantage of the KnowItAll® Cheminformatics Edition is that it is bundled with the IUPAC NameIt™ application, which can generate a compound’s systematic IUPAC name from its structure. In this case (as shown in Figure 2), the IUPAC NameIt™ application reported ethyl (5S,3R,4R)-4-(acetylamino)-5-amino-3-(ethylpropoxy) cyclohex-1-enecarboxylate, phosphoric acid, for Tamiflu®. This helps ensure accuracy in the recording, storage, and retrieval of chemical information for a drug, since a drug’s systematic name often has significant text information content associated with it in the literature.

Figure 2.   IUPAC name of Tamiflu® obtained using the IUPAC NameIt™ application in the KnowItAll® Cheminformatics Edition


Figure 3.   3D structure of Tamiflu® obtained using the 3D ViewIt™ application in the KnowItAll® Cheminformatics Edition

The 3D ViewIt™ application, also bundled in the KnowItAll® Cheminformatics Edition, can convert the 2D DrawIt™ image of Tamiflu® into realistic 3D drawings, as shown in Figure 3. The ability to work with such high-resolution images can serve as an inexpensive and powerful way to study structure-based drug design.

Since FDA drug structures cannot be imported directly from the drug file into KnowItAll®, one has to redraw the often complex drug structure with the correct stereochemistry and geometry. An example is shown below using the structure of the immunosuppressant Rapamune®. A slight error in the drawing (such as not connecting the highlighted double bond shown in Figure 4) can have a significant impact on any future mining and screening process [14]. Such an error can be avoided when using the KnowItAll® Cheminformatics Edition, which provides a ‘Check Chemistry’ tool that highlights possible connectivity errors and a ‘Calculate Mass & Composition’ tool that reports the structure’s molecular weight (MWt), molecular formula, and chemical composition. These values can then be compared for accuracy with the data reported for the same parameters in the drug’s FDA files [1-4] (Figures 4 & 5). Storing accurate structures within a database has been shown to be crucial, especially when querying the database by structural criteria or substructure search [14, 15].

Figure 4.  Incorrect drawing of Rapamune® (MWt: 930.23; Formula: C52H83NO13)

Figure 5.  Correct drawing of Rapamune® (MWt: 914.19; Formula: C51H79NO13)
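The formula-and-mass comparison described above can be illustrated outside KnowItAll® with a short Python sketch. This is not the ‘Calculate Mass & Composition’ tool; the function names, the rounded atomic weights, and the tolerance are assumptions made for illustration, and the reference values are those given with Figure 5.

```python
import re

# Rounded standard atomic weights for the elements needed here (assumed values).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C51H79NO13'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights over the parsed formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


def matches_reference(drawn_formula, ref_formula, ref_mwt, tolerance=0.05):
    """True if the drawn structure's formula and mass agree with the reference."""
    same_formula = parse_formula(drawn_formula) == parse_formula(ref_formula)
    same_weight = abs(molecular_weight(drawn_formula) - ref_mwt) <= tolerance
    return same_formula and same_weight


# Reference values as given with Figure 5 (correct drawing of Rapamune).
REF_FORMULA, REF_MWT = "C51H79NO13", 914.19

for drawn in ("C52H83NO13", "C51H79NO13"):  # Figure 4 and Figure 5 drawings
    status = "OK" if matches_reference(drawn, REF_FORMULA, REF_MWT) else "MISMATCH"
    print(drawn, round(molecular_weight(drawn), 2), status)
```

With these inputs the Figure 4 formula is flagged as a mismatch, while the Figure 5 formula agrees with the reference in both composition and mass.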


KnowItAll® also logically integrates all applications in a single interface (Figure 6), so the user can easily transfer information from application to application without opening another program [9]. For example, one can draw a structure, send it to a module for NMR prediction, and then add that structure to a user database. Tools are also available to assess whether or not a structure matches a spectrum, and the SearchIt™ application [9] allows structures and/or spectra to be imported and searched against reference databases as well as user-created databases.

Figure 6.   Single interface applications [9] in the KnowItAll® Cheminformatics Edition


The company’s website also reports that the KnowItAll® in silico ADME/Tox solutions can be used to assess a potential drug’s ADME/Tox profile with over 30 predictive models and tools for model building and validation [9]. The applications in the KnowItAll® ADME/Tox edition allow researchers to build predictive SAR models of biological properties using databases of compounds with known property values and molecular indices calculated from their chemical structures [9, 16-20]. Our KnowItAll® ADME/Tox edition was purchased with fourteen pharmaceutical and pharmacological parameters. These parameters are common to traditional (Q)SAR modeling, and each parameter value was extracted (when reported) from each drug profile. The fourteen properties were Oncogenicity, Teratogenicity, Mutagenicity, Human Intestinal Absorption, Plasma Protein Binding, Water Solubility, Volume of Distribution, Elimination Half Time, Rate of Absorption, Blood Brain Barrier, NeuroToxicity, pKa, Bioavailability, and log P. Where experimental values were not available in a drug profile, the missing properties were calculated with the predictor tools [9] available within the KnowItAll® platform.
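One way to picture such a record, with each value tagged as either extracted from the FDA file or filled in by a predictor, is the Python sketch below. The class names, field names, and the "experimental"/"predicted" labels are hypothetical and do not reflect the KnowItAll® database format; only the fourteen property names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The fourteen properties named in the text.
PROPERTIES = (
    "Oncogenicity", "Teratogenicity", "Mutagenicity",
    "Human Intestinal Absorption", "Plasma Protein Binding",
    "Water Solubility", "Volume of Distribution", "Elimination Half Time",
    "Rate of Absorption", "Blood Brain Barrier", "NeuroToxicity",
    "pKa", "Bioavailability", "log P",
)

SOURCES = ("experimental", "predicted")  # hypothetical provenance labels


@dataclass
class PropertyValue:
    value: str   # kept as text, since labels report values in varied forms
    source: str  # "experimental" (from the FDA file) or "predicted"


@dataclass
class DrugRecord:
    trade_name: str
    properties: Dict[str, PropertyValue] = field(default_factory=dict)

    def set_property(self, name: str, value: str, source: str) -> None:
        """Store one property value together with where it came from."""
        if name not in PROPERTIES:
            raise KeyError(f"unknown property: {name}")
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        self.properties[name] = PropertyValue(value, source)

    def missing(self) -> List[str]:
        """Properties not yet recorded, i.e. candidates for a predictor tool."""
        return [p for p in PROPERTIES if p not in self.properties]


record = DrugRecord("Micardis")
record.set_property("Volume of Distribution", "7.14 L/kg", "experimental")
print(len(record.missing()), "properties still to be filled in")
```

Keeping the source alongside each value makes it possible to separate label-derived data from predictions in any later analysis.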

Since the published FDA drug profiles [1-4] for these structurally diverse consumer drugs come from different pharmaceutical companies, there are major differences in how the properties are reported in each drug profile. Therefore, for some of the drugs, the pharmaceutical and pharmacological data reported in their FDA files had to be normalized before use. An example is the pharmacokinetic parameter Volume of Distribution. For Micardis® (which is a combination of telmisartan, an orally active angiotensin II antagonist acting on the AT1 receptor subtype, and hydrochlorothiazide, a diuretic), the FDA file reports: “The Volume of Distribution for telmisartan is approximately 500 liters indicating additional tissue binding.” In our KnowItAll® database, the Volume of Distribution for Micardis® is normalized to 7.14 L/kg. This was done by dividing the reported volume by 70 kg, which is the average weight of the human subjects in a majority of the documented FDA file data. Sometimes, wide ranges of data had to be entered into KnowItAll®. For example, the Rapamune® FDA label file reports: “The mean volume of distribution (Vss/F) of sirolimus (the international nonproprietary name for Rapamune®) is 12 ± 8 L/kg.” Hence, its KnowItAll® database Volume of Distribution is documented as 4 ~ 20 L/kg. Sometimes data interpretations had to be made. For the pharmaceutical formulation parameter Water Solubility, “sparingly soluble” and “slightly soluble” were approximated as 0.001 – 0.01 mg/ml, and “very soluble”, “freely soluble”, or “highly soluble” as 1000 mg/ml. For example, the Crestor® FDA label file reads: “Rosuvastatin calcium (active ingredient) is a white amorphous powder that is sparingly soluble in water and methanol, and slightly soluble in ethanol.” Hence, the KnowItAll® database Water Solubility value for Crestor® is documented as 0.001 mg/ml.
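The three normalizations described above can be written out as a minimal Python sketch. The function names and the tuple representation of ranges are assumptions made for illustration; the 70 kg body weight, the example values, and the solubility approximations are taken from the text.

```python
AVERAGE_BODY_WEIGHT_KG = 70.0  # average subject weight used in the text


def volume_per_kg(volume_litres, body_weight_kg=AVERAGE_BODY_WEIGHT_KG):
    """Convert a volume of distribution in litres to L/kg."""
    return volume_litres / body_weight_kg


def range_from_mean_sd(mean, sd):
    """Express 'mean ± sd' as a (low, high) range."""
    return (mean - sd, mean + sd)


# Descriptive solubility terms mapped to (low, high) values in mg/ml,
# following the approximations stated in the text.
SOLUBILITY_MG_PER_ML = {
    "sparingly soluble": (0.001, 0.01),
    "slightly soluble": (0.001, 0.01),
    "very soluble": (1000.0, 1000.0),
    "freely soluble": (1000.0, 1000.0),
    "highly soluble": (1000.0, 1000.0),
}


def solubility_range(descriptor):
    """Look up the approximate solubility range for a label descriptor."""
    return SOLUBILITY_MG_PER_ML[descriptor.strip().lower()]


# Micardis: "approximately 500 liters" -> L/kg
print(round(volume_per_kg(500), 2))
# Rapamune: "12 ± 8 L/kg" -> range in L/kg
print(range_from_mean_sd(12, 8))
# Crestor: "sparingly soluble in water" -> mg/ml range
print(solubility_range("sparingly soluble"))
```

For Crestor®, the database value quoted in the text (0.001 mg/ml) corresponds to the lower bound of the range returned by the lookup.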

Figure 7.   FDA Consumer Drug database© created using the KnowItAll® ADME/Tox Edition


On evaluating each of the 75 chosen FDA consumer drug profiles, we redrew the three-dimensional chemical structure of each drug using the DrawIt™ drawing application available in the KnowItAll® Cheminformatics Edition; harvested information about each drug’s trade (patent) name, classification, and associated chemical data; documented the available information for all fourteen pharmaceutical and pharmacological parameters; and then, with the database-building capability of Bio-Rad’s KnowItAll® platform, created an FDA Consumer Drug Database©, as shown in Figure 7. This database is now available [21] through Bio-Rad Laboratories, and it should help increase the accuracy of predictions by contributing to the variation of available models [22]. Going forward, there is the potential to use this database [21] to extract SAR patterns (example shown below) across these 75 structurally diverse and unrelated consumer drugs. Results from such a task will be presented in the near future [23].

To investigate the structure activity relationships (SAR) of sulfur-containing functional groups (sulfide, sulfamoyl, sulfonyl, and sulfone), each functional group is first drawn using the DrawIt™ tool in KnowItAll®; the SearchIt™ tool is then used to search the FDA Consumer Drug Database© for all hits for that functional group among the 75 drugs. However, the SearchIt™ results report some drugs several times (across the four sulfur-containing functional groups), not because they contain more than one functional group, but simply because a common structural motif, such as S=O, is shared among the groups. Hence, a visual sorting needs to be done. On completion, ChemSilico predictions [9] can be made for each of the 75 drugs using the KnowItAll® predictor ProfileIt™, as represented in Figure 8, and the required biological property data can be extracted from these predictions.
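The repeated hits described above could be collected for review with a sketch like the following. It does not replace the visual sorting of structures and does not use any SearchIt™ export format; the input dictionary, the function names, and the placeholder drug labels are all hypothetical.

```python
def merge_hits(hits_by_group):
    """Invert {functional_group: [drug, ...]} into {drug: [functional_group, ...]}."""
    groups_by_drug = {}
    for group, drugs in hits_by_group.items():
        for drug in drugs:
            groups = groups_by_drug.setdefault(drug, [])
            if group not in groups:
                groups.append(group)
    return groups_by_drug


def needs_review(groups_by_drug):
    """Drugs reported under more than one functional group, sorted by name."""
    return sorted(d for d, groups in groups_by_drug.items() if len(groups) > 1)


# Placeholder hit lists; "drug A" etc. stand in for real search results.
hits = {
    "sulfide": ["drug C"],
    "sulfamoyl": ["drug B"],
    "sulfonyl": ["drug A", "drug B"],
    "sulfone": ["drug A"],
}

merged = merge_hits(hits)
for drug in needs_review(merged):
    # These are the structures to inspect by eye for the shared S=O motif.
    print(drug, "->", ", ".join(merged[drug]))
```

Each drug then appears once, and only those listed under several groups need to be inspected by eye to decide which functional group is actually present.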

Figure 8.   ProfileIt™ prediction results for Tindamax® (a synthetic antiprotozoal and antibacterial agent).



FDA Drug Information data files [1-4] contain valuable chemical, pharmaceutical, and pharmacological data. However, these data sometimes need normalization because they are expressed in wide ranges. Consistently optimizing compounds for multiple attributes simultaneously, using experimental data in certain areas, should also encourage students working on such a project to make connections between chemistry, biology, and physiology. The KnowItAll® platform’s ease of operation and its analysis, interpretation, and reporting tools, coupled with the ability to build searchable databases seamlessly integrated within a single user interface, make it well suited for cheminformatics applications at research institutions. This system offers an ideal environment for researchers to analyze and compare experimental results across a diverse collection of drugs, including the potential to study structure activity relationship (SAR) patterns.


This research was supported by grant number 2 P20 RR016472-08 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). This IDeA Network of Biomedical Research Excellence (INBRE) grant to the state of Delaware was obtained under the leadership of the Delaware Biotechnology Institute, University of Delaware, and the authors sincerely appreciate their efforts.

References and Notes

§Fumie Koyoshi was a Wesley College Biology major who completed this project during an INBRE-supported Undergraduate Research Assistantship in the Directed Research Program in Chemistry at Wesley College. On graduation, she joined the University of Pennsylvania Hospital, School of Medical Technology, Philadelphia, PA.

1. Food and Drug Administration (FDA) profiles. Retrieved from http://www.fda.gov/cder/drug/default.htm

2. Drugs@FDA. Retrieved from http://www.accessdata.fda.gov/Scripts/cder/DrugsatFDA/

3. FDA Safety Information and Adverse Reporting program (MedWatch). Retrieved from http://www.fda.gov/medwatch/

4. NIH Molecular Libraries-Small Molecule Repository. Retrieved from http://mlsmr.glpg.com/MLSMR_HomePage/submitcompounds.html

5. Wishart, D. S.; Knox, C.; Guo, A. C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. DrugBank: A Comprehensive Resource for In Silico Drug Discovery and Exploration, Nucleic Acids Research 2006, 34 (Database Issue), D668-D672.

6. Kruhlak, N. L.; Contrera, J. F.; Benz, R. D.; Matthews, E. J. Progress in QSAR Toxicity Screening of Pharmaceutical Impurities and Other FDA Regulated Products, Advanced Drug Delivery Reviews 2007, 59(1), 43-55.

7. Cronin, M. T. D.; Jaworska, J. S.; Walker, J. D.; Comber, M. H. I.; Watts, C. D.; Worth, A. P. Use of QSARs in International Decision-Making Frameworks to Predict Health Effects of Chemical Substances, Environmental Health Perspectives 2003, 111(10), 1391-1401.

8. Walker, J. D.; Jaworska, J.; Comber, M. H. I.; Schultz, T. W.; Dearden, J. C. Guidelines for Developing and Using Quantitative Structure Activity Relationships, Environmental Toxicology and Chemistry 2003, 22(8), 1653-1665.

9. KnowItAll® Informatics System Desktop Solutions. Retrieved from http://www.knowitall.com/

10. D’Souza, M. J. KnowItAll® - Software Reviews, Chemistry World, 2005, 2(9), 70-71.

11. KnowItAll® U System. Retrieved from http://www.knowitallu.com/

12. D’Souza, M. J. KnowItAll® U System - Software Reviews, Chemistry World, 2007, 4(11), 70-72.

13. Anand, V.; Gera, M.; Kumar, V.; Karwasara, P.; Kataria, M.; Kukkar, V. Comparative Evaluation of Freely Available Chemical Structure Drawing Software, Pharmaceutical Rev. 2008, 6(2), ISSN 1918-5561.

14. Richard, A. M.; Swirsky Gold, L.; Nicklaus, M. C. Chemical Structure Indexing of Toxicity Data on the Internet: Moving Toward a Flat World, Current Opinion in Drug Discovery & Development, 2006, 9(3), 314-325.

15. Baumgras, J. L.; Rogers, A. E. Chemical Structures at the Desktop: Integrating Drawing Tools with on-line Registry Files, Journal of the American Society for Information Science 1999, 46(8), 623-631.

16. D’Souza, M. L.; Abshear, T.; Banik, G. M.; Nedwed, K.; Peng, C. A Model Validation and Consensus Building Environment, SAR and QSAR in Environmental Research 2006, 17(3), 311-321.

17. Banik, G. M. In Silico ADME/Tox Prediction: The More, the Merrier, Current Drug Discovery, 2004, 31-34.

18. Dearden, J.; Worth, A. In Silico Prediction of Physicochemical Properties, JRC Scientific and Technical Reports, EUR 23051, EN-2007, 1-68.

19. Bidault, Y. A Flexible Approach for Optimizing In Silico ADME/Tox Characterization of Lead Candidates, Expert Opinion on Drug Metabolism & Toxicology 2006, 2(1), 157-168.

20. Dearden, J. C. In Silico Predictions of ADMET Properties: How Far Have We Come? Expert Opinion on Drug Metabolism & Toxicology 2007, 3(5), 635-639.

21. D’Souza, M. J. FDA Consumer Drug database – 2007. HaveItAll - ADME/Tox Experimental Databases Datasheet, Bio-Rad Laboratories, Bulletin # INF-96199, 2008.

22. Wess, G. How to Escape the Bottleneck of Medicinal Chemistry, Drug Discovery Today, 2002, 7(10), 533-535.

23. D’Souza, M. J.; Koyoshi, F.; Everett, L. M. Structure Activity Relationship (SAR) Patterns Observed Within a Series of Unrelated Common Consumer Drugs, 2009 International Conference on Bioinformatics, Computational Biology, Genomics, and Chemoinformatics (BCBGC-09), Orlando, FL, USA (2009).

About Authors:

Malcolm J. D’Souza, Fumie Koyoshi

Malcolm J. D’Souza

Dr. Malcolm J. D’Souza is Professor of Chemistry at Wesley College in Dover, Delaware. He has published over 50 peer-reviewed journal articles and has established a nationally recognized Wesley College Undergraduate Directed Research Program in Chemistry. Twenty-six undergraduates are co-authors on his publications, and six Wesley College undergraduate research posters from his laboratory have earned national recognition. Recently, the Delaware American Chemical Society (DE-ACS) nominated Dr. D’Souza for the 2008 E. Emmett Reid Award for Excellence in Teaching at a Small College. Dr. D’Souza has received three teaching awards and a faculty research award, and he currently serves as Guest Editor for a peer-reviewed journal. His current research efforts in bio-organic chemistry are supported by grant number 2 P20 RR016472-08 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). This IDeA Network of Biomedical Research Excellence (INBRE) grant to the state of Delaware was obtained under the leadership of the Delaware Biotechnology Institute (DBI), University of Delaware.

Fumie Koyoshi

Fumie Koyoshi (B.S. Biology, 2008) graduated at the top of her class at Wesley College. She earned several scholarships and honors, including the Wesley College Faculty Award given to the most outstanding student in the 2008 graduating class. Outcomes from her Wesley College undergraduate research work were published as four refereed articles, and two of her poster presentations earned Certificates of Recognition when presented at national American Chemical Society conferences. On graduation, Ms. Koyoshi joined the University of Pennsylvania Hospital, School of Medical Technology, in Philadelphia, Pennsylvania.
