In silico Drug Design: some concepts & tools


Drug discovery, chemical biology & precision medicine

There are many in silico tools to carry out, for instance, ADMET predictions, binding pocket analysis, the definition of protein-protein interaction networks, rational drug repositioning, protein modeling, the grafting of a sugar onto a protein structure, peptide docking, systems pharmacology, adverse drug reaction predictions, compound collection annotation, virtual screening, the analysis of point mutations observed in patients (note that "variation" is nowadays used more often than "mutation"), protein docking...

Overview

If you are new to the field, here are some general ideas. The first step is to define the in silico tools you need. This depends directly on the type of questions you want to address, the type and stage of the project, and the data you have to start with. In some cases, in silico approaches cannot really help initially: some experiments have to be performed first, and only then do in silico prediction engines become very valuable. In other situations, in vitro and in silico experiments need to start at the same time, while in yet other situations the in silico study can come first (the notions of parallel integration of the approaches, iterative integration such as experimental, then in silico, then experimental again, and focused integration, which starts in silico to filter out unwanted molecules...).

Also, there are some key differences between chemical biology projects and drug discovery projects, and these have to be considered: chemical probes can be critical for drug discovery projects; they are challenging to produce, but at the same time easier with regard to many ADMET properties, etc. It also depends on how far you want to go, meaning how much funding you have, which is obviously a key limiting factor.

About 90% of projects entering clinical trials fail

If your project is about target-based screening, you may need the 3D structure of your target. You can check the PDB for experimental 3D structures (X-ray or NMR in general). If none is available, you can try to predict the 3D structure, for instance for a protein via comparative model building tools. You can find many valuable standalone tools and online servers in the sections Modeling Molecules, Simulations, etc.; you will thus have to look in the Bioinformatics section.


Next, assuming you have your selected protein in 3D, you may need a peptide, a small non-peptidic chemical compound or an antibody that binds to your target. If you know the binding pocket, you can use structure-based virtual screening approaches (Chemoinformatics section) or peptide-protein docking (Bioinformatics section). In general, you'll need to prepare a compound collection if you search for a molecule modulating your system (small molecule or biologics such as peptides). Once you have prepared this collection or found it online, then, using in silico screening or related approaches, you should be able to define a small list of molecules (maybe 20 peptides or maybe 200-500 small chemical compounds) that will need to be tested experimentally (it is important to think about the assays, how the mechanisms are going to be investigated, and whether you can obtain direct binding data). Maybe you do not know the binding pocket; in that case you can use tools that predict binding cavities and so-called druggable pockets (Bioinformatics section). Your therapeutic target might be flexible; then you need simulation tools such as molecular dynamics and many others. For all these tasks, in silico tools have been developed over the last 10 to 20 years or so. If your therapeutic molecules are RNA, DNA or mAbs, you can also find tools that help; for instance, you may want to stabilize a protein, or graft a small compound onto a mAb or a peptide...
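As a deliberately simplified illustration of compound collection preparation, the Python/RDKit sketch below filters a hypothetical SMILES file with crude Lipinski-like rules; the file name and cut-offs are placeholders, and real preparation involves many more steps (standardization, salt stripping, duplicates, PAINS and ADME-Tox filters, 3D generation...).

# Minimal sketch (not a full preparation pipeline): keep compounds that pass
# crude Lipinski-like cut-offs. Assumes RDKit is installed; "collection.smi"
# is a hypothetical file with one SMILES string per line.
from rdkit import Chem
from rdkit.Chem import Descriptors

def passes_simple_filters(mol):
    # Very crude drug-likeness rules; real preparation also handles salts,
    # tautomers, duplicates, PAINS, ADME-Tox filters, 3D generation...
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Descriptors.NumHDonors(mol) <= 5
            and Descriptors.NumHAcceptors(mol) <= 10)

kept = []
with open("collection.smi") as handle:          # hypothetical input file
    for line in handle:
        if not line.strip():
            continue
        smiles = line.split()[0]
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None and passes_simple_filters(mol):
            kept.append(smiles)

print(f"{len(kept)} compounds kept after filtering")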

If you know a small molecule that binds to your target, you can search databases for other molecules that are similar to your query, test these new molecules in vitro and build some SAR (see for instance the ligand-based virtual screening tools in the Chemoinformatics section); a minimal similarity-search sketch is shown below.
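To make the idea concrete, here is a minimal ligand-based similarity search with RDKit Morgan fingerprints and Tanimoto scores; the query and library SMILES are illustrative examples only.

# Minimal ligand-based similarity search: rank a (tiny) library by Tanimoto
# similarity of Morgan (ECFP4-like) fingerprints to a known active.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, as an example query
library = {
    "cmpd_1": "CC(=O)Nc1ccc(O)cc1",    # paracetamol
    "cmpd_2": "OC(=O)c1ccccc1O",       # salicylic acid
}

def fingerprint(mol):
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

fp_query = fingerprint(query)
hits = []
for name, smiles in library.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        continue
    hits.append((DataStructs.TanimotoSimilarity(fp_query, fingerprint(mol)), name))

for sim, name in sorted(hits, reverse=True):
    print(f"{name}\tTanimoto = {sim:.2f}")
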
If you search for a hit compound that could be used as a starting point for drug discovery, you will need to predict some ADME-Tox properties (this can also be very valuable for chemical biology projects). You should check whether the molecules carry structural alerts or PAINS substructures, or are otherwise promiscuous compounds. Depending on such analysis, you may have to perform additional experiments to double-check your initial results. Tools of interest at this stage belong to the QSAR, virtual screening and, obviously, ADME-Tox sections (Chemoinformatics section).
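Below is a small, hedged example of how such structural-alert checks can be run with the RDKit FilterCatalog (PAINS definitions); the molecules are illustrative, and a flagged compound is not necessarily useless, it simply deserves closer experimental scrutiny.

# Flag PAINS / structural alerts with the RDKit FilterCatalog.
# The two molecules are illustrative (a quinone-like motif vs. ethanol).
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)   # PAINS A, B and C
catalog = FilterCatalog(params)

for smiles in ["O=C1C=CC(=O)C=C1", "CCO"]:
    mol = Chem.MolFromSmiles(smiles)
    match = catalog.GetFirstMatch(mol)
    if match is not None:
        print(f"{smiles}: flagged ({match.GetDescription()})")
    else:
        print(f"{smiles}: no PAINS alert")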

In most cases, you will have to look at databases to see whether your target has already been screened or whether your favorite compounds are already known to hit many targets. Open databases include, for instance, PubChem and ChEMBL; these are listed in the Chemoinformatics section.

You may want to know whether your compound could bind to other, secondary targets, often called off-targets (when the effect on health is unfavorable, these secondary targets are called anti-targets). To do this, you can use tools that belong to the off-targets, repurposing and repositioning section. There, different approaches are available, from ligand-similarity searches to reverse docking, etc. (Chemoinformatics section). If you use phenotypic screening, several of these methods can also help to identify the target(s) involved.

As mentioned, developing a new drug is likely to take many years and success is far from certain. It has been estimated that it takes 13.5 years to bring a new molecular entity to market, and the success rate for taking oncology drugs from phase I to approval by the US Food and Drug Administration (FDA) was only around 7%. These numbers vary a bit between reports, but they give an overall idea. Repositioning could thus be valuable in some cases as it builds on previous research, allowing compounds to progress more quickly and saving a substantial amount of money when it works. In silico strategies can help here. One concept that helps to understand repositioning is the notion of polypharmacology, that is, one small-molecule drug is likely to have an average of six to seven targets. Thus one may reposition a drug onto another target.



If you are interested in protein-protein interactions (Bioinformatics section) and the modulation of these interactions with a small compound, you may need to use protein docking methods. You may want to see all the known interactions with your target and thus will need some "network" tools. If you have a 3D structure of your protein-protein complex, you may want to analyze the interface and predict hotspot residues. We have some recent reviews about in silico approaches and compound design, for instance about protein-protein interaction inhibitors (see Villoutreix et al., Molecular Informatics, June 2014).


If your protein has point mutations (experimental or naturally occurring; idea of precision medicine), you may want to predict the impact of the amino acid substitutions on folding, function, etc. In that case, you need a different set of tools; see the Simulations and Mutations sections (Bioinformatics section).

You may need to search patent databases, find databases on diseases, find tools to help represent and visualize the data, or look for commercial tools... These will be in the Related Tools section.


Bioinformatics : Ligand binding pockets

To design a compound, if you know the target and have a 3D structure for it, it can be critical to identify likely binding pockets at the surface of this target (see for example our review Perot et al., DDT 2010), to score these pockets, and to compare pockets from your target with databases of pockets. Many in silico tools have been developed (over 100) and below I just provide some examples (NB: always check the quality of the protein structure first, e.g., with the VHELIBS application or online with MotiveValidator).

Many computational methods are available to find pockets on a protein surface. Two main types of tools have been developed: those based on evolutionary information and those that use structure-based algorithms. This last category can be subdivided into geometry- and energy-based algorithms (probe mapping methods and docking of fragments and compounds). These approaches can find pockets, and some also try to compute a druggability score (here meaning the ability to bind a low-molecular-weight drug-like compound). These tools generally work on static protein structures, while others try to take flexibility into account. Some tools are more appropriate for the detection of binding pockets at protein-protein interfaces, where the pockets are usually different from the pockets seen in enzymes, GPCRs and ion channels. Some tools are used to compare binding pockets, for instance for compound repositioning. A minimal geometric illustration of the idea is sketched below.
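The toy sketch below illustrates the geometric flavor only: it places a coarse grid around a protein and scores surface grid points by how many heavy atoms surround them (a crude "buriedness" measure, in the spirit of grid-based tools such as LIGSITE or POCKET). It is not a substitute for the dedicated programs listed in this section; the file name and cut-offs are placeholders, and it assumes Biopython and numpy are installed.

# Very simplified, geometry-only sketch of pocket detection.
import numpy as np
from Bio.PDB import PDBParser

structure = PDBParser(QUIET=True).get_structure("prot", "protein.pdb")  # hypothetical file
coords = np.array([atom.coord for atom in structure.get_atoms()
                   if atom.element != "H"])          # heavy atoms only

# Build a coarse 1.5 Angstrom grid around the protein.
step = 1.5
lo, hi = coords.min(axis=0) - 5.0, coords.max(axis=0) + 5.0
axes = [np.arange(lo[i], hi[i], step) for i in range(3)]
grid = np.array(np.meshgrid(*axes, indexing="ij")).reshape(3, -1).T

scores = []
for point in grid:
    d = np.linalg.norm(coords - point, axis=1)
    if d.min() < 2.5 or d.min() > 4.5:   # keep only points near the surface
        continue
    scores.append((int((d < 8.0).sum()), point))  # buriedness = atoms within 8 A

# The most "buried" surface points are candidate pocket centres.
for burial, point in sorted(scores, key=lambda x: x[0], reverse=True)[:10]:
    print(f"candidate pocket point {np.round(point, 1)}  buriedness={burial}")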


Bioinformatics: Protein-protein and protein-membrane interactions

Your target can be DNA, RNA or a protein... it can also be a mechanism; for instance, you may want to modulate protein-protein interactions or a transient protein-membrane interaction. For these tasks, again, many approaches have been developed starting around 1990: you may need docking tools, tools to find hotspots, etc.

In many cases, one needs to predict interface residues. Several approaches can be used. Those investigating specific features of protein sequences and/or structures look at amino acid composition and physico-chemical properties, and can use machine learning strategies. For example, some tools use evolutionary information to try to predict interface residues, which tend to be more conserved than residues on the rest of the protein surface. Amino acid features and evolutionary information can then be combined to analyze amino acid sequences and perform predictions. Of course, predictions based only on sequences are limited, and it is important, when possible, to add 3D information. Thus it is possible to map sequence evolution onto the molecular surface. Analysis of the surface in terms of hydrophobicity or desolvation energy can also be used. These approaches are said to belong to the mapping approaches, for instance ODA (physics-based), while others are based on descriptors and machine learning (a toy sketch of this idea is shown below). Because these types of prediction are difficult, meta-predictors have been developed and tend to give better results than individual methods (e.g., meta-PPISP).
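As a toy sketch of the descriptor/machine-learning flavor, the code below trains a random forest on a table of hypothetical per-residue features (conservation, accessibility, hydrophobicity, curvature) filled with random placeholder values; with real labeled interfaces and carefully computed features, the same scheme becomes a genuine predictor.

# Toy descriptor + machine-learning interface predictor (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_residues = 500
X = rng.random((n_residues, 4))           # hypothetical per-residue features
y = rng.integers(0, 2, n_residues)        # 1 = interface residue, 0 = not

clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated ROC AUC: {auc.mean():.2f} (random labels, so ~0.5 here)")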

More recently, template-based methods have been presented. Because interfaces tend to be conserved in homologous complexes, such data can help to make predictions (e.g., HomPPI, T-PIP). Predictions can also be done using structural neighbors, as proteins sharing a similar fold with the query protein, even if not evolutionarily related, can offer predictive information similar to that of homologues. Some other approaches are often called partner-specific methods (e.g., PAIRpred); these are sometimes subdivided into intrinsic-based methods (as mentioned above, 3D-classifier predictors use 3D structural features, possibly combined with sequence features, but here the feature set computed for training and testing is complemented by partner-specific features), docking-based methods and coevolution-based predictors (the co-evolution principle suggests that mutations on one protein in a complex are often compensated for by correlated mutations within the same chain or on a binding partner; such correlated mutations are assumed to maintain the stability of the protein or protein-protein complex).
Specific methods have been developed for antibody-antigen interactions; in this case one can find paratope prediction methods (e.g., proABC, Antibody i-Patch) and epitope prediction methods (linear and conformational predictors, e.g., DiscoTope, ElliPro, PEPITO, SEPPA, EPITOPIA...), but the above-mentioned tools can also be applied. See the Protein-Protein and Antibody-Peptide sections.

If the goal is to modulate protein-protein interactions with small compounds, one usually needs to combine many approaches: prediction of hot spots, prediction of 3D complexes (with or without restraints, with different scoring functions...), prediction of druggable pockets, analysis of flexibility with simulation tools, analysis of sequence variations in patients or in the related protein family... One also needs to design compound collections and peptides, use biophysical approaches, combine with mutagenesis, search numerous databases, look at interaction networks and disease databases, run text mining and patent searches, etc.

Protein-protein
For additional information, you can for example check our recent review introducing several aspects of PPI and in silico approaches:

Drug-Like Protein-Protein Interaction Modulators: Challenges and Opportunities for Drug Discovery and Chemical Biology (review). Villoutreix BO, Kuenemann MA, Poyet J-L, Bruzzoni-Giovanelli H, Labbe C, Lagorce D, Sperandio O, Miteva MA. Molecular Informatics 2014; 33(6-7): 414-437. (open access)

 

Information about protein-protein interaction - databases:
Validated protein interactions, curated databases: BIND, BioGRID, DIP and MINT
In the absence of fully validated experimental data or predicted protein–protein interaction: PRISM, OPHID and 3D-partner
For a specific organism, for example human, HPRD, HPID and MIPS
If binding site information is needed, PSIbase and DOMINE

Protein-protein interface databases: PIBASE, SCOPPI, DOCKGROUND, 3DID, PiSITE, PIFACE...
Physical and chemical properties of the interface: PIC, ProFace, Protherm...
Hot spot databases and prediction servers: ASEdb (experimental), BID (experimental), FoldX, Robetta, ISIS, HotSprint...

PPI mutations: SNIP-IN and BeAtMuSiC (both need the 3D structure of the complex)...

Protein-membrane
We were among the first to propose the modulation of transient interactions between a protein and the cell membrane with a small molecule. The work was for instance reported with application to blood coagulation cofactors by Segers et al., PNAS 2007. This molecular mechanism is still essentially unexplored in 2015 for therapeutic intervention.

A tool to predict such potential membrane-binding regions is Membrane Optimal Docking Area (MODA, see Kufareva I et al., 2014). Such information can help to find small molecules acting on this mechanism.


Bioinformatics: molecular modeling, mutations...

There are obviously many tools in these sections because many research groups have been working in this field since at least the 1990s. I just give some names below, essentially for proteins, but there are also packages to try to predict the structure of RNA, DNA or sugar molecules... When you have a 3D structure of acceptable quality, you can try to find binding pockets, run protein-protein docking engines, look at mutations (notion of biostructural pathology), run simulations such as electrostatics or molecular dynamics, dock a peptide, perform virtual screening...

Databases of models: MODBASE, Protein Model Portal, SWISS-MODEL Repository...

Tools to complete missing loops in a PDB file: one module of PDB_hydro, ...
Template selection: Phyre2, HHpred, PSIPRED (pGenThreader...)...
Alignment tools for sequences: CLUSTALW, MUSCLE, T-Coffee...
Homology modeling online: RaptorX, 3DJIGSAW, CPHModel, ESyPred3D, GeneSilico, Geno3D, HHpred, LOMETS (Meta-server combining 9 different programs), MODELLER (ModWeb: A Server for Protein Structure Modeling), Phyre and Phyre2, Protinfo, ROBETTA, BHAGEERATH-H, SWISS-MODEL, TIP-STRUCTFAST, WHAT IF...FALCON@home

Threading online: RaptorX, 3D-PSSM (now Phyre2), HHpred, I-TASSER, LOOPP, mGenTHREADER/GenTHREADER, MUSTER, Phyre and Phyre2, SPARKSx/SP series...

Ab initio structure prediction: EVfold, QUARK, I-TASSER, ROBETTA, Bhageerath, PEP-FOLD...

Secondary structure prediction: RaptorX-SS8, NetSurfP, Jpred, Meta-PP, PREDATOR, PredictProtein, PSIPRED, SymPred, YASSPP, PSSpred...

Transmembrane helix prediction: TMHMM, Phobius, PHDhtm, MEMSAT, HMMTOP...

Model Evaluation: DFIRE, COLORADO-3D (ANOLEA, PROSA, PROVE, VERIFY3D), FRST, HARMONY, ModFOLD, MolProbity, PROCHECK, ProQ, QMEAN...SAVES, model 3D structure optimization (ModRefiner)...

Macromolecule simulation with NMA: iMODS, DFprot, NOMAD-ref, MolMovDB, HingeProt, PATH-ENM, iENM, NMSim, KOSMOS, FlexServ, ElNemo, AD-ENM...

Mutations (or variations) - Personalized medicine - Precision medicine
Based on genome sequencing of individuals, it is estimated that each person's proteome contains roughly 10,000-11,000 mutations compared to a reference proteome. A subset of these mutations has severe functional consequences; however, for the great majority, it is difficult to predict a priori what their effect will be on the resultant protein's structure and/or function. There are different types of mutations. Single nucleotide polymorphisms (SNPs) fall either within non-coding or coding regions of the DNA molecules. Synonymous mutations do not change the encoded protein sequence, while non-synonymous SNPs (the most common disease-promoting mutations) produce polypeptide sequences that either carry an amino acid substitution (missense mutations) or are truncated (nonsense mutations, a less common event than an amino acid change). Some mutations exert their effects via changes to the mRNA that can lead to altered mRNA splicing, folding or stability. Some mutations impact the responsiveness of patients to certain drug treatments (concept of pharmacogenomics).

With the decreasing cost of sequencing technologies, the study of the human genome on a large scale is now possible. Rapid advances in this field of research foreshadow the use of whole-genome or whole-exome sequencing towards the goal of personalized medicine, also called precision medicine. This can be defined as therapy decisions tailored to individual patients (or small groups), so as to improve therapeutic efficacy and minimize side effects.
It is important to note that synonymous mutations can also cause human diseases; thus, they cannot be ignored in genome-wide association studies. Also, single nucleotide polymorphisms represent the most common source of genetic variation in the human population; they often determine which patients are most likely to respond to, or suffer adverse consequences from, specific medical treatments. The key to how a synonymous mutation can affect proteins most likely lies in RNA molecules. In addition to genomic studies, epigenetic investigations are also of major importance.

Here, in silico approaches can help to gain understanding of sequence-structure-function relationships and the disease state. For instance, just to introduce this huge field of research, in silico approaches can help to discriminate between deleterious nsSNPs leading to a protein disorder and neutral polymorphisms. There are numerous tools; one way to cluster them is to consider methods that make use of machine learning, methods that attempt to compute a delta-delta G using the 3D structure of the protein, and rule-based methods. Then there are meta-tools that combine approaches. As always, some methods could fit into several groups. A possible way to list the methods that attempt to predict deleterious variants is suggested here:
Approaches using machine learning of some kind:
CUPSAT, I-Mutant2.0, LS-SNP (Large-scale annotation of coding nsSNPs), Parepro, PhD-SNP, PON-P2, SNPs&Go, SNPs3D, MutPred, nsSNPAnalyzer, PMUT, SNAP, Polymorphism Phenotyping (PolyPhen-2), MutationTaster, AUTO-MUTE....

Approaches that can be considered as rule-based:
Sorting Intolerant From Tolerant (SIFT), Align-GVGD, D-Mutant, DS-SCORE, FASTSNP, FoldX, GERP++, Gumby, LogR E-value, MAPP, Mutation-Assessor, PANTHER, PhastCONS, PolyPhen-1, PopMuSic, SCONE, Skippy, SNPeffect...

Meta-tools:
F-SNP, pfSNP, SNP functional portal, SNPit, Vista, CONDEL, META-SNP, PolyDOMS, Pro-Maya, .. (PON-P is no longer running, it has been replaced by PON-P2, which however is not a meta-predictor)...

Some databases (they do not all list the same type of data; there are overlaps, but they are not easy to compare):
dbSNP, HGMD (human gene mutation database), OMIM, ClinVar, UniProt/Swiss-Prot, 1000 genomes.... some are specific to cancer, COSMIC, TCGA.... Some others: NewHumanVar, ProNit, VariBench, MutDB, SNPedia, StSNP, ProTherm, PicSNP, HOPE....

In addition to these tools, if the protein is known in 3D, you can also run your own simulations and perform structural analysis. Just check the Simulation section to see if you find a tool that can help you.
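To illustrate the rule-based flavor mentioned above in its simplest possible form, the toy function below compares the Kyte-Doolittle hydrophobicity of the wild-type and mutant residues and flags proline/glycine substitutions; real predictors (SIFT, PolyPhen-2, FoldX...) rely on far richer evolutionary, structural and energetic information.

# Toy, rule-based view of a missense variant (illustration only).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}   # Kyte-Doolittle hydrophobicity scale

def crude_flag(wild_type, mutant):
    # Flag substitutions with a large hydrophobicity change, or involving
    # proline/glycine (often structurally disruptive).
    if "P" in (wild_type, mutant) or "G" in (wild_type, mutant):
        return "possibly disruptive (Pro/Gly involved)"
    delta = abs(KD[wild_type] - KD[mutant])
    if delta > 4.0:
        return f"possibly disruptive (delta hydrophobicity = {delta:.1f})"
    return "no simple alert"

# Example: the sickle-cell substitution in beta-globin, Glu6Val (E -> V)
print("E6V:", crude_flag("E", "V"))
print("D10E:", crude_flag("D", "E"))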

See for instance the table below published in Current Protein and Peptide Science, 2002, 3, 341-364: Title: Structural Bioinformatics: Methods, Concepts and Applications to Blood Coagulation Proteins by Villoutreix BO. Section Biostructural Pathology and Conformational Diseases: Some Rules for Assigning the Effects of Missense Mutations on Molecular Functions, Folding and Stability

It should also be mentioned that, in general, "mutation" data have not been fully exploited to improve the effectiveness and efficiency of drug discovery. Genetic, epigenetic and environmental factors define pathophysiological states. For complex diseases, the one-gene-one-drug paradigm may not be the best approach.

Additional notes: Mutation versus variation
There are several recommendations to systematically use the term variation for the products of the mutation process; see the Human Genome Variation Society (HGVS) nomenclature.
The VariOtator tool (http://variationontology.org/VariOtator.php) provides VariO variation-type annotations automatically from the sequence and variation details

See Resources for a unified genetic nomenclature (by Dr Vihinen, Trends in Genetics, 2015)
Gene Ontology http://geneontology.org/
HGNC http://www.genenames.org/
HGVS http://www.hgvs.org/
HVP http://www.humanvariomeproject.org/
Global Alliance for Genomics and Health (GA4GH) http://ga4gh.org/
LRG (Locus Reference Genomic sequences) http://www.lrg-sequence.org/
VariO http://variationontology.org/
Sequence Ontology (SO) http://www.sequenceontology.org/
VarioML data exchange format http://www.varioml.org/
The Phenotype and Genotype Object Model (PAGE-OM) http://www.omg.org/spec/PAGE-OM/

Peptides (binding sites and/or folding and/or docking)
Peptides can be used as chemical probes; they are usually easy to synthesize and are thus very often used in biology labs. In fact, for many years they were used preferentially because small chemical compounds were difficult to obtain in academic labs. Times are changing, but in academia it is still almost a "cultural thing": I have a target, I need to act on it, I try a peptide and I patent it if it does something, without really looking further and asking questions such as: for this protein or this disease, do I need a peptide, a therapeutic protein, a mAb, a small compound...?

There are many debates about peptides as drugs, pros and cons, and many claims such as: peptides have greater chances of success in clinical trials, PK/PD is not an issue, cost is not an issue... In my opinion, it is important to read several reviews listing strengths and weaknesses, and decide for yourself whether a peptide is appropriate for your target and for the type of disease.

 


Chemoinformatics: Virtual screening, scoring, hit2lead, repositioning

Virtual screening can be done using different types of information. It can start from known small bioactive ligands, in which case it is usually called ligand-based virtual screening, or use information about the 3D structure of the target, often a protein, in which case it is called structure-based virtual screening. Note that for the ligand-based approach, structural information can also be used. In some situations, it is possible to combine these two main screening strategies. One should also not forget fragment screening and de novo drug design (de novo approaches usually exploit information from the 3D structure of the receptor to build a compound inside a binding pocket, while screening takes molecules directly from a collection and checks whether they fit in the pocket; both approaches can be combined, for instance during the optimization phase). See below the section Comments about virtual screening. For structure-based screening, scoring is in general performed with empirical, force-field-based or knowledge-based scoring functions. As scoring is a weak point in screening, consensus approaches can be used, or scoring functions tuned to a target or a target family. Because flexibility is also a difficult point, several strategies have been proposed: soft scoring, ensemble-based approaches, induced-fit approaches (difficult for screening) and molecular simulations based on force fields, in general describing intra- and intermolecular interactions with or without water molecules. Ligand-based approaches often involve QSAR and pharmacophore modeling using known bioactive ligands (but pharmacophore models can also be developed using information from the binding pocket). In general, to build QSAR models to design drugs, it is important to have a bioactive compound set that spans a wide range of affinity (e.g., 4 orders of magnitude) and contains a minimum of about 20 compounds, and molecular descriptors should be selected with great care (a minimal QSAR sketch is given below).
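As a minimal QSAR sketch (placeholder SMILES/pIC50 pairs, a handful of RDKit descriptors and a random forest regressor), the code below shows the mechanics only; a real model needs the curated data set, descriptor selection and validation (cross-validation, external test set, y-scrambling) discussed above.

# Minimal QSAR sketch with placeholder data; not a real model.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

data = [("CCO", 4.1), ("CCCCO", 4.5), ("c1ccccc1O", 5.0),
        ("CC(=O)Oc1ccccc1C(=O)O", 5.8), ("CCN(CC)CC", 4.3)]   # (SMILES, pIC50) placeholders

def describe(mol):
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol), Descriptors.NumRotatableBonds(mol)]

X = np.array([describe(Chem.MolFromSmiles(s)) for s, _ in data])
y = np.array([activity for _, activity in data])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

new_compound = Chem.MolFromSmiles("CCCCCO")   # hypothetical new compound
print("predicted pIC50:", round(model.predict([describe(new_compound)])[0], 2))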

 

STRUCTURE-BASED OR DOCKING:
Docking fragments: SEED, GANDI, CrystalDock, LigBuilder, AutoGrow, Surflex...

Docking molecules: Surflex, Autodock, DOCK, MS-DOCK, PLANTS... GalaxyDock, LigandRNA, DOCK6 (with the Hungarian Algorithm to Account for Ligand Symmetry and Similarity in Structure-Based Design), GlamDock, GEMDOCK, FITTED, ParaDockS, iGEMDOCK (graphics), HomDock, Autodock Vina, VinaMPI, FlipDock, Rosetta Ligand, MS-Dock (runs with DOCK), BetaDock, HADDOCK (can be used for ligand), eSimDock, SABRE, SimG, RDock, Autogrow, LigBuilder, FINDSITELHM, BetaDock, Q-dock, MotifScore, BSP-SLIM, PyRx (inteface Autodock), PubDock, CODCK, Sanjeevini, DAPLDS, FTFlex, ParDOCK, BAPPL, Bapplz (docking with Zinc atom), PSI-DOCK, Simdock, Arguslab, VSDMIP, CRDOCK, Molegro Docker, MOE, ICM, Accelrys, Schrodinger, LigandScout, eHits, Sybyl, RosettaBackrub, WaterDock, GOMoDo...FlexAID

Docking online (some can dock only 1 compound): iScreen (docks the Traditional Chinese Medicine (TCM@Taiwan) with PLANTS)), iStar (with iDock), DOCKBlaster (with DOCK), FORECASTER (docks with FITTED), e-LEA3D (with PLANTS), SwissDock (with EADock), DINC (for larger ligands), DockingServer, 1-Click Docking, Docking At UTMB (with Autodock Vina), ParDOCK, FlexPepDock, PatchDock, BSP-SLIM, BioDrugScreen, GPCRautomodel, SimDock, kinDOCK, idTarget, Pose & Rank (Sali's lab), PLATINUM, CovalentDock Cloud, SLITHER, Lidaeus, Docking@Home, DockThor... DrugDiscovery@TACC, MTiAutoDock@mobyle2, MTiOpenScreen@mobyle2 ...
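Many of the tools above are driven from scripts. As a hedged example, the snippet below launches a single AutoDock Vina docking run from Python; it assumes the vina executable is installed and on the PATH, and that the receptor and ligand have already been converted to PDBQT format (file names and box coordinates are placeholders).

# Launch one AutoDock Vina run from Python (illustrative settings).
import subprocess

cmd = [
    "vina",
    "--receptor", "receptor.pdbqt",
    "--ligand", "ligand.pdbqt",
    "--center_x", "10.0", "--center_y", "12.5", "--center_z", "-3.0",
    "--size_x", "20", "--size_y", "20", "--size_z", "20",
    "--exhaustiveness", "8",
    "--out", "ligand_docked.pdbqt",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)    # Vina prints the predicted binding modes and scores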

Help for optimization (other docking tools can help of course): Drugster, VAMMPIRE, VAMMPIRE-LORD, DST, SimG, AuPosSOM, Autogrow, LigBuilder, COPICAT, Dockres, Chimera, Molegro Viewer, MOE, ICM, Accelrys, Schrodinger, LigandScout, eHits, ViewDock, AMMOS, HBAT, LIGPLOT, PoseView, iView, PyPLIF, Pose and Rank, CaPTURE, DiSCuS, SwissBioisostere, SARANEA, Pubchem similarity...

Scoring (see also tools for optimization and docking): Xscore, Score, Ligscore, MeDock, SCORE3, DSX (previously DrugScore), GFScore, PEARLS, FOLDX, CLIBE, HINT, HBAT, PLOP, MM-ISMSA, RF-Score, LPCCSU, CHpredict...

LIGAND-BASED:
Ligand-based screening online:
Some tools: PubChem 3D, iDrug, USR, ChemMapper, MCSS, ViCi, pepMMsMIMIC, PhAST, e-Drug3D, LigCSRre, PharmaGist, ZINCPharmer, UFSRAT

Computational approaches for drug repositioning are in general based on similarity between drugs, between proteins, or between side-effect phenotypes. Reverse or inverse docking can also be used. Searches in databases and literature mining methods are thus very important here as well.

 


Chemoinformatics: ADME-Tox predictions - Personalized medicine

ADME-Tox
There are many tools: some are more appropriate for toxicity predictions, others for the preparation of a compound collection or for the selection of molecules, etc.

Off-target predictions (in this case the secondary targets will in general be anti-targets) can be investigated in silico with:

  • Binding site similarity approaches: similar binding sites may bind similar ligands
  • Reverse docking
  • Ligand-based approaches (see the sketch after this list)
  • Ligand-receptor pharmacophoric descriptors
  • Chemogenomic approaches (e.g., deriving information from the simultaneous biological evaluation of multiple compounds on multiple targets) ...
  • Patient data can also be taken into account
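As a small illustration of the ligand-based route, the sketch below compares a query compound with a few reference ligands whose targets are annotated and reports the similarities; the SMILES, target annotations and similarity cut-off are illustrative placeholders only.

# Ligand-centric (off-)target guess by fingerprint similarity (toy data).
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

references = [
    ("CC(=O)Oc1ccccc1C(=O)O", "COX-1/COX-2 (aspirin)"),
    ("CN1CCC[C@H]1c1cccnc1", "nicotinic acetylcholine receptors (nicotine)"),
]

def fp(smiles):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

query = "OC(=O)c1ccccc1O"          # salicylic acid, as an example query
fp_query = fp(query)
for smiles, target in references:
    sim = DataStructs.TanimotoSimilarity(fp_query, fp(smiles))
    flag = "  <- candidate (off-)target" if sim > 0.4 else ""
    print(f"{target}: Tanimoto {sim:.2f}{flag}")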

Comments about virtual ligand screening and drug design

(BO Villoutreix, PhD, Research Director Inserm, Feb 10 2014)
The process of drug discovery and development is challenging, time consuming, expensive, and requires consideration of many aspects [some numbers: 7-15 years and about $1.2 billion to bring a new molecule to the market; about 5 out of 50,000 compounds tested in animals (usually resulting from the experimental screening of millions of molecules; the cost associated with the experimental screening of 1 million compounds is around $500,000) reach clinical trials, and only 1 in 5 compounds reaching clinical studies is approved; some people mention that in general 90% of the compounds entering clinical trials will fail].

A first step is to find hits. There are many ways to do this, as seen in the figure below:

 

 

But finding binders is not enough! One needs to optimize the compounds (hit-to-lead, lead optimization...) and here again in silico approaches can help, for instance using multiparameter optimization tools.



Drug discovery requires a multidisciplinary approach. Indeed, several concepts, disciplines, skills and techniques (e.g., NMR, X-ray, bioinformatics, chemoinformatics, medicinal chemistry, toxicology, medical sciences, biology, genetics…) have to work together to succeed, as no technology or science alone is likely to make it. Drugs can be small chemical compounds, proteins, peptides, vaccines... but here we discuss only small chemical compounds and how in silico approaches can help the process.

To find interesting candidates, one possible way is to find substrates of the target (e.g., an enzyme) and to make them drug-like. Another way is to start with someone else's hit and thus mine patent databases to find new ideas. These starting points can then be modified by combining, for instance, medicinal chemistry and chemoinformatics strategies. Since around the 1990s (and still today), a very common approach to find hits has been high-throughput screening (HTS). In this case, one usually assumes that a target is critical to a disease condition (there is of course experimental evidence linking a target to a disease, but often we only fully know the value of a target at the end of the process, when it can be too late), and the target is then screened experimentally, with robots, against thousands of compounds. There are difficulties: cost and time (it easily takes over a year to screen 200,000 compounds, even with robots, not to mention the time needed to analyze the data), and the hit rate can be low, around 0.2% or even lower for challenging targets such as the modulation of protein-protein interactions. In addition, only a very small part of the chemical space can be explored, e.g., screening 1 million compounds while the number of small drug-like molecules is essentially infinite; only in silico approaches can really explore this almost infinite chemical space.

It is possible to use in silico screening instead of HTS (or together with HTS) and then generate a small list of molecules (e.g., 300 molecules) for experimental assays. Virtual screening can also be used to search for latent hits missed in an HTS project. Of course, combining experimental screening and virtual screening seems to be the best solution, but to save time and money, often only virtual screening is used, at least in many academic groups and small companies. In silico tools are not perfect: weaknesses in scoring and docking, failures due to the use of wrong training sets, difficulties with flexibility... are well known, and solutions to improve these situations are difficult to find. Yet, in many cases, the results obtained after in silico screening are interesting, and the hit rate with in silico approaches is generally better than with HTS (from 1% to 10% or more). Of course, many parameters play a role here, such as the preparation of the compound collection (filtering, ADME-Tox prediction, use of focused libraries, libraries to explore new challenging targets such as collections prepared to modulate protein-protein interactions; see for instance our website CDithem for additional information). And again, finding hits is one thing, finding a drug is something else.

After in vitro screening, phenotypic screening or in silico screening, hits have to be optimized, and here again many in silico approaches can be used to assist the process. Most of the approaches mentioned above can be used for chemical biology endeavors. Finding binders is of course very different from finding a drug; yet one important step is to find high-quality starting compounds. In the paragraphs below, I'll give some examples illustrating the use of in silico drug design and screening. Obviously, you can find many others in PubMed or on the Web.

Example 1: AIDS, compound docking, receptor flexibility, new binding pocket and the generation of ideas to design Isentress
A binding pocket for a new class of drugs to treat AIDS was discovered using docking while considering the flexibility of the receptor through molecular dynamics. McCammon and his colleagues used AutoDock in conjunction with the Relaxed Complex Method to discover novel modes of inhibition of HIV integrase (J Med Chem, 2004). Researchers at the Merck pharmaceutical company then used these data to design the orally available raltegravir (an HIV integrase inhibitor, brand name Isentress, approved by the Food and Drug Administration in 2007, with approval for pediatric use in 2011).

Example 2: Virtual screening versus experimental screening, in silico investigations proposed interesting compounds at a reduced cost
An interesting example, which can serve as a proof of principle of the benefit of using in silico approaches, involves a type I TGF-beta receptor kinase inhibitor. The same molecule (HTS-466284), a 27 nM inhibitor, was discovered independently using virtual screening by Biogen IDEC (J. Singh et al., Bioorg. Med. Chem. Lett. 13, 2003, p4355) and traditional enzyme and cell-based high-throughput screening by Eli Lilly (J.S. Sawyer et al., J. Med. Chem. 46, 2003, p3953). The in silico work involved pharmacophore screening of 200,000 compounds and used as a starting point the knowledge of hit compounds published several years before. The compound discovered experimentally at Lilly required in vitro screening of a large library of compounds in a TGF-beta-dependent cell-based assay, together with chemical synthesis.

Example 3: antianxiety, antidepression, 5HT1A agonist using several in silico strategies
An in silico modeling drug development program (homology modeling, virtual screening with DOCK, hit-to-lead optimization and in silico profiling) led to clinical trials of a novel, potent and selective antianxiety/antidepression 5-HT1A agonist in less than 2 years from project start, with less than 6 months of lead optimization and the synthesis of only 31 compounds (O.M. Becker et al., J. Med. Chem. 49, 2006, p3116).

Example 4: ADMETox, drug design, cost and ethics
Applying QSAR algorithms to toxicity data and the corresponding chemical structures led to the development of in silico tools that predict toxicity response (mutagenicity, carcinogenicity) and toxicity dosing (no observed effect level, NOEL; maximum recommended starting dose, MRSD). For example, a carcinogenicity QSAR model using 53 descriptors and data from a 2-year rodent study stored in an FDA database exhibited 76% sensitivity and 84% specificity (Contrera et al., QSAR modeling of carcinogenic risk using discriminant analysis and topological molecular descriptors, Curr. Drug Discov. Technol. 2, 2005, 55-67; see also Regul. Toxicol. Pharmacol. 40, 2004, 185-206). Rodent carcinogenicity studies are required for the marketing of most chronically administered drugs. These studies are the most costly and time-consuming nonclinical regulatory testing requirement in the development of a drug. The cost is approximately $2 million for a study on rats and mice, requiring 2 years of treatment and at least an additional 1-2 years for histopathological analysis and report writing. Thus, computational or predictive toxicology has potential regulatory and drug development applications that can ultimately benefit public health as well as reduce the use of animals in safety assessment.

Example 5: compound optimization, myocardial infarction and ligandbased screening
In a recent review, Clark (Expert Opinion on Drug Discovery (2008) 8: 841-851) commented on Aggrastat (tirofiban). This molecule, from Merck, a GP IIb/IIIa antagonist used in myocardial infarction (a platelet aggregation inhibitor and a protein-protein interaction inhibitor), results from a lead compound that was further optimized using ligand-based pharmacophore screening and medicinal chemistry. This compound modulates a protein-protein interaction (between the platelet integrin glycoprotein IIb/IIIa and fibrinogen). It is among the first drugs whose origins can be traced back to in silico design. (See Hartman et al. (1992). "Non-Peptide Fibrinogen Receptor Antagonists. Discovery and Design of Exosite Inhibitors". J Med Chem 35: p4640.)

Example 6: 1,2,4-Oxadiazoles identified by virtual screening and their non-covalent inhibition of the human 20S proteasome
Although several constitutive proteasome inhibitors have been reported in recent years, potent, organic, noncovalent and readily available inhibitors are still poorly documented. Two studies were performed by two different groups, one using experimental HTS, the other virtual screening. Ozcan et al. screened 50,000 molecules from ChemBridge, while Marechal et al. screened in silico 400,000 molecules from the same vendor and tested some molecules experimentally; both studies ended up with oxadiazole noncovalent proteasome inhibitors. The cellular effects of these compounds validate their utility as potential pharmacological agents for anti-cancer pre-clinical studies.

Example 7: Relenza, combining Xray studies with computer modeling and medicinal chemistry
Zanamivir is a neuraminidase inhibitor (a transition-state analogue inhibitor) used in the treatment and prophylaxis of influenza caused by influenza A and B viruses. Zanamivir was the first commercially developed neuraminidase inhibitor; the initial steps were indeed performed by a small company and a university in Melbourne. It is currently marketed by GlaxoSmithKline under the trade name Relenza as a powder for oral inhalation. The strategy relied on the availability of the X-ray structure of influenza neuraminidase, and computational chemistry techniques were also used: the active site was investigated in silico and suggestions were made to optimize the initial hits up to the design of zanamivir.

Example 8: A recent 2012 report from GlaxoSmithKline about the contributions of in silico drug design

  • Direct contributions to the discovery and design of 2 molecules that reached positive proof-of-concept clinical decisions
  • Direct contributions to 8 candidate and pre-candidate decisions
  • 37 contributions resulting in new hit/lead series
  • 18 examples of significant contributions to lead optimization
  • More than 70 examples of screening data analysis resulting in program progression
  • Contributions to drug-discovery programs recognized in 25 published manuscripts and 12 issued or published patents
    See Green DV et al., J Comput Aided Mol Des (2012) 26:51-56

Example 9: SYK
In collaboration with the group of Dr. P. Dariavach, we found non-enzymatic inhibitors of the tyrosine kinase SYK, potential anti-allergic drug-like compounds, by using virtual and in vitro screening.


The most likely binding area of these compounds, which inhibit a protein-protein interaction instead of acting on the kinase catalytic site, was found using binding pocket prediction and validated by site-directed mutagenesis. This region was not known prior to our computational analysis. Some compounds are active in animal models and some molecules are patented. (See Villoutreix et al., PLoS One 2011; Mazuc et al., J Allergy Clin Immunol. 2008.)

 

Conclusion

In silico methods help the drug discovery process. They can be (or are) combined with biophysical approaches, experimental high-throughput screening and biology/chemistry/toxicology/clinical studies; they assist decision making, contribute to reducing costs and to generating new ideas and concepts, bring solutions to problems, and allow one to rapidly test new hypotheses and to explore "areas" that could not be assessed experimentally, either because the experiments could not be performed, because they would cost too much, or because they would not be ethical. For instance, they allow the investigation of new compounds before they are even synthesized. In silico tools help to analyze, mine and rationalize millions of (heterogeneous) data points coming from multiple sources, assist in defining the functions of a molecule, and help to understand and predict polypharmacology, off-targets and ADME-Tox properties. They provide supplemental information to resolve conflicting experimental results, reduce the need to repeat studies or to perform some experiments, can accelerate clinical trials by supporting the entry of subjects before standard toxicology studies are completed, support risk-based testing and the reduction of animal testing, and supply additional supporting information for the selection of the first dose in humans in a standard phase I clinical trial.
Although promising, in silico methods are not without limitations; they thus have to be continuously developed and challenged, and research and funding in this field are needed, not only for applications but also for methodological developments. Apart from the screening tools, we now see an aggressive development of data mining and pipelining tools to keep pace with the massive amount of data generated by both experimental and computational experiments. Choosing the right strategy is critical, and increasing the interaction between experimentalists and computational groups should increase the quality and efficiency of the lead discovery stage and the development of new and safer drugs.

NB (also important to keep in mind when trying to improve a process):
A possible way of further improving productivity and drug discovery (this applies to many things) is to enhance decision-making processes along the R&D pipeline. In fact, successful drug development is the culmination of thousands of decisions over a 10-15 year period, involving the combined judgment and experience of many individuals.
Cognitive research has shown that human decision-making processes are inherently flawed or biased. Although this can be a problem, it is thought that these many biases serve as shortcuts to help the brain process the vast amount of information it receives and the thousands of decisions it makes daily. These observations suggest that drug discovery could be improved by acting on these many biases.

Additional comments about cost and the need to combine and integrate in silico and in vitro strategies

In silico strategies usually contribute to a better understanding of a molecular event; they tend to shed new light and contribute new ideas, and they allow the exploration of data that the human mind cannot rationalize. This is very valuable, as mentioned above, but the impact on cost and time is also important. For example, in a recent talk by P. Ertl, these points were discussed: it is possible to predict some ADME-Tox properties such as the interaction of a compound with the potassium ion channel protein hERG (blockade can cause a potentially fatal disorder called long QT syndrome). One experimentally measured value of hERG blockade with the patch-clamp technique occupies one laboratory assistant for a day and consumes many research chemicals. Thus, if the prediction method is accurate, one can easily conceive the impact on time and money. Further, computer models allow predictions for compounds that have not yet been synthesized (for a research chemist, it typically takes one week to synthesize a compound if everything goes right). But clearly computer tools are not only about cost and number crunching; they allow us to gain new knowledge and can guide experimental design.

Note about experimental work

In silico tools are sometimes considered to be inaccurate by some scientists. While this may be true in some cases (such problems are usually alleviated when the tools are used by experts in the field), it is important to note that there is nearly no experimental measurement without error. Even for a simple log P value, different scientists working in different laboratories will measure different values. So wisdom and humility are needed on both sides, at the bench and behind the computer.

Some codes for machine learning

Random forests
Random Forests http://www.stat.berkeley.edu/~breiman/RandomForests/
randomForest R package http://cran.r-project.org/web/packages/randomForest/index.html
FastRandomForest https://code.google.com/p/fast-random-forest/

KNN
kNN classifier http://www.fit.vutbr.cz/~bartik/Arcbc/kNN.htm
k Nearest Neighbor demo http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html
GPU-FS-kNN http://sourceforge.net/projects/gpufsknn/
GA/KNN http://www.niehs.nih.gov/research/resources/software/biostatistics/gaknn/
Dense K Nearest Neighbor http://www.autonlab.org/autonweb/10522.html

SVM
mySVM http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/index.html
e1071 R package http://cran.r-project.org/web/packages/e1071/index.html
BSVM http://www.csie.ntu.edu.tw/~cjlin/bsvm/
LS-SVMlab http://www.esat.kuleuven.be/sista/lssvmlab/
LIBSVM http://www.csie.ntu.edu.tw/~cjlin/libsvm/
SVM light http://svmlight.joachims.org/
M-SVM http://www.loria.fr/~guermeur/

Neural network
NuClass http://www.uta.edu/faculty/manry/new_software.html
sciengyrpf http://sourceforge.net/projects/sciengyrpf/
Sharky Neural Network http://sharktime.com/us_SharkyNeuralNetwork.html
BrainMaker http://www.calsci.com/
fann http://leenissen.dk/fann/

Decision tree
Simple Decision Tree https://sites.google.com/site/simpledecisiontree/
OC1 http://www.cbcb.umd.edu/~salzberg/announce-oc1.html
SMILES http://users.dsic.upv.es/~flip/smiles/
PC4.5 http://www.cs.nyu.edu/~binli/pc4.5/
YaDT http://www.di.unipi.it/~ruggieri/software.html
C4.5 and C5.0 http://www.rulequest.com/Personal/
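As a short, self-contained illustration of how two of the learner families listed above can be compared, the scikit-learn sketch below cross-validates a random forest and an SVM on a synthetic data set; in a QSAR or ADME-Tox setting, the feature matrix would instead contain molecular descriptors or fingerprints.

# Compare two classifiers on synthetic data (illustration only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)

for name, model in [("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                    ("SVM (RBF kernel)", SVC(kernel="rbf", gamma="scale"))]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {acc:.2f}")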

Further reading: some recent reviews in the field of in silico screening

  • Virtual screening ‐ what does it give us?. Koppen (Boehringer, Germany). Current Opinion Drug Discovery & Dev (2009) 12: 397‐407
  • From virtuality to reality. Rester (Bayer, Germany). Current Opinion Drug Discovery & Dev (2008) 11: 559‐568
  • Caldwell GW. In silico tools used for compound selection during target-based drug discovery and development. Expert Opin Drug Discov. 2015 May 8:1-23. (Janssen Research, USA)
  • What has virtual screening ever done for drug discovery? Clark (Argenta Discovery Ltd, UK). Expert Opinion on Drug Discovery (2008) 8: 841‐851
  • Docking and chemoinformatic screens for new ligands and targets. Kolb et al., Current Opin Biotech (2009) 20:1‐8
  • High‐throughput and in silico screenings in drug discovery. Phatak et al., Expert Opin Drug Discov (2009) 4: 947‐959
  • Structure‐based virtual ligand screening: recent success stories. Villoutreix et al., Comb Chem High Throughput Screen (2009) 12:1000‐16
  • Successful Applications of Computer Aided Drug Discovery: Moving Drugs from Concept to the Clinic. Talele et al, Curr Topics in Med Chemistry (2010) 10:127‐141
  • Computer‐aided drug discovery and development (CADDD): In silico‐chemico‐biological approach. Kapetanovic. ChemicoBiological Interactions 171 (2008) 165–176
  • Impact of high‐throughput screening in biomedical research. Macarron et al. Nat Rev Drug Discov. (2011)10:188‐95
  • Streamlining lead discovery by aligning in silico and high‐throughput screening. Davies et al. Curr Opin Chem Biol. (2006) 10:343‐51
  • Docking‐based virtual screening: recent developments. Tuccinardi. Comb Chem High Throughput Screen. (2009), 12:303‐14
  • Established and emerging trends in computational drug discovery in the structural genomics era. Taboureau et al., Chem Biol. (2012) 19:29‐41
  • Toward in silico structure‐based ADMET prediction in drug discovery. Moroy et al., Drug Discov Today. (2012) 17:44‐55
  • Druggable pockets and binding site centric chemical space: a paradigm shift in drug discovery. Perot et al., Drug Discov Today. (2010) 15:656‐67
  • Rationalizing the chemical space of protein‐protein interaction inhibitors. Sperandio et al. Drug Discov Today. (2010) 15:220‐9
  • Computational Drug Design Targeting Protein‐Protein Interactions. Bienstock, Curr Pharm Des. (2012) in press
  • 1,2,4-Oxadiazoles identified by virtual screening and their non-covalent inhibition of the human 20S proteasome. Maréchal X, Genin E, Qin L, Sperandio O, Montes M, Basse N, Richy N, Miteva MA, Reboud-Ravaux M, Vidal J, Villoutreix BO. Curr Med Chem. 2013;20(18):2351-62
  • Oxadiazole-isopropylamides as potent and noncovalent proteasome inhibitors. Ozcan S, Kazi A, Marsilio F, Fang B, Guida WC, Koomen J, Lawrence HR, Sebti SM. J Med Chem. 2013;56:3783-805

Email

bruno.villoutreix(at)gmail.com



© Bruno Villoutreix. A first version of this website was launched in 2006. Thanks to Natacha Oliveira.