VLS3D.com
Jan 23, 2007, updated March 2, 2008


Validation set 2007
Work by O. Sperandio, M. Miteva, M. Montes, D. Lagorce and B. Villoutreix

There are many ways to generate a validation set, and the task is complex. We follow the Tripos Mol2 atom types.
Some molecules can be assigned more than one type; however, we use the same protocol for all molecules, so if
an error is present, it will be present in all molecules (for instance, COOH groups are kept as COOH in one bank, whereas in the other bank,
the so-called "charged bank", all COOH groups are COO-). Users should check some structures to make sure they are appropriate
for their projects.
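One way to run such a check is to count the charged atom types in each record of a mol2 bank. The sketch below assumes the Tripos conventions that carboxylate oxygens are typed `O.co2` and protonated sp3 nitrogens `N.4`; the function name is hypothetical, and the parser only reads the molecule name and the ATOM section of each record.

```python
from collections import Counter

# Assumed Tripos types: carboxylate oxygen and positively charged sp3 nitrogen.
CHARGED_TYPES = ("O.co2", "N.4")


def charged_atom_counts(mol2_text: str) -> list[tuple[str, Counter]]:
    """Return (molecule name, Counter of charged atom types) for each mol2 record."""
    results: list[tuple[str, Counter]] = []
    section = None
    expect_name = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            section = stripped[len("@<TRIPOS>"):]
            expect_name = section == "MOLECULE"
            continue
        if expect_name:
            # The first line after @<TRIPOS>MOLECULE holds the molecule name.
            results.append((stripped, Counter()))
            expect_name = False
            continue
        if section == "ATOM" and stripped and results:
            fields = stripped.split()
            # ATOM lines: id, name, x, y, z, atom type, ...
            if len(fields) >= 6 and fields[5] in CHARGED_TYPES:
                results[-1][1][fields[5]] += 1
    return results
```

If the typing follows the Tripos conventions, `O.co2` counts should show up in the charged bank but not in the COOH bank, which is a quick way to confirm which file is which.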









The Diversity set is described just below, followed by the targets and their actives.

The Diversity set:


To test docking packages, we suggest using the ChemBridge Diversity set (January 2007).
This bank contains 50,080 molecules; it is in fact the April 2006 version, which has not changed since then.


We did not run ADME/tox filtering on this collection, because it is intended for testing docking packages and because the compounds
already appear to satisfy the Lipinski rules. One major reason compounds are rejected by ADME/tox filtering is that some
chemical groups are undesirable in oral drugs, but this should not affect docking or scoring.
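If one nevertheless wants to verify that compounds fall within the Lipinski rule of five, a minimal check on precomputed descriptors is sketched below. It assumes the four property values (molecular weight, calculated logP, H-bond donors, H-bond acceptors) are already available, for example from a vendor property table; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed descriptors for one compound (hypothetical field names)."""
    mol_weight: float  # Da
    clogp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(p: Properties) -> list[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if p.mol_weight > 500:
        violations.append("MW > 500")
    if p.clogp > 5:
        violations.append("logP > 5")
    if p.h_donors > 5:
        violations.append("HBD > 5")
    if p.h_acceptors > 10:
        violations.append("HBA > 10")
    return violations


def passes_rule_of_five(p: Properties, max_violations: int = 1) -> bool:
    """Lipinski's formulation flags compounds with two or more violations."""
    return len(lipinski_violations(p)) <= max_violations
```

Allowing one violation follows the original formulation, in which poor absorption becomes likely only when two or more criteria are exceeded.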

Here are the properties computed by ChemBridge for this Diversity bank:






First, to generate the 3D structures from the ChemBridge 2D SDF file, we had to move
the compound ID just above the ISIS header line so that Omega (OpenEye) keeps the ID number
of each molecule. This helps when looking up additional information about the molecules
on the ChemBridge web site.

David Lagorce wrote a Python script to do that.
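That script is not reproduced here. As a hedged illustration of the idea only, the sketch below copies the ID from an SD data field into the title line of each molfile record, i.e. the line just above the `-ISIS-` program line. The data-field tag `IDNUMBER` is an assumption; check the tag actually used in the ChemBridge file.

```python
import re


def shift_ids(sdf_text: str, tag: str = "IDNUMBER") -> str:
    """Copy the value of the <tag> data field into line 1 of each SD record."""
    out_records = []
    # Each SD record ends with a line containing only "$$$$".
    records = re.split(r"^\$\$\$\$[ \t]*\r?\n?", sdf_text, flags=re.M)
    for rec in records:
        if not rec.strip():
            continue  # trailing text after the last delimiter
        lines = rec.splitlines()
        ident = None
        for i, line in enumerate(lines):
            # Data headers look like "> <IDNUMBER>"; the value is on the next line.
            if line.startswith(">") and f"<{tag}>" in line and i + 1 < len(lines):
                ident = lines[i + 1].strip()
                break
        if ident:
            lines[0] = ident  # title line, just above the "-ISIS-" line
        out_records.append("\n".join(lines) + "\n$$$$\n")
    return "".join(out_records)
```

The number of records (50,080 for this bank, according to the text) can be checked with `len(re.findall(r"^\$\$\$\$", sdf_text, flags=re.M))`.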

Next, Omega version 2.1 (installed January 2007) was used with the following parameter file to generate single conformers.
For multiple conformers, the same parameter file was used, but with -maxconfs set to 50 instead of 1.


Omega2 Param file for 3D generation
#Interface settings
#-pvmconf (Not set, no default)

#File Options :  
    -commentEnergy  true (default)
    -in test_shiftname.sdf
    -includeInput  false (default)
    #-log (Not set, no default)
    -out test_shiftname3D.mol2
    #-param (Not set, no default)
    -prefix  omega2_1 (default)
    -rotorOffsetCompress  false (default)
    -sdEnergy  false (default)
    #-status (Not set, no default)
    -verbose  true (default)
    -warts  true (default)

#3D Construction Parameters :  
    -buildff  mmff94s_NoEstat (default)
    -canonOrder  true (default)
    -deleteFixHydrogens  true (default)
    #-dielectric (Not set, no default)
    #-exponent (Not set, no default)
    #-fixfile (Not set, no default)
    -fixrms  0.150000 (default)
    -fraglib  /usr/local/programs/openeye_2007.dir/openeye/data/omega2/fraglib.oeb.gz
    -fromCT  true (default)
    -maxmatch  10 (default)
    -umatch  true (default)

#Structure Enumeration :  
    -enumNitrogen  true (default)
    -enumRing  true (default)

#Torsion Driving Parameters :  
    #-erange (Not set, no default)
    -ewindow  25.000000 (default)
    #-maxConfRange (Not set, no default)
    -maxconfgen  30000 (default)
    # number of conformers kept per molecule (1 here; 50 for the multi-conformer run)
    -maxconfs  1 (default)
    -maxpoolsize  10000 (default)
    -maxrot  -1 (default)
    -maxtime  30.000000 (default)
    -rangeIncrement  5 (default)
    -rms  0.800000 (default)
    #-rmsrange (Not set, no default)
    -searchff  mmff94s_NoEstat (default)
    -tordrive  true (default)
    #-torlib (Not set, no default)

#PVM Parameters :  
    -pvmdebug  false (default)
    -pvmpass  10 (default)



To start the job, I used a command like this:
omega -param bruno_omega2_param

The input is thus a 2D SDF file, and the output a 3D mol2 file with a single conformer or multiple conformers per molecule.
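Since the multi-conformer run differs only by `-maxconfs` (50 instead of 1), its parameter file can be derived from the single-conformer one. The sketch below is a plain text substitution; also renaming `-out` is our own addition so that the two runs do not overwrite each other, and any new output name passed in is hypothetical.

```python
from typing import Optional


def make_multiconf_params(params: str, maxconfs: int = 50,
                          out_name: Optional[str] = None) -> str:
    """Return a copy of an Omega parameter file with -maxconfs (and optionally -out) changed."""
    new_lines = []
    for line in params.splitlines():
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.startswith("-maxconfs "):
            # The trailing space keeps "-maxconfgen" from matching.
            line = f"{indent}-maxconfs  {maxconfs}"
        elif out_name is not None and stripped.startswith("-out "):
            line = f"{indent}-out {out_name}"
        new_lines.append(line)
    return "\n".join(new_lines) + "\n"
```

The result can be written to a new file and passed to Omega with `-param`, exactly as in the command above.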

The resulting single-conformer 3D mol2 file had its chemistry fixed with Open Babel 2.0.2 (installed January 2007) using the -b -p options (with a pH model modified by us for amines), which yields COO- groups, etc.
The 3D mol2 file produced by Omega (single conformer) contains COOH and amine NH2 groups, so some of these were fixed: we first removed
the hydrogen atoms with Open Babel 2.0.2 (-d option) and then added them back with the -b -p options, with the pH model file activated.
The pH model file and the package still do not work perfectly, and small problems can be noted here and there.
However, the result should be reasonable overall for docking/scoring purposes.
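The two Open Babel steps described above (remove hydrogens with -d, then add them back with -b -p) can be scripted as below. This is a sketch under assumptions: the executable is called `babel` and accepts the Open Babel 2.0.x form `babel -imol2 in.mol2 -omol2 out.mol2 <options>`; the file names are placeholders.

```python
import subprocess


def babel_commands(src: str, tmp: str, dst: str, exe: str = "babel") -> list[list[str]]:
    """Build the two command lines: strip all H, then re-add H with the pH model."""
    return [
        [exe, "-imol2", src, "-omol2", tmp, "-d"],        # remove hydrogens
        [exe, "-imol2", tmp, "-omol2", dst, "-b", "-p"],  # re-add H (pH model)
    ]


def run(commands: list[list[str]]) -> None:
    """Run each command in order; stop with CalledProcessError if one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from execution makes it easy to print and inspect the exact options before running them on 50,000 molecules.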
The ChemBridge Diversity set in 3D, single conformer, is available here:


The direct Omega single-conformer output, processed with Open Babel option -d and then -b -p (thus the bank with COO-, NH3+, ...), in mol2 format:

The Babel pH model text file is available (here). We believe it is the one for version 2.0.2; the version is very important, because the revised version of Babel currently has a problem with this file, so please ask us about the version.
This file must be placed in the data directory of the Babel package.
There you should also rename the file phmodeldata.h to original_file.h,
and then compile Babel.








The ligands and the proteins to test docking/scoring are below:



Target 1:
THYMIDINE KINASE
(narrow pocket)
The tar.gz file (here) contains the PDB files of the 10 X-ray structures of the protein with its ligands (the real active molecules),
the ligands alone in 3D, in their X-ray conformation, in Mol2 and SDF format,
and the 10 ligands built de novo as single conformers and as multiple conformers, in mol2 format.



We have 10 PDB Xray files with different ligands in the active sites:
1E2K.pdb
1E2M.pdb
1E2N.pdb
1E2P.pdb
1KI2.pdb
1KI3.pdb
1KI6.pdb
1KI7.pdb
1KIM.pdb
2KI5.pdb

The reference structure used for docking is 1KIM. We added hydrogen atoms with InsightII (Accelrys), assuming normal pKa values;
the file is
1kim_H_insightII_forDocking.pdb
(some side chains or hydrogen atoms far from the binding pocket might be missing, but this does not
affect docking or scoring).
The reference ligand, positioned directly in the binding site of
1kim_H_insightII_forDocking.pdb, is also provided in PDB format,
named ligand_reference_for_1kim.pdb.

The 10 ligands are also provided together in multi-molecule files, in different formats:
TK_Xray_ligands_ChemistryOK.mol2  
TK_Xray_ligands_ChemistryOK.sdf
TK_Xray_ligands_multiConf_deNovo.mol2
TK_Xray_ligands_singleConf_deNovo.mol2
(these two de novo files have to be merged with a compound collection to test docking and scoring; for TK, because of the nature of the ligands, they can be merged either with the COOH Diversity set or with the charged one processed with Open Babel option -p, i.e. the bank with COO-, NH3+)
WARNING: some ligands may appear to be missing when you view them on screen with PyMOL or other programs; this is because they are not all in the same reference frame, although they are present in the files.

One key file for testing docking and scoring is:
TK_Xray_ligands_singleConf_deNovo_option_babel_P.mol2
The ligands in this file were built de novo as above, then went through the same Open Babel process as the Diversity set: -d to remove hydrogens and -p to fix the protonation state of some groups at normal (physiological) pH. This option slightly modifies the atom types, but the result remains fully consistent with the original atom typing derived from the X-ray structures and the corresponding publications.
This ligand file can therefore be added to the single-conformer Diversity
bank (COO-, NH3+, ...), since all molecules went through the same process.
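Merging the actives into the bank amounts to concatenating two multi-molecule mol2 files (each record starts with `@<TRIPOS>MOLECULE`), while remembering which names are actives so that their ranks can be examined after docking. The sketch below does both and adds a standard enrichment factor; the function names are hypothetical, and it assumes molecule names are unique across the two files.

```python
def mol2_names(mol2_text: str) -> list[str]:
    """Molecule names: the line following each @<TRIPOS>MOLECULE tag."""
    lines = mol2_text.splitlines()
    return [lines[i + 1].strip() for i, line in enumerate(lines)
            if line.strip() == "@<TRIPOS>MOLECULE" and i + 1 < len(lines)]


def merge_banks(bank_text: str, actives_text: str) -> tuple[str, set[str]]:
    """Append the active ligands to the bank; return merged text and active names."""
    actives = set(mol2_names(actives_text))
    clash = actives & set(mol2_names(bank_text))
    if clash:
        raise ValueError(f"names present in both files: {sorted(clash)}")
    merged = bank_text.rstrip("\n") + "\n" + actives_text.lstrip("\n")
    return merged, actives


def enrichment_factor(ranked: list[str], actives: set[str], fraction: float) -> float:
    """EF = (active rate in the top fraction) / (active rate in the whole list)."""
    n_top = max(1, round(len(ranked) * fraction))
    total = sum(1 for name in ranked if name in actives)
    if total == 0:
        raise ValueError("no actives in the ranked list")
    hits = sum(1 for name in ranked[:n_top] if name in actives)
    return (hits / n_top) / (total / len(ranked))
```

Here `ranked` is the list of molecule names sorted by docking score, best first; molecules that failed to dock should be appended at the end so that the denominators stay meaningful.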


The de novo structures were created using Omega. The input was the 3D SDF file with the ligands taken from the X-ray
structures, but with the atom types checked manually. Omega single-conformer and multi-conformer files were then generated.
With the parameters used, Omega generates 3D structures de novo and ignores the input 3D coordinates.
The structures are therefore truly de novo: some conformers in the multi-conformer file can be close to the X-ray structures, but the single conformers usually differ from the X-ray, because Omega keeps the lowest-energy structure obtained during the run and does not force the generated molecules to resemble the input 3D structure (an option for this should be available in Omega if needed; see the User Guide
at www.eyesopen.com).



NOTE FOR DOCKING
The X-ray structures of the protein-ligand complexes superimpose very well:
the C-alpha RMSD is around 0.2 Å to 0.4 Å.
There are no large changes in the amino acid side chains surrounding the ligands; only the Gln125 side chain moves
slightly.
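A C-alpha RMSD like the one quoted above can be computed as sketched below. Note the assumption: this sketch does NOT superimpose anything; it only applies to structures that are already in a common frame (such as the CDK2 `*_sur_1FVV.pdb` files further down), so raw PDB entries must first be fitted with a superposition program. Residues are matched by chain, residue number and insertion code; the function names are hypothetical.

```python
import math


def ca_coords(pdb_text: str) -> dict:
    """Map (chain, resSeq, iCode) -> (x, y, z) for C-alpha atoms of ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], line[22:26].strip(), line[26:27])
            if key not in coords:  # keep the first alternate location only
                coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def ca_rmsd(pdb_a: str, pdb_b: str) -> float:
    """RMSD in Å over C-alpha atoms present in both structures, without fitting."""
    a, b = ca_coords(pdb_a), ca_coords(pdb_b)
    common = a.keys() & b.keys()
    if not common:
        raise ValueError("no matching C-alpha atoms")
    squared = sum(math.dist(a[k], b[k]) ** 2 for k in common)
    return math.sqrt(squared / len(common))
```

Only residues present in both files contribute, so missing loops or truncated chains do not break the calculation.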

A view of the inhibitors drawn with MarvinView (ChemAxon) is shown below (hydrogen atoms are not shown):









Target 2:
Coagulation factor X (serine protease). Everything is in the tar.gz file (here).
We have 9 X-ray structures of FX in complex with small molecules.

The files are:

1F0R.pdb
1F0S.pdb
1FJS.pdb
1G2L.pdb
1KSN.pdb
1LPG.pdb
1LPK.pdb
1MQ5.pdb
1NFU.pdb

The reference file for docking is 1FJS_Hins.pdb (hydrogen atoms added with InsightII).
The reference ligand for 1FJS is also provided (warning: it is shifted by about 0.2 Å relative to the X-ray position, because
we superimposed all structures onto 1FJS and the fit was not perfect; this should not be a problem
for defining the binding pocket, but please check).
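A reference ligand is typically used to define the docking box. The sketch below centres a box on the midpoint of the ligand's heavy-atom extent and pads it on every side; the 5 Å padding and the function name are arbitrary assumptions, and each docking program has its own box conventions. An offset of about 0.2 Å, as noted above, is small compared with such a padding.

```python
def ligand_box(pdb_text: str, padding: float = 5.0):
    """Return (center, size) in Å of a box around the ligand heavy atoms."""
    xyz = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        # Element from columns 77-78 if present, else first letter of the atom name.
        element = line[76:78].strip() or name.lstrip("0123456789")[:1]
        if element.upper() == "H":
            continue  # hydrogens are ignored
        xyz.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not xyz:
        raise ValueError("no heavy atoms found")
    lo = [min(p[i] for p in xyz) for i in range(3)]
    hi = [max(p[i] for p in xyz) for i in range(3)]
    center = tuple((lo[i] + hi[i]) / 2 for i in range(3))
    size = tuple(hi[i] - lo[i] + 2 * padding for i in range(3))
    return center, size
```

Centring on the midpoint of the extent (rather than on the centroid) guarantees that every heavy atom lies at least `padding` Å inside the box.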


The ligands in mol2 format:

FX-ligands_from_xray.mol2

FX-denovo_singleconf_babelOptionbp.mol2
FX-denovo_multiconf_babelOPtionbp.mol2 (50 confs)

There are no major structural changes in the side chains around the ligands in the above X-ray structures,
although slight tilting of some aromatic side chains is noticeable.
We suggest using 1FJS for docking.



Figures for the FX inhibitors









Target 3:
The hydrolase neuraminidase (NA)
(influenza virus). All the proteins and ligands are in this tar.gz file (here).
Small, highly charged active-site groove, yet open enough for large ligands to fit.
We have 10 X-ray structures of NA in complex with small molecules.
The files are:
1B9S.pdb
1B9T.pdb
1B9V.pdb
1INF.pdb
1INV.pdb
1IVB.pdb
1VCJ.pdb
1a4g.pdb
1b9s_Hins.pdb (this is the reference file for docking with hydrogen atoms added with InsightII)
1f8b.pdb
2qwk.pdb
ligand_ref_for_1b9s.pdb

and the 10 ligands in mol2 format:
NA_Olivier_Xray_babel_optionP.mol2
(the same ligands are also in NA_Olivier_Xray_babel_optionP.sdf, for double-checking)
NA_Olivier_multiConf_deNovoCooMinus.mol2
NA_Olivier_singleConf_deNovo_COOminus.mol2 (to be used with the single-conformer COO- Diversity bank)
NA_multiconf_deNovo_cooh.mol2
NA_singleconf_deNovo_cooh.mol2



There are only minor structural changes in the side chains around the ligands in the above X-ray structures.
Thus a single protein can be selected for docking; we use 1B9S.

The 10 ligands are shown below. Note that groups such as carboxylates are drawn as COOH; this is only due to the drawing
package. In the mol2 files labeled COO-, these groups are indeed COO-.











Target 4:
CDK (tar.gz file here)
Cyclin Dependent Kinase 2:
Relatively flat pocket
The archive contains all the PDB files of the protein with 10 ligands. The ligands in their X-ray 3D conformation are provided in Mol2 and SDF format; atom types and structures were checked on a graphics system. Also included are de novo conformations of the same 10 ligands, generated with Omega 2.1 from the X-ray structures of the ligands, as single conformers and as multiple conformers. De novo means that the 3D structure is regenerated from the connectivity and atom typing, so the resulting structures differ from the X-ray structures; they can therefore be used for VLS experiments, in which the ligands and the other molecules of the collection should be processed with the same protocol.

We have 10 PDB Xray files with different ligands in the active sites:
1E9H.pdb code ligand INR
1FVT.pdb code ligand 106
1FVV.pdb code ligand 107  --> protein used for docking (reference protein)
1KE6.pdb code ligand LS2
1G5S.pdb code ligand I17
1H1S.pdb code ligand 4SP
1OGU.pdb code ligand ST8
1P2A.pdb code ligand 5BN
1PF8.pdb code ligand SU9
1PXL.pdb code ligand CK4

The reference structure used for docking is 1FVV.
We added hydrogen atoms with InsightII (Accelrys), assuming normal pKa values.

The file FVV_insight_H_fordocking.pdb (chain A of the protein only) can be used.

The reference ligand of 1FVV (107) is also provided in PDB format, named:
          ligand_reference_for_1FVV.pdb

The 10 ligands are also provided together in multi-molecule files, in different formats:

     CDK2.Xray-with-H.sdf
     CDK2.Xray-with-H.mol2
       ---> generated with babel from CDK2.Xray-with-H.sdf; DO NOT USE THIS FILE WITH OMEGA,
            because babel's aromaticity detection corrupts the protonation of the ligands from 1P2A, 1H1S and 1PF8

     CDK2.Xray.denovo.singleConf.mol2     ---> for flexible structure-based methods
     CDK2.Xray.denovo.multiConf-X50.mol2  ---> to test either 3D ligand-based methods or rigid-ligand docking

The de novo structures were created using Omega. The input was the X-ray 3D structure, with protonation and atom typing assigned by hand.

The following crystal structures, superimposed on 1FVV, may also be of interest. Each protein and its ligand are both in the coordinate frame of 1FVV:

Protein                Ligand
1E9H_sur_1FVV.pdb      INR_1E9H_sur_1FVV.pdb
1FVT_sur_1FVV.pdb      106_1FVT_sur_1FVV.pdb
1KE6_sur_1FVV.pdb      LS2_1KE6_sur_1FVV.pdb
1G5S_sur_1FVV.pdb      I17_1G5S_sur_1FVV.pdb
1H1S_sur_1FVV.pdb      4SP_1H1S_sur_1FVV.pdb
1OGU_sur_1FVV.pdb      ST8_1OGU_sur_1FVV.pdb
1P2A_sur_1FVV.pdb      5BN_1P2A_sur_1FVV.pdb
1PF8_sur_1FVV.pdb      SU9_1PF8_sur_1FVV.pdb
1PXL_sur_1FVV.pdb      CK4_1PXL_sur_1FVV.pdb

Protein flexibility:
The residues with the largest motion across the ten X-ray structures of CDK2 are:
Phe80 at the bottom of the pocket    --> moves within the same plane, so potential pi-pi stacking is unaffected
His84 at the entrance of the pocket  --> ligands interact with the backbone of His84, so the motion has no consequence
Lys33 at the back of the pocket      --> apart from 1PXL (two alternative positions for Lys33), the motion of the lysine is acceptable.
Its positions form two clusters: (1) 1OGU, 1H1S, 1E9H, 1FVV and (2) 1PXL, 1FVT, 1G5S, 1P2A, 1PF8, 1KE6.
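A per-residue motion analysis of this kind can be sketched as follows, assuming the structures are already in a common frame (as the `*_sur_1FVV.pdb` files are). For each residue, it reports the largest shift of any atom relative to the reference structure; atoms are matched by chain, residue number and atom name, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def residue_atoms(pdb_text: str) -> dict:
    """Map (chain, resSeq, atom name) -> (x, y, z) for ATOM records."""
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[12:16].strip())
            atoms.setdefault(key, (float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return atoms


def max_displacement(reference_text: str, others: list[str]) -> dict:
    """For each (chain, resSeq), the largest atom shift in Å vs the reference."""
    ref = residue_atoms(reference_text)
    worst = defaultdict(float)
    for text in others:
        atoms = residue_atoms(text)
        for key, p in ref.items():
            q = atoms.get(key)
            if q is None:
                continue  # atom missing in this structure
            residue = key[:2]
            worst[residue] = max(worst[residue], math.dist(p, q))
    return dict(worst)
```

Sorting the result, e.g. `sorted(result.items(), key=lambda kv: kv[1], reverse=True)[:10]`, lists the most mobile residues; with 1FVV as the reference, this is one way to reproduce the kind of analysis summarised above.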

Figure of the ligands (de novo)







Target 5:
RNAse (tar.gz file here)

Ribonuclease A
Large active site subdivided into two subsites: subsite (1) is more permissive to adenosine-like
compounds and subsite (2) to uridine-like compounds.
Eight PDB files are present in the RNAse validation set:

PDB file    ligand code    subsite
1AFK.pdb    PAP            (1)  --> reference structure
1AFL.pdb    ATR            (1)
1JN4.pdb    901            (1) + (2)
1O0F.pdb    A3P            (1)
1O0M.pdb    U2P            (2)
1O0N.pdb    U3P            (2)
1O0O.pdb    A2P            (1***)
1QHC.pdb    PUA            (1) + (2)

Two of the ligands, 901 from 1JN4 and PUA from 1QHC, occupy both subsites.
Note: 1O0O.pdb displays a major flip (180 degrees) of His119, which changes the usual folding of bound adenosine-like
compounds.

Apart from this, the other side chains do not display major variations.


Concerning the flexibility

1AFK_Hins.pdb (the 1AFK PDB file with hydrogen atoms added by InsightII at pH 7)

The eight ligands taken from the crystal structures, manually protonated and typed, are in:
RNAse.Xray.sdf

The de novo versions of these ligands, as single conformers or as multiple conformers (X50), are in:
RNAse.Xray.denovo.singleConf.mol2
RNAse.Xray.denovo.multiConf-X50.mol2

Note: the phosphate groups present in some of these molecules are not in their fully charged form: one of the three oxygen atoms carries a hydrogen.
This seems reasonable for the ligand in solution, given the three pKa values of the group.
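The reasoning about pKa values can be made explicit with the Henderson-Hasselbalch relation, which gives the fraction of an acidic site that is deprotonated at a given pH: 1 / (1 + 10^(pKa - pH)). The sketch below evaluates it for one site; any pKa values passed to it are the user's input, and we do not claim measured values for these ligands.

```python
def fraction_deprotonated(pka: float, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch: fraction of an acidic site in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

A site whose pKa is well below 7 is essentially fully deprotonated at pH 7, one whose pKa is well above 7 stays protonated, and one with pKa = 7 is half deprotonated; applying this to each acidic site of a group shows which protonation state dominates.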


Figure for this protein








Target 6:
ER (get the file here)