VLS3D.com
Jan 23, 2007, updated March 2, 2008
Validation set 2007
Work by O. Sperandio, M. Miteva, M. Montes, D. Lagorce and B. Villoutreix
There are many ways to generate a validation set, and the task is complex.
We follow the Mol2 atom types from Tripos.
For some molecules, different atom types could be assigned. We nevertheless use the
same protocol for all molecules, so that if an error is present, it is present
consistently in all molecules. For instance, COOH groups are kept as COOH in one
bank, while in the other bank, the so-called "charged bank", all COOH are COO-.
Users should check some structures to see whether they are appropriate
for their projects.

The Diversity set is just below; after it you will find the targets and actives.
The Diversity set:
To test docking packages, we suggest using the Diversity set from
ChemBridge, January 2007.
This bank contains 50080 molecules. It is in fact the version from
April 2006; there have been no changes since then.
We did not run ADME/tox filtering on this collection, since its purpose is to
test docking packages and because the compounds already seem to follow the
Lipinski rules. One major reason why compounds are rejected after ADME/tox
filtering is that some chemical groups are not desirable for oral drugs, but
this should not affect docking and scoring.
Here are the properties computed by ChemBridge on this Diversity bank:

First, in order to generate the 3D structures from the 2D SDF file from
ChemBridge, we had to move the compound ID field to the line just above
the ISIS line, so that Omega from OpenEye keeps the ID number of each
molecule. This helps when looking up additional information about the
molecules on the ChemBridge web site.
David Lagorce wrote a Python script to do that.
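The script itself is not reproduced on this page. As a rough idea of what such a step can look like, here is a minimal Python sketch that copies a data field into the title line of each SDF record (line 1, just above the line carrying the ISIS stamp). The field name "ID" and the function name are assumptions made for illustration; the actual tag name should be checked in the ChemBridge file.

```python
def shift_id_to_title(sdf_text, tag="ID"):
    """Copy the value of data field `tag` into the title line of every record.

    `tag` is a hypothetical field name; adapt it to the real SDF file.
    Records are assumed to be separated by "$$$$" lines with Unix newlines.
    """
    records = []
    for record in sdf_text.split("$$$$\n"):
        if not record.strip():
            continue  # skip the empty chunk after the last "$$$$"
        lines = record.split("\n")
        for i, line in enumerate(lines):
            # A data header looks like "> <ID>"; its value is on the next line.
            if line.startswith(">") and "<" + tag + ">" in line and i + 1 < len(lines):
                lines[0] = lines[i + 1].strip()
                break
        records.append("\n".join(lines))
    return "".join(r + "$$$$\n" for r in records)
```

The function works on text already read from disk, so the result can be written to the file given to Omega with -in (test_shiftname.sdf in the parameter file below).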
Next, Omega version 2.1 (installed January 2007) was used with the
following parameter file for the single-conformer run.
For the multi-conformer run, the same parameter file was used but
-maxconfs was set to 50 instead of 1.
Omega2
Param file for 3D generation
#Interface settings
#-pvmconf (Not set, no default)
#File Options :
-commentEnergy true (default)
-in test_shiftname.sdf
-includeInput false (default)
#-log (Not set, no default)
-out test_shiftname3D.mol2
#-param (Not set, no default)
-prefix omega2_1 (default)
-rotorOffsetCompress false (default)
-sdEnergy false (default)
#-status (Not set, no default)
-verbose true (default)
-warts true (default)
#3D Construction Parameters :
-buildff mmff94s_NoEstat (default)
-canonOrder true (default)
-deleteFixHydrogens true (default)
#-dielectric (Not set, no default)
#-exponent (Not set, no default)
#-fixfile (Not set, no default)
-fixrms 0.150000 (default)
-fraglib /usr/local/programs/openeye_2007.dir/openeye/data/omega2/fraglib.oeb.gz
-fromCT true (default)
-maxmatch 10 (default)
-umatch true (default)
#Structure Enumeration :
-enumNitrogen true (default)
-enumRing true (default)
#Torsion Driving Parameters :
#-erange (Not set, no default)
-ewindow 25.000000 (default)
#-maxConfRange (Not set, no default)
-maxconfgen 30000 (default)
# -maxconfs below sets how many conformers you want
-maxconfs 1 (default)
-maxpoolsize 10000 (default)
-maxrot -1 (default)
-maxtime 30.000000 (default)
-rangeIncrement 5 (default)
-rms 0.800000 (default)
#-rmsrange (Not set, no default)
-searchff mmff94s_NoEstat (default)
-tordrive true (default)
#-torlib (Not set, no default)
#PVM Parameters :
-pvmdebug false (default)
-pvmpass 10 (default)
Thus, to start the job, I used something like this:
omega -param bruno_omega2_param
The input is thus a 2D SDF file and the output a 3D mol2 file, single
or multi conformer.
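Since the multi-conformer run differs from the single-conformer run only in the value of -maxconfs, the second parameter file can be derived from the first one. The Python sketch below shows one way to do this on the text of a parameter file; the helper name set_param is hypothetical, and the sketch assumes the "-key value" line layout shown in the listing above.

```python
def set_param(param_text, key, value):
    """Return the parameter text with `key` set to `value`.

    Lines are matched on their first whitespace-separated token, so
    "-maxconfs" does not match "-maxconfgen". Commented lines (starting
    with "#") are left untouched. If the key is absent, it is appended.
    """
    out = []
    found = False
    for line in param_text.splitlines():
        parts = line.split()
        if parts and parts[0] == key:
            out.append("%s %s" % (key, value))
            found = True
        else:
            out.append(line)
    if not found:
        out.append("%s %s" % (key, value))
    return "\n".join(out) + "\n"
```

For example, set_param(single_conf_text, "-maxconfs", 50) gives the multi-conformer settings; the same helper could be used to change -out so that the two runs do not overwrite each other.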
The resulting single-conformer 3D mol2 file then had its chemistry fixed
with Open Babel 2.0.2 (installed January 2007), options -b -p (modified by
us for amines), thus we have COO-...
The 3D mol2 file from the Omega single-conformer run has COOH and amine NH2
groups, so some of these were fixed: first we removed the hydrogen atoms
with Open Babel 2.0.2 option -d, then added them back with the -b -p
options with the pH model file activated.
The pH model file and the package still do not work perfectly; some small
problems are noted here and there.
However, the result should be reasonable overall for docking/scoring purposes.
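Because small problems remain, it can be useful to spot-check the protonation in the resulting mol2 files. The Python sketch below counts Tripos atom types per molecule; it assumes that charged carboxylate oxygens appear with the type O.co2 and protonated sp3 amines with the type N.4, which are the usual Tripos conventions but should be confirmed on a few molecules of the actual bank. The function name is hypothetical.

```python
from collections import Counter


def atom_type_counts(mol2_text):
    """Return a list of (molecule_name, Counter of Tripos atom types)."""
    results = []
    section = None
    expect_name = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            section = line.strip()[len("@<TRIPOS>"):]
            # The molecule name is the first line after the MOLECULE tag.
            expect_name = section == "MOLECULE"
            continue
        if expect_name:
            results.append((line.strip(), Counter()))
            expect_name = False
        elif section == "ATOM" and results:
            fields = line.split()
            # ATOM columns: id, name, x, y, z, atom_type, ...
            if len(fields) >= 6:
                results[-1][1][fields[5]] += 1
    return results
```

A molecule from the charged bank that contains a carboxylate should then report two O.co2 atoms for that group, whereas the same molecule in the COOH bank should not.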
The ChemBridge Diversity set files in 3D, single conformer, are here:
The direct output of the Omega single-conformer run, modified with Open Babel
option -d and then -b -p (thus Bank COO-, NH3+, ...), in mol2 format:
The pH model text file for Babel is: (here). I believe it is for version 2.0.2;
this is very important, since the revised version of Babel has a problem with
this at present, so just ask which version.
This file has to be located inside the Babel package, in the directory: data
There you should also rename the file phmodeldata.h to original_file.h
and then compile Babel.
The ligands and the proteins to test docking/scoring are below:
Target 1:
THYMIDINE KINASE
(narrow pocket)
The archive contains all the PDB files of the protein, i.e. 10 X-ray
structures with their ligands (the real active molecules); the ligands alone
in 3D in the X-ray conformation, in Mol2 and SDF format; and the 10 ligands
built de novo in single conformation and as multi conformers, these in mol2
format: get the tar.gz file (here)
We have 10 PDB Xray files with different ligands in the active sites:
1E2K.pdb
1E2M.pdb
1E2N.pdb
1E2P.pdb
1KI2.pdb
1KI3.pdb
1KI6.pdb
1KI7.pdb
1KIM.pdb
2KI5.pdb
The reference structure used for docking is 1KIM. We added hydrogen atoms
assuming normal pKa values with InsightII (Accelrys);
the file is 1kim_H_insightII_forDocking.pdb
(some side chains or hydrogen atoms far away from the binding pocket
might be missing, but this does not perturb the docking and scoring).
The reference ligand that docks directly into 1kim_H_insightII_forDocking.pdb
is also here, in PDB format, named ligand_reference_for_1kim.pdb.
Then we have the 10 ligands together in single files, in different formats:
TK_Xray_ligands_ChemistryOK.mol2
TK_Xray_ligands_ChemistryOK.sdf
TK_Xray_ligands_multiConf_deNovo.mol2 (these have to be merged with a compound collection to test docking and scoring)
TK_Xray_ligands_singleConf_deNovo.mol2 (these have to be merged with a compound collection to test docking and scoring; in the case of TK, because of the nature of the ligands, this file can be merged either with the Diversity set COOH or with the charged one that went through Open Babel option -p, thus the bank COO-, NH3+)
WARNING: some ligands may seem to be missing when you look at them on the
computer screen with PyMOL or other viewers. They are present in the files,
but they are not all in the same reference frame.
One key file for testing docking/scoring is:
TK_Xray_ligands_singleConf_deNovo_option_babel_P.mol2
The ligands here were built de novo as above and then went through the same
process as above, with Open Babel -d to remove hydrogens and -p to fix the
protonation state of some groups at normal pH. This option slightly
modifies the atom types, but things remain fully consistent with the
original atom typing derived from analysis of the X-ray structures and of
the corresponding publications.
Thus this ligand file can be added to the Diversity single-conformer Bank
COO-, NH3+..., meaning that all molecules went through the same process.
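Merging the actives with a bank amounts to concatenating the mol2 files, since a multi-molecule mol2 file is simply a sequence of records. The Python sketch below does this on file contents and counts the records as a sanity check; both helper names are hypothetical.

```python
def merge_mol2(texts):
    """Concatenate the contents of several mol2 files into one string."""
    merged = ""
    for text in texts:
        # Make sure each file ends with a newline so records do not run together.
        if text and not text.endswith("\n"):
            text += "\n"
        merged += text
    return merged


def count_molecules(mol2_text):
    """Count records by counting the @<TRIPOS>MOLECULE tags."""
    return sum(
        1 for line in mol2_text.splitlines()
        if line.startswith("@<TRIPOS>MOLECULE")
    )
```

After merging, the number of records in the result should equal the sum of the counts of the input files; a mismatch suggests a truncated or malformed file.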
The de novo structures were created using Omega. The input was the 3D
SDF file with the ligands coming from the X-ray structures, but with the
atom types checked manually. Then Omega single-conformer and
multi-conformer files were generated.
With the parameters used, Omega generates 3D structures de novo and does not
take the input 3D coordinates into account.
So the structures are really de novo. Some conformers in the multi-conformer
file can be close in 3D to the X-ray structures, but the single conformers are
usually different from the X-ray ones, because Omega selects the lowest-energy
structure obtained during the run and does not force the generated
molecules to look like the input 3D structure (an option for this should be
available in Omega if people need it; see the User guide
at: www.eyesopen.com)
NOTE FOR DOCKING
The X-ray structures of the protein-ligand complexes superimpose very well:
the RMSD for the Calpha atoms is around 0.2 A to 0.4 A.
There are no big changes in the amino acid side chains surrounding the ligands.
Only one residue, Gln125, shows a slight side chain movement.
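For readers who want to check such values on their own superimposed files, here is a minimal Python sketch of a Calpha RMSD calculation. It assumes the two structures are already in the same coordinate frame (it does not perform any fitting) and pairs residues by chain identifier and residue number; the function names are hypothetical.

```python
import math


def ca_coords(pdb_text):
    """Map (chain, residue number) -> (x, y, z) for Calpha atoms."""
    coords = {}
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, chain 22, resSeq 23-26, x/y/z 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.setdefault(key, xyz)  # keep the first alternate location
    return coords


def ca_rmsd(pdb_a, pdb_b):
    """RMSD over Calpha atoms present in both structures, without fitting."""
    a, b = ca_coords(pdb_a), ca_coords(pdb_b)
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("no common Calpha atoms")
    total = sum((p - q) ** 2 for k in common for p, q in zip(a[k], b[k]))
    return math.sqrt(total / len(common))
```

Only ATOM records are read, so HETATM calcium ions (also named CA) are not mistaken for Calpha atoms.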
A view of the inhibitors with MarvinView from ChemAxon is below
(hydrogen atoms are not shown):

Target 2:
Coagulation factor X
(serine protease) All in tar.gz (here)
We have here 9 Xray structures of FX in complex with small molecules
The files are:
1F0R.pdb
1F0S.pdb
1FJS.pdb
1G2L.pdb
1KSN.pdb
1LPG.pdb
1LPK.pdb
1MQ5.pdb
1NFU.pdb
The reference file for docking is 1FJS_Hins.pdb (hydrogen atoms added with InsightII).
The ligand reference for 1FJS (warning: it is shifted by about 0.2 A compared to the X-ray position. This
should not be a problem for defining the binding pocket, but please check. The shift arises because we
superimposed all structures on top of 1FJS and the fit was not perfect.)
The ligands in mol2 format :
FX-ligands_from_xray.mol2
FX-denovo_singleconf_babelOptionbp.mol2
FX-denovo_multiconf_babelOPtionbp.mol2 (50 confs)
There is no major structural change in the side chains around the ligands in the above X-ray structures,
although a slight tilting of some aromatic side chains is noticed.
We suggest using 1FJS for docking.
Figures for the FX inhibitors

Target 3:
The hydrolase Neuraminidase (NA) (INFLUENZA VIRUS).
All the proteins and ligands are in this tar.gz file: here
Small, highly charged active site groove, yet open enough that large
ligands can still fit.
We have here 10 X-ray structures of NA in complex with small molecules.
The files are :
1B9S.pdb
1B9T.pdb
1B9V.pdb
1INF.pdb
1INV.pdb
1IVB.pdb
1VCJ.pdb
1a4g.pdb
1b9s_Hins.pdb (this is the reference file for docking, with hydrogen atoms added with InsightII)
1f8b.pdb
2qwk.pdb
ligand_ref_for_1b9s.pdb
and the 10 ligands in mol2 format:
NA_Olivier_Xray_babel_optionP.mol2
(NA_Olivier_Xray_babel_optionP.sdf is also included, just in case, to double check)
NA_Olivier_multiConf_deNovoCooMinus.mol2
NA_Olivier_singleConf_deNovo_COOminus.mol2 (thus this one goes with the Diversity bank single conf COO-)
NA_multiconf_deNovo_cooh.mol2
NA_singleconf_deNovo_cooh.mol2
There are only minor structural changes in the side chains around the
ligands in the above X-ray structures.
Thus a single protein can be selected for docking; we use 1b9s.
The 10 ligands are shown below. Please note that they are drawn as COOH (for
example); this is only due to the drawing package. In the mol2 files that
are described as COO-, they are indeed COO-.

Target 4:
CDK (tar.gz file here)
Cyclin Dependent Kinase 2:
Relatively flat pocket.
The archive contains all the PDB files of the protein with 10 ligands. The
ligands in the X-ray conformation, in 3D, are in Mol2 and SDF format. Atom
types and structures were checked on the graphics system. Also present in
the set are the de novo conformations of the same 10 ligands, generated
with Omega 2.1 from the X-ray structures of the ligands, in single conf and
multi conf. Here the 3D structure is generated again from the connectivity
and atom typing, so the resulting structures differ from the X-ray
structures. They can thus be used for VLS experiments, since we would like
the ligands and the other molecules present in the collection to be
processed via the same protocol.
We have 10 PDB X-ray files with different ligands in the active sites:
1E9H.pdb code ligand INR
1FVT.pdb code ligand 106
1FVV.pdb code ligand 107 --> protein used for docking (reference protein)
1KE6.pdb code ligand LS2
1G5S.pdb code ligand I17
1H1S.pdb code ligand 4SP
1OGU.pdb code ligand ST8
1P2A.pdb code ligand 5BN
1PF8.pdb code ligand SU9
1PXL.pdb code ligand CK4
The reference structure used for docking is 1FVV.
We added hydrogen atoms assuming normal pKa values with InsightII (Accelrys).
The file FVV_insight_H_fordocking.pdb (only chain A of the protein) can be used.
The reference ligand of 1FVV, 107, is also in PDB format and is named:
ligand_reference_for_1FVV.pdb
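For the other complexes, a reference ligand file of the same kind can be obtained by keeping only the HETATM records that carry the ligand code. The Python sketch below illustrates the idea; it is not the procedure used to produce the files of this set, and the function name and the optional chain filter are assumptions.

```python
def extract_ligand(pdb_text, resname, chain=None):
    """Return the HETATM lines whose residue name equals `resname`.

    Fixed PDB columns are used: residue name in columns 18-20 and chain
    identifier in column 22. If `chain` is given, other chains are skipped.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            if chain is None or line[21] == chain:
                kept.append(line)
    return "\n".join(kept + ["END"]) + "\n"
```

For example, extract_ligand(text_of_1FVV, "107", chain="A") would keep the copy of ligand 107 bound to chain A, matching the choice of chain A for the protein file.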
Then we have the 10 ligands together in single files, in different formats:
CDK2.Xray-with-H.sdf
CDK2.Xray-with-H.mol2 ---> generated with babel from CDK2.Xray-with-H.sdf. DO NOT USE THIS FILE WITH OMEGA, because the aromaticity is detected by babel and messes up the protonation of 1P2A, 1H1S and 1PF8
CDK2.Xray.denovo.singleConf.mol2 ---> used for flexible structure-based methods
CDK2.Xray.denovo.multiConf-X50.mol2 ---> used to test either 3D ligand-based methods or rigid ligand docking
The de novo structures were created using Omega. The input was the X-ray
3D structure, but with the protonation and atom typing done by hand.
The following crystal structures, superimposed on 1FVV, might also be of
interest. Each protein file is listed with its ligand file; both are in the
coordinate frame of 1FVV:
1E9H_sur_1FVV.pdb with its ligand INR_1E9H_sur_1FVV.pdb
1FVT_sur_1FVV.pdb with its ligand 106_1FVT_sur_1FVV.pdb
1KE6_sur_1FVV.pdb with its ligand LS2_1KE6_sur_1FVV.pdb
1G5S_sur_1FVV.pdb with its ligand I17_1G5S_sur_1FVV.pdb
1H1S_sur_1FVV.pdb with its ligand 4SP_1H1S_sur_1FVV.pdb
1OGU_sur_1FVV.pdb with its ligand ST8_1OGU_sur_1FVV.pdb
1P2A_sur_1FVV.pdb with its ligand 5BN_1P2A_sur_1FVV.pdb
1PF8_sur_1FVV.pdb with its ligand SU9_1PF8_sur_1FVV.pdb
1PXL_sur_1FVV.pdb with its ligand CK4_1PXL_sur_1FVV.pdb
Protein Flexibility:
The residues with the largest amplitude of motion across the ten X-ray
structures of CDK2 are:
Phe80 at the bottom of the pocket --> motion within the same plane, so no change with regard to potential pi-pi stacking
His84 at the entrance of the pocket --> the ligand interacts with the backbone part of His84, so no consequences
Lys33 at the back of the pocket --> except for 1PXL (two folds for Lys33), the motion of the lysine is acceptable. There are two clusters of positions: the 1st (1OGU, 1H1S, 1E9H, 1FVV) and the 2nd (1PXL, 1FVT, 1G5S, 1P2A, 1PF8 and 1KE6).
Figure of the ligands (de novo)

Target 5:
RNAse (tar.gz file here)
Ribonuclease A
Large active site subdivided into two subsites, one more permissive to adenosine-like (1)
compounds and the other to uridine-like (2) compounds.
Eight PDB files are present in the RNAse validation set:
PDB file, ligand code, subsite
1AFK.pdb code ligand PAP (1)--> reference structure
1AFL.pdb code ligand ATR (1)
1JN4.pdb code ligand 901 (1) + (2)
1O0F.pdb code ligand A3P (1)
1O0M.pdb code ligand U2P (2)
1O0N.pdb code ligand U3P (2)
1O0O.pdb code ligand A2P (1***)
1QHC.pdb code ligand PUA (1) + (2)
Two of the ligands, 901 from 1JN4 and PUA from 1QHC, occupy both subsites.
Concerning the flexibility:
Note: 1O0O.pdb displays a major flip (180 degrees) of His119, which changes the usual folding of
adenosine-like compounds (hence the *** in the list above).
Besides this, the other side chains do not display major variations.
1AFK_Hins.pdb (PDB file + hydrogens added by InsightII at pH 7)
The eight ligands taken from the crystal structures, manually protonated and typed, are in:
RNAse.Xray.sdf
The de novo versions of these ligands, either single conf or multi conf X50, are in:
RNAse.Xray.denovo.singleConf.mol2
RNAse.Xray.denovo.multiConf-X50.mol2
Note: the phosphate groups present in some of these molecules are not charged, but one of the 3 oxygens carries a hydrogen.
This would seem OK for the ligand in solution, considering the 3 pKas of the group.
Figure for this protein

Target 6:
ER (get the file here)