RUNNING YOUR SCREENING

Step 4. Running your Screening

The instructions below cover running your virtual screening either on a single computer or in a PBS environment, such as a cluster.

Copy the appropriate script and run vina according to its directions.

To run on a single computer use the following bash shell (.sh) script.

Step 4 (a). Single Computer Screening

#############

#!/bin/bash

for h in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99;

do

# Skip this folder if it is missing, so that the later "cd .." stays in step.

cd "$h" || continue

for f in ZINC*.pdbqt

do

b=`basename $f .pdbqt`

echo Processing "$b"

mkdir -p $b

~/bin/vina --cpu 2 --config ../conf --ligand $f --out ${b}/out.pdbqt --log ${b}/log

done;

cd ..;

done;

#############

Please notice the "--cpu 2" argument. The 2 should be changed to the number of processors you want to use on the computer running the calculations. Alternatively, the entire "--cpu 2" argument can be left out, in which case the program will attempt to detect how many CPUs the computer has and use all of them during the multithreaded portion of each docking.
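As a rough sketch of setting the "--cpu" value from the machine itself rather than typing it in by hand, the snippet below looks up the processor count with "nproc" (GNU coreutils) and falls back to "getconf _NPROCESSORS_ONLN". The function name detect_cpus and the final fallback value of 1 are my own choices for illustration and are not part of the script above.

```shell
#!/bin/bash

# detect_cpus: hypothetical helper, prints the number of online processors.
detect_cpus() {
    local n
    # nproc ships with GNU coreutils; getconf is a fallback for other systems.
    n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "$n"
}

cpus=$(detect_cpus)
echo "vina would be started with: --cpu $cpus"
```

The printed number is what you would put after "--cpu" if you want to use every processor; use a smaller number if the computer also has other work to do.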

Please notice the "../conf" argument. This is the "conf" configuration file. An example configuration file is given in the section Step 4 ("conf" file); in it, the text after each "=" describes the value that needs to be filled in.

Also note that our scripts must reference files relative to the correct directory. Typing "../" before a file name looks one directory up from the directory the script is currently working in. We do this because our "conf" file is in the "screening-directory", while the script, after its "cd" command, is working in "screening-directory/0" or "screening-directory/67", or in general "screening-directory/number_directory".
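To make the "../" behaviour concrete, here is a small self-contained sketch that builds a throwaway copy of the layout in a temporary directory and checks that "../conf" can be found from inside a numbered folder. The directory names mirror the ones above; the temporary location and the placeholder file contents are assumptions made only for this demonstration.

```shell
#!/bin/bash

# Build a disposable layout: screening-directory/conf and screening-directory/67/
tmp=$(mktemp -d)
mkdir -p "$tmp/screening-directory/67"
echo "placeholder" > "$tmp/screening-directory/conf"

# Move into a numbered folder, as the screening script does with "cd $h".
cd "$tmp/screening-directory/67" || exit 1

# "../conf" is resolved against the current working directory.
if [ -f ../conf ]; then
    found=yes
else
    found=no
fi
echo "conf visible from folder $(basename "$PWD"): $found"

# Clean up the throwaway layout.
cd / && rm -rf "$tmp"
```

If the script did not "cd" into the numbered folder first, the same "../conf" would point one level above the "screening-directory" and vina would not find the file.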

Also, notice the opening "for each" loop, the line "for h in 0 1 2"... . This line lists each of the numbered folders that contain our small molecule files in the "screening-directory". If you only want to run the screening on certain folders, or want to restart the script from the middle of the screening, this line needs to be edited. For example, to start the screening at folder 95, skip 96, and run on 97, 98, and 99, the line would look like:

############# ...

for h in 95 97 98 99;

do

...#############
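If typing out the folder numbers by hand gets tedious, bash can generate them. The sketch below is one possible approach and not part of the original script: brace expansion covers a contiguous range, and a hypothetical helper called folders_except prints a range while leaving out one folder.

```shell
#!/bin/bash

# Contiguous range: in bash, {95..99} expands to 95 96 97 98 99.
for h in {95..99}; do
    echo "range loop would process folder $h"
done

# folders_except FIRST LAST SKIP: hypothetical helper that prints the
# numbers FIRST..LAST on one line, leaving out SKIP.
folders_except() {
    first=$1
    last=$2
    skip=$3
    out=""
    h=$first
    while [ "$h" -le "$last" ]; do
        if [ "$h" -ne "$skip" ]; then
            out="$out $h"
        fi
        h=$((h + 1))
    done
    # Unquoted on purpose so the leading space is dropped.
    echo $out
}

# Start at 95, skip 96, then run 97 through 99.
for h in $(folders_except 95 99 96); do
    echo "skip loop would process folder $h"
done
```

In the screening script, either form would replace the long list of numbers after "for h in".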

Step 4 ("conf" file)

This is roughly how to set up a "conf" file (for more info check out the online AutoDock Vina manual: http://vina.scripps.edu/manual.html):

#############

receptor = ../name_of_receptor_file.pdbqt (a filename)

center_x = X coordinate of center of grid box, search area (a number)

center_y = Y coordinate of center of grid box, search area (a number)

center_z = Z coordinate of center of grid box, search area (a number)

size_x = size of grid box, search area in X direction (a number)

size_y = size of grid box, search area in Y direction (a number)

size_z = size of grid box, search area in Z direction (a number)

exhaustiveness = how thoroughly to search conformations (e.g. 9)

num_modes = how many top conformations to keep (e.g. 9)

#############
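For illustration, a filled-in file might be written as below. Every value here is a made-up placeholder (the receptor file name, the box center and the box size in particular) and must be replaced with the numbers for your own receptor; only the key names come from the template above. The sketch writes to a file called "conf.example", a name chosen here so that it cannot overwrite a real "conf".

```shell
#!/bin/bash

# Write an example configuration. All values are placeholders for
# illustration, not recommendations.
cat > conf.example <<'EOF'
receptor = ../receptor.pdbqt
center_x = 10.0
center_y = 12.5
center_z = -3.0
size_x = 20
size_y = 20
size_z = 20
exhaustiveness = 9
num_modes = 9
EOF

# Quick sanity check: every key listed in the template should be present.
for key in receptor center_x center_y center_z size_x size_y size_z exhaustiveness num_modes; do
    grep -q "^$key = " conf.example || echo "missing: $key"
done
```

Once the values are right for your receptor, the file would be saved as "conf" in the "screening-directory" so that "--config ../conf" finds it.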

Step 4 (b). PBS Environment Computer Screening

#############

#!/bin/bash

#PBS -m be

#PBS -q queue_name

#PBS -l walltime=48:00:00 (or max walltime),nodes=1:ppn=number of cpus in node

##PBS -l mppnppn=7

#PBS -t 54-65

cd $PBS_O_WORKDIR

cd $PBS_ARRAYID

for f in ZINC*.pdbqt

do

b=`basename $f .pdbqt`

echo Processing "$b"

mkdir -p $b

~/bin/vina --config ../conf --ligand $f --out ${b}/out.pdbqt --log ${b}/log

done

#############

The preceding script will run folders 54 through 65 as 12 separate jobs, one folder of molecules per node, in the queue "queue_name". Replace "queue_name" with the name of the cluster queue you are going to run on. The other placeholder values (the walltime, the "ppn" number of CPUs per node, and the "-t" range of folders) should be adjusted to fit the circumstances you are running the script in.
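The "-t 54-65" line and the $PBS_ARRAYID variable are how each job in the array learns which folder is its own. As far as I know this spelling is the TORQUE one, while PBS Professional uses "-J" and $PBS_ARRAY_INDEX instead. The following is a sketch of one way to cope with both; the helper name array_folder is hypothetical.

```shell
#!/bin/bash

# array_folder: hypothetical helper, prints the array index handed to this
# job by the scheduler, whichever variable name is in use.
array_folder() {
    if [ -n "${PBS_ARRAYID:-}" ]; then
        echo "$PBS_ARRAYID"          # TORQUE, set by "#PBS -t"
    elif [ -n "${PBS_ARRAY_INDEX:-}" ]; then
        echo "$PBS_ARRAY_INDEX"      # PBS Professional, set by "#PBS -J"
    else
        return 1                     # not running inside an array job
    fi
}

if folder=$(array_folder); then
    echo "this job would dock the molecules in folder $folder"
else
    echo "no array index set; submit the script with qsub as an array job" >&2
fi
```

In the PBS script, "cd $PBS_ARRAYID" could then be replaced by "cd" into the folder this helper prints.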

Step 4 (in case of screening being interrupted)

You will likely find at some point that a screening gets interrupted. This can be very annoying, because you will be left with half-finished folders holding so many files that a GUI file browser has difficulty handling them. The following script, which I call "filter.sh", is run by typing "./filter.sh 'number_directory'" in your bash terminal window. The 'number_directory' is a command line argument to filter.sh and should be the name of the directory your screening was processing when it was interrupted (one of the subdirectories numbered 0 through 99).

I usually run this script from the 'screening_directory'. It goes through the molecule directories in the 'number_directory' and, for every molecule that already has a docking recorded in the directory created for it by the previous Step 4 scripts, moves its .pdbqt and .mol2 files into a directory called "molecule_files". The appropriate Step 4 script can then be rerun on the directory the screening was working on when it was interrupted, and you will not lose the time and electricity of redocking molecules you have already docked. The script is:

#############

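#!/bin/bash

# Enter the numbered directory given on the command line.

cd "$1" || exit 1

mkdir -p molecule_files

# Loop over the per-molecule output directories only.

for g in ZINC*/

do

g=${g%/}

# A non-empty out.pdbqt means this molecule has already been docked.

if [[ -s $g/out.pdbqt ]]; then

mv "$g".mol2 molecule_files/

mv "$g".pdbqt molecule_files/

fi

done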

#############
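Before running filter.sh it can be reassuring to see how many molecules in a folder already have a result. The sketch below is an optional addition, not part of the original workflow: a hypothetical function called count_docked counts the ZINC* subdirectories that hold a non-empty out.pdbqt, which is the same test filter.sh applies.

```shell
#!/bin/bash

# count_docked DIR: hypothetical helper, prints how many ZINC*
# subdirectories of DIR contain a non-empty out.pdbqt.
count_docked() {
    dir=$1
    n=0
    for g in "$dir"/ZINC*/; do
        # When nothing matches, the pattern stays literal and -s is false.
        if [ -s "${g}out.pdbqt" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# The first argument is the number_directory (defaults to the current one).
echo "finished dockings: $(count_docked "${1:-.}")"
```

Saved under a name of your choosing, it would be run from the 'screening_directory' with the 'number_directory' as its argument, just like filter.sh.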