................ SHORT DOC .............................................

CCCPP: Computes Cavities, Channels, Pores and Pockets in proteins.

References:

Benkaidali L., Andre F., Maouche B., Siregar P., Benyettou M., Maurel F., Petitjean M.
Computing cavities, channels, pores and pockets in proteins from non spherical ligands models.
Bioinformatics, 2014, 30[6], 792-800. DOI 10.1093/bioinformatics/btt644

Benkaidali L., Andre F., Moroy G., Tangour B., Maurel F., Petitjean M.
The cytochrome P450 3A4 has three major conformations: new clues to drug recognition by this promiscuous enzyme.
Mol. Inf., 2017, 36[10], 1700044. DOI 10.1002/minf.201700044

Benkaidali L., Andre F., Moroy G., Tangour B., Maurel F., Petitjean M.
Four Major Channels Detected in the Cytochrome P450 3A4: a Step Toward Understanding its Multispecificity.
Int. J. Mol. Sci., 2019, 20[4], 987. DOI 10.3390/ijms20040987

mailto: petitjean.chiral@gmail.com

CCCPP computes the Delaunay triangulation of a protein. It then computes a modified alpha shape model of the network of channels available to a given ligand. Finally, it computes the potential pathways of this ligand to a given end point.
Version 1 of CCCPP is no longer maintained.

Input data and parameters:
--------------------------

BIO : Biosym (MSI) files
CAS : Reserved for internal purposes
HIN : HyperChem-type files
ISU : Reserved for internal purposes
MDL : Cambridge Crystallographic Model files
ML2 : SYBYL Mol2 files
PDB : Protein Data Bank or Nucleic Acid Data Bank files
      (only HEADER, ATOM, ENDMDL and END records are recognized)
SDF : Symyx Mol/SDF files
      (data between 'M  END' and '$$$$' are treated as comments)
XYZ : n+2 lines. Line 1: n; line 2: free comment; next n lines: label or
      atomic symbol, x, y, z (separator: spaces; tabs are not allowed).

INPUT MOLEC FILE NAME: Name of the input file containing the protein.

Output lists: node addresses, tetrahedra, internal faces, external faces, extreme nodes.
Enter 5 characters (either Y or N) separated by spaces or commas.
Each Y or N character indicates whether or not the corresponding list is output. For example, entering Y Y Y Y Y outputs all five lists.

EPSTAB: Generates randomly perturbed Cartesian coordinates. The coordinates are not modified when EPSTAB is negative or zero. Otherwise, an independent random 3-tuple (x,y,z) is added to each atomic position. Each perturbed position then follows an isotropic normal distribution centered on the original atomic position, with standard deviation equal to EPSTAB.
*** WE RECOMMEND USING THIS OPTION IN ORDER ***
*** TO AVOID POTENTIAL NUMERICAL INSTABILITIES ***
Most of the time, EPSTAB=1.D-7 is effective.

For graphic display:

OUTPUT FORMAT: Format of the molecular file produced for visualization of the channels. Same conventions as for the input format.

OUTPUT MOLEC FILE NAME: Name of the output molecular file for visualization.

MODEL: Enter an integer value in {1, 2, 3, 4}.
 1: Reserved for internal purposes.
 2: A ligand is declared able to pass through a triangle of the Delaunay triangulation when its CRITICAL VALUE is smaller than or equal to the radius of the circle circumscribed about that triangle.
 3: A ligand is declared able to pass through a triangle of the Delaunay triangulation when its CRITICAL VALUE is smaller than or equal to the radius of the largest disk that has its center in the triangle and contains none of the three vertices of the triangle.
 4: A ligand is declared able to pass through a triangle of the Delaunay triangulation when its CRITICAL VALUE is smaller than or equal to the largest of the three heights of that triangle. This last model suits ligand shapes modeled by minimal-height enclosing cylinders.

CRITICAL VALUE: See the MODEL parameter. Many geometrical descriptors of ligands can be computed with the RADI freeware.

OUTFAL: Enter Y to output the list of the internal triangles of the Delaunay triangulation, with their status relative to the models above.
Enter N to cancel the output of that list. Triangles lying on the boundary of the convex hull are flagged as external. All other triangles are flagged as internal.

OUTFEL: Enter Y to output the list of the external triangles of the Delaunay triangulation, with their status relative to the models above. Enter N to cancel the output of that list.

OUTCA1: Enter Y to output the connected components (channels) of the facial graph, with the radius, diameter, surface and volume of each channel. Enter N to cancel the output of that list.

OUTCA2: Enter Y to output, for each tetrahedron, its component and its eccentricity in the facial graph. Enter N to cancel the output of that list.

VISU0: Enter 1 to store, in the output molecular file for visualization, the connected component (as listed by OUTCA1) selected with the COMPONENT NUMBER OR POCKET TO VISUALIZE parameter. Enter 0 to cancel that option.

VISU1: Enter 1 to store the nodal graph of the Delaunay triangulation in the output molecular file for visualization. Enter 2 to store the nodal graph of the network of channels. Enter 3 to store the nodal graph of the alpha shape. Enter 0 to cancel that option.

VISU2: Enter 1 to store the facial graph of the whole network of channels in the output molecular file for visualization. Enter 0 to cancel that option.

VISU3: Enter 1 to store the facial graph of the network of channels connected to the exterior of the protein in the output molecular file for visualization. Enter 0 to cancel that option.

When generated, the graphs above are concatenated in the common output molecular file as separate "molecules", to be displayed by a molecular viewer.

VISU4: Enter 1 to store the nodal graph of each MCP (minimal cost path to the targeted atom: see below) in the output molecular file for visualization. Enter 0 to cancel that option.

COMPONENT NUMBER OR POCKET TO VISUALIZE: Requested only if VISU0 is nonzero.
Enter the number of the connected component (individual channel) of the facial graph to be stored in the output molecular file for visualization. A negative number selects all the connected components whose size (number of tetrahedra) equals the absolute value of this number. A negative number whose absolute value exceeds the largest component size (e.g. -9999999) selects the component(s) of largest size. This generally selects the biggest cavity or channel. Enter 0 to cancel that option.

TARGETED ATOM ADDRESS: Enter the number of the targeted atom. For a protein of n atoms, this number is in {1,...,n}. It is NOT the atom number printed in the input PDB file.

Output results:
--------------

The number of protein atoms, followed by the Delaunay triangulation structure (see the input parameters).

The triangular faces with their status (available to the ligand or not: see OUTFAL and OUTFEL). When these lists are printed, the three rightmost values printed for a face give the availability of this face to the ligand for MODEL 2, 3 and 4, respectively. A value of +1 indicates that the ligand can pass through the triangle. A value of -1 indicates that it cannot.

The list of the connected components (channels: see OUTCA1), with their size, diameter, radius, volume, surface, and number of available external triangles and tetrahedra.

The list of the nodes (tetrahedra: see OUTCA2) of the facial graph, with the eccentricity of each node in its connected component.

The number of tetrahedral cells available to the ligand and connected to the exterior is printed first. Then comes the number of their neighbouring cells that are available to the ligand and connected to the previous ones, and so on.

For each tetrahedron having the TARGETED ATOM as a vertex, CCCPP indicates whether the ligand can reach it from the exterior of the protein. It prints either PATH FOUND or "unreachable from the exterior of the convex hull", respectively.
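The layered count of available cells and the PATH FOUND / unreachable report can be pictured as a breadth-first search over the facial graph. In this graph, nodes are tetrahedra and edges are shared triangles that the ligand can cross. The Python sketch below illustrates the idea under stated assumptions and is not CCCPP's implementation. Its data layout is hypothetical: tetrahedra are 4-tuples of atom addresses, available faces form a set, and tetrahedron indices are 0-based. The set of entry tetrahedra open to the exterior is also an assumed input, and MCP costs are not modeled.

```python
from collections import defaultdict

UNREACHABLE = "unreachable from the exterior of the convex hull"


def faces_of(tet):
    """The four triangular faces of a tetrahedron, as frozensets of atom addresses."""
    return [frozenset(v for i, v in enumerate(tet) if i != skip) for skip in range(4)]


def layered_reachability(tetrahedra, available_faces, entry_tets):
    """Breadth-first layers of tetrahedra reachable from the exterior (sketch).

    tetrahedra      : list of 4-tuples of atom addresses (hypothetical layout)
    available_faces : set of frozenset faces the ligand can cross under the
                      chosen MODEL (assumed to be precomputed)
    entry_tets      : 0-based indices of tetrahedra assumed open to the exterior
    Returns (layers, reached): layers[k] lists the tetrahedra first reached at
    step k, and reached is the set of all reachable tetrahedra.
    """
    # Index the tetrahedra sharing each face: these pairs are facial-graph edges.
    face_to_tets = defaultdict(list)
    for t, tet in enumerate(tetrahedra):
        for face in faces_of(tet):
            face_to_tets[face].append(t)

    current = sorted(entry_tets)
    reached = set(current)
    layers = []
    while current:
        layers.append(current)
        nxt = set()
        for t in current:
            for face in faces_of(tetrahedra[t]):
                if face not in available_faces:
                    continue  # the ligand cannot cross this triangle
                for u in face_to_tets[face]:
                    if u not in reached:
                        nxt.add(u)
        reached |= nxt
        current = sorted(nxt)
    return layers, reached


def report_target(tetrahedra, reached, target):
    """Reachability status of each tetrahedron having `target` as a vertex."""
    return [
        (t, "PATH FOUND" if t in reached else UNREACHABLE)
        for t, tet in enumerate(tetrahedra)
        if target in tet
    ]
```

Consider the chain `[(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6)]`, where only the face {2, 3, 4} is available and tetrahedron 0 is the entry. The layers are `[[0], [1]]`. Targeting atom 5 then reports PATH FOUND for tetrahedron 1 and "unreachable" for tetrahedron 2.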
The list of the MCPs (minimal cost paths) to the targeted atom, with their cost, surface, volume and route. Routes are described in terms of tetrahedra, triangles, atoms and residues.

Remarks:
-------

The number of protein atoms is currently limited to 50000. The source has to be recompiled to read larger proteins.

The coefficient (D-R)/R is computed for each channel, where D and R are the diameter and radius of the channel. Since R <= D <= 2R, this coefficient takes values in [0;1]. It was introduced in J. Chem. Inf. Comput. Sci. 1992, 32[4], 331-337 (DOI 10.1021/ci00008a012).

................ END SHORT DOC .........................................
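To illustrate the (D-R)/R coefficient, the sketch below assumes that R and D are the graph radius and diameter of a channel's facial graph. These are the minimum and maximum node eccentricities, obtained here by breadth-first search on an unweighted graph. This is a hypothetical Python sketch, not CCCPP's code. In any connected graph R <= D <= 2R, which is why the coefficient falls in [0;1].

```python
from collections import deque


def eccentricities(adj):
    """Eccentricity of each node of a connected, unweighted graph.

    adj: dict mapping each node to an iterable of its neighbours
    (hypothetical representation of one channel's facial graph).
    """
    ecc = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        ecc[source] = max(dist.values())
    return ecc


def radius_diameter_coefficient(adj):
    """Return (R, D, (D-R)/R) for a connected graph with at least two nodes."""
    ecc = eccentricities(adj)
    r = min(ecc.values())  # graph radius
    d = max(ecc.values())  # graph diameter
    if r == 0:
        raise ValueError("single-node graph: (D-R)/R is undefined")
    return r, d, (d - r) / r
```

For a path-shaped channel of five tetrahedra, R = 2 and D = 4, so the coefficient is 1 (elongated shape). For a ring of four tetrahedra, R = D = 2 and the coefficient is 0.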