3D Shape-Structure Comparison Method for Protein Classification
In this paper, a 3D shape-based approach is presented for the efficient search, retrieval, and classification of protein molecules. The method relies primarily on the geometric 3D structure of the proteins, which is produced from the corresponding PDB files, and secondarily on their primary and secondary structure. After the 3D structures are normalized with respect to translation and scale, the Spherical Trace Transform is applied to them to produce geometry-based descriptor vectors, which are completely rotation invariant and describe their 3D shape. Additionally, characteristic attributes of the primary and secondary structure of the protein molecules are extracted, forming attribute-based descriptor vectors. The two sets of descriptor vectors are weighted and combined into an integrated descriptor vector. Three classification methods are tested. A subset of the FSSP/DALI database, which provides a structural classification of proteins, is used as the ground truth to evaluate the classification accuracy of the proposed method. The experimental results show that the proposed method achieves more than 99 percent classification accuracy while remaining much simpler and faster than the DALI method.
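
The normalization, weighting, and classification steps outlined above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper, and several details are assumptions made only for illustration: scaling so that the mean distance to the centroid equals 1, integrating the descriptors by weighted concatenation, the weight values, the L1 distance, and the nearest-neighbor rule (the abstract does not name the three classifiers tested). The function names are hypothetical, and the Spherical Trace Transform itself is not implemented; the geometry descriptor is taken as a given input.

```python
import math


def normalize_pose(points):
    """Translate points so their centroid is at the origin, then scale them
    so the mean distance to the centroid is 1 (assumed scaling convention)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    mean_r = sum(math.sqrt(x * x + y * y + z * z) for x, y, z in centered) / n
    if mean_r == 0:
        raise ValueError("all points coincide; scale is undefined")
    return [(x / mean_r, y / mean_r, z / mean_r) for x, y, z in centered]


def integrate_descriptors(geometry, attributes, w_geometry=0.8, w_attributes=0.2):
    """Weight each descriptor vector and concatenate them into one vector.
    The default weights are placeholders, not values from the paper."""
    return [w_geometry * g for g in geometry] + [w_attributes * a for a in attributes]


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_class(query, labelled):
    """Return the label of the labelled descriptor closest to the query.
    `labelled` is a list of (descriptor, label) pairs."""
    return min(labelled, key=lambda item: l1_distance(query, item[0]))[1]
```

In this sketch, each protein would be represented by the output of `integrate_descriptors`, and a query protein would be assigned the class of its nearest neighbor among proteins whose ground-truth class is known.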