nd antibody structures. Finally, a 429-bit sequence similarity descriptor was obtained. For the geometric descriptor, three aspects were taken into consideration: bond length, bond angle and dihedral angle. A 41-bit protein geometry descriptor was obtained for each antigen and antibody protein in our dataset. The types of protein geometry descriptors are listed in S1.

PCM Modeling by New Protein Fingerprints and EPIF

Fig 1. Predicted binding energy of all antigen-antibody complexes in our testing set. Panels a to h compare the predicted values with the actual values simulated by Hex. Panel i illustrates the predictive ability of all 8 obtained order BCTC models with the selected kernel. doi:10.1371/journal.pone.0122416.g001

Conclusions

Currently, we can only rely on experimental methods to test the binding affinity of mutated antigens with a given antibody or antiserum. Because these experiments are time-consuming, computational methods that can accurately describe the antigen-antibody interaction and help estimate binding affinity are highly desirable. In this work, a series of protein fingerprints, together with an epitope-paratope interaction fingerprint (EPIF), was introduced for the first time and successfully tested on a benchmark dataset through proteochemometric (PCM) modeling. The results indicated that our newly established protein fingerprints achieved better predictive ability than peer descriptors. In addition, when cross-terms were introduced into the proteochemometric model, the newly established EPIF not only significantly improved the prediction ability, but also outperformed the previous cross-terms of MLPD. The results also suggested that EPIF, as a structure descriptor, can increase the predictive performance of a proteochemometric model based on conventional structure descriptors, but may not be suitable for sequence descriptors.
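To make the role of cross-terms concrete, the following minimal sketch shows one common way of building them in proteochemometric modeling: multiplying every descriptor of one binding partner by every descriptor of the other and appending the products to the feature row. This is an illustration under stated assumptions, not the construction used for EPIF or MLPD in this work, which this section does not specify. The function names `cross_terms` and `pcm_features` and the toy descriptor values are hypothetical.

```python
from itertools import product


def cross_terms(antigen_desc, antibody_desc):
    """Return all pairwise products of antigen and antibody descriptors.

    Assumption: multiplicative cross-terms, a generic PCM construction.
    It is not claimed to match the EPIF or MLPD definitions of the paper.
    """
    return [a * b for a, b in product(antigen_desc, antibody_desc)]


def pcm_features(antigen_desc, antibody_desc, use_cross_terms=True):
    """Concatenate both descriptor blocks and, optionally, their cross-terms."""
    features = list(antigen_desc) + list(antibody_desc)
    if use_cross_terms:
        features += cross_terms(antigen_desc, antibody_desc)
    return features


# Toy descriptors (hypothetical values, for illustration only).
antigen = [1.0, 0.0, 2.0]
antibody = [0.5, 1.0]

row = pcm_features(antigen, antibody)
print(len(row))  # 3 + 2 + 3 * 2 = 11 features
```

A feature row built this way could then be passed to a regression model such as support vector regression. Because the cross-term block grows with the product of the two descriptor lengths, long fingerprints usually call for feature selection or a kernel that handles the interaction implicitly.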
Moreover, our recommended model, based on support vector regression with the descriptor combination Fab-Fag-EPIF, was able to simulate binding affinities for antigen-antibody complexes. Given known or simulated conformational structures of antigen-antibody complexes, this newly established fingerprint can be used to simulate binding affinity and thus assist antibody screening.

Materials and Methods

Data set

Training and validation datasets of antigen-antibody complexes were extracted from the Protein Data Bank. We manually excluded inappropriate search results, such as structures containing only an antigen or only an antibody, and T cell epitope-antibody complex structures. Structures with low crystallographic resolution or short sequence length were also excluded to ensure the quality of our dataset. The specific steps and parameters were as follows:

1. Search keywords: antibody, antigen, Fab, Fv, Fc, IgG and immu
2. Resolution better than 3.0 Å
3. Antigen length of more than 50 residues
4. When two structures shared identical sequence and conformation in both epitope and paratope, one of them was removed from our dataset

After these four steps, crystal structures of 429 antigen-antibody complexes were collected, comprising 343 as training data and 86 as testing data. The PDB IDs in our dataset can be found in the Supplementary Data.

Epitope and Paratope determination

For each antigen-antibody complex structure in our dataset, epitope and paratope residues were identified by Solvent Accessible Surface Area (SASA) based methods. SASA value