Inadequate Reference Datasets Biased toward Short Non-epitopes Confound B-cell Epitope Prediction
Rahman, Kh. Shamsur
Chowdhury, Erfan U.
MetadataShow full item record
X-ray crystallography has shown that an antibody paratope typically binds 15-22 amino acids (aa) of an epitope, of which 2-5 randomly distributed amino acids contribute most of the binding energy. In contrast, researchers typically choose for B-cell epitope mapping short peptide antigens in antibody binding assays. Furthermore, short 6-11-aa epitopes, and in particular non-epitopes, are over-represented in published B-cell epitope datasets that are commonly used for development of B-cell epitope prediction approaches from protein antigen sequences. We hypothesized that such suboptimal length peptides result in weak antibody binding and cause false-negative results. We tested the influence of peptide antigen length on antibody binding by analyzing data on more than 900 peptides used for B-cell epitope mapping of immunodominant proteins of Chlamydia spp. We demonstrate that short 7-12-aa peptides of B-cell epitopes bind antibodies poorly; thus, epitope mapping with short peptide antigens falsely classifies many B-cell epitopes as non-epitopes. We also show in published datasets of confirmed epitopes and non-epitopes a direct correlation between length of peptide antigens and antibody binding. Elimination of short, 11-aa epitope/non-epitope sequences improved datasets for evaluation of in silico B-cell epitope prediction. Achieving up to 86% accuracy, protein disorder tendency is the best indicator of B-cell epitope regions for chlamydial and published datasets. For B-cell epitope prediction, the most effective approach is plotting disorder of protein sequences with the IUPred-L scale, followed by antibody reactivity testing of 16-30-aa peptides from peak regions. This strategy overcomes the well known inaccuracy of in silico B-cell epitope prediction from primary protein sequences.