

Peptides are defined by the International Union of Pure and Applied Chemistry and the International Union of Biochemistry and Molecular Biology (IUPAC-IUB) as compounds "produced by amide formation between a carboxyl group of one amino acid and an amino group of another". In this paper, we use the term "peptides" as a common synonym for oligopeptides, which are defined as having "fewer than about 10–20 residues". We therefore currently apply a length cut-off of 20 amino acid residues or fewer, based on the IUPAC-IUB definition. Many of the peptides used as pharmaceutical and diagnostic agents fall within this cut-off.

Naturally occurring peptides function as hormones, transmitters, and modulators of numerous biological processes. Both naturally occurring and synthetic peptides are used in therapeutic applications, for example somatostatin analogs in tumor radiotherapy and oxytocin to induce labor. Examples of diagnostic uses include membrane-translocating agents, receptor targeting agents, and enzyme substrates. Driven by the great interest in the diverse applications of peptides, the new field of peptidomics is rapidly emerging.

The functions of peptides, including their interacting partners, are determined by their sequence and, as with longer proteins, can be predicted from sequence similarity. Prior knowledge can thus be used to predict or shorten the list of possible binding partners of a given peptide of interest, provided that the peptide shares significant sequence similarity with other peptides or proteins whose binding partners are known.

The data, available through a web-based interface for simple and advanced text search and for BLAST and Smith-Waterman sequence similarity search, proved useful in our own work. Examination of the initial data also yielded some surprises, providing an incentive for us to make further improvements to the database. We hope that the peptide database, the associated tools, and the text mining algorithm will be useful to the larger biomedical community.
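To make the idea of a Smith-Waterman sequence similarity search concrete, the following is a minimal sketch of local alignment scoring between a query peptide and a handful of candidate sequences. It is an illustration only, not the search implementation used by the database: the scoring values (match +2, mismatch -1, linear gap penalty -1), the function names, and the example entries are all assumptions, and a production search would normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not tuned parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so the score never drops below 0.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query, entries):
    """Return (score, name) pairs sorted by descending score against the query.

    `entries` is a list of (name, sequence) pairs; the names are hypothetical.
    """
    scored = [(smith_waterman_score(query, seq), name) for name, seq in entries]
    return sorted(scored, key=lambda pair: -pair[0])
```

Calling `rank_by_similarity("GRGDS", [("entry_a", "AAGRGDSAA"), ("entry_b", "KLVFF"), ("entry_c", "GRGES")])` places the entry containing the exact motif first, the single-substitution variant second, and the unrelated sequence last. Ranking candidates with known binding partners in this way is one simple route to shortening a list of possible partners for a new peptide.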
#PEPTIDE SEQUENCES FULL#
Peptides have emerged as important affinity ligands for diagnostic and therapeutic medical uses, as well as materials for a host of applications in biotechnology. While many excellent databases provide protein sequence data, protein interaction data, and peptide data, a substantial fraction of the literature data remains untapped. Unfortunately, the wealth of peptide sequences in these literature sources is often difficult to access with modern methods of sequence similarity searching, because the sequences have not been extracted in a suitable format. We therefore sought to address this issue by combining automatic mining of MEDLINE abstracts for peptide sequences, integration of existing bioinformatics sources, and manual curation of full text articles and MEDLINE text mining results.
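As a rough illustration of what mining abstract text for peptide sequences can involve, the sketch below scans a string for candidate sequences written in one-letter code or as hyphen-separated three-letter codes. It is not the Peptide::Pubmed algorithm: the regular expressions, the function name `find_candidate_peptides`, and the default bounds of 4 to 20 residues are assumptions made for this example. A pattern this simple will also pick up false positives such as capitalized ordinary words, which is one reason manual curation of text mining results remains necessary.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Three-letter to one-letter residue codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hyphen-separated runs of three or more capitalized three-letter tokens.
THREE_LETTER_RUN = re.compile(r"\b(?:[A-Z][a-z]{2}-){2,}[A-Z][a-z]{2}\b")


def find_candidate_peptides(text, min_len=4, max_len=20):
    """Return candidate peptide sequences in one-letter code, in text order."""
    hits = []

    # Whole words made only of amino acid letters, within the length bounds.
    one_letter = re.compile(rf"\b[{AMINO_ACIDS}]{{{min_len},{max_len}}}\b")
    for m in one_letter.finditer(text):
        hits.append((m.start(), m.group()))

    # Three-letter runs are kept only if every token is a known residue code.
    for m in THREE_LETTER_RUN.finditer(text):
        residues = m.group().split("-")
        if not min_len <= len(residues) <= max_len:
            continue
        if all(r in THREE_TO_ONE for r in residues):
            hits.append((m.start(), "".join(THREE_TO_ONE[r] for r in residues)))

    return [seq for _, seq in sorted(hits)]
```

Converting both notations to one-letter code yields sequences in a form that similarity search tools can consume directly, which is the format problem described above.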
#PEPTIDE SEQUENCES CODE#
Peptide sequences still have to be mined from abstracts and full-length articles, and/or obtained from fragmented public sources. We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. Another component of the database is peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full text articles and text mining results. We show the utility of the database in different examples of affinity ligand discovery.

We have created and maintain a database of peptide sequences. The database has biological and medical applications, for example, to predict the binding partners of biologically interesting peptides, to develop peptide-based therapeutic or diagnostic agents, or to predict molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available online, and the text mining source code (Peptide::Pubmed) is freely available from the same site as well as from CPAN.
#PEPTIDE SEQUENCES ARCHIVE#
Peptides are important molecules with diverse biological functions and biomedical uses. To date, no single, searchable archive exists for peptide sequences or their associated biological data.
