Teachers: These questions are intended to summarize the use of the tool; a class discussion is recommended.

Considering the experience and knowledge acquired in this task:

14. What are the uses of Prosite?
Teachers: As mentioned, Prosite predicts which motifs may occur in the studied protein by scanning it against a database of motifs and protein families. If a region of the studied protein sequence is similar to a motif sequence in the database, we can infer that the two probably share a similar structure and/or function. Identifying previously known motifs in the protein sequence therefore points to the protein's possible functions, structure, cellular localization, etc. The tool also makes it possible to predict which regions of the protein, or even which particular amino acids, are essential for maintaining its function. We also saw that a change in the sequence, even of a single amino acid, lowered the similarity score of the relevant motif. The lower score reflects a disruption of the motif and hence of the protein's structure and function; in the case studied, the result is impaired protein folding followed by degradation.
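
Teachers: As an optional illustration, the drop in score caused by a single deleted amino acid can be sketched with a toy position-specific scoring table. Everything in this sketch is invented for the example: the table TOY_PROFILE, the mismatch penalty and both sequences are hypothetical and are not taken from Prosite or from CFTR. The profiles Prosite actually uses are considerably more elaborate than this ungapped toy, so the sketch only conveys the idea that the score measures how well a segment fits a motif.

```python
# Toy position-specific scoring table: one dict per motif position, mapping
# an amino acid to a score. All values are invented for illustration only.
TOY_PROFILE = [
    {"G": 3, "A": 1},
    {"I": 2, "L": 2, "V": 1},
    {"F": 4, "Y": 2},
    {"G": 3},
    {"K": 2, "R": 2},
]
MISMATCH = -2  # hypothetical penalty for a residue the position does not list


def window_score(window):
    """Sum the per-position scores of one segment as long as the profile."""
    return sum(col.get(aa, MISMATCH) for col, aa in zip(TOY_PROFILE, window))


def best_score(sequence):
    """Slide the profile along the sequence and keep the best-scoring segment."""
    width = len(TOY_PROFILE)
    return max(
        window_score(sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    )


normal = "MSGIFGKTE"                 # hypothetical sequence containing the toy motif
mutant = normal.replace("F", "", 1)  # the same sequence with one residue deleted

print("normal:", best_score(normal))  # 14
print("mutant:", best_score(mutant))  # -1
```

Deleting one residue shifts every following amino acid by one position, so no segment of the mutant sequence lines up with the whole toy motif any more and the best score falls.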

15. What is the underlying principle of Prosite?
Teachers: The Prosite search engine compares the query amino acid sequence to sequences of protein motifs with structural or functional significance, represented by profiles and patterns, and displays those most similar to segments of the query sequence. Prosite thus provides information on the known motifs that may occur in the query protein sequence: their positions, their characteristics, their role, etc. In addition, the score given to each motif reflects the degree of similarity between the segment of the query protein and the motif’s sequence in the database.
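
Teachers: The pattern side of this comparison can be sketched in a few lines of Python. The sketch below translates a pattern written in PROSITE pattern notation into a regular expression and reports where it occurs in a sequence. The function names prosite_to_regex and scan and the toy sequence are hypothetical, and this is not how the Prosite server itself is implemented. The example pattern, N-{P}-[ST]-{P}, is the N-glycosylation site pattern; only the basic notation is handled (x, [..], {..}, repeat counts in parentheses, and the < and > anchors outside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (start, end, segment) for every match, using 1-based positions."""
    # The lookahead lets overlapping matches be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(sequence)
    ]


N_GLYCOSYLATION = "N-{P}-[ST]-{P}."
toy_sequence = "MKNGSAANPTLNVTQ"  # hypothetical sequence, not a real protein

hits = scan(toy_sequence, N_GLYCOSYLATION)
for start, end, segment in hits:
    print(start, end, segment)
# 3 6 NGSA
# 12 15 NVTQ
```

The asparagine at position 8 is not reported because it is followed by a proline, which the pattern excludes at that position.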

16. Does Prosite rely on the three-dimensional structure of the query protein to identify its possible motifs?
Teachers: No, the only input to Prosite is the query protein sequence. A motif can consist of a few amino acids that fulfil a certain role, such as the active site of an enzyme. It can also consist of tens or even hundreds of amino acids that fold into a characteristic structure. In both cases, the amino acids that compose the motif are conserved (in a short functional motif the sequence is highly conserved, while in a large structural motif conservation is less strict). This sequence conservation is fundamental for identifying the motif and defining its sequence in the database. It is also what allows Prosite, through sequence screening and alignment, to identify motifs in a query sequence. Determining the structure of a protein is a long and expensive process, and in fact the structure has been determined for only a small fraction of proteins. The ability to identify motifs from the protein sequence alone allows us to learn about the possible structure and functions of a protein without actually determining its spatial structure.
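
Teachers: To illustrate how conservation makes it possible to define a motif, the sketch below derives a simple pattern from a few aligned motif sequences. It is a deliberately naive illustration: the function consensus_pattern, the cutoff of three residues and the four sequences are all hypothetical, and the motifs in the Prosite database are defined and verified by experts, not generated this way.

```python
def consensus_pattern(aligned, max_alternatives=3):
    """Build a PROSITE-style pattern from equal-length aligned motif sequences.

    A fully conserved column becomes a single letter, a column with a few
    alternatives becomes [..], and a variable column becomes x.
    The cutoff max_alternatives is an arbitrary choice for this sketch.
    """
    elements = []
    for column in zip(*aligned):       # walk through the alignment column by column
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")
    return "-".join(elements)


# Hypothetical aligned occurrences of the same motif in four proteins.
toy_alignment = ["GKSTL", "GKTSL", "GKSAL", "GKTVL"]

print(consensus_pattern(toy_alignment))  # G-K-[ST]-x-L
```

The strictly conserved positions carry most of the information; the variable position contributes only its presence.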

17. Among the bioinformatics tools in the toolbox, can we use another tool to obtain the information, or part of the information, obtained using Prosite?
Teachers: There is some overlap between the information that can be obtained from different bioinformatics tools. Different tools may also use different algorithms and software, and display the resulting information differently. For example, the Features field of a protein record in an amino acid sequence database provides information on structural and functional characteristics such as regions and active sites. However, this information is displayed as text, is less user-friendly and does not show the sequence of the motif. Such an analysis is also limited to sequences registered in the database and may not suit the analysis of novel mutant sequences, for example. Protein BLAST (BLASTp for short), on the other hand, can identify proteins whose sequence is similar to that of CFTR, which may suggest shared motifs and/or similar activity. Yet this tool provides no detailed information on either the sequences of the motifs or their positions and organization along the CFTR protein sequence. These tools, Entrez for text searching and analyzing protein records and BLASTp for searching for homologous sequences, do not provide detailed information on protein motifs, their function and structure, and do not refer to protein families. To conclude, bioinformatics tools other than Prosite can sometimes provide some of the information obtained with Prosite. The advantages of Prosite are: it enables the analysis of any chosen amino acid sequence; it is not limited to protein records previously registered in a database; it provides detailed information on the motifs found to be similar to a segment of the input sequence, and on the degree of similarity; and it relies on a dedicated, reliable database of motifs whose data must be verified by experts, unlike some other databases that accept freely submitted, automatically registered data.

18. Is there another bioinformatics tool that resembles Prosite in the way it works? In what ways are they alike and how do they differ?
Teachers: In principle, Prosite resembles Protein BLAST. Both are search engines that compare a query amino acid sequence to sequences in a database. Both report the sequences that are most similar to segments of the query sequence, indicating the region and degree of similarity. The main difference is that Prosite searches a dedicated database of protein families and motif sequences, that is, motifs with a known structure or function that is essential to protein activity. BLASTp, on the other hand, searches protein records for proteins with similar sequences; the user can then infer the possible structure or activity of the query protein, and from that, indirectly, a structure-function relationship.

In this task we studied the CFTR protein and the mutations in it that can cause cystic fibrosis. First, we used Prosite to study the structural and functional motifs present in the normal protein and their significance for its activity. The tool compares the query sequence to sequences in a database of protein families and motifs that have been characterized. Based on sequence similarity, Prosite predicts which motifs may occur in the query sequence and at what positions, and thus provides information on the possible structure and function of the protein.
We found that the CFTR protein, which functions in the membrane as an ion channel, belongs to a protein family called ABC (short for ATP-Binding Cassette). Members of this family share two types of conserved motifs: one that anchors the protein to the cell membrane (ABC_TM1F) and another that binds the ATP molecule required for the active transfer of ions against the concentration gradient (ABC_TRANSPORTER2). The CFTR protein harbors two motifs of each type. Next, we learned about the CFTR gene mutation F508del, which is prevalent among Jews and worldwide. Using Prosite, we examined how the mutation affects the protein motifs. We discovered that the F508del mutation causes only a minor change in the protein sequence (a deletion of one amino acid), located in the N-terminal ABC_TRANSPORTER2 motif. Yet this change is sufficient to cause defective protein folding and subsequent degradation, and thus a loss of CFTR activity in the cell. Patients with this mutant allele usually have a severe form of the disease.