Memorandum
| To: | BioInformatics Interns |
| From: | Internship Coordinator |
| Subject: | Methods for Locating Sequences |
I am looking forward to working with each of you on the tasks ahead.
Your first goal is to locate a set of molecular sequences from different strains of influenza in the NCBI databases. You will test several search methods, because the sequences are stored in many different places and different methods are needed to find them. In the Avian Influenza project, the best method depends on the information you have on hand. For example, if a sequence has been described in a published paper, you can use the accession number from the paper to locate it. If you don't have an accession number but you know the name of the sequence, you can search by name. If you know the organism that served as the source of the sequence, you can look for sequences from that organism. Finally, if you have a nucleic acid or protein sequence, you can use a search program called BLAST (Basic Local Alignment Search Tool) to find sequences that are similar.
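To make the first three approaches concrete, here is a minimal sketch that builds query URLs for NCBI's Entrez E-utilities web service. It is offered only as an illustration, not as the method the project requires: you can run the same searches through the NCBI web pages. The helper names, the `nucleotide` database choice, and the placeholder accession `ACCESSION_FROM_PAPER` are assumptions made for this example. The sketch only constructs the URLs and does not contact NCBI, and it does not cover BLAST searches.

```python
from urllib.parse import urlencode

# Base address of the NCBI Entrez E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def fetch_by_accession_url(accession, db="nucleotide"):
    """Build an efetch URL that requests one record in FASTA format.

    `accession` is the accession number quoted in a paper.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def search_url(term, field, db="nucleotide", retmax=20):
    """Build an esearch URL that restricts `term` to one Entrez search field.

    Use field="Title" to search by sequence name and
    field="Organism" to search by source organism.
    """
    params = {"db": db, "term": f"{term}[{field}]", "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


# Hypothetical inputs, for illustration only.
by_accession = fetch_by_accession_url("ACCESSION_FROM_PAPER")
by_name = search_url("hemagglutinin", "Title")
by_organism = search_url("Influenza A virus", "Organism")

for url in (by_accession, by_name, by_organism):
    print(url)
```

Pasting one of the printed URLs into a browser should return either the FASTA record (efetch) or a list of matching record identifiers (esearch), provided the accession or search term actually exists in the chosen database.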
You will test each of these methods in turn and, at the end, discuss the pros and cons of each method with your supervisor.
The deliverables:
- A multi-sequence data set that can be formatted for use in other work.
- Information about the following: the number of H and N types in the NCBI taxonomy databank, the number of possible combinations of H and N types, and the number of flu subtypes that have genome sequences. This information would be used to guide strategies for vaccination programs.
- Reasonable strategies for locating sequences in diverse databanks.
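
For the second deliverable, the number of possible subtype combinations is simply the number of H types multiplied by the number of N types. The short sketch below shows that arithmetic. The function name is hypothetical, the counts of 3 and 2 are placeholders rather than the real figures (you will look those up in the NCBI taxonomy databank), and the code assumes the types are numbered consecutively starting at 1.

```python
from itertools import product


def subtype_combinations(h_count, n_count):
    """List every HxNy label for the given numbers of H and N types.

    Assumes types are numbered 1..h_count and 1..n_count.
    """
    h_types = [f"H{i}" for i in range(1, h_count + 1)]
    n_types = [f"N{j}" for j in range(1, n_count + 1)]
    # Pair every H type with every N type.
    return [h + n for h, n in product(h_types, n_types)]


# Placeholder counts for illustration; substitute the values you find at NCBI.
combos = subtype_combinations(3, 2)
print(len(combos))  # 3 * 2 = 6
print(combos)       # ['H1N1', 'H1N2', 'H2N1', 'H2N2', 'H3N1', 'H3N2']
```

Comparing this full list against the subtypes that actually have genome sequences shows which combinations are not yet represented in the databank.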
