[ensembl-dev] GRCh37 Protein sequence has asterisks

Luke Goodsell l.goodsell at achillestx.com
Tue Dec 5 12:05:37 GMT 2017


Thanks, Will,

Unfortunately, I cannot find CDS-only cDNA sequences from RefSeq; the sequences they provide include UTRs. Is there an easy way to identify the start and stop codons? The longest ORF is not always the correct one.

Incidentally, being able to get the sequences used by VEP is very important for us; we’re trying to construct the new protein sequences that result from variants using consequence information annotated by VEP. We’d very much appreciate the corrected sequences being incorporated into the otherfeatures database as soon as possible.

Kind regards,
Luke
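
On the start and stop codon question above, one possible route (not confirmed anywhere in this thread) is to take the CDS coordinates from the annotation rather than searching for ORFs; RefSeq GenBank-format mRNA records carry a CDS feature with transcript coordinates. The sketch below assumes you already have a transcript sequence and 1-based inclusive CDS coordinates. The function names and the toy sequence are hypothetical, and the translation uses the standard genetic code only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_cds(transcript, cds_start, cds_end):
    """Slice the CDS out of a transcript using 1-based inclusive coordinates."""
    if not (1 <= cds_start <= cds_end <= len(transcript)):
        raise ValueError("CDS coordinates fall outside the transcript")
    return transcript[cds_start - 1:cds_end].upper()


def translate(cds):
    """Translate codon by codon; stop codons become '*', unknown codons 'X'."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def internal_stops(protein):
    """Return 0-based positions of '*' that are not the final residue."""
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


# Toy transcript (hypothetical): 4 nt 5' UTR, 9 nt CDS, 3 nt 3' UTR.
toy_transcript = "GGCC" + "ATGGCCTGA" + "TTT"
toy_protein = translate(extract_cds(toy_transcript, 5, 13))
```

Here `toy_protein` ends in a single `*` for the annotated stop codon. Any position reported by `internal_stops` would point at an asterisk inside the protein, which is the symptom in the subject line.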

From: William McLaren <wm2 at ebi.ac.uk>
Date: Tuesday, 5 December 2017 at 09:15
To: Luke Goodsell <l.goodsell at achillestx.com>, Ensembl developers list <dev at ensembl.org>, Alessandro Vullo <avullo at ebi.ac.uk>
Subject: Re: [ensembl-dev] GRCh37 Protein sequence has asterisks

Hi Luke,

There is no straightforward way to do this via Ensembl at the moment; I’d suggest you download the relevant files from NCBI.

The BAM files we use are obtained from ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens/GRCh37.p13_interim_annotation/; there seem to be protein and RNA FASTA files in there which may have what you need.

Otherwise you may find what you need in the parent directory ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens. I’m not familiar with NCBI’s FASTA layout so you’d have to investigate yourself!

Regards

Will McLaren
Ensembl Variation
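
For anyone following the suggestion above to work from the NCBI FASTA files, here is a minimal sketch of loading a FASTA file into a dictionary keyed by the first token of each header line. The header layout of the NCBI files is not described in this thread, so the choice of key is an assumption, and the filename in the usage comment is hypothetical.

```python
import gzip


def read_fasta(lines):
    """Parse FASTA text from an iterable of lines into {identifier: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Assumption: the first whitespace-delimited token is a usable key.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Usage with a gzipped download (hypothetical filename):
# with gzip.open("rna.fa.gz", "rt") as handle:
#     transcripts = read_fasta(handle)

example = read_fasta([">tx1 example transcript", "ATGG", "CCTGA", ">tx2", "ATGTAA"])
```

Sequences split across several lines are joined, so `example["tx1"]` holds the full nine bases.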



On 4 December 2017 at 5:55:38 pm, Luke Goodsell (l.goodsell at achillestx.com) wrote:
Hi Alessandro,
Is there a way to extract the BAM-edited sequences? I'd simply like to get FASTA files of the RefSeq cDNA and proteins as used by VEP.
Kind regards,
Luke


From: Alessandro Vullo
Sent: Monday, 4 December, 17:44
Subject: Re: [ensembl-dev] GRCh37 Protein sequence has asterisks
To: Ensembl developers list, Luke Goodsell

Hi Luke,

The problem is likely due to the RefSeq sequence differing from the reference. Are you using VEP and then retrieving the sequence as annotated by it? Quoting the relevant people (VEP):

"VEP uses BAMs to correct RefSeqs that differ from the reference, and without those the API can give incorrect translations. This will hopefully be fixed in future when the SeqEdit objects that VEP creates from the BAMs are incorporated directly into the otherfeatures DB."

Hope that helps,
Alessandro
This e-mail message contains confidential information intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, please do not disseminate, distribute or copy this communication, by e-mail or otherwise. Instead, please notify us immediately by return e-mail and then delete and discard all copies of the e-mail. We have taken all reasonable precautions to check this e-mail and any attachments for viruses, but we cannot accept any liability for any damage sustained as a result of any virus, worm or other malicious software. Achilles Therapeutics Limited (10167668) is registered in England and Wales. The registered office is at 215 Euston Road, London, NW1 2BE, UK.
_______________________________________________
Dev mailing list Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/