[ensembl-dev] VEP: reporting HGVS identifiers with RefSeq accessions

Will McLaren wm2 at ebi.ac.uk
Tue Feb 14 10:05:35 GMT 2012


Hi Reece,

There's a shortcut which should cover a lot of situations. You can add
the flag --xref_refseq (a flag I neglected to add to the web docs,
although it is listed when you run with --help), which will add
RefSeq external reference identifiers to the Extra field of the
output. If you then add --ccds, you will also get CCDS identifiers.

For those rows of results that have both RefSeq and CCDS identifiers,
you know that the coding sequence is the same between the RefSeq and
ENST transcripts (although not necessarily the whole transcript
structure; the UTRs may well differ). Hence you can lift over any
coding-sequence-derived results to the RefSeq identifier. This
includes any non-UTR consequences, SIFT/PolyPhen predictions and
HGVSc/HGVSp notations.

It would be simple to have a plugin do this lifting over for you in
these situations.
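
As a rough illustration of that lifting over, here is a minimal
post-processing sketch in Python (not a VEP plugin). It assumes the
Extra field is a ";"-separated list of key=value pairs and that the
identifiers appear under the keys "RefSeq", "CCDS" and "HGVSc"; check
those key names against your own output. The function names and the
accessions in the usage note are made up for the example.

```python
def parse_extra(extra):
    """Split an Extra field ("key=value;key=value") into a dict."""
    pairs = (item.split("=", 1) for item in extra.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def lift_hgvsc(extra):
    """Return HGVSc notations re-keyed to RefSeq NM accessions.

    Returns an empty list whenever the lift-over is not known to be safe.
    Key names ("RefSeq", "CCDS", "HGVSc") are assumptions about the output.
    """
    fields = parse_extra(extra)
    # Only rows with both a CCDS and a RefSeq identifier share a coding sequence.
    if not all(key in fields for key in ("CCDS", "RefSeq", "HGVSc")):
        return []
    # HGVSc looks like "ENST...:c.123A>G"; keep the part after the colon.
    _, _, change = fields["HGVSc"].partition(":")
    # Skip UTR positions (c.-N is 5' UTR, c.*N is 3' UTR): UTRs may differ.
    if not change.startswith("c.") or change[2:3] in ("-", "*"):
        return []
    # Several accessions may be listed; keep only mRNA (NM_) accessions.
    accessions = [acc for acc in fields["RefSeq"].split(",") if acc.startswith("NM_")]
    return [f"{acc}:{change}" for acc in accessions]
```

For a row whose Extra field is
"HGVSc=ENST00000123456.1:c.123A>G;CCDS=CCDS1234.1;RefSeq=NM_000123.1"
this returns ["NM_000123.1:c.123A>G"]; rows without a CCDS identifier,
or with a UTR position, return an empty list.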

For those rows that don't have CCDS identifiers, you would have to
compare the protein sequences of the RefSeq and ENST transcripts
directly (as above, this would only let you transfer
coding-sequence-derived info).
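
The comparison itself could be as simple as the sketch below. It
assumes you have already fetched both translated sequences as plain
strings (how you fetch them is not shown), and the function name is
hypothetical. It ignores whitespace, case and a trailing stop
character so that formatting differences don't cause false mismatches.

```python
def same_protein(ensembl_peptide, refseq_peptide):
    """Return True if two peptide strings encode the same protein."""

    def normalise(seq):
        # Drop whitespace/newlines, upper-case, and strip a trailing stop "*".
        return "".join(seq.split()).upper().rstrip("*")

    a = normalise(ensembl_peptide)
    b = normalise(refseq_peptide)
    # Treat empty sequences as "no match" rather than trivially equal.
    return bool(a) and a == b
```

Only when this returns True would it be reasonable to transfer the
protein-level results to the RefSeq identifier.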

For anything outside the coding sequence, it would get a lot more hairy!

If you need any help doing this it probably wouldn't take long to
knock up some basic code to get you started.

Hope this helps!

Will McLaren
Ensembl Variation

On 14 February 2012 06:44, Reece Hart <reece at harts.net> wrote:
> Greetings-
>
> I'm using VEP 2.3 with Ensembl 65. I'd like to report HGVS tags with
> RefSeq NM accessions rather than ENST accessions where such is
> consistent with the ENST-based prediction. I'm also fetching SIFT and
> PolyPhen predictions, so I can't use --refseq.
>
> Has anyone already written a plugin to tackle this?
>
> If not, I'm going to give this a shot. I'd appreciate comments about
> the following approach.
>
> In plugin new():
> - get an adaptor for otherfeatures transcripts.
>
> In plugin run():
> - Identify the otherfeatures transcript that best corresponds to the
> ENST.  I'm concerned about differences in exon structure.  Is it
> sufficient to look for otherfeatures transcripts by xref, or do I need
> to compare exon structures (up to the variant, anyway)?
>
> - Construct a new TranscriptVariationAllele using the RefSeq
> transcript. I can use the same slice and compute relative transcript
> offsets, right?
>
> - Call hgvs_coding, _protein, etc. on the new TVA
>
> I have a table of the discordance between RefSeqs and GRCh37. I would
> probably post process with that, but I could put it in the plugin too.
>
>
> Is that about right? Thanks for tips.
>
>
> Thanks,
> Reece