[ensembl-dev] VEP: Is it possible to add LRG annotations?

Will McLaren wm2 at ebi.ac.uk
Tue Apr 8 10:02:50 BST 2014


Hi Andrew,

LRGs have their own coordinate system in Ensembl, with each LRG being
mapped to its own "seq_region" (think of this like a mini-chromosome
containing the LRG gene and 5kb of flanking sequence either side). This is
because the reference sequence for an LRG can differ from the core human
reference sequence from GRC.
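
To make the "mini-chromosome" idea concrete, here is a minimal sketch of how a genomic position could be translated into LRG coordinates. It assumes the LRG region aligns to the reference without gaps, which is exactly what cannot be guaranteed when the LRG sequence differs from the reference, so the real API mapping is more involved. The function name and parameters are hypothetical and not part of the Ensembl API.

```python
def genome_to_lrg(pos, region_start, region_end, region_strand):
    """Map a 1-based genomic position to a 1-based LRG position.

    Assumption: the LRG region spans region_start..region_end on the
    reference with no insertions or deletions. region_strand is +1 or -1,
    the strand of the reference that the LRG region maps to.
    Returns None when the position lies outside the region.
    """
    if not region_start <= pos <= region_end:
        return None
    if region_strand == 1:
        # Forward-strand region: count up from the region start.
        return pos - region_start + 1
    # Reverse-strand region: LRG position 1 is the highest genomic position.
    return region_end - pos + 1
```

For a reverse-strand gene, increasing genomic positions therefore correspond to decreasing LRG positions.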

When you run the VEP with --lrg, the Ensembl API attempts to map your
genomic input coordinates to the LRG coordinate system, creates a duplicate
internal variation object representing this mapping, then does its usual
consequence calling thing on the custom LRG transcript(s) that exist on
that coordinate system. As a result of this, since the VEP sorts output by
chromosome, the results that map to LRGs will jump out of order here. This
will look particularly odd if you use VCF output as the coordinates in the
VCF will be retained from your input. I'm afraid currently there's no way
around this as it's just how the API operates.
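
If the ordering matters downstream, one workaround is to re-sort the VCF after the VEP has finished. The following is a minimal sketch of that idea, not something the VEP provides; it assumes tab-delimited records, and the function names are my own. Because Python's sort is stable, the Ensembl record and the LRG record for the same position keep their original relative order.

```python
def chrom_key(chrom):
    """Order numeric chromosomes numerically, then all others by name."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by CHROM, POS."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def record_key(ln):
        # CHROM and POS are the first two tab-separated columns.
        chrom, pos = ln.split("\t", 2)[:2]
        return (chrom_key(chrom), int(pos))

    # sorted() is stable, so records at the same site keep their input order.
    return header + sorted(records, key=record_key)
```

The result can be written straight back out, e.g. by joining the returned lines with newlines.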

When you use --pick, the code for this considers only the consequence calls
within one of these internal variation objects, so for each mapping it will
pick one consequence amongst the Ensembl transcripts and one amongst the
LRG transcripts.
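
Since --pick is applied to each mapping separately, a picked run can still give two records for one input variant. Below is a minimal sketch of how the two could be brought back together downstream and the LRG consequence preferred where one exists. It assumes the gene identifier is the second pipe-delimited CSQ field, as in the example output quoted below; in real output the field order should be read from the CSQ header line. All function names here are hypothetical.

```python
def csq_entries(info):
    """Extract the comma-separated CSQ entries from a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("CSQ="):
            return field[len("CSQ="):].split(",")
    return []


def is_lrg_entry(entry, gene_index=1):
    """True if the gene field of a CSQ entry looks like an LRG identifier.

    Assumption: the gene ID sits at gene_index (the second field here).
    """
    parts = entry.split("|")
    return len(parts) > gene_index and parts[gene_index].startswith("LRG_")


def group_by_site(records):
    """Group CSQ entries by (CHROM, POS, REF, ALT), split into Ensembl and LRG."""
    sites = {}
    for ln in records:
        cols = ln.rstrip("\n").split("\t")
        key = (cols[0], int(cols[1]), cols[3], cols[4])
        bucket = sites.setdefault(key, {"ensembl": [], "lrg": []})
        for entry in csq_entries(cols[7]):
            bucket["lrg" if is_lrg_entry(entry) else "ensembl"].append(entry)
    return sites


def prefer_lrg(bucket):
    """Return the LRG entries for a site if there are any, else the Ensembl ones."""
    return bucket["lrg"] or bucket["ensembl"]
```

This only reorganises what the VEP has already reported; it does not change which transcript --pick selects within each mapping.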

Regarding the strand issue, this may be a bug - within the LRG coordinate
system the LRG gene is considered to map to the forward strand, even if
this whole coordinate system maps to the reverse strand of the reference
genome (as it would for a reverse strand gene). I'll take a look at this as
it may be that the VEP is doing the wrong thing here.
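
In the meantime, an allele reported on the LRG's forward strand can be converted back to reference-genome orientation by reverse-complementing it whenever the LRG region maps to the reverse strand. A minimal sketch follows; the function names are hypothetical, and the strand of the LRG region has to be supplied by the caller.

```python
# Translation table for complementing nucleotide codes; other symbols
# (such as "-" for a deletion) are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(allele):
    """Reverse-complement a simple nucleotide allele string."""
    return allele.translate(_COMPLEMENT)[::-1]


def allele_on_genome(allele, region_strand):
    """Express an LRG forward-strand allele in reference-genome orientation.

    region_strand is +1 or -1: the strand of the reference genome that the
    whole LRG coordinate system maps to.
    """
    if region_strand == -1:
        return reverse_complement(allele)
    return allele
```

For the GFI1 example quoted below, where the gene is on the reverse strand, the LRG allele A corresponds to T on the reference forward strand, which matches the allele in the Ensembl record.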

Cheers

Will


On 7 April 2014 22:28, Andrew Carson <acarson at invivoscribe.com> wrote:

> Hi Dr. McLaren,
>
> Sorry for the additional query, but I would like to add to my last posted question. As I investigate further, it seems that I don't understand how the --lrg flag works. Is it adding a second separate annotation if there is an LRG overlap?
> I notice that when I run the commands:
>
>
>
> perl variant_effect_predictor.pl --fork 4 --no_stats --everything --lrg --cache --format vcf --force_overwrite --check_existing --check_alleles --vcf --no_progress --pubmed --gmaf --maf_1kg --pick -i input.vcf -o output.VEP.vcf
>
>
>
> Whenever I get an LRG annotation, there is also a separate regular annotation for the same variant. For example:
>
>
>
> 1       92941604        .     C       T       .       .       CSQ=T|ENSG00000162676|ENST00000370332|Transcript|synonymous_variant|1570|1251|417|T|acG/acA|COSM1344916|||7/7|||||||-1||YES|GFI1|HGNC||||protein_coding|ENSP00000359357|PROSITE_profiles:PS50157&SMART_domains:SM00355&Superfamily_domains:SSF57667|CCDS30773.1|ENST00000370332.1:c.1251G>A|ENST00000370332.1:c.1251G>A(p.%3D)|||||
>
>
>
> Followed 105 lines later by:
>
>
>
> 1       92941604        .     C       T       .       .       CSQ=A|LRG_63|LRG_63t1|Transcript|synonymous_variant|1501|1251|417|T|acG/acA||||7/7|||||||1||YES|LRG_63|LRG||||LRG_gene|LRG_63p1|||LRG_63t1.1:c.1251G>A|LRG_63t1.1:c.1251G>A(p.%3D)|||||
>
>
>
> This causes the output .vcf to have out-of-order variants (it is no longer properly sorted). Note that this is the same consequence, but one is shown on the + strand (CSQ=T) and the other on the - strand (CSQ=A).
>
>
>
> Am I doing something wrong here?
>
> Any help would be appreciated.
>
> Thanks!
> Andrew
>
>
>
>
>
> > Thank you very much Dr. McLaren.
> >
> > Just one clarification to the LRG choice. Is the LRG always presented as the first consequence (if it exists)? If this is true, then if the --pick chooses the worst consequence, and there are multiple transcripts with the same "worst consequence", does VEP --pick the first transcript with that consequence? If that is true, if the LRG contains the "worst consequence" along with other similar transcripts, will --pick successfully choose this consequence over other equal transcripts?
> >
> > Any help on this would be appreciated.
> >
> > Thank you again!
> >
> > Andrew
> >
> > I should also say that there's currently no way to prioritise LRG
> > consequences other than filtering using filter_vep.pl, though this wouldn't
> > be a complete solution.
> >
> > Will
> >
> > On 1 April 2014 09:51, Will McLaren <wm2 at ebi.ac.uk> wrote:
> >
> > Hi Andrew,
> >
> > In fact this is already possible; just add the flag --lrg at runtime. Note
> > however that using LRGs depends on connecting to our database, so this will
> > not work using --offline and will connect to ensembldb.ensembl.org when
> > using --cache. Because of this database connection you may find that the
> > script runs more slowly as it attempts to remap your input variants to LRG
> > coordinates.
> >
> > I'm afraid this is missing from the documentation currently, I will get
> > that updated.
> >
> > Regards
> >
> > Will McLaren
> > Ensembl Variation
> >
> > On 31 March 2014 22:24, Andrew Carson <acarson at invivoscribe.com> wrote:
> >
> > > Hi ensembl-dev team,
> > >
> > > I was just wondering if there are plans to incorporate the LRG (locus
> > > reference genomic) records into the VEP annotation pipeline (from here:
> > > http://www.lrg-sequence.org/home). I only ask because in the HGVS new
> > > clinical reporting guidelines they recommend using the LRG sequence (if one
> > > is present) to standardize variant reporting.
> > >
> > > It would also be very useful to have an option where, if a variant
> > > overlaps an LRG, you can choose to "pick" that consequence over other
> > > consequences.
> > >
> > > Any thoughts on if this could be added to the development for the next
> > > release cycle?
> > >
> > > Thanks for all of your help!
> > >
> > > Andrew R. Carson, Ph.D.
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>