[ensembl-dev] VEP: Is it possible to add LRG annotations?

Will McLaren wm2 at ebi.ac.uk
Tue Apr 8 12:19:03 BST 2014


Just to follow up, I've pushed a fix to GitHub so that the allele is
reported correctly for the LRG consequence lines.

Will


On 8 April 2014 10:02, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Andrew,
>
> LRGs have their own coordinate system in Ensembl, with each LRG being
> mapped to its own "seq_region" (think of this like a mini-chromosome
> containing the LRG gene and 5kb of flanking sequence either side). This is
> because the reference sequence for an LRG can differ from the core human
> reference sequence from GRC.
>
> When you run the VEP with --lrg, the Ensembl API attempts to map your
> genomic input coordinates to the LRG coordinate system, creates a duplicate
> internal variation object representing this mapping, then does its usual
> consequence calling thing on the custom LRG transcript(s) that exist on
> that coordinate system. As a result of this, since the VEP sorts output by
> chromosome, the results that map to LRGs will jump out of order here. This
> will look particularly odd if you use VCF output as the coordinates in the
> VCF will be retained from your input. I'm afraid currently there's no way
> around this as it's just how the API operates.
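
One possible downstream workaround, sketched here in Python rather than taken from the VEP itself, is to re-sort the output VCF after the run. The function names below are hypothetical. Because the LRG records keep the CHROM and POS of the input, a stable sort on those two columns puts each LRG record directly after the matching Ensembl record.

```python
def chrom_key(chrom):
    """Order numeric chromosomes numerically, then all other names alphabetically."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by CHROM and POS.

    sorted() is stable, so records sharing a position keep their relative
    order (the Ensembl record stays ahead of the LRG record).
    """
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        # CHROM and POS are the first two whitespace-separated columns.
        chrom, pos = line.split(None, 2)[:2]
        return (chrom_key(chrom), int(pos))

    return header + sorted(records, key=key)
```

This reads the whole file into memory, which is fine for small outputs; for large files an external sort would be a better fit.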
>
> When you use --pick, the code for this considers only the consequence
> calls within one of these internal variation objects, so for each mapping
> it will pick one consequence amongst the Ensembl transcripts and one
> amongst the LRG transcripts.
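
Since --pick leaves one Ensembl and one LRG consequence per variant, a downstream script could choose between them. The following is a minimal Python sketch, not a VEP feature: the function names are hypothetical, and it assumes the CSQ field order seen in the output quoted in this thread (Allele|Gene|Feature|...), which should really be read from the CSQ definition in the VCF header.

```python
def split_csq(info):
    """Return the CSQ entries of a VCF INFO string, each as a list of fields."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            return [entry.split("|") for entry in item[4:].split(",")]
    return []


def is_lrg(entry):
    # Assumption: the third field is the feature ID, e.g. "LRG_63t1".
    return len(entry) > 2 and entry[2].startswith("LRG_")


def prefer_lrg(infos):
    """Given the INFO strings of all records for one input variant,
    return the LRG entries if there are any, otherwise every entry."""
    entries = [entry for info in infos for entry in split_csq(info)]
    lrg_entries = [entry for entry in entries if is_lrg(entry)]
    return lrg_entries or entries
```

For example, passing the INFO strings of the two records for 1:92941604 would return only the LRG_63t1 entry.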
>
> Regarding the strand issue, this may be a bug - within the LRG coordinate
> system the LRG gene is considered to map to the forward strand, even if
> this whole coordinate system maps to the reverse strand of the reference
> genome (as it would for a reverse strand gene). I'll take a look at this as
> it may be that the VEP is doing the wrong thing here.
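
To illustrate the relationship between the two alleles in the example (CSQ=T and CSQ=A), here is a small Python sketch. It is not how the VEP computes alleles; the function names and the strand parameter are hypothetical. It only shows that an allele given on the forward strand of the reference reads as its reverse complement in a coordinate system that maps to the reverse strand.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(allele):
    """Reverse-complement a nucleotide string; other characters such as '-' pass through."""
    return allele.translate(COMPLEMENT)[::-1]


def allele_in_region(allele, region_strand):
    """Express a forward-strand reference allele in a region that maps to
    the given strand of the reference (1 for forward, -1 for reverse)."""
    return allele if region_strand == 1 else reverse_complement(allele)
```

With region_strand set to -1, the input allele T becomes A, matching the pair of alleles in the two output lines.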
>
> Cheers
>
> Will
>
>
> On 7 April 2014 22:28, Andrew Carson <acarson at invivoscribe.com> wrote:
>
>> Hi Dr. McLaren,
>>
>> Sorry for the additional query, but I would like to add to my last posted question. As I investigate further, it seems that I don't understand how the --lrg flag works. Is it adding a second separate annotation if there is an LRG overlap?
>>
>> I notice that when I run the commands:
>>
>>
>>
>> perl variant_effect_predictor.pl --fork 4 --no_stats --everything --lrg --cache --format vcf --force_overwrite --check_existing --check_alleles --vcf --no_progress --pubmed --gmaf --maf_1kg --pick -i input.vcf -o output.VEP.vcf
>>
>>
>>
>> Whenever I get a LRG annotation, there is a separate regular annotation. For example:
>>
>>
>>
>> 1       92941604        .     C       T       .       .       CSQ=T|ENSG00000162676|ENST00000370332|Transcript|synonymous_variant|1570|1251|417|T|acG/acA|COSM1344916|||7/7|||||||-1||YES|GFI1|HGNC||||protein_coding|ENSP00000359357|PROSITE_profiles:PS50157&SMART_domains:SM00355&Superfamily_domains:SSF57667|CCDS30773.1|ENST00000370332.1:c.1251G>A|ENST00000370332.1:c.1251G>A(p.%3D)|||||
>>
>>
>>
>> Followed 105 lines later by:
>>
>>
>>
>> 1       92941604        .     C       T       .       .       CSQ=A|LRG_63|LRG_63t1|Transcript|synonymous_variant|1501|1251|417|T|acG/acA||||7/7|||||||1||YES|LRG_63|LRG||||LRG_gene|LRG_63p1|||LRG_63t1.1:c.1251G>A|LRG_63t1.1:c.1251G>A(p.%3D)|||||
>>
>>
>>
>> This causes the output .vcf to have out of order variants (no longer properly sorted). Note that this is the same consequence, but one is shown on the + strand (CSQ=T) and the other on the - strand (CSQ=A).
>>
>>
>>
>> Am I doing something wrong here?
>>
>> Any help would be appreciated.
>>
>> Thanks!
>> Andrew
>>
>>
>>
>>
>>
>> >Thank you very much Dr. McLaren.
>> >
>> >Just one clarification to the LRG choice. Is the LRG always presented as the first consequence (if it exists)? If this is true, then if the --pick chooses the worst consequence, and there are multiple transcripts with the same "worst consequence", does VEP --pick the first transcript with that consequence? If that is true, if the LRG contains the "worst consequence" along with other similar transcripts, will --pick successfully choose this consequence over other equal transcripts?
>> >
>> >Any help on this would be appreciated.
>> >
>> >Thank you again!
>> >
>> >Andrew
>>
>> >
>> >I should also say that there's currently no way to prioritise LRG
>> >consequences other than filtering using filter_vep.pl, though this wouldn't
>> >be a complete solution.
>> >
>> >Will
>> >
>>
>> >On 1 April 2014 09:51, Will McLaren <wm2 at ebi.ac.uk> wrote:
>> >
>> > Hi Andrew,
>> >
>> > In fact this is already possible; just add the flag --lrg at runtime. Note
>> > however that using LRGs depends on connecting to our database, so this will
>> > not work using --offline and will connect to ensembldb.ensembl.org when
>> > using --cache. Because of this database connection you may find that the
>> > script runs more slowly as it attempts to remap your input variants to LRG
>> > coordinates.
>> >
>> > I'm afraid this is missing from the documentation currently, I will get
>> > that updated.
>> >
>> > Regards
>> >
>> > Will McLaren
>> > Ensembl Variation
>> >
>>
>> > On 31 March 2014 22:24, Andrew Carson <acarson at invivoscribe.com> wrote:
>> >
>> >> Hi ensembl-dev team,
>> >>
>> >> I was just wondering if there are plans to incorporate the LRG (locus
>> >> reference genomic) records into the VEP annotation pipeline (from here:
>> >> http://www.lrg-sequence.org/home). I only ask because in the HGVS new
>> >> clinical reporting guidelines they recommend using the LRG sequence (if one
>> >> is present) to standardize variant reporting.
>> >>
>> >> It would also be very useful to have an option where, if a variant
>> >> overlaps an LRG, you can choose to "pick" that consequence over other
>> >> consequences.
>> >>
>> >> Any thoughts on if this could be added to the development for the next
>> >> release cycle?
>> >>
>> >> Thanks for all of your help!
>> >>
>> >> Andrew R. Carson, Ph.D.
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>