[ensembl-dev] Dev Digest, Vol 23, Issue 5

Duarte Molha Duarte.Molha at ogt.co.uk
Wed May 2 11:50:51 BST 2012


Thank you Will. That is exactly what I was hoping for!

Thank you very much. I will eagerly await the new release :)
If it is not too much to ask, would it also be possible to expose the complete genotype entry for the variation to plug-ins, together with the FORMAT field? For example:

GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0

That would also be very useful, since I analyse the variant calls using the quality metrics associated with them.
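
To illustrate the kind of per-sample analysis meant here, the following is a minimal Python sketch (not VEP code) that pairs the FORMAT keys with the values of one sample column. The function name `parse_sample_field` is made up for this example, and padding missing trailing values with "." is an assumption about how one might want to handle shortened sample fields.

```python
def parse_sample_field(format_field, sample_field):
    """Pair each FORMAT key with the matching value from one sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Assumption: if trailing values are missing, fill them with "." (missing).
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# The FORMAT and sample values quoted in this message.
call = parse_sample_field("GT:AD:DP:GQ:PL", "1/1:0,40:40:99:1456,114,0")

depth = int(call["DP"])             # read depth at the site
genotype_quality = int(call["GQ"])  # genotype quality
print(call["GT"], depth, genotype_quality)  # prints: 1/1 40 99
```

With the fields in a dictionary, a filter such as "depth of at least 20 and genotype quality of at least 50" becomes a one-line condition on `depth` and `genotype_quality`.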

Best regards

Duarte

-----Original Message-----
From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of dev-request at ensembl.org
Sent: 02 May 2012 11:41
To: dev at ensembl.org
Subject: Dev Digest, Vol 23, Issue 5


Today's Topics:

   1. Re: Question regarding the varian_effect_predictor VCF
      support for multiple samples (Will McLaren)


----------------------------------------------------------------------

Message: 1
Date: Wed, 2 May 2012 11:40:43 +0100
From: Will McLaren <wm2 at ebi.ac.uk>
Subject: Re: [ensembl-dev] Question regarding the
	varian_effect_predictor VCF support for multiple samples
To: Ensembl developers list <dev at ensembl.org>
Message-ID:
	<CAMVEDX3YfPzEVLT5yOYt+jH41rX=zBABTJ8MWsQx-P2QkPUxzQ at mail.gmail.com>
Content-Type: text/plain; charset="windows-1252"

Hi Duarte,

If I'm understanding you correctly, I've implemented such a feature for v2.5 of the VEP, which is due for release next week.

You will be able to supply a single individual ID, several IDs as a comma-separated list, or "all" to consider every individual separately, e.g.:

perl variant_effect_predictor.pl -individual MYSAMPLEID1,MYSAMPLEID2

or

perl variant_effect_predictor.pl -individual all

and the consequences will be calculated only where the genotype for that individual is non-reference (either hetero- or homozygous). The individual ID will appear as a field in the "Extra" column. Note that unless you are using VCF output, a locus that is homozygous for the reference allele will not be considered as a variant and will not appear in the output (if you are using VCF output, it will appear but with no consequence data added).
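
The selection rule described above can be sketched outside the VEP in a few lines of Python. This only illustrates the rule and is not the VEP's implementation: the function name `non_reference_samples` is hypothetical, the INFO field is abbreviated, and treating a missing call (".") as not non-reference is an assumption.

```python
import re


def non_reference_samples(header_line, record_line, wanted="all"):
    """Return the selected samples whose genotype carries a non-reference allele."""
    names = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    selected = names if wanted == "all" else wanted.split(",")

    hits = []
    for name, sample in zip(names, fields[9:]):
        if name not in selected:
            continue
        genotype = sample.split(":")[gt_index]
        alleles = re.split(r"[/|]", genotype)  # unphased "/" or phased "|"
        # "0" is the reference allele; "." is a missing call (assumed to be skipped).
        if any(allele not in ("0", ".") for allele in alleles):
            hits.append(name)
    return hits


header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
                    "FORMAT", "sample_01", "sample_02", "sample_03"])
record = "\t".join(["1", "50311454", ".", "G", "A", "5322.41", "PASS",
                    "AC=3;AF=0.500;AN=6",  # INFO abbreviated for the example
                    "GT:AD:DP:GQ:PL",
                    "1/1:0,40:40:99:1456,114,0",
                    "0/0:37,0:37:99:0,102,1245",
                    "0/1:23,26:49:99:839,0,617"])

print(non_reference_samples(header, record))                          # ['sample_01', 'sample_03']
print(non_reference_samples(header, record, "sample_02,sample_03"))   # ['sample_03']
```

Here sample_02 is homozygous for the reference allele (0/0), so it is the one that drops out.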

As for plugins: internally, a copy of the variation feature object is created for each individual, with an appropriate allele string. You can access the individual's ID in a plugin as follows:

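sub run {
    # $tva is the transcript variation allele object passed to the plugin
    my ($self, $tva) = @_;

    # walk up from the allele to the variation feature it belongs to
    my $tv = $tva->transcript_variation;
    my $vf = $tv->variation_feature;

    # ID of the individual this copy of the variation feature was created for
    my $individual_id = $vf->{individual};

....

}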
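
For readers who want a picture of what "an appropriate allele string" could mean, here is a small Python sketch. It is a guess at the idea rather than a description of the VEP's internals: the function name `allele_string_for` is hypothetical, and building the string as the reference allele followed by the alternate alleles the individual carries is an assumption.

```python
import re


def allele_string_for(ref, alt, genotype):
    """Build a REF/ALT-style allele string from the alleles one genotype carries.

    Returns None when the genotype has no non-reference allele.
    """
    alts = alt.split(",")  # the ALT column may list several alleles
    carried = []
    for index in re.split(r"[/|]", genotype):
        if index in ("0", "."):  # reference allele or missing call
            continue
        allele = alts[int(index) - 1]  # genotype indices are 1-based for ALT
        if allele not in carried:
            carried.append(allele)
    if not carried:
        return None
    return "/".join([ref] + carried)


print(allele_string_for("G", "A", "1/1"))    # G/A
print(allele_string_for("G", "A", "0/0"))    # None
print(allele_string_for("G", "A,T", "0/2"))  # G/T
```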

Hope this helps!

Cheers

Will McLaren
Ensembl Variation

On 2 May 2012 11:18, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:

> Dear Developers,
>
> I have been playing around with the latest version of the VEP and I
> would like to congratulate you on the many nice features you have
> been able to include.
>
> I particularly like the new plug-in support. This will allow me to
> develop new features for my analysis pipeline without having to hack
> your code too much :).
>
> There is, however, one very important feature I would love to see
> included in the VEP: support for VCF files with multiple samples.
>
> I had to change a lot of your code in a previous version of the VEP
> to get some sort of support for this, and it has become very
> complicated to merge what I did with that earlier version into the
> new versions, because the code is evolving very fast.
>
> I noticed that you say you now support all fields in a VCF. Does this
> mean that your script reads in the sample fields but disregards them
> in the analysis?
>
> It would be great if the VEP could analyse each variant and, for each
> allelic substitution, include the information for the samples to
> which it is relevant.
>
> Here is an example of what your code outputs and of what I think
> would be very useful for it to do:
>
> Input VCF entry:
>
> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample_01	sample_02	sample_03
> 1	50311454	.	G	A	5322.41	PASS	AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	0/0:37,0:37:99:0,102,1245	0/1:23,26:49:99:839,0,617
>
> Current OUTPUT:
>
> #Uploaded_variatio	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
> 1_50311454_G/A	1:50311454	A	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>
> Same output but containing sample information for non-reference samples:
>
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>
> I would do this myself if there was a way for the plug-in feature to
> give me the sample information for each variant.
>
> Any ideas how this can be accomplished?
>
> Best regards,
>
> Duarte Molha
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>