[ensembl-dev] Missing gene symbol for VEP with refseq

Will McLaren wm2 at ebi.ac.uk
Thu Jun 9 15:35:03 BST 2016


Hi Wallace,

Thank you so much for the detailed report and analysis; we really
appreciate it when users take the time to step into the code.

This will be fixed in the next version of VEP. If it is crucial to your
analysis, I can also have the fix applied to the current release.

Regards

Will McLaren
Ensembl Variation



On 7 June 2016 at 09:44, Wallace Ko <myko at l3-bioinfo.com> wrote:

> Hi there,
>
> There is probably a bug in the transcript caching of the Variation API
> when annotating with RefSeq transcripts.
>
> For a single variant A "1:g.121116121T>C", the online VEP (84) result is:
>
> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=jIjMm1R03KeDvYW4-1799042
> Observe that the 2 rows with Feature type Transcript contain Symbol
> SRGAP2C.
>
> When variant A is analysed together with another variant B
> "1:g.120935661T>C", the result is:
>
> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=dNaMvffRoZ6scMMp-1799049;field1=Location;operator1=is;value1=1:121116121-121116121
> Observe that Symbol is missing from the second row with Feature type
> Transcript.
>
> In the subroutine fetch_transcripts of Bio::EnsEMBL::Variation::Utils::VEP
> (https://github.com/Ensembl/ensembl-variation/blob/release/84/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L3738):
>
> my %seen_trs;
> ...
> foreach my $chr(...) {
>     foreach my $region(...) {
>         ...
>         my %refseq_stuff = ();
>         if(defined($tmp_cache->{$chr})) {
>             TRANSCRIPT: while(my $tr = shift @{$tmp_cache->{$chr}}) {
>                 ...
>                 if($seen_trs{$dbID}) {
>                     $count_duplicates++;
>                     next;
>                 }
>                 ...
>                 if(defined($config->{refseq}) || defined($config->{merged})) {
>                     # put data to $refseq_stuff
>                 }
>                 $seen_trs{$dbID} = 1;
>                 ...
>             }
>         }
>         ...
>     }
> }
>
>
> The variable %seen_trs is scoped across all regions in all chromosomes,
> while the variable %refseq_stuff is scoped to a single region only.
> In the second analysis above, the transcript for the region of variant B
> is loaded into the cache and marked as seen in %seen_trs. When the region
> of variant A is processed, loading of that transcript is skipped because
> of %seen_trs, but %refseq_stuff is actually empty for this new region.
>
> Regards,
> Wallace Ko
>
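
The scoping mismatch described above can be reproduced in isolation. The
following is only a minimal sketch, not the VEP code and not necessarily
the fix that will be applied: the region names, the dbIDs 101 and 102, the
subroutine collect_symbols and its $keep_across_regions flag are all
hypothetical and exist purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy data: transcript 101 overlaps both regions.
my @toy_regions = (
    { name => 'region_B', transcripts => [ { dbID => 101, symbol => 'SRGAP2C' } ] },
    { name => 'region_A', transcripts => [ { dbID => 101, symbol => 'SRGAP2C' },
                                           { dbID => 102, symbol => 'SRGAP2C' } ] },
);

# Returns, per region, the symbol found for each overlapping transcript.
# With $keep_across_regions false, %refseq_stuff is emptied for every region
# while %seen_trs persists, which mirrors the reported pattern.
sub collect_symbols {
    my ($keep_across_regions, @regions) = @_;
    my (%seen_trs, %refseq_stuff, %result);

    foreach my $region (@regions) {
        %refseq_stuff = () unless $keep_across_regions;

        foreach my $tr (@{ $region->{transcripts} }) {
            my $dbID = $tr->{dbID};
            next if $seen_trs{$dbID};    # already seen: nothing is stored again
            $refseq_stuff{$dbID} = $tr->{symbol};
            $seen_trs{$dbID}     = 1;
        }

        # Look up the symbol of every transcript overlapping this region.
        $result{ $region->{name} } =
            [ map { $refseq_stuff{ $_->{dbID} } } @{ $region->{transcripts} } ];
    }
    return \%result;
}

for my $keep (0, 1) {
    my $res = collect_symbols($keep, @toy_regions);
    printf "keep=%d region_A: %s\n", $keep,
        join(', ', map { defined $_ ? $_ : '(missing)' } @{ $res->{region_A} });
}
# keep=0 region_A: (missing), SRGAP2C
# keep=1 region_A: SRGAP2C, SRGAP2C
```

Giving both hashes the same lifetime is one possible way to keep them
consistent; emptying both per region would be another, at the cost of
re-processing transcripts that span several regions.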