[ensembl-dev] Missing gene symbol for VEP with refseq

Wallace Ko myko at l3-bioinfo.com
Fri Jun 10 03:40:17 BST 2016


Thanks Will. It does not matter for now as we have workaround on it.

Regards,
Wallace Ko

On Thu, Jun 9, 2016 at 10:35 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Wallace,
>
> Thank you so much for the detailed report and analysis, we really
> appreciate it when users take the time to step into the code.
>
> This will be fixed in the next version of VEP; if it is crucial to your
> analysis I can have the fix applied to the current release also?
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
>
> On 7 June 2016 at 09:44, Wallace Ko <myko at l3-bioinfo.com> wrote:
>
>> Hi there,
>>
>> There is probably a bug on cache in Variation API with annotating with
>> refseq transcript.
>>
>> For a single variant A "1:g.121116121T>C", the online VEP (84) result is:
>>
>> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=jIjMm1R03KeDvYW4-1799042
>> Observe that the 2 rows with Feature type Transcript contains Symbol
>> SRGAP2C.
>>
>> When variant A is analysed together with another variant B
>> "1:g.120935661T>C", the result is:
>>
>> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=dNaMvffRoZ6scMMp-1799049;field1=Location;operator1=is;value1=1:121116121-121116121
>> Observe that Symbol is missing from second row with Feature type
>> Transcript.
>>
>> In the subroutine fetch_transcripts of
>> Bio::EnsEMBL::Variation::Utils::VEP (
>> https://github.com/Ensembl/ensembl-variation/blob/release/84/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L3738
>> ):
>>
>> my %seen_trs;
>>> ...
>>> foreach my $chr(...) {
>>>     foreach my $region(...) {
>>>         ...
>>>         my %refseq_stuff = ();
>>>         if(defined($tmp_cache->{$chr})) {
>>>             TRANSCRIPT: while(my $tr = shift @{$tmp_cache->{$chr}}) {
>>>                 ...
>>>                 if($seen_trs{$dbID}) {
>>>                     $count_duplicates++;
>>>                     next;
>>>                 }
>>>                 ...
>>>                 if(defined($config->{refseq}) ||
>>> defined($config->{merged})) {
>>>                     # put data to $refseq_stuff
>>>                 }
>>>                 $seen_trs{$dbID} = 1;
>>>                 ...
>>>             }
>>>         }
>>>         ...
>>>     }
>>> }
>>
>>
>> The scope of variable %seen_trs is for all regions is all chromosomes
>> while the scope of variable %refseq_stuff is for a single region only.
>> In the second analysis above, transcript for region of variant B is
>> loaded to cache and marked seen using %seen_trs. When it came to region
>> of variant A, cache loading is skipped according to %seen_trs but the
>> %refseq_stuff variable is actually empty for this new region.
>>
>> Regards,
>> Wallace Ko
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160610/32e19faa/attachment.html>


More information about the Dev mailing list