[ensembl-dev] Missing gene symbol for VEP with refseq
Wallace Ko
myko at l3-bioinfo.com
Fri Jun 10 03:40:17 BST 2016
Thanks Will. It does not matter for now as we have workaround on it.
Regards,
Wallace Ko
On Thu, Jun 9, 2016 at 10:35 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
> Hi Wallace,
>
> Thank you so much for the detailed report and analysis, we really
> appreciate it when users take the time to step into the code.
>
> This will be fixed in the next version of VEP; if it is crucial to your
> analysis I can have the fix applied to the current release also?
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
>
> On 7 June 2016 at 09:44, Wallace Ko <myko at l3-bioinfo.com> wrote:
>
>> Hi there,
>>
>> There is probably a bug on cache in Variation API with annotating with
>> refseq transcript.
>>
>> For a single variant A "1:g.121116121T>C", the online VEP (84) result is:
>>
>> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=jIjMm1R03KeDvYW4-1799042
>> Observe that the 2 rows with Feature type Transcript contains Symbol
>> SRGAP2C.
>>
>> When variant A is analysed together with another variant B
>> "1:g.120935661T>C", the result is:
>>
>> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=dNaMvffRoZ6scMMp-1799049;field1=Location;operator1=is;value1=1:121116121-121116121
>> Observe that Symbol is missing from second row with Feature type
>> Transcript.
>>
>> In the subroutine fetch_transcripts of
>> Bio::EnsEMBL::Variation::Utils::VEP (
>> https://github.com/Ensembl/ensembl-variation/blob/release/84/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L3738
>> ):
>>
>> my %seen_trs;
>>> ...
>>> foreach my $chr(...) {
>>> foreach my $region(...) {
>>> ...
>>> my %refseq_stuff = ();
>>> if(defined($tmp_cache->{$chr})) {
>>> TRANSCRIPT: while(my $tr = shift @{$tmp_cache->{$chr}}) {
>>> ...
>>> if($seen_trs{$dbID}) {
>>> $count_duplicates++;
>>> next;
>>> }
>>> ...
>>> if(defined($config->{refseq}) ||
>>> defined($config->{merged})) {
>>> # put data to $refseq_stuff
>>> }
>>> $seen_trs{$dbID} = 1;
>>> ...
>>> }
>>> }
>>> ...
>>> }
>>> }
>>
>>
>> The scope of variable %seen_trs is for all regions is all chromosomes
>> while the scope of variable %refseq_stuff is for a single region only.
>> In the second analysis above, transcript for region of variant B is
>> loaded to cache and marked seen using %seen_trs. When it came to region
>> of variant A, cache loading is skipped according to %seen_trs but the
>> %refseq_stuff variable is actually empty for this new region.
>>
>> Regards,
>> Wallace Ko
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160610/32e19faa/attachment.html>
More information about the Dev
mailing list