[ensembl-dev] update API issue

nconte at ebi.ac.uk nconte at ebi.ac.uk
Mon Nov 18 14:54:22 GMT 2013


Hi Andy,
Thanks for your precious help, indeed the script will produce the
coordinates you are finding - The script I sent you is part of a more
complex script and I didn't want to send all of it as it is not finalized
yet, but that is not solving the problem.
In this big script,  I am using a subroutine for printing  gene
annotations , these genes are called depending on their overlaps with a
variation feature.  I can recreate the issue with new.pl which is closer
to what I am doing-
APOM is the gene overlapping rs3117582
output
perl new.pl rs3117582
APOM:ENSG00000204444:6:31620193:31625987
APOM:ENSG00000226215:HSCHR6_MHC_MCF_CTG1:2999873:3005667
APOM:ENSG00000231974:HSCHR6_MHC_MANN_CTG1:2963082:2968876
APOM:ENSG00000224290:HSCHR6_MHC_COX_CTG1:3129812:3135605
APOM:ENSG00000206409:HSCHR6_MHC_QBL_CTG1:2913831:2919624
APOM:ENSG00000227567:HSCHR6_MHC_DBB_CTG1:2905772:2911566
BX511262.2:ENSG00000257670:HSCHR6_MHC_SSTO_CTG1:2892034:2998999
APOM:ENSG00000235754:HSCHR6_MHC_SSTO_CTG1:2950992:2956785


2 issues there:
I am printing wrong coordinates for MHC according to your previous script
and I would like only the reference genome. ie only
APOM:ENSG00000204444:6:31620193:31625987

thanks for any tips
Nat


> Hi,
>
> I've run your code with APOM and got back the following
>
> APOM:ENSG00000204444:6:31620193:31625987,
> APOM:ENSG00000235754:HSCHR6_MHC_SSTO:31610134:31615927,
> APOM:ENSG00000206409:HSCHR6_MHC_QBL:31610434:31616227,
> APOM:ENSG00000231974:HSCHR6_MHC_MANN:31659685:31665479,
> APOM:ENSG00000224290:HSCHR6_MHC_COX:31607608:31613401,
> APOM:ENSG00000227567:HSCHR6_MHC_DBB:31602375:31608169,
> APOM:ENSG00000226215:HSCHR6_MHC_MCF:31696476:31702270,
>
> I'm really not seeing how you're managing to come to this location using
> the code you've sent in. In fact if you look at the website for APOM on
> MHC_MCF it's at position 31696476-31702270.
>
> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000226215;r=HSCHR6_MHC_MCF:31696476-31702270
>
> Are you trying to project the feature to a lower level? I've attached the
> script with some minor modifications to ensure it can run (there were some
> issues) so you can confirm that I'm running the right thing. Is there any
> other code you're not sharing with us?
>
> Andy
>
>
>
> On 15 Nov 2013, at 19:10, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>
>> hi, sure
>> example there
>> APOM:ENSG00000226215:HSCHR6_MHC_MCF_CTG1:2999873:3005667,
>> thanks
>> Nat
>> On 15 Nov 2013, at 16:53, Andy Yates <ayates at ebi.ac.uk> wrote:
>>
>>> Hi,
>>>
>>> Sorry could you send a HGNC symbol that causes the error please? I've
>>> tried with a number of symbols including BRAF, BRCA2, ABO & HCG21 none
>>> of which show the behaviour you're seeing. In fact running the code for
>>> HCG21 gives me:
>>>
>>> HCG21:ENSG00000233529:6:30913756:30922639,
>>>
>>> If I take out the test for $gene->slice->is_reference() I get:
>>>
>>> HCG21:ENSG00000233529:6:30913756:30922639,
>>> HCG21:ENSG00000235747:HSCHR6_MHC_DBB:30904474:30913356,
>>> HCG21:ENSG00000230541:HSCHR6_MHC_QBL:30903331:30912213,
>>> HCG21:ENSG00000223454:HSCHR6_MHC_COX:30903456:30912338,
>>> HCG21:ENSG00000228448:HSCHR6_MHC_MCF:30992301:30992681,
>>>
>>> I'm just not seeing the issue you're reporting. Also the sequence
>>> region name you gave is not a toplevel sequence. As you can see from
>>> this output HSCHR6_MHC_MCF is the toplevel region; HSCHR6_MHC_MCF_CTG1
>>> is an underlying supercontig and is *not* a toplevel region. We only
>>> use the non-reference attribute when working with toplevel sequence
>>> regions. That's why calling is_reference() on a slice pointing at
>>> HSCHR6_MHC_MCF_CTG1
>>>
>>> Andy
>>>
>>>
>>> On 15 Nov 2013, at 15:54, nconte at ebi.ac.uk wrote:
>>>
>>>> HI,
>>>> This is the bit of the code which will get the is_reference test-
>>>> if you replace $Ggene with any gene symbol from HGNC this should work.
>>>> #!/usr/local/bin/perl
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::EnsEMBL::Registry;
>>>>
>>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>> 	-host=>"ensembldb.ensembl.org", -user=>"anonymous",
>>>>       -port=>'5306', 'db_version' => 73,);
>>>> my $gene_adaptor=Bio::EnsEMBL::Registry->get_adaptor( 'human', 'Core',
>>>> 'Gene' );
>>>>
>>>>
>>>> my
>>>> $all_genes=$gene_adaptor->fetch_all_by_external_name($Ggene,'HGNC'); #
>>>> this is where I capture the ensembl genes from each key
>>>> #######
>>>> if (!scalar (@{$all_genes})) {print  $Ggene.'no ensembl gene object
>>>> ',"\t","no ensembl variation no object","\t","no ensembl orthologues
>>>> no
>>>> object","\t"; next; } elsif  (scalar (@{$all_genes})){
>>>> 	 foreach my $gene (@{$all_genes}){
>>>> if (!$gene){print $Ggene.' '. 'no_ensembl_gene'; next} elsif
>>>> 	 ( $gene && $gene->slice->is_reference ) {
>>>> # no  genes which are on patches are removed only the ref assembly are
>>>> kept , -!!!!
>>>> #print   'GWAS_gene2: '.$Ggene,  "\t",defined($gene->external_name) ?
>>>> print   defined($gene->external_name) ?
>>>> $gene->external_name :'no_ensembl_gene',":", defined($gene->stable_id)
>>>> ?
>>>> $gene->stable_id : 'no_stableID',":", defined($gene->seq_region_name)
>>>> ?
>>>> $gene->seq_region_name:
>>>> 'no_seq_region_name:',":",defined($gene->seq_region_start) ?
>>>> $gene->seq_region_start:
>>>> 'no_seq_region_start',":",defined($gene->seq_region_end) ?
>>>> $gene->seq_region_end: 'no_seq_region_end',"," ;}
>>>>
>>>> Thanks
>>>>
>>>> Nat
>>>>
>>>>> Hi there,
>>>>>
>>>>> A reference sequence is detected by the non-existence of a non_ref
>>>>> flag in
>>>>> seq_region_attrib. I've done this query in 73:
>>>>>
>>>>> select *
>>>>> from seq_region sr
>>>>> join seq_region_attrib sra using (seq_region_id)
>>>>> join attrib_type a using (attrib_type_id)
>>>>> where sr.name = 'HSCHR6_MHC_MCF_CTG1'
>>>>> and a.code = 'non_ref'
>>>>>
>>>>> And have got no rows back suggesting that
>>>>> $gene->slice->is_reference()
>>>>> when slice is HSCHR6_MHC_MCF_CTG1 will return false. Could you send
>>>>> us
>>>>> your code please?
>>>>>
>>>>> Andy
>>>>>
>>>>> On 15 Nov 2013, at 14:26, nconte at ebi.ac.uk wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have been updating my ensembl API versions from 71 to 73 today,
>>>>>> and
>>>>>> reran a script to test it was working.
>>>>>> In my script I am excluding genes  Ids for haplotypes /  LRG /
>>>>>> pseudoautomosal
>>>>>> or  assembly patches/fixes regions. I am using this test/method to
>>>>>> exclude
>>>>>> them
>>>>>> (if $gene->slice->is_reference)
>>>>>>
>>>>>> I am now getting genes mapping on HSCHR6_MHC_MCF_CTG1 which were
>>>>>> excluded
>>>>>> on my last version of the API. Is the slice->is_reference
>>>>>> deprecated?
>>>>>> Thanks
>>>>>> Nathalie
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new.pl
Type: application/x-perl
Size: 1840 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20131118/d5d9ac26/attachment.pl>


More information about the Dev mailing list