[ensembl-dev] update API issue

Kieron Taylor ktaylor at ebi.ac.uk
Mon Nov 18 17:41:47 GMT 2013


Dear Nathalie,

The Variation team had a small problem with release 73 concerning the 
coordinate system locations of these overlapping genes. Some of the 
problematic data will remain in e74 as well, but it will be consistently 
correct in e75.

To avoid this problem until e75, you can project the MHC results onto 
the Chromosomal coordinate system as follows:

$toplevel_vf = $vf->transform('toplevel');

This means the coordinates will be all consistent with each other.

You mention that you only really want the hits on the reference (as 
opposed to patches). You can identify the ones you want by asking the 
feature for its $feature->coord_system_name(). Filter out any that don't 
have 'chromosome' and you're there without needing to do any feature 
projection.

Note also that your print_genes() subroutine can be greatly shortened by 
asking $feature->slice->name

Regards,


-- 
Kieron Taylor PhD.
Ensembl Core team
EBI

On 18/11/2013 14:54, nconte at ebi.ac.uk wrote:
> Hi Andy,
> Thanks for your precious help, indeed the script will produce the
> coordinates you are finding - The script I sent you is part of a more
> complex script and I didn't want to send all of it as it is not finalized
> yet, but that is not solving the problem.
> In this big script,  I am using a subroutine for printing  gene
> annotations , these genes are called depending on their overlaps with a
> variation feature.  I can recreate the issue with new.pl which is closer
> to what I am doing-
> APOM is the gene overlapping rs3117582
> output
> perl new.pl rs3117582
> APOM:ENSG00000204444:6:31620193:31625987
> APOM:ENSG00000226215:HSCHR6_MHC_MCF_CTG1:2999873:3005667
> APOM:ENSG00000231974:HSCHR6_MHC_MANN_CTG1:2963082:2968876
> APOM:ENSG00000224290:HSCHR6_MHC_COX_CTG1:3129812:3135605
> APOM:ENSG00000206409:HSCHR6_MHC_QBL_CTG1:2913831:2919624
> APOM:ENSG00000227567:HSCHR6_MHC_DBB_CTG1:2905772:2911566
> BX511262.2:ENSG00000257670:HSCHR6_MHC_SSTO_CTG1:2892034:2998999
> APOM:ENSG00000235754:HSCHR6_MHC_SSTO_CTG1:2950992:2956785
>
>
> 2 issues there:
> I am printing wrong coordinates for MHC according to your previous script
> and I would like only the reference genome. ie only
> APOM:ENSG00000204444:6:31620193:31625987
>
> thanks for any tips
> Nat
>
>
>> Hi,
>>
>> I've run your code with APOM and got back the following
>>
>> APOM:ENSG00000204444:6:31620193:31625987,
>> APOM:ENSG00000235754:HSCHR6_MHC_SSTO:31610134:31615927,
>> APOM:ENSG00000206409:HSCHR6_MHC_QBL:31610434:31616227,
>> APOM:ENSG00000231974:HSCHR6_MHC_MANN:31659685:31665479,
>> APOM:ENSG00000224290:HSCHR6_MHC_COX:31607608:31613401,
>> APOM:ENSG00000227567:HSCHR6_MHC_DBB:31602375:31608169,
>> APOM:ENSG00000226215:HSCHR6_MHC_MCF:31696476:31702270,
>>
>> I'm really not seeing how you're managing to come to this location using
>> the code you've sent in. In fact if you look at the website for APOM on
>> MHC_MCF it's at position 31696476-31702270.
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000226215;r=HSCHR6_MHC_MCF:31696476-31702270
>>
>> Are you trying to project the feature to a lower level? I've attached the
>> script with some minor modifications to ensure it can run (there were some
>> issues) so you can confirm that I'm running the right thing. Is there any
>> other code you're not sharing with us?
>>
>> Andy
>>
>>
>>
>> On 15 Nov 2013, at 19:10, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>>
>>> hi, sure
>>> example there
>>> APOM:ENSG00000226215:HSCHR6_MHC_MCF_CTG1:2999873:3005667,
>>> thanks
>>> Nat
>>> On 15 Nov 2013, at 16:53, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry could you send a HGNC symbol that causes the error please? I've
>>>> tried with a number of symbols including BRAF, BRCA2, ABO & HCG21 none
>>>> of which show the behaviour you're seeing. In fact running the code for
>>>> HCG21 gives me:
>>>>
>>>> HCG21:ENSG00000233529:6:30913756:30922639,
>>>>
>>>> If I take out the test for $gene->slice->is_reference() I get:
>>>>
>>>> HCG21:ENSG00000233529:6:30913756:30922639,
>>>> HCG21:ENSG00000235747:HSCHR6_MHC_DBB:30904474:30913356,
>>>> HCG21:ENSG00000230541:HSCHR6_MHC_QBL:30903331:30912213,
>>>> HCG21:ENSG00000223454:HSCHR6_MHC_COX:30903456:30912338,
>>>> HCG21:ENSG00000228448:HSCHR6_MHC_MCF:30992301:30992681,
>>>>
>>>> I'm just not seeing the issue you're reporting. Also the sequence
>>>> region name you gave is not a toplevel sequence. As you can see from
>>>> this output HSCHR6_MHC_MCF is the toplevel region; HSCHR6_MHC_MCF_CTG1
>>>> is an underlying supercontig and is *not* a toplevel region. We only
>>>> use the non-reference attribute when working with toplevel sequence
>>>> regions. That's why calling is_reference() on a slice pointing at
>>>> HSCHR6_MHC_MCF_CTG1
>>>>
>>>> Andy
>>>>
>>>>
>>>> On 15 Nov 2013, at 15:54, nconte at ebi.ac.uk wrote:
>>>>
>>>>> HI,
>>>>> This is the bit of the code which will get the is_reference test-
>>>>> if you replace $Ggene with any gene symbol from HGNC this should work.
>>>>> #!/usr/local/bin/perl
>>>>> use strict;
>>>>> use warnings;
>>>>>
>>>>> use Bio::EnsEMBL::Registry;
>>>>>
>>>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>>> 	-host=>"ensembldb.ensembl.org", -user=>"anonymous",
>>>>>        -port=>'5306', 'db_version' => 73,);
>>>>> my $gene_adaptor=Bio::EnsEMBL::Registry->get_adaptor( 'human', 'Core',
>>>>> 'Gene' );
>>>>>
>>>>>
>>>>> my
>>>>> $all_genes=$gene_adaptor->fetch_all_by_external_name($Ggene,'HGNC'); #
>>>>> this is where I capture the ensembl genes from each key
>>>>> #######
>>>>> if (!scalar (@{$all_genes})) {print  $Ggene.'no ensembl gene object
>>>>> ',"\t","no ensembl variation no object","\t","no ensembl orthologues
>>>>> no
>>>>> object","\t"; next; } elsif  (scalar (@{$all_genes})){
>>>>> 	 foreach my $gene (@{$all_genes}){
>>>>> if (!$gene){print $Ggene.' '. 'no_ensembl_gene'; next} elsif
>>>>> 	 ( $gene && $gene->slice->is_reference ) {
>>>>> # no  genes which are on patches are removed only the ref assembly are
>>>>> kept , -!!!!
>>>>> #print   'GWAS_gene2: '.$Ggene,  "\t",defined($gene->external_name) ?
>>>>> print   defined($gene->external_name) ?
>>>>> $gene->external_name :'no_ensembl_gene',":", defined($gene->stable_id)
>>>>> ?
>>>>> $gene->stable_id : 'no_stableID',":", defined($gene->seq_region_name)
>>>>> ?
>>>>> $gene->seq_region_name:
>>>>> 'no_seq_region_name:',":",defined($gene->seq_region_start) ?
>>>>> $gene->seq_region_start:
>>>>> 'no_seq_region_start',":",defined($gene->seq_region_end) ?
>>>>> $gene->seq_region_end: 'no_seq_region_end',"," ;}
>>>>>
>>>>> Thanks
>>>>>
>>>>> Nat
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> A reference sequence is detected by the non-existence of a non_ref
>>>>>> flag in
>>>>>> seq_region_attrib. I've done this query in 73:
>>>>>>
>>>>>> select *
>>>>>> from seq_region sr
>>>>>> join seq_region_attrib sra using (seq_region_id)
>>>>>> join attrib_type a using (attrib_type_id)
>>>>>> where sr.name = 'HSCHR6_MHC_MCF_CTG1'
>>>>>> and a.code = 'non_ref'
>>>>>>
>>>>>> And have got no rows back suggesting that
>>>>>> $gene->slice->is_reference()
>>>>>> when slice is HSCHR6_MHC_MCF_CTG1 will return false. Could you send
>>>>>> us
>>>>>> your code please?
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On 15 Nov 2013, at 14:26, nconte at ebi.ac.uk wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I have been updating my ensembl API versions from 71 to 73 today,
>>>>>>> and
>>>>>>> reran a script to test it was working.
>>>>>>> In my script I am excluding genes  Ids for haplotypes /  LRG /
>>>>>>> pseudoautomosal
>>>>>>> or  assembly patches/fixes regions. I am using this test/method to
>>>>>>> exclude
>>>>>>> them
>>>>>>> (if $gene->slice->is_reference)
>>>>>>>
>>>>>>> I am now getting genes mapping on HSCHR6_MHC_MCF_CTG1 which were
>>>>>>> excluded
>>>>>>> on my last version of the API. Is the slice->is_reference
>>>>>>> deprecated?
>>>>>>> Thanks
>>>>>>> Nathalie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/




More information about the Dev mailing list