[ensembl-dev] Main gene vs. alternate

Mahmood Naderan mahmood.nt at gmail.com
Tue Nov 21 13:09:29 GMT 2017


Dear Mag,
The following code

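use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Registry setup (assumed here: public Ensembl server, anonymous login)
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');

my $gene_adaptor = $registry->get_adaptor('human', 'core', 'Gene');
my @genes = @{ $gene_adaptor->fetch_all_by_external_name("HLA-DRB1") };
while (my $gene = shift @genes) {
  my $d_id = $gene->display_xref->display_id;
  print "display_id=" . $d_id . " ";
  print "stable_id=" . $gene->stable_id . " ";
  print "is_reference=" . $gene->slice->is_reference() . "\n";
}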

generates the following output


display_id=HLA-A stable_id=ENSG00000206503 is_reference=1
display_id=HLA-B stable_id=ENSG00000234745 is_reference=1
display_id=HLA-DRB1 stable_id=ENSG00000196126 is_reference=1
display_id=HLA-DRB5 stable_id=ENSG00000198502 is_reference=1
display_id=HLA-DRB3$0301 stable_id=ENSG00000230463 is_reference=0
display_id=HLA-B stable_id=ENSG00000224608 is_reference=0
display_id=DRB4 stable_id=ENSG00000231021 is_reference=0
display_id=HLA-DRB1 stable_id=ENSG00000206306 is_reference=0
display_id=HLA-B stable_id=ENSG00000232126 is_reference=0
display_id=HLA-A stable_id=ENSG00000206505 is_reference=0
display_id=HLA-A stable_id=ENSG00000231834 is_reference=0
display_id=HLA-DRB1 stable_id=ENSG00000228080 is_reference=0
display_id=DRB4 stable_id=ENSG00000227826 is_reference=0
display_id=HLA-B stable_id=ENSG00000223532 is_reference=0
display_id=HLA-B stable_id=ENSG00000206450 is_reference=0
display_id=HLA-DRB1 stable_id=ENSG00000206240 is_reference=0
display_id=HLA-DRB1 stable_id=ENSG00000236884 is_reference=0
display_id=HLA-A stable_id=ENSG00000227715 is_reference=0
display_id=HLA-A stable_id=ENSG00000229215 is_reference=0
display_id=HLA-A stable_id=ENSG00000235657 is_reference=0
display_id=HLA-DRB3 stable_id=ENSG00000196101 is_reference=0
display_id=HLA-A stable_id=ENSG00000223980 is_reference=0
display_id=HLA-DRB4 stable_id=ENSG00000227357 is_reference=0
display_id=HLA-A stable_id=ENSG00000224320 is_reference=0
display_id=HLA-B stable_id=ENSG00000228964 is_reference=0
display_id=HLA-DRB1 stable_id=ENSG00000229074 is_reference=0


Some points:
1- ENSG00000196126 is the reference; I verified that with the web
frontend.
2- HLA-DRB5 (ENSG00000198502) is returned as a reference gene even though I
queried HLA-DRB1. Is that something important that I have to consider? Is it
safe to ignore it and stick to ENSG00000196126?
3- There are many alternatives for what I queried. You said that it is
better to consider the alternatives. So which one should I pick?
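
For point 2, here is a minimal sketch of the filter I have in mind. It assumes that an exact, case-sensitive match on display_id combined with is_reference is the right criterion, which is exactly what I am asking about. select_reference_genes is a name made up for this sketch and is not part of the Ensembl API; it only relies on the display_xref, display_id and slice->is_reference calls already used above.

```perl
use strict;
use warnings;

# Hypothetical helper (not an Ensembl API call): keep only the genes whose
# display name equals $name exactly and whose slice is a reference sequence.
# Each element of @genes is expected to behave like a Bio::EnsEMBL::Gene,
# i.e. to provide display_xref and slice.
sub select_reference_genes {
    my ($name, @genes) = @_;
    my @kept;
    for my $gene (@genes) {
        my $xref = $gene->display_xref;
        next unless defined $xref;               # skip genes without a display xref
        next unless $xref->display_id eq $name;  # drops HLA-DRB5 when asking for HLA-DRB1
        next unless $gene->slice->is_reference;  # drops copies on alternate sequences
        push @kept, $gene;
    }
    return @kept;
}
```

It would be called as select_reference_genes('HLA-DRB1', @{ $gene_adaptor->fetch_all_by_external_name('HLA-DRB1') }). Applied to the output listed above, only ENSG00000196126 has display_id=HLA-DRB1 together with is_reference=1, so that is the only gene it would keep.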

Regards,
Mahmood



On Mon, Nov 20, 2017 at 12:16 PM, mag <mr6 at ebi.ac.uk> wrote:

> Hi Mahmood,
>
> What you call the main instance of a gene is a gene located on a
> chromosome or scaffold. These can be identified with the following API call:
> $gene->slice->is_reference
>
> The "Human alternative sequence Gene" entries are genes located on alternate
> sequences, either patch fixes or haplotypes.
> In some cases, there is a sequencing error on the reference chromosome and
> the gene on the alternate sequence is a better choice.
> You can select these with the following API call:
>
> my $aag_adaptor = Bio::EnsEMBL::Registry->get_DBAdaptor("Human","core","AltAlleleGroup");
> my $aag = $aag_adaptor->fetch_Group_by_dbID($gene->dbID);
> my $reference_gene = $aag->get_representative_Gene;
>
> If you are looking at retrieving only one version of a gene name and are
> looking for the most representative, I would recommend the second solution
> rather than arbitrarily selecting the one on the reference chromosome.
>
>
> I hope this helps,
> Magali
>
>
> On 18/11/2017 17:37, Mahmood Naderan wrote:
>
> Hi,
> I use the following code to retrieve all instances of a gene name. Then I
> compare each display_id with the gene name that I have and if they match, I
> go further to process them.
>
> my @genes = @{ $gene_adaptor->fetch_all_by_external_name('HLA-DRB1') };
> while (my $gene = shift @genes) {
>   my $big_string = $gene->display_xref->display_id;
>   my $pat = "HLA-DRB1";
>   my $match_found = $big_string =~ /$pat/i;
>   if ($match_found) {
>      ..
>   }
> }
>
> The problem is that I get multiple items whose display_id is equal to what
> I have (HLA-DRB1). On the website, the main instance is named and the
> others are named "Human Alternative sequence Gene". I don't want to store
> them. I just need the main "Human Gene".
>
> What is the correct attribute to distinguish that?
>
>
> Regards,
> Mahmood
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/