[ensembl-dev] Mismatch between database and website

mag mr6 at ebi.ac.uk
Thu Jul 20 09:31:01 BST 2017


Hi Mahmood,

We map a large number of external references to Ensembl features.
These can be proteins (e.g. UniProt), mRNAs (e.g. RefSeq), non-coding RNAs
(RFAM, miRBase), as well as a number of annotations, for example
aberrant splice sites (DBASS) or pathways (Reactome).

Some of these have gene symbols associated with them, and we use those
links to name our genes. For human, this will generally be HGNC, as it
is the official nomenclature committee.
The most trusted link is used as what we call the display_xref, which
assigns the gene name. Other links that carry a gene symbol may still be
attached to the gene, but at lower priority. For example, a gene can
have an HGNC symbol as well as an EntrezGene link.
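
To see which link names a gene and which other links are attached to it, something like the following may help. It is only a sketch: the helper name summarise_gene_names is made up for illustration, and it assumes $gene is a Bio::EnsEMBL::Gene (it only relies on the display_xref and get_all_DBEntries methods).

```perl
use strict;
use warnings;

# Hypothetical helper: summarise the naming links of one gene.
# $gene is assumed to be a Bio::EnsEMBL::Gene, or any object that offers
# the same display_xref and get_all_DBEntries methods.
sub summarise_gene_names {
    my ($gene) = @_;

    # The display_xref is the single link chosen to name the gene.
    my $display = $gene->display_xref;
    my %summary = (
        display_name => $display ? $display->display_id : undef,
        display_db   => $display ? $display->dbname     : undef,
        all_links    => [],
    );

    # Every gene-level external reference, lower-priority ones included.
    for my $xref ( @{ $gene->get_all_DBEntries } ) {
        push @{ $summary{all_links} },
            $xref->dbname . ':' . $xref->display_id;
    }
    return \%summary;
}

# Assumed usage, given a $gene_adaptor connected to a core database:
#   my $gene = $gene_adaptor->fetch_by_stable_id('ENSG00000066427');
#   my $info = summarise_gene_names($gene);
#   print "$info->{display_name} ($info->{display_db})\n";
#   print "  $_\n" for @{ $info->{all_links} };
```

Links attached to the gene's transcripts and translations are not included here; get_all_DBLinks can be used instead of get_all_DBEntries if those are wanted too.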

As a result, fetch_all_by_display_label (the method you refer to as
fetch_all_by_display_name) only matches the external reference chosen
as the naming symbol, usually HGNC, while fetch_all_by_external_name
queries across all external references associated with the Ensembl gene.
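
As a rough illustration of the difference, the broader result of fetch_all_by_external_name can be narrowed down to genes whose chosen name matches the symbol. This is a sketch only: genes_named is a made-up helper, the case-insensitive comparison is my own choice, and $gene_adaptor is assumed to be the gene adaptor from the code earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the genes whose chosen display name
# matches $symbol. The comparison ignores case, so "atxn3" matches "ATXN3".
# Each gene is assumed to offer external_name, as Bio::EnsEMBL::Gene does.
sub genes_named {
    my ($genes, $symbol) = @_;
    return [
        grep {
            defined( $_->external_name )
                && lc( $_->external_name ) eq lc($symbol)
        } @$genes
    ];
}

# Assumed usage, given a $gene_adaptor connected to a core database:
#   my $linked = $gene_adaptor->fetch_all_by_external_name('ATXN3');
#   my $named  = genes_named($linked, 'ATXN3');
#   for my $gene (@$named) {
#       printf "%s\t%s\t%s:%d-%d\n",
#           $gene->stable_id, $gene->external_name,
#           $gene->seq_region_name, $gene->start, $gene->end;
#   }
```

The filtered list can still hold more than one gene, so it is worth looping over it rather than taking the first element.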


Hope that helps,
Magali

On 20/07/2017 09:21, Mahmood Naderan wrote:
> Hi Mag,
> Some of my questions were answered, though some new ones arose.
> For example, what is the difference between fetch_all_by_external_name
> and fetch_all_by_display_name? The descriptions in the core
> documentation seem similar, and I cannot work out when I should use
> the first and when I should use the second.
>
>
> Regards,
> Mahmood
>
>
>
> On Mon, Jul 17, 2017 at 7:02 PM, mag <mr6 at ebi.ac.uk> wrote:
>
>     Hi Mahmood,
>
>     The fetch_all_by_external_name returns a list of genes for which
>     atxn3 is an associated link.
>     For GRCh37, there are two genes which qualify, as can be seen on
>     the search page:
>     http://grch37.ensembl.org/Homo_sapiens/Search/Results?q=atxn3;site=ensembl_all;page=1;facet_feature_type=Gene;facet_species=Human
>     If you check the second element of the list, you will get
>     ENSG00000066427
>
>     For ENSG00000259634, atxn3 is not the main display name, but it
>     has a link to the corresponding NCBIgene entry for atxn3.
>     http://grch37.ensembl.org/Homo_sapiens/Gene/Matches?db=core;g=ENSG00000259634;r=14:92523341-92575863;t=ENST00000558190
>
>     If you are only interested in genes for which atxn3 is the chosen
>     symbol, you can use the fetch_all_by_display_label method instead.
>
>     However, please be aware that the fetch_all_by_display_label will
>     still return a list of genes, which could have more than one element.
>     For example, two genes can share the same name if one is on the
>     reference while the other one is on a haplotype.
>     There are also cases where a name is misassigned to a gene,
>     resulting in a duplication. This can happen when two genes are
>     overlapping.
>
>     Because of this, I would recommend looping through the resulting
>     list rather than assuming the first result is the one you want.
>     You can then check various gene attributes (for example the
>     stable ID or location) to confirm it is the one you expect.
>
>
>     Hope that helps,
>     Magali
>
>
>
>     On 15/07/2017 12:55, Mahmood Naderan wrote:
>>     I have an update that may shed some light, but I cannot figure it out.
>>     With the command in my previous email, I see that the stableID is
>>     ENSG00000259634. As I enter this ID in the web site, I see
>>
>>     Gene: RP11-529H20.5 ENSG00000259634  . Location Chromosome 14:
>>     92,524,896-92,525,877 reverse strand.
>>
>>     As you can see, the start and end numbers match those in my
>>     previous email, and the name is not ATXN3, which I requested in
>>     the command. So the question is why
>>     fetch_all_by_external_name("atxn3") returns that gene.
>>
>>     In reply to my previous questions, Emily pointed out that the
>>     function may return LRGs. That is hard for me to understand, since
>>     I am not an expert in the field. I want to get the main gene and
>>     nothing else.
>>
>>     Regards,
>>     Mahmood
>>
>>
>>
>>     On Sat, Jul 15, 2017 at 2:15 PM, Mahmood Naderan
>>     <mahmood.nt at gmail.com> wrote:
>>
>>         Hi,
>>         With this code
>>
>>           # All genes that have "atxn3" among their external references
>>           my @genes = @{ $gene_adaptor->fetch_all_by_external_name("atxn3") };
>>           my $gene  = $genes[0];
>>           my $start = $gene->start();
>>           my $end   = $gene->end();
>>
>>         I see that
>>         start=92524896
>>         end=92525877
>>
>>         However, from the website, I see
>>           Chromosome 14: 92,524,896-92,572,965
>>
>>         As you can see, the end numbers are different.
>>         http://grch37.ensembl.org/Homo_sapiens/Gene/Sequence?db=core;g=ENSG00000066427;r=14:92524896-92572965
>>
>>
>>         Is there any reason for that?
>>
>>         Regards,
>>         Mahmood
>>
>>
>>
>>
>>
>>     _______________________________________________
>>     Dev mailing list Dev at ensembl.org
>>     Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/
