[ensembl-dev] Mismatch between database and website

Mahmood Naderan mahmood.nt at gmail.com
Mon Aug 7 19:44:36 BST 2017


Dear Mag,
Regarding your explanation, I understand most of it (as I said before, I am
not an expert in this field), but can you answer this:
What information will be missed if I use fetch_all_by_external_name, and
what information will be missed if I use fetch_all_by_display_label?

Regards,
Mahmood



On Thu, Jul 20, 2017 at 1:01 PM, mag <mr6 at ebi.ac.uk> wrote:

> Hi Mahmood,
>
> We map a large number of external references to Ensembl features.
> These can be proteins (eg UniProt), mRNAs (eg RefSeq), non coding RNAs
> (RFAM, miRBase), as well as a number of annotations, for example aberrant
> sites (DBASS) or pathways (Reactome).
>
> Some of these have gene symbols associated with them and we use those
> links to name our genes. For human, this will generally be HGNC, as this is
> the official nomenclature committee.
> This means the most trusted, confident link will be used as what we call
> the display_xref, to assign the gene name. There might still be some other
> links which could be used as gene symbol but are of lower priority. For
> example, we can have an HGNC symbol as well as an EntrezGene link.
>
> As a result, fetch_all_by_display_label only matches against the external
> reference selected as the display_xref (the gene name), usually HGNC, while
> fetch_all_by_external_name queries across all external references
> associated with the Ensembl gene.
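
A minimal sketch of the difference described above, for anyone who wants to see it on their own data. It assumes $gene_adaptor is a Bio::EnsEMBL::DBSQL::GeneAdaptor obtained as in the code earlier in this thread; compare_lookups is a hypothetical helper name, not part of the Ensembl API. It only relies on fetch_all_by_external_name, fetch_all_by_display_label and stable_id.

```perl
use strict;
use warnings;

# Hypothetical helper: run both lookups for one symbol and report which
# genes are found only through the broader external-reference search.
sub compare_lookups {
    my ($gene_adaptor, $symbol) = @_;

    # Matches any external reference linked to the gene.
    my @by_xref = map { $_->stable_id }
        @{ $gene_adaptor->fetch_all_by_external_name($symbol) };

    # Matches only the reference chosen as the gene's display name.
    my @by_label = map { $_->stable_id }
        @{ $gene_adaptor->fetch_all_by_display_label($symbol) };

    # Genes linked to the symbol without being named after it.
    my %is_label  = map { $_ => 1 } @by_label;
    my @xref_only = grep { !$is_label{$_} } @by_xref;

    return {
        external_name => \@by_xref,
        display_label => \@by_label,
        xref_only     => \@xref_only,
    };
}

# Usage (needs a live adaptor):
#   my $result = compare_lookups($gene_adaptor, 'ATXN3');
#   print "Linked but not named ATXN3: @{ $result->{xref_only} }\n";
```

Anything listed under xref_only is a gene that carries the symbol as a secondary link, which is the kind of hit that fetch_all_by_display_label leaves out.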
>
>
> Hope that helps,
> Magali
>
>
> On 20/07/2017 09:21, Mahmood Naderan wrote:
>
> Hi Mag,
> Some of my questions were answered, though some new questions arose. For
> example, what is the difference between fetch_all_by_external_name and
> fetch_all_by_display_label? The descriptions in the core documentation seem
> similar, and I cannot tell when I should use the first and when I should
> use the second.
>
>
> Regards,
> Mahmood
>
>
>
> On Mon, Jul 17, 2017 at 7:02 PM, mag <mr6 at ebi.ac.uk> wrote:
>
>> Hi Mahmood,
>>
>> The fetch_all_by_external_name method returns a list of genes for which
>> atxn3 is an associated link.
>> For GRCh37, there are two genes which qualify, as can be seen on the
>> search page:
>> http://grch37.ensembl.org/Homo_sapiens/Search/Results?q=atxn3;site=ensembl_all;page=1;facet_feature_type=Gene;facet_species=Human
>> If you check the second element of the list, you will get ENSG00000066427
>>
>> For ENSG00000259634, atxn3 is not the main display name, but it has a
>> link to the corresponding NCBIgene entry for atxn3.
>> http://grch37.ensembl.org/Homo_sapiens/Gene/Matches?db=core;g=ENSG00000259634;r=14:92523341-92575863;t=ENST00000558190
>>
>> If you are only interested in genes for which atxn3 is the chosen symbol,
>> you can use the fetch_all_by_display_label method instead.
>>
>> However, please be aware that fetch_all_by_display_label still returns
>> a list of genes, which could have more than one element.
>> For example, two genes can share the same name if one is on the reference
>> while the other one is on a haplotype.
>> There are also cases where a name is misassigned to a gene, resulting in
>> a duplication. This can happen when two genes are overlapping.
>>
>> Because of this, I would recommend looping through the resulting list
>> rather than assume the first result is the one you want.
>> You can then check for various gene attributes to ensure this is the one
>> you expect.
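
The advice above can be sketched roughly as follows. This is an illustration under assumptions, not an official recipe: pick_reference_genes is a hypothetical helper name, and the two checks it applies (display name equals the requested symbol, gene sits on a reference sequence rather than a haplotype) are just one reasonable choice of attributes. It uses the external_name method on the gene and is_reference on its slice.

```perl
use strict;
use warnings;

# Hypothetical helper: loop over every returned gene instead of taking
# element 0, keeping only genes named after the symbol on the reference.
sub pick_reference_genes {
    my ($genes, $symbol) = @_;

    my @kept;
    for my $gene (@{$genes}) {
        # external_name is the gene's display name; it may be undefined.
        my $name = $gene->external_name;
        next unless defined $name && lc($name) eq lc($symbol);

        # Skip copies annotated on haplotypes or patches.
        next unless $gene->slice->is_reference;

        push @kept, $gene;
    }
    return \@kept;
}

# Usage (needs a live adaptor):
#   my $hits = pick_reference_genes(
#       $gene_adaptor->fetch_all_by_external_name('ATXN3'), 'ATXN3');
#   printf "%s %s:%d-%d\n", $_->stable_id, $_->seq_region_name,
#       $_->start, $_->end for @{$hits};
```

The result is still a list, so if more than one gene survives the checks (for example after a misassigned name), the caller sees all of them rather than silently getting the first.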
>>
>>
>> Hope that helps,
>> Magali
>>
>>
>>
>> On 15/07/2017 12:55, Mahmood Naderan wrote:
>>
>> I have an update that may shed some light, but I cannot figure it out.
>> With the command in my previous email, I see that the stable ID is
>> ENSG00000259634. When I enter this ID on the website, I see
>>
>> Gene: RP11-529H20.5 ENSG00000259634  . Location  Chromosome 14:
>> 92,524,896-92,525,877 reverse strand.
>>
>> As you can see, the start and end numbers match those in my previous email,
>> but the gene name is not ATXN3, which I requested in the command. So the
>> question is: why does fetch_all_by_external_name("atxn3") return that gene?
>>
>> In my previous questions, Emily pointed out that the function may return
>> LRGs. That is hard for me to understand since I am not an expert in the
>> field. I want to get the main gene and nothing else.
>>
>> Regards,
>> Mahmood
>>
>>
>>
>> On Sat, Jul 15, 2017 at 2:15 PM, Mahmood Naderan <mahmood.nt at gmail.com>
>> wrote:
>>
>>> Hi,
>>> With this code
>>>
>>>   my @genes = @{ $gene_adaptor->fetch_all_by_external_name("atxn3") };
>>>   my $gene  = $genes[0];
>>>   my $start = $gene->start();
>>>   my $end   = $gene->end();
>>>
>>> I see that
>>>   start=92524896
>>>   end=92525877
>>>
>>> However, from the website, I see
>>>   Chromosome 14: 92,524,896-92,572,965
>>>
>>> As you can see, the end numbers are different.
>>> http://grch37.ensembl.org/Homo_sapiens/Gene/Sequence?db=core;g=ENSG00000066427;r=14:92524896-92572965
>>>
>>>
>>> Is there any reason for that?
>>>
>>> Regards,
>>> Mahmood
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/