[ensembl-dev] (no subject)

Healy, Matthew Matthew.Healy at bms.com
Mon Apr 2 18:39:49 BST 2012

I am new to BioMart and the Ensembl Perl API, so I am probably just confused.  I would be grateful for some enlightenment from those with more experience.

I am trying to map protein features into chromosomal nucleotide coordinates.

First I use the fetch_by_stable_id() method of the transcript adaptor to get a transcript object given its ENSTxxx identifier.
Then I use the get_all_ProteinFeatures() method of that transcript object to get all its protein features.
Then I use the pep2genomic method of Bio::EnsEMBL::TranscriptMapper to map these coordinates into nucleotide space.

Usually this works as I would expect it to work: if a protein domain feature spans multiple exons, then I get back multiple pairs of genomic coordinates.

When the domain overlaps the start or the end of the translation, I also get a gap object in transcript nucleotide coordinates (start and end both zero or both minus one or both length of transcript plus one), indicating some of that domain is missing from this translation.
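To make the expected behaviour concrete, here is a minimal, self-contained Python sketch of the kind of arithmetic a peptide-to-genomic mapping performs. It is not the Ensembl Perl API and not the code of Bio::EnsEMBL::TranscriptMapper; the function names, the tuple format, and the toy exon coordinates are all assumptions made for illustration. It assumes residue p occupies coding bases 3p-2 to 3p, that the coding sequence starts at a known cDNA offset, and that anything falling outside the transcript is reported as a gap in cDNA coordinates.

```python
from typing import List, Tuple

Exon = Tuple[int, int]          # genomic (start, end), 1-based inclusive
Piece = Tuple[str, int, int]    # ("coord", g_start, g_end) or ("gap", c_start, c_end)


def cdna_to_genomic(exons: List[Exon], strand: int,
                    cdna_start: int, cdna_end: int) -> List[Piece]:
    """Map a 1-based inclusive cDNA interval onto genomic coordinates.

    `exons` must be listed in transcription order (for a reverse-strand
    transcript the first exon has the highest genomic coordinates).
    """
    pieces: List[Piece] = []
    total = sum(end - start + 1 for start, end in exons)

    # Part of the interval that lies before the first transcript base.
    if cdna_start < 1:
        pieces.append(("gap", cdna_start, min(cdna_end, 0)))

    offset = 0  # number of cDNA bases used up by earlier exons
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        lo = max(cdna_start, offset + 1)
        hi = min(cdna_end, offset + length)
        if lo <= hi:  # the interval overlaps this exon
            if strand == 1:
                pieces.append(("coord", g_start + (lo - offset - 1),
                               g_start + (hi - offset - 1)))
            else:
                pieces.append(("coord", g_end - (hi - offset - 1),
                               g_end - (lo - offset - 1)))
        offset += length

    # Part of the interval that runs past the last transcript base.
    if cdna_end > total:
        pieces.append(("gap", max(cdna_start, total + 1), cdna_end))
    return pieces


def pep_to_genomic(exons: List[Exon], strand: int, cdna_coding_start: int,
                   pep_start: int, pep_end: int) -> List[Piece]:
    """Convert a peptide interval to cDNA coordinates, then to genomic."""
    cdna_start = 3 * pep_start - 2 + (cdna_coding_start - 1)
    cdna_end = 3 * pep_end + (cdna_coding_start - 1)
    return cdna_to_genomic(exons, strand, cdna_start, cdna_end)


# Hypothetical reverse-strand transcript: two 30 bp exons, CDS starts at cDNA base 4.
EXAMPLE_EXONS = [(2001, 2030), (1001, 1030)]

# A feature on residues 8-12 spans both exons, so two genomic pieces come back.
print(pep_to_genomic(EXAMPLE_EXONS, -1, 4, 8, 12))
# A feature on residues 18-22 runs off the end of the transcript, so a gap is added.
print(pep_to_genomic(EXAMPLE_EXONS, -1, 4, 18, 22))
```

In this toy model a feature that crosses an exon boundary yields one genomic piece per exon, and a feature that extends beyond the transcript yields an additional gap piece, which matches the two behaviours described above.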

However, I have also found an oddity in BioMart.  I downloaded a table of transcript coordinates from BioMart.  In most cases,
these coordinates are exactly the same as the coordinates displayed by the genome browser.  But I have seen a few cases where
they differ.

For example, the Ensembl genome browser currently gives the coordinates for ENST00000392973 as
Chromosome 16: 90,071,281-90,086,526 reverse strand.  But when I downloaded all transcripts for
ENSG00000003249 using http://useast.ensembl.org/biomart/martview/ the relevant row of output is:

Ensembl Gene ID  Ensembl Transcript ID  Ensembl Protein ID  Gene Start (bp)  Gene End (bp)  Strand  Transcript Start (bp)  Transcript End (bp)  Chromosome Name
ENSG00000003249  ENST00000392973        ENSP00000376699     90071273         90086536       -1      90071281               90076619             16

In this row, Transcript End (bp) is 90076619, whereas the browser views display 90,086,526.  The Transcript Start (bp) value, 90071281, agrees with the browser.
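The size of the discrepancy can be checked with a few lines of Python. This is only an arithmetic sketch using the numbers quoted in this message; the dictionary keys and the helper name are made up for illustration, and nothing here queries BioMart or the browser.

```python
# Values copied from the BioMart row and from the browser display quoted above.
biomart_row = {
    "gene_start": 90071273,
    "gene_end": 90086536,
    "transcript_start": 90071281,
    "transcript_end": 90076619,
}
browser_start, browser_end = 90071281, 90086526


def compare_coordinates(row: dict, b_start: int, b_end: int) -> dict:
    """Return the start/end differences and whether each range fits in the gene."""
    return {
        "start_diff": b_start - row["transcript_start"],
        "end_diff": b_end - row["transcript_end"],
        "biomart_within_gene": row["gene_start"] <= row["transcript_start"]
        and row["transcript_end"] <= row["gene_end"],
        "browser_within_gene": row["gene_start"] <= b_start
        and b_end <= row["gene_end"],
    }


result = compare_coordinates(biomart_row, browser_start, browser_end)
print(result)
```

With these numbers the starts agree, the ends differ by 9907 bp, and both ranges lie inside the gene coordinates, so neither value can be ruled out on range grounds alone.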

