[ensembl-dev] FW: Download human body map 2.0 transcript coordinates

Thibaut Hourlier th3 at sanger.ac.uk
Mon Jan 30 14:43:39 GMT 2012


Dear Ying,
As I said in the post you are quoting, we recommend using the 
Perl API: http://www.ensembl.org/info/docs/Doxygen/core-api/index.html

The only way to map the reads to a gene is through the 
seq_region(_id/_start/_end/_strand) information.
If you know the stable ID (ENSG000....) of the gene you are interested 
in, it is quite simple with the API:

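use strict;
use warnings;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Core database: holds the gene with the ENSG stable ID
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
                 -host   => 'ensembldb.ensembl.org',
                 -port   => 5306,
                 -user   => 'anonymous',
                 -dbname => 'homo_sapiens_core_65_37');
my $ga   = $db->get_GeneAdaptor();
my $gene = $ga->fetch_by_stable_id("ENSG000XXXX");
# feature_Slice covers only the gene; $gene->slice would be the whole seq_region
my $slice = $gene->feature_Slice;

# RNASeq database: holds the body map transcript models
my $rnaseqdb = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
                 -host   => 'ensembldb.ensembl.org',
                 -port   => 5306,
                 -user   => 'anonymous',
                 -dbname => 'homo_sapiens_rnaseq_65_37');
my $rnaseqsa = $rnaseqdb->get_SliceAdaptor();
# Fetch the same region, by name, from the RNASeq database
my $rnaseqslice = $rnaseqsa->fetch_by_name($slice->name);
# Arguments: load exons (1), then the analysis logic name
my @transcripts = @{$rnaseqslice->get_all_Transcripts(1, 'skeletal_rnaseq')};
foreach my $transcript (@transcripts) {
    foreach my $sf (@{$transcript->get_all_supporting_features()}) {
        # Print the number of reads that spanned across the intron
        print STDOUT $sf->hit_name, ' :', $sf->score, "\n";
    }
}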

The number of reads that span the introns is the score you can find in 
the dna_align_feature table of the rnaseq database.
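
If you would rather work from the rows you already exported, the coordinate comparison itself can be sketched in plain Perl without the API. This is only an illustration under some assumptions: the sub names (overlaps, genes_for_transcript) and the three genes are made up for the example, features are compared by seq_region name rather than seq_region_id (the ids are internal to each database and should not be assumed to match between the core and rnaseq databases), and whether to require the same strand is left as an option.

```perl
use strict;
use warnings;

# Two features overlap when they are on the same seq_region (compared by
# name) and their start/end ranges intersect. Coordinates are inclusive.
sub overlaps {
    my ($x, $y, $same_strand) = @_;
    return 0 if $x->{seq_region_name} ne $y->{seq_region_name};
    return 0 if $same_strand && $x->{strand} != $y->{strand};
    return ($x->{start} <= $y->{end} && $y->{start} <= $x->{end}) ? 1 : 0;
}

# Return the stable ids of all genes overlapping one RNASeq transcript.
sub genes_for_transcript {
    my ($transcript, $genes, $same_strand) = @_;
    return map { $_->{stable_id} }
           grep { overlaps($transcript, $_, $same_strand) } @$genes;
}

# RNASeq transcript ROUGHT00000241811 from the query output in this thread.
my %transcript = (seq_region_name => '8', start => 74332013,
                  end => 74659111, strand => -1);

# HYPOTHETICAL genes: ids and coordinates are invented for illustration.
my @genes = (
    { stable_id => 'GENE_A', seq_region_name => '8',
      start => 74300000, end => 74400000, strand => -1 },
    { stable_id => 'GENE_B', seq_region_name => '8',
      start => 74500000, end => 74510000, strand => 1 },
    { stable_id => 'GENE_C', seq_region_name => '12',
      start => 74332013, end => 74659111, strand => -1 },
);

# Same strand only, then either strand.
print join(',', genes_for_transcript(\%transcript, \@genes, 1)), "\n";
print join(',', genes_for_transcript(\%transcript, \@genes, 0)), "\n";
```

With real data, the gene list would come from the gene and seq_region tables of the core database, and an RNASeq model may well overlap more than one gene or none at all.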

Regards,
Thibaut


On 27/01/12 19:09, Li, Ying L wrote:

Dear Thibaut,



I am trying to get the Ensembl human body map data with gene or transcript identifiers, and I followed your instructions in this mailing list post: http://lists.ensembl.org/pipermail/dev/2011-August/001593.html



I was able to set up an Oracle schema and run the following query:

SELECT t.*, sr.name
FROM rnaseq37_analysis a, rnaseq37_transcript t
LEFT JOIN rnaseq37_seq_region sr
ON sr.seq_region_id = t.seq_region_id
WHERE t.analysis_id = a.analysis_id
AND a.logic_name = 'skeletal_rnaseq'
;



TRANSCRIPT_ID  GENE_ID  ANALYSIS_ID  SEQ_REGION_ID  SEQ_REGION_START  SEQ_REGION_END  SEQ_REGION_STRAND  DISPLAY_XREF_ID  BIOTYPE         STATUS     DESCRIPTION  IS_CURRENT  CANONICAL_TRANSLATION_ID  STABLE_ID          VERSION  CREATED_DATE         MODIFIED_DATE        NAME
840249         585754   8244         27517          184298181         184300196       1                  \N               protein_coding  PREDICTED  \N           1           593641                    ROUGHT00000241809  1        2011-01-12 10:33:07  2011-01-12 10:33:07  3
840251         585756   8244         27523          74332013          74659111        -1                 \N               protein_coding  PREDICTED  \N           1           593643                    ROUGHT00000241811  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840252         585758   8244         27523          74702071          74742711        -1                 \N               protein_coding  PREDICTED  \N           1           593644                    ROUGHT00000241812  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840254         585759   8244         27523          74857620          74885297        -1                 \N               protein_coding  PREDICTED  \N           1           593646                    ROUGHT00000241814  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840256         585761   8244         27523          74887723          74895618        1                  \N               protein_coding  PREDICTED  \N           1           593648                    ROUGHT00000241816  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840257         585762   8244         27523          74903474          74917165        1                  \N               protein_coding  PREDICTED  \N           1           593649                    ROUGHT00000241817  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840259         585764   8244         27523          74921628          74941367        1                  \N               protein_coding  PREDICTED  \N           1           593651                    ROUGHT00000241819  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840261         585766   8244         27523          75015772          75019126        1                  \N               protein_coding  PREDICTED  \N           1           593653                    ROUGHT00000241820  1        2011-01-12 10:33:07  2011-01-12 10:33:07  8
840264         585768   8244         27519          15260701          15375468        -1                 \N               protein_coding  PREDICTED  \N           1           593656                    ROUGHT00000241821  1        2011-01-12 10:33:07  2011-01-12 10:33:07  12
840266         585770   8244         27519          15742384          15751506        1                  \N               protein_coding  PREDICTED  \N           1           593658                    ROUGHT00000241822  1        2011-01-12 10:33:07  2011-01-12 10:33:07  12



Now I need to map the gene_id or transcript_id to some kind of standard ID (e.g. ENSG00000*****) so that I can tell which gene each gene_id refers to. Do you know the best way to do this mapping? In addition, do you know if there is a "number of reads" value for the RNASeq data?



Thanks a lot for your help,



Best regards,

Ying


> Hi there,
>
> I am on your mailing list, so resubmitting this question -- see attached file.
>
> Thanks a lot for your help.
>
> Best,
> Ying
> -----Original Message-----
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of dev-owner at ensembl.org
> Sent: Wednesday, January 25, 2012 4:52 PM
> To: Li, Ying L {PXTP~Nutley}
> Subject: Re: [ensembl-dev] Download human body map 2.0 transcript coordinates
>
> The Ensembl dev mailing list only accepts postings from people who are subscribed. You can subscribe or unsubscribe at http://lists.ensembl.org/mailman/listinfo/dev