[ensembl-dev] Download human body map 2.0 transcript coordinates

Thibaut Hourlier th3 at sanger.ac.uk
Mon Aug 22 11:09:17 BST 2011


Hi Jyoti,

We used the core schema
(http://www.ensembl.org/info/docs/api/index.html) for the rnaseq
database. Here are some SQL queries.

Get all the transcripts; the first column is the name of the
chromosome:
SELECT sr.name, t.* FROM transcript t LEFT JOIN seq_region sr ON
sr.seq_region_id = t.seq_region_id;

Get all the transcripts built from the skeletal muscle data on
chromosome 2 (seq_region.name is a string, so the value is quoted):
SELECT t.* FROM analysis a, transcript t LEFT JOIN seq_region sr ON
sr.seq_region_id = t.seq_region_id WHERE t.analysis_id = a.analysis_id
AND sr.name = "2" AND a.logic_name = "skeletal_rnaseq";

Get the exons for the transcript ROUGHT00000000004:
SELECT sr.name, e.seq_region_start, e.seq_region_end,
e.seq_region_strand FROM seq_region sr, transcript_stable_id tsi, exon e
LEFT JOIN exon_transcript et ON et.exon_id = e.exon_id WHERE
et.transcript_id = tsi.transcript_id AND sr.seq_region_id =
e.seq_region_id AND tsi.stable_id = "ROUGHT00000000004";
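
The exon query above can also be run from a script. What follows is
only a minimal sketch, assuming the Perl DBI module with the DBD::mysql
driver is installed and that the public server still hosts this
release. The names $EXON_SQL and fetch_exons are illustrative and not
part of Ensembl. The joins are written explicitly, the stable ID is a
bound parameter, and ORDER BY et.rank is added so that the exons come
back in transcript order.

```perl
use strict;
use warnings;

# Exon coordinates of one transcript; the stable ID is bound at run time.
our $EXON_SQL = <<'SQL';
SELECT sr.name, e.seq_region_start, e.seq_region_end, e.seq_region_strand
FROM transcript_stable_id tsi
JOIN exon_transcript et ON et.transcript_id = tsi.transcript_id
JOIN exon e ON e.exon_id = et.exon_id
JOIN seq_region sr ON sr.seq_region_id = e.seq_region_id
WHERE tsi.stable_id = ?
ORDER BY et.rank
SQL

# Hypothetical helper: returns an array ref of [name, start, end, strand] rows.
sub fetch_exons {
    my ($stable_id) = @_;

    # Loaded lazily so the SQL text can be inspected without DBI installed.
    require DBI;
    my $dbh = DBI->connect(
        'DBI:mysql:database=homo_sapiens_rnaseq_63_37;host=ensembldb.ensembl.org;port=5306',
        'anonymous', '', { RaiseError => 1 },
    );
    my $rows = $dbh->selectall_arrayref($EXON_SQL, undef, $stable_id);
    $dbh->disconnect;
    return $rows;
}

# Connects only when a stable ID is given on the command line.
if (@ARGV == 1) {
    print join("\t", @$_), "\n" for @{ fetch_exons($ARGV[0]) };
}
```

For example, saved under a name of your choice, it could be called as:
perl fetch_exons.pl ROUGHT00000000004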

But I recommend that you use the Perl API,
http://www.ensembl.org/info/docs/Doxygen/core-api/main.html :

use Bio::EnsEMBL::DBSQL::DBAdaptor;

$db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
                -host => 'ensembldb.ensembl.org',
                -port => 5306,
                -user => 'anonymous',
                -dbname => 'homo_sapiens_rnaseq_63_37');

$slice_adaptor = $db->get_SliceAdaptor();
$chr_slice = $slice_adaptor->fetch_by_region( 'chromosome', '2' );
foreach $transcript (@{$chr_slice->get_all_Transcripts()}) {
...
	foreach $exon (@{$transcript->get_all_Exons()}) {
	...
	}
...
}

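or, to keep only the transcripts from one tissue (the first argument
asks for the exons to be loaded, the second is the logic_name):

foreach $transcript
(@{$chr_slice->get_all_Transcripts(1, 'skeletal_rnaseq')}) {
...
}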
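
To get from these loops to a file of genomic coordinates, the pieces
can be put together roughly as below. This is a sketch under some
assumptions: the Ensembl core API matching this release is on your
PERL5LIB, the public server still hosts the database, and the names
format_exon_line and dump_exons as well as the tab-separated output
layout are my own illustrative choices.

```perl
use strict;
use warnings;

# One exon as a tab-separated line: chromosome, start, end, strand, transcript ID.
# Strand -1 is written as '-', anything else as '+'.
sub format_exon_line {
    my ($chr, $start, $end, $strand, $transcript_id) = @_;
    return join("\t", $chr, $start, $end, ($strand == -1 ? '-' : '+'), $transcript_id);
}

# Print the exons of every transcript of one tissue on one chromosome.
sub dump_exons {
    my ($chr_name, $logic_name) = @_;

    # Loaded lazily so format_exon_line can be used without the Ensembl API.
    require Bio::EnsEMBL::DBSQL::DBAdaptor;
    my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -host   => 'ensembldb.ensembl.org',
        -port   => 5306,
        -user   => 'anonymous',
        -dbname => 'homo_sapiens_rnaseq_63_37',
    );

    my $slice = $db->get_SliceAdaptor()->fetch_by_region('chromosome', $chr_name);

    # First argument: load exons with the transcripts; second: logic_name filter.
    foreach my $transcript (@{ $slice->get_all_Transcripts(1, $logic_name) }) {
        foreach my $exon (@{ $transcript->get_all_Exons() }) {
            print format_exon_line(
                $exon->seq_region_name,  $exon->seq_region_start,
                $exon->seq_region_end,   $exon->seq_region_strand,
                $transcript->stable_id,
            ), "\n";
        }
    }
}

# Connects only when both arguments are given on the command line.
dump_exons(@ARGV) if @ARGV == 2;
```

For example, saved under a name of your choice:
perl dump_exons.pl 2 skeletal_rnaseq > skeletal_chr2_exons.tsv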

Regards
Thibaut

On Fri, 2011-08-19 at 14:23 -0400, Jyoti Shah wrote:
> Hi Thibaut,
> 
> 
> Thanks for your response. The schema seems complicated and I could not
> find any proper documentation on this database instance. Can you point
> me to the SQL queries that can help me fetch transcripts for Human
> body map 2.0 from this MySQL instance? I could see a file
> transcript.txt with start and end positions but there is no chromosome
> number attached to it. Where can I download these built transcripts
> from?
> 
> 
> Thanks,
> Jyoti
> 
> On Fri, Aug 19, 2011 at 12:12 PM, Thibaut Hourlier <th3 at sanger.ac.uk>
> wrote:
>         Hi,
>         You can either use the FTP site to create a local instance of
>         the database:
>         ftp://ftp.ensembl.org/pub/current_mysql/homo_sapiens_rnaseq_63_37/
>         
>         or you can use the public MySQL Server:
>         
>         $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
>                -host => 'ensembldb.ensembl.org',
>                -port => 5306,
>                -user => 'anonymous',
>                -dbname => 'homo_sapiens_rnaseq_63_37');
>         Then if you want only some tissues you can filter with the
>         logic_name:
>         +-----------------------+
>         | logic_name            |
>         +-----------------------+
>         | adipose_rnaseq        |
>         | adrenal_rnaseq        |
>         | blood_rnaseq          |
>         | brain_rnaseq          |
>         | breast_rnaseq         |
>         | colon_rnaseq          |
>         | heart_rnaseq          |
>         | kidney_rnaseq         |
>         | liver_rnaseq          |
>         | lung_rnaseq           |
>         | lymph_rnaseq          |
>         | ovary_rnaseq          |
>         | prostate_rnaseq       |
>         | skeletal_rnaseq       |
>         | testes_rnaseq         |
>         | thyroid_rnaseq        |
>         +-----------------------+
>         
>         Regards
>         Thibaut
>         
>         
>         On Fri, 2011-08-19 at 09:18 -0400, Jyoti Shah wrote:
>         > Hi,
>         >
>         >
>         > What is the best way to download the transcripts that were
>         > built using Illumina data (Human body map 2.0) explained here:
>         >
>         > http://www.ensembl.info/blog/2011/05/24/human-bodymap-2-0-data-from-illumina/
>         >
>         > I need to download the genomic coordinates.
>         >
>         >
>         > Thanks!
>         
> 
> 




