[ensembl-dev] Downloading annotation from Ensembl

Giulietta gspudich at ebi.ac.uk
Wed Jan 22 15:14:11 GMT 2014


On 21/01/2014 15:05, Greg Slodkowicz wrote:
> Hi Fiona,
>
>     We do have GTF files for our gene annotations. You can see a table
>     of all our download files here:
>     http://www.ensembl.org/info/data/ftp/index.html
>
>
> Thanks for getting back to me. I've previously downloaded the GTF 
> file under "Gene sets" but it seems that it contained coordinates of 
> genes, exons and CDS but not other functional features such as 
> domains, secondary structure predictions, etc.  Is this the file 
> you're referring to?
Hi Greg,

Apologies, I am not completely clear on what information you would like 
to access.  The Nature paper focuses on conserved regions and potential 
gene regulatory sequences, so I will address how to find that 
information in Ensembl:

The Perl API allows programmatic access to conserved and potentially 
functional regions of the genome.

Installation instructions are here:

http://www.ensembl.org/info/docs/api/api_installation.html

For conserved regions ('constrained elements'), the Compara Perl API is 
an option; have a look at this tutorial:

http://www.ensembl.org/info/docs/api/compara/compara_tutorial.html

If you are looking for hypersensitive sites, transcription factor binding 
sites, and histone modifications, go via the Regulation API:

http://www.ensembl.org/info/docs/api/funcgen/regulation_tutorial.html

There is documentation for both our Compara and Regulation resources here:

http://www.ensembl.org/info/genome/compara/index.html
and
http://www.ensembl.org/info/genome/funcgen/index.html

****
If you are looking for protein domains from InterProScan, or 
coiled-coil regions from the ncoils program, you can find them through 
BioMart or the Perl API.

BioMart provides an interface for programmers and non-programmers 
alike.  To learn a bit about how to use the web interface, watch our 
quick tutorial:

BioMart: An Introduction
http://youtu.be/DXPaBdPM2vs

You would want to use 'Filters' in the 'PROTEIN DOMAINS' section.

With the Perl API, this information is available through the Core API.
****
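
For programmatic BioMart access, a query can also be sent as XML to the 
martservice URL instead of being built in the web interface.  The Python 
below is only a minimal sketch of how such a query might be assembled.  
The dataset name (hsapiens_gene_ensembl), the filter name 
(ensembl_gene_id), the attribute names (interpro, interpro_start, 
interpro_end, etc.) and the example gene ID are assumptions on my part; 
please check them against what the BioMart web interface shows for your 
species and release.  The helper names build_query and build_url are 
made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint and names; verify them in the BioMart web interface.
MART_URL = "http://www.ensembl.org/biomart/martservice"
DATASET = "hsapiens_gene_ensembl"  # assumed: human gene dataset
ATTRIBUTES = [
    "ensembl_gene_id",
    "ensembl_peptide_id",
    "interpro",        # assumed attribute: InterPro accession
    "interpro_start",  # assumed attribute: domain start on the protein
    "interpro_end",    # assumed attribute: domain end on the protein
]


def build_query(gene_ids, dataset=DATASET, attributes=ATTRIBUTES):
    """Return a BioMart XML query string for the given Ensembl gene IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Restrict the results to the requested genes (comma-separated list).
    ET.SubElement(ds, "Filter",
                  {"name": "ensembl_gene_id", "value": ",".join(gene_ids)})
    # One <Attribute> element per output column.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def build_url(gene_ids):
    """Return a martservice URL carrying the URL-encoded XML query."""
    return MART_URL + "?" + urlencode({"query": build_query(gene_ids)})


if __name__ == "__main__":
    # Example gene ID only; replace with your own genes of interest.
    print(build_url(["ENSG00000139618"]))
```

The sketch only builds the URL and does not contact the server; the 
printed URL could then be fetched with a browser, wget or urllib.  The 
'XML' button in the BioMart web interface shows the query for whatever 
you have selected there, which is a good way to confirm the names above.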

Let us know if you have any trouble accessing the information you want, 
or if you need some further explanation on specific data types.

Best wishes,
Giulietta

>     What are your features? If they are variation data you could use
>     our Variant Effect Predictor (VEP) http://www.ensembl.org/VEP.
>
>
> They're sitewise predictions of evolutionary constraint, very similar 
> to those from 
> http://www.nature.com/nature/journal/v478/n7370/abs/nature10530.html.
>
> Best,
> Greg
>
> -- 
> Greg Slodkowicz
> PhD student, Nick Goldman group
> European Bioinformatics Institute (EMBL-EBI)