[ensembl-dev] [Gmod-gbrowse] GFF file into Gbrowse

Chris Fields cjfields at illinois.edu
Wed Feb 9 14:52:19 GMT 2011


On Feb 9, 2011, at 7:57 AM, Sung Gong wrote:

> Hi,
> 
> I was wondering whether the coordinates from GTF file of Ensembl
> (ftp://ftp.ensembl.org/pub/current/gtf/homo_sapiens/) are compatible
> with the FASTA sequence from UCSC
> (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/).
> 
> What I want to do is display short reads from the SOLiD 4 platform
> using GBrowse 2.0. The read mapping and pairing were done against the
> UCSC hg19 FASTA file, without the repeat-masking options.
> 
> Also, are the coordinates in the GTF based on a repeat-masked sequence or not?
> 
> The GTF file was converted into GFF file from the sequence ontology
> website (http://www.sequenceontology.org/cgi-bin/converter.cgi).
> 
> Where do you usually download FASTA and GFF files to display your short reads?
> 
> Cheers,
> Sung

Sung,

If you're only interested in simple comparisons of your short-read data with a genome, both UCSC and Ensembl let you load BAM files directly:

http://uswest.ensembl.org/Homo_sapiens/Info/Index
(under 'Manage your data')

http://genome.ucsc.edu/goldenPath/help/bam.html

If you want to run your own local instance of GBrowse, I would stick with one source or the other (Ensembl or UCSC), run all analyses against whichever you choose, and use that for GBrowse. It's generally not a good idea to mix data sources unless you want downstream headaches. For instance, I believe the Ensembl sequence naming convention differs from UCSC's ('1' vs 'chr1', 'X' vs 'chrX', etc.), so you would immediately run into problems with that. There might also be differences in how unmapped and organelle sequences are dealt with, in gene models, in how features are lifted over from one assembly to another, etc. (caveat: only the UCSC/Ensembl folks can answer that in any detail).
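To illustrate the naming mismatch, here is a minimal Python sketch that rewrites the first column of a GTF/GFF line from Ensembl-style to UCSC-style names. The function names are hypothetical, and the mapping is an assumption that only covers the primary chromosomes (1-22, X, Y). Organelle and unplaced sequences are deliberately skipped, since those may be handled differently by the two sources. It only changes names; it does not check that the underlying sequences or coordinates agree.

```python
# Assumption: only the primary human chromosomes have a simple
# '1' -> 'chr1' style mapping. Everything else is left unmapped.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name):
    """Return the UCSC-style name for a primary chromosome, else None."""
    if name in PRIMARY:
        return "chr" + name
    return None


def rename_gtf_line(line):
    """Rewrite column 1 of a tab-delimited GTF/GFF line.

    Comment lines are returned unchanged. Lines on sequences without a
    simple mapping return None so the caller can decide what to do.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    ucsc_name = ensembl_to_ucsc(fields[0])
    if ucsc_name is None:
        return None
    fields[0] = ucsc_name
    return "\t".join(fields) + "\n"
```

Returning None for unmapped sequences (rather than guessing a name) keeps the troublesome cases visible instead of silently mislabeling them.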

Personally I generally use Ensembl over UCSC, but I have a bias of sorts: Ensembl GTF output is gene-centric and fairly easily converted to a GFF3 gene->transcript->exon->CDS hierarchy via gtf2gff3.pl, whereas UCSC GTF tends to be transcript-centric and does not convert without some additional work. But we also have a ton of UCSC-centric folks here locally, so I have to dabble with both.
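As a rough picture of what "gene-centric" buys you, the Python sketch below groups exon and CDS lines from a GTF by gene_id and then transcript_id. This is not how gtf2gff3.pl works internally; it is just an illustration, the function names are hypothetical, and it assumes every exon/CDS line carries quoted gene_id and transcript_id attributes.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(text):
    """Turn the ninth GTF column into a dict of quoted attributes."""
    return dict(ATTRIBUTE.findall(text))


def build_hierarchy(gtf_lines):
    """Group exon/CDS features as gene_id -> transcript_id -> features.

    Each feature is stored as (type, start, end). Other feature types,
    comments, and malformed lines are ignored.
    """
    genes = defaultdict(lambda: defaultdict(list))
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in ("exon", "CDS"):
            continue
        attrs = parse_attributes(fields[8])
        feature = (fields[2], int(fields[3]), int(fields[4]))
        genes[attrs["gene_id"]][attrs["transcript_id"]].append(feature)
    return genes
```

If the gene_id values in a file do not reliably group transcripts of the same gene, this kind of grouping gives a poor gene level, which is roughly the "additional work" problem with transcript-centric GTF.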

chris


