[ensembl-dev] [Gmod-gbrowse] GFF file into Gbrowse

Sung Gong sung at bio.cc
Wed Feb 9 16:48:37 GMT 2011


On 9 February 2011 14:52, Chris Fields <cjfields at illinois.edu> wrote:
> On Feb 9, 2011, at 7:57 AM, Sung Gong wrote:
>
>> Hi,
>>
>> I was wondering whether the coordinates in the Ensembl GTF file
>> (ftp://ftp.ensembl.org/pub/current/gtf/homo_sapiens/) are compatible
>> with the FASTA sequence from UCSC
>> (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/).
>>
>> What I want to do is display short reads from the SOLiD 4 platform
>> using GBrowse 2.0. The read mapping and pairing were done against the
>> UCSC hg19 FASTA file, without repeat masking.
>>
>> I am also wondering whether the coordinates in the GTF are based on a
>> repeat-masked sequence or not.
>>
>> The GTF file was converted to GFF using the converter on the Sequence
>> Ontology website (http://www.sequenceontology.org/cgi-bin/converter.cgi).
>>
>> Where do you usually download the FASTA and GFF files you use to display short reads?
>>
>> Cheers,
>> Sung
>
> Sung,
>
> If you're only interested in simple comparisons of your short-read data with a genome, both UCSC and ensembl allow one to use BAM output:
>
> http://uswest.ensembl.org/Homo_sapiens/Info/Index
> (under 'Manage your data')
>
> http://genome.ucsc.edu/goldenPath/help/bam.html
>
> If you want to run your own local version using GBrowse, I would stick with one or the other (ensembl or UCSC), run all analyses on whichever version you choose, and use that for GBrowse.  It's generally not a good idea to mix data sources unless you want downstream headaches.  For instance, I believe the ensembl sequence naming conventions differ from UCSC's ('1' vs 'chr1', 'X' vs 'chrX', etc), so you would immediately run into problems with that.  There might also be differences in how unmapped and organelle sequences are dealt with, in gene models, in how features are lifted over from one assembly to another, etc (caveat: only the UCSC/ensembl folks can answer that in any detail).
>


Yes, I'm aware of the problem: the sequence names in the GTF have been
reformatted with the 'chr' prefix.
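
For what it's worth, the renaming can be sketched roughly as below. This
is only a minimal illustration, not the exact script I used: the function
names and the MT -> chrM mapping are my own assumptions, and unplaced
scaffolds (GL000...) are not handled, since as far as I know UCSC names
them differently and a plain prefix would not match.

```python
# Assumed special case: Ensembl calls the mitochondrion "MT", UCSC "chrM".
# As far as I know the hg19 chrM sequence itself differs from Ensembl's MT,
# so renaming alone may not make those coordinates comparable.
SPECIAL_NAMES = {"MT": "chrM"}


def to_ucsc_name(seqname):
    """Return a UCSC-style name for an Ensembl-style sequence name."""
    if seqname in SPECIAL_NAMES:
        return SPECIAL_NAMES[seqname]
    if seqname.startswith("chr"):
        return seqname  # already prefixed, leave untouched
    return "chr" + seqname


def rename_gtf(lines):
    """Yield GTF lines with column 1 renamed; comments and blanks pass through."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # Split off only the first tab-separated column (the sequence name).
        fields = line.split("\t", 1)
        fields[0] = to_ucsc_name(fields[0])
        yield "\t".join(fields)


# Hypothetical usage (file names are placeholders):
# with open("ensembl.gtf") as src, open("ensembl.chr.gtf", "w") as dst:
#     dst.writelines(rename_gtf(src))
```

Only the first column is touched, so the coordinates themselves are left
exactly as Ensembl reports them.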


> Personally I generally use Ensembl over UCSC, but I have a bias of sorts, namely that ensembl GTF output is gene-centric and fairly easily converted to a GFF3 gene->transcript->exon->CDS hierarchy via gtf2gff3.pl, whereas UCSC GTF tends to be transcript-centric and does not convert w/o some additional work.  But we also have a ton of UCSC-centric folks here locally, so I have to dabble with both.
>

Do you know of a good source for UCSC GFF files?
The GBrowse NGS tutorial (http://gmod.org/wiki/GBrowse_NGS_Tutorial)
points to CBRG (https://gbrowse.molbiol.ox.ac.uk/cgi-bin/gbrowse/HUMAN_HG18/),
but that is built on hg18.

> chris



