[ensembl-dev] which version of genome to use for RNA-seq mapping
Julien Roux
julien.roux at unil.ch
Wed Nov 14 21:08:39 GMT 2012
Dear Ensembl team,
I would like to map RNA-seq reads to the human genome using TopHat,
but I am unsure which version of the genome I should use for this.
After some searching, I have seen that people tend to use the non-masked
"toplevel" version of the genome
(ftp://ftp.ensembl.org/pub/release-69/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.69.dna.toplevel.fa.gz),
but I am wondering if this is a good idea: because of the redundant
sequences found in the patches, the aligner will conclude that some
reads map at multiple locations and these reads will be discarded.
Another option would be to map to the "primary assembly"
(ftp://ftp.ensembl.org/pub/release-69/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa.gz),
but this ignores some of the recent improvements to the genome (fix patches).
Ideally I would like to use a "golden path" assembly (sum of all
top-level sequences, omitting any redundant regions). What would you
suggest?
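
To make the question concrete, here is a minimal sketch of the kind of
filtering I could apply to the toplevel file myself. It only drops whole
sequences by name, so it is not a true "golden path" (the patched regions
are not substituted into the chromosomes). It assumes that patch sequences
can be recognised by the substring "PATCH" in their FASTA name, which I
have not verified for this release; the function names and the demo
records are made up for illustration.

```python
import io


def filter_fasta(src, dst, keep):
    """Copy FASTA records from src to dst when keep(name) is true.

    The name is the first whitespace-delimited token after '>'.
    Returns the list of names that were written.
    """
    kept = []
    writing = False
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            writing = keep(name)
            if writing:
                kept.append(name)
        if writing:
            dst.write(line)
    return kept


def is_not_patch(name):
    # Assumption: patch sequences carry "PATCH" in their name.
    return "PATCH" not in name


# Made-up records, not real Ensembl headers or sequence.
demo = io.StringIO(
    ">1 dna:chromosome\nACGT\nACGT\n"
    ">HG1_PATCH dna:supercontig\nTTTT\n"
    ">MT dna:chromosome\nGGCC\n"
)
out = io.StringIO()
kept_names = filter_fasta(demo, out, is_not_patch)
```

On the real file, src would be the decompressed toplevel FASTA opened for
reading (for example with gzip.open(path, "rt")) and dst an output file
to build the aligner index from.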
Thanks
Julien
--
Julien Roux, PhD
Gilad lab, Department of Human Genetics, University of Chicago
http://giladlab.uchicago.edu/
920 East 58th Street, CLSC 317, Chicago, IL 60637, USA
tel: +1-773-834-1984 fax: +1-773-834-8470