[ensembl-dev] Streaming BAM files in IGV

Andy Yates ayates at ebi.ac.uk
Mon Mar 10 10:59:55 GMT 2014


Hi there,

1). Yes, I can load this by going to File -> Load from URL… and pasting ftp://ftp.ensembl.org/pub/release-75/data_files/homo_sapiens/GRCh37/rnaseq/GRCh37.HumanBodyMap.brain.1.bam into the popup that appears. I didn't see any data until I zoomed in to ~24kb of sequence, and then I saw both coverage and reads appearing.

2). Sorry, I didn't explain myself too well. Ensembl's GTF does not have its features sorted by genomic start, so the file needs sorting first. Sorting it did not take long. Once I had the sorted file I got IGV to produce the index (via the igvtools GUI). Even with a sorted file and an index, my memory usage still went up to just shy of 4GB. It did load eventually and the Ensembl models were correctly represented, but I expected lower memory usage.
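
For anyone wanting to reproduce the sorting step, here is a minimal Python sketch. It is only an illustration and not necessarily how the file above was sorted. The function names (sort_gtf_lines, sort_gtf_file) are made up for this example, and it assumes a standard tab-separated GTF where column 1 is the sequence name and columns 4 and 5 are start and end. It reads the whole file into memory, which should be manageable for a single GTF but is not tuned in any way.

```python
import gzip


def sort_gtf_lines(lines):
    """Return GTF lines with comment lines first, then features
    ordered by (sequence name, start, end)."""
    headers = []
    features = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            headers.append(line)  # keep header/comment lines at the top
            continue
        fields = line.split("\t")
        # GTF columns: 1 = seqname, 4 = start, 5 = end (1-based coordinates)
        features.append((fields[0], int(fields[3]), int(fields[4]), line))
    # Sequence names are compared as strings ("1" < "10" < "2"); this keeps
    # each sequence contiguous with ascending starts inside it.
    features.sort(key=lambda f: f[:3])
    return headers + [f[3] for f in features]


def sort_gtf_file(src, dst):
    """Hypothetical helper: read src (plain or .gz) and write sorted text to dst."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as handle:
        sorted_lines = sort_gtf_lines(handle)
    with open(dst, "w") as out:
        out.writelines(sorted_lines)
```

Something like sort_gtf_file("Homo_sapiens.GRCh37.75.gtf.gz", "Homo_sapiens.GRCh37.75.sorted.gtf") would then give a file to index. Whether the string ordering of sequence names suits a particular indexer is an assumption worth checking.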

I should say that my experience of IGV extends to approximately 2hrs of trying out a few things. Please do not take this as a definitive answer on the topic as I'm sure other more experienced users will be able to say exactly where and how I went wrong :)

All the best,

Andy

------------
Andrew Yates - Ensembl Support Coordinator
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
http://www.ensembl.org/

On 10 Mar 2014, at 10:02, Genomeo Dev <genomeodev at gmail.com> wrote:

> Thanks Andy. 
> 
> 1) Just to confirm. Are you able to load this file into IGV from URL?
> 
> ftp://ftp.ensembl.org/pub/release-75/data_files/homo_sapiens/GRCh37/rnaseq/GRCh37.HumanBodyMap.brain.1.bam
> 
> If yes, do you paste this link as ftp:// or http:// in the URL box?
> 
> 2) When you mentioned 'I've tried sorting on a local GTF file but to little success without it taking up a lot of RAM', you actually were not able to sort the file as opposed to having sorted it and not been able to load it into IGV. Am I right? How much memory are you using with IGV?
> 
> Thanks,
> 
> 
> 
> On 8 March 2014 11:48, Andy Yates <ayates at ebi.ac.uk> wrote:
> Hi,
> 
> Apologies, from my initial reading of their docs it seemed that there was no support for FTP. I have been able to load our BAMs into a local IGV instance, so I don't think there is a problem here. It was probably related to our FTP issues yesterday.
> 
> As for loading the models I've tried sorting on a local GTF file but to little success without it taking up a lot of RAM. So at the moment I would suggest using the DAS track for the gene annotations & using the flat files for the RNA-seq data. If we can manage to make something better we will let you know.
> 
> Andy
> 
> On 7 Mar 2014, at 11:20, Genomeo Dev <genomeodev at gmail.com> wrote:
> 
> > Hi,
> >
> > There are other things in Ensembl which I want to use inside IGV besides the annotations, such as RNA-seq and epigenetic data, hence I have been looking at loading from URL. Well, based on this IGV documentation, IGV also accepts FTP:
> >
> > Load from URL
> >
> > To load data from an HTTP URL:
> >
> >       • Select File>Load from URL.
> >       • Enter the HTTP or FTP URL for a data file or sample information file, then click OK.
> > http://www.broadinstitute.org/igv/loaddata
> >
> > For the other question, would it be safe for me to do the gtf to bigBed conversion myself?
> >
> > G.
> >
> > On 7 March 2014 09:11, Andy Yates <ayates at ebi.ac.uk> wrote:
> > Hi Genomeo
> >
> > So firstly looking at the information at igv it can only render tracks hosted on http. Those files are on FTP only. Without locally downloading it you won't be able to visualise. We are looking into making this available on http but I cannot give any firm timelines.
> >
> > Secondly, I did manage to load the GTF in IGV (about 2 weeks ago) but it took 4GB and ages to load. Like cup-of-coffee timescales, but it did work in the end. The solution here would be to use an alternative format such as bigBed. We don't make the genes available in that format at the moment but I am working towards making this happen. We recognise how important this is: with so many new browsers now available, we need to make it easier to access Ensembl data and in particular the gene sets.
> >
> > As an alternative you could always download bed from the rest api's region endpoint (http://beta.rest.ensembl.org/documentation/info/feature_region) so long as you can work in smaller regions (5mb). Or you could attach the Ensembl das source which igv says it can understand. There's info from http://www.ensembl.org/info/data/ensembl_das.html
> >
> > Hope this helps a little
> >
> > Andy
> >
> > Sent from my tablet
> >
> > On 6 Mar 2014, at 18:13, Genomeo Dev <genomeodev at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> (1)
> >>
> >> I am trying to use the IGV browser to load BAM files from this location:
> >>
> >> ftp://ftp.ensembl.org/pub/release-75/data_files/homo_sapiens/GRCh37/rnaseq/
> >>
> >> But I am getting an error message saying that it is not possible to establish a SOCKS proxy connection.
> >>
> >> I have found that the IGV docs mention that 'for .bam, .tdf, and indexed file formats the server must support byte-range requests.' (http://www.broadinstitute.org/igv/loaddata)
> >>
> >> I was wondering whether there is anything on the Ensembl server side that is causing this error.
> >>
> >> Thanks.
> >>
> >> (2)
> >>
> >> I want to load annotation into IGV genome browser from ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
> >>
> >> But the file is too large to handle at once. Any advice from the Ensembl browser team on how I should be conducting this operation?
> >>
> >> Thanks,
> >>
> >> --
> >> G.
> >> _______________________________________________
> >> Dev mailing list    Dev at ensembl.org
> >> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> >> Ensembl Blog: http://www.ensembl.info/
> >
> >
> > --
> > G.
> 
> 
> 
> 
> 
> -- 
> G.




