[ensembl-dev] whole chromosome alignment with LastZ
    Alice Iob 
    alice.iob at cragenomica.es
       
    Wed Mar 25 11:36:52 GMT 2020
    
    
  
Good morning,
I am a PhD student working on plant genomics and I am struggling with an issue regarding LastZ.
I chose LastZ because it was used for plant genome alignments in Ensembl Plants, so I hope that someone who has used it successfully before can help me with this.
I am trying to align the reference genomes of two very closely related species. I have two FASTA files,
each representing the same chromosome in one of the two species; each is around 800 Mb long and contains at least one long repetitive region.
This is the command I am using:
lastz target.fasta query.fasta --notransition --step=20 --maxwordcount=70 --exact=20 --chain --gapped --ambiguous=iupac --rdotplot=plot --format=differences > alignment.differences
I always get the same error:
FAILURE: in add_segment()
table size (4,869,542,152 for 101,448,794 segments) exceeds allocation limit of 4,294,967,279;
consider raising scoring threshold (--hspthresh or --exact) or breaking your target sequence into smaller pieces.
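A quick back-of-the-envelope reading of the numbers in that message follows. It is an inference from the message alone, not from the LastZ documentation, and it assumes the table size grows roughly linearly with the number of segments:

```python
# Numbers copied verbatim from the LastZ error message above.
table_bytes = 4_869_542_152
segments = 101_448_794
limit_bytes = 4_294_967_279

# Assumption: table size scales ~linearly with segment count,
# so this ratio is the approximate cost of one segment.
bytes_per_segment = table_bytes / segments

# Largest segment count that would fit under the limit at that rate.
max_segments = int(limit_bytes // bytes_per_segment)

# Fraction of segments that would have to disappear for the run to fit.
reduction = 1 - max_segments / segments

print(f"{bytes_per_segment:.2f} bytes per segment")
print(f"at most ~{max_segments:,} segments fit")
print(f"need ~{reduction:.1%} fewer segments")
print(f"limit = 2**32 - {2**32 - limit_bytes}")
```

On that reading, each segment costs about 48 bytes, the limit sits 17 bytes below 2^32 (which looks like a 32-bit size cap), and this run would need roughly 12% fewer segments (about 89.5 million) to fit.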
I tried several strategies to overcome this issue:
- increasing --exact up to 100
- using --hspthresh values up to 10,000
- adding --seed=match12
- dividing my target sequence in two (as one multi-FASTA file)
- working with just half of the chromosome (400 Mb)
- setting the parameters to those used to align T. aestivum and A. tauschii (https://plants.ensembl.org/mlss.html?mlss=9814)
Still, I get the same error.
Only a few times was I able to get an output (e.g. with --exact=100), but it is always larger than 700 GB, so even when the file is generated I run out of memory and cannot work with it.
I also tried LastZ_32, but the process gets killed without giving me any information.
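Since the error message itself suggests breaking the target into smaller pieces, one way to do that while keeping each piece in its own single-sequence FASTA file is sketched below. This is an illustrative Python sketch, not part of LastZ or the Ensembl pipeline; the function names read_single_fasta, chunk_intervals and write_chunks, the 50 Mb chunk size and the 100 kb overlap are all hypothetical choices:

```python
from pathlib import Path


def read_single_fasta(path):
    """Return (name, sequence) for a FASTA file holding exactly one record.

    The sequence is kept in a bytearray so memory stays close to the
    sequence length (roughly 800 MB for an 800 Mb chromosome).
    """
    name = None
    seq = bytearray()
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if name is not None:
                    raise ValueError(f"{path}: expected a single FASTA record")
                fields = line[1:].split()
                name = fields[0].decode() if fields else "seq"
            else:
                seq.extend(line.strip())
    if name is None:
        raise ValueError(f"{path}: no FASTA header found")
    return name, seq


def chunk_intervals(length, size, overlap):
    """Yield 0-based, half-open (start, end) windows covering [0, length)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    start = 0
    while True:
        end = min(start + size, length)
        yield start, end
        if end == length:
            break
        start += step


def write_chunks(name, seq, outdir, size=50_000_000, overlap=100_000, width=60):
    """Write each window of seq to its own FASTA file and return the paths.

    Chunk names carry 1-based inclusive coordinates (e.g. chr1_1-50000000)
    so hits can be mapped back onto the full chromosome afterwards.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start, end in chunk_intervals(len(seq), size, overlap):
        label = f"{name}_{start + 1}-{end}"
        path = outdir / f"{label}.fa"
        with open(path, "wb") as out:
            out.write(f">{label}\n".encode())
            # Wrap the sequence at `width` bases per line.
            for i in range(start, end, width):
                out.write(seq[i:min(i + width, end)] + b"\n")
        paths.append(path)
    return paths
```

For example, `name, seq = read_single_fasta("target.fasta")` followed by `write_chunks(name, seq, "target_chunks")` writes one file per piece, and each piece could then be passed to LastZ as the target with the same query and options as in the command above. The overlap keeps alignments that span a chunk boundary from being lost, at the cost of reporting hits inside the overlaps twice, so they would need to be deduplicated afterwards; the chunk size is a knob to tune until each run stays under the table limit.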
I was wondering whether you could help me with this issue (maybe I am not using some of the options properly) or give me some advice on how to handle this alignment properly.
Thank you.
Alice Iob
PhD student
Plant and Animal Genomics Program
CRAG, Centre for Research in Agricultural Genomics
Campus UAB - CRAG Building | 08193 Cerdanyola | BARCELONA
Office: 3.01
Tel. +34 935636600 ext 3351