Warnings / Issues
/!\ : Act with care; this workflow uses significant memory if you increase the values in masterconfig. We recommend keeping the default settings and running a test first.
/!\ : For now, do not run multiple splits at once.
How to Use
A. Running on a cluster
1. Set up
Clone the Git repository
git clone https://forgemia.inra.fr/pangepop/MSpangepop.git && cd MSpangepop
2. Add your files
- Add a .fasta.gz file; an example can be found in the repository.
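Before launching the pipeline, it can help to confirm that the input really is a gzipped FASTA and to see how many sequences it holds. The sketch below is not part of the repository; the function name and the demo file are hypothetical, and it only assumes that each sequence starts with a ">" header line.

```shell
#!/usr/bin/env bash
# Count the sequences (header lines starting with ">") in a gzipped FASTA.
# count_fasta_records is a hypothetical helper, not a script from the repository.
count_fasta_records() {
    gzip -cd -- "$1" | grep -c '^>'
}

# Illustration on a tiny throwaway file (made-up content).
demo_dir=$(mktemp -d)
printf '>chr1\nACGT\n>chr2\nGGCC\n' | gzip -c > "$demo_dir/example.fasta.gz"
count_fasta_records "$demo_dir/example.fasta.gz"   # prints 2
```

If the count does not match the number of chromosomes you expect, check the file before running the split step.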
3. Configure the pipeline
- Edit the masterconfig file in the .config/ directory with your sample information.
- Edit the visor_sv_type.yaml file with the mutations you want.
- Edit job.sh with your email and add the paths to the needed modules (Singularity/Apptainer, Miniconda3).
- Provide the needed conda environment in job.sh, under source activate wf_env. You can create it using:
conda create -n wf_env -c conda-forge -c bioconda snakemake=8.4.7 snakemake-executor-plugin-slurm
conda init bash
4. Run the workflow
The workflow has two parts: split and simulate. Always run split first and, once it is done (it is very quick), run simulate.
sbatch job.sh [split or simulate] dry
If no warnings are displayed, run:
sbatch job.sh [split or simulate]
Nb 1: If your account name cannot be automatically determined, add it in the .config/snakemake/profiles/slurm/config.yaml file.
Nb 2: To create a visual representation of the workflow, use dag instead of dry. Open the generated .dot file with a viewer that supports the format.
Nb 3: The workflow is in two parts because we want to execute the simulations chromosome by chromosome. Snakemake cannot retrieve the number of chromosomes in one go and needs to index and split first.
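Because simulate must wait for split, the two submissions can be chained with a SLURM job dependency instead of being launched by hand. This is only a sketch under assumptions: it presumes that job.sh exits with a non-zero status when the split fails, which the repository does not state, and the function name is hypothetical. The sbatch options --parsable and --dependency=afterok are standard SLURM options.

```shell
#!/usr/bin/env bash
# Submit split, then submit simulate so that it starts only if split succeeded.
# run_split_then_simulate is a hypothetical helper, not part of the repository.
run_split_then_simulate() {
    local split_id
    # --parsable makes sbatch print "jobid" or "jobid;cluster".
    split_id=$(sbatch --parsable job.sh split) || return 1
    # Keep only the job id if a cluster name was appended.
    split_id=${split_id%%;*}
    # afterok: start simulate only once the split job has ended successfully.
    sbatch --dependency=afterok:"$split_id" job.sh simulate
}

# Usage (on the cluster, from the repository root):
#   run_split_then_simulate
```

Running each part with dry first remains the safer route; the chained form is mainly useful once the configuration is known to work.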
B. Running locally
- Ensure snakemake and Singularity/Apptainer are installed on your machine, then run the workflow:
./local_run [split or simulate] dry
If the workflow cannot download images from the container registry, install Docker, log in with your credentials, and rerun the workflow:
docker login -u "<your_username>" -p "<your_token>" "registry.forgemia.inra.fr"
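Passing the token with -p leaves it in the shell history and in the process list. A possible alternative, sketched below, feeds the token on standard input using the --password-stdin option of docker login. The function name and the REGISTRY_USER and REGISTRY_TOKEN variables are hypothetical.

```shell
#!/usr/bin/env bash
# Log in to the registry without putting the token on the command line.
# registry_login, REGISTRY_USER and REGISTRY_TOKEN are hypothetical names.
registry_login() {
    local user=$1 token=$2
    # The token is piped to docker instead of being passed with -p.
    printf '%s' "$token" |
        docker login -u "$user" --password-stdin "registry.forgemia.inra.fr"
}

# Usage:
#   registry_login "$REGISTRY_USER" "$REGISTRY_TOKEN"
```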
Workflow
More information
Variant generation is inspired by VISOR.
You can extract a VCF from the graph using the vg deconstruct command. This step is not implemented in the pipeline.
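As a rough illustration of that extraction, the sketch below wraps vg deconstruct in a small function. The graph file, the reference path name, and the output name are placeholders to replace with your own; the function itself is hypothetical. It assumes the graph is in a format vg can read and that the chosen reference path exists in the graph.

```shell
#!/usr/bin/env bash
# Write the variants of a graph, relative to one reference path, as a VCF.
# graph_to_vcf is a hypothetical helper; all three arguments are placeholders.
graph_to_vcf() {
    local graph=$1 ref_path=$2 out_vcf=$3
    # -p selects the path used as the reference for the VCF coordinates.
    vg deconstruct -p "$ref_path" "$graph" > "$out_vcf"
}

# Usage:
#   graph_to_vcf <graph_file> <reference_path_name> variants.vcf
```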
Helper script
You can use the script workflow/scripts/split_path.sh to cut the final FASTA into chromosome-level FASTA files.
Example use:
./workflow/scripts/split_path.sh input.fasta results/test_sample1_results/06_graph_paths/test_sample1_paths.fasta ./out
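To give an idea of what splitting a FASTA file involves, here is a minimal sketch that writes one file per record. It is not the repository's split_path.sh, whose arguments and grouping logic may differ; the function name is hypothetical, and it assumes that record names are unique and contain no "/" character.

```shell
#!/usr/bin/env bash
# Write each record of a FASTA file to <out_dir>/<record_name>.fasta.
# split_fasta_by_record is a hypothetical sketch, not split_path.sh itself.
split_fasta_by_record() {
    local in_fasta=$1 out_dir=$2
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        /^>/ {
            if (out) close(out)          # finish the previous record
            name = substr($1, 2)         # header up to the first space, minus ">"
            out = dir "/" name ".fasta"
        }
        out { print > out }              # header and sequence lines of the record
    ' "$in_fasta"
}

# Usage:
#   split_fasta_by_record input.fasta ./out
```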
Dependencies
TODO
Containers : Miniconda 3, Singularity/Apptainer
Python : pandas, msprime, argparse, os, multiprocessing, yaml, Bio.Seq
Workflow : snakemake, snakemake-executor-plugin-slurm, vg 1.60.0, bcftools 1.12, bgzip, tabix 1.7.