PIAT LUCIEN authored
Small modifications

See merge request !12
d6a4785c

Warnings / Issues

/!\: Act with care; this workflow uses significant memory if you increase the values in masterconfig. We recommend keeping the default settings and running a test first.

/!\: For now, do not run multiple split jobs at once.

How to Use

A. Running on a cluster

1. Set up

Clone the Git repository

git clone https://forgemia.inra.fr/pangepop/MSpangepop.git && cd MSpangepop

2. Add your files

  • Add a .fasta.gz file; an example can be found in the repository.
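Before launching the workflow, it can help to confirm that the input is a valid gzip archive and to see how many sequences it contains. The function below is a minimal sketch and not part of the repository; the name count_fasta_records and the example file name are hypothetical.

```bash
#!/usr/bin/env bash
# Sanity-check a gzipped FASTA file before handing it to the workflow.
# Prints the number of sequence records (header lines starting with '>').
count_fasta_records() {
    local fasta_gz="$1"
    # gzip -t tests the integrity of the compressed stream without extracting it
    gzip -t "$fasta_gz" || { echo "not a valid gzip file: $fasta_gz" >&2; return 1; }
    # Decompress to stdout and count the header lines
    gzip -cd "$fasta_gz" | grep -c '^>'
}

# Usage (hypothetical file name):
# count_fasta_records my_genome.fasta.gz
```

The count gives a rough idea of how many chromosomes or contigs the split step will have to handle.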

3. Configure the pipeline

  • Edit the masterconfig file in the .config/ directory with your sample information.
  • Edit the visor_sv_type.yaml file with the mutations you want.
  • Edit job.sh with your email and add the paths to the required modules (Singularity/Apptainer, Miniconda3).
  • Provide the required conda environment in job.sh, under source activate wf_env. You can create it using:
conda create -n wf_env -c conda-forge -c bioconda snakemake=8.4.7 snakemake-executor-plugin-slurm
conda init bash

4. Run the WF

The workflow has two parts: split and simulate. Always run split first; once it is done (it is very quick), run simulate.

sbatch job.sh [split or simulate] dry

If no warnings are displayed, run:

sbatch job.sh [split or simulate] 

Nb 1: If your account name cannot be determined automatically, add it to the .config/snakemake/profiles/slurm/config.yaml file.

Nb 2: To create a visual representation of the workflow, use dag instead of dry. Open the generated .dot file with a viewer that supports the format.
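If no .dot viewer is at hand, Graphviz can convert the file to an image. The sketch below assumes Graphviz (the dot command) is installed; the function name render_dag is hypothetical, and the .dot file is passed as an argument because its name is not given here.

```bash
#!/usr/bin/env bash
# Render a Graphviz .dot file to SVG next to the input file.
# Prints the path of the generated SVG on success.
render_dag() {
    local dot_file="$1"
    # Replace the .dot extension with .svg for the output name
    local svg_file="${dot_file%.dot}.svg"
    dot -Tsvg "$dot_file" -o "$svg_file" && echo "$svg_file"
}

# Usage (hypothetical file name):
# render_dag workflow.dot
```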

Nb 3: The workflow is in two parts because we want to execute the simulations chromosome by chromosome. Snakemake cannot retrieve the number of chromosomes in one go and needs to index and split first.
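Since simulate must only start after split has finished, the two submissions can be chained with a Slurm job dependency. This is a sketch under assumptions, not something the repository provides: it assumes job.sh exits with a non-zero code when split fails and that the job.sh job stays alive until the split part is complete. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Submit the split part, then queue simulate so that it starts
# only if the split job ends successfully.
submit_split_then_simulate() {
    local split_id
    # --parsable makes sbatch print only the job ID (possibly "id;cluster")
    split_id="$(sbatch --parsable job.sh split)" || return 1
    # Drop the optional ";cluster" suffix
    split_id="${split_id%%;*}"
    # afterok: release simulate only once the split job exits with code 0
    sbatch --dependency="afterok:${split_id}" job.sh simulate
}

# Usage, from the repository root:
# submit_split_then_simulate
```

Running the dry mode for both parts beforehand remains advisable, since the dependency only checks the exit status of the split job.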

B. Running locally

  • Ensure snakemake and Singularity/Apptainer are installed on your machine, then run the workflow:
./local_run [split or simulate] dry

If the workflow cannot download images from the container registry, install Docker, log in with your credentials, and rerun the workflow:

docker login -u "<your_username>" -p "<your_token>" "registry.forgemia.inra.fr" 
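Passing the token with -p leaves it in the shell history and the process list. As an alternative sketch, the token can be piped to docker login through --password-stdin. The environment variable REGISTRY_TOKEN and the function name registry_login are hypothetical names chosen for this example.

```bash
#!/usr/bin/env bash
# Log in to the container registry without putting the token on the command line.
# Expects the token in the (hypothetical) REGISTRY_TOKEN environment variable.
registry_login() {
    local user="$1"
    if [ -z "${REGISTRY_TOKEN:-}" ]; then
        echo "REGISTRY_TOKEN is not set" >&2
        return 1
    fi
    # --password-stdin makes docker read the secret from standard input
    printf '%s' "$REGISTRY_TOKEN" |
        docker login -u "$user" --password-stdin "registry.forgemia.inra.fr"
}

# Usage:
# export REGISTRY_TOKEN="<your_token>"
# registry_login "<your_username>"
```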

Workflow

DAG of the workflow

More information

Variant generation is inspired by VISOR.

You can extract a VCF from the graph using the vg deconstruct command. It is not implemented in the pipeline.
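A possible invocation is sketched below. It assumes vg is installed and that the reference paths in the graph share a common name prefix. The graph file name, the prefix, and the function name graph_to_vcf are all hypothetical, since the names the pipeline produces are not stated here.

```bash
#!/usr/bin/env bash
# Extract a VCF from a graph with vg deconstruct.
# Arguments: graph file, reference path name prefix, output VCF.
graph_to_vcf() {
    local graph="$1" ref_prefix="$2" out_vcf="$3"
    # -P: treat every path whose name starts with this prefix as a reference
    vg deconstruct -P "$ref_prefix" "$graph" > "$out_vcf"
}

# Usage (hypothetical names):
# graph_to_vcf graph.gfa chr variants.vcf
```

The path names in a graph can be listed with vg paths -L, which helps when choosing the prefix.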

Helper script

You can use the script workflow/scripts/split_path.sh to cut the final FASTA into chromosome-level FASTA files.

Example use :

./workflow/scripts/split_path.sh input.fasta results/test_sample1_results/06_graph_paths/test_sample1_paths.fasta ./out

Dependencies

TODO

Containers : Miniconda 3, Singularity/Apptainer

Python : pandas, msprime, argparse, os, multiprocessing, yaml, Bio.Seq

Workflow : snakemake, snakemake-executor-plugin-slurm, vg 1.60.0, bcftools 1.12, bgzip, tabix 1.7.