This page shows notes for each module. Note that the parameters can be set up in the configure_user.txt file.
Add the barcode information to the name of each read in the fastq files.
Input: one of
Output: Demultiplexed PE1 and PE2 fastq files with index information embedded in the read name as: @index3_index2_index1:original_read_name, saved in output/demplxed_fastq/
Note: there could be multiple index files
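The read-name format above can be illustrated with a minimal Python sketch. It is not the module's actual code: the function name `embed_index` is hypothetical, and it assumes the index sequences are supplied in the order index1, index2, index3 and joined in reverse to match @index3_index2_index1:original_read_name.

```python
def embed_index(read_name, indexes):
    """Return a read name with the index sequences embedded in front.

    `indexes` is assumed to be ordered index1, index2, index3, ...;
    they are joined in reverse order to match the documented format
    @index3_index2_index1:original_read_name.
    """
    # Drop the leading "@" of a fastq header line, if present.
    name = read_name[1:] if read_name.startswith("@") else read_name
    barcode = "_".join(reversed(indexes))
    return "@" + barcode + ":" + name
```

With a single index file the barcode is just that one index sequence followed by a colon and the original read name.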
Perform sequence adapter trimming.
Input: demultiplexed PE1 and PE2 fastq files, separated by a comma
Output: trimmed and demultiplexed PE1 and PE2 fastq files, saved in output/trimmed_fastq/
Parameters:
Note: you can specify TRIM_METHOD=none to skip the trimming step, which is usually OK.
Sequence alignment.
Input: the demultiplexed (and trimmed) paired-end fastq files, separated by comma: PE1.fastq,PE2.fastq
Output: position-sorted bam file and position-sorted MAPQ30 bam file, saved in output/mapping_result/; plain text files of mapping QC metrics and the fragments.txt file, saved in output/summary/
Parameters:
Note: you only need to set one of BWA_OPTS, BOWTIE_OPTS and BOWTIE2_OPTS, and only one of BWA_INDEX, BOWTIE_INDEX and BOWTIE2_INDEX, corresponding to the specified mapping method (MAPPING_METHOD).
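The rule in the note can be sketched as a small lookup. This is an illustration only: it assumes the configuration has already been read into a dictionary, that MAPPING_METHOD takes a value such as bwa, bowtie or bowtie2, and that the option and index keys are named by upper-casing the method and appending _OPTS or _INDEX. The function name `mapper_settings` is hypothetical.

```python
def mapper_settings(config):
    """Return the (options, index) pair that matches MAPPING_METHOD.

    Assumes keys are named <METHOD>_OPTS and <METHOD>_INDEX, e.g.
    BWA_OPTS / BWA_INDEX when MAPPING_METHOD is "bwa".
    """
    method = config["MAPPING_METHOD"].upper()
    opts_key, index_key = method + "_OPTS", method + "_INDEX"
    # Only the keys for the chosen mapper are required; others are ignored.
    missing = [k for k in (opts_key, index_key) if k not in config]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return config[opts_key], config[index_key]
```

Settings for the mappers that are not selected can be left unset, which is the point of the note.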
Peak calling using aggregated bam file.
Input: the MAPQ30 bam file outputted from the mapping step.
Output: peak files, saved as output/peaks/PEAK_CALLER/OUTPUT_PREFIX_features_Blacklist_Removed.bed
Parameters:
Aggregate and normalize signal into .bw or .bedgraph file (can be uploaded to UCSC genome browser).
Input: the MAPQ30 bam file outputted from the mapping step.
Output: Aggregated data in .bw and .bedgraph file, saved in output/signal/
Generate quality control metrics for each barcode.
Input: fragments.tsv.gz file (outputted from module mapping) and peak file (outputted from module call_peak), separated by comma
Output: qc_per_barcode.txt file, saved in output/summary/
Note: these QC metrics for each cell are loaded into the seurat object as metadata when the clustering module is executed.
Build raw peak-by-cell matrix
input: fragments.tsv.gz file, outputted from the mapping module, and features/peak file, outputted from the call_peak module, separated by a comma
output: sparse peak-by-cell count matrix in Matrix Market format, barcodes and feature files in plain text format, saved in output/raw_matrix/PEAK_CALLER/
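As a rough illustration of what a peak-by-cell count matrix contains, the sketch below counts fragments that overlap each peak per barcode and renders the result as Matrix Market coordinate text. It is a simplification under stated assumptions: the module's real counting rule is not described here, intervals are treated as half-open, the loop is deliberately naive, and the function names are hypothetical.

```python
from collections import Counter


def count_fragments(fragments, peaks):
    """Count fragments overlapping each peak, per barcode.

    fragments: iterable of (chrom, start, end, barcode)
    peaks: list of (chrom, start, end)
    Returns a Counter keyed by (peak_index, barcode).
    """
    counts = Counter()
    for chrom, start, end, barcode in fragments:
        for i, (p_chrom, p_start, p_end) in enumerate(peaks):
            # Half-open intervals overlap when each starts before the other ends.
            if chrom == p_chrom and start < p_end and end > p_start:
                counts[(i, barcode)] += 1
    return counts


def to_matrix_market(counts, n_peaks, barcodes):
    """Render counts as Matrix Market coordinate text (1-based indices)."""
    col = {bc: j + 1 for j, bc in enumerate(barcodes)}
    # Keep only barcodes listed in `barcodes`; sort by column, then row.
    entries = sorted(
        (col[bc], i + 1, n) for (i, bc), n in counts.items() if bc in col
    )
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{n_peaks} {len(barcodes)} {len(entries)}",
    ]
    lines += [f"{row} {column} {n}" for column, row, n in entries]
    return "\n".join(lines)
```

Rows correspond to peaks and columns to barcodes, so the order of the peak list and the barcode list determines what the accompanying feature and barcode text files must contain.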
Perform cell calling
input: raw peak-by-barcode matrix file, outputted from the get_mtx module
output: filtered peak-by-cell matrix in Matrix Market format and .rds format, barcodes and features, saved in output/filtered_matrix/PEAK_CALLER/CELL_CALLER/
Parameters:
Extract bam file for cell barcodes and calculate mapping stats correspondingly
input: A bam file for aggregated data outputted from mapping module and a barcodes.txt file outputted from module call_cell, separated by comma
output: A bam file saved in output/mapping_results and mapping stats (optional) saved in output/summary for cell barcodes
Some of the processing modules can be run together by a single command:
processing data - including demplx_fastq, mapping, call_peak, get_mtx, aggr_signal, qc_per_barcode, call_cell and get_bam4Cells
input: either the fastq files for both reads and the index files, separated by commas (like: fastq1,fastq2,index_fastq1,index_fastq2,index_fastq3…), or the path to a folder of 10x fastq files (PATH_TO_10xfastqs_folder)
output: peak-by-cell matrix and all intermediate results
Conduct all processing modules except demultiplexing step
input: demultiplexed fastq files for both reads, separated by a comma, like: fastq1,fastq2
output: peak-by-cell matrix and all intermediate results
Conduct all processing modules after mapping step
input: bam file for aggregated data, outputted from the mapping module
output: filtered peak-by-cell matrix and all intermediate results
Remove potential doublets
input: a peak-by-cell matrix file or a seurat object file in .rds format, and the expected fraction of doublets, separated by a comma
output: doublet-removed matrix.rds and barcodes.txt files, and seurat objects with and without doublets, saved in the input directory (plus a umap plot colored by singlet/doublet)
cell clustering
input: filtered peak-by-cell matrix file, outputted from the call_cell module (or a seurat.rds file)
output: seurat object with cluster labels in the metadata (.rds file), barcodes with cluster labels (cell_cluster_table.tsv file), and a umap plot colored by cluster label
Parameters to specify (in configure_user.txt file):
Motif analysis based on chromVAR
input: filtered peak-by-cell matrix file, outputted from the call_cell module
output: a chromVAR object with TF-by-cell deviation score/zscore, a table and heatmap indicating TF enrichment for each cell cluster, saved in output/downstream_analysis/PEAK_CALLER/CELL_CALLER/
Perform differential accessibility analysis for peaks
input: path to the seurat object and the two groups of clusters to compare, separated by commas, such as: seurat_obj.rds,0:1,2 (compares cells in cluster 0 or cluster 1 with cells in cluster 2), seurat_obj.rds,0,rest (compares cells in cluster 0 with the rest of the cells) or seurat_obj.rds,one,rest (compares cells in each cluster, one at a time, with the rest of the cells)
output: differentially accessible peaks in a tsv file, saved in the same folder as the input seurat object
Parameters:
Perform GO term enrichment analysis for genes close to cluster-specific peaks
input: differentially accessible features file, outputted from the runDA module (.tsv file)
output: enriched GO terms in .xlsx format saved in the same directory as the input file
Run cicero for calculating gene activity score and predicting cis chromatin interactions
input: seurat_obj.rds file outputted from the clustering module
output: cicero gene activity in .rds format and predicted interactions in .txt format, saved in output/downstream_analysis/PEAK_CALLER/CELL_CALLER/
Split bam file to generate bam file for each cluster
input: barcodes with cluster label (cell_cluster_table.tsv file, outputted from clustering module)
note: users can instead supply any two-column text file that lists barcodes and the corresponding cluster/subpopulation labels.
output: .bam file (saved in output/downstream/PEAK_CALLER/CELL_CALLER/data_by_cluster), .bw, .bedgraph (saved in output/signal/) file for each cluster/subpopulation
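The two-column input can be pictured with a short sketch that groups barcodes by their label, which is the lookup needed before reads can be routed to a per-cluster bam file. It assumes whitespace- or tab-delimited lines with the barcode first and the label second. Whether the real cell_cluster_table.tsv has a header row is not stated here, so that is left as an explicit switch. The function name is hypothetical.

```python
from collections import defaultdict


def barcodes_by_cluster(lines, has_header=False):
    """Group barcodes by cluster label from two-column text lines."""
    groups = defaultdict(list)
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the header row
    for line in rows:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or incomplete lines
        barcode, cluster = fields[0], fields[1]
        groups[cluster].append(barcode)
    return dict(groups)
```

Cluster labels are kept as strings, so numeric and named subpopulation labels are handled the same way.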
Perform TF footprinting analysis, supports comparison between two sets of cell clusters and one cluster vs the rest of cell clusters (one-vs-rest)
input: two groups of cells (separated by a comma), where each group is a combination of cluster labels joined by colons; the default is 0:1,2, which compares clusters 0 and 1 with cluster 2
Note: you can also specify ‘one,rest’ to run every one-cluster-vs-rest comparison.
output: footprinting summary statistics in tables and heatmap, saved in output/downstream/PEAK_CALLER/CELL_CALLER/
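The group syntax (0:1,2 and one,rest here, plus the 0,rest form described for the differential accessibility module) can be made concrete with a small parser. This is a sketch of how such a string might be interpreted, not the pipeline's own code; the function names and the list of known clusters passed in are assumptions.

```python
def parse_comparison(spec):
    """Split a spec like '0:1,2' into two lists of cluster labels."""
    parts = spec.split(",")
    if len(parts) != 2:
        raise ValueError("expected two groups separated by a comma: " + spec)
    group1, group2 = (part.strip().split(":") for part in parts)
    return group1, group2


def expand_comparisons(spec, clusters):
    """Turn a spec into explicit (group1, group2) comparisons."""
    group1, group2 = parse_comparison(spec)
    if group1 == ["one"] and group2 == ["rest"]:
        # Every cluster in turn against all the others.
        return [([c], [o for o in clusters if o != c]) for c in clusters]
    if group2 == ["rest"]:
        # The named clusters against everything not named.
        return [(group1, [o for o in clusters if o not in group1])]
    return [(group1, group2)]
```

For example, with clusters 0, 1 and 2, the spec one,rest expands to three comparisons, one per cluster.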
Perform all downstream analyses, including clustering, motif_analysis, split_bam (optional) and footprinting analysis (optional). The corresponding parameters should be set in the configure_user.txt file.
input: filtered peak-by-cell matrix file, outputted from call_cell module
output: all outputs from each module
Merge peaks (called from different data sets) if the distance is less than a given size in basepairs (200 if not specified)
input: peak files and a distance parameter separated by comma: peakFile1,peakFile2,peakFile3,200
output: merged peaks saved in file output/peaks/merged.bed
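The merging rule and the input string can be sketched as follows. This is a minimal illustration, assuming peaks are (chrom, start, end) tuples, that "distance" means the gap between the end of one peak and the start of the next on the same chromosome, and that the last comma-separated field is the distance when it is numeric. Function names are hypothetical.

```python
def parse_merge_input(arg, default_gap=200):
    """Split 'peakFile1,peakFile2,200' into (files, distance)."""
    fields = arg.split(",")
    if fields[-1].isdigit():
        return fields[:-1], int(fields[-1])
    return fields, default_gap  # distance not given: fall back to 200


def merge_peaks(peaks, max_gap=200):
    """Merge peaks on the same chromosome whose gap is below max_gap bp."""
    merged = []
    for chrom, start, end in sorted(peaks):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] < max_gap:
            # Close enough (or overlapping): extend the previous peak.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]
```

Peaks pooled from all input files would be passed to `merge_peaks` together, so peaks from different data sets that sit close to each other collapse into one interval.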
Reconstruct the peak-by-cell matrix given a peak file, a fragments.tsv.gz file, a barcodes.txt file and an optional path for the reconstructed matrix
input: different files separated by comma: peakFilePath,fragmentFilePath,barcodesPath,reconstructMatrixPath
output: reconstructed peak-by-cell matrix saved in reconstructMatrixPath; if reconstructMatrixPath is not specified, a sub-folder reConstruct_matrix is created in the same directory as the input barcodes.txt file
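The default output location described above can be written as a small path rule. This is only a sketch of the documented behaviour; the function name is hypothetical.

```python
from pathlib import Path


def reconstruct_output_dir(barcodes_path, reconstruct_matrix_path=None):
    """Return where the reconstructed matrix would be written."""
    if reconstruct_matrix_path:
        return Path(reconstruct_matrix_path)
    # Default: a reConstruct_matrix sub-folder next to barcodes.txt.
    return Path(barcodes_path).parent / "reConstruct_matrix"
```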
Perform integration of two or more data matrices that have the same rownames (set of peaks)
input: paths of the matrices, separated by commas, like: mtx1_path,mtx2_path
output: integrated seurat obj and umap plot, saved in output/integrated/
Perform integration of two or more data sets, given the corresponding peaks for each data set.
input: peak/feature files and an optional distance parameter, separated by commas: peak_file1,peak_file2,200
output: merged peaks, reconstructed matrix, integrated seurat obj and umap plot, saved in output/integrated/
Note: this module searches for the corresponding fragments.tsv.gz and barcodes.txt files for each data set, merges all peaks within 200bp of each other, reconstructs the matrix with the merged peaks, and performs matrix integration. In other words, it is a combination of the modules mergePeaks, reconstMtx and integrate_mtx.
Label transfer (cell annotation) from scRNA-seq data
input: paths for a seurat object for scATAC-seq, a seurat object for scRNA-seq data in .rds format, and an optional .gtf file for gene annotation, separated by a comma.
output: an updated seurat object for scATAC-seq with Predicted_Cell_Type as a metadata variable and a umap plot colored by Predicted_Cell_Type, saved in the same directory as the input scATAC-seq seurat object.
Note: the cell annotation should be given as a metadata variable (named Cell_Type) in the seurat object of scRNA-seq. Both seurat objects should already have pca and umap dimension reduction done.
Generate summary report in html file
input: directory to QC files, output/summary as default
output: summary report in html format, saved in output/summary and .eps figures for each panel saved in output/summary/Figures and tables in output/summary/Tables
Interactively visualize the data through VisCello
input: VisCello_obj directory, outputted from the clustering module
output: launches VisCello in a web browser for interactive visualization
Convert bam file in 10x genomics format to bam file in scATAC-pro format
input: bam file (position sorted) in 10x format
output: position sorted bam file in scATAC-pro format saved in output/mapping_result, mapping qc stat and fragment.txt files saved in output/summary/
Add cell barcode tag to bam file
input: a bam file generated by scATAC-pro
output: the bam file with column ‘CB:Z:cellbarcode’ added (saved in the same directory as the input bam file)
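What the added column looks like can be shown on a single SAM text record. The sketch assumes the cell barcode is the part of the read name before the first colon, following the barcode:original_read_name naming produced by demultiplexing. It works on plain SAM lines rather than a binary bam file, and the function name is hypothetical.

```python
def add_cb_tag(sam_line):
    """Append a CB:Z:<barcode> field to one SAM alignment line."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are passed through unchanged
    fields = line.split("\t")
    # Read name is assumed to be "<barcode>:<original_read_name>".
    barcode = fields[0].split(":", 1)[0]
    fields.append("CB:Z:" + barcode)
    return "\t".join(fields)
```

Tools that group reads by cell typically look for this CB tag, which is why it is carried as an optional field rather than left only in the read name.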