PIGEON
pigeon is a stand-only program that takes in a list of peptide pool csv and outputs a cleaned csv of pooled data. It also outputs a rangeslist table for merge in FEATHER.
CMD Options
Type pigeon --help
to see the following options:
--s
, --seq
, --sequence
: sequence(s) of target proteins. Required--f
, --files
: path(s) to input mgf(s). Required--path
: path for output directory.--o
, --output
: path(s) for peptide pool output, if any.--r
, --rangeslist
: path(s) for output rangeslist(s), if any.--n
, --nums
: path for cut statistics ouput, if any.--p
, --plots
: directory for output plots.--maxz
: maximum charge for theoretical peptides and fragments. Default: 3--minl
: minimum length for theoretical peptides. Default: 3--maxl
: maximum length for theoretical peptides. Default: 15--ions
: which fragment ions to consider in match. Default: ‘b’, ‘b-H2O’, ‘b-NH3’, ‘y’, ‘y-H2O’, ‘y-NH3’--ft
, --threshold
: m/z threshold for fragment matches. Default: 0.02--ppmc_w
: ppm error cutoff for initial match. Default: 30--ppmc_n
: ppm error cutoff after fit. Default: 7--scorec
: score threshold for provisional cut for trendline fit. Default: 0.05--fit
: how to fit systematic error. Options: ‘quad’, ‘inv’. Default: ‘quad’--maxfev
: maximum iterations for systematic error fit.--c
, --cm
, --method
: flag specifying how to treat co-eluting peptides. Options: ‘keep’, ‘drop’. Default: ‘drop’--rtc
: RT cutoff for duplicate peptides. Default: 0.5--mzc
: m/z cutoff for duplicate peptides. Default: 0.1--pvc
: p-value cutoff for duplicate peptides. Default: 0.05--scorefloor
, --sf
: minimum score for score cut, if any. Default: 0Default inputs
--f
(list of MS2 files in .mgf format)--s
(list of protein sequences)Default outputs
--o
(peptide pool csv(s) batched and cleaned)--r
(rangeslist table(s) for merge in post-PIGEON)--n
(peptide pool features at each step)In the plot directory, the following files are generated:
score histograms:
all-hist.pdf: Histogram of score for all pooled data.truth-hist.pdf: Histogram of score for high scoring data used for curve fit.ppm_cut-hist.pdf: Histogram of score after cut at ppmC from trendline.single-hist.pdf: Histogram of score after dropping all but best match for each peptide.ambig-hist.pdf: Histogram of score for discarded duplicates.clean-hist.pdf: Histogram of score for final cleaned data.
scatterplots:
all-scatter.pdf: PPM error vs. m/z (colorbar score) for all pooled data.truth-scatter.pdf: PPM error vs. m/z (colorbar score) for high scoring data used for curve fit.ppm_cut-scatter.pdf: PPM error vs. m/z (colorbar score) after cut at ppmC from trendline.single-scatter.pdf: PPM error vs. m/z (colorbar score) after dropping all but best match for each peptide.ambig-scatter.pdf: PPM error vs. m/z (colorbar score) for discarded duplicates.clean-scatter.pdf: PPM error vs. m/z (colorbar score) for final cleaned data.
Example Usage
Below are examples of how to use the pigeon
program with various arguments:
pigeon --s [protein sequence] [pepsin sequence] --c keep --p keep-plots --o keep-pool.csv keep-pepsin-pool.csv --r keep-ranges.csv keep-pepsin-ranges.csv --n keep-nums.out --f apo.mgf
Alternative usage example:
pigeon --f apo1.mgf apo2.mgf apo3.mgf --s [protein 1 sequence] [protein 2 sequence] --p drop-plots --ov drop --o drop-pooled-1.csv drop-pooled-2 --mzc 0.05 --rtc 0.25 --pvc 1 --scorec 0.01 --ppmc_n 5 --maxfev 2000