PIGEON

pigeon is a stand-only program that takes in a list of peptide pool csv and outputs a cleaned csv of pooled data. It also outputs a rangeslist table for merge in FEATHER.

CMD Options

Type pigeon --help to see the following options:

--s, --seq, --sequence: sequence(s) of target proteins. Required
--f, --files: path(s) to input mgf(s). Required
--path: path for output directory.
--o, --output: path(s) for peptide pool output, if any.
--r, --rangeslist: path(s) for output rangeslist(s), if any.
--n, --nums: path for cut statistics ouput, if any.
--p, --plots: directory for output plots.

--maxz: maximum charge for theoretical peptides and fragments. Default: 3
--minl: minimum length for theoretical peptides. Default: 3
--maxl: maximum length for theoretical peptides. Default: 15
--ions: which fragment ions to consider in match. Default: ‘b’, ‘b-H2O’, ‘b-NH3’, ‘y’, ‘y-H2O’, ‘y-NH3’
--ft, --threshold: m/z threshold for fragment matches. Default: 0.02

--ppmc_w: ppm error cutoff for initial match. Default: 30
--ppmc_n: ppm error cutoff after fit. Default: 7
--scorec: score threshold for provisional cut for trendline fit. Default: 0.05
--fit: how to fit systematic error. Options: ‘quad’, ‘inv’. Default: ‘quad’
--maxfev: maximum iterations for systematic error fit.

--c, --cm, --method: flag specifying how to treat co-eluting peptides. Options: ‘keep’, ‘drop’. Default: ‘drop’
--rtc: RT cutoff for duplicate peptides. Default: 0.5
--mzc: m/z cutoff for duplicate peptides. Default: 0.1
--pvc: p-value cutoff for duplicate peptides. Default: 0.05
--scorefloor, --sf: minimum score for score cut, if any. Default: 0

Default inputs

--f (list of MS2 files in .mgf format)

--s (list of protein sequences)

Default outputs

--o (peptide pool csv(s) batched and cleaned)
--r (rangeslist table(s) for merge in post-PIGEON)
--n (peptide pool features at each step)

In the plot directory, the following files are generated:

score histograms:

all-hist.pdf: Histogram of score for all pooled data.

truth-hist.pdf: Histogram of score for high scoring data used for curve fit.

ppm_cut-hist.pdf: Histogram of score after cut at ppmC from trendline.

single-hist.pdf: Histogram of score after dropping all but best match for each peptide.

ambig-hist.pdf: Histogram of score for discarded duplicates.

clean-hist.pdf: Histogram of score for final cleaned data.

scatterplots:

all-scatter.pdf: PPM error vs. m/z (colorbar score) for all pooled data.

truth-scatter.pdf: PPM error vs. m/z (colorbar score) for high scoring data used for curve fit.

ppm_cut-scatter.pdf: PPM error vs. m/z (colorbar score) after cut at ppmC from trendline.

single-scatter.pdf: PPM error vs. m/z (colorbar score) after dropping all but best match for each peptide.

ambig-scatter.pdf: PPM error vs. m/z (colorbar score) for discarded duplicates.

clean-scatter.pdf: PPM error vs. m/z (colorbar score) for final cleaned data.

Example Usage

Below are examples of how to use the pigeon program with various arguments:

pigeon --s [protein sequence] [pepsin sequence] --c keep --p keep-plots --o keep-pool.csv keep-pepsin-pool.csv --r keep-ranges.csv keep-pepsin-ranges.csv --n keep-nums.out --f apo.mgf

Alternative usage example:

pigeon --f apo1.mgf apo2.mgf apo3.mgf --s [protein 1 sequence] [protein 2 sequence] --p drop-plots --ov drop --o drop-pooled-1.csv drop-pooled-2 --mzc 0.05 --rtc 0.25 --pvc 1 --scorec 0.01 --ppmc_n 5 --maxfev 2000