PIGEON
======

**pigeon** is a standalone program that takes in MS2 spectra (.mgf files) and one or more target protein sequences and outputs a cleaned CSV of pooled peptide data. It also outputs a rangeslist table for merging in FEATHER.

CMD Options
-----------

Type ``pigeon --help`` to see the following options:

| ``--s``, ``--seq``, ``--sequence``: sequence(s) of target proteins. Required
| ``--f``, ``--files``: path(s) to input MS2 file(s) in .mgf format. Required
| ``--path``: path for output directory.
| ``--o``, ``--output``: path(s) for peptide pool output, if any.
| ``--r``, ``--rangeslist``: path(s) for output rangeslist(s), if any.
| ``--n``, ``--nums``: path for cut statistics output, if any.
| ``--p``, ``--plots``: directory for output plots.
| ``--maxz``: maximum charge for theoretical peptides and fragments. Default: 3
| ``--minl``: minimum length for theoretical peptides. Default: 3
| ``--maxl``: maximum length for theoretical peptides. Default: 15
| ``--ions``: which fragment ions to consider in matching. Default: 'b', 'b-H2O', 'b-NH3', 'y', 'y-H2O', 'y-NH3'
| ``--ft``, ``--threshold``: m/z threshold for fragment matches. Default: 0.02
| ``--ppmc_w``: ppm error cutoff for the initial match. Default: 30
| ``--ppmc_n``: ppm error cutoff after the systematic error fit. Default: 7
| ``--scorec``: score threshold for the provisional cut used in the trendline fit. Default: 0.05
| ``--fit``: model for the systematic error fit. Options: 'quad', 'inv'. Default: 'quad'
| ``--maxfev``: maximum iterations for the systematic error fit.
| ``--c``, ``--cm``, ``--method``: how to treat co-eluting peptides. Options: 'keep', 'drop'. Default: 'drop'
| ``--rtc``: retention time (RT) cutoff for duplicate peptides. Default: 0.5
| ``--mzc``: m/z cutoff for duplicate peptides. Default: 0.1
| ``--pvc``: p-value cutoff for duplicate peptides. Default: 0.05
| ``--scorefloor``, ``--sf``: minimum score for the score cut, if any. Default: 0

Required inputs:

| ``--f``: list of MS2 files in .mgf format
| ``--s``: list of protein sequences

Outputs:

| ``--o``: peptide pool CSV(s), batched and cleaned
| ``--r``: rangeslist table(s) for the post-PIGEON merge in FEATHER
| ``--n``: peptide pool statistics at each cleaning step

In the plot directory (``--p``), the following files are generated.

Score histograms:

| ``all-hist.pdf``: histogram of scores for all pooled data.
| ``truth-hist.pdf``: histogram of scores for the high-scoring data used for the curve fit.
| ``ppm_cut-hist.pdf``: histogram of scores after the cut at ``--ppmc_n`` ppm from the trendline.
| ``single-hist.pdf``: histogram of scores after dropping all but the best match for each peptide.
| ``ambig-hist.pdf``: histogram of scores for discarded duplicates.
| ``clean-hist.pdf``: histogram of scores for the final cleaned data.

Scatterplots (ppm error vs. m/z, colored by score):

| ``all-scatter.pdf``: all pooled data.
| ``truth-scatter.pdf``: high-scoring data used for the curve fit.
| ``ppm_cut-scatter.pdf``: data after the cut at ``--ppmc_n`` ppm from the trendline.
| ``single-scatter.pdf``: data after dropping all but the best match for each peptide.
| ``ambig-scatter.pdf``: discarded duplicates.
| ``clean-scatter.pdf``: final cleaned data.

Example Usage
-------------

Below are examples of how to use the ``pigeon`` program with various arguments. The first keeps co-eluting peptides (``--c keep``) and writes one pool CSV and one rangeslist for each of the two sequences:

.. code-block:: bash

   pigeon --s [protein sequence] [pepsin sequence] --c keep --p keep-plots \
       --o keep-pool.csv keep-pepsin-pool.csv \
       --r keep-ranges.csv keep-pepsin-ranges.csv \
       --n keep-nums.out --f apo.mgf

Alternative usage example, pooling three .mgf files for two proteins with non-default cutoffs:

.. code-block:: bash

   pigeon --f apo1.mgf apo2.mgf apo3.mgf --s [protein 1 sequence] [protein 2 sequence] \
       --p drop-plots --ov drop --o drop-pooled-1.csv drop-pooled-2 \
       --mzc 0.05 --rtc 0.25 --pvc 1 --scorec 0.01 --ppmc_n 5 --maxfev 2000
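The ``--minl``, ``--maxl`` and ``--maxz`` options bound the set of theoretical peptides. As an illustration only, the Python sketch below enumerates every contiguous subsequence of a target sequence within those length bounds and computes its monoisotopic precursor m/z for charges 1 to ``maxz``. It is not pigeon's implementation: the function names ``precursor_mz`` and ``theoretical_peptides`` are hypothetical, only the 20 standard residues are handled, and modifications are ignored.

.. code-block:: python

   # Monoisotopic residue masses (Da) for the 20 standard amino acids.
   RESIDUE_MASS = {
       "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
       "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
       "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
       "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
       "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
   }
   WATER = 18.010565   # H2O added for the free N- and C-termini
   PROTON = 1.007276   # mass of one proton


   def precursor_mz(peptide, z):
       """m/z of a peptide carrying z protons (hypothetical helper)."""
       neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
       return (neutral + z * PROTON) / z


   def theoretical_peptides(sequence, minl=3, maxl=15, maxz=3):
       """Yield (start, end, peptide, z, m/z) for every subsequence of length
       minl..maxl and every charge 1..maxz; start/end are 1-based, inclusive."""
       n = len(sequence)
       for start in range(n):
           for length in range(minl, maxl + 1):
               end = start + length
               if end > n:
                   break  # longer peptides would run past the C-terminus
               pep = sequence[start:end]
               for z in range(1, maxz + 1):
                   yield start + 1, end, pep, z, precursor_mz(pep, z)


   peptides = list(theoretical_peptides("MKTAYIAKQR", minl=3, maxl=5, maxz=2))
   print(len(peptides))  # 42

With a 10-residue sequence and lengths 3 to 5 there are 8 + 7 + 6 = 21 subsequences, each at two charges, hence 42 entries. Raising ``--maxl`` or ``--maxz`` grows this list quickly, which is why the defaults are modest.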
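The ``--ions`` and ``--ft``/``--threshold`` options control fragment matching. A minimal sketch of the idea, under stated assumptions: compute b- and y-ion m/z values (with optional H2O and NH3 neutral losses) for a candidate peptide, then count how many fall within ``ft`` of an observed peak. The helper names are hypothetical, the fragment charge defaults to 1 for brevity (pigeon's ``--maxz`` also applies to fragments), and pigeon's actual scoring function is not reproduced here.

.. code-block:: python

   from bisect import bisect_left

   RESIDUE_MASS = {
       "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
       "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
       "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
       "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
       "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
   }
   WATER, AMMONIA, PROTON = 18.010565, 17.026549, 1.007276
   # Neutral-loss offset for each ion label suffix accepted by --ions.
   LOSS = {"": 0.0, "-H2O": WATER, "-NH3": AMMONIA}


   def fragment_mzs(peptide, ions=("b", "b-H2O", "b-NH3", "y", "y-H2O", "y-NH3"), z=1):
       """Return {ion label: [m/z of fragments 1..len-1]} at charge z."""
       masses = [RESIDUE_MASS[aa] for aa in peptide]
       out = {}
       for ion in ions:
           series, loss = ion[0], LOSS[ion[1:]]
           frags = []
           for i in range(1, len(peptide)):
               if series == "b":
                   neutral = sum(masses[:i])           # N-terminal residues
               else:
                   neutral = sum(masses[-i:]) + WATER  # C-terminal residues plus H2O
               frags.append((neutral - loss + z * PROTON) / z)
           out[ion] = frags
       return out


   def count_matches(theoretical, observed, ft=0.02):
       """Count theoretical m/z values that have an observed peak within +/- ft."""
       peaks = sorted(observed)
       hits = 0
       for mz in theoretical:
           i = bisect_left(peaks, mz - ft)  # first peak >= mz - ft
           if i < len(peaks) and peaks[i] <= mz + ft:
               hits += 1
       return hits


   frags = fragment_mzs("GAS", ions=("b", "y"))
   print(count_matches(frags["b"] + frags["y"], [58.03, 106.06, 500.0]))  # 2

Sorting the peaks once and using binary search keeps each lookup logarithmic, which matters when every candidate peptide is checked against every spectrum.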
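The options ``--ppmc_w``, ``--scorec``, ``--fit`` and ``--ppmc_n`` describe a two-stage mass-error cut: a wide ppm window and a score threshold select a provisional high-scoring set (the "truth" plots), a trendline of ppm error against m/z is fit to that set, and matches farther than ``--ppmc_n`` ppm from the trendline are discarded (the "ppm_cut" plots). The sketch below is one reading of that description, with assumptions stated plainly: ``'quad'`` is taken to mean ppm = a + b*mz + c*mz**2 and ``'inv'`` to mean ppm = a + b/mz; the fit is closed-form least squares in pure Python, whereas the ``--maxfev`` option suggests pigeon uses an iterative optimizer; and the dictionary keys ``'obs_mz'``, ``'theo_mz'`` and ``'score'`` are hypothetical.

.. code-block:: python

   def ppm_error(observed_mz, theoretical_mz):
       """Signed mass error in parts per million."""
       return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


   def least_squares(rows, ys):
       """Solve the normal equations (X^T X) beta = X^T y with partial pivoting."""
       k = len(rows[0])
       a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
       b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
       for col in range(k):
           piv = max(range(col, k), key=lambda row: abs(a[row][col]))
           a[col], a[piv] = a[piv], a[col]
           b[col], b[piv] = b[piv], b[col]
           for r in range(col + 1, k):
               f = a[r][col] / a[col][col]
               for c in range(col, k):
                   a[r][c] -= f * a[col][c]
               b[r] -= f * b[col]
       beta = [0.0] * k
       for i in reversed(range(k)):
           beta[i] = (b[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
       return beta


   def fit_trendline(mzs, ppms, fit="quad"):
       """Fit ppm error vs. m/z and return the trendline as a callable (assumed models)."""
       n_params = {"quad": 3, "inv": 2}.get(fit)
       if n_params is None:
           raise ValueError(f"unknown fit model: {fit!r}")
       if len(mzs) < n_params:
           raise ValueError("too few high-scoring matches to fit the trendline")
       center = sum(mzs) / len(mzs)
       scale = max(abs(m - center) for m in mzs) or 1.0

       def features(mz):
           if fit == "quad":
               t = (mz - center) / scale  # centering/scaling keeps the system well conditioned
               return [1.0, t, t * t]
           return [1.0, 1000.0 / mz]      # 'inv': a + b/mz, with b rescaled by 1000

       beta = least_squares([features(m) for m in mzs], ppms)
       return lambda mz: sum(c * f for c, f in zip(beta, features(mz)))


   def ppm_cut(matches, ppmc_w=30, scorec=0.05, ppmc_n=7, fit="quad"):
       """Two-stage ppm cut; defaults mirror the documented CLI defaults."""
       for m in matches:
           m["ppm"] = ppm_error(m["obs_mz"], m["theo_mz"])
       # Stage 1: wide ppm window for the initial match.
       wide = [m for m in matches if abs(m["ppm"]) <= ppmc_w]
       # Provisional high-scoring subset, used only to fit the systematic error.
       truth = [m for m in wide if m["score"] >= scorec]
       trend = fit_trendline([m["obs_mz"] for m in truth], [m["ppm"] for m in truth], fit)
       # Stage 2: keep matches within ppmc_n ppm of the fitted trendline.
       return [m for m in wide if abs(m["ppm"] - trend(m["obs_mz"])) <= ppmc_n]

Fitting on the high-scoring subset only keeps low-quality matches from dragging the trendline, while the final cut is applied to every match in the wide window.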
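The "single" and "ambig" plots correspond to two further cleaning steps: keeping only the best-scoring match for each peptide, then looking for different peptides that elute at nearly the same retention time and m/z (within ``--rtc`` and ``--mzc``). A minimal sketch of those two steps follows, under assumptions that the options above do not spell out: records are dicts with hypothetical keys ``'peptide'``, ``'rt'``, ``'mz'`` and ``'score'``; with ``'drop'`` every member of a co-eluting group is discarded and with ``'keep'`` all are retained; and the p-value test controlled by ``--pvc`` is omitted because its form is not described here.

.. code-block:: python

   from itertools import combinations


   def best_per_peptide(matches):
       """Keep only the highest-scoring match for each peptide."""
       best = {}
       for m in matches:
           key = m["peptide"]
           if key not in best or m["score"] > best[key]["score"]:
               best[key] = m
       return list(best.values())


   def split_coeluting(matches, rtc=0.5, mzc=0.1):
       """Return (clean, ambiguous): a match is ambiguous if a different peptide
       lies within rtc in retention time and within mzc in m/z."""
       ambiguous = set()
       for i, j in combinations(range(len(matches)), 2):
           a, b = matches[i], matches[j]
           if (a["peptide"] != b["peptide"]
                   and abs(a["rt"] - b["rt"]) <= rtc
                   and abs(a["mz"] - b["mz"]) <= mzc):
               ambiguous.update((i, j))
       clean = [m for k, m in enumerate(matches) if k not in ambiguous]
       ambig = [m for k, m in enumerate(matches) if k in ambiguous]
       return clean, ambig


   def resolve(matches, method="drop", rtc=0.5, mzc=0.1):
       """Hypothetical driver: best match per peptide, then handle co-elution."""
       singles = best_per_peptide(matches)
       clean, _ambig = split_coeluting(singles, rtc, mzc)
       # Assumed semantics: 'keep' retains ambiguous matches, 'drop' discards them.
       return singles if method == "keep" else clean

The pairwise comparison is quadratic in the number of peptides, which is acceptable for a single protein's pool; a sort by retention time would scale better for larger pools.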