{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 01_load_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In principle, FEATHER can accept the output from any HX/MS software.\n", "\n", "There are two types of input files:\n", "1. Peptide pools with centroid deuteration values\n", "2. Raw mass spectra (deconvoluted)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read the centroid data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Table:** The peptide pool table (a CSV file in this tutorial).\n", "* **Range List:** A file that defines the peptides to include or exclude.\n", "* **n_fastamides:** The number of N-terminal residues of each peptide that do not contribute to deuterium uptake because of rapid back exchange. In an HDX experiment this is the first two residues.\n", "* **Saturation:** The fraction of deuterium in the D2O buffer (for example, `0.9` for 90%).\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from pigeon_feather.data import *\n", "from pigeon_feather.plot import *\n", "from pigeon_feather.hxio import *\n", "from pigeon_feather.spectra import *\n", "\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import datetime\n", "import os\n", "import pickle" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./data/ecDHFR_tutorial.csv\n", "rangeslist included !\n" ] } ], "source": [ "tables = ['./data/ecDHFR_tutorial.csv']\n", "\n", "ranges = ['./data/rangeslist.csv']\n", "\n", "\n", "raw_spectra_paths = [\n", " \"./data/SpecExport/\",\n", "]\n", "\n", "protein_sequence = \"MTGHHHHHHENLYFQSISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESIGRPLPGRKNIILSSQPGTDDRVTWVKSVDEAIAACGDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFEILERR\"\n", "\n", "# load the data\n", "hdxms_data_list = []\n", "for i in range(len(tables)):\n", " # for i in [4]:\n", " print(tables[i])\n", "\n", " # read the data and 
clean it\n", " cleaned = read_hdx_tables([tables[i]], [ranges[i]], exclude=False, states_subset=['APO','TRI'])\n", "\n", " # convert the cleaned data to hdxms data object\n", " hdxms_data = load_dataframe_to_hdxmsdata(\n", " cleaned,\n", " n_fastamides=2,\n", " protein_sequence=protein_sequence,\n", " fulld_approx=False,\n", " saturation=0.9,\n", " )\n", "\n", " hdxms_data_list.append(hdxms_data)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the basic statistics of `hdxms_data_list`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "============================================================\n", " HDX-MS Data Statistics\n", "============================================================\n", "States names: ['APO', 'TRI']\n", "Time course (s): [46.0, 373.5, 572.5, 2011.0, 7772.0, 30811.5, 43292.0]\n", "Number of time points: 7\n", "Protein sequence length: 174\n", "Average coverage: 0.97\n", "Number of unique peptides: 261\n", "Average peptide length: 9.8\n", "Redundancy (based on average coverage): 14.7\n", "Average peptide length to redundancy ratio: 0.7\n", "Backexchange average, IQR: 0.27, 0.26\n", "============================================================\n" ] } ], "source": [ "from pigeon_feather.hxio import get_all_statics_info\n", "\n", "get_all_statics_info(hdxms_data_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the raw spectra" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Removed 0 peptides from state APO due to missing raw MS data.\n", "Removed 70 peptides from state APO due to high back exchange.\n", "Removed 2 peptides from state TRI due to missing raw MS data.\n", "Removed 70 peptides from state TRI due to high back exchange.\n", "Done loading raw MS data.\n" ] } ], "source": [ "# the raw spectra can be added directly 
to the hdxms_data object\n", "for i in range(len(tables)):\n", " load_raw_ms_to_hdxms_data(\n", " hdxms_data_list[i],\n", " raw_spectra_paths[i],\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** A common error is that the matching spectra files cannot be found. Make sure that each `protein_state.state_name` corresponds to the files in the spectra folder, with the correct time points and charge states." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# save the raw data as a pickle file\n", "import pickle\n", "\n", "today = datetime.date.today().strftime(\"%Y%m%d\")\n", "# override with a fixed date stamp; remove this line to use today's date\n", "today = \"20240722\"\n", "\n", "with open(f\"./data/hdxms_data_raw_{today}.pkl\", \"wb\") as f:\n", " pickle.dump(hdxms_data_list, f)\n", "\n", "# to reload the saved data later:\n", "# with open(f\"./data/hdxms_data_raw_{today}.pkl\", \"rb\") as f:\n", "# hdxms_data_list = pickle.load(f)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 2 }