01_load_data
In principle, FEATHER can accept the output from any HX/MS software.
There are two types of input files:
1. Peptide pools with centroid deuteration values
2. Raw mass spectra (deconvoluted)
Read the centroid data
Table: The peptide pool.
Range List: A file that defines the peptides to include or exclude.
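n_fastamides: The number of N-terminal residues of each peptide to treat as fast-exchanging. In an HDX experiment, the first two residues at the N-terminus of a peptide do not contribute to the measured deuterium uptake because of rapid back exchange.
Saturation: The deuterium content of the D2O buffer, given as a fraction (for example, 0.9 for 90%).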
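To make the roles of n_fastamides and saturation concrete, here is a minimal sketch of how the two values could bound the uptake of a single peptide. The helper `max_exchangeable_amides` is hypothetical and not part of pigeon_feather; it assumes the common convention that prolines and the first n_fastamides residues carry no measurable deuterium, which may differ from what FEATHER does internally.

```python
def max_exchangeable_amides(sequence, n_fastamides=2, saturation=0.9):
    """Estimate the maximum measurable deuterium uptake of one peptide.

    Hypothetical helper for illustration only; not a pigeon_feather function.
    """
    # Drop the N-terminal residues that lose their label to back exchange.
    slow_part = sequence[n_fastamides:]
    # Proline has no backbone amide hydrogen, so it cannot exchange.
    n_amides = sum(1 for residue in slow_part if residue != "P")
    # Scale by the deuterium fraction of the labeling buffer.
    return n_amides * saturation


peptide = "AMPWNLPADL"  # a stretch of the tutorial protein sequence
print(max_exchangeable_amides(peptide, n_fastamides=2, saturation=0.9))
```

For this 10-residue stretch, two residues are skipped and two of the remaining eight are prolines, which leaves six amides before scaling by the saturation.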
[6]:
from pigeon_feather.data import *
from pigeon_feather.plot import *
from pigeon_feather.hxio import *
from pigeon_feather.spectra import *
import numpy as np
import pandas as pd
import datetime
import os
import pickle
[2]:
tables = ['./data/ecDHFR_tutorial.csv']
ranges = ['./data/rangeslist.csv']
raw_spectra_paths = [
    "./data/SpecExport/",
]
protein_sequence = "MTGHHHHHHENLYFQSISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESIGRPLPGRKNIILSSQPGTDDRVTWVKSVDEAIAACGDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFEILERR"
# load the data
hdxms_data_list = []
for i in range(len(tables)):
    print(tables[i])
    # read the data and clean it
    cleaned = read_hdx_tables([tables[i]], [ranges[i]], exclude=False, states_subset=['APO','TRI'])
    # convert the cleaned data to an hdxms data object
    hdxms_data = load_dataframe_to_hdxmsdata(
        cleaned,
        n_fastamides=2,
        protein_sequence=protein_sequence,
        fulld_approx=False,
        saturation=0.9,
    )
    hdxms_data_list.append(hdxms_data)
./data/ecDHFR_tutorial.csv
rangeslist included !
Check the basic statistics of hdxms_data_list
[3]:
from pigeon_feather.hxio import get_all_statics_info
get_all_statics_info(hdxms_data_list)
============================================================
HDX-MS Data Statistics
============================================================
States names: ['APO', 'TRI']
Time course (s): [46.0, 373.5, 572.5, 2011.0, 7772.0, 30811.5, 43292.0]
Number of time points: 7
Protein sequence length: 174
Average coverage: 0.97
Number of unique peptides: 261
Average peptide length: 9.8
Redundancy (based on average coverage): 14.7
Average peptide length to redundancy ratio: 0.7
Backexchange average, IQR: 0.27, 0.26
============================================================
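As a rough guide to reading the coverage and redundancy figures, the sketch below computes both from a list of peptide ranges. The helper `coverage_and_redundancy` is hypothetical, and the definitions are assumptions: coverage is taken as the fraction of residues covered by at least one peptide, and redundancy as the mean number of peptides per covered residue. `get_all_statics_info` may define them differently.

```python
def coverage_and_redundancy(peptide_ranges, sequence_length):
    """Compute coverage and redundancy from 1-based, inclusive (start, end) ranges.

    Hypothetical helper for illustration; assumes every range lies
    within the sequence.
    """
    # Count how many peptides cover each residue position.
    counts = [0] * sequence_length
    for start, end in peptide_ranges:
        for position in range(start - 1, end):
            counts[position] += 1
    covered = [count for count in counts if count > 0]
    # Fraction of residues seen by at least one peptide.
    coverage = len(covered) / sequence_length
    # Mean number of peptides per covered residue.
    redundancy = sum(covered) / len(covered) if covered else 0.0
    return coverage, redundancy


ranges_example = [(1, 5), (3, 8), (6, 8)]  # made-up ranges, not tutorial data
coverage, redundancy = coverage_and_redundancy(ranges_example, sequence_length=10)
print(f"coverage={coverage:.2f}, redundancy={redundancy:.2f}")
```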
Load the raw spectra
[4]:
# raw spectra can be loaded directly into each hdxms_data object
for i in range(len(tables)):
    load_raw_ms_to_hdxms_data(
        hdxms_data_list[i],
        raw_spectra_paths[i],
    )
Removed 0 peptides from state APO due to missing raw MS data.
Removed 70 peptides from state APO due to high back exchange.
Removed 2 peptides from state TRI due to missing raw MS data.
Removed 70 peptides from state TRI due to high back exchange.
Done loading raw MS data.
Note: A common error is that the matching spectra files cannot be found. Make sure that each protein_state.state_name matches the files in the spectra folder, and that the time points and charge states match as well.
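A quick pre-check of the spectra folder can catch this before loading. The sketch below is only an illustration: `files_per_state` is a hypothetical helper, and it assumes that each state name appears somewhere in the relative path of its spectra files, which may not match the naming scheme of your export.

```python
from pathlib import Path


def files_per_state(spectra_dir, state_names):
    """Group files in a spectra folder by the state name found in their path.

    Hypothetical helper; assumes the state name is part of the relative path.
    """
    folder = Path(spectra_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Spectra folder not found: {folder}")
    # Collect every file below the folder as a path relative to it.
    relative_paths = sorted(
        str(path.relative_to(folder)) for path in folder.rglob("*") if path.is_file()
    )
    # Case-sensitive substring match of each state name against each path.
    return {
        state: [name for name in relative_paths if state in name]
        for state in state_names
    }


spectra_dir = "./data/SpecExport/"
if Path(spectra_dir).is_dir():
    for state, names in files_per_state(spectra_dir, ["APO", "TRI"]).items():
        print(state, len(names), "files")
```

A state that maps to zero files is a likely sign of a mismatch between the state names in the table and the names used in the export.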
[7]:
# save the raw data as a pickle file
today = datetime.date.today().strftime("%Y%m%d")
today = "20240722"  # fixed date string; remove this line to use the current date
with open(f"./data/hdxms_data_raw_{today}.pkl", "wb") as f:
    pickle.dump(hdxms_data_list, f)

# to reload the saved data later:
# with open(f"./data/hdxms_data_raw_{today}.pkl", "rb") as f:
#     hdxms_data_list = pickle.load(f)