DataStructures - Alignment¶
This document contains information about the data structures used in the TRIC algorithm.
- Run contains all data pertaining to an LC-MS/MS run, particularly references to measured precursors
- PrecursorGroup represents a set of precursors (e.g. precursors deriving from the same peptide sequence but identified by different charge states and isotopic labelling); see also CyPrecursorGroup for a Cython implementation
- A Precursor represents a single precursor (e.g. a single measured analyte with a precursor m/z identified by its chemical formula, charge state and isotopic labelling)
  - PrecursorBase is a base implementation of a Precursor
  - Precursor is the implementation of a Precursor using minimal memory; see also CyPrecursorWrapperOnly for a Cython implementation
  - GeneralPrecursor is the default implementation of a Precursor
- A peak group represents a single RT region in the chromatogram of a single Precursor
  - PeakGroupBase is a base implementation of a Peakgroup
  - MinimalPeakGroup is the implementation of a Peakgroup using minimal memory; see also CyPeakgroupWrapperOnly for a Cython implementation
  - GeneralPeakGroup is the default implementation of a Peakgroup
  - GuiPeakGroup is the implementation used by the GUI
Run
Module¶
Run¶
-
class
msproteomicstoolslib.data_structures.Run.
Run
(header, header_dict, runid, orig_input_filename=None, filename=None, aligned_filename=None, useCython=False)¶ A run contains references to identified precursor groups and precursors.
The run stores a reference to precursor groups (heavy/light pairs) identified in the run. It has a unique id and stores the headers from the csv.
- A run has the following attributes:
- an identifier that is unique to this run
- a filename where it originally came from
- a dictionary of precursor groups, accessible through the functions getPrecursorGroup, hasPrecursor, getPrecursor and addPrecursor
Parameters: - header (str) – Run header
- header_dict (dict) – Run header dictionary
- runid (str) – Identifier that is unique to this run
- orig_input_filename (str) – Original filename of the csv file
- filename (str) – Original filename of the mzML (e.g. the column “filename”)
- aligned_filename (str) – Aligned filename (e.g. the column “align_origfilename”)
-
addPrecursor
(precursor, peptide_group_label)¶ Add a new precursor to the run using a specific peptide label.
If the corresponding precursor group does not yet exist, a new precursor group is created. Otherwise the precursor is added to the precursor group.
Parameters: - precursor (CyPrecursor, Precursor or GeneralPrecursor) – Precursor to be added (e.g. PEPT[+98]IDE/2)
- peptide_group_label (str) – Label of the corresponding peptide group (e.g. PEPTIDE)
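The get-or-create behaviour described for addPrecursor can be sketched as follows. This is a minimal illustration, not the library code: SketchRun and SketchPrecursorGroup are hypothetical stand-ins, the precursor_groups dictionary name is an assumption, and plain strings stand in for precursor objects.

```python
class SketchPrecursorGroup:
    """Hypothetical stand-in for PrecursorGroup (not the library class)."""

    def __init__(self, peptide_group_label, run):
        self.peptide_group_label_ = peptide_group_label
        self.run_ = run
        self.precursors_ = []

    def addPrecursor(self, precursor):
        self.precursors_.append(precursor)


class SketchRun:
    """Hypothetical stand-in for Run, keeping only the group bookkeeping."""

    def __init__(self, runid):
        self.runid = runid
        # Assumed layout: peptide_group_label -> SketchPrecursorGroup
        self.precursor_groups = {}

    def addPrecursor(self, precursor, peptide_group_label):
        # Create the group on first use, otherwise reuse the existing one.
        group = self.precursor_groups.get(peptide_group_label)
        if group is None:
            group = SketchPrecursorGroup(peptide_group_label, self)
            self.precursor_groups[peptide_group_label] = group
        group.addPrecursor(precursor)

    def getPrecursorGroup(self, curr_id):
        return self.precursor_groups[curr_id]


run = SketchRun("run_0")
run.addPrecursor("PEPTIDE/2", "PEPTIDE")
run.addPrecursor("PEPT[+98]IDE/2", "PEPTIDE")
group = run.getPrecursorGroup("PEPTIDE")
```

Both precursors end up in the same group because they share the peptide group label, which mirrors the heavy/light pairing described above.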
-
getPrecursor
(peptide_group_label, trgr_id)¶ Return precursor corresponding to the given peptide label group and the transition group id
-
getPrecursorGroup
(curr_id)¶
-
get_aligned_filename
()¶
-
get_best_peaks
()¶ Return the best peakgroup for each peptide precursor
-
get_best_peaks_with_cutoff
(cutoff)¶ Return the best peak per run (with cutoff)
-
get_id
()¶
-
get_openswath_filename
()¶
-
get_original_filename
()¶
-
hasPrecursor
(peptide_group_label, trgr_id)¶
PrecursorGroup
Module¶
PrecursorGroup¶
-
class
msproteomicstoolslib.data_structures.PrecursorGroup.
PrecursorGroup
(peptide_group_label, run)¶ Bases:
object
A set of precursors that are isotopically modified versions or different charge states of each other.
A collection of precursors that are isotopically modified versions or different charge states of the same underlying peptide sequence. Generally these are heavy/light forms. This class groups these Precursors together.
-
- self.peptide_group_label_
Identifier of the precursor group
-
- self.run_
Reference to the Run where this PrecursorGroup is from
-
- self.precursors_
List of actual precursors
-
addPrecursor
(self, precursor)¶ Add precursor to peptide group
-
getAllPeakgroups
(self)¶ Generator of all peakgroups attached to the precursors in this group
-
getAllPrecursors
(self)¶ Return a list of all precursors in this precursor group
-
getOverallBestPeakgroup
(self)¶ Get the best peakgroup (by fdr score) of all precursors contained in this precursor group
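As a rough illustration of picking the overall best peakgroup across all precursors of a group, the sketch below flattens the peakgroups and takes the minimum by FDR score. It assumes that a lower fdr_score is better; the PeakGroup namedtuple and the dictionary layout are hypothetical and not part of the library.

```python
from collections import namedtuple

# Hypothetical record; the library uses its own peakgroup classes.
PeakGroup = namedtuple("PeakGroup", ["feature_id", "fdr_score", "rt"])


def overall_best_peakgroup(precursors):
    """Return the peakgroup with the lowest fdr_score, or None if there is none.

    precursors: mapping of precursor id -> list of PeakGroup
    """
    all_peakgroups = (pg for pgs in precursors.values() for pg in pgs)
    return min(all_peakgroups, key=lambda pg: pg.fdr_score, default=None)


precursors = {
    "PEPTIDE/2": [PeakGroup("f_1", 0.20, 1210.0), PeakGroup("f_2", 0.01, 1305.5)],
    "PEPTIDE/3": [PeakGroup("f_3", 0.05, 1301.9)],
}
best = overall_best_peakgroup(precursors)
```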
-
getPeptideGroupLabel
(self)¶ Get peptide group label
-
getPrecursor
(self, curr_id)¶ Get the precursor for the given transition group id
-
get_decoy
()¶ Whether the current peptide is a decoy or not
Returns: decoy – Whether the peptide is a decoy or not
Return type: bool
-
peptide_group_label_
¶
-
precursors_
¶
-
run_
¶
-
Precursor
Module¶
PrecursorBase¶
GeneralPrecursor¶
-
class
msproteomicstoolslib.data_structures.Precursor.
GeneralPrecursor
(this_id, run)¶ Bases:
msproteomicstoolslib.data_structures.Precursor.PrecursorBase
A set of peakgroups that belong to the same precursor in a single run.
== Implementation details ==
This is a plain implementation where all peakgroup objects are stored in a simple list. This is not very efficient, since many objects need to be created, which takes a lot of memory in Python.
-
add_peakgroup
(peakgroup)¶
-
append
(transitiongroup)¶
-
find_closest_in_iRT
(delta_assay_rt)¶
-
getProteinName
()¶
-
getRun
()¶
-
getRunId
()¶
-
getSequence
()¶
-
get_all_peakgroups
()¶
-
get_best_peakgroup
()¶ Return the best peakgroup according to fdr score
-
get_run_id
()¶
-
get_selected_peakgroup
()¶
-
id
¶
-
peakgroups
¶
-
precursor_group
¶
-
protein_name
¶
-
run
¶
-
sequence
¶
-
setProteinName
(p)¶
-
setSequence
(s)¶
-
set_precursor_group
(p)¶
-
Precursor¶
-
class
msproteomicstoolslib.data_structures.Precursor.
Precursor
(this_id, run)¶ Bases:
msproteomicstoolslib.data_structures.Precursor.PrecursorBase
A set of peakgroups that belong to the same precursor in a single run.
Each precursor has a backreference to its precursor group (heavy/light pair) it belongs to, the run it belongs to as well as its amino acid sequence. Furthermore, a unique id for the precursor and the protein name are stored.
A precursor can return its best transition group, the selected peakgroup, or can return the transition group that is closest to a given iRT time. Its id is the transition_group_id (e.g. the id of the chromatogram)
The “selected” peakgroup is represented by the peakgroup that belongs to cluster number 1 (cluster_id == 1) which in this case is “special”.
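Looking up the peakgroup closest to a given iRT value can be sketched as a minimum over absolute retention time differences. The function name and the tuple layout (id, FDR score, normalized retention time, intensity) are assumptions made for this example, and the argument is treated as the target retention time.

```python
def find_closest_in_irt(peakgroups, target_irt):
    """Return the peakgroup tuple whose normalized RT is nearest to target_irt."""
    if not peakgroups:
        return None
    # Index 2 holds the normalized retention time in the assumed tuple layout.
    return min(peakgroups, key=lambda pg: abs(pg[2] - target_irt))


peakgroups = [
    ("f_1", 0.20, 1210.0, 5000.0),
    ("f_2", 0.01, 1305.5, 82000.0),
    ("f_3", 0.05, 1890.2, 12000.0),
]
closest = find_closest_in_irt(peakgroups, 1300.0)
```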
== Implementation details ==
For memory reasons, we store all information about the peakgroup in a tuple (immutable). This tuple contains a unique feature id, a score and a retention time. Additionally, we store which cluster the peakgroup belongs to (if the user sets this).
- A precursor has the following attributes:
- an identifier that is unique among all other precursors
- a set of peakgroups
- a back-reference to the run it belongs to
-
add_peakgroup_tpl
(pg_tuple, tpl_id, cluster_id=-1)¶ Adds a peakgroup to this precursor.
- The peakgroup should be a tuple of length 4 with the following components:
0. id
1. quality score (FDR)
2. retention time (normalized)
3. intensity
(4. d_score optional)
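The tuple-based storage can be sketched as two parallel lists, one for the peakgroup tuples and one for their cluster ids. SketchPrecursor is a hypothetical stand-in; treating tpl_id as the id of the owning precursor and padding a 4-tuple with None for the optional d_score are both assumptions of this example.

```python
class SketchPrecursor:
    """Hypothetical stand-in: peakgroups as tuples plus a parallel cluster list."""

    def __init__(self, this_id):
        self.id = this_id
        self.peakgroups_ = []   # tuples: (id, fdr, rt, intensity, d_score)
        self.cluster_ids_ = []  # cluster id for the tuple at the same index

    def add_peakgroup_tpl(self, pg_tuple, tpl_id, cluster_id=-1):
        # Assumption: tpl_id names the precursor the peakgroup belongs to.
        if tpl_id != self.id:
            raise ValueError("peakgroup belongs to a different precursor")
        if len(pg_tuple) == 4:
            pg_tuple = pg_tuple + (None,)  # no d_score given
        if len(pg_tuple) != 5:
            raise ValueError("expected a tuple with 4 or 5 components")
        self.peakgroups_.append(pg_tuple)
        self.cluster_ids_.append(cluster_id)

    def select_pg(self, feature_id):
        # Selecting a peakgroup means assigning it to cluster 1.
        for i, pg in enumerate(self.peakgroups_):
            if pg[0] == feature_id:
                self.cluster_ids_[i] = 1

    def get_selected_peakgroup(self):
        selected = [pg for pg, cl in zip(self.peakgroups_, self.cluster_ids_) if cl == 1]
        return selected[0] if selected else None


precursor = SketchPrecursor("tg_42")
precursor.add_peakgroup_tpl(("f_1", 0.20, 1210.0, 5000.0), "tg_42")
precursor.add_peakgroup_tpl(("f_2", 0.01, 1305.5, 82000.0), "tg_42")
precursor.select_pg("f_2")
```

Keeping plain tuples avoids creating one Python object per peakgroup attribute, which is the memory argument made above.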
-
cluster_ids_
¶
-
find_closest_in_iRT
(delta_assay_rt)¶
-
getAllPeakgroups
()¶
-
getClusteredPeakgroups
()¶
-
getPrecursorGroup
()¶
-
getProteinName
()¶
-
getRun
()¶
-
getRunId
()¶
-
getSequence
()¶
-
get_all_peakgroups
()¶
-
get_best_peakgroup
()¶
-
get_id
()¶
-
get_run_id
()¶
-
get_selected_peakgroup
()¶
-
id
¶
-
peakgroups_
¶
-
precursor_group
¶
-
protein_name
¶
-
run
¶
-
select_pg
(this_id)¶
-
sequence
¶
-
setClusterID
(this_id, cl_id)¶
-
setProteinName
(p)¶
-
setSequence
(s)¶
-
set_precursor_group
(p)¶
-
unselect_all
()¶
-
unselect_pg
(this_id)¶
PeakGroup
Module¶
PeakGroupBase¶
-
class
msproteomicstoolslib.data_structures.PeakGroup.
PeakGroupBase
¶ Bases:
object
A single peakgroup that is defined by a retention time in a chromatogram of multiple transitions. Additionally it has an fdr_score and it has an aligned RT (e.g. retention time in normalized space). A peakgroup can be selected for quantification or not (this is stored as having cluster_id == 1).
For each precursor, there can be multiple clusters of peakgroups, with the first (or best) one generally being in cluster 1, therefore we store a cluster id. Generally, an alignment algorithm will assign a cluster id to zero, one or more peakgroups of each precursor.
-
cluster_id_
¶
-
fdr_score
¶
-
get_cluster_id
()¶
-
get_fdr_score
()¶
-
get_feature_id
()¶
-
get_intensity
()¶
-
get_normalized_retentiontime
()¶
-
get_value
(value)¶
-
id_
¶
-
intensity_
¶
-
is_selected
()¶
-
normalized_retentiontime
¶
-
select_this_peakgroup
()¶
-
set_fdr_score
(fdr_score)¶
-
set_feature_id
(id_)¶
-
set_intensity
(intensity)¶
-
set_normalized_retentiontime
(normalized_retentiontime)¶
-
set_value
(key, value)¶
-
MinimalPeakGroup¶
-
class
msproteomicstoolslib.data_structures.PeakGroup.
MinimalPeakGroup
(unique_id, fdr_score, assay_rt, selected, cluster_id, peptide, intensity=None, dscore=None)¶ Bases:
msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase
See PeakGroupBase for a detailed description.
This implementation is designed to be immutable as the actual data is stored in the Precursor class, which generates this object on-the-fly to improve memory performance.
-
getPeptide
()¶
-
get_cluster_id
()¶
-
get_dscore
()¶
-
print_out
()¶
-
select_this_peakgroup
()¶ Select this peakgroup for quantification (assigns cluster id 1; works since it calls back to its Precursor obj)
-
setClusterID
(id_)¶ Set cluster id (works since it calls back to its Precursor obj)
-
set_fdr_score
(fdr_score)¶ Raises exception as this object is immutable
-
set_feature_id
(id_)¶ Raises exception as this object is immutable
-
set_intensity
(intensity)¶ Raises exception as this object is immutable
-
set_normalized_retentiontime
(normalized_retentiontime)¶ Raises exception as this object is immutable
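The pattern of an immutable view that delegates cluster assignment back to its owner can be sketched as below. Both classes are hypothetical stand-ins, and the use of TypeError is an assumption, since the exception type is not stated here.

```python
class SketchOwner:
    """Hypothetical owner that holds the mutable cluster assignments."""

    def __init__(self):
        self.cluster_ids = {}

    def setClusterID(self, feature_id, cl_id):
        self.cluster_ids[feature_id] = cl_id


class SketchMinimalPeakGroup:
    """Hypothetical read-only view generated on the fly from owner data."""

    __slots__ = ("id_", "fdr_score", "precursor")

    def __init__(self, unique_id, fdr_score, precursor):
        self.id_ = unique_id
        self.fdr_score = fdr_score
        self.precursor = precursor

    def set_fdr_score(self, fdr_score):
        # Assumed exception type; the view itself never changes its data.
        raise TypeError("this peakgroup view is immutable")

    def setClusterID(self, cl_id):
        # Works because the change is written to the owner, not to the view.
        self.precursor.setClusterID(self.id_, cl_id)

    def select_this_peakgroup(self):
        self.setClusterID(1)


owner = SketchOwner()
view = SketchMinimalPeakGroup("f_1", 0.01, owner)
view.select_this_peakgroup()
```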
-
GuiPeakGroup¶
-
class
msproteomicstoolslib.data_structures.PeakGroup.
GuiPeakGroup
(fdr_score, intensity, leftWidth, rightWidth, assay_rt, peptide)¶ Bases:
msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase
See PeakGroupBase for a detailed description.
This implementation stores additional information including left/right width.
-
get_value
(value)¶
-
GeneralPeakGroup¶
-
class
msproteomicstoolslib.data_structures.PeakGroup.
GeneralPeakGroup
(row, run, peptide)¶ Bases:
msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase
See PeakGroupBase for a detailed description.
This implementation stores the full row read from the CSV file including all meta-data. It is generally not recommended to use this implementation except for toy examples.
-
getPeptide
()¶
-
get_dscore
()¶
-
get_value
(value)¶
-
peptide
¶
-
print_out
()¶
-
row
¶
-
run
¶
-
setClusterID
(clid)¶
-
set_value
(key, value)¶
-
DataStructures - Basic¶
Aminoacides
Module¶
Aminoacid¶
-
class
msproteomicstoolslib.data_structures.aminoacides.
Aminoacid
(name, code, code3, composition)¶ Class to hold information about a single Amino Acid (AA)
-
code
= None¶ One letter code
-
code3
= None¶ Three letter code
-
composition
= None¶ Elemental composition
-
elementsLib
= None¶ Library of elements
-
name
= None¶ Full name of the AA
-
Modifications
Module¶
Modification¶
Modifications¶
-
class
msproteomicstoolslib.data_structures.modifications.
Modifications
(default_mod_file=None)¶ A collection of modifications
-
appendModification
(modification)¶
-
is_bool
(expression)¶
-
printModifications
()¶
-
readModificationsFile
(modificationsfile)¶ It reads a tsv file with additional modifications. Modifications will be appended to the default modifications of this class.
Tsv file headers and an example row:
modified-AA    TPP-nomenclature    Unimod-Accession    ProteinPilot-nomenclature    is_a_labeling    composition-dictionary
S    S[167]    21    [Pho]    False    {'H' : 1,'O' : 3, 'P' : 1}
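A file in this layout could be parsed roughly as follows. This is a sketch using only the standard library, not the library's own reader; the function name and the keys of the resulting dictionaries are hypothetical, and the columns are assumed to be tab-separated.

```python
import ast
import csv
import io

# The example row from the description, written out as tab-separated text.
EXAMPLE_TSV = (
    "modified-AA\tTPP-nomenclature\tUnimod-Accession\t"
    "ProteinPilot-nomenclature\tis_a_labeling\tcomposition-dictionary\n"
    "S\tS[167]\t21\t[Pho]\tFalse\t{'H' : 1,'O' : 3, 'P' : 1}\n"
)


def parse_modifications(handle):
    """Read modification rows from an open tsv handle into plain dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "aminoacid": row["modified-AA"],
            "tpp": row["TPP-nomenclature"],
            "unimod": int(row["Unimod-Accession"]),
            "proteinpilot": row["ProteinPilot-nomenclature"],
            "is_labeling": row["is_a_labeling"].strip().lower() == "true",
            # literal_eval safely parses the composition dictionary literal.
            "composition": ast.literal_eval(row["composition-dictionary"]),
        })
    return rows


mods = parse_modifications(io.StringIO(EXAMPLE_TSV))
```

Using ast.literal_eval instead of eval keeps the composition column restricted to Python literals.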
-
translateModificationsFromSequence
(sequence, code, aaLib=None)¶ Returns a Peptide object, given a sequence with modifications in any of the available codes. The code (TPP, Unimod,...) to be translated must be given.
-
Peak
Module¶
Peptide
Module¶
Peptide¶
-
class
msproteomicstoolslib.data_structures.peptide.
Peptide
(sequence, modifications={}, protein='', aminoacidLib=None)¶
-
addSpectrum
(spectrum)¶ Deprecated definition
-
all_ions
(ionseries=None, frg_z_list=[1, 2], fragmentlossgains=[0], mass_limits=None, label='')¶ Returns all the fragment ions of the peptide as a tuple of two objects: (annotated, ionmasses_only).
- annotated is a list of tuples of the form (ion_type, ion_number, ion_charge, lossgain, fragment_mz)
- ionmasses_only is a list of fragment masses
When ionseries is not provided, all existing ion series (see Peptide.iontypes) are calculated. When frg_z_list is not provided, fragment ion charge states +1 and +2 are used.
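To make the shape of the returned tuple concrete, the sketch below computes b and y fragment m/z values for an unmodified peptide. It is a simplified stand-in rather than the library implementation: only b and y series are covered, no losses or gains are applied (lossgain is fixed to 0), and the mass table is a small hand-entered subset of monoisotopic residue masses.

```python
# Monoisotopic residue masses for the residues used in the example only.
MONO = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
        "I": 113.08406, "D": 115.02694}
WATER = 18.010565
PROTON = 1.007276


def fragment_mz(sequence, ion_type, ion_number, ion_charge):
    """m/z of a b or y fragment of an unmodified peptide."""
    if ion_type == "b":
        neutral = sum(MONO[aa] for aa in sequence[:ion_number])
    elif ion_type == "y":
        neutral = sum(MONO[aa] for aa in sequence[-ion_number:]) + WATER
    else:
        raise ValueError("only b and y ions are covered in this sketch")
    return (neutral + ion_charge * PROTON) / ion_charge


def all_ions(sequence, ionseries=("b", "y"), frg_z_list=(1, 2)):
    """Return (annotated, ionmasses_only) in the layout described above."""
    annotated = []
    for ion_type in ionseries:
        for ion_number in range(1, len(sequence)):
            for ion_charge in frg_z_list:
                mz = fragment_mz(sequence, ion_type, ion_number, ion_charge)
                annotated.append((ion_type, ion_number, ion_charge, 0, mz))
    ionmasses_only = [entry[-1] for entry in annotated]
    return annotated, ionmasses_only


annotated, ionmasses_only = all_ions("PEPTIDE")
```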
-
calIsoforms
(switchingModification, modLibrary)¶ Returns the full list of peptide species of the same peptide family (isobaric, same composition, different modification site). The list is given as a list of Peptide objects. switchingModification must be given as a Modification object.
-
cal_UIS
(otherPeptidesList, UISorder=2, ionseries=None, fragmentlossgains=[0], precision=1e-08, frg_z_list=[1, 2], mass_limits=None)¶ Calculates the UIS for a given peptide relative to a given list of other peptides. It returns a tuple of two objects, all_UIS and all_UIS_annotated. all_UIS contains only a mass list.
-
comparePeptideFragments
(otherPeptidesList, ionseries=None, fragmentlossgains=[0], precision=1e-08, frg_z_list=[1, 2])¶ Returns a tuple of lists: (CommonFragments, differentialFragments). The differential fragments are the fragments of this peptide that are not shared with any of the peptides listed in otherPeptidesList. otherPeptidesList must be a list of Peptide objects. The fragments are reported as tuples of the form (ionserie, ion_number, ion_charge, fragmentlossgain, mass).
-
fragmentSequence
(ion_type, frg_number)¶
-
getDeltaMassFromSequence
(sequence)¶
-
getMZ
(charge, label='')¶
-
getMZfragment
(ion_type, ion_number, ion_charge, label='', fragmentlossgain=0.0)¶
-
getSequenceWithMods
(code)¶
-
get_decoy_Q3
(frg_serie, frg_nr, frg_z, blackList=[], max_tries=1000)¶
-
pseudoreverse
(sequence='None')¶
-
shuffle_sequence
()¶
-
Residues
Module¶
Residues¶
-
class
msproteomicstoolslib.data_structures.Residues.
Residues
(type='mono')¶ A class that contains information about elements, amino acids and modifications. It mainly stores the masses of these, but also chemical formulas.
- The most commonly used properties are:
- Residues.average_elments : element weights
- Residues.monoisotopic_elments : element weights
- Residues.aa_codes : Three and One letter amino acid codes
- Residues.aa_names : English names of the amino acids
- Residues.aa_sum_formulas_text : Chemical formulas of all amino acids
- Residues.aa_sum_formulas: Chemical formulas of all amino acids as hash
- Residues.mass_xxx: monoisotopic masses of different compounds (NH3, H2O, CO, HPO4 etc)
- Residues.average_data: average weight of amino acids
- Residues.monoisotopic_data: monoisotopic weight of amino acids
- Residues.monoisotopic_mod: monoisotopic modification data
- Residues.mod_mapping: mapping of + notation to absolute weight notation (K[+8] to K[136])
- Residues.Hydropathy: Hydropathy of amino acids (gravy scores)
- TODO hydrophobicity of amino acids
- TODO basicity of amino acids
- TODO helicity of amino acids
- Residues.pI: pI of amino acids
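The mapping from + notation to absolute weight notation (K[+8] to K[136]) can be illustrated with a small conversion function. This is a sketch, not the contents of Residues.mod_mapping: the function name is hypothetical, the mass table is a hand-entered subset of monoisotopic residue masses, and rounding the sum to the nearest integer is an assumption.

```python
import re

# Monoisotopic residue masses for a few amino acids (subset for illustration).
MONO = {"K": 128.09496, "R": 156.10111, "S": 87.03203}

# Matches one residue letter followed by a delta mass such as [+8] or [+79.97].
DELTA_PATTERN = re.compile(r"([A-Z])\[\+(\d+(?:\.\d+)?)\]")


def delta_to_absolute(sequence):
    """Rewrite X[+delta] as X[absolute], e.g. K[+8] as K[136]."""
    def replace(match):
        aminoacid, delta = match.group(1), float(match.group(2))
        return "%s[%d]" % (aminoacid, round(MONO[aminoacid] + delta))
    return DELTA_PATTERN.sub(replace, sequence)


converted = delta_to_absolute("PEPTIDEK[+8]")
```

Residues without a bracketed delta are left untouched, so the function can be applied to a full modified sequence.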