Optimized Cython
This document contains information about the optimized Cython data structures and algorithms available within this package.
The main data structures are (see DataStructures - Alignment for a detailed description):
- CyPeakgroupWrapperOnly (wraps a C++ peakgroup)
- CyPrecursorWrapperOnly (wraps a C++ precursor)
- CyPrecursorGroup (a precursor group, a Cython version of PrecursorGroup)
For linear interpolation and retention time transformation, the
CyLinearInterpolateWrapper wraps a C++ interpolation function,
allowing access to a very fast linear interpolator. The
CyLightTransformationData is a Cython version of
LightTransformationData.
The CyDataCacher is a Cython class that holds pairwise RT alignment
data cached for later use.
Finally, the MST alignment algorithm can be called through
static_cy_alignBestCluster.
Peakgroup (optimized)
class msproteomicstoolslib.cython._optimized.CyPeakgroupWrapperOnly
Bases: object
See PeakGroupBase for a detailed description.
This implementation stores a pointer to a C++ object holding the actual data. Data access works much as it does for any PeakGroupBase.
- getPeptide()
- get_cluster_id()
- get_dscore()
- get_fdr_score()
- get_feature_id()
- get_intensity()
- get_normalized_retentiontime()
- get_value()
- select_this_peakgroup()
- setClusterID()
- set_fdr_score()
- set_feature_id()
- set_intensity()
- set_normalized_retentiontime()
- set_value()
Precursor (optimized)
class msproteomicstoolslib.cython._optimized.CyPrecursorWrapperOnly
Bases: object
A set of peakgroups that belong to the same precursor in a single run.
Each precursor holds a back-reference to the identifier of the precursor group it belongs to and to the run it belongs to, as well as its amino acid sequence and protein name.
Each precursor has a list of CyPeakgroupWrapperOnly objects that are found in the chromatogram of this precursor in this particular run.
- add_peakgroup_tpl(): Adds a peakgroup to this precursor. The peakgroup should be a tuple of length 4 with the following components:
  0. id
  1. quality score (FDR)
  2. retention time (normalized)
  3. intensity
  4. d_score (optional)
- getAllPeakgroups()
- getAllPrecursors()
- getClusteredPeakgroups()
- getProteinName()
- getRunId()
- getSequence()
- get_all_peakgroups()
- get_best_peakgroup()
- get_decoy()
- get_id()
- get_selected_peakgroup(): Returns the selected peakgroup of this precursor. At most one peakgroup can be selected per chromatogram.
- printAddresses()
- setProteinName()
- setSequence()
- set_decoy()
- set_precursor_group()
- unselect_all()
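The tuple layout expected by add_peakgroup_tpl() can be illustrated with a small pure-Python sketch. The helper name `describe_peakgroup_tuple` and the dictionary keys are hypothetical and chosen only for illustration; the sketch does not call the Cython class and only shows which position holds which value.

```python
def describe_peakgroup_tuple(pg):
    """Hypothetical helper: label the positions of a peakgroup tuple.

    Assumes the layout (id, FDR score, normalized RT, intensity) with an
    optional fifth element (d_score), as listed for add_peakgroup_tpl().
    """
    if len(pg) not in (4, 5):
        raise ValueError("expected a tuple of length 4 (or 5 with d_score)")
    # Illustrative key names, not identifiers from the library.
    keys = ("id", "fdr_score", "normalized_retentiontime", "intensity", "d_score")
    # zip() stops at the shorter input, so a 4-tuple yields no d_score key.
    return dict(zip(keys, pg))


example = describe_peakgroup_tuple(("pg_1", 0.01, 1234.5, 50000.0))
```

Here `example["fdr_score"]` is the quality score and `example["normalized_retentiontime"]` the normalized retention time of the peakgroup.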
PrecursorGroup (optimized)
class msproteomicstoolslib.cython._optimized.CyPrecursorGroup
Bases: object
See PrecursorGroup for a description.
This implementation is pure Cython.
- self.peptide_group_label_: Identifier of the precursor group
- self.run_: Reference to the Run this PrecursorGroup comes from
- self.precursors_: List of CyPrecursorWrapperOnly
- addPrecursor(self, precursor): Add a precursor to the peptide group
- getAllPeakgroups(self): Generator of all peakgroups attached to the precursors in this group
- getAllPrecursors(self): Return a list of all precursors in this precursor group
- getOverallBestPeakgroup(self): Get the best peakgroup (by FDR score) of all precursors contained in this precursor group
- getPeptideGroupLabel(self): Get the peptide group label
- getPrecursor(self, curr_id): Get the precursor for the given transition group id
- get_decoy(): Whether the current peptide is a decoy or not
  Returns: decoy – whether the peptide is a decoy or not
  Return type: bool
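As a rough illustration of what selecting the best peakgroup by FDR score across several precursors means, the following pure-Python sketch works on plain tuples rather than on the Cython objects. The function name `overall_best_peakgroup` is hypothetical, and the sketch assumes that "best" means the lowest FDR score.

```python
def overall_best_peakgroup(precursors):
    """Hypothetical sketch: pick the peakgroup with the lowest FDR score.

    `precursors` is a list of precursors, each given as a list of
    peakgroup tuples (id, fdr, normalized_rt, intensity).
    Assumption: a lower FDR score is better.
    """
    best = None
    for peakgroups in precursors:
        for pg in peakgroups:
            # pg[1] is the FDR score in the assumed tuple layout.
            if best is None or pg[1] < best[1]:
                best = pg
    return best  # None if no precursor has any peakgroup


group = [
    [("pg_a", 0.20, 100.0, 1e4), ("pg_b", 0.01, 105.0, 5e4)],  # precursor 1
    [("pg_c", 0.05, 104.0, 2e4)],                              # precursor 2
]
best = overall_best_peakgroup(group)
```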
CyLinearInterpolateWrapper (optimized)
class msproteomicstoolslib.cython._optimized.CyLinearInterpolateWrapper
Bases: object
Cython wrapper around c_linear_interpolate
Another smoother that interpolates between the given data points. It is fast because it is written in C++.
This class expects already smoothed x,y data (e.g. computed using a lowess or spline smoothing). When the transformation is applied, new x values are requested and the corresponding y values are calculated by interpolation.
The class provides the following methods:
- def __init__(self, x, y, double abs_err): initialize with two vectors, x and y
- def predict(self, list xnew): predict for Python
- cdef double predict_cy(self, double xnew): predict for Cython (low overhead)
- predict(): Prediction for Python, returns a Python list
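To make the interpolation step concrete, here is a minimal pure-Python sketch of piecewise linear interpolation over already smoothed x,y data. The class name `LinearInterpolatorSketch` and the method `predict_one` are hypothetical, the `abs_err` argument is left out, and the handling of values outside the data range (clamping to the end points) is an assumption; the wrapped C++ code may behave differently.

```python
from bisect import bisect_right


class LinearInterpolatorSketch:
    """Hypothetical pure-Python stand-in, not the wrapped C++ code."""

    def __init__(self, x, y):
        if len(x) != len(y) or len(x) < 2:
            raise ValueError("x and y need the same length (at least 2)")
        # Sort the points by x so that bisect can locate the segment.
        pairs = sorted(zip(x, y))
        self.x = [p[0] for p in pairs]
        self.y = [p[1] for p in pairs]

    def predict_one(self, xnew):
        # Assumption: values outside the data range are clamped to the
        # nearest end point.
        if xnew <= self.x[0]:
            return self.y[0]
        if xnew >= self.x[-1]:
            return self.y[-1]
        i = bisect_right(self.x, xnew)  # self.x[i - 1] <= xnew < self.x[i]
        x0, x1 = self.x[i - 1], self.x[i]
        y0, y1 = self.y[i - 1], self.y[i]
        return y0 + (y1 - y0) * (xnew - x0) / (x1 - x0)

    def predict(self, xnew):
        # Like the documented predict(): takes a list, returns a list.
        return [self.predict_one(v) for v in xnew]


interp = LinearInterpolatorSketch([0.0, 10.0, 20.0], [0.0, 100.0, 150.0])
result = interp.predict([5.0, 15.0, 25.0])
```

In a retention time setting, x would hold the smoothed retention times of one run and y those of the other, so `predict` maps new retention times from the first run into the second.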
CyLightTransformationData (optimized)
class msproteomicstoolslib.cython._optimized.CyLightTransformationData
Bases: object
Cython implementation of LightTransformationData
A lightweight data structure to store a transformation between retention times of multiple runs.
- addData(): Add raw data for the transformation between two runs
- addTrafo(): Add transformation between two runs
- getData()
- getReferenceRunID()
- getStdev()
- getTrafo()
- getTransformation()
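One way to picture such a structure is a nested dictionary keyed by the two run ids. The sketch below is a pure-Python illustration under that assumption; the class `TransformationStoreSketch`, its method names and their argument lists are hypothetical and are not taken from the Cython class, whose signatures are not listed here.

```python
class TransformationStoreSketch:
    """Hypothetical stand-in: pairwise transformations keyed by run ids."""

    def __init__(self, reference_run_id=None):
        self.reference_run_id = reference_run_id
        self._trafo = {}  # run1 -> run2 -> callable mapping RT(run1) to RT(run2)
        self._stdev = {}  # run1 -> run2 -> standard deviation of the fit

    def add_trafo(self, run1, run2, trafo, stdev=None):
        # Store the transformation (and optionally its stdev) for run1 -> run2.
        self._trafo.setdefault(run1, {})[run2] = trafo
        self._stdev.setdefault(run1, {})[run2] = stdev

    def get_trafo(self, run1, run2):
        return self._trafo[run1][run2]

    def get_stdev(self, run1, run2):
        return self._stdev[run1][run2]


store = TransformationStoreSketch(reference_run_id="0")
# Toy transformations: a constant shift of 2 RT units in each direction.
store.add_trafo("0", "1", lambda rt: rt + 2.0, stdev=1.5)
store.add_trafo("1", "0", lambda rt: rt - 2.0, stdev=1.5)
mapped = store.get_trafo("0", "1")(10.0)
```

Storing both directions separately keeps the lookup for any ordered pair of runs to two dictionary accesses.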
DataCacher (optimized)
class msproteomicstoolslib.algorithms.alignment.DataCacher.CyDataCacher
Bases: object
Wrapper around c_data_cache which allows storage of multiple lists of RT/FDR values for later alignment.
The data cacher stores an FDR vector and an RT vector for each of N runs.
Values are retrieved by asking for these vectors for a specific pair of runs; the cacher returns the RT values of the peptides that occur in both runs.
- appendValuesForPeptide(): Store the retention time values for a single peptide across all runs.
  Parameters: cached_values (list(list(double))) – A list of length N (number of runs) where for each run a pair of values is provided: (fdr, rt). In case the peptide was not identified in a run, None can be provided instead.
- retrieveValues(): Retrieve all paired RT values for two given runs
  Parameters:
  - run1 (int) – Index of the first run
  - run2 (int) – Index of the second run
  Returns: the two lists containing matched RT values
  Return type: tuple(list(double), list(double))
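The store-then-retrieve pattern described for the data cacher can be sketched in pure Python as follows. The class `DataCacherSketch` and its snake_case method names are hypothetical stand-ins; the sketch does not filter on the FDR values, which the real class may or may not do.

```python
class DataCacherSketch:
    """Hypothetical stand-in for the caching pattern, not c_data_cache."""

    def __init__(self):
        self._rows = []  # one row per peptide, one entry per run

    def append_values_for_peptide(self, cached_values):
        # cached_values has one entry per run: an (fdr, rt) pair, or None
        # if the peptide has no value in that run.
        self._rows.append(list(cached_values))

    def retrieve_values(self, run1, run2):
        # Collect the RT values of all peptides present in both runs.
        rt1, rt2 = [], []
        for row in self._rows:
            a, b = row[run1], row[run2]
            if a is None or b is None:
                continue
            rt1.append(a[1])
            rt2.append(b[1])
        return rt1, rt2


cache = DataCacherSketch()
cache.append_values_for_peptide([(0.01, 100.0), (0.02, 103.0), None])
cache.append_values_for_peptide([(0.05, 200.0), None, (0.01, 207.0)])
cache.append_values_for_peptide([(0.01, 300.0), (0.01, 304.0), (0.03, 309.0)])
pairs_01 = cache.retrieve_values(0, 1)
```

The two returned lists have equal length and are aligned by position, which is the shape a pairwise RT alignment needs as input.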
MST Algorithm (optimized)
msproteomicstoolslib.cython._optimized.static_cy_alignBestCluster()
See TreeConsensusAlignment for a detailed description.
This is a Cython implementation of TreeConsensusAlignment.alignBestCluster which uses a minimum spanning tree (MST) for alignment.
Parameters:
- multipeptides (list of Multipeptide) – a list of multipeptides on which the alignment should be performed. After alignment, each peakgroup that should be quantified can be retrieved by calling get_selected_peakgroups() on the multipeptide.
- tree (list of tuple) – a minimum spanning tree (MST) represented as a list of edges (for example [('0', '1'), ('1', '2')]). Node names need to correspond to run ids.
- tr_data (CyLightTransformationData) – structure to hold binary transformations between two different retention time spaces
- aligned_fdr_cutoff (float) – maximal FDR that a peakgroup needs to reach to be considered for extension (extension FDR)
- fdr_cutoff (float) – maximal FDR that at least one peakgroup needs to reach (seed FDR)
- correctRT_using_pg (bool) – use the apex of the aligned peak group as the input for the next alignment during MST traversal (as opposed to using the plain transformed RT)
- max_rt_diff (float) – maximal difference in retention time to be used to look for a matching peakgroup in an adjacent run
- stdev_max_rt_per_run (float) – use a different maximal RT tolerance for each alignment, depending on the goodness of the alignment. The RT tolerance used by the algorithm will be the standard deviation times stdev_max_rt_per_run.
- use_local_stdev (float) – use RT-local standard deviation (experimental)
- max_rt_diff_isotope (float) – maximal difference in retention time between two isotopic pairs or precursors with the same charge state
Returns: None
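The edge-list representation of the tree and the per-run RT tolerance can be illustrated with a short pure-Python sketch. Both function names are hypothetical. The breadth-first walk is used purely for illustration and is not claimed to be the traversal order of the Cython implementation, and falling back to max_rt_diff when stdev_max_rt_per_run is not set is an assumption.

```python
from collections import defaultdict, deque


def traverse_tree(tree, start):
    """Hypothetical sketch: walk an MST given as a list of edges.

    Returns the edges in the order they are crossed, each oriented as
    (run already visited, run reached next).
    """
    neighbours = defaultdict(list)
    for a, b in tree:
        neighbours[a].append(b)
        neighbours[b].append(a)
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in visited:
                visited.add(nxt)
                order.append((current, nxt))
                queue.append(nxt)
    return order


def rt_tolerance(max_rt_diff, stdev, stdev_max_rt_per_run=None):
    # Documented behaviour: tolerance = stdev * stdev_max_rt_per_run.
    # Assumption: use max_rt_diff when no per-run factor is given.
    if stdev_max_rt_per_run is not None:
        return stdev * stdev_max_rt_per_run
    return max_rt_diff


tree = [("0", "1"), ("1", "2")]
edges_from_seed = traverse_tree(tree, "1")
tolerance = rt_tolerance(max_rt_diff=30.0, stdev=4.0, stdev_max_rt_per_run=3.0)
```

Starting from the run that holds the seed peakgroup, each crossed edge corresponds to one pairwise transformation looked up in tr_data.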