Optimized Cython
This document contains information about the optimized Cython data structures and algorithms available within this package.
The main data structures are (see DataStructures - Alignment for a detailed description):
- CyPeakgroupWrapperOnly (wraps a C++ peakgroup)
- CyPrecursorWrapperOnly (wraps a C++ precursor)
- CyPrecursorGroup (a precursor group, a Cython version of PrecursorGroup)
For linear interpolation and retention time transformation, the CyLinearInterpolateWrapper wraps a C++ interpolation function, allowing access to a very fast linear interpolator. The CyLightTransformationData is a Cython version of LightTransformationData.
The CyDataCacher is a Cython class that holds pairwise RT alignment data cached for later use.
Finally, the MST alignment algorithm can be called through static_cy_alignBestCluster.
Peakgroup (optimized)
class msproteomicstoolslib.cython._optimized.CyPeakgroupWrapperOnly
Bases: object
See PeakGroupBase for a detailed description.
This implementation stores a pointer to a C++ object holding the actual data. Data access works much the same as for any PeakGroupBase.
- getPeptide()
- get_cluster_id()
- get_dscore()
- get_fdr_score()
- get_feature_id()
- get_intensity()
- get_normalized_retentiontime()
- get_value()
- select_this_peakgroup()
- setClusterID()
- set_fdr_score()
- set_feature_id()
- set_intensity()
- set_normalized_retentiontime()
- set_value()
Precursor (optimized)
class msproteomicstoolslib.cython._optimized.CyPrecursorWrapperOnly
Bases: object
A set of peakgroups that belong to the same precursor in a single run.
Each precursor holds a back-reference to the identifier of its precursor group and to its run, as well as its amino acid sequence and protein name.
Each precursor has a list of CyPeakgroupWrapperOnly objects that are found in the chromatogram of this precursor in this particular run.
- add_peakgroup_tpl(): Adds a peakgroup to this precursor. The peakgroup should be a tuple of length 4 (5 with the optional d_score) with the following components, indexed from 0:
  - 0: id
  - 1: quality score (FDR)
  - 2: retention time (normalized)
  - 3: intensity
  - 4: d_score (optional)
- getAllPeakgroups()
- getAllPrecursors()
- getClusteredPeakgroups()
- getProteinName()
- getRunId()
- getSequence()
- get_all_peakgroups()
- get_best_peakgroup()
- get_decoy()
- get_id()
- get_selected_peakgroup(): Return the selected peakgroup of this precursor. At most one peakgroup can be selected per chromatogram.
- printAddresses()
- setProteinName()
- setSequence()
- set_decoy()
- set_precursor_group()
- unselect_all()
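The tuple layout expected by add_peakgroup_tpl can be sketched in plain Python. The helper name `make_peakgroup_tuple` and the sample values are hypothetical and only illustrate the documented field order; the exact call signature of add_peakgroup_tpl is not given in this document.

```python
def make_peakgroup_tuple(pg_id, fdr_score, norm_rt, intensity, d_score=None):
    """Build a peakgroup tuple in the documented field order (hypothetical helper)."""
    base = (pg_id, fdr_score, norm_rt, intensity)
    # The d_score is optional; when present it occupies index 4.
    return base if d_score is None else base + (d_score,)


# Sample values are made up for illustration.
pg = make_peakgroup_tuple("pg_1", 0.01, 1250.5, 48000.0)
pg_with_d = make_peakgroup_tuple("pg_2", 0.05, 1262.0, 3100.0, d_score=2.3)

# Unpacking follows the same order: id, FDR, normalized RT, intensity.
pg_id, fdr_score, norm_rt, intensity = pg
```

Keeping the order fixed in one helper avoids silently swapping the FDR score and the retention time, which are both floats.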
PrecursorGroup (optimized)
class msproteomicstoolslib.cython._optimized.CyPrecursorGroup
Bases: object
See PrecursorGroup for a description.
This implementation is pure Cython.
- self.peptide_group_label_: Identifier of the precursor group
- self.run_: Reference to the Run that this PrecursorGroup comes from
- self.precursors_: List of CyPrecursorWrapperOnly
- addPrecursor(self, precursor): Add a precursor to the peptide group
- getAllPeakgroups(self): Generator of all peakgroups attached to the precursors in this group
- getAllPrecursors(self): Return a list of all precursors in this precursor group
- getOverallBestPeakgroup(self): Get the best peakgroup (by FDR score) of all precursors contained in this precursor group
- getPeptideGroupLabel(self): Get the peptide group label
- getPrecursor(self, curr_id): Get the precursor for the given transition group id
- get_decoy(): Whether the current peptide is a decoy or not
  Returns: decoy – whether the peptide is a decoy or not
  Return type: bool
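The idea behind getOverallBestPeakgroup can be illustrated with a minimal pure-Python sketch. It assumes that "best" means the lowest FDR score and models each precursor as a plain list of peakgroup tuples; neither the function name nor the data layout is taken from the library.

```python
from itertools import chain


def overall_best_peakgroup(precursors):
    """Return the peakgroup with the lowest FDR score, or None if there are none.

    Each precursor is modelled as a list of (id, fdr_score, normalized_rt,
    intensity) tuples. This is a stand-in for CyPrecursorWrapperOnly, not the
    library's data structure.
    """
    all_peakgroups = chain.from_iterable(precursors)
    # Assumption: a lower FDR score means a better peakgroup.
    return min(all_peakgroups, key=lambda pg: pg[1], default=None)


# Two precursors of the same peptide (for example two charge states).
precursor_a = [("pg_1", 0.20, 1250.5, 4800.0), ("pg_2", 0.01, 1262.0, 9100.0)]
precursor_b = [("pg_3", 0.05, 1261.2, 2500.0)]
best = overall_best_peakgroup([precursor_a, precursor_b])
```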
CyLinearInterpolateWrapper (optimized)
class msproteomicstoolslib.cython._optimized.CyLinearInterpolateWrapper
Bases: object
Cython wrapper around c_linear_interpolate.
Another smoother that interpolates between the given data points. It is fast because it is written in C++.
This class expects already smoothed x,y data (e.g. computed using lowess or spline smoothing). When the transformation is applied, new x values are requested and the corresponding y values are calculated by interpolation.
The class provides the following methods:
- def __init__(self, x, y, double abs_err): initialize with two vectors, x and y
- def predict(self, list xnew): predict for Python
- cdef double predict_cy(self, double xnew): predict for Cython (low overhead)
- predict(): Prediction for Python, returns a Python list
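What the interpolator computes can be shown with a small pure-Python stand-in. This is only a sketch of piecewise linear interpolation, not the C++ implementation: the class name `LinearInterpolator` is hypothetical, the abs_err argument is left out, and the behaviour outside the data range (linear extension of the end segments) is an assumption.

```python
from bisect import bisect_right


class LinearInterpolator:
    """Piecewise linear interpolation over already smoothed (x, y) points."""

    def __init__(self, x, y):
        if len(x) != len(y) or len(x) < 2:
            raise ValueError("x and y need the same length (at least 2 points)")
        pairs = sorted(zip(x, y))
        self.x = [p[0] for p in pairs]
        self.y = [p[1] for p in pairs]
        if any(a == b for a, b in zip(self.x, self.x[1:])):
            raise ValueError("x values must be distinct")

    def predict_one(self, xnew):
        # Locate the segment [x[i], x[i+1]] containing xnew; values outside
        # the data range reuse the first or last segment (an assumption).
        i = bisect_right(self.x, xnew) - 1
        i = max(0, min(i, len(self.x) - 2))
        x0, x1 = self.x[i], self.x[i + 1]
        y0, y1 = self.y[i], self.y[i + 1]
        return y0 + (y1 - y0) * (xnew - x0) / (x1 - x0)

    def predict(self, xnew):
        # Mirrors the list-in, list-out shape described for predict().
        return [self.predict_one(v) for v in xnew]


interpolator = LinearInterpolator([0.0, 10.0, 20.0], [0.0, 100.0, 400.0])
predicted = interpolator.predict([5.0, 15.0, 20.0])
```

The binary search keeps each lookup logarithmic in the number of support points, which matters when the same transformation is applied to many retention times.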
CyLightTransformationData (optimized)
class msproteomicstoolslib.cython._optimized.CyLightTransformationData
Bases: object
Cython implementation of LightTransformationData.
A lightweight data structure to store a transformation between retention times of multiple runs.
- addData(): Add raw data for the transformation between two runs
- addTrafo(): Add transformation between two runs
- getData()
- getReferenceRunID()
- getStdev()
- getTrafo()
- getTransformation()
DataCacher (optimized)
class msproteomicstoolslib.algorithms.alignment.DataCacher.CyDataCacher
Bases: object
Wrapper around c_data_cache which allows storage of multiple lists of RT/FDR values for later alignment.
The data cacher stores an FDR vector and an RT vector for each of N runs.
Retrieval works by asking for these vectors for a specific combination of runs; the cacher returns a vector of RT pairs that occur in both runs.
- appendValuesForPeptide(): Store the retention time values for a single peptide across all runs.
  Parameters: cached_values (list( list( double ) )) – A list of length N (number of runs) where for each run a pair of values is provided: (fdr, rt). If the peptide was not identified in a run, None can be provided instead.
- retrieveValues(): Retrieve all paired RT values for two given runs.
  Parameters:
  - run1 (int) – Index of the first run
  - run2 (int) – Index of the second run
  Returns: the two lists containing matched RT values
  Return type: tuple(list(double), list(double))
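The append and retrieve behaviour described for the data cacher can be modelled in a few lines of plain Python. The class `PairwiseRTCache` below is a hypothetical stand-in, not the Cython class; in particular it pairs RT values only on the basis of a peptide being present in both runs, and any FDR filtering the real implementation may perform is not modelled.

```python
class PairwiseRTCache:
    """Toy model of a cache holding (fdr, rt) per run for each peptide."""

    def __init__(self, n_runs):
        self.n_runs = n_runs
        self._rows = []  # one row per peptide, one entry per run

    def append_values_for_peptide(self, cached_values):
        # cached_values: list of length n_runs with (fdr, rt) pairs,
        # or None where the peptide was not identified in that run.
        if len(cached_values) != self.n_runs:
            raise ValueError("expected one entry per run")
        self._rows.append(list(cached_values))

    def retrieve_values(self, run1, run2):
        rt1, rt2 = [], []
        for row in self._rows:
            a, b = row[run1], row[run2]
            if a is None or b is None:
                continue  # peptide missing in at least one of the two runs
            rt1.append(a[1])
            rt2.append(b[1])
        return rt1, rt2


cache = PairwiseRTCache(n_runs=3)
cache.append_values_for_peptide([(0.01, 100.0), (0.02, 105.0), None])
cache.append_values_for_peptide([(0.01, 200.0), None, (0.03, 212.0)])
cache.append_values_for_peptide([(0.05, 300.0), (0.01, 309.0), (0.01, 315.0)])
rts_run0, rts_run1 = cache.retrieve_values(0, 1)
```

The two returned lists have equal length and are index-aligned, which is the shape a pairwise RT smoother needs as input.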
MST Algorithm (optimized)
msproteomicstoolslib.cython._optimized.static_cy_alignBestCluster()
See TreeConsensusAlignment for a detailed description.
This is a Cython implementation of TreeConsensusAlignment.alignBestCluster which uses a minimum spanning tree (MST) for alignment.
Parameters:
- multipeptides (list of Multipeptide) – a list of multipeptides on which the alignment should be performed. After alignment, each peakgroup that should be quantified can be retrieved by calling get_selected_peakgroups() on the multipeptide.
- tree (list of tuple) – a minimum spanning tree (MST) represented as a list of edges (for example [('0', '1'), ('1', '2')]). Node names need to correspond to run ids.
- tr_data (CyLightTransformationData) – structure to hold binary transformations between two different retention time spaces
- aligned_fdr_cutoff (float) – maximal FDR that a peakgroup needs to reach to be considered for extension (extension FDR)
- fdr_cutoff (float) – maximal FDR that at least one peakgroup needs to reach (seed FDR)
- correctRT_using_pg (bool) – use the apex of the aligned peak group as the input for the next alignment during MST traversal (as opposed to using the plain transformed RT)
- max_rt_diff (float) – maximal difference in retention time to be used to look for a matching peakgroup in an adjacent run
- stdev_max_rt_per_run (float) – use a different maximal RT tolerance for each alignment, depending on the goodness of the alignment. The RT tolerance used by the algorithm will be the standard deviation times stdev_max_rt_per_run.
- use_local_stdev (float) – use RT-local standard deviation (experimental)
- max_rt_diff_isotope (float) – maximal difference in retention time between two isotopic pairs or precursors with the same charge state
Returns: None
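How an edge list like the tree parameter can drive run-to-run alignment is sketched below. Both helpers are hypothetical: the breadth-first order is an assumption about the traversal, and the fallback from stdev_max_rt_per_run to max_rt_diff is an assumed reading of the parameter descriptions rather than the documented logic.

```python
from collections import defaultdict, deque


def traversal_order(tree, seed_run):
    """List (from_run, to_run) steps that reach every run from the seed run."""
    adjacency = defaultdict(list)
    for a, b in tree:
        # Tree edges are undirected: alignment can proceed in either direction.
        adjacency[a].append(b)
        adjacency[b].append(a)
    visited = {seed_run}
    queue = deque([seed_run])
    steps = []
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in visited:
                visited.add(neighbour)
                steps.append((current, neighbour))
                queue.append(neighbour)
    return steps


def rt_tolerance(max_rt_diff, stdev=None, stdev_max_rt_per_run=None):
    """RT window for one edge: stdev times the factor if both are given."""
    if stdev is not None and stdev_max_rt_per_run is not None:
        return stdev * stdev_max_rt_per_run
    return max_rt_diff  # assumed fallback when no per-run factor is set


tree = [("0", "1"), ("1", "2")]
steps = traversal_order(tree, seed_run="1")
fixed_window = rt_tolerance(30.0)
scaled_window = rt_tolerance(30.0, stdev=4.0, stdev_max_rt_per_run=3.0)
```

Starting from the run that holds the seed peakgroup, each step only needs the pairwise transformation for one tree edge, which is why only the transformations along the MST have to be computed.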