Optimized Cython

This document contains information about the optimized Cython data structures and algorithms available within this package.

The main data structures are (see DataStructures - Alignment for a detailed description):
  • CyPeakgroupWrapperOnly
  • CyPrecursorWrapperOnly
  • CyPrecursorGroup

For linear interpolation and retention time transformation, the CyLinearInterpolateWrapper wraps a C++ interpolation function, allowing access to a very fast linear interpolator. The CyLightTransformationData is a Cython version of LightTransformationData.

The CyDataCacher is a Cython class that holds pairwise RT alignment data cached for later use.

Finally, the MST alignment algorithm can be called through static_cy_alignBestCluster.

Peakgroup (optimized)

class msproteomicstoolslib.cython._optimized.CyPeakgroupWrapperOnly

Bases: object

See PeakGroupBase for a detailed description.

This implementation stores a pointer to a C++ object that holds the actual data. Data access works much as it does for any PeakGroupBase.

getPeptide()
get_cluster_id()
get_dscore()
get_fdr_score()
get_feature_id()
get_intensity()
get_normalized_retentiontime()
get_value()
select_this_peakgroup()
setClusterID()
set_fdr_score()
set_feature_id()
set_intensity()
set_normalized_retentiontime()
set_value()

Precursor (optimized)

class msproteomicstoolslib.cython._optimized.CyPrecursorWrapperOnly

Bases: object

A set of peakgroups that belong to the same precursor in a single run.

Each precursor has a back-reference to the identifier of the precursor group it belongs to and to the run it belongs to, and it stores its amino acid sequence and protein name.

Each precursor has a list of CyPeakgroupWrapperOnly that are found in the chromatogram of this precursor in this particular run.

add_peakgroup_tpl()

Adds a peakgroup to this precursor.

The peakgroup should be a tuple of length 4 (or 5 if the optional d_score is given) with the following components:
  1. id
  2. quality score (FDR)
  3. retention time (normalized)
  4. intensity
  5. d_score (optional)
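
As an illustration of the layout above, the tuple could be assembled as follows. This is a minimal sketch: the helper make_peakgroup_tuple and all example values are hypothetical, and the call to add_peakgroup_tpl() appears only as a comment because its full signature is not given here.

```python
def make_peakgroup_tuple(pg_id, fdr_score, norm_rt, intensity, d_score=None):
    """Hypothetical helper: pack peakgroup fields in the documented order."""
    base = (pg_id, fdr_score, norm_rt, intensity)
    # The d_score is optional, so it is appended only when provided.
    return base if d_score is None else base + (d_score,)


pg = make_peakgroup_tuple("pg_1", 0.01, 1234.5, 5.0e4)
pg_with_dscore = make_peakgroup_tuple("pg_2", 0.05, 1240.1, 2.0e4, d_score=3.2)

# Assumed usage (further arguments may be required by the real method):
# precursor.add_peakgroup_tpl(pg, ...)
```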

getAllPeakgroups()
getAllPrecursors()
getClusteredPeakgroups()
getProteinName()
getRunId()
getSequence()
get_all_peakgroups()
get_best_peakgroup()
get_decoy()
get_id()
get_selected_peakgroup()

Return the selected peakgroup of this precursor. At most one peakgroup can be selected per chromatogram.

printAddresses()
setProteinName()
setSequence()
set_decoy()
set_precursor_group()
unselect_all()

PrecursorGroup (optimized)

class msproteomicstoolslib.cython._optimized.CyPrecursorGroup

Bases: object

See PrecursorGroup for a description.

This implementation is pure Cython.

- self.peptide_group_label_

Identifier of the precursor group

- self.run_

Reference to the Run where this PrecursorGroup is from

- self.precursors_

List of CyPrecursorWrapperOnly

addPrecursor(self, precursor)

Add precursor to peptide group

getAllPeakgroups(self)

Generator of all peakgroups attached to the precursors in this group

getAllPrecursors(self)

Return a list of all precursors in this precursor group

getOverallBestPeakgroup(self)

Get the best peakgroup (by fdr score) of all precursors contained in this precursor group

getPeptideGroupLabel(self)

Get peptide group label

getPrecursor(self, curr_id)

Get the precursor for the given transition group id

get_decoy()

Whether the current peptide is a decoy or not

Returns: decoy – Whether the peptide is a decoy or not
Return type: bool

CyLinearInterpolateWrapper (optimized)

class msproteomicstoolslib.cython._optimized.CyLinearInterpolateWrapper

Bases: object

Cython wrapper around c_linear_interpolate

Another smoother that interpolates between the given data points. It is fast because it is written in C++.

This class expects already smoothed x,y data (e.g. computed using lowess or spline smoothing). When the transformation is applied, new x values are requested and the corresponding y values are calculated by interpolation.

The class provides the following methods:
  • def __init__(self, x, y, double abs_err): initialize with two vectors, x and y
  • def predict(self, list xnew): predict for Python
  • cdef double predict_cy(self, double xnew): predict for Cython (low overhead)

predict()

Prediction for Python, returns a Python list
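
To make the interpolation step concrete, here is a minimal pure-Python sketch of the idea. It is not the C++ implementation: the class name LinearInterpolateSketch is hypothetical, the abs_err argument is omitted, and clamping values outside the data range to the end points is an assumption rather than documented behaviour.

```python
from bisect import bisect_right


class LinearInterpolateSketch:
    """Hypothetical pure-Python stand-in for a linear interpolator."""

    def __init__(self, x, y):
        if len(x) != len(y) or len(x) < 2:
            raise ValueError("x and y need the same length (at least 2)")
        # Sort the points by x so that bisection can be used for lookup.
        pairs = sorted(zip(x, y))
        self.x = [p[0] for p in pairs]
        self.y = [p[1] for p in pairs]

    def predict_one(self, xnew):
        # Assumption: values outside the data range are clamped to the end points.
        if xnew <= self.x[0]:
            return self.y[0]
        if xnew >= self.x[-1]:
            return self.y[-1]
        i = bisect_right(self.x, xnew)  # first index with x[i] > xnew
        x0, x1 = self.x[i - 1], self.x[i]
        y0, y1 = self.y[i - 1], self.y[i]
        return y0 + (y1 - y0) * (xnew - x0) / (x1 - x0)

    def predict(self, xnew):
        # List in, list out, as described for predict() above.
        return [self.predict_one(v) for v in xnew]


interp = LinearInterpolateSketch([0.0, 10.0, 20.0], [0.0, 12.0, 20.0])
result = interp.predict([5.0, 15.0, 25.0])
```

In this sketch the smoothed (x, y) points play the role of the retention time mapping between two runs, and predict() converts retention times from one run into the space of the other.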

CyLightTransformationData (optimized)

class msproteomicstoolslib.cython._optimized.CyLightTransformationData

Bases: object

Cython implementation of LightTransformationData

A lightweight data structure to store a transformation between retention times of multiple runs.

addData()

Add raw data for the transformation between two runs

addTrafo()

Add transformation between two runs

getData()
getReferenceRunID()
getStdev()
getTrafo()
getTransformation()

DataCacher (optimized)

class msproteomicstoolslib.algorithms.alignment.DataCacher.CyDataCacher

Bases: object

Wrapper around c_data_cache which allows storage of multiple lists of RT/FDR values for later alignment.

In essence, the data cacher stores one FDR vector and one RT vector for each of N runs.

Values are retrieved by requesting these vectors for a specific pair of runs; the cacher then returns the RT pairs of the peptides that occur in both runs.

appendValuesForPeptide()

Store the retention time values for a single peptide across all runs.

Parameters: cached_values (list(list(double))) – A list of length N (number of runs) where for each run a pair of values is provided: (fdr, rt). In case the peptide was not identified in a run, None can be provided instead.

retrieveValues()

Retrieve all paired RT values for two given runs

Parameters:
  • run1 (int) – Index of the first run
  • run2 (int) – Index of the second run
Returns:

the two lists containing matched RT values

Return type:

tuple(list(double), list(double))
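
The storage and retrieval behaviour described in this section can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the c_data_cache implementation: the class DataCacherSketch and its method names are hypothetical, and the FDR values are stored but not used for any filtering here.

```python
class DataCacherSketch:
    """Hypothetical pure-Python stand-in for the caching idea."""

    def __init__(self):
        self._rows = []  # one entry per peptide, each a list over runs

    def append_values_for_peptide(self, cached_values):
        # cached_values: one (fdr, rt) pair per run, or None if not identified.
        self._rows.append(list(cached_values))

    def retrieve_values(self, run1, run2):
        rt1, rt2 = [], []
        for row in self._rows:
            a, b = row[run1], row[run2]
            if a is None or b is None:
                continue  # peptide missing in at least one of the two runs
            rt1.append(a[1])
            rt2.append(b[1])
        return rt1, rt2


cache = DataCacherSketch()
cache.append_values_for_peptide([(0.01, 100.0), (0.02, 105.0), None])
cache.append_values_for_peptide([(0.01, 200.0), None, (0.03, 212.0)])
cache.append_values_for_peptide([(0.05, 300.0), (0.01, 309.0), (0.02, 315.0)])

pairs_0_1 = cache.retrieve_values(0, 1)
pairs_0_2 = cache.retrieve_values(0, 2)
```

Only peptides seen in both requested runs contribute a pair, so the two returned lists always have the same length and can be passed directly to a smoother.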

MST Algorithm (optimized)

msproteomicstoolslib.cython._optimized.static_cy_alignBestCluster()

See TreeConsensusAlignment for a detailed description.

This is a Cython implementation of TreeConsensusAlignment.alignBestCluster which uses a minimum spanning tree (MST) for alignment.

Parameters:
  • multipeptides (list of Multipeptide) – a list of multipeptides on which the alignment should be performed. After alignment, each peakgroup that should be quantified can be retrieved by calling get_selected_peakgroups() on the multipeptide.
  • tree (list of tuple) – a minimum spanning tree (MST) represented as a list of edges (for example [('0', '1'), ('1', '2')]). Node names need to correspond to run ids.
  • tr_data (CyLightTransformationData) – structure to hold binary transformations between two different retention time spaces
  • aligned_fdr_cutoff (float) – maximal FDR that a peakgroup needs to reach to be considered for extension (extension FDR)
  • fdr_cutoff (float) – maximal FDR that at least one peakgroup needs to reach (seed FDR)
  • correctRT_using_pg (bool) – use the apex of the aligned peak group as the input for the next alignment during MST traversal (as opposed to using the plain transformed RT)
  • max_rt_diff (float) – maximal difference in retention time to be used to look for a matching peakgroup in an adjacent run
  • stdev_max_rt_per_run (float) – use a different maximal RT tolerance for each alignment, depending on the goodness of the alignment. The RT tolerance used by the algorithm will be the standard deviation times stdev_max_rt_per_run.
  • use_local_stdev (float) – use RT-local standard deviation (experimental)
  • max_rt_diff_isotope (float) – maximal difference in retention time between two isotopic pairs or precursors with the same charge state
Returns:

None
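
To show how an edge list such as the tree parameter can drive a run-to-run walk, here is a minimal sketch that orders the edges starting from a seed run. The function mst_visit_order is hypothetical, and the breadth-first strategy is an assumption made for illustration; the traversal order actually used by static_cy_alignBestCluster may differ.

```python
from collections import defaultdict, deque


def mst_visit_order(tree, seed_run):
    """Return the tree edges in the order a breadth-first walk crosses them."""
    neighbors = defaultdict(list)
    for a, b in tree:
        # The tree is undirected, so register each edge in both directions.
        neighbors[a].append(b)
        neighbors[b].append(a)

    visited = {seed_run}
    queue = deque([seed_run])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in visited:
                visited.add(nxt)
                order.append((current, nxt))  # align from current into nxt
                queue.append(nxt)
    return order


tree = [('0', '1'), ('1', '2'), ('1', '3')]
steps = mst_visit_order(tree, '2')
```

Each (source, target) step corresponds to one pairwise transformation: a peakgroup found in the source run is mapped into the RT space of the target run, where a matching peakgroup is then searched for within the RT tolerance.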