Optimized Cython¶

This document contains information about the optimized Cython data structures and algorithms available within this package.

The main data structures are (see DataStructures - Alignment for a detailed description):

CyPeakgroupWrapperOnly (wraps a C++ peakgroup)
CyPrecursorWrapperOnly (wraps a C++ precursor)
CyPrecursorGroup (a precursor group, a Cython version of PrecursorGroup)

For linear interpolation and retention time transformation, the CyLinearInterpolateWrapper wraps a C++ interpolation function, allowing access to a very fast linear interpolator. The CyLightTransformationData is a Cython version of LightTransformationData.

The CyDataCacher is a Cython class that holds pairwise RT alignment data cached for later use.

Finally, the MST alignment algorithm can be called through static_cy_alignBestCluster.

Peakgroup (optimized)¶

class msproteomicstoolslib.cython._optimized.CyPeakgroupWrapperOnly¶

Bases: object

See PeakGroupBase for a detailed description.

This implementation stores a pointer to a C++ object that holds the actual data. Data access works much as it does for any PeakGroupBase.

getPeptide()¶

get_cluster_id()¶

get_dscore()¶

get_fdr_score()¶

get_feature_id()¶

get_intensity()¶

get_normalized_retentiontime()¶

get_value()¶

select_this_peakgroup()¶

setClusterID()¶

set_fdr_score()¶

set_feature_id()¶

set_intensity()¶

set_normalized_retentiontime()¶

set_value()¶

Precursor (optimized)¶

class msproteomicstoolslib.cython._optimized.CyPrecursorWrapperOnly¶

Bases: object

A set of peakgroups that belong to the same precursor in a single run.

Each precursor holds a back-reference to the identifier of the precursor group it belongs to and to the run it belongs to, as well as its amino acid sequence and protein name.

Each precursor has a list of CyPeakgroupWrapperOnly that are found in the chromatogram of this precursor in this particular run.

add_peakgroup_tpl()¶

Adds a peakgroup to this precursor.

The peakgroup should be a tuple of length 4 with the following components:

0. id
1. quality score (FDR)
2. retention time (normalized)
3. intensity
4. d_score (optional)
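The tuple layout described above can be sketched in plain Python. This is only an illustration: `PeakgroupTuple` and `best_by_fdr` are hypothetical names that are not part of the package, and the zero-based field order is an assumption read off the list above.

```python
from typing import NamedTuple, Optional


class PeakgroupTuple(NamedTuple):
    """Hypothetical stand-in for the tuple passed to add_peakgroup_tpl()."""
    pg_id: str                       # 0. id
    fdr_score: float                 # 1. quality score (FDR)
    normalized_rt: float             # 2. retention time (normalized)
    intensity: float                 # 3. intensity
    d_score: Optional[float] = None  # 4. d_score (optional)


def best_by_fdr(peakgroups):
    """Return the peakgroup with the lowest FDR score (hypothetical helper)."""
    return min(peakgroups, key=lambda pg: pg.fdr_score)


peakgroups = [
    PeakgroupTuple("pg_1", 0.05, 1234.5, 8.0e4),
    PeakgroupTuple("pg_2", 0.001, 1240.1, 2.5e5, d_score=3.2),
]
best = best_by_fdr(peakgroups)
```

A named tuple is still an ordinary tuple, so the first four fields can be read by position as well as by name.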

getAllPeakgroups()¶

getAllPrecursors()¶

getClusteredPeakgroups()¶

getProteinName()¶

getRunId()¶

getSequence()¶

get_all_peakgroups()¶

get_best_peakgroup()¶

get_decoy()¶

get_id()¶

get_selected_peakgroup()¶: Return the selected peakgroup of this precursor. At most one peakgroup can be selected per chromatogram.

printAddresses()¶

setProteinName()¶

setSequence()¶

set_decoy()¶

set_precursor_group()¶

unselect_all()¶

PrecursorGroup (optimized)¶

class msproteomicstoolslib.cython._optimized.CyPrecursorGroup¶

Bases: object

See PrecursorGroup for a description.

This implementation is pure Cython.

- self.peptide_group_label_: Identifier of the precursor group

- self.run_: Reference to the Run where this PrecursorGroup is from

- self.precursors_: List of CyPrecursorWrapperOnly

addPrecursor(self, precursor)¶: Add precursor to peptide group

getAllPeakgroups(self)¶: Generator of all peakgroups attached to the precursors in this group

getAllPrecursors(self)¶: Return a list of all precursors in this precursor group

getOverallBestPeakgroup(self)¶: Get the best peakgroup (by fdr score) of all precursors contained in this precursor group

getPeptideGroupLabel(self)¶: Get peptide group label

getPrecursor(self, curr_id)¶: Get the precursor for the given transition group id

get_decoy()¶

Whether the current peptide is a decoy or not

Returns:	decoy – Whether the peptide is a decoy or not
Return type:	bool

CyLinearInterpolateWrapper (optimized)¶

class msproteomicstoolslib.cython._optimized.CyLinearInterpolateWrapper¶

Bases: object

Cython wrapper around c_linear_interpolate

Another smoother that interpolates between the given data points. It is fast because it is written in C++.

This class expects x,y data that has already been smoothed (e.g. computed using lowess or spline smoothing). When the transformation is applied, new x values are requested and the corresponding y values are calculated by interpolation.

The class provides the following methods:

def __init__(self, x, y, double abs_err): initialize with two vectors, x and y
def predict(self, list xnew): predict for Python
cdef double predict_cy(self, double xnew): predict for Cython (low overhead)

predict()¶: Prediction for Python, returns a Python list
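The interpolation step can be illustrated with a pure-Python analogue. This is a minimal sketch, not the wrapped C++ code: `LinearInterpolatorSketch` is a hypothetical name, the `abs_err` argument is left out because its meaning is not described here, and the linear extrapolation outside the fitted range is an assumption.

```python
from bisect import bisect_right


class LinearInterpolatorSketch:
    """Hypothetical pure-Python analogue of a linear interpolator."""

    def __init__(self, x, y):
        if len(x) != len(y) or len(x) < 2:
            raise ValueError("x and y need the same length and at least 2 points")
        pairs = sorted(zip(x, y))
        self.x = [p[0] for p in pairs]
        self.y = [p[1] for p in pairs]
        # Equal x values would make a segment slope undefined.
        if any(a == b for a, b in zip(self.x, self.x[1:])):
            raise ValueError("x values must be distinct")

    def predict_one(self, xnew):
        # Find the segment containing xnew; clamp to the first or last
        # segment so out-of-range values are extrapolated (an assumption).
        i = bisect_right(self.x, xnew) - 1
        i = max(0, min(i, len(self.x) - 2))
        x0, x1 = self.x[i], self.x[i + 1]
        y0, y1 = self.y[i], self.y[i + 1]
        return y0 + (y1 - y0) * (xnew - x0) / (x1 - x0)

    def predict(self, xnew):
        """Interpolate a list of x values and return a Python list."""
        return [self.predict_one(v) for v in xnew]


interp = LinearInterpolatorSketch([0.0, 10.0, 20.0], [0.0, 100.0, 150.0])
predicted = interp.predict([5.0, 15.0, 20.0])
```

The split into a list-based `predict` and a scalar `predict_one` mirrors the split between `predict` and `predict_cy` listed above, where the scalar variant avoids building Python lists.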

CyLightTransformationData (optimized)¶

class msproteomicstoolslib.cython._optimized.CyLightTransformationData¶

Bases: object

Cython implementation of LightTransformationData

A lightweight data structure to store a transformation between retention times of multiple runs.

addData()¶: Add raw data for the transformation between two runs

addTrafo()¶: Add transformation between two runs

getData()¶

getReferenceRunID()¶

getStdev()¶

getTrafo()¶

getTransformation()¶

DataCacher (optimized)¶

class msproteomicstoolslib.algorithms.alignment.DataCacher.CyDataCacher¶

Bases: object

Wrapper around c_data_cache which allows storage of multiple lists of RT/FDR values for later alignment.

The data cacher stores one FDR vector and one RT vector for each of N runs.

Values are retrieved by requesting these vectors for a specific pair of runs. The cacher then returns the RT pairs of those peptides that occur in both runs.

appendValuesForPeptide()¶

Store the retention time values for a single peptide across all runs.

Parameters:	cached_values (list( list( double ) )) – A list of length N (number of runs) where for each run a pair of values is provided: (fdr,rt). If the peptide was not identified in a run, None can be provided instead.

retrieveValues()¶

Retrieve all paired RT values for two given runs

Parameters:	run1 (int) – Index of the first run
run2 (int) – Index of the second run
Returns:	the two lists containing matched RT values
Return type:	tuple(list(double), list(double))
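The store-and-retrieve behaviour described above can be sketched in plain Python. `DataCacherSketch` and its method names are hypothetical stand-ins, not the package's class. The sketch assumes that a run is skipped when its entry is None and that no FDR filtering happens at retrieval time.

```python
class DataCacherSketch:
    """Hypothetical pure-Python analogue of an RT/FDR cache for N runs."""

    def __init__(self):
        self._rows = []  # one row per peptide, one entry per run

    def append_values_for_peptide(self, cached_values):
        # cached_values holds one (fdr, rt) pair per run, or None when the
        # peptide has no value in that run.
        self._rows.append(list(cached_values))

    def retrieve_values(self, run1, run2):
        """Return two RT lists for peptides that have values in both runs."""
        rt1, rt2 = [], []
        for row in self._rows:
            a, b = row[run1], row[run2]
            if a is None or b is None:
                continue
            rt1.append(a[1])
            rt2.append(b[1])
        return rt1, rt2


cacher = DataCacherSketch()
cacher.append_values_for_peptide([(0.01, 100.0), (0.02, 105.0), None])
cacher.append_values_for_peptide([(0.01, 200.0), None, (0.03, 212.0)])
cacher.append_values_for_peptide([(0.005, 300.0), (0.01, 309.0), (0.02, 315.0)])

pairs_0_1 = cacher.retrieve_values(0, 1)
pairs_0_2 = cacher.retrieve_values(0, 2)
```

The two returned lists have equal length and are index-aligned, which is the form a pairwise RT smoother typically expects as input.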

MST Algorithm (optimized)¶

msproteomicstoolslib.cython._optimized.static_cy_alignBestCluster()¶

See TreeConsensusAlignment for a detailed description.

This is a Cython implementation of TreeConsensusAlignment.alignBestCluster, which uses a minimum spanning tree (MST) for alignment.

Parameters:

multipeptides (list of Multipeptide) – a list of multipeptides on which the alignment should be performed. After alignment, each peakgroup that should be quantified can be retrieved by calling get_selected_peakgroups() on the multipeptide.
tree (list of tuple) – a minimum spanning tree (MST) represented as a list of edges (for example [('0', '1'), ('1', '2')]). Node names need to correspond to run ids.
tr_data (CyLightTransformationData) – structure to hold binary transformations between two different retention time spaces
aligned_fdr_cutoff (float) – maximal FDR that a peakgroup needs to reach to be considered for extension (extension FDR)
fdr_cutoff (float) – maximal FDR that at least one peakgroup needs to reach (seed FDR)
correctRT_using_pg (bool) – use the apex of the aligned peak group as the input for the next alignment during MST traversal (as opposed to using the plain transformed RT)
max_rt_diff (float) – maximal difference in retention time to be used to look for a matching peakgroup in an adjacent run
stdev_max_rt_per_run (float) – use a different maximal RT tolerance for each alignment, depending on the goodness of the alignment. The RT tolerance used by the algorithm will be the standard deviation times stdev_max_rt_per_run.
use_local_stdev (float) – use RT-local standard deviation (experimental)
max_rt_diff_isotope (float) – maximal difference in retention time between two isotopic pairs or precursors with the same charge state

Returns:

None
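Two of the parameters above lend themselves to a short illustration: the edge-list form of `tree` and the per-alignment RT tolerance derived from `stdev_max_rt_per_run`. The following is a sketch only, not the Cython implementation. `traversal_order` and `rt_tolerance` are hypothetical helpers, and starting the walk at the run that holds the seed peakgroup is an assumption.

```python
from collections import defaultdict, deque


def traversal_order(tree, start):
    """Return (from_run, to_run) steps of a breadth-first walk over the MST."""
    adjacency = defaultdict(list)
    for a, b in tree:
        adjacency[a].append(b)
        adjacency[b].append(a)
    visited = {start}
    queue = deque([start])
    steps = []
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                steps.append((current, neighbor))
                queue.append(neighbor)
    return steps


def rt_tolerance(stdev, stdev_max_rt_per_run):
    """RT tolerance for one pairwise alignment: standard deviation times factor."""
    return stdev * stdev_max_rt_per_run


# Edge list in the format expected for the tree argument; node names are run ids.
tree = [("0", "1"), ("1", "2"), ("1", "3")]
steps_from_0 = traversal_order(tree, "0")
steps_from_2 = traversal_order(tree, "2")
tolerance = rt_tolerance(12.0, 3.0)
```

Each step names a pair of adjacent runs, so only pairwise transformations along MST edges are needed, which matches the role of `tr_data` as a store of binary transformations.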
