Math

Smoothing Module

SmoothingNull

class msproteomicstoolslib.math.Smoothing.SmoothingNull

Null smoother that performs a null operation

initialize(data1, data2)
predict(xhat)
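
Most smoother classes in this module expose the same two calls: initialize(data1, data2) to fit and predict(xhat) to transform new values. The following is a minimal sketch of that interface, assuming that the "null operation" means returning xhat unchanged. The class name NullSmootherSketch is hypothetical and this is not the library's own code.

```python
class NullSmootherSketch:
    """Hypothetical stand-in illustrating the initialize/predict interface."""

    def initialize(self, data1, data2):
        # Nothing to fit: a null smoother ignores the training data.
        pass

    def predict(self, xhat):
        # Assumption: the null operation returns the input values unchanged.
        return list(xhat)


smoother = NullSmootherSketch()
smoother.initialize([1.0, 2.0, 3.0], [1.1, 2.1, 3.1])
result = smoother.predict([1.5, 2.5])
```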

SmoothingR

class msproteomicstoolslib.math.Smoothing.SmoothingR

Class to smooth data using the smooth.spline function from R

This is equivalent to the following code:

data1 = c(5,7,8,9,10,15,7.1,6)
data2 = c(4,7,9,11,11,14,7.1,6.5)
data1 = sort(data1)
data2 = sort(data2)
smooth.model = smooth.spline(data1,data2,cv=T)
data2_pred = predict(smooth.model,data2)$y
[1]  2.342662  6.615797  7.292613  7.441842 10.489440 11.858406 11.858406
[8] 13.482255
plot(data1, data2)
lines(data1, data2_pred, col="blue")

Doing the same thing in Python

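import numpy
import rpy2.robjects as robjects
# uses python-rpy2
data1 = [5,7,8,9,10,15,7.1,6]
data2 = [4,7,9,11,11,14,7.1,6.5]
# convert the Python lists to R numeric vectors
rdata1 = robjects.FloatVector(data1)
rdata2 = robjects.FloatVector(data2)
spline = robjects.r["smooth.spline"]
# pass the R vectors; Python has no T constant, so use cv=True
sm = spline(rdata1, rdata2, cv=True)
predict = robjects.r["predict"]
predicted_data = predict(sm, rdata2)
# element 1 of the returned R list holds the predicted y values
numpy.array(predicted_data[1])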
array([  2.34266247,   7.2926131 ,  10.48943975,  11.85840597,
        11.85840597,  13.48225519,   7.44184246,   6.61579704])

initialize(data1, data2)
predict(xhat)

SmoothingRExtern

class msproteomicstoolslib.math.Smoothing.SmoothingRExtern(TMPDIR='/tmp/')

Class to smooth data using the smooth.spline function from R (external system call)

initialize(data1, data2)
predict(xhat)
predict_R_(data1, data2, predict_data, TMPDIR)

SmoothingPy

class msproteomicstoolslib.math.Smoothing.SmoothingPy

Smoothing of 2D data using generalized cross-validation

Calls _smooth_spline_scikit internally, but only at a small number of selected points. The resulting smoothed spline is then used to construct an interpolating spline, on which the xhat data is evaluated.

de_duplicate_array(arr)
initialize(data1, data2, Nhat=200, xmin=None, xmax=None)
predict(xhat)
re_duplicate_array(arr_fixed, duplications)
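
The idea of evaluating a costly smoother at only a few points and interpolating in between can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: expensive_smoother is a hypothetical callable standing in for the spline smoother, piecewise-linear interpolation replaces the interpolating spline, and the grid is taken from the range of xhat when xmin and xmax are not given. Only the parameter names Nhat, xmin and xmax are borrowed from the signature above.

```python
from bisect import bisect_right


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def interpolate(grid_x, grid_y, x):
    """Piecewise-linear interpolation on a sorted grid, clamped at both ends."""
    if x <= grid_x[0]:
        return grid_y[0]
    if x >= grid_x[-1]:
        return grid_y[-1]
    j = bisect_right(grid_x, x)          # grid_x[j - 1] <= x < grid_x[j]
    x0, x1 = grid_x[j - 1], grid_x[j]
    t = (x - x0) / (x1 - x0)
    return grid_y[j - 1] + t * (grid_y[j] - grid_y[j - 1])


def coarse_then_interpolate(expensive_smoother, xhat, Nhat=200, xmin=None, xmax=None):
    """Call the costly smoother at Nhat grid points only, then interpolate."""
    xmin = min(xhat) if xmin is None else xmin
    xmax = max(xhat) if xmax is None else xmax
    grid_x = linspace(xmin, xmax, Nhat)
    grid_y = [expensive_smoother(g) for g in grid_x]   # exactly Nhat costly calls
    return [interpolate(grid_x, grid_y, x) for x in xhat]
```

The number of expensive evaluations is then fixed at Nhat, regardless of how many values are passed in xhat.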

LowessSmoothingStatsmodels

class msproteomicstoolslib.math.Smoothing.LowessSmoothingStatsmodels

Bases: msproteomicstoolslib.math.Smoothing.LowessSmoothingBase

Smooths the data with a Lowess smoother and then interpolates on the result

statsmodels now also has a fast Cython lowess, see https://github.com/statsmodels/statsmodels/pull/856

This faster lowess should be available from version 0.5.0 of statsmodels onwards (Anaconda had version 0.6.0 at the time of writing). Ubuntu, however, only ships version 0.5.0 from release 14.04 onwards, so check which version is installed.

frac (float): Between 0 and 1. The fraction of the data used when estimating each y-value.
it (int): The number of residual-based reweightings to perform.
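
To illustrate what frac controls, here is a minimal pure-Python sketch of the local regression step of lowess: for each point, a weighted straight line is fitted to the nearest frac * n points using tricube weights. It is a simplified illustration, not the statsmodels implementation. The function name lowess_fit is hypothetical, and the residual-based reweighting controlled by it is deliberately left out, so the sketch corresponds to a single pass without robustness iterations.

```python
def lowess_fit(x, y, frac=2.0 / 3.0):
    """Local weighted linear fit at every x (no robustness iterations)."""
    n = len(x)
    # Number of neighbours used for each local fit.
    k = min(n, max(2, int(round(frac * n))))
    fitted = []
    for x0 in x:
        dist = [abs(xj - x0) for xj in x]
        h = sorted(dist)[k - 1]          # bandwidth: k-th smallest distance
        if h == 0:
            weights = [1.0 if d == 0 else 0.0 for d in dist]
        else:
            # Tricube kernel: weight 1 at x0, falling to 0 at distance h.
            weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        sw = sum(weights)
        mx = sum(w * xj for w, xj in zip(weights, x)) / sw
        my = sum(w * yj for w, yj in zip(weights, y)) / sw
        sxx = sum(w * (xj - mx) ** 2 for w, xj in zip(weights, x))
        if sxx == 0:
            fitted.append(my)            # degenerate case: local constant fit
        else:
            sxy = sum(w * (xj - mx) * (yj - my)
                      for w, xj, yj in zip(weights, x, y))
            fitted.append(my + (sxy / sxx) * (x0 - mx))
    return fitted
```

A smaller frac makes each fit more local and the curve follows the data more closely; a larger frac gives a smoother result.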

LowessSmoothingBiostats

class msproteomicstoolslib.math.Smoothing.LowessSmoothingBiostats

Bases: msproteomicstoolslib.math.Smoothing.LowessSmoothingBase

Smooths the data with a Lowess smoother and then interpolates on the result

LowessSmoothingCyLowess

class msproteomicstoolslib.math.Smoothing.LowessSmoothingCyLowess

Bases: msproteomicstoolslib.math.Smoothing.LowessSmoothingBase

Smooths the data with a Lowess smoother and then interpolates on the result

UnivarSplineNoCV

class msproteomicstoolslib.math.Smoothing.UnivarSplineNoCV

Smoothing of 2D data using a Python spline (no cross-validation).

Uses UnivariateSpline internally, which seems to have a tendency to overfit.

initialize(data1, data2)
predict(xhat)

UnivarSplineCV

class msproteomicstoolslib.math.Smoothing.UnivarSplineCV

Smoothing of 2D data using a Python spline (using cross-validation to determine the smoothing parameter).

Uses UnivariateSpline internally and chooses the scipy smoothing parameter “s” by cross-validation on part of the data (usually a 75/25 split between training and test data). This helps prevent overfitting to the data.

initialize(data1, data2, frac_training_data=0.75, max_iter=100, s_iter_decrease=0.75, verb=False)
predict(xhat)
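
The hold-out selection of a smoothing parameter described above can be sketched as follows. Several things here are assumptions rather than the library's behaviour: a Gaussian kernel smoother stands in for UnivariateSpline so that the example needs only the standard library, the way max_iter and s_iter_decrease are used is inferred from their names, and choose_smoothing, kernel_smooth and seed are hypothetical names.

```python
import math
import random


def kernel_smooth(train_x, train_y, s, x0):
    """Gaussian-kernel weighted average at x0; a larger s gives a smoother curve."""
    weights = [math.exp(-0.5 * ((xi - x0) / s) ** 2) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the nearest training point.
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
        return train_y[nearest]
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


def choose_smoothing(x, y, frac_training_data=0.75, max_iter=100,
                     s_iter_decrease=0.75, seed=0):
    """Pick the smoothing value with the lowest error on held-out data."""
    n = len(x)
    train_idx = set(random.Random(seed).sample(range(n), int(frac_training_data * n)))
    train_x = [x[i] for i in range(n) if i in train_idx]
    train_y = [y[i] for i in range(n) if i in train_idx]
    test = [(x[i], y[i]) for i in range(n) if i not in train_idx]

    s = max(x) - min(x)                  # start with heavy smoothing
    best_s, best_err = s, float("inf")
    for _ in range(max_iter):
        # Mean squared error on the held-out points for the current s.
        err = sum((kernel_smooth(train_x, train_y, s, tx) - ty) ** 2
                  for tx, ty in test) / len(test)
        if err < best_err:
            best_s, best_err = s, err
        s *= s_iter_decrease             # try a less smooth fit next
    return best_s, best_err
```

Because the error is measured on points the smoother has not seen, a value of s that merely memorises the training data scores poorly and is not selected.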

SmoothingEarth

class msproteomicstoolslib.math.Smoothing.SmoothingEarth

Class for MARS type smoothing based on pyearth

Get it at https://github.com/jcrudy/py-earth/

initialize(data1, data2)
predict(xhat)

SmoothingLinear

class msproteomicstoolslib.math.Smoothing.SmoothingLinear

Class for linear transformation

initialize(data1, data2)
predict(xhat)

SmoothingInterpolation

class msproteomicstoolslib.math.Smoothing.SmoothingInterpolation

Class for interpolation transformation

getLWP()
initialize(data1, data2)
predict(xhat)

LocalKernel

class msproteomicstoolslib.math.Smoothing.LocalKernel

Base class for local kernel smoothing

initialize(data1, data2)

WeightedNearestNeighbour

class msproteomicstoolslib.math.Smoothing.WeightedNearestNeighbour(topN, max_diff, min_diff, removeOutliers, exponent=1.0)

Bases: msproteomicstoolslib.math.Smoothing.LocalKernel

Class for weighted interpolation using local linear differences

This function uses the weighted mean of the k nearest neighbors to calculate the transformation. This method may be affected by a single outlier close to the transformation point.

Each neighboring point is given a weight equal to

           1
--------------------------
  abs( distance ) ** exp

up to a minimal distance min_diff after which the weight cannot increase any more.

predict(xhat)
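
The weighting scheme above can be sketched for a single query point as follows. This is an illustration under assumptions, not the library's code: the neighbors are taken to be the topN closest points within max_diff, the prediction is taken to be xhat plus the weighted mean of the local differences data2 - data1, the removeOutliers option is omitted, and the function name and default values are hypothetical.

```python
def weighted_nn_predict(data1, data2, xhat, topN=5, max_diff=30.0,
                        min_diff=0.1, exponent=1.0):
    """Shift xhat by the distance-weighted mean difference of its neighbors."""
    # Sort the (source, target) pairs by distance of the source value to xhat.
    pairs = sorted(zip(data1, data2), key=lambda p: abs(p[0] - xhat))
    neighbors = [(x, y) for x, y in pairs[:topN] if abs(x - xhat) <= max_diff]
    if not neighbors:
        # Assumption: without local information, leave the value unchanged.
        return xhat
    # Weight 1 / distance ** exponent, with the distance floored at min_diff
    # so that the weight cannot grow without bound.
    weights = [1.0 / max(abs(x - xhat), min_diff) ** exponent
               for x, _ in neighbors]
    diffs = [y - x for x, y in neighbors]
    shift = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    return xhat + shift
```

For example, with data1 = [1, 2, 3], data2 = [1, 2.5, 3], xhat = 2, topN = 3 and min_diff = 0.5, the weights are 2, 1 and 1, the differences are 0.5, 0 and 0, and the predicted value is 2 + 1/4 = 2.25.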

SmoothLLDMedian

class msproteomicstoolslib.math.Smoothing.SmoothLLDMedian(topN, max_diff, min_diff, removeOutliers)

Bases: msproteomicstoolslib.math.Smoothing.LocalKernel

Class for local median interpolation using local linear differences

This function uses the median of the k nearest neighbors to calculate the transformation. This is a robust, unweighted method: a single outlier will not substantially affect the result.

This method assumes that the data is locally smooth and linear.

predict(xhat)
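
The robustness of the median can be seen in a small sketch. As before, this is an illustration under assumptions rather than the library's code: the neighbors are taken to be the topN closest points within max_diff, the prediction is xhat plus the median of the local differences data2 - data1, removeOutliers is omitted, and the function name and defaults are hypothetical.

```python
from statistics import median


def lld_median_predict(data1, data2, xhat, topN=5, max_diff=30.0):
    """Shift xhat by the median difference of its nearest neighbors."""
    pairs = sorted(zip(data1, data2), key=lambda p: abs(p[0] - xhat))
    neighbors = [(x, y) for x, y in pairs[:topN] if abs(x - xhat) <= max_diff]
    if not neighbors:
        # Assumption: without local information, leave the value unchanged.
        return xhat
    return xhat + median([y - x for x, y in neighbors])


# One grossly wrong pair (3 -> 103) among otherwise consistent +0.5 shifts.
source = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [1.5, 2.5, 103.0, 4.5, 5.5]
predicted = lld_median_predict(source, target, 3.0)   # 3.5, outlier ignored
```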

LinearRegression Module

SimpleLinearRegression

class msproteomicstoolslib.math.LinearRegression.SimpleLinearRegression(data)

Helper class for calculating a linear function

function(x)

Linear function (be aware of the current coefficient of correlation)

run()

Calculates the coefficient of correlation and the parameters for the linear function
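
As a worked illustration of the quantities involved, the sketch below computes the least-squares slope and intercept together with the Pearson correlation coefficient. It assumes the input is a sequence of (x, y) pairs, which this reference does not state, and the function name fit_line is hypothetical. It is not the class's own implementation.

```python
import math


def fit_line(points):
    """Return (slope, intercept, r) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
# The points lie exactly on y = 2x + 1, so slope is 2, intercept is 1, r is 1.
```

A correlation coefficient close to 1 or -1 indicates that the fitted line describes the data well; a value near 0 means predictions from the linear function should be treated with caution.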

Chauvenet Module

Chauvenet

msproteomicstoolslib.math.chauvenet.chauvenet(x, y, mean=None, stdv=None)
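
Only the signature is given here. As background, Chauvenet's criterion treats a value as an outlier when, under a normal distribution, the probability of a deviation at least that large from the mean is below 1/(2N) for N samples. The sketch below applies that rule to the y values only and returns a list of booleans (True means keep). The return type, the function name chauvenet_keep_mask and the omission of the x argument are assumptions made for illustration; the behaviour of the library function is not documented in this section.

```python
import math
from statistics import fmean, pstdev


def chauvenet_keep_mask(y, mean=None, stdv=None):
    """Return True for every value that passes Chauvenet's criterion."""
    n = len(y)
    if mean is None:
        mean = fmean(y)
    if stdv is None:
        stdv = pstdev(y, mean)           # population standard deviation
    if stdv == 0:
        return [True] * n                # all values identical: nothing to reject
    criterion = 1.0 / (2 * n)
    keep = []
    for value in y:
        # Deviation from the mean in units of the standard deviation.
        d = abs(value - mean) / stdv
        # Two-sided tail probability of a deviation at least this large.
        prob = math.erfc(d / math.sqrt(2.0))
        keep.append(prob >= criterion)
    return keep


values = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0]
mask = chauvenet_keep_mask(values)       # only the last value is rejected
```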