Alignment executables¶

`FeatureAlignment` executable¶

The Feature Alignment executable can be run as

python feature_alignment.py

and, for help, please use

python feature_alignment.py --help

The most commonly used options are the following:

fdr_cutoff¶

This is the seeding score cutoff. If a precursor has an identification in at least one run that passes this cutoff, it will be included for alignment.

max_fdr_quality¶

This is the extension score cutoff. During each step of the algorithm, a peakgroup from a new run is added to the initial seed (see above). Only if the additional peakgroup in the new run has a score better than max_fdr_quality will it be included in the final result.
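The interplay of the two cutoffs can be illustrated with a small sketch. It is not the tool's implementation: the function name, the dictionary layout, and the assumption that scores are q-value-like (lower is better, cutoff inclusive) are all hypothetical, and retention time is ignored entirely.

```python
# Toy sketch of seeding (fdr_cutoff) and extension (max_fdr_quality).
# Hypothetical data layout and function name; not the tool's API.

def select_peakgroups(scores_by_run, fdr_cutoff, max_fdr_quality):
    """Return the run ids whose peakgroup is kept for one precursor.

    scores_by_run maps run id -> q-value-like score (lower is better).
    """
    # Seeding: at least one run must pass the strict cutoff.
    seeds = [run for run, score in scores_by_run.items() if score <= fdr_cutoff]
    if not seeds:
        return []
    # Extension: peakgroups from other runs are added if they pass
    # the looser cutoff (the seed passes it as well).
    return sorted(
        run for run, score in scores_by_run.items() if score <= max_fdr_quality
    )


precursor_a = {"run1": 0.005, "run2": 0.03, "run3": 0.20}
precursor_b = {"run1": 0.04, "run2": 0.03}

kept_a = select_peakgroups(precursor_a, fdr_cutoff=0.01, max_fdr_quality=0.05)
kept_b = select_peakgroups(precursor_b, fdr_cutoff=0.01, max_fdr_quality=0.05)
```

In this sketch precursor_a is seeded by run1 and extended to run2, while precursor_b is dropped because no run passes the seeding cutoff.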

target_fdr¶

Experimental option for dynamic parameter estimation of the fdr_cutoff parameter. If you want to use this, please turn off fdr_cutoff (but max_fdr_quality still needs to be set).

method¶

Defines the method to use for the clustering. Available options are

best_overall
best_cluster_score
global_best_cluster_score
global_best_overall
LocalMST
LocalMSTAllCluster

Note that the MST options will perform a local, MST-guided alignment, while the other options will use a reference-guided alignment. The global options will also move peaks which are below the selected FDR threshold (while best_overall and best_cluster_score will not touch any peak that is below fdr_cutoff).
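To give an idea of what an MST over runs looks like, the following sketch builds a minimum spanning tree from a pairwise distance matrix using Prim's algorithm. The distance values and the function name are made up for illustration; how the tool itself computes run distances and builds its tree is not shown here.

```python
# Sketch: minimum spanning tree over runs (Prim's algorithm).
# The distance matrix is invented; run indices stand in for run ids.

def minimum_spanning_tree(dist):
    """Return the MST of a symmetric distance matrix as sorted (i, j) edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge connecting the tree to a run outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda edge: dist[edge[0]][edge[1]],
        )
        in_tree.add(j)
        edges.append((min(i, j), max(i, j)))
    return sorted(edges)


run_distances = [
    [0.0, 1.0, 4.0, 6.0],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 3.0],
    [6.0, 5.0, 3.0, 0.0],
]
tree_edges = minimum_spanning_tree(run_distances)
```

In a tree-guided alignment, each run would then only be aligned directly against its neighbours in the tree, rather than against one common reference run.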

realign_method¶

Method to use to re-align retention times between pairs of runs. The following options are available:

None: use the raw RT from the file (not recommended)
diRT: use only deltaiRT from the input file
linear: perform a linear regression using best peakgroups
splineR: perform a spline fit using R (this feature relies on the rpy2 package)
splineR_external: perform a spline fit using R (start an R process using the command line, not tested under Windows)
splinePy: use Python native spline from scikits.datasmooth (not recommended, very slow)
nonCVSpline, CVSpline: splines with and without cross-validation from scipy.interpolate
lowess: use robust locally weighted regression (lowess smoother)
earth: use multivariate adaptive regression splines using py-earth
WeightedNearestNeighbour: the weighted RT of the nearest neighbours is used
SmoothLLDMedian: a local kernel of linear differences is computed

Recommended options are CVSpline and splineR (if you have R). Both WeightedNearestNeighbour and SmoothLLDMedian also gave acceptable results.
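As a rough illustration of the simplest case, the linear option, the sketch below fits a straight line between the retention times of matched peakgroups in two runs and uses it to map a retention time from one run to the other. The anchor values and helper name are invented, and the tool's own selection of "best peakgroups" is not reproduced.

```python
# Sketch: linear retention-time mapping between two runs.
# Anchor points are invented; fit_linear is a hypothetical helper.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Retention times of the same peakgroups observed in run A and run B.
rt_run_a = [10.0, 20.0, 30.0, 40.0]
rt_run_b = [12.0, 32.0, 52.0, 72.0]

slope, intercept = fit_linear(rt_run_a, rt_run_b)

# Map a retention time from the frame of run A into the frame of run B.
predicted_rt_b = slope * 25.0 + intercept
```

The spline and lowess options replace the straight line with a smoother curve, which helps when the retention time shift between runs is not linear.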

`FeatureAlignment` Module¶

class feature_alignment.AlignmentStatistics¶

Bases: object

count(multipeptides, fdr_cutoff, runs, skipDecoy=True)¶

to_yaml()¶

class feature_alignment.Experiment¶

Bases: msproteomicstoolslib.algorithms.alignment.MRExperiment.MRExperiment

An Experiment is a container for multiple experimental runs - some of which may contain the same precursors.

estimate_real_fdr(multipeptides, fraction_needed_selected)¶

print_stats(multipeptides, fdr_cutoff, fraction_present, min_nrruns)¶

write_to_file(multipeptides, options, alignment, tree=None, writeTrafoFiles=True)¶

feature_alignment.estimate_aligned_fdr_cutoff(options, this_exp, multipeptides, fdr_range)¶

feature_alignment.doMSTAlignment(exp, multipeptides, max_rt_diff, rt_diff_isotope, initial_alignment_cutoff, fdr_cutoff, aligned_fdr_cutoff, smoothing_method, method, use_RT_correction, stdev_max_rt_per_run, use_local_stdev, mst_use_ref, force, optimized_cython)¶

Minimum Spanning Tree (MST) based local alignment

feature_alignment.doParameterEstimation(options, this_exp, multipeptides)¶

Perform (q-value) parameter estimation

feature_alignment.doReferenceAlignment(options, this_exp, multipeptides)¶

feature_alignment.main(options)¶

`Noise imputation` Module¶

requantAlignedValues.runSingleFileImputation(options, peakgroups_file, mzML_file, method, is_test)¶

Impute values across chromatograms

Parameters:

peakgroups_file (filename) – CSV file containing all peakgroups
mzML_file (filename) – mzML file containing chromatograms
method (string) – which method to use for imputation (“singleShortestPath”, “singleClosestRun”)
is_test (bool) – whether test mode should be used

Returns:

A tuple of:

new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
rid (string) – run id of the analyzed run

This function will read the CSV file with all peakgroups as well as the provided chromatogram file (.chrom.mzML). It will then try to impute missing values for those peakgroups where no value is currently present, reading the raw chromatograms.

requantAlignedValues.runImputeValues(options, peakgroups_file, trafo_fnames, is_test)¶

Impute values across chromatograms

Parameters:

peakgroups_file (filename) – CSV file containing all peakgroups
trafo_fnames (filename) – a list of .tr filenames (the chromatogram mzML files are assumed to reside in the same directory)

Returns:

A tuple of:

new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
rid (string) – run id of the analyzed run

This function will read the CSV file with all peakgroups as well as the transformation files (.tr) and the corresponding raw chromatograms, which need to be in the same folder. It will then try to impute missing values for those peakgroups where no value is currently present, reading the raw chromatograms.

requantAlignedValues.analyze_multipeptides(new_exp, multipeptides, swath_chromatograms, transformation_collection_, border_option, onlyExtractFromRun=None, tree=None, mat=None, disable_isotopic_transfer=False, is_test=False)¶

Analyze the multipeptides and impute missing values

This function has three different modes:

if a tree is given, it will use the single shortest path on the tree to infer the boundaries

if a distance matrix is given, it will use the closest run overall to infer the boundaries

if neither is given, it will use a mean / median / maximum of all borders by converting all boundaries to the frame of the reference run and then to the current run

Parameters:

new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML (see runImputeValues)
transformation_collection_ (TransformationCollection) – specifying how to transform between retention times of different runs
border_option (String) – one of the following options (“mean”, “median”, “max_width”), determining how to aggregate multiple peak boundary information
onlyExtractFromRun (String) – whether to perform signal extraction from a single run only (if not None, needs to be a run id)
tree (MinimumSpanningTree) – alignment guidance tree (if given, will use shortest path approach)
mat (matrix(float)) – distance matrix, see getDistanceMatrix (if given, will use closest run approach)
disable_isotopic_transfer (bool) – whether to use isotopic grouping (e.g. group heavy/light channels together)

Returns:

The updated multipeptides

This function will update the input multipeptides and add peakgroups, imputing missing values
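The third mode, aggregating borders from several runs, might look roughly like the sketch below. It assumes the borders have already been transformed into the retention time frame of the current run. The function name is hypothetical, and the reading of “max_width” as the widest window covering all borders is an assumption, since its exact definition is not given here.

```python
# Sketch: aggregating peak borders from several runs.
# aggregate_borders is a hypothetical helper; the meaning of
# "max_width" (widest enclosing window) is an assumption.
from statistics import mean, median


def aggregate_borders(borders, border_option):
    """Combine (left, right) borders from several runs into one pair."""
    lefts = [left for left, _ in borders]
    rights = [right for _, right in borders]
    if border_option == "mean":
        return mean(lefts), mean(rights)
    if border_option == "median":
        return median(lefts), median(rights)
    if border_option == "max_width":
        # Assumed interpretation: window covering every observed border.
        return min(lefts), max(rights)
    raise ValueError("unknown border_option: %r" % border_option)


# Borders (in seconds) already mapped into the current run's RT frame.
borders = [(100.0, 120.0), (102.0, 126.0), (98.0, 123.0)]

mean_borders = aggregate_borders(borders, "mean")
median_borders = aggregate_borders(borders, "median")
widest_borders = aggregate_borders(borders, "max_width")
```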

requantAlignedValues.analyze_multipeptide_cluster(current_mpep, cnt, new_exp, swath_chromatograms, transformation_collection_, border_option, selected_pg, cluster_id, onlyExtractFromRun=None, tree=None, mat=None, is_test=False)¶

requantAlignedValues.integrate_chromatogram(template_pg, current_run, swath_chromatograms, left_start, right_end, cnt, is_test)¶

Integrate a chromatogram from left_start to right_end and store the sum.

Parameters:

template_pg (GeneralPeakGroup) – A template peakgroup from which to construct the new peakgroup
current_run (SWATHScoringReader.Run) – current run where the missing value occurred
swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML
left_start (float) – retention time for integration (left border)
right_end (float) – retention time for integration (right border)

Returns:

A new GeneralPeakGroup which contains the new, integrated intensity for this run (or “NA” if no chromatograms could be found).

Create a new peakgroup from the old pg and then store the integrated intensity.
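The integration step itself can be pictured as a plain sum of intensities between the two borders. The sketch below works on bare lists rather than on the chromatogram and peakgroup objects used by the module; the function name and the inclusive treatment of both borders are assumptions.

```python
# Sketch: summing chromatogram intensities between two borders.
# Plain lists stand in for the module's chromatogram objects.

def integrate_window(rts, intensities, left_start, right_end):
    """Sum intensities whose retention time lies in [left_start, right_end]."""
    return sum(
        intensity
        for rt, intensity in zip(rts, intensities)
        if left_start <= rt <= right_end
    )


rts = [10.0, 11.0, 12.0, 13.0, 14.0]
intensities = [5.0, 20.0, 50.0, 15.0, 2.0]

area = integrate_window(rts, intensities, left_start=11.0, right_end=13.0)
```

Because the value is a sum of background signal inside the expected window, it can serve as a noise estimate for runs where no peak was originally picked.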

requantAlignedValues.write_out(new_exp, multipeptides, outfile, matrix_outfile, single_outfile=None)¶

Write the result to disk

This writes all peakgroups to disk (newly imputed ones as well as previously found ones), since even some “previously good” peakgroups may have changed location due to isotopic_transfer.

requantAlignedValues.main(options)¶

class requantAlignedValues.ImputeValuesHelper¶

Bases: object

Static object with some helper methods.

static select_correct_swath(swath_chromatograms, mz)¶

Select the correct chromatogram

Parameters:

swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML (see runImputeValues)
mz (float) – the mz value of the precursor

class requantAlignedValues.SwathChromatogramRun¶

Bases: object

A single SWATH LC-MS/MS run.

Each run may contain multiple files (split up by swath).

getChromatogram(chromid)¶

parse(runid, files)¶

Parse a set of files which all belong to the same experiment

class requantAlignedValues.SwathChromatogramCollection¶

Bases: object

A collection of multiple SWATH LC-MS/MS runs.

Each single run is represented as a SwathChromatogramRun and accessible through a run id.

>>> mzml_files = ["file1.mzML", "file2.mzML"]
>>> runMapping = {"file1.mzML": "run1", "file2.mzML" : "run2"}
>>> chromatograms = SwathChromatogramCollection()
>>> chromatograms.parseFromMzML(mzml_files, runMapping)

>>> chromatogram = chromatograms.getChromatogram("run1", "ChromatogramId")
>>> chromatogram = chromatograms.getChromatogram("run2", "ChromatogramId")

createRunCache(runid)¶

getChromatogram(runid, chromid)¶

getRunIDs()¶

parseFromMzML(mzML_files, runIdMapping)¶

Parse a set of different experiments.

Parameters:

mzML_files (list(filename)) – a list of mzML filenames (chromatogram mzML)
runIdMapping (dict) – a dictionary mapping each filename to a run id

parseFromSqMass(files, runIdMapping)¶

Parse a set of different experiments.

Parameters:

files (list(filename)) – a list of sqMass filenames
runIdMapping (dict) – a dictionary mapping each filename to a run id

parseFromTrafoFiles(trafo_fnames)¶

Parse a set of different experiments from the .tr files

The mzML files belonging to the same run are assumed to be in the same folder as the .tr files.
