Alignment executables

FeatureAlignment executable

The Feature Alignment executable can be run as

python feature_alignment.py

and for help, please use

python feature_alignment.py --help

Some of the most used options are the following

fdr_cutoff

This is the seeding score cutoff: a precursor is included in the alignment if it has an identification in at least one run that meets this cutoff.

max_fdr_quality

This is the extension score cutoff. During each step of the algorithm, a peakgroup from a new run is added to the initial seed (see above). Only if the additional peakgroup in the new run has a score better than max_fdr_quality will it be included in the final result.
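
The interplay of the two cutoffs can be illustrated with a minimal sketch. It assumes that scores are q-values (smaller is better) and it ignores retention time entirely, which the real alignment does take into account; the function name and the input dictionary are hypothetical and not part of feature_alignment.py.

```python
def select_peakgroups(qvalues_by_run, fdr_cutoff, max_fdr_quality):
    """Return the run ids whose peakgroup would be kept for one precursor.

    qvalues_by_run maps a run id to the q-value of the precursor's
    peakgroup in that run (hypothetical input structure).
    """
    # Seeding: at least one run must meet the strict fdr_cutoff.
    seeded = any(q <= fdr_cutoff for q in qvalues_by_run.values())
    if not seeded:
        return []
    # Extension: keep every run whose peakgroup is better than the
    # looser max_fdr_quality cutoff.
    return sorted(run for run, q in qvalues_by_run.items() if q < max_fdr_quality)


precursor = {"run1": 0.001, "run2": 0.03, "run3": 0.2}
print(select_peakgroups(precursor, fdr_cutoff=0.01, max_fdr_quality=0.05))
# ['run1', 'run2']
```

Here run1 seeds the precursor, run2 is added during extension, and run3 is rejected because its score is worse than max_fdr_quality.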

target_fdr

Experimental option for dynamic parameter estimation of the fdr_cutoff parameter. If you want to use this, please turn off fdr_cutoff (but max_fdr_quality still needs to be set).

method

Defines the method to use for the clustering. Available options are

  • best_overall
  • best_cluster_score
  • global_best_cluster_score
  • global_best_overall
  • LocalMST
  • LocalMSTAllCluster

Note that the MST options will perform a local, MST-guided alignment while the other options will use a reference-guided alignment. The global options (global_best_cluster_score and global_best_overall) will also move peaks which are below the selected FDR threshold, while best_overall and best_cluster_score will not touch any peak that is below fdr_cutoff.

realign_method

Method to use to re-align retention times between pairs of runs. The following options are available:

  • None: use the raw RT from the file (not recommended)
  • diRT: use only deltaiRT from the input file
  • linear: perform a linear regression using best peakgroups
  • splineR: perform a spline fit using R (this feature relies on the rpy2 package)
  • splineR_external: perform a spline fit using R (start an R process using the command line, not tested under Windows)
  • splinePy: use Python native spline from scikits.datasmooth (not recommended, very slow)
  • nonCVSpline, CVSpline: splines with and without cross-validation from scipy.interpolate
  • lowess: use Robust locally weighted regression (lowess smoother)
  • earth: use Multivariate Adaptive Regression Splines from py-earth
  • WeightedNearestNeighbour: the weighted RT of the nearest neighbours is used
  • SmoothLLDMedian: a local kernel of linear differences is computed
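
As an illustration of what re-aligning retention times between a pair of runs means, the following sketch fits a straight line through anchor peakgroups seen in both runs, roughly in the spirit of the linear option. It is a plain least-squares fit written for this example, not the code used by feature_alignment.py, and the anchor retention times are made up.

```python
def fit_linear(rt_source, rt_target):
    """Least-squares fit of rt_target ~ slope * rt_source + intercept."""
    n = len(rt_source)
    mean_x = sum(rt_source) / n
    mean_y = sum(rt_target) / n
    sxx = sum((x - mean_x) ** 2 for x in rt_source)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rt_source, rt_target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical anchor peakgroups observed in both runs (RT in seconds).
rt_run_a = [600.0, 1200.0, 1800.0, 2400.0]
rt_run_b = [630.0, 1230.0, 1830.0, 2430.0]  # run B elutes 30 s later

slope, intercept = fit_linear(rt_run_a, rt_run_b)
# Map a retention time from run A into the time frame of run B.
predicted = slope * 1500.0 + intercept
print(round(slope, 3), round(intercept, 1), round(predicted, 1))
# 1.0 30.0 1530.0
```

The spline and lowess options replace the straight line with a smooth curve, which can follow non-linear drift between runs.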

Recommended options are CVSpline and splineR (if you have R). Both WeightedNearestNeighbour and SmoothLLDMedian gave acceptable results.
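
The sketch below assembles a command line that combines the options described above. The option names come from this section; the `--in` and `--out` flags, the `--name value` form, the file names, and the numeric cutoffs are assumptions made for illustration and should be checked against the output of `--help`. The command is only built and printed, not executed.

```python
import shlex


def build_command(infiles, outfile, **options):
    """Build an argument list for feature_alignment.py (flag names assumed)."""
    cmd = ["python", "feature_alignment.py", "--in", *infiles, "--out", outfile]
    for name, value in options.items():
        cmd += ["--" + name, str(value)]
    return cmd


cmd = build_command(
    ["run1.tsv", "run2.tsv"],  # hypothetical input files
    "aligned.tsv",             # hypothetical output file
    fdr_cutoff=0.01,
    max_fdr_quality=0.05,
    method="LocalMST",
    realign_method="CVSpline",
)
print(shlex.join(cmd))
```

Building the list programmatically makes it easy to run the same alignment with several parameter settings, for example by passing the list to `subprocess.run`.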

FeatureAlignment Module

class feature_alignment.AlignmentStatistics

Bases: object

count(multipeptides, fdr_cutoff, runs, skipDecoy=True)
to_yaml()
class feature_alignment.Experiment

Bases: msproteomicstoolslib.algorithms.alignment.MRExperiment.MRExperiment

An Experiment is a container for multiple experimental runs - some of which may contain the same precursors.

estimate_real_fdr(multipeptides, fraction_needed_selected)
print_stats(multipeptides, fdr_cutoff, fraction_present, min_nrruns)
write_to_file(multipeptides, options, alignment, tree=None, writeTrafoFiles=True)
feature_alignment.estimate_aligned_fdr_cutoff(options, this_exp, multipeptides, fdr_range)
feature_alignment.doMSTAlignment(exp, multipeptides, max_rt_diff, rt_diff_isotope, initial_alignment_cutoff, fdr_cutoff, aligned_fdr_cutoff, smoothing_method, method, use_RT_correction, stdev_max_rt_per_run, use_local_stdev, mst_use_ref, force, optimized_cython)

Minimum Spanning Tree (MST) based local alignment
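
To make the idea of a guidance tree concrete, the following sketch computes a minimum spanning tree over runs from a pairwise distance matrix using Prim's algorithm. It is a generic textbook implementation, not the one used by doMSTAlignment, and the distance values are hypothetical.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix (list of lists).

    Returns the tree as a sorted list of (i, j) edges with i < j.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge that connects the tree to a new run.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        in_tree.add(j)
        edges.append((min(i, j), max(i, j)))
    return sorted(edges)


# Hypothetical pairwise distances between four runs.
distances = [
    [0.0, 1.0, 4.0, 6.0],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 3.0],
    [6.0, 5.0, 3.0, 0.0],
]
print(minimum_spanning_tree(distances))
# [(0, 1), (1, 2), (2, 3)]
```

In a tree like this, each run is only aligned against its direct neighbours, so retention times are transferred between similar runs rather than through a single reference run.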

feature_alignment.doParameterEstimation(options, this_exp, multipeptides)

Perform (q-value) parameter estimation

feature_alignment.doReferenceAlignment(options, this_exp, multipeptides)
feature_alignment.main(options)

Noise imputation Module

requantAlignedValues.runSingleFileImputation(options, peakgroups_file, mzML_file, method, is_test)

Impute values across chromatograms

Parameters:
  • peakgroups_file (filename) – CSV file containing all peakgroups
  • mzML_file (filename) – mzML file containing chromatograms
  • method (string) – which method to use for imputation (“singleShortestPath”, “singleClosestRun”)
  • is_test (bool) – whether test mode should be used
Returns:

A tuple of:
  • new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
  • multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
  • rid (string) – run id of the analyzed run

This function will read the csv file with all peakgroups as well as the provided chromatogram file (.chrom.mzML). It will then try to impute missing values for those peakgroups where no value is currently present, by reading the raw chromatograms.

requantAlignedValues.runImputeValues(options, peakgroups_file, trafo_fnames, is_test)

Impute values across chromatograms

Parameters:
  • peakgroups_file (filename) – CSV file containing all peakgroups
  • trafo_fnames (list(filename)) – a list of .tr filenames (the chromatogram mzML files are assumed to reside in the same directory)
Returns:

A tuple of:
  • new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
  • multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
  • rid (string) – run id of the analyzed run

This function will read the csv file with all peakgroups as well as the transformation files (.tr) and the corresponding raw chromatograms, which need to be in the same folder. It will then try to impute missing values for those peakgroups where no value is currently present, by reading the raw chromatograms.

requantAlignedValues.analyze_multipeptides(new_exp, multipeptides, swath_chromatograms, transformation_collection_, border_option, onlyExtractFromRun=None, tree=None, mat=None, disable_isotopic_transfer=False, is_test=False)

Analyze the multipeptides and impute missing values

This function has three different modes:

  • if a tree is given, it will use the single shortest path on the tree to infer the boundaries
  • if a distance matrix is given, it will use the closest run overall to infer the boundaries
  • if neither is given, it will use a mean / median / maximum of all borders by converting all boundaries to the frame of the reference run and then to the current run
Parameters:
  • new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
  • multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
  • swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML (see runImputeValues)
  • transformation_collection_ (TransformationCollection) – specifying how to transform between retention times of different runs
  • border_option (String) – one of the following options (“mean”, “median”, “max_width”), determining how to aggregate multiple peak boundary information
  • onlyExtractFromRun (String) – Whether only to perform signal extraction from a single run (if not None, needs to be a run id)
  • tree (MinimumSpanningTree) – alignment guidance tree (if given, will use shortest path approach)
  • mat (matrix(float)) – distance matrix, see getDistanceMatrix (if given, will use closest run approach)
  • disable_isotopic_transfer (bool) – whether to disable isotopic grouping (e.g. grouping heavy/light channels together)
Returns:

The updated multipeptides

This function will update the input multipeptides and add peakgroups, imputing missing values
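
The role of border_option can be sketched as follows. The example assumes that the peak boundaries from the other runs have already been converted into the retention time frame of the current run, and it reads “max_width” as taking the widest window (smallest left border, largest right border). Both points are assumptions of this sketch; the function name is hypothetical.

```python
from statistics import mean, median


def aggregate_borders(borders, border_option):
    """Combine several (left, right) peak borders into one integration window."""
    lefts = [left for left, _ in borders]
    rights = [right for _, right in borders]
    if border_option == "mean":
        return mean(lefts), mean(rights)
    if border_option == "median":
        return median(lefts), median(rights)
    if border_option == "max_width":
        # Assumed meaning: the widest window covering all borders.
        return min(lefts), max(rights)
    raise ValueError("unknown border_option: %r" % border_option)


# Hypothetical borders (in seconds) transferred from three other runs.
borders = [(100.0, 130.0), (104.0, 128.0), (98.0, 140.0)]
print(aggregate_borders(borders, "median"))     # (100.0, 130.0)
print(aggregate_borders(borders, "max_width"))  # (98.0, 140.0)
```

The median is less sensitive to a single run with unusually wide borders, whereas the widest window guards against truncating the peak at the cost of integrating more background.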

requantAlignedValues.analyze_multipeptide_cluster(current_mpep, cnt, new_exp, swath_chromatograms, transformation_collection_, border_option, selected_pg, cluster_id, onlyExtractFromRun=None, tree=None, mat=None, is_test=False)
requantAlignedValues.integrate_chromatogram(template_pg, current_run, swath_chromatograms, left_start, right_end, cnt, is_test)

Integrate a chromatogram from left_start to right_end and store the sum.

Parameters:
  • template_pg (GeneralPeakGroup) – A template peakgroup from which to construct the new peakgroup
  • current_run (SWATHScoringReader.Run) – current run where the missing value occurred
  • swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML
  • left_start (float) – retention time for integration (left border)
  • right_end (float) – retention time for integration (right border)
Returns:

A new GeneralPeakGroup which contains the new, integrated intensity for this run (or “NA” if no chromatograms could be found).

Create a new peakgroup from the old pg and then store the integrated intensity.
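
The integration step itself can be sketched as a sum over all data points inside the window. The example works on plain lists for a single chromatogram and treats both borders as inclusive, which is an assumption; it does not use the GeneralPeakGroup or chromatogram classes of the library.

```python
def integrate(rts, intensities, left_start, right_end):
    """Sum the intensities of all points with left_start <= RT <= right_end."""
    return sum(
        intensity
        for rt, intensity in zip(rts, intensities)
        if left_start <= rt <= right_end
    )


# Hypothetical chromatogram: retention times (s) and intensities.
rts = [10.0, 11.0, 12.0, 13.0, 14.0]
intensities = [5.0, 20.0, 80.0, 25.0, 4.0]
print(integrate(rts, intensities, 11.0, 13.0))  # 125.0
```

If no data points fall inside the window the sum is zero, which is different from the “NA” case above where no chromatogram could be found at all.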

requantAlignedValues.write_out(new_exp, multipeptides, outfile, matrix_outfile, single_outfile=None)

Write the result to disk

This writes all peakgroups to disk (newly imputed ones as well as previously found ones), as even some “previously good” peakgroups may have changed location due to isotopic_transfer.

requantAlignedValues.main(options)
class requantAlignedValues.ImputeValuesHelper

Bases: object

Static object with some helper methods.

static select_correct_swath(swath_chromatograms, mz)

Select the correct chromatogram

Parameters:
  • swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML (see runImputeValues)
  • mz (float) – the mz value of the precursor
class requantAlignedValues.SwathChromatogramRun

Bases: object

A single SWATH LC-MS/MS run.

Each run may contain multiple files (split up by swath).

getChromatogram(chromid)
parse(runid, files)

Parse a set of files which all belong to the same experiment

class requantAlignedValues.SwathChromatogramCollection

Bases: object

A collection of multiple SWATH LC-MS/MS runs.

Each single run is represented as a SwathChromatogramRun and accessible through a run id.

>>> mzml_files = ["file1.mzML", "file2.mzML"]
>>> runMapping = {"file1.mzML": "run1", "file2.mzML" : "run2"}
>>> chromatograms = SwathChromatogramCollection()
>>> chromatograms.parseFromMzML(mzml_files, runMapping)
>>> chromatogram = chromatograms.getChromatogram("run1", "ChromatogramId")
>>> chromatogram = chromatograms.getChromatogram("run2", "ChromatogramId")
createRunCache(runid)
getChromatogram(runid, chromid)
getRunIDs()
parseFromMzML(mzML_files, runIdMapping)

Parse a set of different experiments.

Parameters:
  • mzML_files (list(filename)) – a list of mzML filenames (chromatogram mzML)
  • runIdMapping (dict) – a dictionary mapping each filename to a run id
parseFromSqMass(files, runIdMapping)

Parse a set of different experiments.

Parameters:
  • files (list(filename)) – a list of sqMass filenames
  • runIdMapping (dict) – a dictionary mapping each filename to a run id
parseFromTrafoFiles(trafo_fnames)

Parse a set of different experiments from the .tr files

The mzML files belonging to the same run are assumed to be in the same folder as the .tr files.