Alignment executables
FeatureAlignment executable
The Feature Alignment executable can be run as
python feature_alignment.py
For help, use
python feature_alignment.py --help
Some of the most used options are the following
fdr_cutoff
This is the seeding score cutoff. If a precursor has an identification in one run with at least this score, it will be included for alignment.
max_fdr_quality
This is the extension score cutoff. During each step of the algorithm, a peakgroup from a new run is added to the initial seed (see above). Only if the additional peakgroup in the new run has a score better than max_fdr_quality will it be included in the final result.
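The interplay of the two cutoffs can be illustrated with a minimal sketch. It assumes that scores are q-values (lower is better) and that each run contributes only its best-scoring peakgroup per precursor; the function name `select_runs_for_precursor` and the input dictionary are hypothetical and not part of feature_alignment.py, which additionally takes retention times into account when choosing peakgroups.

```python
def select_runs_for_precursor(best_score_per_run, fdr_cutoff, max_fdr_quality):
    """Return the run ids whose peakgroup would be kept for one precursor.

    best_score_per_run maps a run id to the score of the best peakgroup
    in that run (assumption: a q-value, so lower means better).
    """
    # Seeding: at least one run must reach the strict seeding cutoff,
    # otherwise the precursor is not considered for alignment at all.
    if not any(score <= fdr_cutoff for score in best_score_per_run.values()):
        return []
    # Extension: peakgroups from the other runs are added only if they
    # score better than the (usually looser) extension cutoff.
    return sorted(
        run_id
        for run_id, score in best_score_per_run.items()
        if score < max_fdr_quality
    )


scores = {"run1": 0.005, "run2": 0.03, "run3": 0.2}
print(select_runs_for_precursor(scores, fdr_cutoff=0.01, max_fdr_quality=0.05))
```

In this toy input, run1 acts as the seed, run2 is added during extension, and run3 is left out because its score is worse than max_fdr_quality.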
target_fdr
Experimental option for dynamic parameter estimation of the fdr_cutoff parameter. If you want to use this, please turn off fdr_cutoff (but max_fdr_quality still needs to be set).
method
Defines the method to use for the clustering. Available options are
- best_overall
- best_cluster_score
- global_best_cluster_score
- global_best_overall
- LocalMST
- LocalMSTAllCluster
Note that the MST options (LocalMST, LocalMSTAllCluster) perform a local, MST-guided alignment, while the other options use a reference-guided alignment. The global options (global_best_cluster_score, global_best_overall) will also move peaks which are below the selected FDR threshold, whereas best_overall and best_cluster_score will not touch any peak that is below fdr_cutoff.
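To make the idea of an MST-guided alignment more concrete, the following sketch builds a minimum spanning tree over runs with Prim's algorithm. It assumes that a symmetric matrix of pairwise run distances is already available; the function name `minimum_spanning_tree` and the example distances are hypothetical, and this is not the implementation used by feature_alignment.py.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix (list of lists).

    Returns a list of edges (i, j), where i was already in the tree
    when run j was attached to it.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = {0}  # start from the first run
    edges = []
    while len(in_tree) < n:
        best_edge, best_dist = None, None
        # Find the cheapest edge that connects the tree to a new run
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best_dist is None or dist[i][j] < best_dist:
                    best_edge, best_dist = (i, j), dist[i][j]
        edges.append(best_edge)
        in_tree.add(best_edge[1])
    return edges


# Hypothetical pairwise distances between three runs
distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
print(minimum_spanning_tree(distances))
```

With such a tree, each run is only aligned against its direct neighbours (the most similar runs) instead of against a single reference run.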
realign_method
Method to use to re-align retention times between pairs of runs. The following options are available:
- None: use the raw RT from the file (not recommended)
- diRT: use only deltaiRT from the input file
- linear: perform a linear regression using best peakgroups
- splineR: perform a spline fit using R (this feature relies on the rpy2 package)
- splineR_external: perform a spline fit using R (start an R process using the command line, not tested under Windows)
- splinePy: use Python native spline from scikits.datasmooth (not recommended, very slow)
- nonCVSpline, CVSpline: splines with and without cross-validation from scipy.interpolate
- lowess: use robust locally weighted regression (lowess smoother)
- earth: use Multivariate Adaptive Regression Splines using py-earth
- WeightedNearestNeighbour: the weighted RT of the nearest neighbours is used
- SmoothLLDMedian: a local kernel of linear differences is computed
Recommended options are CVSpline and splineR (the latter if you have R). Both WeightedNearestNeighbour and SmoothLLDMedian gave acceptable results.
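As an illustration of what re-aligning retention times between a pair of runs means, here is a minimal sketch of the simplest case, a linear fit (comparable in spirit to the linear option). It assumes that matched retention times of confidently identified peakgroups in both runs are given; the function names `fit_linear_rt` and `transform_rt` are hypothetical, and the spline and lowess options replace the straight line with a smoother curve.

```python
def fit_linear_rt(rt_source, rt_target):
    """Least-squares fit of rt_target = slope * rt_source + intercept."""
    n = len(rt_source)
    if n < 2 or n != len(rt_target):
        raise ValueError("need at least two matched retention time pairs")
    mean_x = sum(rt_source) / n
    mean_y = sum(rt_target) / n
    sxx = sum((x - mean_x) ** 2 for x in rt_source)
    if sxx == 0:
        raise ValueError("source retention times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rt_source, rt_target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def transform_rt(rt, slope, intercept):
    """Map a retention time from the source run into the target run."""
    return slope * rt + intercept


# Hypothetical anchor peakgroups observed in both runs (RT in seconds)
slope, intercept = fit_linear_rt([100.0, 200.0, 300.0], [110.0, 210.0, 310.0])
print(transform_rt(250.0, slope, intercept))
```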
FeatureAlignment Module
class feature_alignment.AlignmentStatistics
Bases: object
- count(multipeptides, fdr_cutoff, runs, skipDecoy=True)
- to_yaml()
class feature_alignment.Experiment
Bases: msproteomicstoolslib.algorithms.alignment.MRExperiment.MRExperiment
An Experiment is a container for multiple experimental runs - some of which may contain the same precursors.
- estimate_real_fdr(multipeptides, fraction_needed_selected)
- print_stats(multipeptides, fdr_cutoff, fraction_present, min_nrruns)
- write_to_file(multipeptides, options, alignment, tree=None, writeTrafoFiles=True)
feature_alignment.estimate_aligned_fdr_cutoff(options, this_exp, multipeptides, fdr_range)
feature_alignment.doMSTAlignment(exp, multipeptides, max_rt_diff, rt_diff_isotope, initial_alignment_cutoff, fdr_cutoff, aligned_fdr_cutoff, smoothing_method, method, use_RT_correction, stdev_max_rt_per_run, use_local_stdev, mst_use_ref, force, optimized_cython)
Minimum Spanning Tree (MST) based local alignment
feature_alignment.doParameterEstimation(options, this_exp, multipeptides)
Perform (q-value) parameter estimation
feature_alignment.doReferenceAlignment(options, this_exp, multipeptides)
feature_alignment.main(options)
Noise imputation Module
requantAlignedValues.runSingleFileImputation(options, peakgroups_file, mzML_file, method, is_test)
Impute values across chromatograms
Parameters:
- peakgroups_file (filename) – CSV file containing all peakgroups
- mzML_file (filename) – mzML file containing chromatograms
- method (string) – which method to use for imputation (“singleShortestPath”, “singleClosestRun”)
- is_test (bool) – whether test mode should be used
Returns: A tuple of
- new_exp (AlignmentExperiment): experiment containing the aligned peakgroups
- multipeptides (list(AlignmentHelper.Multipeptide)): list of multipeptides
- rid (string): run id of the analyzed run
This function reads the CSV file with all peakgroups as well as the provided chromatogram file (.chrom.mzML). It then tries to impute missing values for those peakgroups where no value is currently present by reading the raw chromatograms.
requantAlignedValues.runImputeValues(options, peakgroups_file, trafo_fnames, is_test)
Impute values across chromatograms
Parameters:
- peakgroups_file (filename) – CSV file containing all peakgroups
- trafo_fnames (filename) – a list of .tr filenames (the chromatogram mzML files are assumed to reside in the same directory)
Returns: A tuple of
- new_exp (AlignmentExperiment): experiment containing the aligned peakgroups
- multipeptides (list(AlignmentHelper.Multipeptide)): list of multipeptides
- rid (string): run id of the analyzed run
This function reads the CSV file with all peakgroups as well as the transformation files (.tr) and the corresponding raw chromatograms, which need to be in the same folder. It then tries to impute missing values for those peakgroups where no value is currently present by reading the raw chromatograms.
requantAlignedValues.analyze_multipeptides(new_exp, multipeptides, swath_chromatograms, transformation_collection_, border_option, onlyExtractFromRun=None, tree=None, mat=None, disable_isotopic_transfer=False, is_test=False)
Analyze the multipeptides and impute missing values
This function has three different modes:
- if a tree is given, it will use the single shortest path on the tree to infer the boundaries
- if a distance matrix is given, it will use the closest run overall to infer the boundaries
- if neither is given, it will use a mean / median / maximum of all borders by converting all boundaries to the frame of the reference run and then to the current run
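The first mode relies on finding the path between two runs on the guidance tree. A minimal sketch of that step follows, assuming the tree is given as a list of undirected edges between run ids; the function name `path_on_tree` and the edge list format are hypothetical and may differ from the data structures used in requantAlignedValues. Peak boundaries would then be transformed from run to run along the returned path.

```python
from collections import deque


def path_on_tree(edges, start, goal):
    """Return the list of run ids from start to goal, or None if unreachable."""
    # Build an undirected adjacency list from the tree edges
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Breadth-first search; on a tree the path found is the unique one
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)

    if goal not in previous:
        return None

    # Walk back from the goal to the start to reconstruct the path
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


tree_edges = [("run1", "run2"), ("run2", "run3"), ("run2", "run4")]
print(path_on_tree(tree_edges, "run1", "run4"))
```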
Parameters:
- new_exp (AlignmentExperiment) – experiment containing the aligned peakgroups
- multipeptides (list(AlignmentHelper.Multipeptide)) – list of multipeptides
- swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML (see runImputeValues)
- transformation_collection_ (TransformationCollection) – specifying how to transform between retention times of different runs
- border_option (String) – one of the following options (“mean”, “median”, “max_width”), determining how to aggregate multiple peak boundary information
- onlyExtractFromRun (String) – whether to perform signal extraction from a single run only (if not None, needs to be a run id)
- tree (MinimumSpanningTree) – alignment guidance tree (if given, will use shortest path approach)
- mat (matrix(float)) – distance matrix, see getDistanceMatrix (if given, will use closest run approach)
- disable_isotopic_transfer (bool) – if True, isotopic grouping (e.g. grouping heavy/light channels together) is disabled
Returns: The updated multipeptides
This function will update the input multipeptides and add peakgroups, imputing missing values
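How multiple peak boundaries could be combined according to border_option is sketched below. The sketch assumes that all boundaries have already been converted into the retention time frame of the current run. The function name `aggregate_borders` is hypothetical, and the interpretation of “max_width” as the widest envelope covering all boundaries is an assumption, not taken from the library.

```python
from statistics import mean, median


def aggregate_borders(borders, border_option):
    """Combine several (left, right) peak boundaries into a single pair.

    borders is a list of (left, right) retention time tuples, assumed to
    be expressed in the frame of the run that is being imputed.
    """
    if not borders:
        raise ValueError("at least one boundary pair is required")
    lefts = [left for left, _ in borders]
    rights = [right for _, right in borders]
    if border_option == "mean":
        return mean(lefts), mean(rights)
    if border_option == "median":
        return median(lefts), median(rights)
    if border_option == "max_width":
        # Assumption: use the envelope that covers all observed boundaries
        return min(lefts), max(rights)
    raise ValueError("unknown border_option: %s" % border_option)


observed = [(10.0, 20.0), (12.0, 26.0), (11.0, 21.0)]
for option in ("mean", "median", "max_width"):
    print(option, aggregate_borders(observed, option))
```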
requantAlignedValues.analyze_multipeptide_cluster(current_mpep, cnt, new_exp, swath_chromatograms, transformation_collection_, border_option, selected_pg, cluster_id, onlyExtractFromRun=None, tree=None, mat=None, is_test=False)
requantAlignedValues.integrate_chromatogram(template_pg, current_run, swath_chromatograms, left_start, right_end, cnt, is_test)
Integrate a chromatogram from left_start to right_end and store the sum.
Parameters:
- template_pg (GeneralPeakGroup) – A template peakgroup from which to construct the new peakgroup
- current_run (SWATHScoringReader.Run) – current run where the missing value occurred
- swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML
- left_start (float) – retention time for integration (left border)
- right_end (float) – retention time for integration (right border)
Returns: A new GeneralPeakGroup which contains the new, integrated intensity for this run (or “NA” if no chromatograms could be found).
Create a new peakgroup from the old pg and then store the integrated intensity.
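The integration step itself amounts to summing the intensities that fall between the two borders. A minimal sketch, assuming the chromatogram is available as two parallel lists of retention times and intensities and that both borders are inclusive; the function name `sum_intensity_in_window` is hypothetical, and the sketch does not model the peakgroup construction or the “NA” case described above.

```python
def sum_intensity_in_window(retention_times, intensities, left_start, right_end):
    """Sum all intensities whose retention time lies in [left_start, right_end]."""
    return sum(
        intensity
        for rt, intensity in zip(retention_times, intensities)
        if left_start <= rt <= right_end
    )


# Hypothetical chromatogram with five data points
rts = [1.0, 2.0, 3.0, 4.0, 5.0]
signal = [10.0, 20.0, 30.0, 40.0, 50.0]
print(sum_intensity_in_window(rts, signal, left_start=2.0, right_end=4.0))
```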
requantAlignedValues.write_out(new_exp, multipeptides, outfile, matrix_outfile, single_outfile=None)
Write the result to disk
This writes all peakgroups to disk (newly imputed ones as well as previously found ones), since even some “previously good” peakgroups may have changed location due to isotopic_transfer.
requantAlignedValues.main(options)
class requantAlignedValues.ImputeValuesHelper
Bases: object
Static object with some helper methods.
- static select_correct_swath(swath_chromatograms, mz)
  Select the correct chromatogram
  Parameters:
  - swath_chromatograms (dict) – containing the objects pointing to the original chrom mzML (see runImputeValues)
  - mz (float) – the mz value of the precursor
-
static
-
class requantAlignedValues.SwathChromatogramRun
Bases: object
A single SWATH LC-MS/MS run.
Each run may contain multiple files (split up by swath).
- getChromatogram(chromid)
- parse(runid, files)
  Parse a set of files which all belong to the same experiment
class requantAlignedValues.SwathChromatogramCollection
Bases: object
A collection of multiple SWATH LC-MS/MS runs.
Each single run is represented as a SwathChromatogramRun and accessible through a run id.
>>> mzml_files = ["file1.mzML", "file2.mzML"]
>>> runMapping = {"file1.mzML": "run1", "file2.mzML": "run2"}
>>> chromatograms = SwathChromatogramCollection()
>>> chromatograms.parseFromMzML(mzml_files, runMapping)
>>> chromatogram = chromatograms.getChromatogram("run1", "ChromatogramId")
>>> chromatogram = chromatograms.getChromatogram("run2", "ChromatogramId")
- createRunCache(runid)
- getChromatogram(runid, chromid)
- getRunIDs()
- parseFromMzML(mzML_files, runIdMapping)
  Parse a set of different experiments.
  Parameters:
  - mzML_files (list(filename)) – a list of mzML filenames (chromatogram mzML)
  - runIdMapping (dict) – a dictionary mapping each filename to a run id
- parseFromSqMass(files, runIdMapping)
  Parse a set of different experiments.
  Parameters:
  - files (list(filename)) – a list of sqMass filenames
  - runIdMapping (dict) – a dictionary mapping each filename to a run id
- parseFromTrafoFiles(trafo_fnames)
  Parse a set of different experiments from the .tr files
  The mzML files belonging to the same run are assumed to be in the same folder as the .tr files.