Format¶
Several classes and functions for dealing with common mass spectrometric formats (mostly file I/O).
Transformation Collection
Module¶
TransformationCollection¶
-
class
msproteomicstoolslib.format.TransformationCollection.
TransformationCollection
¶ A class to store a transformation between retention times of multiple runs.
It allows adding transformation data (e.g. a pair of arrays that map coordinates from one RT space to the other). Once all data has been added, the transformations can be initialized from the data:
Compute a new transformation and write it to a file:
# data1 = reference data (master) with ref_id
# data2 = data to be aligned (slave) with current_id
>>> tcoll = TransformationCollection()
>>> tcoll.setReferenceRunID( ref_id )
>>> tcoll.addTransformationData([data2, data1], current_id, ref_id )
>>> tcoll.writeTransformationData( "outfile", current_id, ref_id)
Read a set of transformations from files:
>>> tcoll = TransformationCollection()
>>> for filename in ["file1.tr", "file2.tr"]:
...     tcoll.readTransformationData(filename)
>>> tcoll.initialize_from_data(reverse=True)
Compute a transformation:
>>> norm_value = tcoll.getTransformation(orig_runid, ref_id).predict( [ value ] )[0]
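To make the idea of a retention-time transformation concrete, here is a minimal, self-contained sketch that does not use msproteomicstoolslib. The `LinearRTModel` class is hypothetical: it only mimics the `predict([value])[0]` call pattern shown above by fitting a straight line through paired anchor points. The library itself offers other smoothers (for example `lowess`), so this is an illustration under simplifying assumptions and not the library's implementation.

```python
class LinearRTModel:
    """Hypothetical stand-in: least-squares line mapping slave RT to master RT."""

    def __init__(self, data_slave, data_master):
        n = len(data_slave)
        if n < 2 or n != len(data_master):
            raise ValueError("need two equally long vectors with at least 2 points")
        mean_x = sum(data_slave) / n
        mean_y = sum(data_master) / n
        sxx = sum((x - mean_x) ** 2 for x in data_slave)
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(data_slave, data_master))
        if sxx == 0:
            raise ValueError("slave retention times must not all be identical")
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, values):
        """Map a list of slave-run retention times into the master RT space."""
        return [self.slope * v + self.intercept for v in values]


# Illustrative anchor points: the slave run elutes 2 time units earlier.
data2 = [10.0, 20.0, 30.0, 40.0]   # slave (to be aligned)
data1 = [12.0, 22.0, 32.0, 42.0]   # master (reference)

model = LinearRTModel(data2, data1)
norm_value = model.predict([25.0])[0]
```

A straight line is only adequate when the two gradients differ by a shift and a scale; non-linear drift is the case a smoother such as lowess is meant for.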
-
addTransformationData
(data, s_from, s_to)¶ Add raw data points to the collection
Parameters: - data (list(data_slave, data_master)) – two data vectors containing the raw data points from two runs. The first data vector is the slave (to be aligned) data and the second one is the master (reference) data.
- s_from (String) – run ID of the slave (to be aligned) run
- s_to (String) – run ID of the master (reference) run
-
addTransformedData
(data, s_from, s_to)¶ Add transformed data points to the collection
The idea is to add the anchor points of s_from in the space of s_to so that one could compute the transformation using a simple linear transform.
Parameters: - data –
- s_from (String) – run ID of the slave (to be aligned) run
- s_to (String) – run ID of the master (reference) run
-
getReferenceRunID
()¶
-
getTransformation
(s_from, s_to)¶
-
getTransformationData
(s_from, s_to)¶
-
getTransformedData
(s_from, s_to)¶
-
initialize_from_data
(reverse=False, smoother='lowess', force=False)¶
-
printTransformationData
(s_from, s_to)¶
-
readTransformationData
(filename)¶ Read the transformation present in the file.
The header is either:
- #Transformation Null
- #Transformation Data “from_id” to “to_id” reference_id “ref_id”
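The two header forms can be told apart with a few token comparisons. The sketch below is a hypothetical helper, not part of the library. It assumes that the run ids are written as plain, whitespace-free tokens, because this page does not make clear whether the quotes shown in the header template are literal.

```python
def parse_transformation_header(line):
    """Return None for a null transformation, else (from_id, to_id, ref_id).

    Assumption: ids are single whitespace-free tokens without quotes.
    """
    tokens = line.strip().split()
    if tokens[:2] == ["#Transformation", "Null"]:
        return None
    if (len(tokens) == 7
            and tokens[:2] == ["#Transformation", "Data"]
            and tokens[3] == "to"
            and tokens[5] == "reference_id"):
        return tokens[2], tokens[4], tokens[6]
    raise ValueError("unrecognized transformation header: %r" % line)


null_header = parse_transformation_header("#Transformation Null\n")
data_header = parse_transformation_header(
    "#Transformation Data run_1 to run_0 reference_id run_0\n")
```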
-
setReferenceRunID
(value)¶
-
writeTransformationData
(filename, s_from, s_to)¶ Write the transformation s_from to s_to to a file.
The header is either:
- #Transformation Null
- #Transformation Data “from_id” to “to_id” reference_id “ref_id”
LightTransformationData¶
-
class
msproteomicstoolslib.format.TransformationCollection.
LightTransformationData
(ref=None)¶ A lightweight data structure to store a transformation between retention times of multiple runs.
-
addData
(run1, data1, run2, data2, doSort=True)¶ Add raw data for the transformation between two runs
-
addTrafo
(run1, run2, trafo, stdev=None)¶ Add transformation between two runs
-
getData
(run1, run2)¶
-
getReferenceRunID
()¶
-
getStdev
(run1, run2)¶
-
getTrafo
(run1, run2)¶
-
getTransformation
(run1, run2)¶
-
File Reader
Module¶
SWATHScoringReader¶
-
class
msproteomicstoolslib.format.SWATHScoringReader.
ReadFilter
¶ Bases:
object
A callable class that can pre-filter a row and determine whether the row can be skipped.
If the call returns true, the row is examined; if it returns false, the row is skipped.
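As an illustration of the callable-filter idea, the sketch below defines a hypothetical `MaxQValueFilter`. It assumes the filter is called with a row (a list of strings) and a header dictionary mapping column names to indices. Both that call signature and the `m_score` column name are assumptions made for this example and should be checked against the library source before use.

```python
class MaxQValueFilter:
    """Hypothetical filter: keep rows whose score column is below a cutoff."""

    def __init__(self, cutoff, column="m_score"):
        self.cutoff = cutoff
        self.column = column

    def __call__(self, row, header):
        # True  -> the row is examined by the reader
        # False -> the row is skipped
        return float(row[header[self.column]]) <= self.cutoff


header = {"peptide": 0, "m_score": 1}
rows = [["PEPTIDEK", "0.001"], ["ANOTHERK", "0.2"]]

keep = MaxQValueFilter(cutoff=0.01)
kept_rows = [row for row in rows if keep(row, header)]
```

Filtering before a row is fully parsed avoids building objects for peak groups that would be discarded anyway.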
-
class
msproteomicstoolslib.format.SWATHScoringReader.
SWATHScoringReader
¶ -
static
newReader
(infiles, filetype, readmethod="minimal", readfilter=ReadFilter(), errorHandling="strict", enable_isotopic_grouping=False)¶ Factory to create a new reader
-
parse_files
(read_exp_RT=True, verbosity=10, useCython=False)¶ Parse the input file(s) (CSV).
Parameters: read_exp_RT (bool) – whether to read the real, experimental retention time (default behavior) or to use the delta iRT instead.
Returns: runs (list(SWATHScoringReader.Run))
A single CSV file might contain more than one run. To create unique run ids, the runs are numbered as xx_yy, where xx is the current file number and yy is the run found in the current file. However, if an alignment has already been performed and each run has already obtained a unique run id, the previous alignment id can be used directly.
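The xx_yy numbering scheme can be sketched as follows. `make_run_id` is a hypothetical helper, and the zero-based counters are an assumption, since this page does not say where the numbering starts.

```python
def make_run_id(file_nr, run_nr, aligned_id=None):
    """Build a unique run id of the form xx_yy.

    If a previous alignment already assigned a unique id, reuse it.
    """
    if aligned_id is not None:
        return aligned_id
    return "%d_%d" % (file_nr, run_nr)


# Two runs in the first file, one run in the second file.
run_ids = [make_run_id(f, r) for f, r in [(0, 0), (0, 1), (1, 0)]]
reused_id = make_run_id(0, 0, aligned_id="run42")
```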
-
parse_row
(run, this_row, read_exp_RT)¶
-
-
class
msproteomicstoolslib.format.SWATHScoringReader.
OpenSWATH_SWATHScoringReader
(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, errorHandling='strict', enable_isotopic_grouping=False, read_cluster_id=True)¶ Bases:
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
Parser for OpenSWATH output
-
parse_row
(run, this_row, read_exp_RT)¶
-
-
class
msproteomicstoolslib.format.SWATHScoringReader.
mProphet_SWATHScoringReader
(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, enable_isotopic_grouping=False)¶ Bases:
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
Parser for mProphet output
-
parse_row
(run, this_row, read_exp_RT)¶
-
-
class
msproteomicstoolslib.format.SWATHScoringReader.
Peakview_SWATHScoringReader
(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, enable_isotopic_grouping=False)¶ Bases:
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
Parser for Peakview output
-
parse_row
(run, this_row, read_exp_RT)¶
-
-
msproteomicstoolslib.format.SWATHScoringReader.
inferMapping
(rawdata_files, aligned_pg_files, mapping, precursors_mapping, sequences_mapping, protein_mapping, verbose=False, throwOnMismatch=False, fileType=None)¶ Infers a mapping between raw chromatogram files (mzML) and processed feature TSV files
Usually one feature file can contain multiple aligned runs and maps to multiple chromatogram files (mzML). This function will try to guess the original name of the mzML based on the align_origfilename column in the TSV. Note that both files have some typical endings that are _not_ shared, these are generally removed before comparison.
Only an exact match is allowed.
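The matching strategy described above (strip the endings that are not shared, then require an exact match) can be sketched without the library. The ending lists and the helper names `strip_ending` and `match_files` are hypothetical; the endings the library actually removes are not listed on this page.

```python
import os

# Hypothetical endings, longest first so ".chrom.mzML" wins over ".mzML".
RAW_ENDINGS = (".chrom.mzML", ".mzML")
FEATURE_ENDINGS = ("_with_dscore.csv", ".tsv", ".csv")


def strip_ending(path, endings):
    """Return the base file name with the first matching ending removed."""
    name = os.path.basename(path)
    for ending in endings:
        if name.endswith(ending):
            return name[: -len(ending)]
    return name


def match_files(rawdata_files, orig_filenames):
    """Map each original feature file name to a raw file with the same stem."""
    by_stem = {strip_ending(f, RAW_ENDINGS): f for f in rawdata_files}
    mapping = {}
    for orig in orig_filenames:
        stem = strip_ending(orig, FEATURE_ENDINGS)
        if stem in by_stem:          # exact match only, no fuzzy matching
            mapping[orig] = by_stem[stem]
    return mapping


mapping = match_files(
    ["/data/run_A.chrom.mzML", "/data/run_B.chrom.mzML"],
    ["/results/run_A.tsv", "/results/run_C.tsv"],
)
```

Here `run_C` has no raw counterpart and is simply left out. Per the signature above, the library function takes a `throwOnMismatch` flag for that situation.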
Data Matrix
Module¶
Functions for handling the output data matrix
MatrixWriters¶
-
msproteomicstoolslib.format.MatrixWriters.
getwriter
(matrix_outfile)¶ Factory function to get the correct writer depending on the file ending
Parameters: matrix_outfile (str) – Filename of output, used to determine the output format. Valid formats are .xlsx, .xls, .csv or .tsv.
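A factory that dispatches on the file ending can be sketched as below. `pick_writer_kind` is a hypothetical stand-in that returns a label and a delimiter instead of a writer object. The comma and tab delimiters and the case-insensitive comparison are assumptions of this sketch, not documented library behavior.

```python
import os

# Assumed mapping from file ending to (writer kind, delimiter).
_WRITER_KINDS = {
    ".xlsx": ("xlsx", None),
    ".xls": ("xls", None),
    ".csv": ("csv", ","),
    ".tsv": ("csv", "\t"),
}


def pick_writer_kind(matrix_outfile):
    """Return (kind, delimiter) for a supported output file name."""
    ext = os.path.splitext(matrix_outfile)[1].lower()
    if ext not in _WRITER_KINDS:
        raise ValueError("unsupported output format: %r" % matrix_outfile)
    return _WRITER_KINDS[ext]


tsv_choice = pick_writer_kind("matrix.tsv")
xlsx_choice = pick_writer_kind("results/matrix.xlsx")
```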
-
class
msproteomicstoolslib.format.MatrixWriters.
IWriter
(outfile, delim=None)¶ Interface. You need to implement init, write, newline and del.
-
newline
()¶
-
write
(entry, color=None)¶
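A writer following this write/newline interface might look like the sketch below. `DelimitedWriter` is hypothetical and writes to any text stream so that it can be exercised in memory. It does not subclass the library's `IWriter` and ignores the `color` argument.

```python
import io


class DelimitedWriter:
    """Hypothetical writer: one write() per cell, newline() to end a row."""

    def __init__(self, stream, delim="\t"):
        self.stream = stream
        self.delim = delim
        self.at_line_start = True

    def write(self, entry, color=None):
        # color is accepted for interface compatibility and ignored here.
        if not self.at_line_start:
            self.stream.write(self.delim)
        self.stream.write(str(entry))
        self.at_line_start = False

    def newline(self):
        self.stream.write("\n")
        self.at_line_start = True


buffer = io.StringIO()
writer = DelimitedWriter(buffer, delim=",")
for row in [["Peptide", "run_0"], ["PEPTIDEK", 1234.5]]:
    for entry in row:
        writer.write(entry)
    writer.newline()
matrix_text = buffer.getvalue()
```

Writing cell by cell lets the calling code stay the same whether the output is a delimited text file or a spreadsheet.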
-
-
class
msproteomicstoolslib.format.MatrixWriters.
CsvWriter
(outfile, delim='\t')¶ Bases:
msproteomicstoolslib.format.MatrixWriters.IWriter
-
newline
()¶
-
write
(entry, color='ignored')¶
-
-
class
msproteomicstoolslib.format.MatrixWriters.
XlsWriter
(outfile, delim='ignored')¶ Bases:
msproteomicstoolslib.format.MatrixWriters.IWriter
-
newline
()¶
-
write
(entry, color='d')¶
-
-
class
msproteomicstoolslib.format.MatrixWriters.
XlsxWriter
(outfile, delim='ignored')¶ Bases:
msproteomicstoolslib.format.MatrixWriters.IWriter
-
newline
()¶
-
write
(entry, color='d')¶
-
Spectral library
Module¶
Functions for handling SpectraST spectral library format
Spectral library handler¶
-
class
msproteomicstoolslib.format.speclib_db_lib.
Library
(lkey=None)¶ This class contains one spectral library. It provides a read/write interface to the database and to the SpectraST *.splib and *.pepidx files. One can easily add or retrieve spectra.
-
add_spectra
(s)¶
-
annotate_with_libkey
()¶ Annotate spectra with the key of the current library
-
count_modifications
()¶
-
delete_library_from_DB
(library_key, db)¶ Delete current library from SQL database
-
delete_reverse_spectra
()¶
-
find_by_sequence
(sequence, db)¶ This function can be used to access spectra using a sequence search
-
find_by_sql
(query_in, db)¶ This function can be used to access spectra using an SQL query. The query should produce a single column with spectra_keys. This can be very slow; use find_by_sql_fast instead (~400x faster).
-
find_by_sql_fast
(subQuery, db, tmp_db)¶ This function can be used to access spectra using an SQL query. The query should produce a single column with spectra_keys (ids), which MUST be called tmp_spectra_keys. You need create-table privileges in the database tmp_db for this, but it can be 400x faster than plain find_by_sql.
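One plausible reading of the create-table requirement is that the key column is first materialized in a temporary table and then joined against the spectra table. The sketch below illustrates that pattern with the standard-library `sqlite3` module so that it runs anywhere. The table layout and names are hypothetical, and whether the library works this way internally is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectra (id INTEGER PRIMARY KEY, sequence TEXT)")
conn.executemany(
    "INSERT INTO spectra (id, sequence) VALUES (?, ?)",
    [(1, "PEPTIDEK"), (2, "ANOTHERK"), (3, "PEPTIDEK")],
)

# The sub-query must expose exactly one column named tmp_spectra_keys.
sub_query = (
    "SELECT id AS tmp_spectra_keys FROM spectra WHERE sequence = 'PEPTIDEK'"
)

# Materialize the keys once, then fetch all matching spectra with one join.
conn.execute("CREATE TEMP TABLE tmp_keys AS " + sub_query)
rows = conn.execute(
    "SELECT s.id, s.sequence FROM spectra AS s "
    "JOIN tmp_keys AS t ON s.id = t.tmp_spectra_keys "
    "ORDER BY s.id"
).fetchall()
conn.execute("DROP TABLE tmp_keys")
conn.close()
```

A single join replaces one lookup per key, which is the usual reason such an approach is much faster than fetching spectra one at a time.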
-
get_all_spectra
()¶
-
get_fileheader
(splibFileName)¶ Get the header preceding the first spectrum in a SpectraST file.
-
get_first_offset
(splibFileName)¶
-
get_rawspectrum_with_offset
(splibFileName, offset)¶ Get a raw spectrum, unmodified, from a SpectraST file by using an offset to locate it.
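Offset-based access means seeking directly to the byte position of an entry instead of scanning the file from the start. The sketch below shows the idea on a tiny, made-up text library. `read_entry_at` is a hypothetical helper, and treating a blank line as the end of an entry is a simplifying assumption about the file layout.

```python
import os
import tempfile


def read_entry_at(path, offset):
    """Read lines starting at byte `offset` until a blank line or end of file."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        for raw in handle:
            text = raw.decode("utf-8").rstrip("\r\n")
            if not text:
                break
            lines.append(text)
    return lines


# Made-up two-entry library used only for this illustration.
content = (
    b"Name: PEPTIDEK/2\nNumPeaks: 1\n100.0\t50.0\n\n"
    b"Name: ANOTHERK/2\nNumPeaks: 1\n200.0\t75.0\n\n"
)
second_offset = content.index(b"Name: ANOTHERK")

with tempfile.NamedTemporaryFile(delete=False, suffix=".sptxt") as tmp:
    tmp.write(content)
try:
    entry = read_entry_at(tmp.name, second_offset)
finally:
    os.remove(tmp.name)
```

In practice the offsets would come from an index such as the *.pepidx file rather than from searching the content.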
-
get_spectra_by_sequence
(sequence)¶ Get all spectra that match a specific sequence
-
init_with_self
(library)¶ Initialize with another library. Does not do a very deep copy.
-
measure_nr_spectra
()¶
-
nr_unique_peptides
()¶
-
read_fromDB
(library_key, db)¶ This function can be used to access one complete library from the DB.
-
static
read_from_db_to_file
(library_key, db, filePrefix)¶ This function can be used to access one complete library from the DB directly to a file.
-
static
read_library_to_db
(splibFileName, pepidxFileName, db, library_key)¶ Read directly from a spectral library into the database.
-
read_pepidx
(filename)¶
-
read_spectrum_sptxt_idx
(splibFileName, idx, library_key)¶ Fetch a spectrum from the spectral library by using the binary index.
-
read_sptxt
(filename)¶
-
read_sptxt_pepidx
(splibFileName, pepidxFileName, library_key)¶ Read directly from a spectral library into memory.
-
read_sptxt_with_offset
(splibFileName, offset)¶ Read an sptxt spectral library file by using an offset to keep memory free.
-
remove_duplicate_entries
()¶
-
set_library_key
(lkey)¶
-
write
(filePrefix, append=False)¶ Write the current library to a file.
-
write_sorted
(filePrefix)¶
-
write_toDB
(db, cursor)¶ Write all spectra into a SQL database
-
-
class
msproteomicstoolslib.format.speclib_db_lib.
SequenceHandler
¶ Container class for all spectra that map to the same sequence inside a spectral library.
-
add_meta
(meta)¶
-
add_spectra
(spectra)¶
-
add_spectra_no_duplicates
(spectra)¶
-
empty
()¶
-
init_with_self
(handler)¶
-
remove
(s)¶
-
remove_duplicate_entries
()¶
-
-
class
msproteomicstoolslib.format.speclib_db_lib.
Spectra
¶ A single spectrum inside a spectral library
-
acetyl_len
()¶
-
add_meta
(sequence, modifications, library_key)¶
-
analyse_mod
()¶
-
carbamido_len
()¶
-
escape_string
(string)¶
-
find
(id, db)¶
-
get_known_modifications
()¶
-
get_meta_headers
()¶
-
get_peaks
()¶
-
get_spectra_headers
()¶
-
icat_len
()¶
-
initialize
()¶ Initialize spectrum
-
is_tryptic
()¶
-
methyl_len
()¶
-
other_known_len
()¶
-
other_len
()¶
-
oxidations_len
()¶
-
parse_SearchEngineInfo
(searchEngineInfo)¶
-
parse_comments
(comment)¶
-
parse_sptxt
(stack)¶ Parse an sptxt entry and initialize spectrum
-
phospho_len
()¶
-
phosphos_len
()¶
-
save
(db)¶
-
to_pepidx_str
()¶ Convert spectrum object to pepidx format
-
to_splib_str
()¶ Convert spectrum object to splib format
-
validate
()¶
-