`pyms.DPA`

Alignment of peak lists by dynamic programming.

`pyms.DPA.Alignment`

Classes for peak alignment by dynamic programming.

Classes:

Alignment(expr)

Models an alignment of peak lists.

Functions:

exprl2alignment(expr_list)

Converts a list of experiments into a list of alignments.

class Alignment(expr)[source]

Bases: object

Models an alignment of peak lists.

Parameters: expr (Optional[Experiment]) – The experiment to be converted into an alignment object.
Authors: Woon Wai Keen, Qiao Wang, Vladimir Likic, Dominic Davis-Foster.

Methods:

`__len__`()	Returns the length of the alignment, defined as the number of peak positions in the alignment.
`aligned_peaks`([minutes])	Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.
`common_ion`()	Calculates a common ion among the peaks of an aligned peak.
`filter_min_peaks`(min_peaks)	Filters alignment positions that have fewer peaks than `min_peaks`.
`get_area_alignment`([require_all_expr])	Returns a Pandas dataframe containing the peak areas of the aligned peaks.
`get_highest_mz_ion`(ion_dict)	Returns the preferred ion for quantitation.
`get_ms_alignment`([require_all_expr])	Returns a Pandas dataframe of mass spectra for the aligned peaks.
`get_peak_alignment`([minutes, require_all_expr])	Returns a Pandas dataframe of aligned retention times.
`get_peaks_alignment`([require_all_expr])	Returns a Pandas dataframe of Peak objects for the aligned peaks.
`write_common_ion_csv`(area_file_name, ...[, ...])	Writes the alignment to CSV files.
`write_csv`(rt_file_name, area_file_name[, ...])	Writes the alignment to CSV files.
`write_ion_areas_csv`(ms_file_name[, minutes])	Write Ion Areas to CSV File.

Attributes:

`expr_code`	List of experiment codes.
`peakalgt`
`peakpos`
`similarity`

__len__()[source]

Returns the length of the alignment, defined as the number of peak positions in the alignment.

Return type: int
Authors: Qiao Wang, Vladimir Likic

aligned_peaks(minutes=False)[source]

Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.

Parameters: minutes (bool) – Whether retention times are in minutes. If False, retention times are in seconds. Default False.
Return type: Sequence[Optional[Peak]]
Returns: A list of composite peaks based on the alignment.
Author: Andrew Isaac

common_ion()[source]

Calculates a common ion among the peaks of an aligned peak.

Return type: List[float]
Returns: A list of the highest intensity common ion for all aligned peaks.
Author: Sean O’Callaghan

expr_code

Type: List[str]

List of experiment codes.

filter_min_peaks(min_peaks)[source]

Filters alignment positions that have fewer peaks than min_peaks.

This function is useful only for within-state alignment.

Parameters: min_peaks (int) – Minimum number of peaks required for the alignment position to survive filtering.
Author: Qiao Wang
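
The filtering rule can be illustrated on plain lists. The sketch below is not the library's implementation: it assumes an alignment is stored as one row per experiment and one column per alignment position, with None marking a gap, and the helper name `drop_sparse_positions` is hypothetical.

```python
from typing import List, Optional


def drop_sparse_positions(
    rows: List[List[Optional[float]]], min_peaks: int
) -> List[List[Optional[float]]]:
    """Keep only the positions (columns) with at least min_peaks non-gap entries."""
    n_positions = len(rows[0]) if rows else 0
    keep = [
        pos
        for pos in range(n_positions)
        # Count how many experiments actually have a peak at this position.
        if sum(row[pos] is not None for row in rows) >= min_peaks
    ]
    return [[row[pos] for pos in keep] for row in rows]


# Three experiments, three positions; the middle position has a single peak.
rows = [
    [1.0, None, 5.0],
    [1.1, 3.0, 5.1],
    [None, None, 5.2],
]
filtered = drop_sparse_positions(rows, min_peaks=2)
```

With `min_peaks=2` the middle position is dropped because only one experiment contributes a peak there.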

get_area_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe containing the peak areas of the aligned peaks.

Parameters: require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.
Authors: Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster
Return type: DataFrame

static get_highest_mz_ion(ion_dict)[source]

Returns the preferred ion for quantitation.

Looks at the candidate ions, keeps those with the highest number of occurrences, and returns the heaviest of those.

Parameters: ion_dict (Dict[float, int]) – A dictionary mapping each m/z value to its number of occurrences.
Returns: Among the most frequently occurring ions, the one with the highest m/z value.
Return type: float
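
The selection rule described above can be sketched in a few lines. This is an illustration of the stated rule, not the library's source; the function name `pick_highest_mz_ion` is hypothetical.

```python
from typing import Dict


def pick_highest_mz_ion(ion_dict: Dict[float, int]) -> float:
    """Return the heaviest ion among those with the highest occurrence count."""
    max_count = max(ion_dict.values())
    # Ions that tie for the highest occurrence count.
    most_common = [mz for mz, count in ion_dict.items() if count == max_count]
    return max(most_common)


# m/z 73 and 147 both occur three times; the heavier one is preferred.
preferred = pick_highest_mz_ion({73.0: 3, 147.0: 3, 205.0: 1})
```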

get_ms_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe of mass spectra for the aligned peaks.

Parameters: require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.
Authors: Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster
Return type: DataFrame

get_peak_alignment(minutes=True, require_all_expr=True)[source]

Returns a Pandas dataframe of aligned retention times.

Parameters

minutes (bool) – Whether to return retention times in minutes. If False, retention time will be returned in seconds. Default True.
require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type

DataFrame

get_peaks_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe of Peak objects for the aligned peaks.

Parameters: require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.
Authors: Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster
Return type: DataFrame

peakalgt: Type: List[List[Peak]]

peakpos: Type: List[List[Peak]]

similarity: Type: Optional[float]

write_common_ion_csv(area_file_name, top_ion_list, minutes=True)[source]

Writes the alignment to a CSV file.

This function takes a single output path: the peak areas of the alignment, measured on the common ions given in top_ion_list, are written to area_file_name.

Parameters

area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.
top_ion_list (Sequence[float]) – A list of the highest intensity common ion along the aligned peaks.
minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Sean O’Callaghan, Vladimir Likic, Dominic Davis-Foster (pathlib support)

write_csv(rt_file_name, area_file_name, minutes=True)[source]

Writes the alignment to CSV files.

This function writes two files: one containing the alignment of peak retention times and the other containing the alignment of peak areas.

Parameters

rt_file_name (Union[str, Path, PathLike]) – The name for the retention time alignment file.
area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.
minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, David Kainer, Dominic Davis-Foster (pathlib support)

write_ion_areas_csv(ms_file_name, minutes=True)[source]

Write Ion Areas to CSV File.

Parameters

ms_file_name (Union[str, Path, PathLike]) – The name of the file
minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors

David Kainer, Dominic Davis-Foster (pathlib support)

exprl2alignment(expr_list)[source]

Converts a list of experiments into a list of alignments.

Parameters: expr_list (List[Experiment]) – The list of experiments to be converted into alignment objects.
Return type: List[Alignment]
Returns: A list of alignment objects for the experiments.
Author: Vladimir Likic

`pyms.DPA.PairwiseAlignment`

Classes for peak alignment by dynamic programming.

Classes:

`DPResult`	Return type of `dp()`.
`PairwiseAlignment`(alignments, D, gap)	Models pairwise alignment of alignments.

Functions:

`align`(a1, a2, D, gap)	Aligns two alignments.
`align_with_tree`(T[, min_peaks])	Aligns a list of alignments using the supplied guide tree.
`alignment_compare`(x, y)	A helper function for sorting peak positions in an alignment.
`alignment_similarity`(traces, score_matrix, gap)	Calculates similarity score between two alignments (new method).
`dp`(S, gap_penalty)	Solves optimal path in score matrix based on global sequence alignment.
`merge_alignments`(A1, A2, traces)	Merges two alignments with gaps added in from DP traceback.
`position_similarity`(pos1, pos2, D)	Calculates the similarity between the two alignment positions.
`score_matrix`(a1, a2, D)	Calculates the score matrix between two alignments.
`score_matrix_mpi`(a1, a2, D)	Calculates the score matrix between two alignments.

typeddict DPResult[source]

Bases: TypedDict

Return type of dp().

Required Keys

p (List[int])
q (List[int])
trace (List[int])
matches (List[List[int]])
D (ndarray)
phi (ndarray)

class PairwiseAlignment(alignments, D, gap)[source]

Bases: object

Models pairwise alignment of alignments.

Parameters

alignments (List[Alignment]) – A list of alignments.
D (float) – Retention time tolerance parameter (in seconds) for pairwise alignments.
gap (float) – Gap parameter for pairwise alignments.

Authors

Woon Wai Keen, Vladimir Likic

align(a1, a2, D, gap)[source]

Aligns two alignments.

Parameters

a1 (Alignment) – The first alignment
a2 (Alignment) – The second alignment
D (float) – Retention time tolerance in seconds.
gap (float) – Gap penalty

Return type

Alignment

Returns

Aligned alignments

Authors

Woon Wai Keen, Vladimir Likic

align_with_tree(T, min_peaks=1)[source]

Aligns a list of alignments using the supplied guide tree.

Parameters

T (PairwiseAlignment) – The pairwise alignment object.
min_peaks (int) – Default 1.

Return type

Alignment

Returns

The final alignment consisting of aligned input alignments.

Authors

Woon Wai Keen, Vladimir Likic
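
A typical call sequence combining the functions documented on this page might look like the sketch below. The wrapper name `align_experiments` and the default values for the tolerance and gap penalty are illustrative assumptions; the imports are deferred so that the function can be defined without PyMassSpec installed.

```python
from typing import Any, List


def align_experiments(
    expr_list: List[Any],
    rt_tolerance: float = 2.5,  # illustrative value for D, in seconds
    gap: float = 0.3,           # illustrative gap penalty
    min_peaks: int = 1,
) -> Any:
    """Align a list of Experiment objects and return the final Alignment."""
    # Deferred imports: only needed when the function is actually called.
    from pyms.DPA.Alignment import exprl2alignment
    from pyms.DPA.PairwiseAlignment import PairwiseAlignment, align_with_tree

    # One single-experiment Alignment per Experiment.
    alignments = exprl2alignment(expr_list)
    # Pairwise alignment object, passed to align_with_tree as T.
    tree = PairwiseAlignment(alignments, rt_tolerance, gap)
    # Progressive alignment following the guide tree.
    return align_with_tree(tree, min_peaks=min_peaks)
```

The returned object can then be written out with, for example, `write_csv(rt_file_name, area_file_name)`.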

alignment_compare(x, y)[source]

A helper function for sorting peak positions in an alignment.

Parameters

x
y

Return type

int

alignment_similarity(traces, score_matrix, gap)[source]

Calculates similarity score between two alignments (new method).

Parameters

traces (List[int]) – Traceback from DP algorithm.
score_matrix (ndarray) – Score matrix of the two alignments.
gap (float) – Gap penalty.

Return type

float

Returns

Similarity score (i.e. more similar => higher score)

Authors

Woon Wai Keen, Vladimir Likic
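
One way such a score could be accumulated is sketched below. It rests on assumptions that this page does not state: the score matrix is taken to hold distances in the range 0 to 1 (so `1 - distance` is a similarity), and the traceback is taken to use 0 for a match, 1 for a gap in the second alignment and 2 for a gap in the first. The function name `trace_similarity` is hypothetical.

```python
from typing import List, Sequence


def trace_similarity(
    traces: Sequence[int], score_matrix: List[List[float]], gap: float
) -> float:
    """Sum the similarity of matched positions and subtract gap for each gap step."""
    similarity = 0.0
    i = j = 0  # current row (first alignment) and column (second alignment)
    for step in traces:
        if step == 0:  # assumed: positions i and j are matched
            similarity += 1.0 - score_matrix[i][j]
            i += 1
            j += 1
        elif step == 1:  # assumed: position i is aligned to a gap
            similarity -= gap
            i += 1
        else:  # assumed: position j is aligned to a gap
            similarity -= gap
            j += 1
    return similarity


S = [[0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0]]
score = trace_similarity([0, 2, 0], S, gap=0.25)
```

Two perfect matches contribute 1.0 each and the single gap costs 0.25, so more similar alignments receive higher scores.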

dp(S, gap_penalty)[source]

Solves optimal path in score matrix based on global sequence alignment.

Parameters

S (ndarray) – Score matrix
gap_penalty (float) – Gap penalty

Return type

DPResult

Returns

A dictionary of results

Author

Tim Erwin
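
The idea of solving an optimal path through a score matrix by global alignment can be sketched with a Needleman-Wunsch style recurrence. The code below is a simplified illustration, not the library's `dp()`: it assumes lower scores are better, returns only the total cost and a traceback rather than a full DPResult, and uses assumed trace codes (0 match, 1 gap in the second sequence, 2 gap in the first). The name `global_align` is hypothetical.

```python
from typing import List, Tuple


def global_align(S: List[List[float]], gap_penalty: float) -> Tuple[float, List[int]]:
    """Minimum-cost global alignment of the rows of S against its columns."""
    n_rows, n_cols = len(S), len(S[0])
    # D[i][j]: minimum cost of aligning the first i rows with the first j columns.
    D = [[0.0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(1, n_rows + 1):
        D[i][0] = i * gap_penalty
    for j in range(1, n_cols + 1):
        D[0][j] = j * gap_penalty
    for i in range(1, n_rows + 1):
        for j in range(1, n_cols + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + S[i - 1][j - 1],  # match row i with column j
                D[i - 1][j] + gap_penalty,          # row i aligned to a gap
                D[i][j - 1] + gap_penalty,          # column j aligned to a gap
            )
    # Trace back from the bottom-right corner to recover the path.
    trace: List[int] = []
    i, j = n_rows, n_cols
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + S[i - 1][j - 1]:
            trace.append(0)
            i -= 1
            j -= 1
        elif j == 0 or (i > 0 and D[i][j] == D[i - 1][j] + gap_penalty):
            trace.append(1)
            i -= 1
        else:
            trace.append(2)
            j -= 1
    trace.reverse()
    return D[n_rows][n_cols], trace


S = [[0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0]]
cost, trace = global_align(S, gap_penalty=0.25)
```

Here the first row matches the first column, the second column is skipped with one gap, and the second row matches the third column.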

merge_alignments(A1, A2, traces)[source]

Merges two alignments with gaps added in from DP traceback.

Parameters

A1 (Alignment) – First alignment.
A2 (Alignment) – Second alignment.
traces (List[int]) – DP traceback.

Return type

Alignment

Returns

A single alignment from A1 and A2.

Authors

Woon Wai Keen, Vladimir Likic, Qiao Wang
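
Merging along a traceback can be illustrated on plain lists. This sketch assumes each alignment is stored as one row per experiment and one entry per position, that None marks a gap, and that the trace codes are 0 (both advance), 1 (only the first advances) and 2 (only the second advances). It is not the library's implementation, and `merge_rows` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def merge_rows(
    rows1: List[List[Optional[float]]],
    rows2: List[List[Optional[float]]],
    traces: Sequence[int],
) -> List[List[Optional[float]]]:
    """Interleave two alignments along a traceback, inserting None for gaps."""
    merged: List[List[Optional[float]]] = [[] for _ in range(len(rows1) + len(rows2))]
    idx1 = idx2 = 0
    for step in traces:
        take1 = step in (0, 1)  # the first alignment contributes a position
        take2 = step in (0, 2)  # the second alignment contributes a position
        for k, row in enumerate(rows1):
            merged[k].append(row[idx1] if take1 else None)
        for k, row in enumerate(rows2):
            merged[len(rows1) + k].append(row[idx2] if take2 else None)
        idx1 += int(take1)
        idx2 += int(take2)
    return merged


merged = merge_rows([[1.0, 5.0]], [[1.1, 3.0, 5.1]], [0, 2, 0])
```

The result has one row per input experiment, and every row has the same number of positions as the traceback has steps.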

position_similarity(pos1, pos2, D)[source]

Calculates the similarity between the two alignment positions.

A score of 0 is best and 1 is worst.

Parameters

pos1 (List[Peak]) – The position of the first alignment.
pos2 (List[Peak]) – The position of the second alignment.
D (float) – Retention time tolerance in seconds.

Return type

float

Returns

The similarity value for the current position.

Authors

Qiao Wang, Vladimir Likic, Andrew Isaac
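
A score with these properties can be built from the cosine similarity of two mass spectra, weighted by a Gaussian of the retention time difference. The sketch below is one plausible form under that assumption, shown for a single pair of peaks; it is not taken from the library's source, and `pair_score` is a hypothetical name.

```python
import math
from typing import Sequence


def pair_score(
    spec_a: Sequence[float], rt_a: float,
    spec_b: Sequence[float], rt_b: float,
    D: float,
) -> float:
    """Return 0 for identical peaks and values approaching 1 for dissimilar ones."""
    dot = sum(x * y for x, y in zip(spec_a, spec_b))
    norm = math.sqrt(sum(x * x for x in spec_a) * sum(y * y for y in spec_b))
    cosine = dot / norm if norm > 0 else 0.0
    # Gaussian weight: retention time differences much larger than D suppress the match.
    rt_weight = math.exp(-(((rt_a - rt_b) / D) ** 2) / 2.0)
    return 1.0 - cosine * rt_weight


same = pair_score([1.0, 2.0, 3.0], 100.0, [1.0, 2.0, 3.0], 100.0, D=2.5)
orthogonal = pair_score([1.0, 0.0], 100.0, [0.0, 1.0], 100.0, D=2.5)
```

A score for two whole positions could then be taken as the average of `pair_score` over all pairs of non-gap peaks, which again is an assumption rather than documented behaviour.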

score_matrix(a1, a2, D)[source]

Calculates the score matrix between two alignments.

Parameters

a1 (Alignment) – The first alignment.
a2 (Alignment) – The second alignment.
D (float) – Retention time tolerance in seconds.

Return type

ndarray

Returns

The score matrix.

Authors

Qiao Wang, Andrew Isaac

score_matrix_mpi(a1, a2, D)[source]

Calculates the score matrix between two alignments.

Parameters

a1 (Alignment) – The first alignment.
a2 (Alignment) – The second alignment.
D (float) – Retention time tolerance in seconds.

Return type

ndarray

Returns

The score matrix.

Authors

Qiao Wang, Andrew Isaac

`pyms.DPA.IO`

Functions for writing peak alignment to various file formats.

Functions:

`write_excel`(alignment, file_name[, minutes])	Writes the alignment to an Excel file, with colouring showing possible mis-alignments.
`write_mass_hunter_csv`(alignment, file_name, ...)	Creates a CSV file with UID, common and qualifying ions and their ratios for MassHunter interpretation.
`write_transposed_output`(alignment, file_name)	Write an alignment to an Excel workbook.

write_excel(alignment, file_name, minutes=True)[source]

Writes the alignment to an Excel file, with colouring showing possible mis-alignments.

Parameters

alignment (Alignment) – pyms.DPA.Alignment.Alignment object to write to file.
file_name (Union[str, Path, PathLike]) – The name for the retention time alignment file.
minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Author

David Kainer

write_mass_hunter_csv(alignment, file_name, top_ion_list)[source]

Creates a CSV file with UID, common and qualifying ions and their ratios for MassHunter interpretation.

Parameters

alignment (Alignment) – alignment object to write to file
file_name (Union[str, Path, PathLike]) – name of the output file.
top_ion_list (List[int]) – a list of the common ions for each peak in the averaged peak list for the alignment.

write_transposed_output(alignment, file_name, minutes=True)[source]

Write an alignment to an Excel workbook.

Parameters

alignment (Alignment) – pyms.DPA.Alignment.Alignment object to write to file
file_name (Union[str, Path, PathLike]) – The name of the file
minutes (bool) – Default True.

`pyms.DPA.clustering`

Provides Pycluster.treecluster regardless of which library provides it.

Functions:

treecluster(data[, mask, weight, transpose, ...])

Perform hierarchical clustering, and return a Tree object.

treecluster(data, mask=None, weight=None, transpose=False, method='m', dist='e', distancematrix=None)[source]

Perform hierarchical clustering, and return a Tree object.

This function implements the pairwise single, complete, centroid, and average linkage hierarchical clustering methods.

Keyword arguments:

data: nrows x ncolumns array containing the data values.
mask: nrows x ncolumns array of integers, showing which data are missing. If mask[i][j]==0, then data[i][j] is missing.
weight: the weights to be used when calculating distances.
transpose:
  - if False, rows are clustered;
  - if True, columns are clustered.
dist: specifies the distance function to be used:
  - dist == 'e': Euclidean distance
  - dist == 'b': City Block distance
  - dist == 'c': Pearson correlation
  - dist == 'a': absolute value of the correlation
  - dist == 'u': uncentered correlation
  - dist == 'x': absolute uncentered correlation
  - dist == 's': Spearman's rank correlation
  - dist == 'k': Kendall's tau
method: specifies which linkage method is used:
  - method == 's': Single pairwise linkage
  - method == 'm': Complete (maximum) pairwise linkage (default)
  - method == 'c': Centroid linkage
  - method == 'a': Average pairwise linkage
distancematrix: The distance matrix between the items. There are three ways in which you can pass a distance matrix:
  1. a 2D NumPy array (in which only the left-lower part of the array will be accessed);
  2. a 1D NumPy array containing the distances consecutively;
  3. a list of rows containing the lower-triangular part of the distance matrix.

Examples are:
```
>>> from numpy import array
>>> # option 1:
>>> distance = array([[0.0, 1.1, 2.3],
...                   [1.1, 0.0, 4.5],
...                   [2.3, 4.5, 0.0]])
>>> # option 2:
>>> distance = array([1.1, 2.3, 4.5])
>>> # option 3:
>>> distance = [array([]),
...             array([1.1]),
...             array([2.3, 4.5])]
```
These three correspond to the same distance matrix.
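
The relationship between the three forms can be checked with plain Python lists. The sketch below derives forms 3 and 2 from a full symmetric matrix; the helper names `lower_triangle_rows` and `condensed` are hypothetical, and plain lists stand in for NumPy arrays.

```python
from typing import List


def lower_triangle_rows(full: List[List[float]]) -> List[List[float]]:
    """Form 3: row i holds the distances from item i to items 0..i-1."""
    return [list(full[i][:i]) for i in range(len(full))]


def condensed(full: List[List[float]]) -> List[float]:
    """Form 2: the lower-triangular rows concatenated in order."""
    return [value for row in lower_triangle_rows(full) for value in row]


full = [[0.0, 1.1, 2.3],
        [1.1, 0.0, 4.5],
        [2.3, 4.5, 0.0]]
rows = lower_triangle_rows(full)  # [[], [1.1], [2.3, 4.5]]
flat = condensed(full)            # [1.1, 2.3, 4.5]
```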

PLEASE NOTE: As the treecluster routine may shuffle the values in the distance matrix as part of the clustering algorithm, be sure to save this array in a different variable before calling treecluster if you need it later.

Either data or distancematrix should be None. If distancematrix is None, the hierarchical clustering solution is calculated from the values stored in the argument data. If data is None, the hierarchical clustering solution is instead calculated from the distance matrix. Pairwise centroid-linkage clustering can be performed only from the data values and not from the distance matrix. Pairwise single-, maximum-, and average-linkage clustering can be calculated from the data values or from the distance matrix.

Return value: treecluster returns a Tree object describing the hierarchical clustering result. See the description of the Tree class for more information.
