pyms.DPA

Alignment of peak lists by dynamic programming.

pyms.DPA.Alignment

Classes for peak alignment by dynamic programming.

Classes:

Alignment(expr)

Models an alignment of peak lists.

Functions:

exprl2alignment(expr_list)

Converts a list of experiments into a list of alignments.

class Alignment(expr)[source]

Bases: object

Models an alignment of peak lists.

Parameters:

expr (Optional[Experiment]) – The experiment to be converted into an alignment object.

Authors:

Woon Wai Keen, Qiao Wang, Vladimir Likic, Dominic Davis-Foster.

Methods:

__len__()

Returns the length of the alignment, defined as the number of peak positions in the alignment.

aligned_peaks([minutes])

Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.

common_ion()

Calculates a common ion among the peaks of an aligned peak.

filter_min_peaks(min_peaks)

Filters alignment positions that have fewer peaks than min_peaks.

get_area_alignment([require_all_expr])

Returns a Pandas dataframe containing the peak areas of the aligned peaks.

get_highest_mz_ion(ion_dict)

Returns the preferred ion for quantitation.

get_ms_alignment([require_all_expr])

Returns a Pandas dataframe of mass spectra for the aligned peaks.

get_peak_alignment([minutes, require_all_expr])

Returns a Pandas dataframe of aligned retention times.

get_peaks_alignment([require_all_expr])

Returns a Pandas dataframe of Peak objects for the aligned peaks.

write_common_ion_csv(area_file_name, ...[, ...])

Writes the alignment to CSV files.

write_csv(rt_file_name, area_file_name[, ...])

Writes the alignment to CSV files.

write_ion_areas_csv(ms_file_name[, minutes])

Write Ion Areas to CSV File.

Attributes:

expr_code

List of experiment codes.

peakalgt

peakpos

similarity

__len__()[source]

Returns the length of the alignment, defined as the number of peak positions in the alignment.

Return type:

int

Authors:

Qiao Wang, Vladimir Likic

aligned_peaks(minutes=False)[source]

Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.

Parameters:

minutes (bool) – Whether retention times are in minutes. If False, retention times are in seconds. Default False.

Return type:

Sequence[Optional[Peak]]

Returns:

A list of composite peaks based on the alignment.

Author:

Andrew Isaac

common_ion()[source]

Calculates a common ion among the peaks of an aligned peak.

Return type:

List[float]

Returns:

A list of the highest intensity common ion for all aligned peaks.

Author:

Sean O’Callaghan

expr_code

Type:    List[str]

List of experiment codes.

filter_min_peaks(min_peaks)[source]

Filters alignment positions that have fewer peaks than min_peaks.

This function is useful only for within state alignment.

Parameters:

min_peaks (int) – Minimum number of peaks required for the alignment position to survive filtering.

Author:

Qiao Wang
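
The filtering rule can be illustrated with a minimal sketch. It assumes, hypothetically, that an alignment is held as a list of positions, each position being a list with one entry per experiment, and that a missing peak is represented by None. The name filter_positions and the string stand-ins for Peak objects are illustrative only and are not part of pyms.

```python
from typing import List, Optional


def filter_positions(
    positions: List[List[Optional[str]]],
    min_peaks: int,
) -> List[List[Optional[str]]]:
    """Keep only positions holding at least ``min_peaks`` real peaks."""
    kept = []
    for position in positions:
        # Count the experiments that contributed a peak at this position.
        n_peaks = sum(1 for peak in position if peak is not None)
        if n_peaks >= min_peaks:
            kept.append(position)
    return kept


# Three positions across three experiments; strings stand in for Peak objects.
positions = [
    ["a1", "b1", "c1"],
    ["a2", None, None],
    ["a3", "b3", None],
]
filtered = filter_positions(positions, min_peaks=2)
print(len(filtered))
```

In this toy input only the second position, which has a single peak, is dropped when min_peaks is 2.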

get_area_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe containing the peak areas of the aligned peaks.

Parameters:

require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors:

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type:

DataFrame

static get_highest_mz_ion(ion_dict)[source]

Returns the preferred ion for quantitation.

Looks at the list of candidate ions, selects those which have highest occurrence, and selects the heaviest of those.

Parameters:

ion_dict (Dict[float, int]) – a dictionary of m/z value: number of occurrences.

Returns:

The ion with the highest m/z value among those with the highest occurrence.

Return type:

float
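
The selection rule described above (most frequent candidates first, then the heaviest of those) can be sketched in a few lines. This is an illustrative re-statement under the name pick_quant_ion, which is hypothetical, and is not the library's own code.

```python
from typing import Dict


def pick_quant_ion(ion_dict: Dict[float, int]) -> float:
    """Return the heaviest ion among those with the highest occurrence count."""
    # Highest number of occurrences seen for any candidate ion.
    max_count = max(ion_dict.values())
    # Candidates tied at that count.
    most_common = [mz for mz, count in ion_dict.items() if count == max_count]
    # Of those, prefer the largest m/z.
    return max(most_common)


# m/z 73 and 147 both occur three times; the heavier one is preferred.
candidates = {73.0: 3, 147.0: 3, 207.0: 1}
print(pick_quant_ion(candidates))
```

Note that m/z 207 is the heaviest candidate overall but is not selected, because it occurs less often than the other two.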

get_ms_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe of mass spectra for the aligned peaks.

Parameters:

require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors:

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type:

DataFrame

get_peak_alignment(minutes=True, require_all_expr=True)[source]

Returns a Pandas dataframe of aligned retention times.

Parameters:
  • minutes (bool) – Whether to return retention times in minutes. If False, retention time will be returned in seconds. Default True.

  • require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors:

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type:

DataFrame

get_peaks_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe of Peak objects for the aligned peaks.

Parameters:

require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors:

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type:

DataFrame

peakalgt

Type:    List[List[Peak]]

peakpos

Type:    List[List[Peak]]

similarity

Type:    Optional[float]

write_common_ion_csv(area_file_name, top_ion_list, minutes=True)[source]

Writes the alignment to CSV files.

This function writes two files: one containing the alignment of peak retention times and the other containing the alignment of peak areas.

Parameters:
  • area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.

  • top_ion_list (Sequence[float]) – A list of the highest intensity common ion along the aligned peaks.

  • minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors:

Woon Wai Keen, Andrew Isaac, Sean O’Callaghan, Vladimir Likic, Dominic Davis-Foster (pathlib support)

write_csv(rt_file_name, area_file_name, minutes=True)[source]

Writes the alignment to CSV files.

This function writes two files: one containing the alignment of peak retention times and the other containing the alignment of peak areas.

Parameters:
  • rt_file_name (Union[str, Path, PathLike]) – The name for the retention time alignment file.

  • area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.

  • minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors:

Woon Wai Keen, Andrew Isaac, Vladimir Likic, David Kainer, Dominic Davis-Foster (pathlib support)

write_ion_areas_csv(ms_file_name, minutes=True)[source]

Write Ion Areas to CSV File.

Parameters:
  • ms_file_name (Union[str, Path, PathLike]) – The name of the file

  • minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors:

David Kainer, Dominic Davis-Foster (pathlib support)

exprl2alignment(expr_list)[source]

Converts a list of experiments into a list of alignments.

Parameters:

expr_list (List[Experiment]) – The list of experiments to be converted into alignment objects.

Return type:

List[Alignment]

Returns:

A list of alignment objects for the experiments.

Author:

Vladimir Likic

pyms.DPA.PairwiseAlignment

Classes for peak alignment by dynamic programming.

Classes:

PairwiseAlignment(alignments, D, gap)

Models pairwise alignment of alignments.

Functions:

align(a1, a2, D, gap)

Aligns two alignments.

align_with_tree(T[, min_peaks])

Aligns a list of alignments using the supplied guide tree.

alignment_compare(x, y)

A helper function for sorting peak positions in an alignment.

alignment_similarity(traces, score_matrix, gap)

Calculates similarity score between two alignments (new method).

dp(S, gap_penalty)

Solves optimal path in score matrix based on global sequence alignment.

merge_alignments(A1, A2, traces)

Merges two alignments with gaps added in from DP traceback.

position_similarity(pos1, pos2, D)

Calculates the similarity between the two alignment positions.

score_matrix(a1, a2, D)

Calculates the score matrix between two alignments.

score_matrix_mpi(a1, a2, D)

Calculates the score matrix between two alignments.

class PairwiseAlignment(alignments, D, gap)[source]

Bases: object

Models pairwise alignment of alignments.

Parameters:
  • alignments (List[Alignment]) – A list of alignments.

  • D (float) – Retention time tolerance parameter (in seconds) for pairwise alignments.

  • gap (float) – Gap parameter for pairwise alignments.

Authors:

Woon Wai Keen, Vladimir Likic

align(a1, a2, D, gap)[source]

Aligns two alignments.

Parameters:
  • a1 (Alignment) – The first alignment

  • a2 (Alignment) – The second alignment

  • D (float) – Retention time tolerance in seconds.

  • gap (float) – Gap penalty

Return type:

Alignment

Returns:

Aligned alignments

Authors:

Woon Wai Keen, Vladimir Likic

align_with_tree(T, min_peaks=1)[source]

Aligns a list of alignments using the supplied guide tree.

Parameters:
Return type:

Alignment

Returns:

The final alignment consisting of aligned input alignments.

Authors:

Woon Wai Keen, Vladimir Likic

alignment_compare(x, y)[source]

A helper function for sorting peak positions in an alignment.

Parameters:
  • x

  • y

Return type:

int

alignment_similarity(traces, score_matrix, gap)[source]

Calculates similarity score between two alignments (new method).

Parameters:
  • traces – Traceback from DP algorithm.

  • score_matrix – Score matrix of the two alignments.

  • gap (float) – Gap penalty.

Return type:

float

Returns:

Similarity score (i.e. more similar => higher score)

Authors:

Woon Wai Keen, Vladimir Likic

dp(S, gap_penalty)[source]

Solves optimal path in score matrix based on global sequence alignment.

Parameters:
  • S – Score matrix

  • gap_penalty (float) – Gap penalty

Return type:

Dict

Returns:

A dictionary of results

Author:

Tim Erwin
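
As a rough illustration of global alignment by dynamic programming, the sketch below finds a minimum-cost path through a score matrix. It is a generic Needleman-Wunsch-style routine, not the pyms implementation. It assumes that S holds costs (lower is better) and that gap_penalty is the cost of each gap. The function name global_align, the result keys "cost" and "trace", and the trace codes (0 = match, 1 = row advances alone, 2 = column advances alone) are all hypothetical.

```python
from typing import Dict, List


def global_align(S: List[List[float]], gap_penalty: float) -> Dict[str, object]:
    """Minimum-cost global alignment over a cost matrix ``S``."""
    n_rows, n_cols = len(S), len(S[0])
    # cost[i][j]: best cost of aligning the first i rows with the first j columns.
    cost = [[0.0] * (n_cols + 1) for _ in range(n_rows + 1)]
    # move[i][j]: step taken to reach (i, j); 0 = match, 1 = row only, 2 = column only.
    move = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(1, n_rows + 1):
        cost[i][0] = i * gap_penalty
        move[i][0] = 1
    for j in range(1, n_cols + 1):
        cost[0][j] = j * gap_penalty
        move[0][j] = 2
    for i in range(1, n_rows + 1):
        for j in range(1, n_cols + 1):
            options = [
                cost[i - 1][j - 1] + S[i - 1][j - 1],  # align row i with column j
                cost[i - 1][j] + gap_penalty,          # row i aligned to a gap
                cost[i][j - 1] + gap_penalty,          # column j aligned to a gap
            ]
            best = min(options)
            cost[i][j] = best
            move[i][j] = options.index(best)
    # Trace back from the bottom-right corner to recover the path.
    trace = []
    i, j = n_rows, n_cols
    while i > 0 or j > 0:
        step = move[i][j]
        trace.append(step)
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    trace.reverse()
    return {"cost": cost[n_rows][n_cols], "trace": trace}


# Rows 0 and 1 pair well with columns 0 and 2; column 1 has no good partner.
S = [
    [0.1, 0.9, 0.9],
    [0.9, 0.9, 0.2],
]
result = global_align(S, gap_penalty=0.3)
print(result["trace"], result["cost"])
```

With this toy matrix the cheapest path matches the first row to the first column, skips the middle column with a gap, and matches the second row to the last column.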

merge_alignments(A1, A2, traces)[source]

Merges two alignments with gaps added in from DP traceback.

Parameters:
  • A1 (Alignment) – First alignment.

  • A2 (Alignment) – Second alignment.

  • traces – DP traceback.

Return type:

Alignment

Returns:

A single alignment from A1 and A2.

Authors:

Woon Wai Keen, Vladimir Likic, Qiao Wang
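
How a traceback drives the merge can be shown with a small sketch. It assumes, hypothetically, that each alignment is a list of positions (one list entry per experiment, None for a gap) and that the traceback is a list of codes where 0 means both alignments advance together, 1 means only the first advances, and 2 means only the second advances. Neither this encoding nor the name merge_positions is taken from pyms.

```python
from typing import List, Optional

Position = List[Optional[str]]


def merge_positions(
    A1: List[Position],
    A2: List[Position],
    traces: List[int],
) -> List[Position]:
    """Merge two alignments position by position, padding gaps with None."""
    width1 = len(A1[0])  # number of experiments in A1
    width2 = len(A2[0])  # number of experiments in A2
    merged = []
    i = j = 0
    for step in traces:
        if step == 0:
            # Positions are matched: concatenate their peaks.
            merged.append(A1[i] + A2[j])
            i += 1
            j += 1
        elif step == 1:
            # Only A1 advances: A2's experiments get a gap.
            merged.append(A1[i] + [None] * width2)
            i += 1
        else:
            # Only A2 advances: A1's experiments get a gap.
            merged.append([None] * width1 + A2[j])
            j += 1
    return merged


# One experiment in each alignment; strings stand in for Peak objects.
A1 = [["a1"], ["a2"]]
A2 = [["b1"], ["b2"], ["b3"]]
merged = merge_positions(A1, A2, traces=[0, 2, 0])
print(merged)
```

The merged result has one column per experiment from both inputs, and the unmatched peak b2 sits opposite a gap in the first alignment's column.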

position_similarity(pos1, pos2, D)[source]

Calculates the similarity between the two alignment positions.

A score of 0 is best and 1 is worst.

Parameters:
  • pos1 (List[Peak]) – The position of the first alignment.

  • pos2 (List[Peak]) – The position of the second alignment.

  • D (float) – Retention time tolerance in seconds.

Return type:

float

Returns:

The similarity value for the current position.

Authors:

Qiao Wang, Vladimir Likic, Andrew Isaac
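
The text above does not give the scoring formula, so the following is only a sketch of one form that is consistent with it: 0 is best, 1 is worst, and D sets the retention time tolerance. It assumes the score is one minus the cosine similarity of two spectra weighted by a Gaussian of the retention time difference, averaged over all peak pairs. The name toy_position_score and the tuple stand-in for a Peak are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A stand-in peak: (retention time in seconds, intensities on a shared m/z grid).
ToyPeak = Tuple[float, Sequence[float]]


def toy_position_score(
    pos1: List[Optional[ToyPeak]],
    pos2: List[Optional[ToyPeak]],
    D: float,
) -> float:
    """Average dissimilarity over all peak pairs; 0 is best and 1 is worst."""
    total = 0.0
    count = 0
    for a in pos1:
        if a is None:
            continue
        for b in pos2:
            if b is None:
                continue
            rt_a, spec_a = a
            rt_b, spec_b = b
            # Cosine similarity of the two spectra.
            dot = sum(x * y for x, y in zip(spec_a, spec_b))
            norm_a = math.sqrt(sum(x * x for x in spec_a))
            norm_b = math.sqrt(sum(y * y for y in spec_b))
            cos = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
            # Gaussian weight that decays as the retention times drift apart.
            rt_weight = math.exp(-(((rt_a - rt_b) / D) ** 2) / 2.0)
            total += 1.0 - cos * rt_weight
            count += 1
    # With no comparable peaks, fall back to the worst score.
    return total / count if count else 1.0


same = toy_position_score(
    [(100.0, [1.0, 2.0, 0.0])], [(100.0, [1.0, 2.0, 0.0])], D=2.5
)
different = toy_position_score(
    [(100.0, [1.0, 0.0, 0.0])], [(100.0, [0.0, 1.0, 0.0])], D=2.5
)
print(same, different)
```

Identical peaks score close to 0 and peaks with no ions in common score 1, which matches the stated convention.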

score_matrix(a1, a2, D)[source]

Calculates the score matrix between two alignments.

Parameters:
  • a1 (Alignment) – The first alignment.

  • a2 (Alignment) – The second alignment.

  • D (float) – Retention time tolerance in seconds.

Return type:

ndarray

Returns:

The score matrix for the two alignments.

Authors:

Qiao Wang, Andrew Isaac

score_matrix_mpi(a1, a2, D)[source]

Calculates the score matrix between two alignments.

Parameters:
  • a1 (Alignment) – The first alignment.

  • a2 (Alignment) – The second alignment.

  • D (float) – Retention time tolerance in seconds.

Returns:

The score matrix for the two alignments.

Authors:

Qiao Wang, Andrew Isaac

pyms.DPA.IO

Functions for writing peak alignment to various file formats.

Functions:

write_excel(alignment, file_name[, minutes])

Writes the alignment to an Excel file, with colouring showing possible mis-alignments.

write_mass_hunter_csv(alignment, file_name, ...)

Creates a CSV file with UID, common and qualifying ions and their ratios for MassHunter interpretation.

write_transposed_output(alignment, file_name)

Write an alignment to an Excel workbook.

write_excel(alignment, file_name, minutes=True)[source]

Writes the alignment to an Excel file, with colouring showing possible mis-alignments.

Parameters:
Author:

David Kainer

write_mass_hunter_csv(alignment, file_name, top_ion_list)[source]

Creates a CSV file with UID, common and qualifying ions and their ratios for MassHunter interpretation.

Parameters:
  • alignment (Alignment) – alignment object to write to file

  • file_name (Union[str, Path, PathLike]) – name of the output file.

  • top_ion_list (List[int]) – a list of the common ions for each peak in the averaged peak list for the alignment.

write_transposed_output(alignment, file_name, minutes=True)[source]

Write an alignment to an Excel workbook.

Parameters:

pyms.DPA.clustering

Provides Pycluster.treecluster regardless of which library provides it.

Functions:

treecluster(data[, mask, weight, transpose, ...])

Perform hierarchical clustering, and return a Tree object.

treecluster(data, mask=None, weight=None, transpose=False, method='m', dist='e', distancematrix=None)[source]

Perform hierarchical clustering, and return a Tree object.

This function implements the pairwise single, complete, centroid, and average linkage hierarchical clustering methods.

Keyword arguments:
  • data: nrows x ncolumns array containing the data values.

  • mask: nrows x ncolumns array of integers, showing which data are missing. If mask[i][j]==0, then data[i][j] is missing.

  • weight: the weights to be used when calculating distances.

  • transpose: if False, rows are clustered; if True, columns are clustered.

  • dist: specifies the distance function to be used:
    - dist == 'e': Euclidean distance
    - dist == 'b': City Block distance
    - dist == 'c': Pearson correlation
    - dist == 'a': absolute value of the correlation
    - dist == 'u': uncentered correlation
    - dist == 'x': absolute uncentered correlation
    - dist == 's': Spearman's rank correlation
    - dist == 'k': Kendall's tau

  • method: specifies which linkage method is used:
    - method == 's': Single pairwise linkage
    - method == 'm': Complete (maximum) pairwise linkage (default)
    - method == 'c': Centroid linkage
    - method == 'a': Average pairwise linkage

  • distancematrix: The distance matrix between the items. There are three ways in which you can pass a distance matrix:
    1. a 2D NumPy array (in which only the left-lower part of the array will be accessed);
    2. a 1D NumPy array containing the distances consecutively;
    3. a list of rows containing the lower-triangular part of the distance matrix.

    Examples are:

    >>> from numpy import array
    >>> # option 1:
    >>> distance = array([[0.0, 1.1, 2.3],
    ...                   [1.1, 0.0, 4.5],
    ...                   [2.3, 4.5, 0.0]])
    >>> # option 2:
    >>> distance = array([1.1, 2.3, 4.5])
    >>> # option 3:
    >>> distance = [array([]),
    ...             array([1.1]),
    ...             array([2.3, 4.5])]
    

    These three correspond to the same distance matrix.

    PLEASE NOTE: As the treecluster routine may shuffle the values in the distance matrix as part of the clustering algorithm, be sure to save this array in a different variable before calling treecluster if you need it later.

Either data or distancematrix should be None. If distancematrix is None, the hierarchical clustering solution is calculated from the values stored in the argument data. If data is None, the hierarchical clustering solution is instead calculated from the distance matrix. Pairwise centroid-linkage clustering can be performed only from the data values and not from the distance matrix. Pairwise single-, maximum-, and average-linkage clustering can be calculated from the data values or from the distance matrix.

Return value: treecluster returns a Tree object describing the hierarchical clustering result. See the description of the Tree class for more information.
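
The equivalence of the three distancematrix layouts described above can be checked with a short sketch. It uses plain Python lists in place of NumPy arrays so that it runs without extra dependencies, and the helper names lower_triangle_rows and flatten_rows are illustrative rather than part of pyms or Pycluster.

```python
from typing import List


def lower_triangle_rows(full: List[List[float]]) -> List[List[float]]:
    """Option 3 layout: row i holds the i entries to the left of the diagonal."""
    return [row[:i] for i, row in enumerate(full)]


def flatten_rows(rows: List[List[float]]) -> List[float]:
    """Option 2 layout: the lower-triangular entries listed consecutively."""
    return [value for row in rows for value in row]


# Option 1 layout: the full symmetric matrix.
full = [
    [0.0, 1.1, 2.3],
    [1.1, 0.0, 4.5],
    [2.3, 4.5, 0.0],
]
rows = lower_triangle_rows(full)
flat = flatten_rows(rows)
print(rows)
print(flat)
```

Because the routine may shuffle the values it is given, it is also reasonable to pass a copy (for example one made with copy.deepcopy) when the original matrix is needed afterwards.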