pyms.DPA

Alignment of peak lists by dynamic programming.

pyms.DPA.Alignment

Classes for peak alignment by dynamic programming.

Classes:

Alignment(expr)

Models an alignment of peak lists.

Functions:

exprl2alignment(expr_list)

Converts a list of experiments into a list of alignments.

class Alignment(expr)[source]

Bases: object

Models an alignment of peak lists.

Parameters

expr (Optional[Experiment]) – The experiment to be converted into an alignment object.

Authors

Woon Wai Keen, Qiao Wang, Vladimir Likic, Dominic Davis-Foster.

Methods:

__len__()

Returns the length of the alignment, defined as the number of peak positions in the alignment.

aligned_peaks([minutes])

Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.

common_ion()

Calculates a common ion among the peaks of an aligned peak.

filter_min_peaks(min_peaks)

Filters alignment positions that have fewer peaks than min_peaks.

get_area_alignment([require_all_expr])

Returns a Pandas dataframe containing the peak areas of the aligned peaks.

get_highest_mz_ion(ion_dict)

Returns the preferred ion for quantitation.

get_ms_alignment([require_all_expr])

Returns a Pandas dataframe of mass spectra for the aligned peaks.

get_peak_alignment([minutes, require_all_expr])

Returns a Pandas dataframe of aligned retention times.

get_peaks_alignment([require_all_expr])

Returns a Pandas dataframe of Peak objects for the aligned peaks.

write_common_ion_csv(area_file_name, …[, …])

Writes the alignment to CSV files.

write_csv(rt_file_name, area_file_name[, …])

Writes the alignment to CSV files.

write_ion_areas_csv(ms_file_name[, minutes])

Writes ion areas to a CSV file.

Attributes:

expr_code

List of experiment codes.

peakalgt

peakpos

similarity

__len__()[source]

Returns the length of the alignment, defined as the number of peak positions in the alignment.

Return type

int

Authors

Qiao Wang, Vladimir Likic

aligned_peaks(minutes=False)[source]

Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.

Parameters

minutes (bool) – Whether retention times are in minutes. If False, retention times are in seconds. Default False.

Return type

Sequence[Optional[Peak]]

Returns

A list of composite peaks based on the alignment.

Author

Andrew Isaac
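
As a rough illustration of what a composite peak is, the sketch below averages the retention times and the per-ion intensities of the peaks found at one alignment position. It does not use the library: the `SimplePeak` tuple, the `composite_peak` helper, and the choice to combine spectra by averaging are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Spectrum = Dict[int, float]          # m/z -> intensity (stand-in for a mass spectrum)
SimplePeak = Tuple[float, Spectrum]  # (retention time in seconds, spectrum)


def composite_peak(
    position: List[Optional[SimplePeak]],
    minutes: bool = False,
) -> Optional[SimplePeak]:
    """Hypothetical helper: combine the peaks present at one alignment position."""
    present = [peak for peak in position if peak is not None]
    if not present:
        # No experiment contributed a peak at this position.
        return None

    mean_rt = sum(rt for rt, _ in present) / len(present)

    # Assumption: spectra are combined by averaging intensities per m/z.
    totals: Spectrum = {}
    for _, spectrum in present:
        for mz, intensity in spectrum.items():
            totals[mz] = totals.get(mz, 0.0) + intensity
    averaged = {mz: total / len(present) for mz, total in totals.items()}

    if minutes:
        mean_rt /= 60.0
    return mean_rt, averaged


# One alignment position across three experiments; the second has no peak.
position = [
    (300.0, {73: 100.0, 147: 50.0}),
    None,
    (306.0, {73: 80.0, 147: 70.0}),
]
rt, spectrum = composite_peak(position)
```

A position with no peaks yields None in this sketch, which matches the Optional[Peak] element type in the return annotation above.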

common_ion()[source]

Calculates a common ion among the peaks of an aligned peak.

Return type

List[float]

Returns

A list containing the highest-intensity common ion for each aligned peak.

Author

Sean O’Callaghan

expr_code

Type:    List[str]

List of experiment codes.

filter_min_peaks(min_peaks)[source]

Filters alignment positions that have fewer peaks than min_peaks.

This function is useful only for within-state alignment.

Parameters

min_peaks (int) – Minimum number of peaks required for the alignment position to survive filtering.

Author

Qiao Wang
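
The filtering rule can be sketched without the library. In the example below each alignment position is a plain list with one entry per experiment, and a missing peak is None; this layout and the `filter_positions` helper are assumptions for illustration, not the class's internal representation.

```python
from typing import List, Optional, Sequence


def filter_positions(
    positions: Sequence[Sequence[Optional[float]]],
    min_peaks: int,
) -> List[List[Optional[float]]]:
    """Hypothetical helper: keep positions with at least ``min_peaks`` peaks."""
    kept = []
    for position in positions:
        n_present = sum(1 for peak in position if peak is not None)
        if n_present >= min_peaks:
            kept.append(list(position))
    return kept


# Three experiments; each row is one alignment position (retention times in seconds).
positions = [
    [312.4, 312.9, 313.1],
    [None, 401.2, None],
    [455.0, None, 455.6],
]
filtered = filter_positions(positions, min_peaks=2)
```

With min_peaks=2 the middle position is dropped because only one experiment contributed a peak there.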

get_area_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe containing the peak areas of the aligned peaks.

Parameters

require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type

DataFrame

static get_highest_mz_ion(ion_dict)[source]

Returns the preferred ion for quantitation.

Of the candidate ions, selects those with the highest number of occurrences and returns the heaviest of them.

Parameters

ion_dict (Dict[float, int]) – A dictionary mapping each m/z value to its number of occurrences.

Returns

The ion with the highest m/z value among the most frequently occurring candidates.

Return type

float
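
The selection rule described above can be restated in a few lines of plain Python. This is an illustration of the rule only, not the library's code; the name `pick_quant_ion` is hypothetical.

```python
from typing import Dict


def pick_quant_ion(ion_dict: Dict[float, int]) -> float:
    """Hypothetical helper: most frequent ions first, then the heaviest of those."""
    if not ion_dict:
        raise ValueError("ion_dict must not be empty")
    # Highest number of occurrences among all candidate ions.
    max_count = max(ion_dict.values())
    # Keep only the ions that reach that count, then take the largest m/z.
    return max(mz for mz, count in ion_dict.items() if count == max_count)


candidates = {73.0: 3, 147.0: 5, 207.0: 5, 281.0: 2}
# 147.0 and 207.0 tie on occurrences, so the heavier one is chosen.
best = pick_quant_ion(candidates)
```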

get_ms_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe of mass spectra for the aligned peaks.

Parameters

require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type

DataFrame

get_peak_alignment(minutes=True, require_all_expr=True)[source]

Returns a Pandas dataframe of aligned retention times.

Parameters
  • minutes (bool) – Whether to return retention times in minutes. If False, retention time will be returned in seconds. Default True.

  • require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type

DataFrame

get_peaks_alignment(require_all_expr=True)[source]

Returns a Pandas dataframe of Peak objects for the aligned peaks.

Parameters

require_all_expr (bool) – Whether the peak must be present in all experiments to be included in the data frame. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster

Return type

DataFrame

peakalgt

Type:    List[List[Peak]]

peakpos

Type:    List[List[Peak]]

similarity

Type:    Optional[float]

write_common_ion_csv(area_file_name, top_ion_list, minutes=True)[source]

Writes the alignment to CSV files.

This function writes a single file, area_file_name, containing the alignment of peak areas for the ions given in top_ion_list.

Parameters
  • area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.

  • top_ion_list (Sequence[float]) – A list of the highest intensity common ion along the aligned peaks.

  • minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Sean O’Callaghan, Vladimir Likic, Dominic Davis-Foster (pathlib support)

write_csv(rt_file_name, area_file_name, minutes=True)[source]

Writes the alignment to CSV files.

This function writes two files: one containing the alignment of peak retention times and the other containing the alignment of peak areas.

Parameters
  • rt_file_name (Union[str, Path, PathLike]) – The name for the retention time alignment file.

  • area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.

  • minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors

Woon Wai Keen, Andrew Isaac, Vladimir Likic, David Kainer, Dominic Davis-Foster (pathlib support)

write_ion_areas_csv(ms_file_name, minutes=True)[source]

Writes ion areas to a CSV file.

Parameters
  • ms_file_name (Union[str, Path, PathLike]) – The name of the output file.

  • minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.

Authors

David Kainer, Dominic Davis-Foster (pathlib support)

exprl2alignment(expr_list)[source]

Converts a list of experiments into a list of alignments.

Parameters

expr_list (List[Experiment]) – The list of experiments to be converted into alignment objects.

Return type

List[Alignment]

Returns

A list of alignment objects for the experiments.

Author

Vladimir Likic

pyms.DPA.PairwiseAlignment

Classes for peak alignment by dynamic programming.

Classes:

PairwiseAlignment(alignments, D, gap)

Models pairwise alignment of alignments.

Functions:

align(a1, a2, D, gap)

Aligns two alignments.

align_with_tree(T[, min_peaks])

Aligns a list of alignments using the supplied guide tree.

alignment_compare(x, y)

A helper function for sorting peak positions in an alignment.

alignment_similarity(traces, score_matrix, gap)

Calculates similarity score between two alignments (new method).

dp(S, gap_penalty)

Solves optimal path in score matrix based on global sequence alignment.

merge_alignments(A1, A2, traces)

Merges two alignments with gaps added in from DP traceback.

position_similarity(pos1, pos2, D)

Calculates the similarity between the two alignment positions.

score_matrix(a1, a2, D)

Calculates the score matrix between two alignments.

score_matrix_mpi(a1, a2, D)

Calculates the score matrix between two alignments.

class PairwiseAlignment(alignments, D, gap)[source]

Bases: object

Models pairwise alignment of alignments.

Parameters
  • alignments (List[Alignment]) – A list of alignments.

  • D (float) – Retention time tolerance parameter (in seconds) for pairwise alignments.

  • gap (float) – Gap parameter for pairwise alignments.

Authors

Woon Wai Keen, Vladimir Likic

align(a1, a2, D, gap)[source]

Aligns two alignments.

Parameters
  • a1 (Alignment) – The first alignment

  • a2 (Alignment) – The second alignment

  • D (float) – Retention time tolerance in seconds.

  • gap (float) – Gap penalty

Return type

Alignment

Returns

Aligned alignments

Authors

Woon Wai Keen, Vladimir Likic

align_with_tree(T, min_peaks=1)[source]

Aligns a list of alignments using the supplied guide tree.

Parameters
Return type

Alignment

Returns

The final alignment consisting of aligned input alignments.

Authors

Woon Wai Keen, Vladimir Likic

alignment_compare(x, y)[source]

A helper function for sorting peak positions in an alignment.

Parameters
  • x

  • y

Return type

int

alignment_similarity(traces, score_matrix, gap)[source]

Calculates similarity score between two alignments (new method).

Parameters
  • traces – Traceback from DP algorithm.

  • score_matrix – Score matrix of the two alignments.

  • gap (float) – Gap penalty.

Return type

float

Returns

The similarity score; more similar alignments give a higher score.

Authors

Woon Wai Keen, Vladimir Likic
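
One way to turn a traceback into a single similarity value is sketched below. The formula is an assumption, since this reference does not state it: matched positions contribute one minus their score-matrix entry (so a low entry adds more similarity), and each gap subtracts the gap penalty. The trace encoding (0 = match, 1 = gap in the second alignment, 2 = gap in the first) and the name `trace_similarity` are also hypothetical.

```python
from typing import List, Sequence


def trace_similarity(
    traces: Sequence[int],
    score_matrix: Sequence[Sequence[float]],
    gap: float,
) -> float:
    """Hypothetical helper: accumulate similarity along a DP traceback."""
    similarity = 0.0
    i = j = 0  # current row (first alignment) and column (second alignment)
    for step in traces:
        if step == 0:
            # Match: entries near 0 mean "very similar", so add the complement.
            similarity += 1.0 - score_matrix[i][j]
            i += 1
            j += 1
        elif step == 1:
            # Gap in the second alignment: only the first advances.
            similarity -= gap
            i += 1
        else:
            # Gap in the first alignment: only the second advances.
            similarity -= gap
            j += 1
    return similarity


S: List[List[float]] = [
    [0.1, 0.9, 0.9],
    [0.9, 0.9, 0.2],
]
similarity = trace_similarity([0, 2, 0], S, gap=0.3)
```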

dp(S, gap_penalty)[source]

Solves optimal path in score matrix based on global sequence alignment.

Parameters
  • S – Score matrix

  • gap_penalty (float) – Gap penalty

Return type

Dict

Returns

A dictionary of results

Author

Tim Erwin
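
For readers unfamiliar with global sequence alignment, the following is a minimal sketch of the technique on a small matrix. It is not the library's implementation. It assumes that entries of S are costs (lower is better), that the optimal path minimises total cost, and that the result dictionary has the hypothetical keys "trace" and "cost"; the trace encoding (0 = match, 1 = gap in the second sequence, 2 = gap in the first) is likewise an assumption.

```python
from typing import Dict, List


def global_align(S: List[List[float]], gap_penalty: float) -> Dict[str, object]:
    """Hypothetical helper: minimum-cost global alignment over a cost matrix."""
    n, m = len(S), len(S[0])
    # cost[i][j]: minimum cost of aligning the first i rows with the first j columns.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_penalty
        move[i][0] = 1
    for j in range(1, m + 1):
        cost[0][j] = j * gap_penalty
        move[0][j] = 2

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = (
                cost[i - 1][j - 1] + S[i - 1][j - 1],  # 0: match row i with column j
                cost[i - 1][j] + gap_penalty,          # 1: gap in the second sequence
                cost[i][j - 1] + gap_penalty,          # 2: gap in the first sequence
            )
            best = min(range(3), key=options.__getitem__)
            cost[i][j] = options[best]
            move[i][j] = best

    # Walk back from the bottom-right corner to recover the path.
    trace: List[int] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        trace.append(step)
        if step == 0:
            i -= 1
            j -= 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    trace.reverse()
    return {"trace": trace, "cost": cost[n][m]}


S = [
    [0.1, 0.9, 0.9],
    [0.9, 0.9, 0.2],
]
result = global_align(S, gap_penalty=0.3)
# result["trace"] is [0, 2, 0]: match, skip the middle column, match.
```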

merge_alignments(A1, A2, traces)[source]

Merges two alignments with gaps added in from DP traceback.

Parameters
  • A1 (Alignment) – First alignment.

  • A2 (Alignment) – Second alignment.

  • traces – DP traceback.

Return type

Alignment

Returns

A single alignment from A1 and A2.

Authors

Woon Wai Keen, Vladimir Likic, Qiao Wang
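
The merge step can be pictured with plain lists. In this sketch each alignment is a list of positions, each position is a list with one entry per experiment, and gaps are filled with None. The layout, the trace encoding (0 = match, 1 = gap in A2, 2 = gap in A1), and the name `merge_positions` are assumptions for illustration rather than the library's data model.

```python
from typing import List, Optional, Sequence

Position = List[Optional[str]]


def merge_positions(
    A1: Sequence[Sequence[Optional[str]]],
    A2: Sequence[Sequence[Optional[str]]],
    traces: Sequence[int],
) -> List[Position]:
    """Hypothetical helper: interleave two alignments along a DP traceback."""
    width1 = len(A1[0])  # number of experiments in the first alignment
    width2 = len(A2[0])  # number of experiments in the second alignment
    merged: List[Position] = []
    i = j = 0
    for step in traces:
        if step == 0:
            # Both alignments contribute a position.
            merged.append(list(A1[i]) + list(A2[j]))
            i += 1
            j += 1
        elif step == 1:
            # Gap in A2: pad its experiments with None.
            merged.append(list(A1[i]) + [None] * width2)
            i += 1
        else:
            # Gap in A1: pad its experiments with None.
            merged.append([None] * width1 + list(A2[j]))
            j += 1
    return merged


A1 = [["a1"], ["a2"]]
A2 = [["b1"], ["b2"], ["b3"]]
merged = merge_positions(A1, A2, traces=[0, 2, 0])
```

Every merged position has one slot per experiment from both inputs, so the result can itself be merged with a further alignment.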

position_similarity(pos1, pos2, D)[source]

Calculates the similarity between the two alignment positions.

A score of 0 is best and 1 is worst.

Parameters
  • pos1 (List[Peak]) – The position of the first alignment.

  • pos2 (List[Peak]) – The position of the second alignment.

  • D (float) – Retention time tolerance in seconds.

Return type

float

Returns

The similarity value for the current position.

Authors

Qiao Wang, Vladimir Likic, Andrew Isaac
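
This reference does not give the formula behind the score, so the sketch below shows one plausible form under stated assumptions: each pair of peaks is compared by the cosine similarity of their spectra, weighted by a Gaussian of the retention time difference scaled by D, and the result is averaged over all pairs and inverted so that 0 is best and 1 is worst. The `SimplePeak` tuple and both helper names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Spectrum = Dict[int, float]          # m/z -> intensity
SimplePeak = Tuple[float, Spectrum]  # (retention time in seconds, spectrum)


def cosine(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity of two sparse spectra; 0.0 if either is empty."""
    dot = sum(value * b.get(mz, 0.0) for mz, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def position_score(
    pos1: Sequence[Optional[SimplePeak]],
    pos2: Sequence[Optional[SimplePeak]],
    D: float,
) -> float:
    """Hypothetical helper: mean dissimilarity over all peak pairs (0 best, 1 worst)."""
    total = 0.0
    count = 0
    for p1 in pos1:
        if p1 is None:
            continue
        for p2 in pos2:
            if p2 is None:
                continue
            (rt1, ms1), (rt2, ms2) = p1, p2
            # Assumed Gaussian retention time weighting with tolerance D.
            rt_weight = math.exp(-(((rt1 - rt2) / D) ** 2) / 2.0)
            total += 1.0 - cosine(ms1, ms2) * rt_weight
            count += 1
    # With no comparable pairs, fall back to the worst score.
    return total / count if count else 1.0


pos1 = [(300.0, {73: 100.0, 147: 50.0}), None]
pos2 = [(300.0, {73: 100.0, 147: 50.0})]
same = position_score(pos1, pos2, D=2.5)
unrelated = position_score([(300.0, {73: 100.0})], [(300.0, {147: 50.0})], D=2.5)
```

Identical peaks at the same retention time score close to 0, while peaks with no ions in common score 1.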

score_matrix(a1, a2, D)[source]

Calculates the score matrix between two alignments.

Parameters
  • a1 (Alignment) – The first alignment.

  • a2 (Alignment) – The second alignment.

  • D (float) – Retention time tolerance in seconds.

Return type

ndarray

Returns

The score matrix for the two alignments.

Authors

Qiao Wang, Andrew Isaac

score_matrix_mpi(a1, a2, D)[source]

Calculates the score matrix between two alignments.

Parameters
  • a1 (Alignment) – The first alignment.

  • a2 (Alignment) – The second alignment.

  • D (float) – Retention time tolerance in seconds.

Returns

The score matrix for the two alignments.

Authors

Qiao Wang, Andrew Isaac

pyms.DPA.IO

Functions for writing peak alignment to various file formats.

Functions:

write_excel(alignment, file_name[, minutes])

Writes the alignment to an Excel file, with colouring showing possible mis-alignments.

write_mass_hunter_csv(alignment, file_name, …)

Creates a CSV file with UID, common and qualifying ions and their ratios for MassHunter interpretation.

write_transposed_output(alignment, file_name)

Write an alignment to an Excel workbook.

write_excel(alignment, file_name, minutes=True)[source]

Writes the alignment to an Excel file, with colouring showing possible mis-alignments.

Parameters
Author

David Kainer

write_mass_hunter_csv(alignment, file_name, top_ion_list)[source]

Creates a CSV file with UID, common and qualifying ions and their ratios for MassHunter interpretation.

Parameters
  • alignment (Alignment) – The alignment object to write to file.

  • file_name (Union[str, Path, PathLike]) – The name of the output file.

  • top_ion_list (List[int]) – A list of the common ions for each peak in the averaged peak list for the alignment.

write_transposed_output(alignment, file_name, minutes=True)[source]

Write an alignment to an Excel workbook.

Parameters

pyms.DPA.clustering

Provides Pycluster.treecluster regardless of which library provides it.

Functions:

treecluster(data[, mask, weight, transpose, …])

Perform hierarchical clustering, and return a Tree object.

treecluster(data, mask=None, weight=None, transpose=False, method='m', dist='e', distancematrix=None)[source]

Perform hierarchical clustering, and return a Tree object.

This function implements the pairwise single, complete, centroid, and average linkage hierarchical clustering methods.

Keyword arguments:
  • data: nrows x ncolumns array containing the data values.

  • mask: nrows x ncolumns array of integers, showing which data are missing. If mask[i][j]==0, then data[i][j] is missing.

  • weight: the weights to be used when calculating distances.

  • transpose: if False, rows are clustered; if True, columns are clustered.

  • dist: specifies the distance function to be used:
      - dist == 'e': Euclidean distance
      - dist == 'b': City Block distance
      - dist == 'c': Pearson correlation
      - dist == 'a': absolute value of the correlation
      - dist == 'u': uncentered correlation
      - dist == 'x': absolute uncentered correlation
      - dist == 's': Spearman's rank correlation
      - dist == 'k': Kendall's tau

  • method: specifies which linkage method is used:
      - method == 's': Single pairwise linkage
      - method == 'm': Complete (maximum) pairwise linkage (default)
      - method == 'c': Centroid linkage
      - method == 'a': Average pairwise linkage

  • distancematrix: The distance matrix between the items. There are three ways in which you can pass a distance matrix:
      1. a 2D NumPy array (in which only the lower-left part of the array will be accessed);
      2. a 1D NumPy array containing the distances consecutively;
      3. a list of rows containing the lower-triangular part of the distance matrix.

    Examples are:

    >>> from numpy import array
    >>> # option 1:
    >>> distance = array([[0.0, 1.1, 2.3],
    ...                   [1.1, 0.0, 4.5],
    ...                   [2.3, 4.5, 0.0]])
    >>> # option 2:
    >>> distance = array([1.1, 2.3, 4.5])
    >>> # option 3:
    >>> distance = [array([]),
    ...             array([1.1]),
    ...             array([2.3, 4.5])]
    

    These three correspond to the same distance matrix.

    PLEASE NOTE: As the treecluster routine may shuffle the values in the distance matrix as part of the clustering algorithm, be sure to save this array in a different variable before calling treecluster if you need it later.

Either data or distancematrix should be None. If distancematrix is None, the hierarchical clustering solution is calculated from the values stored in the argument data. If data is None, the hierarchical clustering solution is instead calculated from the distance matrix. Pairwise centroid-linkage clustering can be performed only from the data values and not from the distance matrix. Pairwise single-, maximum-, and average-linkage clustering can be calculated from the data values or from the distance matrix.

Return value: treecluster returns a Tree object describing the hierarchical clustering result. See the description of the Tree class for more information.
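
To make the relationship between the three distance matrix layouts concrete, the sketch below derives the row-list and flat forms from a square matrix. It uses plain Python lists as stand-ins for NumPy arrays so that it runs without dependencies, and the helper names `lower_triangle_rows` and `flatten_rows` are hypothetical; it does not call treecluster.

```python
from typing import List, Sequence


def lower_triangle_rows(square: Sequence[Sequence[float]]) -> List[List[float]]:
    """Option 3 layout: row i holds the distances from item i to items 0..i-1."""
    return [list(row[:i]) for i, row in enumerate(square)]


def flatten_rows(rows: Sequence[Sequence[float]]) -> List[float]:
    """Option 2 layout: the lower-triangular rows written out consecutively."""
    return [distance for row in rows for distance in row]


# Option 1 layout: the full symmetric matrix.
square = [
    [0.0, 1.1, 2.3],
    [1.1, 0.0, 4.5],
    [2.3, 4.5, 0.0],
]
rows = lower_triangle_rows(square)  # [[], [1.1], [2.3, 4.5]]
flat = flatten_rows(rows)           # [1.1, 2.3, 4.5]

# Keep an untouched copy if the matrix is needed after clustering,
# since the note above warns that its values may be shuffled.
saved = [list(row) for row in square]
```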