module varspark.utils

This module provides utility functions, e.g. for converting objects between Hail, Spark, numpy, and pandas.

Created on 6 Dec 2017

@author: szu004

varspark.utils.array_to_dataframe(ndarray, labels=None)

Converts a square numpy array to a pandas dataframe, using labels (if provided) as both the index and the column names.

Parameters:
  • ndarray – a square numpy array to convert
  • labels – labels to use for the index and for the column names
Returns:

a pandas dataframe
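
The following is a minimal sketch of what this conversion amounts to in plain pandas. The helper name square_array_to_dataframe and the squareness check are assumptions made for illustration; this is not the varspark source.

```python
import numpy as np
import pandas as pd


def square_array_to_dataframe(ndarray, labels=None):
    """Hypothetical stand-in for illustration, not the varspark implementation."""
    arr = np.asarray(ndarray)
    if arr.ndim != 2 or arr.shape[0] != arr.shape[1]:
        raise ValueError("expected a square 2-D array")
    # The same labels are used for rows and columns; with labels=None
    # pandas falls back to its default integer index.
    return pd.DataFrame(arr, index=labels, columns=labels)


matrix = np.array([[1.0, 0.25],
                   [0.25, 1.0]])
df = square_array_to_dataframe(matrix, labels=["s1", "s2"])
```

With labels supplied, a cell can be looked up by name, for example df.loc["s1", "s2"].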

varspark.utils.array_to_dataframe_coord(ndarray, labels=None, triangular=True, include_diagonal=True, row_name='row', col_name='col', value_name='value')

Converts a square numpy array to a pandas dataframe in coordinate format, that is, one [row, column, value] record per entry. Optionally includes only the lower triangular part of the matrix, with or without the main diagonal, so that each pair of coordinates appears only once.

Parameters:
  • ndarray – a square numpy array to convert
  • labels – labels to use for the row and column coordinates
  • triangular – whether to include only the lower triangular part of the matrix
  • include_diagonal – whether to include the main diagonal
  • row_name – the name of the row column (first coordinate)
  • col_name – the name of the col column (second coordinate)
  • value_name – the name of the value column
Returns:

a pandas dataframe with the values from the array in coordinate form
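
The coordinate format and the triangular options can be illustrated with a short numpy and pandas sketch. The helper square_array_to_coord is hypothetical and only mirrors the documented signature; in particular, how include_diagonal behaves when triangular is False is an assumption here, not something the documentation states.

```python
import numpy as np
import pandas as pd


def square_array_to_coord(ndarray, labels=None, triangular=True,
                          include_diagonal=True, row_name="row",
                          col_name="col", value_name="value"):
    """Hypothetical stand-in for illustration, not the varspark implementation."""
    arr = np.asarray(ndarray)
    n = arr.shape[0]
    rows, cols = np.indices(arr.shape)
    keep = np.ones(arr.shape, dtype=bool)
    if triangular:
        # Lower triangle: row index is at least the column index.
        keep &= rows >= cols
    if not include_diagonal:
        # Assumption: the diagonal is dropped whether or not triangular is set.
        keep &= rows != cols
    names = np.arange(n) if labels is None else np.asarray(labels)
    return pd.DataFrame({
        row_name: names[rows[keep]],
        col_name: names[cols[keep]],
        value_name: arr[keep],
    })


matrix = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])
# Off-diagonal lower triangle only: pairs (b, a), (c, a), (c, b).
coord = square_array_to_coord(matrix, labels=["a", "b", "c"],
                              include_diagonal=False)
```

For a symmetric matrix such as a kinship matrix, the lower triangle carries every distinct pair once, which is why the triangular form is a natural default.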

varspark.utils.dist_mat_to_array(dist_mat)

Converts a (small) distributed matrix to a dense numpy array

Parameters:
  • dist_mat – a pyspark.mllib.linalg distributed matrix
Returns:

a local numpy array with the matrix data

varspark.utils.kinship_mat_to_dataframe(km)

Converts a hail KinshipMatrix to a pandas dataframe. Index and column names are obtained from sample_list of the matrix.

Parameters:
  • km – kinship matrix to convert
Returns:

dataframe with the values from the kinship matrix

varspark.utils.kinship_mat_to_dataframe_coord(km, **kwargs)

Converts a hail KinshipMatrix to a pandas dataframe in coordinate format. Coordinate values are obtained from sample_list of the matrix.

Parameters:
  • km – kinship matrix to convert
  • kwargs – other conversion parameters, as in array_to_dataframe_coord
Returns:

dataframe with the values from the kinship matrix in the coordinate form