module varspark.utils
This module includes various utility functions, e.g. for converting objects between Hail, Spark, numpy and pandas.
Created on 6 Dec 2017
@author: szu004
-
varspark.utils.array_to_dataframe(ndarray, labels=None)
Converts a square numpy array to a pandas dataframe with index and column names from labels (if provided).
Parameters:
- ndarray – a square numpy array to convert
- labels – labels to use for the index and for the column names
Returns: a pandas dataframe
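The conversion described above can be illustrated with plain numpy and pandas. This is a minimal sketch, not the library's source: the function name array_to_dataframe_sketch and the sample labels are hypothetical, and it assumes the labels are simply reused for both the index and the columns.

```python
import numpy as np
import pandas as pd


def array_to_dataframe_sketch(ndarray, labels=None):
    """Hypothetical stand-in: wrap a square array in a labelled dataframe."""
    # With labels=None pandas falls back to the default integer index/columns.
    return pd.DataFrame(ndarray, index=labels, columns=labels)


# A small symmetric matrix with made-up sample names.
kinship = np.array([[1.0, 0.25],
                    [0.25, 1.0]])
df = array_to_dataframe_sketch(kinship, labels=["sampleA", "sampleB"])
print(df)
```

Values can then be looked up by label pair, e.g. df.loc["sampleB", "sampleA"].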
-
varspark.utils.array_to_dataframe_coord(ndarray, labels=None, triangular=True, include_diagonal=True, row_name='row', col_name='col', value_name='value')
Converts a square numpy array to a pandas dataframe in coordinate format, that is [row, column, value]. Optionally includes only the lower triangular matrix, with or without the diagonal (to get only unique coordinates).
Parameters:
- ndarray – a square numpy array to convert
- labels – labels to use for row and columns coordinates
- triangular – only include the lower triangular matrix
- include_diagonal – if the main diagonal should be included
- row_name – the name to use for row column (first coordinate)
- col_name – the name to use for col column (second coordinate)
- value_name – the name to use for the value column
Returns: a dataframe with the values from the array in coordinate form
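The coordinate format and the effect of triangular and include_diagonal can be sketched with numpy and pandas alone. This is a hedged illustration rather than the library's implementation: array_to_dataframe_coord_sketch is a hypothetical name, and the sketch assumes that include_diagonal=False drops the diagonal regardless of the triangular setting, which the documentation above does not specify.

```python
import numpy as np
import pandas as pd


def array_to_dataframe_coord_sketch(ndarray, labels=None, triangular=True,
                                    include_diagonal=True, row_name='row',
                                    col_name='col', value_name='value'):
    """Hypothetical stand-in: flatten a square array to [row, col, value]."""
    n = ndarray.shape[0]
    rows, cols = np.indices(ndarray.shape)
    keep = np.ones(ndarray.shape, dtype=bool)
    if triangular:
        # Keep the lower triangle only, so each unordered pair appears once.
        keep &= rows >= cols
    if not include_diagonal:
        # Assumption: the diagonal is dropped whether or not triangular is set.
        keep &= rows != cols
    names = np.arange(n) if labels is None else np.asarray(labels)
    return pd.DataFrame({
        row_name: names[rows[keep]],
        col_name: names[cols[keep]],
        value_name: ndarray[keep],
    })


matrix = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])
coords = array_to_dataframe_coord_sketch(matrix, labels=["s1", "s2", "s3"])
off_diag = array_to_dataframe_coord_sketch(matrix, include_diagonal=False)
print(coords)
```

For an n x n array the lower triangle with the diagonal has n(n+1)/2 entries, so the 3 x 3 example yields 6 rows, and 3 rows once the diagonal is excluded.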
-
varspark.utils.dist_mat_to_array(dist_mat)
Converts a (small) distributed matrix to a dense numpy ndarray.
Parameters: dist_mat – a pyspark.mllib.linalg distributed matrix
Returns: a local numpy array with the matrix data
-
varspark.utils.kinship_mat_to_dataframe(km)
Converts a hail KinshipMatrix to a pandas dataframe. Index and column names are obtained from sample_list of the matrix.
Parameters: km – kinship matrix to convert
Returns: dataframe with the values from the kinship matrix
-
varspark.utils.kinship_mat_to_dataframe_coord(km, **kwargs)
Converts a hail KinshipMatrix to a pandas dataframe in coordinate format. Coordinate values are obtained from sample_list of the matrix.
Parameters:
- km – kinship matrix to convert
- kwargs – other conversion parameters, as in array_to_dataframe_coord
Returns: dataframe with the values from the kinship matrix in the coordinate form