package varspark.hail
This package contains the variant-spark integration with Hail.
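from hail import *
# Importing varspark.hail makes the variant-spark methods available on the dataset.
import varspark.hail
# sc is an existing SparkContext.
hc = HailContext(sc)
vds = hc.import_vcf(...)
...
# Run importance analysis against the sample annotation sa.pheno.label.
via = vds.importance_analysis("sa.pheno.label", n_trees=1000)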
module varspark.hail.extend
Created on 7 Nov 2017
@author: szu004
class varspark.hail.extend.VariantsDatasetFunctions(*args, **kwargs)
Extension to hail.VariantDataset with variant-spark related functions.
importance_analysis(**kwargs)
Builds a random forest classifier for the response variable defined by y_expr.
Parameters:
- y_expr (str) – Response expression. Must evaluate to Boolean or to a numeric type with all values 0 or 1.
- n_trees (int) – The number of trees to build in the forest.
- mtry_fraction (float) – The fraction of variables to try at each split.
- oob (bool) – Whether the OOB (out-of-bag) error should be calculated.
- seed (long) – Random seed to use.
- batch_size (int) – The number of trees to build in one batch.
Returns: Importance analysis model.
Return type: ImportanceAnalysis
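To make the mtry_fraction and batch_size parameters concrete, the sketch below shows one plausible reading of them in plain Python. The helper names variables_per_split and tree_batches are hypothetical, and the rounding rule is an assumption; neither is taken from variant-spark itself.

```python
import math


def variables_per_split(n_variables, mtry_fraction):
    """Number of variables tried at each split for a given fraction.

    Assumption: round down, but always try at least one variable.
    """
    return max(1, int(math.floor(n_variables * mtry_fraction)))


def tree_batches(n_trees, batch_size):
    """Sizes of consecutive batches that together cover n_trees trees."""
    full, rest = divmod(n_trees, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(variables_per_split(1000, 0.25))
    print(tree_batches(1000, 300))
```

Under these assumptions, a forest of 1000 trees built with batch_size 300 would be grown in four batches, the last one smaller than the others.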
pairwise_operation(**kwargs)
Computes a pairwise operation on encoded genotypes. Currently implemented operations include:
- manhattan: the Manhattan distance
- euclidean: the Euclidean distance
- sharedAltAlleleCount: count of shared alternative alleles
- anySharedAltAlleleCount: count of variants that share at least one alternative allele
Parameters: operation_name – name of the operation. One of manhattan, euclidean, sharedAltAlleleCount, anySharedAltAlleleCount.
Returns: A symmetric no_of_samples x no_of_samples matrix with the result of the pairwise computation.
Return type: hail.KinshipMatrix
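The four operations can be illustrated on a toy data set in plain Python. This is a minimal sketch, not the variant-spark implementation: it assumes each genotype is encoded as the number of alternative alleles (0, 1 or 2), that sharedAltAlleleCount sums the per-variant minimum of the two counts, and that anySharedAltAlleleCount counts variants where both samples carry at least one alternative allele. All function names here are hypothetical.

```python
import math


def manhattan(a, b):
    # Sum of absolute differences between encoded genotypes.
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def euclidean(a, b):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shared_alt_allele_count(a, b):
    # Assumption: alleles shared at a variant = the smaller of the two counts.
    return float(sum(min(x, y) for x, y in zip(a, b)))


def any_shared_alt_allele_count(a, b):
    # Variants where both samples have at least one alternative allele.
    return float(sum(1 for x, y in zip(a, b) if x > 0 and y > 0))


def pairwise(genotypes, op):
    """Apply op to every pair of samples and return a symmetric matrix.

    genotypes holds one list of encoded genotypes per sample.
    """
    n = len(genotypes)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            result[i][j] = result[j][i] = op(genotypes[i], genotypes[j])
    return result


# Three samples, four variants (toy data).
samples = [
    [0, 1, 2, 0],
    [0, 2, 2, 1],
    [1, 0, 0, 0],
]

if __name__ == "__main__":
    for row in pairwise(samples, manhattan):
        print(row)
```

Because each operation is symmetric in its two arguments, only the upper triangle is computed and then mirrored, which matches the symmetric no_of_samples x no_of_samples shape described above.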
module varspark.hail.rf
Created on 10 Nov 2017
@author: szu004
class varspark.hail.rf.ImportanceAnalysis(hc, _jia)
Model for random forest based importance analysis.
important_variants(**kwargs)
Gets the top n most important loci.
Parameters: n_limit (int) – the maximum number of loci to return.
Returns: A KeyTable with the variant in the first column and importance in the second.
Return type: hail.KeyTable
oob_error
OOB (out-of-bag) error estimate for the model.
Return type: float
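For readers unfamiliar with the out-of-bag estimate, the sketch below shows the standard textbook calculation in plain Python: each sample is classified by a majority vote of only those trees whose bootstrap sample did not contain it, and the error is the fraction of such samples that are misclassified. This illustrates the general idea only; how variant-spark computes the value internally is not described here, and the function name and input layout are hypothetical.

```python
from collections import Counter


def oob_error_estimate(labels, in_bag, predictions):
    """Out-of-bag misclassification rate.

    labels:      true label of every sample
    in_bag:      for each tree, the set of sample indices it was trained on
    predictions: for each tree, its predicted label for every sample
    """
    wrong = 0
    scored = 0
    for i, label in enumerate(labels):
        # Collect votes only from trees that did not see sample i.
        votes = Counter(
            tree_pred[i]
            for bag, tree_pred in zip(in_bag, predictions)
            if i not in bag
        )
        if not votes:
            continue  # sample was in-bag for every tree, so it cannot be scored
        scored += 1
        if votes.most_common(1)[0][0] != label:
            wrong += 1
    return wrong / float(scored) if scored else float("nan")


# Toy example: four samples, three trees.
labels = [0, 1, 1, 0]
in_bag = [{0, 1}, {2, 3}, {0, 2}]
predictions = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

if __name__ == "__main__":
    print(oob_error_estimate(labels, in_bag, predictions))
```

In the toy data only sample 2 is misclassified by its out-of-bag tree, so one of the four scored samples is wrong.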