armine package

Module contents

class armine.ARM[source]

Bases: object

Utility class for Association Rule Mining.

This class provides methods to generate a set of Association rules from a transactional dataset.

confidence_threshold
coverage_threshold
learn(support_threshold, confidence_threshold, coverage_threshold=20)[source]

Generate Association rules from the Training dataset.

Parameters:
  • support_threshold (float) – User-defined threshold between 0 and 1. Rules with support less than support_threshold are not generated.
  • confidence_threshold (float) – User-defined threshold between 0 and 1. Rules with confidence less than confidence_threshold are not generated.
  • coverage_threshold (int) – Maximum number of rules that a single transaction can match. Once a transaction exceeds this limit, it is no longer considered when matching further rules. Rules that match none of the remaining transactions are removed (default 20).
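
As a rough illustration of what the two thresholds measure, the sketch below computes support, confidence and lift for a single rule using the textbook definitions. The helper rule_metrics and the sample data are hypothetical and not part of armine; whether the library computes these values in exactly this way is an assumption.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper using the standard definitions; not armine's API.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
support, confidence, lift = rule_metrics(transactions, ["bread"], ["milk"])
```

Under these definitions, a rule such as bread -> milk would be dropped if its support fell below support_threshold or its confidence fell below confidence_threshold.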
load(data)[source]

Load a set of transactions from an iterable of lists.

Parameters:data (Iterable of lists) – List of transactions
load_from_csv(filename)[source]

Load a set of transactions from a csv file.

Parameters:filename (string) – Name of the csv file which contains a set of transactions
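
The file layout is not specified here. The sketch below assumes one transaction per row with comma-separated items, and shows how such a file could be turned into the iterable of lists that load accepts. The helper read_transactions is hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def read_transactions(csv_file):
    """Parse one transaction per CSV row, dropping blank cells and blank rows."""
    transactions = []
    for row in csv.reader(csv_file):
        items = [item.strip() for item in row if item.strip()]
        if items:
            transactions.append(items)
    return transactions


# In-memory stand-in for a file on disk (assumed layout: one transaction per line).
sample = io.StringIO("bread,milk\nbread, butter ,milk\n\nmilk\n")
transactions = read_transactions(sample)
```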
print_rules(attributes=('coverage', 'confidence', 'lift'))[source]

Print the generated rules in a tabular format.

Parameters:attributes (array_like) – Names of the rule attributes to print as columns (default: coverage, confidence and lift).
rules

Get a list of rules generated using the loaded dataset.

set_rule_key(key)[source]

Set the key function which should be used to sort rules.

The default key function sorts rules by lift, then confidence, then size of the antecedent. This behaviour can be changed using this method.

Parameters:key (function) – The key function to sort rules
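
A key function here is an ordinary Python sort key. The sketch below ranks a few hand-written rules by confidence first, then lift, then shorter antecedent. The Rule namedtuple and its field names are hypothetical stand-ins; the attributes of armine's actual rule objects and the sort direction it applies are assumptions.

```python
from collections import namedtuple

# Hypothetical rule record; armine's real rule objects may expose different fields.
Rule = namedtuple("Rule", ["antecedent", "consequent", "confidence", "lift"])


def confidence_first(rule):
    """Sort key: confidence, then lift, then prefer shorter antecedents."""
    return (rule.confidence, rule.lift, -len(rule.antecedent))


rules = [
    Rule(("bread",), ("milk",), 0.67, 0.89),
    Rule(("bread", "butter"), ("milk",), 0.67, 0.89),
    Rule(("butter",), ("bread",), 1.0, 1.33),
]
# Highest-ranked rule first.
ranked = sorted(rules, key=confidence_first, reverse=True)
```

A function of this shape is the kind of argument set_rule_key is described as taking.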
support_threshold
class armine.ARMClassifier[source]

Bases: armine.armine.ARM

Utility class for Classification Rule Mining.

This class provides methods to generate a set of Classification rules from a transactional dataset or a tabular dataset. You can then use this class to classify unclassified data instances. The classification is done using a modified version of the CBA Algorithm.

classify(data_instance, top_k_rules=25)[source]

Classify data_instance using the rules generated by the learn method.

Parameters:
  • data_instance (array_like) – Unclassified input.
  • top_k_rules (int) – Maximum number of rules used to classify data_instance.
Returns:

Predicted label for the data_instance.

Return type:

str

Note

If the support_threshold and confidence_threshold passed to classify are both greater than the values at which learning was done, the result is the same as if learning had been done at those higher values. This is useful for tuning: you only need to learn once, at a low support_threshold and confidence_threshold, which reduces optimization time.
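
The idea in the note can be sketched as filtering an already learned rule list at stricter thresholds before classifying. The code below is a simplified illustration, not armine's modified CBA: it assumes rules are pre-sorted best-first, takes a majority vote among the first top_k_rules matching rules, and uses hypothetical names throughout (classify_with_rules, min_support, min_confidence, and the dictionary layout of a rule).

```python
from collections import Counter


def classify_with_rules(rules, data_instance, top_k_rules=25,
                        min_support=0.0, min_confidence=0.0):
    """Majority vote among the first top_k_rules rules that match the instance.

    Hypothetical sketch; rules are assumed to be sorted best-first.
    """
    items = set(data_instance)
    matching = [
        r for r in rules
        if r["support"] >= min_support
        and r["confidence"] >= min_confidence
        and set(r["antecedent"]) <= items
    ]
    votes = Counter(r["label"] for r in matching[:top_k_rules])
    return votes.most_common(1)[0][0] if votes else None


rules = [
    {"antecedent": ("red", "round"), "label": "apple", "support": 0.3, "confidence": 0.9},
    {"antecedent": ("round",), "label": "orange", "support": 0.1, "confidence": 0.6},
    {"antecedent": ("red",), "label": "apple", "support": 0.4, "confidence": 0.8},
]
prediction = classify_with_rules(rules, ["red", "round", "small"])
# Raising the thresholds afterwards only filters the learned rules; no relearning.
strict_prediction = classify_with_rules(rules, ["round"], min_confidence=0.7)
```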

confidence_threshold
coverage_threshold
learn(support_threshold, confidence_threshold, coverage_threshold=20)

Generate Association rules from the Training dataset.

Parameters:
  • support_threshold (float) – User-defined threshold between 0 and 1. Rules with support less than support_threshold are not generated.
  • confidence_threshold (float) – User-defined threshold between 0 and 1. Rules with confidence less than confidence_threshold are not generated.
  • coverage_threshold (int) – Maximum number of rules that a single transaction can match. Once a transaction exceeds this limit, it is no longer considered when matching further rules. Rules that match none of the remaining transactions are removed (default 20).
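
The coverage-based pruning described for coverage_threshold can be sketched as follows. This is one reading of the description, not the library's code: rules are visited in sorted order, a transaction is retired once it has matched coverage_threshold rules (an assumption about the exact cut-off), and a rule is kept only if it matches at least one transaction that is still active. The function name and the (antecedent, consequent) tuple layout are hypothetical.

```python
def prune_by_coverage(sorted_rules, transactions, coverage_threshold=20):
    """Keep rules that match at least one transaction that is still active."""
    match_counts = [0] * len(transactions)
    active = [True] * len(transactions)
    kept = []
    for antecedent, consequent in sorted_rules:
        items = set(antecedent) | set(consequent)
        matched = [
            i for i, t in enumerate(transactions)
            if active[i] and items <= set(t)
        ]
        if not matched:
            continue  # No remaining transaction supports this rule; drop it.
        kept.append((antecedent, consequent))
        for i in matched:
            match_counts[i] += 1
            if match_counts[i] >= coverage_threshold:
                active[i] = False  # Retire the transaction.
    return kept


transactions = [["a", "b"], ["a", "c"]]
rules = [(("a",), ("b",)), (("a",), ("c",)), (("b",), ("a",))]
kept_strict = prune_by_coverage(rules, transactions, coverage_threshold=1)
kept_loose = prune_by_coverage(rules, transactions, coverage_threshold=2)
```

With a threshold of 1 the third rule is dropped because the only transaction it matches has already been retired; with a threshold of 2 all three rules survive.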
load(data, transactional_database=False)[source]

Load a dataset from a dictionary.

Parameters:
  • data (dict) – Dictionary with keys as features and values as labels.
  • transactional_database (bool) – Whether the database is transactional (default False).

Note

A database is transactional if it contains transactions accompanied by their respective labels. A non-transactional database, on the other hand, is a tabular dataset in which each column represents a distinct feature.
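
To make the distinction concrete, the sketch below turns tabular rows into item lists by pairing each value with its column name, so that the same value in two different columns yields two different items. The encoding column=value and the helper row_to_items are assumptions for illustration; how armine represents tabular features internally is not stated here.

```python
def row_to_items(header, row):
    """Encode a tabular row as a list of 'column=value' items (hypothetical encoding)."""
    return [f"{name}={value}" for name, value in zip(header, row)]


header = ["outlook", "windy"]
rows = [["sunny", "no"], ["rainy", "yes"]]
transactions = [row_to_items(header, row) for row in rows]
```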

load_from_csv(filename, label_index=0, transactional_database=False)[source]

Load dataset from a csv file.

Parameters:
  • filename (string) – Name of the csv file which contains the dataset.
  • label_index (int) – Index of the column that contains the label for each row. Negative indexing is supported (-1 corresponds to the last column). The signature above gives the default as 0.
  • transactional_database (bool) – Whether the database is transactional (default False).
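
The effect of label_index, including negative values, can be pictured as popping one column out of each row. The helper split_label below is a hypothetical illustration of that indexing behaviour and is not taken from armine.

```python
def split_label(row, label_index=0):
    """Return (features, label), removing the label column from a copy of the row."""
    features = list(row)
    label = features.pop(label_index)  # Negative indices count from the end.
    return features, label


first_column = split_label(["yes", "sunny", "hot"])        # label in column 0
last_column = split_label(["sunny", "hot", "yes"], -1)     # label in last column
```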
print_rules(attributes=('coverage', 'confidence', 'lift'))

Print the generated rules in a tabular format.

Parameters:attributes (array_like) – Names of the rule attributes to print as columns (default: coverage, confidence and lift).
rules

Get a list of rules generated using the loaded dataset.

set_rule_key(key)

Set the key function which should be used to sort rules.

The default key function sorts rules by lift, then confidence, then size of the antecedent. This behaviour can be changed using this method.

Parameters:key (function) – The key function to sort rules
support_threshold