ART Inference Attacks

Base Wrapper Class

The base class does not implement an attack itself. All ART inference attack wrappers inherit from the BaseMembershipInferenceAttack class and share its attributes.

class pepr.privacy.art_inference_wrapper.BaseMembershipInferenceAttack(attack_alias, data, labels, data_conf, target_models, inference_attacks, pars_descriptors)

Base ART membership inference attack class implementing the logic for running a membership inference attack and generating a report.

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Record-indices for inference and evaluation (a minimal construction is sketched after this parameter list):

    • train_record_indices (np.ndarray): (optional) Indices of the records used to train the target model.
    • test_record_indices (np.ndarray): (optional) Indices of records that were not used in the training of the target model.
    • attack_record_indices (np.ndarray): Indices of the records to run the attack on.
    • attack_membership_status (np.ndarray): True membership status of the attack records; 1 indicates a member and 0 a non-member. Used to compare the attack's results with the true membership status.
  • target_models (iterable) – List of target models which should be tested.
  • inference_attacks (list(art.attacks.Attack)) – List of wrapped ART inference attack objects, one per target model.
  • pars_descriptors (dict) – Dictionary of attack parameters and their description shown in the attack report. Example: {"attack_model_type": "Attack model type"} for the attribute named "attack_model_type" of MembershipInferenceBlackBox.
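A minimal sketch of how such a data_conf dictionary could be assembled, assuming a hypothetical dataset of 2,000 records of which the first 1,000 were used to train the target model:

    import numpy as np

    # Hypothetical split: records 0..999 trained the target model,
    # records 1000..1999 were held out.
    train_record_indices = np.arange(0, 1000)
    test_record_indices = np.arange(1000, 2000)

    # Attack a mixed sample of members and non-members.
    attack_record_indices = np.concatenate(
        [train_record_indices[:500], test_record_indices[:500]]
    )

    # Ground truth follows directly from the split: a record is a member (1)
    # exactly if its index was used for training.
    attack_membership_status = np.isin(
        attack_record_indices, train_record_indices
    ).astype(int)

    data_conf = {
        "train_record_indices": train_record_indices,
        "test_record_indices": test_record_indices,
        "attack_record_indices": attack_record_indices,
        "attack_membership_status": attack_membership_status,
    }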
attack_alias

str – Alias for a specific instantiation of the class.

attack_pars

dict – Attack parameters specific to the wrapped inference attack. See the attack_pars parameter of the respective wrapper class.

data

numpy.ndarray – Dataset with all training samples used in the given pentesting setting.

labels

numpy.ndarray – Array of all labels used in the given pentesting setting.

target_models

iterable – List of target models which should be tested.

data_conf

dict – Record-indices for inference and evaluation. See the data_conf parameter above.

inference_attacks

list(art.attacks.Attack) – List of wrapped ART attack objects, one per target model.

pars_descriptors

dict – Dictionary of attack parameters and their description shown in the attack report. Example: {"attack_model_type": "Attack model type"} for the attribute named "attack_model_type" of MembershipInferenceBlackBox.

attack_results

dict – Dictionary storing the attack model results.

  • membership (list): List holding the inferred membership status per target model; 1 indicates a member and 0 a non-member.
  • tn (list): Number of true negatives per target model.
  • tp (list): Number of true positives per target model.
  • fn (list): Number of false negatives per target model.
  • fp (list): Number of false positives per target model.
  • precision (list): Attack precision per target model.
  • recall (list): Attack recall per target model.
  • accuracy (list): Attack accuracy per target model.
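
The precision, recall, and accuracy entries follow the standard confusion-matrix definitions. A minimal sketch with hypothetical counts:

    # Hypothetical confusion-matrix counts for one target model.
    tp, fp, tn, fn = 420, 80, 450, 50

    precision = tp / (tp + fp)  # fraction of inferred members that are true members
    recall = tp / (tp + fn)     # fraction of true members inferred as members
    accuracy = (tp + tn) / (tp + tn + fp + fn)

    print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
    # precision=0.840 recall=0.894 accuracy=0.870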

ART Inference Attack Wrappers

MembershipInferenceBlackBox – art.attacks.inference.membership_inference.MembershipInferenceBlackBox wrapper class.
MembershipInferenceBlackBoxRuleBased – art.attacks.inference.membership_inference.MembershipInferenceBlackBoxRuleBased wrapper class.
LabelOnlyDecisionBoundary – art.attacks.inference.membership_inference.LabelOnlyDecisionBoundary wrapper class.
MIFace – art.attacks.inference.model_inversion.MIFace wrapper class.
class pepr.privacy.art_inference_wrapper.MembershipInferenceBlackBox(attack_alias, attack_pars, data, labels, data_conf, target_models)

art.attacks.inference.membership_inference.MembershipInferenceBlackBox wrapper class.

Attack description: Implementation of a learned black-box membership inference attack.

As input to the learning process, this implementation can use probabilities/logits or losses, depending on the type of model and the provided configuration.

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) –

    Dictionary containing all needed attack parameters:

    • attack_model_type (str): (optional) The type of default attack model to train. Should be one of nn (neural network, default), rf (random forest) or gb (gradient boosting). If attack_model is supplied, this option is ignored.
    • input_type (str): (optional) The type of input to train the attack on. Can be one of: "prediction" or "loss". Default is "prediction". Predictions can be either probabilities or logits, depending on the return type of the model.
    • attack_model: (optional) The attack model to train. Due to stability issues, only TensorFlow Keras models are currently allowed. (Use PyTorch at your own risk; your runtime may crash.)
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Dictionary describing for every target model which record-indices should be used for the attack.

    • train_record_indices (np.ndarray): Indices of the records used to train the target model.
    • test_record_indices (np.ndarray): Indices of records that were not used in the training of the target model.
    • attack_record_indices (np.ndarray): Indices of the records to run the attack on.
    • attack_membership_status (np.ndarray): True membership status of the attack records; 1 indicates a member and 0 a non-member. Used to compare the attack's results with the true membership status.
  • target_models (iterable) – List of target models which should be tested.
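
A minimal usage sketch follows. The constructor signature and parameter names are taken from this page; the dummy data, the stand-in Keras target model, and the assumption that target_models accepts plain tf.keras models are illustrative only:

    import numpy as np
    import tensorflow as tf
    from pepr.privacy.art_inference_wrapper import MembershipInferenceBlackBox

    # Dummy pentesting data (assumed shapes; replace with the real dataset).
    data = np.random.rand(2000, 32, 32, 3).astype(np.float32)
    labels = np.random.randint(0, 10, size=2000)

    # Stand-in target model; in a real setup this is the trained model under test.
    target_model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    data_conf = {
        "train_record_indices": np.arange(0, 1000),
        "test_record_indices": np.arange(1000, 2000),
        "attack_record_indices": np.concatenate([np.arange(0, 500),
                                                 np.arange(1000, 1500)]),
        "attack_membership_status": np.concatenate([np.ones(500, dtype=int),
                                                    np.zeros(500, dtype=int)]),
    }

    # Small Keras attack model; per the note above, only TensorFlow Keras
    # attack models are currently safe to pass in.
    attack_model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    attack = MembershipInferenceBlackBox(
        attack_alias="blackbox-mia-demo",
        attack_pars={
            "attack_model_type": "nn",  # ignored because attack_model is supplied
            "input_type": "prediction",
            "attack_model": attack_model,
        },
        data=data,
        labels=labels,
        data_conf=data_conf,
        target_models=[target_model],
    )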
class pepr.privacy.art_inference_wrapper.MembershipInferenceBlackBoxRuleBased(attack_alias, attack_pars, data, labels, data_conf, target_models)

art.attacks.inference.membership_inference.MembershipInferenceBlackBoxRuleBased wrapper class.

Attack description: Implementation of a simple, rule-based black-box membership inference attack.

This implementation uses the simple rule: if the model’s prediction for a sample is correct, then it is a member. Otherwise, it is not a member.

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) – Dictionary containing all needed attack parameters. This attack takes no parameters, so an empty dictionary can be passed.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Dictionary describing for every target model which record-indices should be used for the attack.

    • attack_record_indices (np.ndarray): Indices of the records to run the attack on.
    • attack_membership_status (np.ndarray): True membership status of the attack records; 1 indicates a member and 0 a non-member. Used to compare the attack's results with the true membership status.
  • target_models (iterable) – List of target models which should be tested.
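
Because the rule is so simple, it can be sketched directly. This is an illustration of the decision rule, not ART's implementation:

    import numpy as np

    def rule_based_membership(predictions: np.ndarray, labels: np.ndarray) -> np.ndarray:
        """Infer membership: a record is a member (1) iff the target model
        classifies it correctly, otherwise a non-member (0).

        predictions: (n, n_classes) model outputs; labels: (n,) true classes."""
        return (np.argmax(predictions, axis=1) == labels).astype(int)

    # Hypothetical model outputs for three records:
    preds = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])
    true = np.array([0, 0, 1])
    print(rule_based_membership(preds, true))  # -> [1 0 0]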
class pepr.privacy.art_inference_wrapper.LabelOnlyDecisionBoundary(attack_alias, attack_pars, data, labels, data_conf, target_models)

art.attacks.inference.membership_inference.LabelOnlyDecisionBoundary wrapper class.

Attack description: Implementation of Label-Only Inference Attack based on Decision Boundary.

Paper link: https://arxiv.org/abs/2007.14321

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) –

    Dictionary containing all needed attack parameters:

    • distance_threshold_tau (float): Threshold distance for decision boundary. Samples with boundary distances larger than threshold are considered members of the training dataset.
    • norm: (optional) Order of the norm. Possible values: "inf", np.inf or 2.
    • max_iter (int): (optional) Maximum number of iterations.
    • max_eval (int): (optional) Maximum number of evaluations for estimating gradient.
    • init_eval (int): (optional) Initial number of evaluations for estimating gradient.
    • init_size (int): (optional) Maximum number of trials for initial generation of adversarial examples.
    • verbose (bool): (optional) Show progress bars.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Dictionary describing for every target model which record-indices should be used for the attack.

    • train_record_indices (np.ndarray): Indices of the records used to train the target model.
    • test_record_indices (np.ndarray): Indices of records that were not used in the training of the target model.
    • attack_record_indices (np.ndarray): Indices of the records to run the attack on.
    • attack_membership_status (np.ndarray): True membership status of the attack records; 1 indicates a member and 0 a non-member. Used to compare the attack's results with the true membership status.
  • target_models (iterable) – List of target models which should be tested.
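
The final decision rule reduces to a simple threshold. The boundary-distance estimation itself (governed by max_iter, max_eval, init_eval and init_size) is omitted here, and the distances below are hypothetical:

    import numpy as np

    def threshold_membership(boundary_distances: np.ndarray, tau: float) -> np.ndarray:
        """Records whose estimated distance to the decision boundary exceeds
        distance_threshold_tau are inferred to be training members (1)."""
        return (boundary_distances > tau).astype(int)

    distances = np.array([0.02, 0.31, 0.18, 0.44])  # hypothetical estimates
    print(threshold_membership(distances, tau=0.2))  # -> [0 1 0 1]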
class pepr.privacy.art_inference_wrapper.MIFace(attack_alias, attack_pars, data, labels, data_conf, target_models)

art.attacks.inference.model_inversion.MIFace wrapper class.

Attack description: Implementation of the MIFace algorithm from Fredrikson et al. (2015). While in that paper the attack is demonstrated specifically against face recognition models, it is applicable more broadly to classifiers with continuous features which expose class gradients.

Paper link: https://dl.acm.org/doi/10.1145/2810103.2813677

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) –

    Dictionary containing all needed attack parameters:

    • max_iter (int): (optional) Maximum number of gradient descent iterations for the model inversion.
    • window_length (int): (optional) Length of window for checking whether descent should be aborted.
    • threshold (float): (optional) Threshold for descent stopping criterion.
    • batch_size (int): (optional) Size of internal batches.
    • verbose (bool): (optional) Show progress bars.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Dictionary describing for every target model which record-indices should be used for the attack.

    • initial_input_data (np.ndarray): An array with the initial input to the victim classifier. If None, the initial input is initialized as a zero array.
    • initial_input_targets (np.ndarray): Target values of shape (nb_samples,).
  • target_models (iterable) – List of target models which should be tested.
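
The core MIFace idea, gradient descent on the classifier's input to maximize the confidence of a target class, can be sketched as follows. This is an illustrative toy version, not ART's implementation; the victim model, shapes, learning rate, and clipping range are assumptions:

    import tensorflow as tf

    def invert_class(model, target_class, input_shape, max_iter=100, lr=0.1):
        """Reconstruct a representative input for target_class by gradient
        descent on the input, starting from a zero array (cf. initial_input_data)."""
        x = tf.Variable(tf.zeros((1, *input_shape)))
        for _ in range(max_iter):
            with tf.GradientTape() as tape:
                # Loss is the negative confidence assigned to the target class.
                loss = -model(x)[0, target_class]
            grad = tape.gradient(loss, x)
            x.assign_sub(lr * grad)                  # gradient descent step
            x.assign(tf.clip_by_value(x, 0.0, 1.0))  # keep inputs in a valid range
        return x.numpy()

    # Hypothetical victim classifier; a real attack targets the trained model.
    victim = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    reconstruction = invert_class(victim, target_class=3, input_shape=(28, 28, 1))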
attack_alias

str – Alias for a specific instantiation of the class.

attack_pars

dict – Attack parameters specific to the wrapped inference attack. See the attack_pars parameter of the respective wrapper class.

data

numpy.ndarray – Dataset with all training samples used in the given pentesting setting.

labels

numpy.ndarray – Array of all labels used in the given pentesting setting.

target_models

iterable – List of target models which should be tested.

data_conf

dict – Record-indices for inference and evaluation. See the data_conf parameter above.

inference_attacks

list(art.attacks.Attack) – List of wrapped ART attack objects, one per target model.

pars_descriptors

dict – Dictionary of attack parameters and their description shown in the attack report. Example: {"max_iter": "Max. iterations"} for the attribute named "max_iter" of MIFace.

attack_results

dict – Dictionary storing the attack results.

  • inferred_training_samples (list): The inferred training samples per target model.