ART Extraction Attacks

Base Wrapper Class

The base class does not implement any attack. The ART extraction attack wrappers inherit from the BaseExtractionAttack class and have the same attributes.

class pepr.privacy.art_extraction_wrapper.BaseExtractionAttack(attack_alias, attack_pars, data, labels, data_conf, target_models, extraction_attacks, pars_descriptors)

Base ART extraction attack class implementing the logic for running an extraction attack and generating a report.

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) –

    Parameters specific to the extraction attack:

    • stolen_models (list): List of untrained models, one per target model, which are fitted during the attack to become the stolen copies of the target models.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Record-indices for extraction and evaluation:

    • stolen_record_indices (np.ndarray): Indices of records to use for the extraction attack.
    • eval_record_indices (np.ndarray): Indices of records for measuring the accuracy of the extracted model.
  • target_models (iterable) – List of target models which should be tested.
  • extraction_attacks (list(art.attacks.Attack)) – List of ART extraction attack objects per target model which are wrapped in this class.
  • pars_descriptors (dict) – Dictionary of attack parameters and their description shown in the attack report. Example: {"classifier": "A victim classifier"} for the attribute named "classifier" of CopycatCNN.
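
The layout of data_conf can be sketched as follows. This is a minimal illustration, not PePR code: the helper name make_data_conf, the sizes, and the seed are hypothetical, and it assumes one row of indices per target model. Keeping the extraction and evaluation indices disjoint is a choice of this sketch, not a documented requirement.

```python
import random


def make_data_conf(n_records, n_targets, n_stolen, n_eval, seed=0):
    """Draw extraction and evaluation indices for each target model."""
    rng = random.Random(seed)
    stolen, evaluation = [], []
    for _ in range(n_targets):
        # Sample without replacement so the two index sets never overlap.
        picked = rng.sample(range(n_records), n_stolen + n_eval)
        stolen.append(sorted(picked[:n_stolen]))
        evaluation.append(sorted(picked[n_stolen:]))
    # The documented type is np.ndarray; wrap each value with numpy.array(...)
    # before handing the dictionary to a wrapper class.
    return {
        "stolen_record_indices": stolen,
        "eval_record_indices": evaluation,
    }


data_conf = make_data_conf(n_records=1000, n_targets=2, n_stolen=300, n_eval=100)
```

Each entry of data and labels is then addressed through these indices, so the same dataset can serve several target models with different splits.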
attack_alias

str – Alias for a specific instantiation of the class.

attack_pars

dict – Extraction attack specific attack parameters (see attack_pars under Parameters).

data

numpy.ndarray – Dataset with all training samples used in the given pentesting setting.

labels

numpy.ndarray – Array of all labels used in the given pentesting setting.

target_models

iterable – List of target models which should be tested.

data_conf

dict – Record-indices for extraction and evaluation (see data_conf under Parameters).

extraction_attacks

list(art.attacks.Attack) – List of ART attack objects per target model which are wrapped in this class.

pars_descriptors

dict – Dictionary of attack parameters and their description shown in the attack report. Example: {"classifier": "A victim classifier"} for the attribute named "classifier" of CopycatCNN.

attack_results

dict – Dictionary storing the attack model results.

  • extracted_classifiers (list): List of extracted classifiers per target model.
  • ec_accuracy (list): List of the accuracy of the extracted classifiers per target model.
  • ec_accuracy_list (list): List of the accuracy of the extracted classifiers per target model and class. Shape: (target_model, class)
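
The relation between ec_accuracy and ec_accuracy_list can be illustrated for a single target model. This is a sketch under assumptions: the function name accuracy_report is hypothetical, and it assumes accuracy is measured against the entries of labels selected by eval_record_indices. How PePR computes these values internally is not stated here.

```python
def accuracy_report(predicted, expected, n_classes):
    """Return (overall accuracy, list of per-class accuracies)."""
    hits = [0] * n_classes
    totals = [0] * n_classes
    for pred, true in zip(predicted, expected):
        totals[true] += 1
        if pred == true:
            hits[true] += 1
    overall = sum(hits) / len(expected)
    # Classes without evaluation records get None rather than a made-up score.
    per_class = [h / t if t else None for h, t in zip(hits, totals)]
    return overall, per_class


# One value per target model goes into ec_accuracy,
# one per-class list per target model into ec_accuracy_list.
overall, per_class = accuracy_report([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
```

Collecting per_class for every target model yields the (target_model, class) shape noted above.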

ART Extraction Attack Wrappers

CopycatCNN – art.attacks.extraction.CopycatCNN wrapper class.
KnockoffNets – art.attacks.extraction.KnockoffNets wrapper class.

class pepr.privacy.art_extraction_wrapper.CopycatCNN(attack_alias, attack_pars, data, labels, data_conf, target_models)

art.attacks.extraction.CopycatCNN wrapper class.

Attack description: Implementation of the Copycat CNN attack from Rodrigues Correia-Silva et al. (2018).

Paper link: https://arxiv.org/abs/1806.05476

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) –

    Dictionary containing all needed attack parameters:

    • batch_size_fit (int): (optional) Size of batches for fitting the thieved classifier.
    • batch_size_query (int): (optional) Size of batches for querying the victim classifier.
    • nb_epochs (int): (optional) Number of epochs to use for training.
    • nb_stolen (int): (optional) Number of queries submitted to the victim classifier to steal it.
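    • use_probability (bool): (optional) If True, fit the thieved classifier on the victim classifier's output probabilities instead of its hard (argmax) labels.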
    • stolen_record_indices (np.ndarray): Indices of records to use for the extraction attack.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Dictionary describing, for every target model, which record-indices should be used for the attack.

    • stolen_record_indices (np.ndarray): Indices of records to use for the extraction attack.
    • eval_record_indices (np.ndarray): Indices of records for measuring the accuracy of the extracted model.
  • target_models (iterable) – List of target models which should be tested.
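
The effect of use_probability can be sketched without any library. This is an illustration only: the function name to_training_labels is hypothetical, and it assumes that with use_probability disabled the victim's output vectors are reduced to one-hot argmax labels before the thieved classifier is fitted.

```python
def to_training_labels(victim_probs, use_probability):
    """Turn victim output vectors into labels for fitting the stolen model."""
    if use_probability:
        # Keep the full probability vectors (soft labels).
        return [list(p) for p in victim_probs]
    labels = []
    for p in victim_probs:
        # Index of the most probable class; ties resolve to the first one.
        top = max(range(len(p)), key=p.__getitem__)
        labels.append([1.0 if i == top else 0.0 for i in range(len(p))])
    return labels


victim_output = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
hard_labels = to_training_labels(victim_output, use_probability=False)
soft_labels = to_training_labels(victim_output, use_probability=True)
```

Soft labels carry more information per query, while hard labels match a setting where the victim only exposes its predicted class.
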
class pepr.privacy.art_extraction_wrapper.KnockoffNets(attack_alias, attack_pars, data, labels, data_conf, target_models)

art.attacks.extraction.KnockoffNets wrapper class.

Attack description: Implementation of the Knockoff Nets attack from Orekondy et al. (2018).

Paper link: https://arxiv.org/abs/1812.02766

Parameters:
  • attack_alias (str) – Alias for a specific instantiation of the class.
  • attack_pars (dict) –

    Dictionary containing all needed attack parameters:

    • batch_size_fit (int): (optional) Size of batches for fitting the thieved classifier.
    • batch_size_query (int): (optional) Size of batches for querying the victim classifier.
    • nb_epochs (int): (optional) Number of epochs to use for training.
    • nb_stolen (int): (optional) Number of queries submitted to the victim classifier to steal it.
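    • use_probability (bool): (optional) If True, fit the thieved classifier on the victim classifier's output probabilities instead of its hard (argmax) labels.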
    • sampling_strategy (str): Sampling strategy, either 'random' or 'adaptive'.
    • reward (str): Reward type, one of 'cert', 'div', 'loss', 'all'.
    • verbose (bool): Show progress bars.
    • stolen_record_indices (np.ndarray): Indices of records to use for the extraction attack.
  • data (numpy.ndarray) – Dataset with all input images used to attack the target models.
  • labels (numpy.ndarray) – Array of all labels used to attack the target models.
  • data_conf (dict) –

    Dictionary describing, for every target model, which record-indices should be used for the attack.

    • stolen_record_indices (np.ndarray): Indices of records to use for the extraction attack.
    • eval_record_indices (np.ndarray): Indices of records for measuring the accuracy of the extracted model.
  • target_models (iterable) – List of target models which should be tested.
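
An attack_pars dictionary for KnockoffNets can be checked against the documented option values before it is passed to the wrapper. This is a minimal sketch: the helper check_knockoff_pars and all numeric values are hypothetical, and the wrapper itself may validate differently or not at all.

```python
ALLOWED_STRATEGIES = ("random", "adaptive")
ALLOWED_REWARDS = ("cert", "div", "loss", "all")


def check_knockoff_pars(attack_pars):
    """Raise ValueError if a documented option has an undocumented value."""
    strategy = attack_pars.get("sampling_strategy")
    if strategy is not None and strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"sampling_strategy must be one of {ALLOWED_STRATEGIES}")
    reward = attack_pars.get("reward")
    if reward is not None and reward not in ALLOWED_REWARDS:
        raise ValueError(f"reward must be one of {ALLOWED_REWARDS}")
    return attack_pars


# Illustrative values only; tune them for the dataset and query budget at hand.
knockoff_pars = check_knockoff_pars({
    "batch_size_fit": 64,
    "batch_size_query": 64,
    "nb_epochs": 10,
    "nb_stolen": 1000,
    "use_probability": False,
    "sampling_strategy": "adaptive",
    "reward": "all",
    "verbose": False,
})
```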