API
Input Specification
Input data
- class piargus.inputspec.InputData(dataset, *, hierarchies: Dict[str, Hierarchy] = None, codelists: Dict[str, CodeList] = None, column_lengths: Dict[str, int] = None, total_codes: Dict[str, str] = None)[source]
Bases:
object
Abstract base class for a dataset that needs to be protected by Tau Argus.
- __init__(dataset, *, hierarchies: Dict[str, Hierarchy] = None, codelists: Dict[str, CodeList] = None, column_lengths: Dict[str, int] = None, total_codes: Dict[str, str] = None)[source]
Abstract class for input data. Initialize either MicroData or TableData instead.
- Parameters:
dataset – The dataset to make tables for.
hierarchies – The hierarchies to use for categorical data in the dataset.
codelists – Codelists (dicts) for categorical data in the dataset.
column_lengths – The length of each column. Lengths can also be derived by calling resolve_column_lengths.
total_codes – Codes within explanatory that are used for the totals.
- resolve_column_lengths(default=20)[source]
Make sure each column has a length.
For strings, the length is derived from hierarchies and codelists, or otherwise from the longest string. For categorical columns, the longest label is used. For booleans, 1/0 is used with a code length of 1. For numbers and other datatypes, the default (20) is used.
- Parameters:
default – The length to use for numbers and other datatypes.
- property hierarchies
The hierarchies attached to input data.
- property codelists
The codelists attached to input data.
- class piargus.inputspec.MicroData(dataset, *, weight: str | None = None, request: str | None = None, request_values: Sequence[Any] = ('1', '2'), holding: str | None = None, **kwargs)[source]
Bases:
InputData
A MicroData instance contains the data at an individual level.
From such microdata, tabular aggregates can be constructed.
- __init__(dataset, *, weight: str | None = None, request: str | None = None, request_values: Sequence[Any] = ('1', '2'), holding: str | None = None, **kwargs)[source]
- Parameters:
dataset – The dataset (pd.DataFrame) containing the microdata.
weight – Column that contains the sampling weight of this record.
request – Column that indicates if a respondent asks for protection.
request_values – Values of the request column that indicate a request for protection. Two different values can be specified, corresponding to the two levels of the request rule.
holding – Column containing the group identifier.
args – See InputData.
kwargs – See InputData.
See the Tau-Argus documentation for more details on these parameters.
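A minimal sketch of constructing MicroData from a pandas DataFrame; the column names and values are illustrative assumptions, not part of the API.

    import pandas as pd
    from piargus.inputspec import MicroData

    # Hypothetical microdata: one record per respondent.
    df = pd.DataFrame({
        "region":   ["A", "A", "B", "B"],
        "turnover": [120, 4500, 300, 3800],
        "weight":   [1.0, 1.0, 2.0, 1.5],
        "request":  ["0", "1", "0", "0"],
    })

    microdata = MicroData(
        df,
        weight="weight",            # column with the sampling weight of each record
        request="request",          # column indicating whether the respondent asks for protection
        request_values=("1", "2"),  # values of the request column that trigger the request rule
    )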
- class piargus.inputspec.TableData(dataset, explanatory: Sequence[str], response: str, shadow: str | None = None, cost: str | None = None, labda: int | None = None, *, hierarchies: Dict[str, Hierarchy] = None, total_codes: str | Dict[str, str] = 'Total', frequency: str | None = None, top_contributors: Sequence[str] = (), lower_protection_level: str | None = None, upper_protection_level: str | None = None, status_indicator: str | None = None, status_markers: Dict[str, str] | None = None, safety_rule: str | Collection[str] = (), apriori: Apriori | Iterable[Sequence[Any]] = (), suppress_method: str | None = 'OPT', suppress_method_args: Sequence[Any] = (), **kwargs)[source]
Bases:
InputData
A TableData instance contains data that has already been aggregated.
- __init__(dataset, explanatory: Sequence[str], response: str, shadow: str | None = None, cost: str | None = None, labda: int | None = None, *, hierarchies: Dict[str, Hierarchy] = None, total_codes: str | Dict[str, str] = 'Total', frequency: str | None = None, top_contributors: Sequence[str] = (), lower_protection_level: str | None = None, upper_protection_level: str | None = None, status_indicator: str | None = None, status_markers: Dict[str, str] | None = None, safety_rule: str | Collection[str] = (), apriori: Apriori | Iterable[Sequence[Any]] = (), suppress_method: str | None = 'OPT', suppress_method_args: Sequence[Any] = (), **kwargs)[source]
A TableData instance contains data which has already been aggregated.
It can be used for tables that are unprotected or partially protected. If it’s already partially protected, this can be indicated by status_indicator. Most of the parameters are already explained either in InputData or in Table.
- Parameters:
dataset – The dataset containing the table. This dataset should include totals.
explanatory – See Table.
response – See Table.
shadow – See Table.
cost – See Table.
labda – See Table.
total_codes – Codes within explanatory that are used for the totals.
frequency – Column containing number of contributors to this cell.
top_contributors – The columns containing the top contributions for the dominance rule. The columns should be in the same order as they appear in the dataset. The first of these columns should describe the highest contribution, the second column the second-highest contribution.
lower_protection_level – Column that denotes the level below which values are unsafe.
upper_protection_level – Column that denotes the level above which values are unsafe.
status_indicator – Column indicating the status of cells.
status_markers – The meaning of each status. Should be a dictionary mapping “SAFE”, “UNSAFE” and “STATUS” to a code indicating status.
kwargs – See InputData
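A sketch of wrapping a pre-aggregated table in TableData; the column names and the safety rule are assumptions for illustration. Note that the dataset includes the total row.

    import pandas as pd
    from piargus.inputspec import TableData

    # Hypothetical pre-aggregated data, including the "Total" row.
    df = pd.DataFrame({
        "region":   ["Total", "A", "B"],
        "turnover": [8720, 4620, 4100],
        "freq":     [4, 2, 2],
    })

    table_data = TableData(
        df,
        explanatory=["region"],     # categorical columns spanning the table
        response="turnover",        # aggregated value to protect
        total_codes="Total",        # code used for the totals within region
        frequency="freq",           # number of contributors per cell
        safety_rule="FREQ(3, 10)",  # frequency rule as primary suppression (illustrative)
    )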
- class piargus.inputspec.MetaData(columns=None, separator=',', status_markers=None)[source]
Bases:
object
Metadata describing InputData.
Usually it’s not necessary for a user to create a MetaData themselves. If not provided to a Job, one is generated from the input specification. It’s also possible to call metadata = inputspec.generate_metadata() and then modify the resulting object.
This class can be used directly when an existing rda file needs to be used. An existing file can be loaded by MetaData.from_rda and passed to Job.
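A short sketch of reusing an existing rda file; the path follows the example in the Job documentation, and input_data and table are assumed to be defined elsewhere.

    from piargus import Job
    from piargus.inputspec import MetaData

    # Load metadata from an existing rda file and pass it to a Job.
    metadata = MetaData.from_rda("otherdir/metadata.rda")
    job = Job(input_data, tables=[table], metadata=metadata)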
Hierarchies
- class piargus.inputspec.hierarchy.TreeHierarchy(*args, **kwargs)[source]
Bases:
Hierarchy
A hierarchy where the codes are built as a tree.
- property total_code: str
The code used as a total.
- get_node(path) TreeHierarchyNode | None [source]
Obtain a node within the hierarchy.
Returns a single node, or None if it doesn’t exist. Raises ValueError if the path is not unique.
- create_node(path) TreeHierarchyNode [source]
Create a node within the hierarchy.
The newly created node is returned. If the node already existed, the existing one is returned.
- classmethod from_hrc(file, indent='@', total_code='Total')[source]
Create a hierarchy from an hrc file.
- classmethod from_rows(rows: Iterable[Tuple[str, str]], indent='@', total_code='Total')[source]
Construct from list of (code, parent) tuples.
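A sketch of building and querying a TreeHierarchy; the codes are made up, and the (code, parent) ordering and path format follow the docstrings above rather than verified behaviour.

    from piargus.inputspec.hierarchy import TreeHierarchy

    # Two-level regional hierarchy from (code, parent) rows.
    rows = [
        ("North", "Total"),
        ("South", "Total"),
        ("N1", "North"),
        ("N2", "North"),
        ("S1", "South"),
    ]
    hierarchy = TreeHierarchy.from_rows(rows, total_code="Total")

    node = hierarchy.get_node(["North", "N1"])  # returns None if the path does not exist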
- class piargus.inputspec.hierarchy.LevelHierarchy(*args, **kwargs)[source]
Bases:
Hierarchy
Hierarchical code consisting of digits.
Can be used when the digits of the code encode the hierarchy. For each hierarchical level, the width within the code should be given. For example, [1, 2, 1] means the code has the format “x.yy.z”.
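A one-line sketch, assuming the level widths are passed as a single list:

    from piargus.inputspec.hierarchy import LevelHierarchy

    # Codes such as "10203" are split into levels of width 1, 2 and 2 ("x.yy.zz").
    hierarchy = LevelHierarchy([1, 2, 2])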
- class piargus.inputspec.hierarchy.FlatHierarchy(*args, **kwargs)[source]
Bases:
Hierarchy
Hierarchy where all nodes are the same level.
This is used as a default when no hierarchy is specified.
- class piargus.inputspec.hierarchy.TreeHierarchyNode(code=None, children=(), parent=None)[source]
Bases:
Node
- __init__(code=None, children=(), parent=None)[source]
Create a Node
- Parameters:
code – The code identifying this node
parent – Immediately assign a parent
children – Children to add
- property code
Which code belongs to this node.
Output Specification
Tables
- class piargus.outputspec.Table(explanatory: Sequence[str], response: str | int = '<freq>', shadow: str | None = None, cost: int | str | None = None, labda: int = None, safety_rule: str | Collection[str] | SafetyRule = (), apriori: Apriori | Iterable[Sequence[Any]] = (), recodes: Mapping[str, int | TreeRecode] = None, suppress_method: str | None = 'OPT', suppress_method_args: Sequence = ())[source]
Bases:
object
A Table describes what the protected table should look like.
Usually there are a few explanatory columns and one response.
- __init__(explanatory: Sequence[str], response: str | int = '<freq>', shadow: str | None = None, cost: int | str | None = None, labda: int = None, safety_rule: str | Collection[str] | SafetyRule = (), apriori: Apriori | Iterable[Sequence[Any]] = (), recodes: Mapping[str, int | TreeRecode] = None, suppress_method: str | None = 'OPT', suppress_method_args: Sequence = ())[source]
Create a new Table
- Parameters:
explanatory – List of background variables that explain the response. Will be set as the DataFrame index.
response – The column that needs to be explained.
shadow – The column that is used for the safety rules. Default: response.
cost – The column that contains the cost of suppressing a cell. Set to 1 to minimise the number of cells suppressed (although this might suppress totals). Default: response.
labda – If set to a value > 0, a Box-Cox transformation is applied to the cost variable. If set to 0, a log transformation is applied to the cost. Default: 1.
safety_rule –
A set of safety rules on individual level. Can be supplied as:
a str in which parts are separated by |
a sequence of parts
a dict with keys {“individual”: x, “holding”: y} giving separate rules on individual and holding level.
- Each part can be:
“P(p, n=1)”: p% rule
“NK(n, k)”: (n, k)-dominance rule
“ZERO(safety_range)”: Zero rule
“FREQ(minfreq, safety_range)”: Frequency rule
“REQ(percentage_1, percentage_2, safety_margin)”: Request rule
See the Tau-Argus manual for details on these rules.
apriori – Apriori file to change parameters.
suppress_method –
Method to use for secondary suppression. Options are:
GHMITER (“GH”): Hypercube
MODULAR (“MOD”): Modular
OPTIMAL (“OPT”): Optimal [default]
NETWORK (“NET”): Network
ROUNDING (“RND”): Controlled rounding
TABULAR_ADJUSTMENT (“CTA”): Controlled Tabular Adjustment
None: No secondary suppression is applied
See the Tau-Argus manual for details on these methods.
suppress_method_args – Parameters to pass to suppress_method.
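A sketch of a table specification; the column names are assumptions, while the safety-rule string follows the format described above.

    from piargus.outputspec import Table

    table = Table(
        explanatory=["region", "industry"],  # background variables
        response="turnover",                 # column to aggregate and protect
        safety_rule="P(10, 1)|NK(3, 75)",    # p% rule and dominance rule, separated by |
        suppress_method="OPT",               # optimal secondary suppression (the default)
    )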
- property safety_rule: str
What safety rule applies to this table.
- property apriori
Apriori settings of this table.
- load_result() TableResult [source]
After Tau-Argus has run, this obtains the protected data.
- class piargus.outputspec.TreeRecode(codes)[source]
Bases:
object
Hierarchical codes can be recoded to make the output less detailed.
- class piargus.outputspec.Apriori(changes=(), separator=',', ignore_error=False, expand_trivial=True)[source]
Bases:
object
Apriori can be used to mark cells as safe or to specify that cells should not be suppressed.
- change_status(cell, status)[source]
Change status of cell to status.
Status can be one of: S (mark safe), U (mark unsafe), P (mark protected).
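A small sketch of marking a cell; exactly how a cell is identified (for example a tuple of explanatory codes) is an assumption here. The resulting object can be passed as the apriori parameter of a Table.

    from piargus.outputspec import Apriori

    apriori = Apriori()
    apriori.change_status(("North", "Mining"), "P")  # protect this cell from suppression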
Safety rule
- piargus.dominance_rule(n=3, k=75)[source]
(N, K)-dominance rule.
A cell is unsafe if its n largest contributors together account for more than k% of the total value of the cell.
- piargus.percent_rule(p=10, n=1)[source]
P%-rule.
A cell is disclosive if x1, the largest contribution to the cell, can be determined to an accuracy better than p% of its true value.
- piargus.request_rule(percent1, percent2, safety_margin)[source]
Request rule
Cells are protected only when the largest contributor represents more than a given percentage (for example 70%) of the total and that contributor has asked for protection. Therefore, a variable indicating the request is required.
- piargus.weight_rule(apply_weights=False)[source]
Whether weights should be used in the safety rules.
- piargus.p_rule(p=10, n=1)
P%-rule.
A cell is disclosive if x1, the largest contribution to the cell, can be determined to an accuracy better than p% of its true value.
- piargus.nk_rule(n=3, k=75)
(N, K)-dominance rule.
A cell is unsafe if its n largest contributors together account for more than k% of the total value of the cell.
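The helpers above build individual rule parts; passing them as a sequence to Table(safety_rule=...) is consistent with the Table documentation, though the sketch below is unverified.

    import piargus

    rules = [
        piargus.percent_rule(p=10, n=1),    # p% rule
        piargus.dominance_rule(n=3, k=75),  # (n, k)-dominance rule
    ]
    # table = Table(explanatory=["region"], response="turnover", safety_rule=rules)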
Result
- class piargus.ArgusReport(returncode: int, batch_file=None, logbook_file=None, workdir=None)[source]
Bases:
object
Report of a Tau-Argus run.
- property status: str
Return whether the run succeeded or failed as text.
- property is_succesful: bool
Return whether the run succeeded.
- property is_failed: bool
Return whether the run failed.
- class piargus.TableResult(df, response)[source]
Bases:
object
Resulting table after protection.
- unsafe() Series [source]
Return the unsafe original response.
- Returns:
The raw unprotected totals as a series.
- status(recode=True) Series [source]
Return the status of each response.
- Parameters:
recode –
If True, readable codes will be returned. The following codes are used by default:
S: safe
P: protected
U: primary unsafe
M: secondary unsafe
Z: empty
Otherwise, raw status codes from Tau-Argus are returned. See the documentation of Tau-Argus.
- Returns:
Status for each combination.
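A sketch of inspecting the result after Tau-Argus has run, using Table.load_result from above; the table variable is assumed to be a previously defined Table.

    result = table.load_result()   # TableResult for the protected table
    status = result.status()       # per-cell codes: S, P, U, M or Z
    original = result.unsafe()     # raw unprotected totals; handle with care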
Tau-Argus
- class piargus.TauArgus(program: str | Path = 'TauArgus')[source]
Bases:
object
Representation of the Tau-Argus program that is run in the background.
- run(batch_or_job=None, check: bool = True, *args, **kwargs) ArgusReport [source]
Run either a batch file or a job.
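A sketch of running a previously constructed Job; the executable path is illustrative.

    from piargus import TauArgus

    argus = TauArgus(r"C:\Programs\TauArgus\TauArgus.exe")  # path to the Tau-Argus executable
    report = argus.run(job)  # check=True by default
    print(report.status)     # text describing whether the run succeeded or failed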
- class piargus.BatchWriter(file)[source]
Bases:
object
Helper to write a batch file for use with TauArgus.
Usually the heavy work can be done by creating a Job. However, this class can still be used for direct low-level control.
- specify_table(explanatory, response='<freq>', shadow=None, cost=None, labda=None)[source]
Write SPECIFYTABLE to batch file.
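A low-level sketch; whether BatchWriter accepts an open file object like this is an assumption, and the column names are made up.

    from piargus import BatchWriter

    with open("example.arb", "w") as f:
        writer = BatchWriter(f)
        writer.specify_table(["region", "industry"], response="turnover")  # writes SPECIFYTABLE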
- class piargus.Job(input_data: InputData, tables: Mapping[Hashable, Table] | Iterable[Table] | None = None, *, metadata: MetaData | None = None, linked_suppress_method: str | None = None, linked_suppress_method_args: Sequence[Any] = (), directory: str | Path | None = None, name: str | None = None, logbook: bool | str = True, interactive: bool = False, setup: bool = True)[source]
Bases:
object
Representation of a data protection task.
This task can be fed to the tau-argus program.
- __init__(input_data: InputData, tables: Mapping[Hashable, Table] | Iterable[Table] | None = None, *, metadata: MetaData | None = None, linked_suppress_method: str | None = None, linked_suppress_method_args: Sequence[Any] = (), directory: str | Path | None = None, name: str | None = None, logbook: bool | str = True, interactive: bool = False, setup: bool = True)[source]
A job to protect a data source.
This class takes care of generating all input/meta files that Tau-Argus needs. If a directory is supplied, the necessary files will be created in that directory. Otherwise, a temporary directory is created, but it’s better to always supply one. Existing files won’t be rewritten to the directory. For example, if metadata is created from MetaData.from_rda(“otherdir/metadata.rda”), the existing file is used. If modifications are made to the metadata, the user should call metadata.to_rda() first.
- Parameters:
input_data – The source from which to generate tables. Needs to be either MicroData or TableData.
tables – The tables to be generated. Can be omitted if input_data is TableData.
metadata – The metadata of input_data. If omitted, it will be derived from input_data.
linked_suppress_method –
Method to use for linked suppression. Options are:
GHMITER (“GH”): Hypercube
MODULAR (“MOD”): Modular
linked_suppress_method_args – Parameters to pass to suppress_method.
directory – Where to write tau-argus files.
name – Name from which to derive the name of some temporary files.
logbook – Whether this job should create its own logging file.
interactive – Whether the GUI should be opened.
setup – Whether to set up the job immediately (required before running).
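A minimal end-to-end sketch tying the pieces together, reusing the microdata and table sketches above; it assumes the TauArgus executable is on the PATH.

    from piargus import Job, TauArgus

    job = Job(microdata, tables=[table], directory="tau_files", name="example")
    report = TauArgus().run(job)    # uses the default program name "TauArgus"
    result = table.load_result()    # protected table, once the run has succeeded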
- property name
Name of the job.
- property directory
Directory to put files.
- property batch_filepath
Where the batch file will be stored (read-only).
This is derived from name and directory.
- property logbook_filepath
Where the logfile will be stored (read-only).
This is derived automatically from name and directory.