API
Input Specification
Input data
- class piargus.inputspec.InputData(dataset, *, hierarchies: Dict[str, Hierarchy] = None, codelists: Dict[str, CodeList] = None, column_lengths: Dict[str, int] = None, total_codes: Dict[str, str] = None)[source]
Bases:
object
Abstract base class for a dataset that needs to be protected by Tau Argus.
- __init__(dataset, *, hierarchies: Dict[str, Hierarchy] = None, codelists: Dict[str, CodeList] = None, column_lengths: Dict[str, int] = None, total_codes: Dict[str, str] = None)[source]
Abstract class for input data. Initialize either MicroData or TableData instead.
- Parameters:
dataset – The dataset to make tables for.
hierarchies – The hierarchies to use for categorical data in the dataset.
codelists – Codelists (dicts) for categorical data in the dataset.
column_lengths – The length of each column. Lengths can also be derived by calling resolve_column_lengths.
total_codes – Codes within explanatory that are used for the totals.
- resolve_column_lengths(default=20)[source]
Make sure each column has a length.
For strings, the length is derived from hierarchies and codelists, or otherwise from the longest string. For categorical columns, the longest label is used. For booleans, 1/0 is used with a code length of 1. For numbers and other datatypes, the default (20) is used.
- Parameters:
default – The length to use for numbers and other datatypes.
- property hierarchies
The hierarchies attached to input data.
- property codelists
The codelists attached to input data.
- class piargus.inputspec.MicroData(dataset, *, weight: str | None = None, request: str | None = None, request_values: Sequence[Any] = ('1', '2'), holding: str | None = None, **kwargs)[source]
Bases:
InputData
A MicroData instance contains the data at an individual level.
From such microdata, tabular aggregates can be constructed.
- __init__(dataset, *, weight: str | None = None, request: str | None = None, request_values: Sequence[Any] = ('1', '2'), holding: str | None = None, **kwargs)[source]
- Parameters:
dataset – The dataset (pd.DataFrame) containing the microdata.
weight – Column that contains the sampling weight of this record.
request – Column that indicates if a respondent asks for protection.
request_values – Values of the request column that indicate a request for protection. Two different values can be specified, corresponding to the two levels of the request rule.
holding – Column containing the group identifier.
args – See InputData.
kwargs – See InputData.
See the Tau-Argus documentation for more details on these parameters.
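A minimal sketch of constructing MicroData from a pandas DataFrame; the column names and values are illustrative assumptions, not part of the API.

    import pandas as pd
    from piargus.inputspec import MicroData

    # Hypothetical microdata: one record per respondent.
    df = pd.DataFrame({
        "region":   ["A", "A", "B", "B"],
        "turnover": [120, 4500, 300, 3800],
        "weight":   [1.0, 1.0, 2.0, 1.5],
        "request":  ["0", "1", "0", "0"],
    })

    microdata = MicroData(
        df,
        weight="weight",            # column with the sampling weight of each record
        request="request",          # column indicating whether the respondent asks for protection
        request_values=("1", "2"),  # values of the request column that trigger the request rule
    )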
- class piargus.inputspec.TableData(dataset, explanatory: Sequence[str], response: str, shadow: str | None = None, cost: str | None = None, labda: int | None = None, *, hierarchies: Dict[str, Hierarchy] = None, total_codes: str | Dict[str, str] = 'Total', frequency: str | None = None, top_contributors: Sequence[str] = (), lower_protection_level: str | None = None, upper_protection_level: str | None = None, status_indicator: str | None = None, status_markers: Dict[str, str] | None = None, safety_rule: str | Collection[str] = (), apriori: Apriori | Iterable[Sequence[Any]] = (), suppress_method: str | None = 'OPT', suppress_method_args: Sequence[Any] = (), **kwargs)[source]
Bases:
InputData
A TableData instance contains data that has already been aggregated.
- __init__(dataset, explanatory: Sequence[str], response: str, shadow: str | None = None, cost: str | None = None, labda: int | None = None, *, hierarchies: Dict[str, Hierarchy] = None, total_codes: str | Dict[str, str] = 'Total', frequency: str | None = None, top_contributors: Sequence[str] = (), lower_protection_level: str | None = None, upper_protection_level: str | None = None, status_indicator: str | None = None, status_markers: Dict[str, str] | None = None, safety_rule: str | Collection[str] = (), apriori: Apriori | Iterable[Sequence[Any]] = (), suppress_method: str | None = 'OPT', suppress_method_args: Sequence[Any] = (), **kwargs)[source]
A TableData instance contains data which has already been aggregated.
It can be used for tables that are unprotected or partially protected. If it’s already partially protected, this can be indicated by status_indicator. Most of the parameters are already explained either in InputData or in Table.
- Parameters:
dataset – The dataset containing the table. This dataset should include totals.
explanatory – See Table.
response – See Table.
shadow – See Table.
cost – See Table.
labda – See Table.
total_codes – Codes within explanatory that are used for the totals.
frequency – Column containing number of contributors to this cell.
top_contributors – The columns containing the top contributions for the dominance rule. The columns should be in the same order as they appear in the dataset. The first of these columns should describe the highest contribution, the second column the second-highest contribution.
lower_protection_level – Column that denotes the level below which values are unsafe.
upper_protection_level – Column that denotes the level above which values are unsafe.
status_indicator – Column indicating the status of cells.
status_markers – The meaning of each status. Should be a dictionary mapping “SAFE”, “UNSAFE” and “STATUS” to a code indicating status.
kwargs – See InputData
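A sketch of wrapping a pre-aggregated table in TableData; the column names and the safety rule are assumptions for illustration. Note that the dataset includes the total row.

    import pandas as pd
    from piargus.inputspec import TableData

    # Hypothetical pre-aggregated data, including the "Total" row.
    df = pd.DataFrame({
        "region":   ["Total", "A", "B"],
        "turnover": [8720, 4620, 4100],
        "freq":     [4, 2, 2],
    })

    table_data = TableData(
        df,
        explanatory=["region"],     # categorical columns spanning the table
        response="turnover",        # aggregated value to protect
        total_codes="Total",        # code used for the totals within region
        frequency="freq",           # number of contributors per cell
        safety_rule="FREQ(3, 10)",  # frequency rule as primary suppression (illustrative)
    )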
- class piargus.inputspec.MetaData(columns=None, separator=',', status_markers=None)[source]
Bases:
object
Metadata describing InputData.
Usually it’s not necessary for a user to create a MetaData themselves. If not provided to a Job, one is generated from the input specification. It’s also possible to call metadata = inputspec.generate_metadata() and then modify the resulting object.
This class can be used directly when an existing rda file needs to be used. An existing file can be loaded by MetaData.from_rda and passed to Job.
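A short sketch of reusing an existing rda file; the path follows the example in the Job documentation, and input_data and table are assumed to be defined elsewhere.

    from piargus import Job
    from piargus.inputspec import MetaData

    # Load metadata from an existing rda file and pass it to a Job.
    metadata = MetaData.from_rda("otherdir/metadata.rda")
    job = Job(input_data, tables=[table], metadata=metadata)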
Hierarchies
- class piargus.inputspec.hierarchy.TreeHierarchy(*args, **kwargs)[source]
Bases:
Hierarchy
A hierarchy where the codes are built as a tree.
- property total_code: str
The code used as a total.
- get_node(path) TreeHierarchyNode | None [source]
Obtain a node within the hierarchy.
Returns a single node, or None if it doesn’t exist. Raises ValueError if the path is not unique.
- create_node(path) TreeHierarchyNode [source]
Create a node within the hierarchy.
The newly created node is returned. If the node already existed, the existing one is returned.
- classmethod from_hrc(file, indent='@', total_code='Total')[source]
Create a hierarchy from an hrc file.
- classmethod from_rows(rows: Iterable[Tuple[str, str]], indent='@', total_code='Total')[source]
Construct from list of (code, parent) tuples.
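A sketch of building and querying a TreeHierarchy; the codes are made up, and the (code, parent) ordering and path format follow the docstrings above rather than verified behaviour.

    from piargus.inputspec.hierarchy import TreeHierarchy

    # Two-level regional hierarchy from (code, parent) rows.
    rows = [
        ("North", "Total"),
        ("South", "Total"),
        ("N1", "North"),
        ("N2", "North"),
        ("S1", "South"),
    ]
    hierarchy = TreeHierarchy.from_rows(rows, total_code="Total")

    node = hierarchy.get_node(["North", "N1"])  # returns None if the path does not exist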
- class piargus.inputspec.hierarchy.LevelHierarchy(*args, **kwargs)[source]
Bases:
Hierarchy
Hierarchical code consisting of digits.
Can be used when the digits of the code encode the hierarchy. For each hierarchical level, the width within the code should be given. For example, [1, 2, 1] means the code has the format “x.yy.z”.
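A one-line sketch, assuming the level widths are passed as a single list:

    from piargus.inputspec.hierarchy import LevelHierarchy

    # Codes such as "10203" are split into levels of width 1, 2 and 2 ("x.yy.zz").
    hierarchy = LevelHierarchy([1, 2, 2])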
- class piargus.inputspec.hierarchy.FlatHierarchy(*args, **kwargs)[source]
Bases:
Hierarchy
Hierarchy where all nodes are the same level.
This is used as a default when no hierarchy is specified.
- class piargus.inputspec.hierarchy.TreeHierarchyNode(code=None, children=(), parent=None)[source]
Bases:
Node
- __init__(code=None, children=(), parent=None)[source]
Create a Node
- Parameters:
code – The code identifying this node
parent – Immediately assign a parent
children – Children to add
- property code
Which code belongs to this node.
Output Specification
Tables
- class piargus.outputspec.Table(explanatory: Sequence[str], response: str | int = '<freq>', shadow: str | None = None, cost: int | str | None = None, labda: int = None, safety_rule: str | Collection[str] | SafetyRule = (), apriori: Apriori | Iterable[Sequence[Any]] = (), recodes: Mapping[str, int | TreeRecode] = None, suppress_method: str | None = 'OPT', suppress_method_args: Sequence = ())[source]
Bases:
object
A Table describes what the protected table should look like.
Usually there are a few explanatory columns and one response.
- __init__(explanatory: Sequence[str], response: str | int = '<freq>', shadow: str | None = None, cost: int | str | None = None, labda: int = None, safety_rule: str | Collection[str] | SafetyRule = (), apriori: Apriori | Iterable[Sequence[Any]] = (), recodes: Mapping[str, int | TreeRecode] = None, suppress_method: str | None = 'OPT', suppress_method_args: Sequence = ())[source]
Create a new Table
- Parameters:
explanatory – List of background variables that explain the response. Will be set as the DataFrame index.
response – The column that needs to be explained.
shadow – The column that is used for the safety rules. Default: response.
cost – The column that contains the cost of suppressing a cell. Set to 1 to minimise the number of cells suppressed (although this might suppress totals). Default: response.
labda – If set to a value > 0, a Box-Cox transformation is applied to the cost variable. If set to 0, a log transformation is applied to the cost. Default: 1.
safety_rule –
A set of safety rules on individual level. Can be supplied as:
a str in which parts are separated by |
a sequence of parts
a dict with keys {“individual”: x, “holding”: y} giving separate rules on individual and holding level.
- Each part can be:
“P(p, n=1)”: p% rule
“NK(n, k)”: (n, k)-dominance rule
“ZERO(safety_range)”: Zero rule
“FREQ(minfreq, safety_range)”: Frequency rule
“REQ(percentage_1, percentage_2, safety_margin)”: Request rule
See the Tau-Argus manual for details on these rules.
apriori – Apriori file to change parameters.
suppress_method –
Method to use for secondary suppression. Options are:
GHMITER (“GH”): Hypercube
MODULAR (“MOD”): Modular
OPTIMAL (“OPT”): Optimal [default]
NETWORK (“NET”): Network
ROUNDING (“RND”): Controlled rounding
TABULAR_ADJUSTMENT (“CTA”): Controlled Tabular Adjustment
None: No secondary suppression is applied
See the Tau-Argus manual for details on these methods.
suppress_method_args – Parameters to pass to suppress_method.
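A sketch of a table specification; the column names are assumptions, while the safety-rule string follows the format described above.

    from piargus.outputspec import Table

    table = Table(
        explanatory=["region", "industry"],  # background variables
        response="turnover",                 # column to aggregate and protect
        safety_rule="P(10, 1)|NK(3, 75)",    # p% rule and dominance rule, separated by |
        suppress_method="OPT",               # optimal secondary suppression (the default)
    )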
- property safety_rule: str
What safety rule applies to this table.
- property apriori
Apriori settings of this table.
- load_result() TableResult [source]
After Tau-Argus has run, this obtains the protected data.
- class piargus.outputspec.TreeRecode(codes)[source]
Bases:
object
Hierarchical codes can be recoded to make the output less detailed.
- class piargus.outputspec.Apriori(changes=(), separator=',', ignore_error=False, expand_trivial=True)[source]
Bases:
object
Apriori can be used to mark cells as safe or to specify that cells should not be suppressed.
- change_status(cell, status)[source]
Change status of cell to status.
Status can be one of: S (mark safe), U (mark unsafe), P (mark protected).
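A small sketch of marking a cell; exactly how a cell is identified (for example a tuple of explanatory codes) is an assumption here. The resulting object can be passed as the apriori parameter of a Table.

    from piargus.outputspec import Apriori

    apriori = Apriori()
    apriori.change_status(("North", "Mining"), "P")  # protect this cell from suppression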
Safety rule
- piargus.dominance_rule(n=3, k=75)[source]
(N, K)-dominance rule.
A cell is unsafe if its n largest contributors together account for more than k% of the total value of the cell.
- piargus.percent_rule(p=10, n=1)[source]
P%-rule.
A cell is disclosive if x1, the largest contribution to the cell, can be determined to an accuracy better than p% of its true value.
- piargus.request_rule(percent1, percent2, safety_margin)[source]
Request rule
Cells are protected only when the largest contributor represents more than a given percentage (for example 70%) of the total and that contributor has asked for protection. Therefore, a variable indicating the request is required.
- piargus.weight_rule(apply_weights=False)[source]
Whether weights should be used in the safety rules.
- piargus.p_rule(p=10, n=1)
P%-rule.
A cell is disclosive if x1, the largest contribution to the cell, can be determined to an accuracy better than p% of its true value.
- piargus.nk_rule(n=3, k=75)
(N, K)-dominance rule.
A cell is unsafe if its n largest contributors together account for more than k% of the total value of the cell.
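The helpers above build individual rule parts; passing them as a sequence to Table(safety_rule=...) is consistent with the Table documentation, though the sketch below is unverified.

    import piargus

    rules = [
        piargus.percent_rule(p=10, n=1),    # p% rule
        piargus.dominance_rule(n=3, k=75),  # (n, k)-dominance rule
    ]
    # table = Table(explanatory=["region"], response="turnover", safety_rule=rules)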
Result
- class piargus.ArgusReport(returncode: int, batch_file=None, logbook_file=None, workdir=None)[source]
Bases:
object
Report of a Tau-Argus run.
- property status: str
Return whether the run succeeded or failed as text.
- property is_succesful: bool
Return whether the run succeeded.
- property is_failed: bool
Return whether the run failed.
- class piargus.TableResult(df, response)[source]
Bases:
object
Resulting table after protection.
- unsafe() Series [source]
Return the unsafe original response.
- Returns:
The raw unprotected totals as a series.
- status(recode=True) Series [source]
Return the status of each response.
- Parameters:
recode –
If True, readable codes will be returned. The following codes are used by default:
S: safe
P: protected
U: primary unsafe
M: secondary unsafe
Z: empty
Otherwise, raw status codes from Tau-Argus are returned. See the documentation of Tau-Argus.
- Returns:
Status for each combination.
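A sketch of inspecting the result after Tau-Argus has run, using Table.load_result from above; the table variable is assumed to be a previously defined Table.

    result = table.load_result()   # TableResult for the protected table
    status = result.status()       # per-cell codes: S, P, U, M or Z
    original = result.unsafe()     # raw unprotected totals; handle with care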
Tau-Argus
- class piargus.TauArgus(program: str | Path = 'TauArgus')[source]
Bases:
object
Representation of the Tau-Argus program that is run in the background.
- run(batch_or_job=None, check: bool = True, *args, **kwargs) ArgusReport [source]
Run either a batch file or a job.
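A sketch of running a previously constructed Job; the executable path is illustrative.

    from piargus import TauArgus

    argus = TauArgus(r"C:\Programs\TauArgus\TauArgus.exe")  # path to the Tau-Argus executable
    report = argus.run(job)  # check=True by default
    print(report.status)     # text describing whether the run succeeded or failed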
- class piargus.BatchWriter(file)[source]
Bases:
object
Helper to write a batch file for use with TauArgus.
Usually the heavy work can be done by creating a Job. However, this class can still be used for direct low-level control.
- specify_table(explanatory, response='<freq>', shadow=None, cost=None, labda=None)[source]
Write SPECIFYTABLE to batch file.
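A low-level sketch; whether BatchWriter accepts an open file object like this is an assumption, and the column names are made up.

    from piargus import BatchWriter

    with open("example.arb", "w") as f:
        writer = BatchWriter(f)
        writer.specify_table(["region", "industry"], response="turnover")  # writes SPECIFYTABLE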
- class piargus.Job(input_data: InputData, tables: Mapping[Hashable, Table] | Iterable[Table] | None = None, *, metadata: MetaData | None = None, linked_suppress_method: str | None = None, linked_suppress_method_args: Sequence[Any] = (), directory: str | Path | None = None, name: str | None = None, logbook: bool | str = True, interactive: bool = False, setup: bool = True)[source]
Bases:
object
Representation of a data protection task.
This task can be fed to the tau-argus program.
- __init__(input_data: InputData, tables: Mapping[Hashable, Table] | Iterable[Table] | None = None, *, metadata: MetaData | None = None, linked_suppress_method: str | None = None, linked_suppress_method_args: Sequence[Any] = (), directory: str | Path | None = None, name: str | None = None, logbook: bool | str = True, interactive: bool = False, setup: bool = True)[source]
A job to protect a data source.
This class takes care of generating all input/meta files that Tau-Argus needs. If a directory is supplied, the necessary files will be created in that directory. Otherwise, a temporary directory is created, but it’s better to always supply one. Existing files won’t be rewritten to the directory. For example, if metadata is created from MetaData.from_rda(“otherdir/metadata.rda”), the existing file is used. If modifications are made to the metadata, the user should call metadata.to_rda() first.
- Parameters:
input_data – The source from which to generate tables. Needs to be either MicroData or TableData.
tables – The tables to be generated. Can be omitted if input_data is TableData.
metadata – The metadata of input_data. If omitted, it will be derived from input_data.
linked_suppress_method –
Method to use for linked suppression. Options are:
GHMITER (“GH”): Hypercube
MODULAR (“MOD”): Modular
linked_suppress_method_args – Parameters to pass to suppress_method.
directory – Where to write tau-argus files.
name – Name from which to derive the name of some temporary files.
logbook – Whether this job should create its own logging file.
interactive – Whether the GUI should be opened.
setup – Whether to set up the job immediately (required before running).
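A minimal end-to-end sketch tying the pieces together, reusing the microdata and table sketches above; it assumes the TauArgus executable is on the PATH.

    from piargus import Job, TauArgus

    job = Job(microdata, tables=[table], directory="tau_files", name="example")
    report = TauArgus().run(job)    # uses the default program name "TauArgus"
    result = table.load_result()    # protected table, once the run has succeeded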
- property name
Name of the job.
- property directory
Directory to put files.
- property batch_filepath
Where the batch file will be stored (read-only).
This is derived from name and directory.
- property logbook_filepath
Where the logfile will be stored (read-only).
This is derived automatically from name and directory.