Getting started
Ensure that both piargus and TauArgus are installed. Then start by importing piargus along with pandas:
import pandas as pd
import piargus as pa
Loading data
There are two primary ways to use piargus: starting from microdata or from table data.
In both cases, your data must be in the form of a pandas DataFrame.
If your data is stored in a CSV file, it can be loaded using pd.read_csv():
input_df = pd.read_csv('data.csv')
For more options to load data, consult the pandas documentation.
Starting from Microdata
First, convert your input_df into a MicroData object:
input_data = pa.MicroData(input_df)
If any columns are hierarchical, specify them. For example, if regio is hierarchical and its hierarchy is stored in a file regio.hrc, you can load the hierarchy as follows:
regio_hierarchy = pa.TreeHierarchy.load_hrc("regio.hrc")
input_data = pa.MicroData(
    input_df,
    hierarchies={"regio": regio_hierarchy},
)
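An .hrc file is a plain-text file with one code per line, in which leading @ characters mark the depth of a code within the hierarchy. A hypothetical regio.hrc could look like this:

ExampleProvince
@ExampleDam
@ExampleCity
OtherProvince
@OtherTown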
Setting up a Table
Set up a table with symbol and regio as explanatory variables and income as the response variable. Use the p%-rule (here with p = 10) as a safety rule and OPTIMAL as the method for secondary suppression:
output_table = pa.Table(explanatory=['symbol', 'regio'],
                        response='income',
                        safety_rule="P(10)",
                        suppression_method=pa.OPTIMAL)
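The p%-rule marks a cell as unsafe when the cell total minus the two largest contributions is less than p% of the largest contribution, since the second-largest contributor could then estimate the largest contribution too accurately. Under P(10), for example, a cell with contributions 100, 20 and 5 is unsafe: 125 - 100 - 20 = 5, which is less than 10% of 100.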
Running the Job
To run the table generation job with TauArgus:
tau = pa.TauArgus(r'<Insert path to argus.exe here>')
job = pa.Job(input_data, [output_table], directory='tau', name="basic-example")
report = tau.run(job)
table_result = output_table.load_result()
print(report)
print(table_result)
table_result.dataframe().to_csv('output/microdata_result.csv')
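Note that to_csv() does not create missing directories, so make sure the output directory exists before writing:

import os

os.makedirs('output', exist_ok=True)  # create the output directory if it does not exist yet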
The printed report and table result will look like this:
<ArgusReport>
status: success <0>
batch_file: tau\basic-example.arb
workdir: tau\work\basic-example
logbook_file: tau\basic-example_logbook.txt
logbook:
25-Aug-2023 16:49:24 : <OPENMICRODATA> "tau\input\basic-example_microdata.csv"
25-Aug-2023 16:49:24 : <OPENMETADATA> "tau\input\basic-example_microdata.rda"
25-Aug-2023 16:49:24 : <SPECIFYTABLE> "symbol""regio"|"income"||
25-Aug-2023 16:49:24 : <SAFETYRULE> P(10, 1)
25-Aug-2023 16:49:24 : <READMICRODATA>
25-Aug-2023 16:49:24 : Start explore file: tau\input\basic-example_microdata.csv
25-Aug-2023 16:49:24 : Start computing tables
25-Aug-2023 16:49:24 : Table: symbol x regio | income has been specified
25-Aug-2023 16:49:24 : Tables have been computed
25-Aug-2023 16:49:24 : Micro data file read; processing time 0 seconds
25-Aug-2023 16:49:24 : Tables from microdata have been read
25-Aug-2023 16:49:24 : <SUPPRESS> OPT(1)
25-Aug-2023 16:49:25 : End of Optimal protection. Time used 0 seconds
Number of suppressions: 4
25-Aug-2023 16:49:25 : <WRITETABLE> (1, 2, AS+, "tau\output\basic-example_table-1.csv")
25-Aug-2023 16:49:25 : Table: symbol x regio | income has been written
Output file name: tau\output\basic-example_table-1.csv
25-Aug-2023 16:49:25 : End of TauArgus run
Response: income
                        safe status  unsafe
symbol regio
Total  Total          264.43      S  264.43
       ExampleDam          x      M  141.57
       ExampleCity         x      M  122.86
A      Total               x      M  142.59
       ExampleDam          x      U   93.13
       ExampleCity         x      U   49.46
C      Total               x      M  121.84
       ExampleDam          x      U   48.44
       ExampleCity         x      U   73.40
Interpreting Status codes
| Status | Meaning |
|---|---|
| S | Safe |
| P | Protected |
| U | Unsafe by primary suppression |
| M | Unsafe by secondary suppression |
| Z | Empty cell |
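These codes appear in the status column of the result dataframe, so they can be used for further post-processing. A minimal sketch, assuming the column names shown in the output above:

df = table_result.dataframe()
# select the cells that were suppressed, either primarily (U) or secondarily (M)
suppressed = df[df["status"].isin(["U", "M"])]
print(suppressed)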
Starting from TableData
To work with tabular data, convert input_df into a TableData object:
input_data = pa.TableData(
    input_df,
    explanatory=["activity", "size"],
    response="value",
    safety_rule="P(10)",
    suppression_method=pa.OPTIMAL,
)
You can also specify additional parameters to TableData:
| Parameter | Meaning | Example |
|---|---|---|
| total_codes | Total code for each explanatory variable. | {"activity": "Total"} |
| frequency | Column with number of contributors to response. | "count" |
| top_contributors | Columns with top contributors. | ["top1", "top2"] |
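As a sketch of how these could be passed (the parameter names above and the column names count, top1 and top2 are illustrative; verify them against the piargus API reference):

input_data = pa.TableData(
    input_df,
    explanatory=["activity", "size"],
    response="value",
    total_codes={"activity": "Total", "size": "Total"},  # assumed: total code per explanatory variable
    frequency="count",  # assumed: column holding the number of contributors
    top_contributors=["top1", "top2"],  # assumed: columns holding the largest contributions
    safety_rule="P(10)",
    suppression_method=pa.OPTIMAL,
)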
To run the data protection job:
job = pa.Job(input_data, directory='tau', name='my-table-data')
result = tau.run(job)
table_result = input_data.load_result()
print(result)
print(table_result)
table_result.dataframe().to_csv('output/tabledata_result.csv')