# Getting started

Make sure that both piargus and TauArgus are installed.
Then start by importing piargus along with pandas:

```python
import pandas as pd
import piargus as pa
```

## Loading data

There are two primary ways to use piargus: starting from **microdata** or from **table data**.
In both cases, your data must be in the form of a pandas `DataFrame`.
If your data is stored in a CSV file, it can be loaded using `pd.read_csv()`.

```python
input_df = pd.read_csv('data.csv')
```

For more options to load data, consult the [pandas documentation](https://pandas.pydata.org/docs/reference/io.html).

## Starting from Microdata

First, convert your `input_df` into a `MicroData` object:

```python
input_data = pa.MicroData(input_df)
```

If any columns are hierarchical, specify them.
For example, if `regio` is hierarchical and its hierarchy is stored in a file `regio.hrc`, you can load the hierarchy as follows:

```python
# Load the hierarchy from file and attach it to the "regio" column
regio_hierarchy = pa.TreeHierarchy.load_hrc("regio.hrc")
input_data = pa.MicroData(
    input_df,
    hierarchies={"regio": regio_hierarchy},
)
```

### Setting up a Table

Set up a table with `sbi` and `regio` as explanatory variables and `income` as the response variable.
Use the [p%-rule](https://link.springer.com/chapter/10.1007/978-3-642-33627-0_1) as a safety rule and `OPTIMAL` as a method for secondary suppression:

```python
output_table = pa.Table(
    explanatory=['sbi', 'regio'],
    response='income',
    safety_rule="P(10)",
    suppression_method=pa.OPTIMAL,
)
```

### Running the Job

To run the table generation job with `TauArgus`:

```python
tau = pa.TauArgus(r'')
job = pa.Job(input_data, [output_table], directory='tau', name="my-microdata")

# Run the job, then load the protected table that TauArgus produced
report = tau.run(job)
table_result = output_table.load_result()

print(report)
print(table_result)

# Save the protected table as CSV
table_result.dataframe().to_csv('output/microdata_result.csv')
```

The output will look similar to this:

```
status: success <0>
batch_file: tau\basic-example.arb
workdir: tau\work\basic-example
logbook_file: tau\basic-example_logbook.txt
logbook:
    25-Aug-2023 16:49:24 : "tau\input\basic-example_microdata.csv"
    25-Aug-2023 16:49:24 : "tau\input\basic-example_microdata.rda"
    25-Aug-2023 16:49:24 : "symbol""regio"|"income"||
    25-Aug-2023 16:49:24 : P(10, 1)
    25-Aug-2023 16:49:24 :
    25-Aug-2023 16:49:24 : Start explore file: tau\input\basic-example_microdata.csv
    25-Aug-2023 16:49:24 : Start computing tables
    25-Aug-2023 16:49:24 : Table: symbol x regio | income has been specified
    25-Aug-2023 16:49:24 : Tables have been computed
    25-Aug-2023 16:49:24 : Micro data file read; processing time 0 seconds
    25-Aug-2023 16:49:24 : Tables from microdata have been read
    25-Aug-2023 16:49:24 : OPT(1)
    25-Aug-2023 16:49:25 : End of Optimal protection. Time used 0 seconds
                           Number of suppressions: 4
    25-Aug-2023 16:49:25 : (1, 2, AS+, "tau\output\basic-example_table-1.csv")
    25-Aug-2023 16:49:25 : Table: symbol x regio | income has been written
                           Output file name: tau\output\basic-example_table-1.csv
    25-Aug-2023 16:49:25 : End of TauArgus run

Response: income
                      safe  status  unsafe
symbol regio
Total  Total        264.43       S  264.43
       ExampleDam        x       M  141.57
       ExampleCity       x       M  122.86
A      Total             x       M  142.59
       ExampleDam        x       U   93.13
       ExampleCity       x       U   49.46
C      Total             x       M  121.84
       ExampleDam        x       U   48.44
       ExampleCity       x       U   73.40
```

### Interpreting Status codes

| Status | Meaning                         |
|--------|---------------------------------|
| S      | Safe                            |
| P      | Protected                       |
| U      | Unsafe by primary suppression   |
| M      | Unsafe by secondary suppression |
| Z      | Empty cell                      |

## Starting from TableData

To work with tabular data, convert `input_df` into a `TableData` object:

```python
input_data = pa.TableData(
    input_df,
    explanatory=["activity", "size"],
    response="value",
    safety_rule="P(10)",
    suppression_method=pa.OPTIMAL,
)
```

You can also pass additional parameters to `TableData`:

| Parameter          | Meaning                                         | Example                   |
|--------------------|-------------------------------------------------|---------------------------|
| `total_codes`      | Total code for each explanatory variable.       | `{"regio": "US"}`         |
| `frequency`        | Column with number of contributors to response. | `"n_obs"`                 |
| `top_contributors` | Columns with top contributors.                  | `["max", "max2", "max3"]` |

To run the data protection job:

```python
# Reuses the `tau` instance created in the microdata example
job = pa.Job(input_data, directory='tau', name='my-table-data')
result = tau.run(job)
table_result = input_data.load_result()

print(result)
print(table_result)

# Save the protected table as CSV
table_result.dataframe().to_csv('output/tabledata_result.csv')
```
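
Once a result has been saved, the status codes listed under "Interpreting Status codes" can be used to post-process it with plain pandas. The following is a minimal sketch, not part of piargus: it builds a hypothetical frame by hand from the sample output shown earlier, and the column names `status` and `unsafe` are assumptions taken from that printout. It counts cells per status code and blanks out every cell marked unsafe (`U` or `M`).

```python
import pandas as pd

# Hypothetical result frame copied from the sample output in this guide.
# The column names "status" and "unsafe" are assumed from that printout;
# check the columns of your own result before relying on them.
result_df = pd.DataFrame(
    {
        "symbol": ["Total", "Total", "Total", "A", "A", "A", "C", "C", "C"],
        "regio": ["Total", "ExampleDam", "ExampleCity"] * 3,
        "status": ["S", "M", "M", "M", "U", "U", "M", "U", "U"],
        "unsafe": [264.43, 141.57, 122.86, 142.59, 93.13, 49.46, 121.84, 48.44, 73.40],
    }
)

# Meanings as given in the status code table of this guide.
STATUS_MEANING = {
    "S": "Safe",
    "P": "Protected",
    "U": "Unsafe by primary suppression",
    "M": "Unsafe by secondary suppression",
    "Z": "Empty cell",
}

# Count how many cells carry each status code.
status_counts = result_df["status"].value_counts().to_dict()

# Blank out the value of every cell that is unsafe (primary or secondary).
is_unsafe = result_df["status"].isin(["U", "M"])
result_df["published"] = result_df["unsafe"].mask(is_unsafe)

# Attach a human-readable meaning to each status code.
result_df["meaning"] = result_df["status"].map(STATUS_MEANING)

print(status_counts)
print(result_df[["symbol", "regio", "status", "meaning", "published"]])
```

In this sample only the grand total remains publishable, and the four `M` cells match the "Number of suppressions: 4" line in the logbook.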