Confidentiality and Anonymity Mean the Same Thing So They Can Be Used Interchangeably.

Data confidentiality report [PDF 907 KB]

Contents

  • Introduction
  • Why do we have to protect data confidentiality?
  • What privacy, security, and confidentiality mean
  • Degrees of identification in data
  • Why it is important to protect data confidentiality
  • What are the principles, laws, and ethics that govern information confidentiality?
  • What are the methods used to keep data confidential?
  • How can we use perturbation to protect confidentiality?
  • How can we use aggregation to protect confidentiality?
  • How can we use suppression to protect confidentiality?
  • How can we limit access to data to protect confidentiality?
  • How can we build synthetic and confidential unit record files to support the general publication of microdata?
  • References

Introduction

It is important to understand and apply confidentiality principles, rules, and methods to make sure that you:

  • do not release data that could identify people, households, or organisations unintentionally
  • protect data provided by people and organisations, and ensure it isn't disclosed to anyone who is not authorised to access it
  • use statistical methods to prevent data from being disclosed in a way that could identify a person, household, or organisation unintentionally.

Using statistical methods correctly protects the confidentiality of data. Methods such as perturbation, aggregation, suppression, limiting access, and building synthetic or confidential unit record files keep data confidential. When data is confidential, no individuals, households, or businesses can be identified, and no unauthorised people can access the data.

Why do we have to protect data confidentiality?

Different organisations have different requirements relating to when they must or wish to protect the privacy, security, and confidentiality of data so that people, households, and organisations can't be identified without their permission. This includes where we must or wish to protect the confidentiality of data throughout its life cycle: whenever we collect, use, store, and distribute it.

What privacy, security, and confidentiality mean

The terms privacy, security, and confidentiality are often used interchangeably, but each term has a different meaning:

  • Privacy refers to a person's ability to control the availability of information about themselves.
  • Security refers to how an organisation stores and controls access to the data it holds.
  • Confidentiality refers to the protection of data from, and about, individuals and organisations; and how we ensure that data is not made available or disclosed without authorisation.

Degrees of identification in data

Diagram of degrees of identification in data - full description in text below.

Degrees of identification in data [PDF 969 KB]

What do statisticians and data analysts mean when they talk about confidentiality? How does identifiable data differ from de-identified or confidentialised data? Data identifiability is not binary. Data lies on a spectrum with multiple shades of identifiability. This is a primer on how to distinguish different categories of data in the NZ context.

Identifiable data

Data that directly or indirectly identifies an individual or business.

This is information that identifies a person without additional information, or by linking to information in the public domain. It includes cases where an individual can be identified by connecting up data.

Personal, identifiable data like this is protected, and should only be released to the public if we have explicit permission to do so.

For example: name, date of birth, gender.

Examples

Individual

Name: Hēni.

Gender: Female.

DOB: 31/01/1983.

Address: 28 My Road, Postcode 6012, Wellington.

Business

Name: Puzzles.

Type: Paper Stationery Manufacturing.

Employees: 34.

Expenditure: $398,000.

De-identified information

De-identified: Data which has had information removed from it to reduce the risk of spontaneous recognition (the likelihood of identifying a person, place, or organisation without any effort).

For example: Data held within Stats NZ's Integrated Data Infrastructure (IDI) and Longitudinal Business Database (LBD) is de-identified before approved researchers can access it in a secure data lab environment.

Partially confidentialised: Data which has been modified to protect the confidentiality of respondents while also maintaining the integrity of the data. Modification involves applying methods such as top-coding, data swapping, and collapsing categorical variables to the unit records.

Examples

Individual

Name: Unknown.

Gender: Female.

DOB: 1985.

Address: Postcode 6012, Wellington.

Business

Name: Unknown.

Type: Manufacturing.

Employees: 30-40.

Expenditure: $398,000.

Confidentialised data

Data which has had statistical methods applied to it to protect against disclosing unauthorised information.

Statistical methods include suppression, aggregation, perturbation, data swapping, top and bottom coding, etc. These prevent the unauthorised identification of individuals, households, or organisations. This data is publicly available.

For example: Stats NZ nz.stat datasets.

Examples

Individual

Name: Unknown.

Gender: Female.

DOB: 30-40 years.

Address: Wellington.

Business

Name: Unknown.

Type: Manufacturing.

Employees: 10-100.

Expenditure: Under $500,000.
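The three degrees of identification above can be sketched in code. This is a minimal illustration only: the `Person` fields, the `de_identify` and `confidentialise` helpers, and the ten-year band width are assumptions made for the example, not a Stats NZ procedure.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record layout, loosely following the example above.
    name: str
    gender: str
    birth_year: int
    postcode: str
    city: str


def age_band(birth_year: int, reference_year: int, width: int = 10) -> str:
    """Return a coarse age band such as '30-40 years'."""
    age = reference_year - birth_year
    lower = (age // width) * width
    return f"{lower}-{lower + width} years"


def de_identify(p: Person) -> dict:
    """Remove the direct identifier; keep birth year, postcode, and city."""
    return {
        "name": "Unknown",
        "gender": p.gender,
        "dob": str(p.birth_year),
        "address": f"Postcode {p.postcode}, {p.city}",
    }


def confidentialise(p: Person, reference_year: int) -> dict:
    """Coarsen further: an age band replaces the year, the city replaces the postcode."""
    return {
        "name": "Unknown",
        "gender": p.gender,
        "dob": age_band(p.birth_year, reference_year),
        "address": p.city,
    }


heni = Person("Hēni", "Female", 1983, "6012", "Wellington")
print(de_identify(heni))
print(confidentialise(heni, reference_year=2020))
```

Each step keeps less detail than the one before it, which is the spectrum the diagram describes. In practice you would choose the level of coarsening from a disclosure risk assessment, not from fixed rules like these.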

Why it is important to protect data confidentiality

New Zealand businesses, institutions, and organisations rely on high-quality, timely, and accurate data for planning, research, and information. Good data helps New Zealand grow and prosper.

The New Zealand Data and Information Management Principles mandate that government data and information should be open, readily available, well managed, reasonably priced, and reusable unless there are necessary reasons for its protection. These principles include:

"Open: Data and information held by government should be open for public access unless grounds for refusal or limitations exist under the Official Information Act or other government policy. In such cases they should be protected.

"Protected: Personal, confidential and classified data and information are protected."

New Zealand Data and Information Management Principles

Data collection depends on goodwill and trust

Much of the data collected in New Zealand is about individual people, households, businesses, and organisations, including sensitive personal and commercial information. Data gatherers and users depend on the personal and commercial trust and goodwill of the people they collect data from. Maintaining confidentiality is crucial to the New Zealand data system.

Data confidentiality is often a legal requirement

You're often required by law to keep data confidential. If you provide data to an unauthorised user, or provide identifiable data without consent, you may be breaking the law. If the data becomes public, the implications are more serious.

What are the principles, laws, and ethics that govern information confidentiality?

Ways of keeping data confidential are governed by principles, laws, and ethics.

Principles for managing data confidentiality

Principles and legislative requirements underpin the policies, standards, and guidelines for data confidentiality. For example, Stats NZ's microdata output guide describes the methods and rules researchers must use to confidentialise output produced from Stats NZ's microdata. The methods and rules are based on legislative requirements and four principles:

  • Utility – Ensure information outputs are as rich, detailed, and unmodified as possible.
  • Safety – Manage the risk of identifiable information being disclosed, down to the level required by law, ethical obligations, and the preservation of trust.
  • Simplicity – Make rules as simple to apply and check as possible.
  • Consistency – Aim for maximum consistency across outputs released over different channels, and across similar outputs from different sources of data.

Microdata output guide (Stats NZ)

Other sets of principles that are relevant to data confidentiality include:

  • Open Data Charter 2015:
    • guides best practice for making data open
    • adopted by the NZ Government, March 2018
  • New Zealand Data and Information Management Principles 2011:
    • guides the management of data and information that the government holds on behalf of the public
    • agreed to in Cabinet Minute (11) 29/12, August 2011 [PDF 177 KB].

Data confidentiality required by law

Data users must comply with relevant legislation. Legislation with specific requirements about keeping data confidential includes:

  • Privacy Act 1993
  • Official Information Act 1982
  • Statistics Act 1975.

You may also need to comply with other legislative requirements when using specific types of data. For example, the Tax Administration Act 1994 sets out requirements for protecting tax data and the Health Information Privacy Code 1994 sets out rules for collecting, managing, and using health information.

Health Information Privacy Code 1994 (Office of the Privacy Commissioner)

Ethics influence the protection of data

An integral feature of any government data system is that it is underpinned by ethical principles, to ensure responsible data use and prevent harmful outcomes. Respect for people is about recognising the people behind the data and the interests of individuals and groups in how data is used.

Protecting confidentiality of data is an important way of showing respect for people. Whenever you release data you must take extra care with data that is personally or commercially sensitive.

Among the principles in the International Statistical Institute's Declaration on Professional Ethics are that, when statisticians produce statistics, they must guard privileged information, and protect the interests of individuals and organisations.

International Statistical Institute's Declaration on Professional Ethics

Government agencies and other producers of official statistics are also guided by the United Nations Fundamental Principles for Official Statistics:

"it is the utmost concern of official statistics, to secure the privacy of data providers (like households or enterprises) by assuring that no data is published that might be related to an identifiable person or business."

United Nations Fundamental Principles for Official Statistics [PDF 1 MB]

Protecting personal identifying information and preserving security of any output is emphasised in the Principles for safe and effective use of data and analytics developed by the Government Chief Data Steward and the Privacy Commissioner.

Principles for safe and effective use of data and analytics [PDF 603 KB]

Other ethical guidelines will be relevant for specific types of research. For example, the National Ethics Advisory Committee's Ethical Guidelines for Observational Studies covers research using health data.

National Ethics Advisory Committee's Ethical Guidelines for Observational Studies

What are the methods used to keep information confidential?

It is essential to use confidentiality methods to protect individually identifiable information in microdata. You may also need to use them to protect larger datasets and data outputs.

Whenever we release data, whether to the public, a researcher, or any other kind of data user, we must make sure its confidentiality is appropriately protected.

We protect confidentiality by ensuring that details about individual people, households, businesses, or organisations are not identifiable, and cannot be deduced. Details must not be identifiable in the raw data, published statistics, or the data output users create.

Often you can release individually identifiable details, where you need to, provided you have received written authorisation from the individual to do so.

Use statistical methods to protect microdata and larger sets of data

Unit record data and summary data (called microdata) is especially likely to be identifiable, as it consists of records of individual people, households, businesses, or organisations.

Statistical data that will be published needs to be organised in a way that prevents any individual details from being identified.

Use these statistical methods for protecting the confidentiality of information

To protect the confidentiality of microdata — and where necessary, larger datasets — you can use one or more of these statistical methods:

  • Perturbation – adding random noise to data outputs.
  • Aggregation – combining and/or simplifying information outputs.
  • Suppression – not reporting some data outputs.
  • Limiting information access – putting conditions and/or limits on access to data.
  • Synthesizing synthetic unit record files (SURFs) and confidential unit record files (CURFs) for general publication. These use a combination of perturbation, aggregation, suppression, and modelling until the data is confidential, but is also still a sufficiently accurate summary of the data to fulfil information users' needs.

Review your confidentiality methods regularly

Review your confidentiality business rules, methods, and processes regularly – at least every three to five years. You need to ensure that new technology, or the public availability of additional data, has not increased the risk of disclosure. Introduce new measures for protecting confidentiality if you need to.

What to do if there is a data breach

Even with protection in place, there is always a risk of disclosing identifiable data. A data breach or disclosure breach happens when data is released that identifies a person, household, business, or organisation.

You must acknowledge that there is always a risk of a data breach happening. The Office of the Privacy Commissioner's Data Safety Toolkit has guidelines on remedying, managing, and mitigating data breaches.

Data Safety Toolkit (Office of the Privacy Commissioner)

How can we use perturbation to protect confidentiality?

Perturbation – adding random noise to data – is a widely used data confidentiality method. Perturbation works by adding a random value to the data, to mask the data. This is called adding 'random noise'.

Perturbation is a best-practice method. It is used by Stats NZ and by many international statistical agencies, including the United States Census Bureau and the Australian Bureau of Statistics.

Use a coordinated approach to count and magnitude tables

A count measures the number of individuals whose confidentiality is being protected.

A count magnitude (or value magnitude) measures a sum of counts (or sum of values) relating to the individual data you are protecting.

For example:

  • the human population in an area is a count
  • how many television sets a population owns is a count magnitude
  • how much they earn is a value magnitude.

Also:

  • the number of businesses in an area is a count
  • how many employees they employ is a count magnitude
  • how much profit they make is a value magnitude.

Stats NZ has developed a method which perturbs both count and magnitude tables: the Noised Counts and Magnitudes (NCM) method. NCM is part of Stats NZ's development of an Automated Confidentiality Service (ACS). The ACS includes software, applications, and expertise to help users automatically apply confidentiality methods and produce consistent results.

In the NCM method, each individual data record is assigned a uniformly distributed random number. These random numbers are fixed across time, to ensure the same degree of perturbation is applied to the individual over time.
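One way to obtain a random number that stays fixed for each record is to derive it from the record's identifier with a keyed hash. The sketch below is an assumption for illustration: the source does not say how NCM assigns its numbers, and `fixed_uniform`, the record identifiers, and the key are hypothetical.

```python
import hashlib
import hmac


def fixed_uniform(record_id: str, secret_key: bytes) -> float:
    """Map a record identifier to a repeatable number in [0, 1).

    The same identifier and key always give the same number, so the
    perturbation applied to a record does not change between releases.
    """
    digest = hmac.new(secret_key, record_id.encode("utf-8"), hashlib.sha256).digest()
    # Keep the top 53 bits so the result is an exact float strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


key = b"demo-key-kept-secret"  # hypothetical key; a real key must stay confidential
print(fixed_uniform("unit-001", key))
print(fixed_uniform("unit-001", key))  # identical to the line above
```

Storing a randomly drawn number alongside each record would work equally well. The keyed hash only avoids having to store it.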

How to perturb counts

For count tables, the units' random numbers are used to generate a new random number for the units grouped together in a cell. This is the basis for fixed random rounding to base 3 (FRR3). It ensures the same group of individuals will always be rounded the same way in related tables.

In FRR3, you randomly round counts to base 3.

  • Counts that are already multiples of three are left unchanged.
  • Counts that are not a multiple of three are rounded to the nearest multiple of three two-thirds of the time, or to the next nearest multiple of three one-third of the time.

For example, a four will be rounded to either a three (2/3 likelihood) or a six (1/3 likelihood). This is to disguise small counts. But since all table data are rounded consistently, they are protected against both:

  • differencing attacks, where closely related results might be subtracted from each other to discover underlying small counts
  • Monte Carlo attacks, where attacks are run again and again, to find the underlying raw numbers based on the distributions of results.
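The rounding rule can be sketched as follows. The `frr3` function follows the two-thirds and one-third rule described above. The `cell_uniform` helper, which combines the units' fixed random numbers by taking the fractional part of their sum, is an assumption for illustration: the source does not specify how the cell's random number is generated.

```python
import math


def cell_uniform(unit_uniforms: list[float]) -> float:
    """Combine the units' fixed random numbers into one number in [0, 1).

    Assumption: the fractional part of the sum. The same group of units
    always yields the same cell number.
    """
    return math.fsum(unit_uniforms) % 1.0


def frr3(count: int, u: float) -> int:
    """Round a count to base 3 using a fixed number u in [0, 1)."""
    remainder = count % 3
    if remainder == 0:
        return count  # multiples of three are left unchanged
    lower = count - remainder
    upper = lower + 3
    # A remainder of 1 is closer to the lower multiple, a remainder of 2 to the upper.
    nearest, other = (lower, upper) if remainder == 1 else (upper, lower)
    return nearest if u < 2 / 3 else other


units = [0.25, 0.5, 0.75, 0.125]  # hypothetical fixed numbers for four units
print(frr3(len(units), cell_uniform(units)))
```

Because the cell's number depends only on which units are in the cell, re-running the table gives the same rounded value. That repeatability is what makes averaging over repeated requests unproductive.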

Random rounding

Diagram showing random rounding - full description in text below.

Random rounding [PDF 899 KB]

What does this method do?

We can protect data in count tables by random rounding to base 3 (RR3). The counts are randomly rounded to base 3 in a consistent way. This is to disguise small counts, but all cells in the table are randomly rounded. The effect is to make the output more confidential, by generally preventing individuals' data from being released.

How does this affect the final data?

For small numbers, where there is the most risk that individuals could be identified, there are larger percentage changes compared with larger numbers. For instance, a cell with a 1 changed to a 3 has been changed by 200 percent, but a cell with 1,001 changed to 1,002 has been changed by only 0.1 percent. When analysing data, small counts need to be treated with caution, but for the larger values the percentage changes in these cells do not cause a problem.
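The two percentages quoted above can be checked with a few lines of arithmetic. The `percent_change` helper is a hypothetical name used only for this check.

```python
def percent_change(original: int, rounded: int) -> float:
    """Size of the change introduced by rounding, as a percentage of the original."""
    return abs(rounded - original) / original * 100


print(percent_change(1, 3))                 # a count of 1 published as 3
print(round(percent_change(1001, 1002), 1))  # a count of 1,001 published as 1,002
```

The absolute change from rounding to base 3 is never more than 2, so its relative size shrinks as the count grows.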

Perturbation of magnitudes

Use an n% 'noise multiplier' to generate magnitude tables.

The noise protects sensitive data where there is a disclosure risk but cancels itself out in larger collections of data.

Individual values are protected by at least +/- n% for the most vulnerable data.
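A noise multiplier can be sketched as below. The source does not give the formula, so everything here is an assumption made for illustration: the multiplier moves each unit's value by at least n% up or down, with the direction and the extra amount taken from the unit's fixed random number. The names `noise_multiplier`, `noised_total`, and `spread_percent` are hypothetical.

```python
def noise_multiplier(u: float, n_percent: float, spread_percent: float = 5.0) -> float:
    """Turn a unit's fixed number u in [0, 1) into a multiplier.

    The result is always at least n_percent away from 1, in either direction.
    """
    direction = 1.0 if u < 0.5 else -1.0
    # Reuse u to decide how far beyond n_percent the noise goes.
    extra = ((2 * u) % 1.0) * spread_percent
    return 1.0 + direction * (n_percent + extra) / 100.0


def noised_total(values: list[float], uniforms: list[float], n_percent: float) -> float:
    """Sum the values after multiplying each by its own noise multiplier."""
    return sum(v * noise_multiplier(u, n_percent) for v, u in zip(values, uniforms))


# Two units with opposite noise directions: the noise cancels in the total.
print(noise_multiplier(0.25, 10.0), noise_multiplier(0.75, 10.0))
print(noised_total([100.0, 100.0], [0.25, 0.75], 10.0))
```

In a cell dominated by one unit, the published total moves by roughly that unit's noise. In a cell with many units, the upward and downward multipliers tend to offset each other.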

How can we use aggregation to protect confidentiality?

Aggregation involves grouping categories together. You avoid disclosure by combining columns or rows into one new group. You combine or simplify data outputs. This reduces the amount of information available about individuals.

Striking a balance between releasing data and saving labour

In the long run, aggregation is effective for striking a balance between releasing as much data as possible and limiting the work involved in producing tables.

Aggregation is useful when there are many cells with small numbers. By collapsing categories or combining data cells, you remove much of the sensitivity in the table.

You need subject matter knowledge to use this method. You need to know which values in the data are important for your data users, and how values have been aggregated in the past, so you can apply aggregation consistently.

Aggregation lowers the amount of detail in the final output data. You need to ensure that the resulting dataset is still useful for your users.
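Collapsing categories can be sketched as a simple mapping from detailed categories to broader groups. The age bands, counts, and the `collapse` helper below are made up for illustration.

```python
def collapse(counts: dict[str, int], mapping: dict[str, str]) -> dict[str, int]:
    """Sum detailed category counts into the broader groups named in mapping."""
    collapsed: dict[str, int] = {}
    for category, count in counts.items():
        group = mapping[category]  # a KeyError here means the mapping is incomplete
        collapsed[group] = collapsed.get(group, 0) + count
    return collapsed


# Hypothetical table with two small cells that are risky to publish as they are.
detailed = {"15-19": 2, "20-24": 1, "25-29": 14, "30-34": 11}
to_ten_year_bands = {"15-19": "15-24", "20-24": "15-24", "25-29": "25-34", "30-34": "25-34"}
print(collapse(detailed, to_ten_year_bands))
```

Keeping the mapping as data, separate from the code, makes it easier to reuse the same collapsing strategy across releases. Note that the collapsed cell for 15-24 is still small here, so you may need to collapse further or combine aggregation with another method.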

Good data classifications and standards make aggregation easier

To maximise flexibility, code data at the lowest level of the classification possible.

Make sure that your data classifications and standards are relevant to your customers' needs.

Classifications and standards should:

  • have an underlying conceptual basis
  • fit within a statistical framework which is intuitive and easy to understand, navigate, and apply
  • be internationally comparable when you need to compare data across countries
  • be stable and comparable over time (balanced with the need to update classifications from time to time).

Classifications and standards must be unambiguous, exhaustive, and mutually exclusive:

  • unambiguous – observations can be clearly classified into a certain group based on defined classification principles and criteria
  • exhaustive – all cases of the observation data can be classified
  • mutually exclusive – groups are clearly defined so data can't be classified into more than one group.
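The exhaustive and mutually exclusive requirements can be tested mechanically for a numeric classification. This is a minimal sketch with made-up employee size bands, and `classify` is a hypothetical helper.

```python
def classify(value: int, classes: dict[str, range]) -> str:
    """Return the single class containing value, or raise if the classification is flawed."""
    matches = [name for name, span in classes.items() if value in span]
    if not matches:
        raise ValueError(f"not exhaustive: {value} fits no class")
    if len(matches) > 1:
        raise ValueError(f"not mutually exclusive: {value} fits {matches}")
    return matches[0]


# Hypothetical size bands. range(10, 50) covers 10 to 49 inclusive.
size_bands = {"1-9": range(1, 10), "10-49": range(10, 50), "50+": range(50, 10**9)}
print(classify(34, size_bands))
```

Running every observed value through a check like this before publication is one way to catch gaps or overlaps after a classification has been revised.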

Classifications and standards must be systematic and operationally feasible. To achieve this:

  • classify observations consistently using agreed criteria
  • define concepts and variables related to the classification
  • make sure unspecified or residual groups like 'not elsewhere classified' contain few cases. If the size of the residual group grows considerably, you need to revise the classification system
  • to minimise bias in the data, use automated processes and methods, such as coding tools (where practical)
  • ensure classifications are hierarchical, with a primary group level which you break down further into lower classification levels.

Use a common collapsing strategy for aggregations. Give classifications names that reflect both the most detailed and the collapsed levels.

How can we use suppression to protect confidentiality?

When you suppress data, you do not report selected data. Suppression is removing data from an output that reveals individualised information.

Suppress data by not reporting some data outputs

If a data value reveals too much information about a person, household, or business, you can remove the data value from the output by suppressing it. You replace its number value with some other value, such as an empty space, a zero, or a character like 'S' or 'C'. This is primary suppression.

But if you determine a data value is at risk, suppressing only that value is not enough. If you give subtotals or marginal totals, it is still possible to determine the suppressed cell's actual value. You need to suppress other data values as well, to protect the primary data value. Suppressing these other data values, in the same way, is secondary suppression.

You need to suppress other cells, so the value of the cell you first suppressed can't be determined.
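The reason primary suppression alone is not enough can be shown with a short sketch. The row values and the `recover_single_suppression` helper are made up for illustration; `None` marks a suppressed cell.

```python
from typing import Optional


def recover_single_suppression(row: list[Optional[int]], row_total: int) -> Optional[int]:
    """Return the hidden value if exactly one cell in the row is suppressed."""
    hidden = [i for i, cell in enumerate(row) if cell is None]
    if len(hidden) != 1:
        return None  # zero or several unknowns: this total alone does not pin them down
    return row_total - sum(cell for cell in row if cell is not None)


# One suppressed cell plus a published total gives the value away.
print(recover_single_suppression([12, None, 30], row_total=44))
# With a second suppression in the row, the same subtraction no longer works.
print(recover_single_suppression([None, None, 30], row_total=44))
```

The same subtraction can be applied to column totals, which is why the secondary suppressions have to protect the cell in every direction.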

To suppress the fewest cells possible, complete a square of suppressions: $2^N$ total suppressions for an N-dimensional table (for example, $2^2 = 4$ total suppressions for a 2-dimensional table).

Using secondary suppression

Secondary suppression is often not an easy task. To do it, you need:

  • your criteria for performing secondary suppression (for example, minimising data loss, or sticking with previous cell suppression – refer to the criteria listed below)
  • methods for identifying the best suppression pattern.

Use these criteria to decide how to apply secondary suppression.

Historical criteria

It is important to keep track of your publication history of primary and secondary suppressions, and to take care not to disclose data where you change which cells are suppressed over time. Changing previous cell suppression patterns might cause either:

  • a disclosure for the normally suppressed cell
  • a problem for another cell that is now suppressed.

Cost function criteria

You might want to:

  • suppress cells that have small values (but do not suppress cells containing zeros)
  • suppress a minimum number of cells
  • minimise the number of values you suppress.

Availability criteria

You might need to publish certain cells for statistical information reasons, so you cannot use them for secondary cell suppression; this might give you issues finding enough, or appropriate, cells to suppress.

To test if a suppression pattern is effective enough, make sure that in each row or column you suppress, there are at least two suppressions. For a 2-way table, each suppression should be the corner of a square or rectangle of suppressions.
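The 'at least two per row or column' test can be sketched for a 2-way table as below. The `has_two_per_line` helper and the example patterns are hypothetical, and the check covers only the rule stated above. It is not a full disclosure audit, which tools such as those named in the next section perform.

```python
def has_two_per_line(suppressed: list[list[bool]]) -> bool:
    """Check that every row and column containing a suppression has at least two."""
    row_counts = [sum(row) for row in suppressed]
    col_counts = [sum(col) for col in zip(*suppressed)]
    return all(n == 0 or n >= 2 for n in row_counts + col_counts)


# True marks a suppressed cell. The first pattern is a complete 2 x 2 square.
square = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]
lone_cell = [
    [True, False],
    [False, False],
]
print(has_two_per_line(square), has_two_per_line(lone_cell))
```

A pattern can pass this test and still allow a narrow range for a suppressed value to be inferred, so treat a pass as a minimum requirement.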

Use an automated tool for suppression

Primary and secondary suppression can be a time-consuming manual process. Some automated tools to help include Tau-ARGUS, G-Confid, and sdcTable.

How can we limit access to data to protect confidentiality?

Unit record datasets that contain information about specific people, households, and organisations (microdata) are most likely to reveal identifiable information. Protect confidentiality by imposing strict limitations on access to it.

Put conditions around the access to sensitive data

Only grant access to microdata to researchers who state the statistical purposes for wanting access.

Where you approve access, consider drawing up a legally binding contract to control access to the data.

Follow Stats NZ's best practice principles for data access

Stats NZ assesses research proposals to access microdata using the following principles:

  • access to microdata must be for statistical purposes and/or bona fide research purposes
  • access to microdata must be consistent with relevant legislation
  • access to microdata is at the discretion of the data custodian
  • access to microdata must protect respondents' confidentiality
  • access to microdata must not adversely affect data collection
  • decisions on requests for access to microdata will be provided through transparent processes.

Sometimes, negotiations for researcher access involve multiple data custodians. Each custodian should consider and grant access individually.

When you consider granting access to data, also consider the Privacy Act. The Act governs the use of data beyond the purpose for which it was originally collected.

In some situations, you may need to consult the Privacy Commissioner. For example, you may have a case where a legal provision parallels or constrains the relevant legislation. Or the privacy implications of the research may not be clear.

The 'five safes' framework

At Stats NZ, microdata researchers operate within the 'five safes' framework. We only grant access to microdata if all the following conditions are met:

  • safe people – researchers can be trusted to use the data appropriately and follow procedures
  • safe projects – the project has a statistical purpose and is in the public interest
  • safe settings – security arrangements prevent unauthorised access to the data
  • safe data – identifiers are removed before data is made available
  • safe output – the statistical results produced do not contain any results that disclose individually identifiable information.

How Stats NZ keeps IDI and LBD data safe: the five safes framework (Stats NZ)

The microdata output guide goes into more detail

The microdata output guide is Stats NZ's best-practice guide for ensuring confidentiality in outputs from microdata. It covers how to use the statistical methods in greater detail, with examples.

Microdata output guide (Stats NZ)

How can we build synthetic and confidential unit record files to support the general publication of microdata?

You can publish open microdata once its confidentiality is protected. You use statistical methods to prepare synthetic unit record files (SURFs) and confidential unit record files (CURFs) that are suitable for general publication.

You use the methods of perturbation, aggregation, and suppression to process microdata so individual people, households, businesses, and organisations cannot be identified.

Overseas precedents for publishing open microdata

CURFs are published overseas, for example, the Integrated Public Use Microdata Series (IPUMS) published by the US Census Data for Social, Economic and Health Research. Open government initiatives, rather than national statistics organisations, have pioneered the release of CURFs.

Building confidential unit record files (CURFs) for general publication as open microdata

You build CURFs by perturbing, aggregating, and suppressing microdata until the data no longer discloses identifiable information about individuals, but is still an accurate enough summary of the data to meet customers' needs.

The role of synthetic data in CURFs

When you create CURFs, you may confidentialise data by replacing the real data with data you have processed or modelled. Lightly confidentialised CURFs are called partly synthetic data. Heavily confidentialised CURFs are called fully synthetic data, or SURFs.

Creating CURFs and SURFs requires expertise and resources

Creating CURFs and SURFs is challenging and requires technical expertise. Research continues into how to automate the work. Current techniques can quantify the confidentiality importance of each variable and mitigate the risk for each variable. You can use k-anonymity testing and Special Unique Detection Algorithms (SUDA) within automated tools like sdcMicro.
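A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below computes k directly in plain Python. It is independent of sdcMicro and does not reproduce that tool's interface, and the records and the `k_anonymity` helper are made up for illustration.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical confidentialised records.
records = [
    {"gender": "Female", "age_band": "30-40", "city": "Wellington"},
    {"gender": "Female", "age_band": "30-40", "city": "Wellington"},
    {"gender": "Male", "age_band": "30-40", "city": "Wellington"},
]
print(k_anonymity(records, ["gender", "age_band", "city"]))
```

A result of 1 means at least one record is unique on those variables and needs further aggregation, suppression, or perturbation before release.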

The trade-off between confidentiality and utility

Often, the more heavily you confidentialise a record, the less useful it is to your customers or end users. You need to strike a balance between confidentiality and usefulness.

If you cannot ensure data is confidential, you may need to withhold it.

References

Future of Privacy Forum (2016) – A visual guide to practical data de-identification

data.govt.nz (2011) – New Zealand Data and Information Management Principles

International Statistical Institute (2010) – Declaration on professional ethics

National Ethics Advisory Committee (2012) – Streamlined ethical guidelines for health and disability research

OECD (2007) – OECD Glossary of Statistical Terms

Open Data Charter (2015) – International Open Data Charter Principles

Simson Garfinkel, National Institute of Standards and Technology (NIST) (2015) – De-identification of personal information (NISTIR 8053)

Privacy Commissioner, Stats NZ (2018) – Principles for the safe and effective use of data and analytics [PDF 603 KB]

Stats NZ (2007) – Principles and protocols for producers of Tier 1 statistics

Stats NZ (2015) – Privacy, security, and confidentiality of data supplied to Statistics NZ

Stats NZ (2016a) – Microdata output guide (Fourth edition)

Stats NZ (2016b) – Introducing new method for confidentialising business demography tables

Stats NZ (2017b) – Information, Privacy, Security and Confidentiality Policy

Stats NZ (2017c) – How we keep IDI and LBD data safe: The 5 safes

Stats NZ (2018c) – Microdata access protocols

United Nations Statistics Division (2015) – UN Fundamental Principles of Official Statistics: Implementation guidelines [PDF 1 MB]

Contact us

If you'd like more information, have a question, or want to provide feedback, e-mail datalead@stats.govt.nz.

Content last reviewed 20 August 2020.

Source: https://www.data.govt.nz/toolkit/privacy-and-security/understanding-data-confidentiality/data-confidentiality-principles-and-methods-report/
