Data confidentiality principles and methods report

It is important to understand and apply confidentiality principles, rules, and methods to make sure that you:

Using statistical methods correctly protects the confidentiality of data. Methods such as perturbation, aggregation, suppression, limiting access, and building synthetic or confidential unit record files keep data confidential. When data is confidential, no individuals, households, or businesses can be identified, and no unauthorised people can access the data.
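As an illustration of two of the methods named above, the following minimal Python sketch aggregates unit records into a table of counts and then suppresses any cell with a small count. The records, the field layout, and the threshold of 3 are assumptions made for this example only; they are not rules taken from any agency's guidance.

```python
from collections import Counter

# Hypothetical unit records as (region, industry) pairs. Not real data.
records = [
    ("Wellington", "Manufacturing"),
    ("Wellington", "Manufacturing"),
    ("Wellington", "Manufacturing"),
    ("Wellington", "Retail"),
    ("Auckland", "Manufacturing"),
    ("Auckland", "Manufacturing"),
    ("Auckland", "Retail"),
    ("Auckland", "Retail"),
    ("Auckland", "Retail"),
    ("Auckland", "Retail"),
]

MIN_CELL_COUNT = 3  # assumed threshold, for illustration only


def aggregate_and_suppress(rows, min_count=MIN_CELL_COUNT):
    """Count records per cell, then replace counts below min_count with None."""
    counts = Counter(rows)  # aggregation: unit records become cell counts
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}


table = aggregate_and_suppress(records)
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

In this sketch the cells with one and two records are suppressed, while the cells with three and four records are published as counts.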

Why do we have to protect data confidentiality?

Organisations differ in when they must, or wish to, protect the privacy, security, and confidentiality of data so that people, households, and organisations can’t be identified without their permission. That protection applies throughout the data life cycle: whenever we collect, use, store, and distribute data.

What privacy, security, and confidentiality mean

The terms privacy, security, and confidentiality are often used interchangeably, but each term has a different meaning:

Degrees of identification in data

Diagram: degrees of identification in data. A full description follows in the text below.

What do statisticians and data analysts mean when they talk about confidentiality? How does identifiable data differ from de-identified or confidentialised information? Data identifiability is not binary. Data lies on a spectrum with multiple shades of identifiability. This is a primer on how to distinguish different categories of data in the NZ context.

Identifiable data

Data that directly or indirectly identifies an individual or business.

Data that identifies a person without additional information, or by linking to information in the public domain. This includes cases where an individual can be identified by connecting separate pieces of information.

Personal, identifiable data like this is protected, and should only be released to the public if we have explicit permission to do so.

For example: name, date of birth, gender.

Examples

Individual

Gender: Female.

DOB: 31/01/1983.

Address: 28 My Road.

Business

Name: Puzzles.

Type: Paper Stationery.

Employees: 34.

Expenditure: $398,000.

De-identified data

De-identified: Data which has had information removed from it to reduce the risk of spontaneous recognition (the likelihood of identifying a person, place, or organisation without any effort).

For example: Data held within Stats NZ’s Integrated Data Infrastructure (IDI) and Longitudinal Business Database (LBD) is de-identified before approved researchers can access it in a secure data lab environment.
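The idea of removing information from a record can be sketched in a few lines of Python. This is a minimal illustration only: the field names, the sample record, and the choice of which fields count as direct identifiers are all assumptions for the example, and it does not describe how the IDI or LBD are actually prepared.

```python
# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "street_address"}


def de_identify(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Hypothetical record. Not real data.
person = {
    "name": "Jane Example",
    "street_address": "28 My Road",
    "postcode": "6012",
    "gender": "Female",
}

de_identified = de_identify(person)
print(de_identified)  # {'postcode': '6012', 'gender': 'Female'}
```

The original record is left unchanged. Note that the remaining fields may still allow identification when combined, which is why further methods are applied before data is released more widely.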

Partially confidentialised: Data which has been modified to protect the confidentiality of respondents while also maintaining the integrity of data. Modification involves applying methods such as top-coding, data swapping, and collapsing categorical variables to the unit records.
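Two of the methods named above, top-coding and collapsing categorical variables, can be sketched as follows. The cap of 500 employees, the category mapping, and the record layout are assumptions made for illustration; data swapping is not shown.

```python
EMPLOYEE_CAP = 500  # assumed top-code value, for illustration only

# Hypothetical mapping from detailed business types to broader categories.
BROAD_TYPE = {
    "Paper Stationery": "Manufacturing",
    "Furniture": "Manufacturing",
    "Cafe": "Hospitality",
}


def top_code(value, cap=EMPLOYEE_CAP):
    """Top-coding: report any value above the cap as the cap itself."""
    return min(value, cap)


def collapse_category(value, mapping=BROAD_TYPE, default="Other"):
    """Collapsing: replace a detailed category with a broader one."""
    return mapping.get(value, default)


def partially_confidentialise(record):
    """Apply both methods to one unit record and return a new record."""
    return {
        "type": collapse_category(record["type"]),
        "employees": top_code(record["employees"]),
    }


small = partially_confidentialise({"type": "Paper Stationery", "employees": 34})
large = partially_confidentialise({"type": "Shipbuilding", "employees": 1200})
print(small)  # {'type': 'Manufacturing', 'employees': 34}
print(large)  # {'type': 'Other', 'employees': 500}
```

Values below the cap pass through unchanged, so most of the data keeps its integrity while unusually large, and therefore recognisable, values are masked.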

Examples

Individual

Name: Unknown.

Gender: Female.

Address: Postcode 6012.

Business

Name: Unknown.

Type: Manufacturing.

Employees: 30-40.

Expenditure: $398,000.

Confidentialised data

Data which has had statistical methods applied to it to protect against the unauthorised disclosure of information.

Statistical methods include suppression, aggregation, perturbation, data swapping, top and bottom coding, etc. These prevent the unauthorised identification of individuals, households, or organisations. This data is publicly available.

For example: Stats NZ nz.stat datasets.

Examples

Individual

Name: Unknown.

Gender: Female.

DOB: 30-40 years.

Address: Wellington.

Business

Name: Unknown.

Type: Manufacturing.

Employees: 10-100.

Expenditure: Under $500,000.
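The business example above replaces exact figures with wide bands. A minimal sketch of that kind of banding is shown below; the band boundaries are assumptions chosen to match the example values and are not an official classification.

```python
def band_employees(count):
    """Place an employee count into a wide band (assumed boundaries)."""
    if count < 10:
        return "Under 10"
    if count <= 100:
        return "10-100"
    return "Over 100"


def band_expenditure(dollars):
    """Place an expenditure figure into a wide band (assumed boundaries)."""
    if dollars < 500_000:
        return "Under $500,000"
    return "$500,000 or more"


# Figures from the identifiable business example earlier in this section.
print(band_employees(34))         # 10-100
print(band_expenditure(398_000))  # Under $500,000
```

Wider bands place more businesses in each group, which makes any single business harder to pick out, at the cost of less detail for the data user.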

Why it is important to protect data confidentiality

New Zealand businesses, institutions, and organisations rely on high-quality, timely, and accurate data for planning, research, and information. Good data helps New Zealand grow and prosper.

The New Zealand Data and Information Management Principles mandate that government data and information should be open, readily available, well managed, reasonably priced and reusable unless there are necessary reasons for its protection. These principles include:

“Open: Data and information held by government should be open for public access unless grounds for refusal or limitations exist under the Official Information Act or other government policy. In such cases they should be protected.

“Protected: Personal, confidential and classified data and information are protected.”

Data collection depends on goodwill and trust

Much of the data collected in New Zealand is about individual people, households, businesses, and organisations — including sensitive personal and commercial data. Data gatherers and users depend on the personal and commercial trust and goodwill of the people they collect data from. Maintaining confidentiality is crucial to the New Zealand data system.

Data confidentiality is often a legal requirement

You’re often required by law to keep data confidential. If you provide data to an unauthorised user, or provide identifiable information without consent, you may be breaking the law. If the information becomes public, the implications are more serious.

What are the principles, laws, and ethics that govern data confidentiality?

Ways of keeping data confidential are governed by principles, laws, and ethics.

Principles for managing data confidentiality

Principles and legislative requirements underpin the policies, standards, and guidelines for data confidentiality. For example, Stats NZ’s microdata output guide describes the methods and rules researchers must use to confidentialise output produced from Stats NZ’s microdata. The methods and rules are based on legislative requirements and four principles:

Other sets of principles that are relevant to data confidentiality include: