Glossary

We often receive questions about technical terms related to gender data. This glossary compiles definitions for the terms we are most frequently asked about, drawing on a variety of sources, including the UN Statistics Division (UNSD), the OECD and the World Bank.

Background Terminology

Statistics

The collection, organization, analysis and interpretation of numerical data. For example, the average age of girls in a particular school grade based on a sample of girls attending school.

Data

Factual information from which statistics are created.
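To make the distinction between data and statistics concrete, here is a minimal Python sketch using made-up ages for a hypothetical sample of girls in one school grade. The list of recorded ages is the data; the average and median computed from it are statistics.

```python
from statistics import mean, median

# Data: hypothetical ages (in years) recorded for a sample of girls in one grade.
# These values are invented for illustration only.
sample_ages = [11, 12, 11, 13, 12, 12, 11, 12]

# Statistics: summary values computed from the data.
average_age = mean(sample_ages)
median_age = median(sample_ages)

print(f"Sample size: {len(sample_ages)}")
print(f"Average age: {average_age:.2f} years")
print(f"Median age: {median_age} years")
```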

Indicator

The representation of statistical data for a specified time, place or other relevant characteristic. An indicator measures or gauges events captured in data and includes the scale on which they are measured (e.g., number, percentage or ratio). Indicators allow for meaningful comparisons because they can capture positive or negative change. For example, the proportion of men and women with access to a financial account is an indicator that allows for comparisons between men and women, and between the time periods in which the indicator was produced. Indicators may be drawn directly from a single survey question or constructed from multiple survey questions (composite indicator).
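As a rough illustration, the sketch below computes an account-ownership indicator from made-up survey counts for two hypothetical survey years. The figures, the `survey_counts` table and the `account_ownership_pct` function are invented for this example; the point is that the same indicator, calculated the same way each time, supports comparisons both between women and men and across time.

```python
# Hypothetical survey counts (not real figures): number of respondents and
# number reporting a financial account, by survey year and sex.
survey_counts = {
    (2017, "women"): {"respondents": 500, "with_account": 260},
    (2017, "men"): {"respondents": 480, "with_account": 312},
    (2021, "women"): {"respondents": 510, "with_account": 306},
    (2021, "men"): {"respondents": 490, "with_account": 333},
}


def account_ownership_pct(year: int, sex: str) -> float:
    """Indicator: percentage of respondents of a given sex who have an account."""
    counts = survey_counts[(year, sex)]
    return 100 * counts["with_account"] / counts["respondents"]


for year in (2017, 2021):
    women = account_ownership_pct(year, "women")
    men = account_ownership_pct(year, "men")
    # Gender gap in percentage points: a comparison between men and women.
    print(f"{year}: women {women:.1f}%, men {men:.1f}%, gap {men - women:.1f} pp")
```

In this toy data, women's account ownership rises from 52.0% to 60.0% between the two years, and the gap with men narrows from 13.0 to about 8.0 percentage points.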

Index

A compilation of individual indicators into a single measure; well suited to measuring multidimensional concepts. For example, an index of women’s empowerment across the globe.
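One common way to build an index is to rescale each indicator to a common range and then average the rescaled values. The sketch below assumes min-max scaling and equal weights, with three hypothetical indicators (`account_pct`, `literacy_pct`, `seats_pct`) and invented values for three hypothetical countries; it is not the method of any particular published index.

```python
# Hypothetical indicator values for three countries (not real data).
# Each indicator is oriented so that a higher value means more empowerment.
indicators = {
    "Country A": {"account_pct": 60.0, "literacy_pct": 90.0, "seats_pct": 20.0},
    "Country B": {"account_pct": 40.0, "literacy_pct": 70.0, "seats_pct": 35.0},
    "Country C": {"account_pct": 80.0, "literacy_pct": 95.0, "seats_pct": 10.0},
}


def min_max_normalize(values: dict[str, float]) -> dict[str, float]:
    """Rescale values to the 0-1 range so indicators on different scales can be combined."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in values.items()}


def equal_weight_index(data: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average the normalized indicators for each country, giving each indicator equal weight."""
    names = next(iter(data.values())).keys()
    normalized = {
        name: min_max_normalize({country: row[name] for country, row in data.items()})
        for name in names
    }
    return {
        country: sum(normalized[name][country] for name in names) / len(names)
        for country in data
    }


for country, score in equal_weight_index(indicators).items():
    print(f"{country}: {score:.2f}")
```

Min-max scaling and equal weights are only one design choice; other indices use different normalization methods, weights or aggregation formulas (for example, a geometric rather than an arithmetic mean), and the choice can change country rankings.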

Sex-disaggregated data

Data collected and tabulated separately for men and women. For example, primary school attendance rates for boys vs. girls.

Gender statistics

Statistics characterized by all of the following: a) data are collected and presented by sex as a primary and overall classification; b) data reflect gender issues; c) data are based on concepts and definitions that adequately reflect the diversity of women and men and capture all aspects of their lives; and d) data collection methods take into account stereotypes and social and cultural factors that may induce gender bias in the data. Refer to the UN Statistics Division definition for more information.

National statistical system

The ensemble of statistical organizations and units within a country that jointly collect, process, analyze and disseminate official statistics on behalf of the national government.

National statistical office

The leading statistical agency within a national statistical system.

Official statistics

Statistics produced by government agencies, such as the national statistical office, or by other public bodies. For example, census data.

Unofficial (or nonofficial) statistics

Unofficial statistics are those produced by actors other than the government, such as civil society organizations.

Quality gender data

Data that is reliable, valid and representative, free of gender bias, has good coverage (including coverage of many countries and regular production within each country), and is comparable across countries in terms of concepts, definitions and measures. Quality data should also have the features of complexity (meaning that data from different domains of women’s lives can be cross-referenced and cross-tabulated) and granularity (meaning that the data can be disaggregated into smaller units by race and ethnicity, age and geographic location, as well as by sex).

Microdata sets

Data on the characteristics of each unit of a population, such as individuals, households or establishments, collected by a census, survey or other questionnaire. For example, when respondents answer survey questions, each answer is recorded as a value for that respondent. These data provide the basis from which statistics and indicators can be calculated.

Metadata

Information about statistical data. This may include information on how the data was collected or generated, sampling procedures, example questionnaires, and any processing done to the original data such as the construction of new indicators.

Data Characteristics

Validity

Validity refers to the extent to which the data measures what it claims to measure. Internal validity refers to whether there is a causal relationship between the phenomenon being studied and the factors that we think cause it. For example, a study may seek to establish a causal relationship between women’s fertility and women’s level of education. The study is internally valid if the researcher can control for the effects of other possible factors, such as access to contraception. External validity refers to whether the results of a study can be generalized to other settings (ecological validity), other people (population validity) and other times (historical validity). For example, if the fertility-education study is conducted in a particular country in a particular year, an externally valid study would be expected to yield similar results in a different country, with a different group of respondents, a decade later.

Reliability

Data is considered reliable if it produces consistent results when the measurement is repeated under the same conditions. For example, a survey question intended to measure a woman’s highest level of education should record the same level each time she is asked, as long as her educational attainment has not changed.
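A simple way to gauge this kind of consistency is test-retest agreement: ask the same question of the same respondents twice and count how often the answers match. The sketch below uses invented responses for five hypothetical women and assumes their attainment did not change between rounds; `percent_agreement` is a name chosen for this example.

```python
# Hypothetical responses: highest education level reported by the same five
# women in two survey rounds (invented data; attainment assumed unchanged).
round_1 = ["primary", "secondary", "tertiary", "none", "secondary"]
round_2 = ["primary", "secondary", "secondary", "none", "secondary"]


def percent_agreement(first: list[str], second: list[str]) -> float:
    """Share (%) of respondents who gave the same answer in both rounds."""
    if len(first) != len(second):
        raise ValueError("Both rounds must cover the same respondents")
    matches = sum(a == b for a, b in zip(first, second))
    return 100 * matches / len(first)


print(f"Test-retest agreement: {percent_agreement(round_1, round_2):.0f}%")
```

Simple percent agreement does not account for answers that match by chance; measures such as Cohen's kappa adjust for this.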

Granularity

Granularity refers to the level of detail in a particular data set. For example, the more groupings a data set can be sub-divided by, such as sex, geographic region, income level, education level or disability status, the more granular it is.
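The sketch below shows how granularity works in practice: the same hypothetical microdata (one invented record per child) is tabulated first by sex alone, which is simple sex-disaggregation, and then by sex and region. The field names and the `attendance_rate_by` function are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical microdata: one record per child (invented values for illustration).
records = [
    {"sex": "girl", "region": "urban", "attends_school": True},
    {"sex": "girl", "region": "rural", "attends_school": False},
    {"sex": "girl", "region": "rural", "attends_school": True},
    {"sex": "boy", "region": "urban", "attends_school": False},
    {"sex": "boy", "region": "rural", "attends_school": True},
    {"sex": "boy", "region": "rural", "attends_school": True},
]


def attendance_rate_by(rows: list[dict], keys: list[str]) -> dict[tuple, float]:
    """Attendance rate (%) for each combination of values of the grouping keys."""
    totals: dict[tuple, int] = defaultdict(int)
    attending: dict[tuple, int] = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        attending[group] += int(row["attends_school"])
    return {group: 100 * attending[group] / totals[group] for group in totals}


# Coarser: disaggregated by sex only.
print(attendance_rate_by(records, ["sex"]))
# Finer: disaggregated by sex and region.
print(attendance_rate_by(records, ["sex", "region"]))
```

In this toy data, girls and boys have the same overall attendance rate (about 66.7%), but the finer breakdown shows different patterns by region that the coarser table hides. With real data, very fine groupings can contain too few observations to give reliable estimates.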

Data Types and Sources

Cross-sectional studies

Cross-sectional studies collect data from a sample of people at one point in time, providing a snapshot of the population. When such a study is repeated, each round interviews a different sample of people (in contrast to longitudinal studies, which follow the same people over time), so comparing rounds can track change at the societal level but not at the individual level.

Longitudinal data

Data drawn from the same sample of people over time. Studies collecting longitudinal data happen at several points in time and can track change at the individual level. For example, interviewing the same members of households every 5 years to collect information on their labor force participation, health, education etc.
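The difference between individual-level and societal-level change can be sketched with a tiny hypothetical panel: labor force participation for the same four household members (invented IDs and values) in two waves five years apart.

```python
# Hypothetical panel: labor force participation (True = in the labor force)
# for the same household members, interviewed in two waves (invented data).
wave_1 = {"p01": False, "p02": True, "p03": True, "p04": False}
wave_2 = {"p01": True, "p02": True, "p03": False, "p04": False}

# Because the same people appear in both waves, change can be measured per person.
entered = [pid for pid in wave_1 if not wave_1[pid] and wave_2[pid]]
exited = [pid for pid in wave_1 if wave_1[pid] and not wave_2[pid]]

# Comparing only the overall rate in each wave mirrors what two
# cross-sectional snapshots would show.
rate_1 = 100 * sum(wave_1.values()) / len(wave_1)
rate_2 = 100 * sum(wave_2.values()) / len(wave_2)

print(f"Participation rate: {rate_1:.0f}% -> {rate_2:.0f}%")
print(f"Entered labor force: {entered}; left labor force: {exited}")
```

Both waves show a 50% participation rate, so comparing the two snapshots alone would suggest no change, while the panel reveals that one person entered and another left the labor force.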

Citizen-generated data

Data that people or their organizations produce to directly monitor, demand or drive change on issues that affect them. Citizen-generated data is actively given by citizens, providing direct representations of their perspectives and an alternative to datasets collected by governments or international institutions. For example, citizens may generate data on air quality by actively reporting the readings of their in-home air quality monitors.

Big data

See “Digital data”.

Digital data

An umbrella term referring to the large amounts of data continually generated as a by-product of everyday interactions with digital products or services; often characterized by great volume, variety, lack of structure and high velocity. For example, data is captured from our interactions with mobile phones, online services (such as shopping, banking and search) and social media, and can be used to generate information on mobility, sentiments and attitudes, well-being and other areas. Geospatial data from satellites can also give insight into mobility, infrastructure, and other observable characteristics of landscapes.

Administrative data

The set of units and data derived from an administrative source, collected primarily for administrative rather than research purposes. For example, data collected by a Ministry of Health about clients using health services in a country.

Survey data

Data derived from a statistical survey. Surveys gather information about a larger population from a smaller number of people chosen to represent that population, and can collect in-depth information on specific topics.

Census data

Demographic, housing, economic, agricultural and/or social data pertaining to all persons in a country (or a defined part of it) and their living quarters. For example, most countries conduct a Population and Housing Census every 10 years.

Survey Terminology

Sampling

The process of selecting a subset of items or people from a population, where selection is based on a randomized process with a known probability of selection. For example, selecting a portion of the entire national population to be interviewed for a survey.

Sampling frame

A list of items or people forming a population from which a sample is taken. For example, the results of a population census can be used as a sampling frame for smaller surveys.

Representativeness

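The degree to which a sample reflects the population from which it is drawn. Representativeness is supported by selection procedures that give each population unit a known chance of inclusion in the survey; in simple random sampling, each unit has an equal chance.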
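Sampling, the sampling frame and equal chance of inclusion can be tied together in a short sketch of simple random sampling without replacement. The frame of 1,000 household IDs, the sample size of 50 and the seed are all hypothetical choices for illustration; real survey designs are often more complex (stratified or multi-stage).

```python
import random

# Hypothetical sampling frame: IDs for 1,000 households (e.g., listed from a census).
sampling_frame = [f"HH{i:04d}" for i in range(1, 1001)]
sample_size = 50

rng = random.Random(2024)  # fixed seed so the draw can be reproduced
sample = rng.sample(sampling_frame, sample_size)  # draws without replacement

# Under simple random sampling without replacement, every household has the
# same known probability of selection, n / N.
selection_probability = sample_size / len(sampling_frame)

# The design weight is the inverse: how many frame units each sampled unit represents.
design_weight = len(sampling_frame) / sample_size

print(len(sample), selection_probability, design_weight)
```

Because every household has the same selection probability (0.05), each sampled household stands for 20 households in the frame when estimates are scaled up to the population.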

Head of household

The person in the “family” who has primary authority and responsibility for household affairs and who, in the majority of cases, is its chief economic support; alternatively, the person whom the household members designate as head. The concept assumes that most households are family households (in other words, they consist entirely of persons related by blood, marriage or adoption, except possibly for in-home, non-family domestic workers).
