Statistics and Advanced Analytics Face-Off the Chinese Air Pollution Problem


  • Author: Lillian Pierson P.E.
  • Date: 06 Jul 2016
  • Copyright: Image appears courtesy of Getty Images

Have you heard about the ongoing air pollution problems in China? If not, then consider this: Beijing’s air is so polluted with PM2.5 that just living and breathing there on a bad day is equivalent to smoking about 25 cigarettes per day (Berkeley Earth). In today’s article you’re going to get a glimpse of the full extent of China’s air quality problem, and then see some delicious details on how statistics, numerical methods, and advanced analytics are being used to counter those problems.

Air quality problems in China… They’re that bad, and worse

Back in December of 2015, the Beijing city government issued its first red alert to signal that air pollutant levels were high enough to pose a risk to human health. Under red alert protocols, the city demands that automobile usage be reduced by 50%, and that heavy-load vehicles like garbage trucks and semis be banned from the road. This, in turn, slows local business, industry, and commerce to a near halt. Flight cancellations are another common and undesirable side effect of these red alert transportation restrictions. Worse still, however, are the ways in which this level of air pollution is affecting human health throughout Beijing and China at large.

According to researchers cited by Bloomberg, air pollution claims approximately 4,000 Chinese lives each day, which accounts for 17% of all deaths in the nation. It’s widely known that such high levels of PM2.5 in the air cause respiratory distress, asthma, lung cancer, strokes, and heart attacks. Local residents, like Gao Yuanli, must routinely wear face masks and use indoor air purifiers in feeble attempts to protect themselves from the torrent of air pollution engulfing them.

“I can’t go out on weekends now if the air is bad, and I don’t go to outdoor markets anymore” - Gao Yuanli, 35

Generating predictive insights for data-informed policy decision-making

Upon honest appraisal of Beijing’s air pollution problem and its detrimental impacts, it’s clear that something must be done, and fast! Because the Beijing Municipal Government and IBM believe that a workable solution can be found in predictive analytics, they’ve recently teamed up on a project called Green Horizon. This 10-year project is based in Beijing, China and is led by IBM Research-China. Listed below are just a few of the ways that predictive insights could realistically contribute towards a local solution for Beijing’s air pollution problems.

Before issuance of red alerts, predictive insights can inform government decision-makers about:

  • The increasing rate of pollution-related medical assistance needs and costs
  • Measurable ways in which workers’ lives and health are affected on a routine basis
  • Quantifiable ways in which children’s lives and health are affected on a routine basis
  • The frequency and duration of future red alert periods

For each red alert period, predictive insights can inform government decision-makers about:

  • The number of construction sites that will be closed, and how those closures will affect the local economy
  • The number of schools that will be closed, and how those closures will affect the local economy
  • The duration of the red alert period
  • Any logistics and supply-chain problems that are likely to occur

Armed with these predictions, government decision-makers can begin making informed decisions on changes that will prevent and diminish the deleterious impact of future air pollution problems within the city.

Big data and statistical analysis are a major part of the solution

According to IBM Research-China, the Green Horizon project focuses on three main areas. Those are:

  • Air Quality Management – The goal of this project is to forecast local air quality at high resolution and to identify major pollutant emitters. Government decision-makers will then use this information to make data-informed policy decisions that will hopefully result in a net reduction of fine particle levels (aka particulate matter, PM2.5) throughout the city.
  • Renewable Energy Integration – The goal of this project is to assist the Beijing Municipal Government in its adoption of renewable energy technologies, as a way to decrease the city’s dependency on coal and other fossil fuels as energy sources. The overarching intention of this project is, of course, to reduce carbon emissions and support the development of sustainable energy technologies.
  • Industrial Energy Efficiency – The goal of this project is to support the Chinese government in its mission to reduce its carbon intensity by up to 45% by the year 2020. In support of this mission, Green Horizon is tasked with developing an IoT system that’s capable of monitoring, managing, optimizing, and reducing China’s net energy consumption.

How statistics and analytics are useful in Beijing

Looking closer now at air quality management, there are many ways that statistics, numerical analysis, and advanced analytics are useful in air quality forecasting.

Some relevant methods include:

Selection of predictor variables – Whether your predictor variables are observed or forecasted, in air quality forecasting you can be certain there will be many predictor variables from which to choose. Statistical analyses such as cluster analysis, correlation analysis, and step-wise regression are important when attempting to choose the best predictor variables. When selecting PM2.5 predictor variables, you’ll want to consider things like the surface wind speed, precipitation, relative humidity, and the day of the week when measurements were taken.
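
To make the correlation-analysis step a little more concrete, here is a minimal Python sketch that ranks candidate predictors by the strength of their Pearson correlation with PM2.5. The observations, the variable names, and the `rank_predictors` helper are all made up for illustration; they are not Green Horizon data or code, and a real screening would also need to account for correlation among the predictors themselves.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up daily observations, for illustration only (NOT real Beijing data).
pm25 = [180.0, 150.0, 95.0, 60.0, 210.0, 40.0, 120.0, 75.0]  # ug/m3
predictors = {
    "wind_speed_m_s": [1.0, 1.5, 3.0, 4.5, 0.5, 6.0, 2.5, 4.0],
    "precipitation_mm": [0.0, 0.0, 2.0, 5.0, 0.0, 8.0, 0.0, 3.0],
    "rel_humidity_pct": [80.0, 75.0, 60.0, 50.0, 85.0, 45.0, 70.0, 55.0],
    "is_weekend": [0, 1, 0, 0, 1, 0, 0, 1],
}

def rank_predictors(target, candidates):
    """Sort candidate predictors by absolute correlation with the target."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

ranking = rank_predictors(pm25, predictors)
for name, r in ranking:
    print(f"{name:18s} r = {r:+.2f}")
```

In this toy dataset the weekend flag ends up at the bottom of the ranking, which is the kind of signal an analyst might use to drop a weak predictor before fitting a model.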

Classification and Regression Tree (CART) Analysis – Use CART to build decision trees that you can use to predict air pollution levels based on weather-related predictor variables.
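
As a rough illustration of the idea, the sketch below grows a very small regression tree in plain Python: at each node it picks the predictor and threshold that minimise the summed squared error of the two child groups. The data, the feature order (wind speed, relative humidity), and the function names are assumptions made for this example; a production CART implementation would add pruning and validation.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets):
    """Return the (feature_index, threshold) that lowers total SSE most, or None."""
    best, best_cost = None, sse(targets)
    for f in range(len(rows[0])):
        levels = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[f] <= thr]
            right = [t for row, t in zip(rows, targets) if row[f] > thr]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, thr), cost
    return best

def build_tree(rows, targets, depth=0, max_depth=2):
    """Recursively build a regression tree stored as nested dictionaries."""
    split = best_split(rows, targets) if depth < max_depth and len(rows) > 1 else None
    if split is None:
        return {"leaf": sum(targets) / len(targets)}
    f, thr = split
    left = [i for i, row in enumerate(rows) if row[f] <= thr]
    right = [i for i, row in enumerate(rows) if row[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean target value."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Made-up observations (NOT real data): (wind speed in m/s, relative humidity in %).
rows = [(1.0, 80.0), (1.5, 75.0), (3.0, 60.0), (4.5, 50.0),
        (0.5, 85.0), (6.0, 45.0), (2.5, 70.0), (4.0, 55.0)]
pm25 = [180.0, 150.0, 95.0, 60.0, 210.0, 40.0, 120.0, 75.0]  # ug/m3

tree = build_tree(rows, pm25)
calm_day = predict(tree, (0.8, 82.0))
windy_day = predict(tree, (5.0, 48.0))
print(calm_day, windy_day)
```

The resulting tree reads like a set of if-then rules (for example, "if wind speed is at most some threshold, expect high PM2.5"), which is much of the appeal of CART for communicating forecasts to non-specialists.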

Regression equations – Regression equations are useful for describing relationships between pollutant levels and predictor variables. Step-wise linear regression is a commonly used regression method in air quality forecasting.
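
For a sense of what such a regression equation looks like, here is a minimal Python sketch that fits a single-predictor least-squares line relating PM2.5 to surface wind speed. It shows only one fitting step, not a full step-wise procedure, and both the data and the `fit_line` helper are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up observations, for illustration only (NOT real Beijing data).
wind_speed = [1.0, 1.5, 3.0, 4.5, 0.5, 6.0, 2.5, 4.0]        # m/s
pm25 = [180.0, 150.0, 95.0, 60.0, 210.0, 40.0, 120.0, 75.0]  # ug/m3

intercept, slope = fit_line(wind_speed, pm25)
print(f"PM2.5 = {intercept:.1f} + ({slope:.1f}) * wind_speed")
```

A step-wise approach would repeat fits like this one, adding or removing a predictor at each step according to a criterion such as the change in residual error.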

Neural nets – Neural network analysis is another excellent method by which to forecast air quality metrics from complex pollutant datasets and well-chosen predictor variables.
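
The following is a deliberately tiny, pure-Python sketch of the idea: a network with one hidden layer of tanh units, trained by full-batch gradient descent to map a scaled wind speed to a scaled PM2.5 level. The data, the network size, the learning rate, and the function name are all assumptions chosen for illustration; operational forecasting networks are far larger and use many more predictors.

```python
import math
import random

def train_tiny_network(xs, ys, hidden=3, lr=0.05, epochs=4000, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer on MSE."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                   # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias
    n = len(xs)
    losses = []
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = pred - y
            loss += err * err / n
            # Backward pass: gradients of the mean squared error.
            d_pred = 2.0 * err / n
            g_b2 += d_pred
            for j in range(hidden):
                g_w2[j] += d_pred * h[j]
                d_pre = d_pred * w2[j] * (1.0 - h[j] ** 2)
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre
        losses.append(loss)
        # Gradient descent update.
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2

    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2

    return predict, losses

# Made-up observations, for illustration only (NOT real Beijing data).
wind_speed = [1.0, 1.5, 3.0, 4.5, 0.5, 6.0, 2.5, 4.0]        # m/s
pm25 = [180.0, 150.0, 95.0, 60.0, 210.0, 40.0, 120.0, 75.0]  # ug/m3

# Scale both variables to roughly [0, 1] so gradient descent behaves well.
xs = [w / max(wind_speed) for w in wind_speed]
ys = [p / max(pm25) for p in pm25]

predict, losses = train_tiny_network(xs, ys)
print(f"first-epoch loss {losses[0]:.4f}, last-epoch loss {losses[-1]:.4f}")
```

With only one predictor a neural network offers little over a regression line; its value shows up when many interacting predictors are available, which is the setting the paragraph above describes.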

Data assimilation methods – Data assimilation methods are useful for generating accurate estimates of initial conditions, which in turn lead to highly accurate air quality forecasting results. In Green Horizon, researchers assimilated datasets that described local weather, emissions levels, satellite imagery, and geographical data.
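
The simplest textbook form of data assimilation blends a model's prior estimate (the "background") with an observation, weighting each by how uncertain it is. The Python sketch below shows that scalar update; it is a generic illustration with invented numbers, not the scheme used in Green Horizon, which the article does not describe in detail.

```python
def assimilate(background, background_var, observation, observation_var):
    """Blend a model estimate with an observation, weighted by their error variances.

    Returns the updated estimate (the "analysis") and its error variance.
    """
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var

# Invented example values: the model estimates 100 ug/m3 with error variance 400,
# while a monitoring station reads 140 ug/m3 with error variance 100.
analysis, analysis_var = assimilate(100.0, 400.0, 140.0, 100.0)
print(analysis, analysis_var)
```

Because the observation is assumed to be more reliable than the model here, the blended estimate lands closer to the station reading, and its variance is smaller than that of either input. That reduced uncertainty in the starting state is what improves the forecast that follows.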

Adaptive machine learning – Researchers with Green Horizon also deployed innovative, “cognitive” machine learning methods to generate the most accurate air quality forecasts for each of the varying conditions. In this advanced form of machine learning, many different models were tested, including the Weather Research and Forecasting (WRF) model, the Community Multiscale Air Quality (CMAQ) model, the Comprehensive Air Quality Model with Extensions (CAMx), and the Weather Research and Forecasting with Chemistry (WRF-Chem) model. Whichever model performed best for each specific situation was then selected and used to forecast air quality given a set of initial conditions.
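
One simple way to picture "pick the best model for each situation" is to score every candidate model on past forecasts within each weather regime and keep the one with the lowest error. The Python sketch below does exactly that using root-mean-square error. The regime labels, the placeholder model names (`model_a`, `model_b`), and all numbers are hypothetical; the article does not specify how Green Horizon actually performs its selection.

```python
import math

def rmse(forecasts, observations):
    """Root-mean-square error between paired forecasts and observations."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

def select_best_models(history):
    """Map each regime to the model name with the lowest historical RMSE.

    history has the shape {regime: {model_name: (forecasts, observations)}}.
    """
    best = {}
    for regime, runs in history.items():
        best[regime] = min(runs, key=lambda name: rmse(*runs[name]))
    return best

# Hypothetical past PM2.5 forecasts vs. observations (ug/m3), grouped by regime.
history = {
    "stagnant": {
        "model_a": ([150.0, 180.0, 200.0], [160.0, 190.0, 205.0]),
        "model_b": ([120.0, 210.0, 170.0], [160.0, 190.0, 205.0]),
    },
    "windy": {
        "model_a": ([60.0, 40.0, 55.0], [45.0, 50.0, 40.0]),
        "model_b": ([48.0, 52.0, 42.0], [45.0, 50.0, 40.0]),
    },
}

best_models = select_best_models(history)
print(best_models)
```

At forecast time, the current conditions would be classified into a regime and the corresponding entry of `best_models` used to produce the forecast.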

As a result of Green Horizon’s air quality forecasting success, officials with the Beijing Municipal Government have been able to model and evaluate various scenarios, in order to better understand and predict the far-reaching implications involved in any policy decisions they might make.


China air pollution far worse than thought: Study. CNBC. 18 Aug, 2015.

Beijing's smog problem is even worse than you think. CNBC. 8 Dec, 2015.

China Air Pollution Kills 4,000 People a Day: Researchers. Bloomberg. 14 Aug, 2015.

Much of Beijing shuts down after first red alert for hazardous air pollution. Q13 FOX. 7 Dec, 2015.

Section 12, Air Quality Forecasting Tools. World Meteorological Organization.

Air Pollution and Cigarette Equivalence. Berkeley Earth. 2016.

Green Horizon Website. IBM Research – China.

This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS)   to whom Wiley and express their gratitude. This panel are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.