Does religious adherence affect views on the environment?

In the United States, public opinion is often viewed along a left-right spectrum, and certain beliefs are expected to coexist: for example, denial of climate change and adherence to a particular faith. The objective here is to test this notion with real-world data across US counties: is opposition to prioritizing the environment over the economy correlated with adherence to evangelical Christianity? GeoPandas and Leaflet are used to read in county boundaries as polygons and plot them on a map.

Disclaimer: The objective of this analysis is not to cast adherents of any faith in a particular light but to test out commonly held notions about the coexistence of such beliefs.

There are three plots in the analysis, arranged on a 2×2 grid. The first two are intended to give a geographical visualization of the two variables: the percentage of respondents opposed to prioritizing the environment over the economy, and the number of adherents of evangelical Christianity per thousand residents.

The third plot is a scatter of these two variables, with each point representing a county. From the geographic plots we can see that evangelical Christianity is particularly popular in certain regions of the United States, especially the Southeast. Opposition to giving priority to the environment is more dispersed geographically, though the southeastern states show heavy opposition too. From the scatter we can see that, while both low and high rates of opposition to prioritizing the environment are found across counties in general, counties with a high rate of evangelical adherence show only high rates of opposition. This is a nuance uncovered by the analysis: opposition to environmentalism comes from people with other beliefs as well, but there is scarce support for environmentalism in counties with a high rate of evangelical adherence.

Note: a few counties report more than a thousand adherents per thousand residents. Among other reasons, this could be because a large city sits at a county boundary and residents of the city cross the boundary to worship.
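As a rough sketch of the data-wrangling step: the county boundaries can be read with GeoPandas (`geopandas.read_file` on the Census zip), but the core of the analysis is joining the two tabular sources on the five-digit county FIPS code. The snippet below illustrates that join with made-up numbers; only the variable names `prienvOPP` and `EVANRATE` come from the actual sources, and everything else (column names, values) is hypothetical.

```python
import pandas as pd

# Miniature stand-ins for the real datasets. The real files use
# prienvOPP (Yale) and EVANRATE (ARDA), keyed by county FIPS code.
opinion = pd.DataFrame({
    "fips": ["01001", "01003", "13121"],
    "prienvOPP": [38.2, 41.5, 29.7],     # % opposed to prioritizing environment
})
religion = pd.DataFrame({
    "STATEFP": [1, 1, 13],
    "COUNTYFP": [1, 3, 121],
    "EVANRATE": [712.0, 655.3, 210.4],   # evangelical adherents per 1,000 residents
})

# Build a five-digit county FIPS (2-digit state + 3-digit county) so the
# two tables can be joined on a common key.
religion["fips"] = (
    religion["STATEFP"].astype(str).str.zfill(2)
    + religion["COUNTYFP"].astype(str).str.zfill(3)
)

merged = opinion.merge(religion[["fips", "EVANRATE"]], on="fips", how="inner")
print(merged)
```

The merged frame is what feeds both the choropleths (after joining to the county polygons on the same key) and the scatter plot.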

Sources for Data:

1) County polygons, United States
https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.zip

2) Climate Change Opinions – Yale Climate Opinion Maps 2018
Variable used: prienvOPP
https://climatecommunication.yale.edu/visualizations-data/ycom-us-2018/?est=happening&type=value&geo=county

3) Religiosity – The Association of Religion Data Archives (ARDA), US Religion Census: Religious Congregations and Membership Study, 2010. Variable used: EVANRATE
http://www.thearda.com/Archive/Files/Downloads/RCMSCY10_DL2.asp

4) US county FIPS codes
https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697

5) US state abbreviations – 2 digit
https://cdiac.ess-dive.lbl.gov/climate/temp/us_recordtemps/states.html

Central limit theorem

(I talk about little insights or aha moments I’ve had while learning concepts. The concepts themselves may be learned from sources far wiser than me, so I do not try to be comprehensive; instead I prod you to think by presenting the crisp little joyful moments of clarity I’ve had, and I invite corrections of my thought process.)

I encountered this theorem many times while studying probability and statistics without quite understanding it, and as a result I had a fundamental lack of clarity when it came to hypothesis testing. Why are we using the normal distribution to talk about the average number of heads in a series of coin tosses? What is so ‘normal’ about tossing a coin? And what about those light bulb failure rates? Why are they so faulty, and how do I know they all fall on a bell curve? Maybe the distribution of time to failure looks like a dinosaur tail; why a bell curve? Maybe I should just get a beer.

So today, we’ll understand a few things about the central limit theorem, tinker with it with our own hands, and as a result understand a thing or two about hypothesis testing. There are many versions of this theorem, but I will restrict this discussion to the classical central limit theorem, which concerns the mean of independent and identically distributed random variables: for a large enough number of such variables, their mean approaches a normal distribution.

Before talking about what the parameters of the distribution would be, I’ll talk about the beauty of this theorem, which makes it so applicable to a wide range of problems. Remember the dinosaur-tail-looking distribution of time to failure for light bulbs? That may actually be so! But if I sample enough such light bulbs, the mean of their failure times will follow a normal distribution. The same goes for the average number of heads in a sample of coin tosses. You can see at once how the convergence of all these distributions to the normal distribution is both frightfully wonderful and useful.

To be a little more specific: if we sample from any probability distribution with mean $\mu$ and variance $\sigma^2$, then as the sample size $n$ increases, the distribution of the sample mean tends to a normal distribution with mean $\mu$ and variance $\sigma^2 / n$.
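To see this concretely, here is a small simulation of my own using only the standard library: we draw many sample means from a flat Uniform(0, 1) distribution, which looks nothing like a bell curve, and check that the means land near the Normal($\mu$, $\sigma^2/n$) the theorem promises.

```python
import random
import statistics

random.seed(0)

# Underlying distribution: Uniform(0, 1) -- decidedly not a bell curve.
# Its mean is mu = 0.5 and its variance is sigma^2 = 1/12.
mu, sigma2 = 0.5, 1 / 12
n = 100            # sample size
trials = 10_000    # number of sample means to draw

sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(trials)
]

# The CLT says these means are approximately Normal(mu, sigma^2 / n).
print(statistics.fmean(sample_means))      # close to mu = 0.5
print(statistics.variance(sample_means))   # close to sigma^2 / n = 1/1200
```

Plot a histogram of `sample_means` and the bell shape appears, even though no individual draw was remotely normal.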

So we already get an idea of how this may be useful in testing hypotheses, given that the normal distribution is well understood (compared to dino tails). But before delving into that, let us play around with what we know. Observe, tinker, be silly. The Jupyter notebook in the link below allows you to simulate the toss of a coin and observe how, for larger sample sizes, the number of heads in a sample approximates the well-known bell curve. (The distribution of the number of heads in a sample approaches a normal distribution because the sum is a constant times the mean. This concept, called the normal approximation to the binomial distribution, can be explored in detail in the sources below.)
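If you'd rather not wait for the notebook environment to load, the heart of the experiment can be sketched in a few lines of plain Python (this is my own illustrative version, not the notebook's code):

```python
import random
import statistics

random.seed(1)

# Toss a fair coin n times, count heads, and repeat many times.
n, p, trials = 400, 0.5, 5000
heads = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

# Normal approximation to the binomial: heads ~ Normal(n*p, n*p*(1-p)),
# i.e. mean 200 and standard deviation sqrt(100) = 10 here.
print(statistics.fmean(heads))
print(statistics.stdev(heads))

# Crude text histogram: the counts bulge in the middle like a bell.
for left in range(170, 230, 10):
    bucket = sum(left <= h < left + 10 for h in heads)
    print(f"{left:3d}-{left + 9:3d} {'#' * (bucket // 100)}")
```

Increase `n` and the histogram hugs the bell curve ever more tightly; shrink it to `n = 5` and the bins look lumpy and discrete.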

Press the play button on the left of the notebook cell to run the tool and observe the animation.

(Opens in a new tab, give it a bit to load the environment)

Coin Toss Notebook

Misleading Through Charts and Graphs – How you are made to buy organic food and sold other scams.

(Alberto Cairo’s paper “Graphics Lies, Misleading Visuals: Reflections on the Challenges and Pitfalls of Evidence-Driven Visual Communication” guided the analysis below.)

Humans love visual representation of data. A computer may look at long rows of data, or unstructured data even, and draw insights from it. For us humans though, that information needs to be presented as graphics we can understand, often with various shapes and colors added to drive home a key point. While I’m all for making information and trends visually insightful to humans, we must proceed with caution as often such representations can be misleading or downright dishonest. I highly recommend reading Cairo’s paper to gain a deeper understanding of this problem.
Here, I’d like to provide a quick analysis of a graph I saw in a Medium article titled ‘Why We Need to Recognize and Consider Organic Foods’ [1].

[1] https://medium.com/@mcmahonadam2/why-we-need-to-recognize-and-consider-organic-foods-f127f69261df

I’m leaving out the statistical information at the top of the graph, including debates on the relevance of p-values and R-squared goodness-of-fit values, or even the fact that correlation doesn’t imply causation, to focus simply on the visual deception of the graphic.

The deceptive tricks used fall into two categories:

  1. Too much data is represented to obscure reality
  2. Using graphic forms in inappropriate ways
Too much data is represented to obscure reality

The graph claims to plot two different correlations:

1) between glyphosate usage and death rates from end-stage renal disease

2) between the percentage of US corn and soy crops that are GE and death rates from end-stage renal disease

What does it show in reality, though? Three time series superimposed on the same plot.

Note how the x axis is time, meaning the graph doesn’t show the correlation between any two series; instead, it simply shows how three different series of data each vary with time!

Need I point out that the series all start at different points in time? For example, death rates from renal disease are plotted from 1985 to 1991 even though nothing is plotted over that period for the supposedly causal glyphosate usage or percentage of soy and corn crops that are GE.

Using graphic forms in inappropriate ways.

Now look at the Y axes.

For one, they are both truncated. Also, why are there two axes? Is there a third axis for the % GE Soy and Corn series? (And how does the same percentage apply to both soy and corn?)

Truncating the Y axis magnifies, and hence distorts, the magnitude of change in a series.

For a series (40, 50), say, if the y axis is truncated at 40, the point with value 50 would look like infinite growth from the previous point!
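A toy calculation makes the distortion concrete. Here `apparent_growth` is a made-up helper of mine: it measures heights from the axis origin, the way a bar or line is actually drawn on a truncated axis.

```python
def apparent_growth(prev, curr, axis_origin=0.0):
    """Ratio of drawn heights, measured from the axis origin
    rather than from zero."""
    prev_h = prev - axis_origin
    curr_h = curr - axis_origin
    return float("inf") if prev_h == 0 else curr_h / prev_h

# Honest axis starting at 0: 50 looks 1.25x as tall as 40.
print(apparent_growth(40, 50, axis_origin=0))    # 1.25
# Axis truncated at 35: the same change now looks 3x.
print(apparent_growth(40, 50, axis_origin=35))   # 3.0
# Axis truncated at 40: the first point has zero height -- "infinite" growth.
print(apparent_growth(40, 50, axis_origin=40))   # inf
```

The data never changed; only the origin did.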

Including multiple y axes is a way to suggest correlations or superimpositions in values that don’t really exist. If I’m allowed to change the scale of each y axis and its origin, I can make almost any two series look like they correlate.

To illustrate, I constructed two series of numbers, Random1 and Random2, with one data point per year from 1991 to 2009; both series are the sum of a random number and a linear time trend.

In the figure above, the two series are plotted against time with a common Y axis starting at the origin, 0.

Above, I’ve instead used two y axes with truncated origins.

I’ve also hidden some of the values of Random1 above, suggesting to a reader at first glance that the sudden appearance of the blue line caused the changes in the orange line.
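The construction of those series is easy to reproduce with the standard library alone (the seed and noise range below are arbitrary choices of mine): two series that never see each other, each just noise plus a linear trend, end up strongly correlated with each other simply because both trend with time.

```python
import math
import random
import statistics

random.seed(42)

years = list(range(1991, 2010))   # 19 annual data points
t = list(range(len(years)))

# Each series = linear time trend + independent random noise.
random1 = [x + random.uniform(0, 5) for x in t]
random2 = [x + random.uniform(0, 5) for x in t]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand for portability."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Despite the noise terms being independent, the shared trend makes the
# two series look strongly correlated.
print(round(pearson(random1, random2), 2))
```

The same trick, helped along by truncated dual axes, is exactly what the organic-food graph leans on.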

So, in conclusion, graphs are great, but they are worth pondering over beyond the initial aha moment they might create in us.