Friday, January 01, 2010

Climate Sciencapades

What is one to do these days with regard to climate change except look at the data oneself? And so I've been doing some of that.

I downloaded raw temperature data sets from, then loaded the files into R:

load("") #Data set 0, raw data with some adjustments
load("") #Data set 1, stations at one location combined and adjusted
load("") #Station data

I was especially interested in the heat island effect as a possible cause of modern land surface warming.

First I compared dset0 and dset1 to get an idea about how adjustments were made:

pt = function(sta, giss.dset0, giss.dset1) {
if(![[sta]][1])) {
color = c("red", "green", "blue", "brown", "coral", "darkgreen", "orange", "pink")
substa = ""
for(i in 1:length(giss.dset0[[sta]])) {
substa = paste(substa, names(giss.dset0[[sta]])[i])
plot(giss.dset1[[sta]][[1]][[1]], giss.dset1[[sta]][[1]][[18]],
type="l", main=names(giss.dset1[[sta]]), xlab="Year", ylab="Temp",
sub=substa, xlim=c(1880,2009))

for(i in 1:length(giss.dset0[[sta]])) {
par(col=color[i],lwd = 1)
lines(giss.dset0[[sta]][[i]][[1]], giss.dset0[[sta]][[i]][[18]], lty=i)

for(sta in 1:length(giss.dset1)) {
pt(sta, giss.dset0, giss.dset1)
print(paste(sta,[[2]][[sta]],[[8]][[sta]], sep=" "))

With this you can flip through locations and see how dset0 became dset1. One finds some pretty striking adjustments such as segements of data that were left out and stations that were ignored. No doubt the reasons for doing this are good, but it's hard to figure out what the reasons are from the data. For example:

The black plot is the data included in dset1. The colored plots are the raw data from individual stations in the same area that were combined to produce dset1. Note that the data from the third station, from about 1960 to 1985, was not included in dset1. Why? There were a number of examples of this in other locations.

Another example:

The readings of the two stations in the 1890 to 1975 period were adjusted upward by about 1 degree to produced the dset1 data. What justifies this?

In the final analysis it's the overall effect that's important. We are interested in whether or not the planet itself is getting warmer and by how much. Do these adjustments affect that measurement?

I'm mainly interested in trends, and so I dispensed with gridding of the data. This is valid since we are comparing the same locations with each other in the different sets. My simple minded approach was to average all the data in dset0 by years and compare that to dset1.

#First get mean anomalies for dset0.
#Calculate anomalies as annual mean temps minus the mean of all temps
#for a given location. Throw out locations with less than 30 good

for(i in 1:length(giss.dset0))
for(j in 1:length(giss.dset0[[i]])) if([[i]])[[j]]) next else
m = mean(giss.dset0[[i]][[j]]$ANN, na.rm=TRUE)
giss.dset0[[i]][[j]]$anom = giss.dset0[[i]][[j]]$ANN-m

years=0; anoms=0
for(i in 1:length(giss.dset0))
for(j in 1:length(giss.dset0[[i]])) if([[i]])[[j]]) next else
if(length(giss.dset0[[i]][[j]]$anom)-sum([[i]][[j]]$anom)) < 30) next else
if(length(years)==1) {
years = giss.dset0[[i]][[j]]$year
anoms = giss.dset0[[i]][[j]]$anom
else {
years = c(years, giss.dset0[[i]][[j]]$year)
anoms = c(anoms, giss.dset0[[i]][[j]]$anom)
dset0.t = data.frame("year"=years, "anom"=anoms)
dset0.means = tapply(dset0.t$anom, dset0.t$year, mean, na.rm=TRUE)

And then to dset1:

#Reorganize the data into a simpler data table and include location info
#for future comparison of rural vs urban stations.
#Put the code in a function for debugging in R.

extr1 = function(giss.dset1, {
id = 0; year=0; temp=0; ur = 0; li = length(giss.dset1)
for (i in 1:li) if(sum([[i]])) > 0) next else {
for(j in 1:length(giss.dset1[[i]][[1]][[1]])) {
id = c(id,[[2]][[i]])
ur = c(ur,[[8]][[i]])
year = c(year, giss.dset1[[i]][[1]][[1]])
temp = c(temp, giss.dset1[[i]][[1]][[18]])
return(data.frame("id"=id, "ur"=ur, "year"=year, "temp"=temp))

dset1.t = extr1(giss.dset1,

#Calculate anomaly as annual temp minus mean of all temps
dset1.s = split(dset1.t, dset1.t[[1]])
for(i in 1:length(dset1.s)){
t = mean(dset1.s[[i]][[4]], na.rm=TRUE)
dset1.s[[i]]$anom = dset1.s[[i]]$temp - t
dset1.a = unsplit(dset1.s, dset1.t[[1]])
dset1.means = tapply(dset1.means$anom, dset1.means$year, mean, na.rm=TRUE)

#Comparison plot.
plot(names(dset0.means), dset0.means, type="l", xlab="Year", ylab="degrees C",
main="Data Set1 vs Data Set 0", lwd=3)
abline(coef(line(names(dset0.means), dset0.means)))
lines(names(dset1.means), dset1.means, lty=2,lwd=3)
abline(coef(line(names(dset1.means), dset1.means)))

And voila!

There is no significant difference between the sets. A very simple averaging of data collected from various stations gives the same plot as an adjusted data set made up of stations painstakingly combined together to represent a given location. At least this is the case where determining anomalies is concerned.

As for the urban effect on temperatures:

This is dset1. The black plot is all of dset1. The red plot is urban locations and the blue plot is rural locations. As you can see there is no difference to speak of in this adjusted data set between urban stations and rural ones. The plots and regression lines all fall on each other for the most part.

The next step in the processing and adjustment of this data is to "grid" it, meaning that an effort is made to determine measurments over the whole surface of the earth in grid squares of a few degrees each. Hansen spends considerable effort making adjustments for urban effects, but it looks like those adjustments are not necessary.