Since news came out that Dick Cheney got a heart transplant there has been an eruption of hatred against him from our ever civil cousins on the left.
The funny thing about Cheney is that almost all of the major charges against him made by the left turned out to be false.
We know from two Congressional commissions that he did not "cook the books" on Iraq intelligence. He didn't lie about Iraq. He didn't have untoward influence on the decision to go to war. He and Bush were saying what everyone, Democrats and Clinton included, believed to be true about Iraq: that it had chemical and biological weapons and wanted to restart a program for nuclear weapons.
The intelligence on Iraq came up through the CIA via normal channels and was vetted by the director of the CIA and published in the NIE as usual. We can read the executive summary of that NIE if we wish.
Moreover, Cheney is responsible for putting the US on track to energy independence, which we are gaining by degrees despite the efforts of the Obama administration to limit oil production by denying permits on Federal land, slow-rolling offshore permits, and blocking the Keystone XL pipeline.
And yet they all still hate Cheney's guts.
Reality based? I don't think so.
Sunday, March 11, 2012
Totalitarianism: All the Rage
Totalitarianism is becoming more popular in the West. Australia is discussing the implementation of a "News Media Council", which, like the Ministry of Truth, would have the power to reprimand news organizations that it thought were being "irresponsible." The American left is forever advocating various forms of restriction of speech in the name of tolerance. The term "hate speech" has been expanded to cover any form of speech critical of them or of things they like. In Canada a bureau with the Orwellian name of "Human Rights Commission" has been prosecuting people who run afoul of dictates against speech critical of certain groups, Islam especially. News organizations have been sympathetic to Muslim insistence that Islam not be criticized, and so they self-censor in a way that they will not to protect Christianity and other religions.
One gets the impression that many of these people long for the regularity and regimentation of the totalitarian society. Commentators such as Thomas Friedman at the New York Times wax eloquent in their praise of the Chinese Communist government that can get things done without the obstructions thrown up by democracy. Over at The Nation magazine, when they publish stories about free speech, it is often about the "right" of left-wing mobs to occupy buildings and disrupt business and services on a campus or some other venue. The "right" most of all to shut down the free speech of others. The "right" in other words to use intimidation and even violence in the name of some leftist cause. The "right" to act like a bunch of brownshirts.
It never seems to dawn on them that their own concerns and passions might be the ones crushed by such a regime. Many of them are so used to being in the elite that they can't imagine it otherwise. They did whatever it took to become part of the elite, and they will do whatever it takes to stay there, but what they don't realize is that it might not be in their hands: markers of identity can be used to exclude them. In Cambodia, for example, it was simple markers of identity, the possession of reading glasses, the possession of books, that decided who would go to the killing fields.
Tuesday, June 14, 2011
Frustration
This intense eagerness to get Sarah Palin for whatever it is she might have done online unfolded concurrently with all the attention that had to be paid to Anthony Weiner for his online antics. What we can’t see is the agony the poor journalists must have endured as Weiner’s wienerisms hurt the Democratic Party just when they stood ready to damage the Republican Party with Sarah’s misdeeds. And then — oh, how awful! — they got nothing from Sarah. Nothing but hard work and — urrrgghhh! — good government and — damn! — family values.
Palin is even better at confounding her enemies than Bush is. The genius of it is that she makes it look like sheer luck. But then she does it again and again and again...
via: instapundit.com
Monday, June 13, 2011
Trig Letter
Among the emails in the Palin document dump is one in which Palin contemplates the birth, a few weeks away, of her child with Down syndrome. The maturity, wisdom, equanimity, and thoughtfulness with which she approaches it are a joy to read.
Where's the Warming?
Atmospheric CO2 levels have increased even more than the IPCC predicted in 2001. If their predictions were correct, there should have been an even bigger increase in atmospheric temperature than they projected. Yet over the 10 years since their report was published there has been no increase in those temperatures at all.
Why? What is missing?
The core assumption of every climate model the IPCC uses is that for every unit of warming caused directly by CO2, water vapor feedback will raise the total to 3 units. Disagreement over this assumption is the source of all the disagreements and misunderstandings about climate models.
What did they find when they tried to prove that the assumption about water vapor is correct?
If the assumption about water vapor were correct, it should have been possible to detect a hot spot in the troposphere over the tropics. Over the last four decades weather balloons and satellites have failed to detect any such spot, not even a trace of it.
This means that the assumption about water vapor is incorrect. This first became clear in the mid '90s, and with Aqua satellite data it is even clearer.
And if you plug a more reasonable figure for the water vapor effect into the climate models you get results that, surprise, match what has happened over the last 10 years and over the previous 100 years as well.
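To see the arithmetic at stake, here is a toy calculation in R. The numbers are placeholders of my own, not IPCC figures:
#Toy calculation with placeholder numbers, not IPCC figures.
direct = 1.0 #warming attributed directly to CO2, in arbitrary units
amplified = 3.0 #assumed total after water vapor feedback, per the models
modest = 1.2 #a hypothetical "more reasonable figure" for the total
direct*amplified #3: the projection that overshot observations
direct*modest #1.2: much closer to what has been measured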
Why haven't the scientists we're all familiar with acknowledged this? I suspect it's because if they did they would lose their funding. There is simply no question that they would. Without the disaster scenario of global warming there would be no interest in the field. They'd sink out of sight into obscurity.
Friday, February 18, 2011
WI Democrats, Unions Don't Like Democracy
From Power Line:
A common theme of the union demonstrators in Madison today was that Governor Walker is a “dictator.” This showed up on sign after sign. It sheds light, I think, on how public union members in particular, and liberals in general, think. What is going on here is that the voters of Wisconsin have elected a Republican Governor and–overwhelmingly–a Republican legislature, precisely so that they can get the state’s budget under control.
What the Democrats don’t like isn’t dictatorship, it is democracy. That is why the Democrats in the Wisconsin Senate fled the state en masse–they prevented a quorum, so that a vote they were going to lose couldn’t take place. Once again, it is democracy they are trying to frustrate, not dictatorship.
One could make the point more broadly about the organized labor movement. The unions’ top priority is to eliminate the secret ballot in union certification elections. Why?
We all know why.
Sunday, January 10, 2010
The Heat Island Effect: Lost and Found
As I've demonstrated (to my own satisfaction, at least), the strange, arcane adjustments for the heat island effect (HIE) made by CRU and GISS are not even necessary. One gets the same trends whether they are made (GISS Data Set 2) or not (GISS Data Set 1) if an unbiased and well-understood method for determining temperature anomalies is used. What exactly they are doing with those adjustments is up in the air, but the question remains to what extent the HIE is affecting the results that we are seeing for global temperatures.
Analysis of the GISS station data had begun to make me think that the heat island effect did not exist. But yesterday I was looking over station data from the local Houston area, and there did seem to be an obvious HIE. Individual stations in the heart of the city, inside the loop, were reading about 3 or 4 degrees above those out in the suburbs and beyond. It was striking to me, though, how quickly the readings came back down to match those of the surrounding countryside even a little way from the city center.
Looking at other cities I found the same thing. wunderground.com enables one to look at dozens of individual stations in an area simultaneously. Note how in Chicago the temperature readings decline as one gets away from the city center:
And here is the New York City area:
Compare to the surrounding regions:
Note the two stations in the middle of Manhattan. They are located 500 feet up.
Unless stations were actually within the city they would not detect the effect.
So, looking back over the maps of GISS station locations, it hit me that the vast majority of the stations marked as urban are not urban at all, at least not to the extent seemingly needed to find an HIE. All this time the suspicion has been that the stations marked rural are too urbanized, when it could be exactly the opposite: those marked urban are often not that urban.
Thus, I selected stations from the 85 largest urban centers in the world. In order to qualify, a station had to be within the urban area, with urban sprawl extending 0.1 degrees of latitude or longitude in every direction from the designated station location (which is provided to the nearest tenth of a degree). The temperature anomalies for these stations, as determined by the statistical algorithm used in previous posts, are compared to the published GISS data below:
To make a long story short, as more and smaller cities are added to the list of city stations included in the analysis above, the trend line comes down to be much closer to that of the GISS data. And this still covers only about 10% of the stations listed as urban in the giss.dset1 data. As I've shown previously, if all the "urban" stations are included the HIE disappears.
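For what it's worth, the selection rule can be reduced to a crude proximity test in R. This is only a sketch: the stations and centers data frames here are hypothetical stand-ins, not the real GISS metadata, and the real check that sprawl surrounds a station in every direction was done by eye:
#Hypothetical sketch of the selection rule. stations holds candidate
#station coordinates; centers holds the 85 urban-center coordinates.
qualifies = function(st.lat, st.lon, centers) {
any(abs(centers$lat - st.lat) <= 0.1 & abs(centers$lon - st.lon) <= 0.1)
}
keep = mapply(qualifies, stations$lat, stations$lon,
MoreArgs=list(centers=centers))
urban.stations = stations[keep,]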
Saturday, January 09, 2010
Population Density and Global Warming
It has often been observed that the center of a city is warmer than the surrounding countryside. Thus the claim has been made that global warming is an error due to weather stations gradually being taken over by urbanization. As detailed in previous posts, I've been unable to demonstrate that from the weather station data available from NASA.
The hypothesis I address on this occasion is the idea that population density in a country could be used as a proxy for urbanization. If so, then there should be a correlation between population density change over time and the temperature in that region.
I used population density figures from the CIA World Fact Book and assumed an exponential decline in density going back in time from 2008, scaled by the growth rate:
D = d*((100-g)/100)*exp((year-2008)/127)
where D = population density in a given year,
d = population density in 2008,
g = percent growth in population in 2008,
year = year of measurement.
Thus D decreases in a log-linear fashion going back in time, in proportion to the rate of growth observed in 2008, an attempt to mimic population growth.
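Translated directly into R, the back-projection looks like this (the example call uses made-up, roughly US-like figures):
#Direct transcription of the formula above.
#d = density in 2008, g = percent growth in 2008, year = year of measurement.
backcast.density = function(d, g, year) {
d*((100-g)/100)*exp((year-2008)/127)
}
backcast.density(d=33, g=0.9, year=1950) #made-up, roughly US-like figures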
Plugging this grouping into the random effects statistical algorithm used in previous posts, I found no relationship between D and temperature in 60 randomly selected countries:
The x axis is the log of D (population density) and the y axis is degrees C.
No effect of population density over time emerges with this analysis.
Friday, January 08, 2010
Seeking A Heat Island Effect
Continuing on with the data mining of the GISTEMP material, I have been looking for evidence of a heat island effect, and my conclusion is that it won't be found in this data.
Since my own analysis didn't show a robust HIE, I looked for bias in the dset1 data.
I thought it might be due to urban records being longer than rural ones and therefore dominating the data set. This did not prove out. The average length of rural and urban records is about the same.
I thought it might be inaccurate rural, suburban, and urban designations. But a review of 300 rural sites selected at random did not reveal a single obvious misclassification.
No matter how I parsed and combed through this data I was getting no significant difference in rural and urban stations. The best I've done is to see a 1.8 degree difference in the aggregate analysis that I've posted previously, and that does not necessarily represent a trend.
So, what are the possible explanations?
Here is my list of guesses:
1) The database is contaminated with thermometers purported to be in rural locations that are actually sitting on asphalt or next to air conditioner compressors.
2) The population density in the areas of the USA where most of the thermometers are located (the Northeast) has been flat or declining since 1950. Therefore, a single cross-sectional classification of U, R, or S (or brightness) in the database does not adequately represent the situation.
I don't have a lot of enthusiasm for working through this possibility. It seems to me that even if the U, R, and S designations in the GISS data had errors, some differences would still emerge.
For what it's worth, here are the results of comparing locations that are designated rural and are not designated as airport locations versus all the rest of the locations using the random effects statistical algorithm described in previous posts:
These are not the results I hoped for. But hiding them would make me no better than a climate scientist.
Tuesday, January 05, 2010
Climate Science
As Steve McIntyre has so brilliantly shown, Jim Hansen's methods for processing station temperatures into global temperatures don't really make sense.
The burning question for me therefore has been: If the stick guys' methods don't make sense and perhaps introduce bias, then what do we get when proper methods are used? What, exactly, is the truth of the matter?
And so I've worked through Steve's postings and the writings of others trying to find out what a proper method might be.
In a post from 18 months ago, Steve pointed out that the job of processing this data and eliminating the effects of various factors to come up with an estimate of the temperature anomalies over time could be done using well-understood mixed-effects statistical modeling algorithms. I thought it rather astounding that a statistical method seemingly completely different from Hansen's would come up with much the same results. Steve used this in modeling the combination of location data into grid cells, but why not apply it to the whole data set?
And so I tried that.
giss.dset0, you may recall, is temperature data taken from different weather stations. In some cases these stations are all together in one location, as in 4 or 5 stations in a given city area. Hansen first combined these stations together for a location, and then did a lot of complicated smoothing and adjustment of the data in a series of steps. dset0 is raw data; dset1 includes some adjustments and the combination of stations at given locations; dset2 includes more adjustments within a given location; and gridding of the data does more of the same, smoothing data over geometrical areas, between locations. With all of the adjusting and smoothing going on, the introduction of some sort of bias seems likely. For example, Hansen always started with the longest records and then increased or decreased the shorter records to match the longer. But aren't the longer records usually from urban stations? Would this not cement the heat island effect into the combined location data?
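To make the concern concrete, here is a toy version of that kind of offset-matching. This is my own simplification, not Hansen's actual algorithm: the shorter record is shifted so its mean matches the longer one over their overlap, which is how any urban warmth in the long record could be carried into the combination:
#Toy simplification of offset-matching; not Hansen's actual algorithm.
#long and short are numeric vectors of annual temps, named by year.
combine.records = function(long, short) {
overlap = intersect(names(long), names(short))
short.adj = short + mean(long[overlap] - short[overlap], na.rm=TRUE)
yrs = union(names(long), names(short))
sapply(yrs, function(y) mean(c(long[y], short.adj[y]), na.rm=TRUE))
}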
And so, working in the R Project statistical package and following McIntyre's lead, I first reorganized the data in giss.dset0 into a simple long data frame with location information folded in from giss.info.
#Flatten giss.dset0 into one long table: a row per station-year, with
#location id, urban/rural class, and altitude folded in from giss.info.
extr = function(giss.dset0, giss.info) {
year = 0; temp = 0; alt=0; ur=0; loc=0; lxs=0; sta=0
Ni = (1:length(giss.dset0))[!is.na(giss.dset0)] #indices of non-missing locations
for(i in Ni) #if(sum(is.na(giss.dset0[[i]])) > 0) next else
for(j in 1:length(giss.dset0[[i]])) if(is.na(giss.dset0[[i]][[j]])) next else
{
N = length(giss.dset0[[i]][[j]][[1]]) #number of years for this station
sta = c(sta, rep(names(giss.dset0[[i]])[j],N))
year = c(year, giss.dset0[[i]][[j]][[1]])
temp = c(temp, giss.dset0[[i]][[j]][[18]]) #column 18 is the annual mean
ur = c(ur, rep(giss.info$urban[i],N))
alt = c(alt, rep(250*(giss.info$alt.interp[i]%/%250),N)) #altitude in 250 meter bands
tmp=substr(names(giss.dset0[[i]])[j],1,8) #first 8 characters identify the location
loc = c(loc, rep(tmp, N))
lxs = c(lxs, rep(paste(tmp, "i", names(giss.dset0[[i]])[j],sep=""),N))
}
sta=sta[-1]; year=year[-1]; temp=temp[-1]; alt=alt[-1]; loc=loc[-1] #drop the seed zeros
ur = ur[-1]; lxs=lxs[-1]
return(data.frame(loc=loc,sta=sta,year=year,ur=ur,alt=alt,lxs=lxs,temp=temp))
}
dset0.t = extr(giss.dset0, giss.info)
(No, I could not use monthly data without running out of memory.)
Then calculate random effects:
library(lme4)
dset0.fm = lmer(temp~(1|loc)+(1|sta)+(1|ur)+(1|alt)+(1|year), data=dset0.t)
dset0.fm1 = lmer(temp~(1|sta)+(1|year), data=dset0.t)
(loc = location id, sta=station id with location, ur=urban vs suburban vs rural, alt= location altitude in 250 meter bands)
ranef(dset0.fm)$year and ranef(dset0.fm1)$year contain the random effects year over year. The temperature anomaly for each year, in other words. Most of the variance is dumped into other groups. Using more groupings in the model formula makes no difference in these estimates (dset0.fm vs dset0.fm1 R-squared = 0.999).
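For instance, the yearly anomalies can be pulled out and plotted directly (ranef() returns a data frame per grouping factor, with the factor levels as row names):
#Extract the per-year random intercepts and plot them as an anomaly series.
re.year = ranef(dset0.fm1)$year
plot(as.numeric(rownames(re.year)), re.year[,1], type="l",
xlab="Year", ylab="degrees C")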
Here is your heat island effect:
> ranef(dset0.fm)$ur
(Intercept)
R -0.23716400 (rural)
S -0.04312743 (suburban)
U 0.28028908 (urban)
And here is the effect of altitude:
> ranef(dset0.fm)$alt
(Intercept)
0 10.0821297
250 9.5952519
500 8.2160810
750 7.4049196
1000 6.4774994
1250 5.4415439
1500 4.2931860
1750 3.4352673
2000 1.7974976
2250 -0.5588156
2500 -1.6522236
2750 -5.5811832
3000 -5.2383055
3250 -13.6373919
3500 -8.7024097
3750 -13.8671371
4000 1.2833311
4250 -6.0728011
4500 -3.3266110
4750 0.8495323
5000 -1.0044707
5750 -0.4843320
What that yearly random effect looks like compared to the GISTEMP book numbers:
And here is the difference between the two:
R-squared = 0.73. Close enough for government work. The slope coefficients are nearly identical (a 0.00035 per year difference). And I'm sure that a statistician would do much better. Much of what Hansen did in steps 1 and 2 and in gridding the data was smoothing. Application of a smoothing algorithm would no doubt take out the variation in the random effects estimation I got, but it would not improve the fidelity of the results.
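Comparison statistics of this kind can be computed along these lines; the gisstemp vector here is a hypothetical stand-in for the published anomalies, aligned year for year with the yearly random-effect series (re.year above):
#gisstemp is assumed to hold the published anomaly series aligned with re.year.
summary(lm(re.year[,1] ~ gisstemp))$r.squared #R-squared between the two series
coef(lm(re.year[,1] ~ as.numeric(rownames(re.year))))[2] #slope per year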
dset0 annual averages are as close to the raw data as I can get without running out of computer memory. I went from that data to a final result with one command in R.
If I can come up with the same results with no spatial factors at all other than altitude and station grouping, then does gridding mean anything? Can the analysis represent anything more than the 25% of the earth's surface the stations cover?
Hansen is vindicated in one big respect. This shows that his methods, however byzantine, introduce no significant amount of bias where temperature trends are concerned with regard to the available station data. The proof is in the pudding, and all that.
That lack of coverage of the earth's surface in station data, I'm guessing, is the reason for the difference in satellite, radiosonde, and station estimates of global temperature.
The amount of memory and processing power in my cheap, clearance-priced desktop PC would have required a computer the size of a city block, if it could have been matched at all, back when Hansen started doing this climate business. His multi-step process of analyzing station data was perhaps an effort to perform an analysis that would have been impossible using the statistical methods I used here.
Update
Including another grouping factor, latitude in 5 degree bands, has no effect on the temperature anomalies, but it's interesting to look at the random effect for that group:
dset0.fm2 = lmer(temp~(1|ur)+(1|lat)+(1|alt)+(1|sta)+(1|year), data=dset0.t)
> ranef(dset0.fm2)$lat
(Intercept)
-90 -48.0765255
-85 -30.6208086
-80 -30.5843329
-75 -26.8087508
-70 -17.5837928
-65 -10.6445832
-55 -0.4677458
-50 1.9302840
-45 4.6332613
-40 7.5261542
-35 10.1896612
-30 7.2221295
-25 16.5283583
-20 17.4532507
-15 18.3289638
-10 18.4701555
-5 18.5784181
0 18.7600821
5 19.5108772
10 19.8686941
15 19.7051168
20 17.5278219
25 15.0410194
30 11.2047241
35 7.3644670
40 3.2930061
45 -0.4509212
50 -0.3098546
55 -5.3064547
60 -9.0101544
65 -13.3917199
70 -16.3916299
75 -20.7265894
80 -22.7631131
And including that group gives more consistent altitude and urban estimates:
> ranef(dset0.fm2)$alt
(Intercept)
-1000 7.6009074
0 8.7884377
250 8.2316191
500 5.1136268
750 9.5794809
1000 5.9054092
1250 5.3426933
1500 4.0768528
1750 3.4410112
2000 2.0043036
2250 1.0247369
2500 -0.5251244
2750 -1.8542720
3000 -0.6135464
3250 -8.0035847
3500 -3.3222669
3750 -5.1468110
4000 -5.1510483
4250 -8.3572983
4500 -8.9318326
4750 -5.5925205
5000 -5.9627467
5750 -7.6483137
> ranef(dset0.fm2)$ur
(Intercept)
R -0.89551186
S -0.03941906
U 0.93493366
Here is the summary from lmer():
Linear mixed model fit by REML
Formula: temp ~ (1 | loc) + (1 | sta) + (1 | ur) + (1 | alt) + (1 | year)
Data: dset0.t
AIC BIC logLik deviance REMLdev
1274202 1274280 -637094 1274190 1274188
Random effects:
Groups Name Variance Std.Dev.
sta (Intercept) 0.510624 0.71458
loc (Intercept) 113.129532 10.63624
year (Intercept) 0.141931 0.37674
alt (Intercept) 29.856394 5.46410
ur (Intercept) 0.068179 0.26111
Residual 0.521113 0.72188
Number of obs: 548500, groups: sta, 13483; loc, 4514; year, 127; alt, 23; ur, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.20 1.25 4.159
Linear mixed model fit by REML
Formula: temp ~ (1 | ur) + (1 | lat) + (1 | alt) + (1 | sta) + (1 | year)
Data: dset0.t
AIC BIC logLik deviance REMLdev
1288347 1288426 -644167 1288337 1288333
Random effects:
Groups Name Variance Std.Dev.
sta (Intercept) 11.67732 3.41721
year (Intercept) 0.14162 0.37633
lat (Intercept) 330.71739 18.18564
alt (Intercept) 38.58698 6.21184
ur (Intercept) 0.84184 0.91752
Residual 0.52133 0.72203
Number of obs: 548500, groups: sta, 13483; year, 127; lat, 34; alt, 23; ur, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.544 3.427 -0.451
Update II
I get almost exactly the same results with the model used in the first update when I do the analysis with giss.dset2 (R-squared of temp random effects = 0.97).
Sunday, January 03, 2010
Climate Sciencapades IV
Someone told me back during the days when I took the anthropogenic global warming claims at face value that if I wanted to completely lose faith in the climate models I should take a look at them. And so I did, and he was right.
I looked at the code for the model used by GISS to do the predictions for the IPCC report of 2001. What is striking is how crude the model is. It divides the earth into a grid with about 8000 squares. The continental US is therefore represented by a 16 by 36 grid. Then there is what is not in the model. There's no way to model clouds and rain, so they have to make a guess at what those will be doing in response to changes in other factors. Consider that a 2% change in average cloud cover would completely negate any warming due to CO2 doubling. And the natural variability of cloud cover year to year and decade to decade is much larger than 2 percent.
The assumption of the modelers has been that a slight increase in temperature due to CO2 would be magnified by the resulting increase in water vapor in the air. But that might also result in an increase in cloud formation, which would offset the water vapor feedback.
Satellite measurements of clouds, water vapor, and temperature in the tropics show cycles. Heat forcing increases, temperatures increase, water vapor increases, and then clouds increase resulting, apparently, in a fall in temps. Rinse and repeat.
And the distribution of water vapor in the atmosphere is not uniform. Even over the open oceans the humidity is less than 100% most of the time, and it varies constantly in response to other factors. Satellite pictures of water vapor show that it swirls around in a more or less random fashion in the atmosphere. And CO2 does the same thing. Simple assumptions about these things won't do. And the models have a host of other fudge factors that can be tweaked endlessly until the model coughs up the desired results. The agreement between various modelers is not evidence of the validity of the results.
In any case, I have no faith in the computer models, having seen how they work.
More recently I became interested in surface temperature measurements, as reflected in previous posts. I wanted to know why satellite measurements of global temperatures don't agree with surface measurements. And so I looked at the data from GISS. I didn't appreciate the problem until I started trying to come up with a global average from station measurements. And then I found that the stations only represent about 25% of the earth's surface. I don't care how clever you are with statistics, there is no way in hell that you can come up with an accurate measurement of global temperatures if you're not even looking at three fourths of the globe, and that's really all I need to explain the difference between surface and satellite results. Trends can be determined by using a limited set of measurements, but as one area goes up another might be going down, as we have often found is the case.
Therefore, the four-alarm increases in temperatures over the last 30 years, according to GISS, NOAA, and others, are probably not valid. Much closer to accurate is the satellite data, which shows a much more sedate rate of increase on the order of 0.14 degrees C per decade since 1972. The disagreement between satellite and surface data has given rise to efforts to reconcile the two mainly by "adjusting" the satellite data using computer models(!), among other things, which is doing it exactly the wrong way around.
Saturday, January 02, 2010
Climate Sciencapades III
My interest in this material isn't in terms of trying to see if Hansen and the other stick players are doing things right. I'm more interested in seeing if processing the data in a simple and straightforward manner that makes sense to me gets results similar to what they are getting.
I've always had a great deal of trouble with the idea that a basic measurement like temperature is in need of massive computerized analysis. The temperature is the temperature, and it ought to be clear what's going on just by looking at the raw data, or something close to it.
So far we've seen that the differences between Hansen's dset0 and dset1 (raw and locally adjusted data respectively) are minimal when anomalies are calculated for each station or location. How about if the absolute temperatures are calculated?
So I averaged the station readings within each dset0 location, then averaged all the locations across years for dset0 and for dset1, and plotted the yearly means to compare the two sets.
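In outline, that two-stage average looks like this in R, assuming a long data frame like the dset0.t built in the "Climate Science" post above, with loc, year, and temp columns:
#Sketch of the two-stage average; assumes a dset0.t-style long data frame.
loc.year = aggregate(temp ~ loc + year, data=dset0.t, FUN=mean) #stations to locations
abs.means = tapply(loc.year$temp, loc.year$year, mean, na.rm=TRUE) #locations to years
plot(as.numeric(names(abs.means)), abs.means, type="l",
xlab="Year", ylab="degrees C")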
How much of the adjustment of dset0 is necessary if the two sets are so similar at the end of the adjustment process? Why build a gnarly black box of an analysis method if a simple and straightforward approach will do as well? When comparing dset0 and dset1 on a location by location basis the differences look pretty big in some cases, but apparently they don't make much difference overall. Perhaps adjustments are cancelling each other out.
Note that the plots look quite different from the plots of anomalies. The calculation of anomalies smooths the data considerably. This happens apparently because differences in locations are removed and many of the locations are not represented through the whole time period examined. The process of dealing with these difficulties is apparently handled by gridding the data, and for that I'll be looking into what McIntyre has found again.
Friday, January 01, 2010
Climate Sciencapades II
I might as well share this shocking discovery that I made this evening.
Shocking to me, at least.
You may recall from my previous post that I came up with a simple-minded way of calculating yearly anomalies, that is, by subtracting the mean of all data for a given location from each of that location's yearly means.
Hansen describes a much more complicated method of calculating the yearly anomalies. I won't go into it at this point, but I was concerned that my simple approach might make a significant difference.
And so I compared the two: anomalies calculated using Hansen's method against those from my simple method. Here is the result:
The black plot is dset1 with anomalies calculated by Hansen's method. The red dashed plot is dset1 with anomalies calculated using my method. If you have trouble seeing the red plot it's because it's right on top of the black one.
Here's a plot of the difference between the two:
So the difference is very small. Once again Hansen's complicated approach appears to be unnecessary.
Climate Sciencapades
What is one to do these days with regard to climate change except look at the data oneself? And so I've been doing some of that.
I downloaded raw temperature data sets from http://www.climateaudit.info/data/station/giss, then loaded the files into R:
load("giss.dset0.tab.old") #Data set 0, raw data with some adjustments
load("giss.dset1.tab") #Data set 1, stations at one location combined and adjusted
load("giss.info.tab") #Station data
I was especially interested in the heat island effect as a possible cause of modern land surface warming.
First I compared dset0 and dset1 to get an idea about how adjustments were made:
#Plot the combined dset1 series for one location (black, heavy line) with
#the raw dset0 series of its constituent stations overlaid in color.
pt = function(sta, giss.dset0, giss.dset1) {
if(!is.na(giss.dset1[[sta]][1])) {
color = c("red", "green", "blue", "brown", "coral", "darkgreen", "orange", "pink")
par(col="black",lwd=3)
substa = ""
for(i in 1:length(giss.dset0[[sta]])) {
substa = paste(substa, names(giss.dset0[[sta]])[i]) #subtitle lists the station ids
}
plot(giss.dset1[[sta]][[1]][[1]], giss.dset1[[sta]][[1]][[18]],
type="l", main=names(giss.dset1[[sta]]), xlab="Year", ylab="Temp",
sub=substa, xlim=c(1880,2009))
for(i in 1:length(giss.dset0[[sta]])) {
par(col=color[i],lwd = 1)
lines(giss.dset0[[sta]][[i]][[1]], giss.dset0[[sta]][[i]][[18]], lty=i)
}}}
keybd = function(key) TRUE #any keypress advances to the next location
for(sta in 1:length(giss.dset1)) {
pt(sta, giss.dset0, giss.dset1)
print(paste(sta, giss.info[[2]][[sta]], giss.info[[8]][[sta]], sep=" "))
getGraphicsEvent(onKeybd=keybd)
}
With this you can flip through locations and see how dset0 became dset1. One finds some pretty striking adjustments, such as segments of data that were left out and stations that were ignored. No doubt the reasons for doing this are good, but it's hard to figure out what they are from the data. For example:
The black plot is the data included in dset1. The colored plots are the raw data from individual stations in the same area that were combined to produce dset1. Note that the data from the third station, from about 1960 to 1985, was not included in dset1. Why? There were a number of examples of this in other locations.
Another example:
The readings of the two stations in the 1890 to 1975 period were adjusted upward by about 1 degree to produce the dset1 data. What justifies this?
In the final analysis it's the overall effect that's important. We are interested in whether or not the planet itself is getting warmer and by how much. Do these adjustments affect that measurement?
I'm mainly interested in trends, and so I dispensed with gridding of the data. This is valid since we are comparing the same locations with each other in the different sets. My simple-minded approach was to average all the data in dset0 by years and compare that to dset1.
#First get mean anomalies for dset0.
#Calculate anomalies as annual mean temps minus the mean of all temps
#for a given location. Throw out locations with less than 30 good
#years.
for(i in 1:length(giss.dset0))
for(j in 1:length(giss.dset0[[i]])) if(is.na(giss.dset0[[i]])[[j]]) next else
{
m = mean(giss.dset0[[i]][[j]]$ANN, na.rm=TRUE)
giss.dset0[[i]][[j]]$anom = giss.dset0[[i]][[j]]$ANN-m
}
years=0; anoms=0
for(i in 1:length(giss.dset0))
for(j in 1:length(giss.dset0[[i]])) if(is.na(giss.dset0[[i]])[[j]]) next else
if(length(giss.dset0[[i]][[j]]$anom)-sum(is.na(giss.dset0[[i]][[j]]$anom)) < 30) next else
{
if(length(years)==1) {
years = giss.dset0[[i]][[j]]$year
anoms = giss.dset0[[i]][[j]]$anom
}
else {
years = c(years, giss.dset0[[i]][[j]]$year)
anoms = c(anoms, giss.dset0[[i]][[j]]$anom)
}
}
dset0.t = data.frame("year"=years, "anom"=anoms)
dset0.means = tapply(dset0.t$anom, dset0.t$year, mean, na.rm=TRUE)
And then to dset1:
#Reorganize the data into a simpler data table and include location info
#for future comparison of rural vs urban stations.
#Put the code in a function for debugging in R.
extr1 = function(giss.dset1, giss.info) {
id = 0; year=0; temp=0; ur = 0; li = length(giss.dset1)
for (i in 1:li) if(sum(is.na(giss.dset1[[i]])) > 0) next else {
for(j in 1:length(giss.dset1[[i]][[1]][[1]])) {
id = c(id, giss.info[[2]][[i]])
ur = c(ur, giss.info[[8]][[i]])
}
year = c(year, giss.dset1[[i]][[1]][[1]])
temp = c(temp, giss.dset1[[i]][[1]][[18]])
}
return(data.frame("id"=id, "ur"=ur, "year"=year, "temp"=temp))
}
dset1.t = extr1(giss.dset1, giss.info)
#Calculate anomaly as annual temp minus mean of all temps
dset1.s = split(dset1.t, dset1.t[[1]])
for(i in 1:length(dset1.s)){
t = mean(dset1.s[[i]][[4]], na.rm=TRUE)
dset1.s[[i]]$anom = dset1.s[[i]]$temp - t
}
dset1.a = unsplit(dset1.s, dset1.t[[1]])
dset1.means = tapply(dset1.a$anom, dset1.a$year, mean, na.rm=TRUE) #yearly means of the location anomalies
#Comparison plot.
par(col="black")
plot(names(dset0.means), dset0.means, type="l", xlab="Year", ylab="degrees C",
main="Data Set1 vs Data Set 0", lwd=3)
abline(coef(line(names(dset0.means), dset0.means)))
par(col="red")
lines(names(dset1.means), dset1.means, lty=2,lwd=3)
abline(coef(line(names(dset1.means), dset1.means)))
And voila!
There is no significant difference between the sets. A very simple averaging of data collected from various stations gives the same plot as an adjusted data set made up of stations painstakingly combined together to represent a given location. At least this is the case where determining anomalies is concerned.
As for the urban effect on temperatures:
This is dset1. The black plot is all of dset1. The red plot is urban locations and the blue plot is rural locations. As you can see there is no difference to speak of in this adjusted data set between urban stations and rural ones. The plots and regression lines all fall on each other for the most part.
The next step in the processing and adjustment of this data is to "grid" it, meaning that an effort is made to determine measurements over the whole surface of the earth in grid squares of a few degrees each. Hansen spends considerable effort making adjustments for urban effects, but it looks like those adjustments are not necessary.