Showing posts with label missing data. Show all posts

Saturday, March 27, 2010

More Holes than a Swiss Cheese

It seems that looking at the effective data start date is not the only yardstick by which to evaluate the BoM "high quality" RCS stations. It is also necessary to graph the data to look for the additional holes in the data.

In the previous post, we noticed that several stations had data missing for 40 or more years beyond their notional commencement date. One of the stations looked at was 008039-Dalwallinu, which has a notional start date of 1912 but a data start date of 1955, leaving 43 years of missing data. On closer examination, the continuous data start date is actually 1957. A closer look again shows that data is missing for 1970 as well, which shows clearly on the graph of the data:

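For anyone who would rather check this without a graph, here is a minimal Python sketch of the same gap hunt. The function name `summarize_gaps` and the sample record are hypothetical: the year list is built only to match the description above (first data in 1955, unbroken from 1957, nothing for 1970), with an assumed end year of 2009, and is not the actual BoM file.

```python
def summarize_gaps(years_with_data, notional_start):
    """Summarize where a station's yearly record has holes.

    years_with_data: iterable of years for which the station has data.
    notional_start: the station's listed commencement year.
    """
    have = set(years_with_data)
    first, last = min(have), max(have)
    # Years with no data between the first and last year on record.
    internal_gaps = [y for y in range(first, last + 1) if y not in have]
    return {
        "data_start": first,
        "years_missing_at_start": first - notional_start,
        "internal_gaps": internal_gaps,
    }


# Hypothetical record shaped like the Dalwallinu description (assumed end: 2009).
dalwallinu_years = [1955] + [y for y in range(1957, 2010) if y != 1970]
result = summarize_gaps(dalwallinu_years, notional_start=1912)
print(result)
```

With the real yearly data loaded in place of the made-up list, the same function would report the data start date and every hole after it.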

There are several stations that have incomplete data in the early years, including:
As bad as they are, the samples above are by no means the worst of the bunch. Several of the stations have large gaps in the middle of the data as well, including:

014508 - Gove Airport - [data] [graph]

017043 - Oodnadatta Airport - [data] [graph]

063005 - Bathurst Agricultural Station - [data] [graph]

200790 - Christmas Island Airport - [data] [graph]

... and even 300000 - Davis Station - [data] [graph]

Summary
As noted in the previous post, 32 RCS stations have data missing from the beginning of the dataset. This analysis shows that 22 RCS stations have data missing beyond the initial start date. Eight of these were also in the previous post's list, which means that a total of 46 RCS stations (32 + 22 - 8, or a staggering 44.7% of the 103) have significant missing data.
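The tally is simple enough to check in a few lines of Python. The variable names are mine; the counts are the ones quoted in these posts, including the 103 stations in the RCS network.

```python
total_stations = 103    # RCS stations in the network
missing_at_start = 32   # data missing from the beginning of the record
missing_later = 22      # data missing beyond the initial start date
in_both = 8             # stations that appear in both lists

# Count each station once: add the two lists, then remove the overlap.
affected = missing_at_start + missing_later - in_both
share = 100 * affected / total_stations
print(f"{affected} of {total_stations} stations ({share:.2f}%)")
# prints "46 of 103 stations (44.66%)"
```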

In the HARRY_READ_ME.txt file from the CRU ("Climategate") files, the author of the READ_ME makes the following telling comments:

"getting seriously fed up with the state of the Australian data. so many new stations have been introduced, so many false references.. so many changes that aren't documented."
"Now looking at the dates.. something bad has happened, hasn't it. COBAR AIRPORT AWS cannot start in 1962, it didn't open until 1993!"
"I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was."
It is very hard to disagree with these comments. In this and the last two posts it has become apparent that:
  • Cunderdin station has been summarily dropped out of the RCS network and substituted by Cunderdin Airport (with no documentation to indicate the change).
  • Many stations exhibit the same problems as "Harry" observed with Cobar Airport.
  • Over 44% of the stations have significant amounts of data missing - in a number of cases, enough to make the data that is left unusable.
One does not have to be a rocket scientist to appreciate the frustration that "Harry" must have gone through in attempting to make sense of the RCS data. How bad is the RCS data? It's all hidden in plain view for anyone who takes the time to look.

Thursday, March 25, 2010

Where has all the data gone?

As noted yesterday, the BoM temperature data for the "new" Cunderdin station (010286 - Cunderdin Airfield) begins at 1996, while the station record indicates that the station commenced in 1942.

Apparently, there are 52 years worth of data missing from the record for this "new" Cunderdin station.

Which raises the question - is this a unique occurrence, resulting from the late-night changes at the BoM to substitute station 010286 for the inconvenient Cunderdin station 010035?

A look at the listing of RCS stations on the BoM site shows that there are 103 "high quality" stations in this network. They have notional station start dates ranging from 1860 through to 2003, with 65% of them having a notional start date prior to 1969. Overall, the average notional station start date is 1950.

However, when you look at the data start date, a different picture emerges. For openers, 32 of the stations (31%) have a data start date which is later than the station start date by more than a year, or have significant gaps in the data in the early years.

Some of these missing years of data are considerable - several are in the order of the 52 year gap noted for Cunderdin, above. For example:

008039 - Dalwallinu Comparison. [data] [graph]
Station date = 1912, Data date = 1955 (43 years missing)

010592 - Lake Grace PO. [data] [graph]
Station date = 1914, Data date = 1956 (42 years missing)

017031 - Marree Comparison. [data] [graph]
Station date = 1885, Data date = 1939 (54 years missing)

And several more in addition to these.
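The comparison behind these figures can be sketched in a few lines of Python. The three rows are the stations quoted above; the helper `years_missing` and the list layout are illustrative assumptions, not anything published by the BoM. The "more than a year" threshold is the one used earlier in this post.

```python
# (station id, name, notional start year, first year with data), as quoted above
stations = [
    ("008039", "Dalwallinu Comparison", 1912, 1955),
    ("010592", "Lake Grace PO", 1914, 1956),
    ("017031", "Marree Comparison", 1885, 1939),
]


def years_missing(station_start, data_start):
    """Years between the notional start and the first year with data."""
    return max(0, data_start - station_start)


# Flag stations whose data starts more than a year after the station did.
flagged = [
    (sid, name, years_missing(start, data))
    for sid, name, start, data in stations
    if years_missing(start, data) > 1
]

for sid, name, gap in flagged:
    print(f"{sid} - {name}: {gap} years missing")
```

Run against the full list of 103 stations, the same filter would give the count of stations with late-starting data, though it would not catch gaps inside the early years.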
 
The net effect of all this missing data is that for these 103 stations, the oldest station data start date moves forward by 4 years (to 1864), while the average station data start date moves forward by 9 years (to 1959).
 
Where's the Beef?
 
Just how "high quality" are the stations in the RCS network? What is the overall impact on the derived climate record of all these missing years? With so much data from the early years missing, how can anyone make a valid long-term climate assessment? Is this RCS data of any practical value at all?