A Critique of the BERKELEY EARTH scalpel: Cut away the signal, Analyze the noise

Composed 120814 – BEST Part 3 (unpublished),
Published to this blog on 140316

Until I see a defense of the Berkeley Earth BEST process from the standpoint of Fourier analysis (frequency content) and information theory, I cannot trust any conclusion BEST reaches on time scales longer than 10 years.

I stated my theoretical objection to the BEST scalpel back on April 2, 2011 in Expect the BEST, plan for the worst.

I believe the Fourier-domain behavior of the BEST process has received far too little scientific and theoretical attention. Until I see a defense, I will from time to time try to focus attention on the frequency domain of BEST, as in this post at Climate Audit, Nov. 1, 2011.

I have a fundamental problem with the use of any scalpel-and-suture technique in the context of determining long-term temperature trends. My objections are based upon Fourier analysis and information content. My argument is summarized in these bullet points:

1. The Natural climate and Global Warming (GW) signals are extremely low frequency, less than a cycle per decade.

2. A fundamental theorem of Fourier analysis is that the frequency resolution is
df = dw/2π = 1/(N·dt) Hz,
where dt is the sample interval and
N·dt is the total length of the digitized signal.

3. The GW climate signal, therefore, is found in the very lowest frequencies, low multiples of dw, which can only come from the longest time series.

4. Any scalpel technique destroys the lowest frequencies in the original data.

5. Suture techniques recreate long term digital signals from the short splices.

6. Sutured signals contain very low frequency content, frequencies which could NOT exist in any of the splices. Therefore the low frequencies, the most important part for climate analysis, must derive entirely from the suture and the surgeon wielding it. Where are the low-frequency original data to control the results?
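The resolution argument in points 1-6 can be checked with a few lines of arithmetic. This is a minimal sketch assuming monthly sampling; the record lengths and the 60-year cycle are illustrative choices, not anything taken from BEST:

```python
dt = 1.0 / 12.0                    # sample interval: one month, in years
n_full = 1200                      # a 100-year record, monthly samples
n_frag = 120                       # a 10-year fragment left by the scalpel

# Frequency resolution df = 1/(N*dt): the lowest nonzero frequency a
# record of length N*dt can represent.
df_full = 1.0 / (n_full * dt)      # 0.010 cycles/yr for the full record
df_frag = 1.0 / (n_frag * dt)      # 0.100 cycles/yr for the fragment

f_climate = 1.0 / 60.0             # e.g. a 60-year natural cycle
print(f"full record resolves down to {df_full:.3f} cycles/yr")
print(f"fragment resolves down to    {df_frag:.3f} cycles/yr")
print(f"60-year cycle sits at        {f_climate:.4f} cycles/yr")
```

The 60-year cycle lies well below the fragment's lowest resolvable frequency: whatever the suture later produces, no individual 10-year fragment carries any information below one cycle per decade.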

Perhaps it can be argued, demonstrated, and proved that somehow the low frequencies were extracted, saved, and returned to the signal intact. Statements like the following from Muller (WSJ Eur 10/20/2011) make me believe that most people do not appreciate this problem.

Many of the records were short in duration, … statisticians developed a new analytical approach that let us incorporate fragments of records. By using data from virtually all the available stations, we avoided data-selection bias. Rather than try to correct for the discontinuities in the records, we simply sliced the records where the data cut off, thereby creating two records from one.

“Avoided data-selection bias” – and embraced high-frequency selection bias, and created a bias against low frequencies. There is no free lunch here. Look at what is happening in the Fourier domain. You are throwing away low-frequency climate signal and keeping the higher-frequency weather noise. How can you possibly be improving the climate signal-to-noise ratio?

Climate is a low-frequency signal. The farther you go back in time, the lower the frequencies you need in your analysis. Yet what does BEST do with all their data? Throw all the temperature records into the Cuisinart, chop them into bits to “avoid data selection bias” (and induce who knows what sort of bias in the choice of where to use the scalpel), and unavoidably eliminate all the low frequencies in the data.

Yes, somehow they take all these fragments and glue them back together to present a graph of temperatures from 1750 to 2010. That graph has low-frequency content – but where did it come from? The low frequencies are counterfeit: contamination from the gluing process, manufacturing what appears to be low-frequency signal by fitting high-frequency data. How can low frequencies be recreated from lots of fragments containing only high frequencies?
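A minimal numpy sketch of the gluing problem, with illustrative numbers. This is not the BEST pipeline, which estimates offsets statistically; the point is only that the fragments themselves carry essentially no trend:

```python
import numpy as np

t = np.arange(1200)                      # monthly samples, 100 years
record = 0.001 * t                       # a steady warming trend

fragments = record.reshape(10, 120)      # ten 10-year fragments
# The scalpel leaves each fragment pinned only relative to itself:
# discard each fragment's absolute offset.
fragments = fragments - fragments.mean(axis=1, keepdims=True)
glued = fragments.ravel()                # naive re-concatenation

slope_orig = np.polyfit(t, record, 1)[0]
slope_glued = np.polyfit(t, glued, 1)[0]
print(f"original slope: {slope_orig:.6f}")
print(f"glued slope:    {slope_glued:.6f}")
```

The glued series retains about one percent of the original slope; any century-scale trend in the final product has to come from the offset estimation, not from the fragments.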

Amplitude and phase vs. frequency is the dual formulation of amplitude vs. time; there is a one-to-one correspondence between them. If you apply a filter that eliminates the low frequencies in the Fourier domain, and a scalpel does exactly that, where does that content ever come back?
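The filtering point is easy to demonstrate with numpy's FFT standing in for the scalpel (a toy sketch; the eight-bin cutoff is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1024)              # an arbitrary "temperature" series

X = np.fft.rfft(x)
X[:8] = 0.0                            # high-pass filter: kill the 8 lowest bins
x_filtered = np.fft.irfft(X, n=x.size)

# Re-examine the filtered series: its lowest-frequency bins are zero to
# floating-point precision, and no rearrangement of the surviving
# high-frequency content can restore them.
low_power = np.abs(np.fft.rfft(x_filtered)[:8])
print(f"max residual low-frequency amplitude: {low_power.max():.2e}")
```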

A beautiful example of the frequency content I expect to find in century-scale uncut temperature records is Fig. 2 of Liu 2011, also found in WUWT 12/7/2011: In China there are no hockey sticks. The grey area on the left of the Fig. 2 chart is the low-frequency region, the climate signal. In the Liu study, much of the power lies in that grey area. It is this portion of the spectrum that BEST’s scalpel removes! Fig. 4 of Liu 2011 (at JoNova) is a great illustration of what happens to a signal as you first add the lowest frequency and then successively add higher frequencies.

I’m a geophysicist. Geophysical seismic processing is heavily dependent upon Fourier analysis. What I see BEST doing is eliminating low frequencies with the scalpel, performing some magic semi-regional homogenization of the high-frequency segments behind the scenes, then returning a result with “better” low-frequency content in it. I would sooner believe that the 2nd Law of Thermodynamics could be violated. How did they get something for nothing? How did they throw away the low frequencies only to get back “better low frequencies” in the result?

In petroleum exploration, seismic data is recorded with band-pass instrumentation. The highest frequencies cut off anywhere from 60 to 250 Hz, depending upon the care and expense of acquisition. But the data are also limited on the low side, with 6 to 10 Hz as the lowest frequency the receivers can record. Geophysicists gather lots of data: many shots heard by many receivers, repeated from many locations. There is lots of noise in the recordings, but by “stacking” the data, repeating data from the same place over many shots, signal-to-noise increases. One of the key steps in the processing is finding the best “stacking velocity”, the average earth velocity used to correct for “move-out” (travel-time differences between source-receiver pairs) and for migration.
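The stacking step can be sketched in a few lines. The signal shape, noise level, and the `stack` helper are illustrative assumptions, not real processing software:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 500)
signal = np.sin(2 * np.pi * 5 * t)      # the repeatable "earth response"

def stack(n_shots, noise_sigma=1.0):
    """Average n_shots noisy recordings of the same underlying signal."""
    traces = signal + rng.normal(scale=noise_sigma, size=(n_shots, t.size))
    return traces.mean(axis=0)

err_1 = np.std(stack(1) - signal)       # residual noise, single shot
err_100 = np.std(stack(100) - signal)   # residual noise, 100-fold stack
ratio = err_1 / err_100
print(f"noise reduction from 100-fold stacking: {ratio:.1f}x")
```

Averaging 100 repeats cuts the noise by roughly sqrt(100) = 10, which is the whole point of gathering redundant shots.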

I explain this because what BEST is attempting is very similar to what seismic processors do when they invert the seismic data to obtain a full impedance profile. What must be understood is that a full inversion needs two things:
(1) the band-pass seismic data, for the high-frequency detail, and
(2) the velocity-density profile, which provides the low-frequency information.
When we invert, we integrate the seismic data, but that means we integrate the noise, too, so the error grows with time. For (2), we get the velocity information from the velocity studies that maximize the signal-to-noise in the stacked data. Density can be estimated from the anticipated rock, depth, and fluid content. It is very model dependent, but it is controlled by the stacking and move-out process and serves as an independent check on the cumulative inversion error in the band-pass data from (1).

Returning to BEST, all those fragments of temperature records are equivalent to the band-pass seismic data. Finding the long-term temperature signal is equivalent to inverting the seismic trace, and the error in the data must likewise accumulate as you go back in time. Since the temperature record fragments are missing the lowest frequencies, where is the low-frequency control in the BEST process? In the seismic world, we have the velocity studies to control the low-frequency result.

What does BEST use to constrain the accumulating error? What does BEST use to provide valid low-frequency content from the data? What is the check that the BEST result is not just a regurgitation of the modelers’ preconceptions and contamination from the suture glue? Show me the step in the BEST process that preserves real low-frequency climate data from the original temperature records. Only then can I even begin to give Berkeley Earth results any credence.


Here is a map of the stations from the Global Summary of Day TAVG and the Monthly Climatic Data of the world, 25,569 sites, colored by elevation. No distinction yet between stations with 1 year of records or 50 years.
