Pubsplained #2: How many forams for a good climate signal?


Thirumalai, K., J. W. Partin, C. S. Jackson, and T. M. Quinn (2013), Statistical constraints on El Niño Southern Oscillation reconstructions using individual foraminifera: A sensitivity analysis, Paleoceanography, 28(3), 401–412, doi:10.1002/palo.20037. (Free Access!)


Pubsplained #2: We provide a method to quantify uncertainty in estimates of past climate variability using foraminifera. This technique uses numerous, individual shells within a sediment sample and analyzes their geochemistry to reconstruct seasonal and year-to-year variations in environmental conditions.

Here is a link to our code.


This plot shows how uncertainty in IFA statistics decreases (but not all the way!) as you increase the number of foraminiferal shells analyzed.

Planktic foraminifera are tiny, unicellular zooplankton that are widely found in the open ocean and can tolerate a large range of environmental conditions. During their short (2-4 week) lifespan, they build shells (or tests) made of calcium carbonate. The tests fall to the seafloor and are gradually buried by sediments over time. We can access these foraminiferal tests using sediment cores and analyze their geochemistry to unravel all sorts of things about past ocean conditions.

Typically, ~10-100 shells of a particular species are taken from a sediment sample and analyzed collectively for their isotopic or trace metal composition. This procedure is repeated with each subsequent sample as you move down in the core. Each of these measurements provides an estimate of the "mean climatic state" during the time represented by the sediment sample. In contrast, individual foraminiferal analyses (IFA), i.e., the geochemistry of each shell within a sample, can provide information about month-to-month fluctuations in ocean conditions during that time interval. The statistics of IFA have been used to compare and contrast climate variability between various paleoclimate time periods.

There are many questions regarding the uncertainty and appropriate interpretation of IFA statistics. We addressed some of these issues in this publication. We provided code that forward-models modern observations of ocean conditions and approximates, with uncertainty, the minimum number of foraminiferal tests required for a skillful reconstruction. In other words: "how many shells are needed for a good climate signal?"
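To give a feel for the idea, here is a toy Python sketch. It is not the code from the paper. It assumes a made-up monthly temperature series (the amplitudes, the 4-year oscillation, and the noise level are all invented for illustration), treats each shell as one randomly picked month, and uses Monte Carlo resampling to see how the spread of the IFA standard deviation changes with the number of shells. The names synthetic_sst and ifa_std_interval are hypothetical.

```python
import math
import random


def synthetic_sst(n_years=50, seasonal_amp=1.0, enso_amp=1.5, seed=0):
    """Toy monthly SST (deg C): annual cycle + 4-year oscillation + noise (all assumed)."""
    rng = random.Random(seed)
    series = []
    for m in range(12 * n_years):
        seasonal = seasonal_amp * math.cos(2 * math.pi * m / 12)
        interannual = enso_amp * math.sin(2 * math.pi * m / 48)
        series.append(27.0 + seasonal + interannual + rng.gauss(0.0, 0.2))
    return series


def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def ifa_std_interval(series, n_shells, n_trials=2000, seed=1):
    """Monte Carlo 95% interval of the std of n_shells randomly picked months."""
    rng = random.Random(seed)
    # Each trial mimics picking n_shells individual forams, one month per shell.
    stds = sorted(sample_std(rng.choices(series, k=n_shells)) for _ in range(n_trials))
    return stds[int(0.025 * n_trials)], stds[int(0.975 * n_trials) - 1]


sst = synthetic_sst()
widths = {}
for n in [10, 30, 100, 300]:
    lo, hi = ifa_std_interval(sst, n)
    widths[n] = hi - lo
    print(f"n = {n:3d} shells: 95% interval of IFA std = [{lo:.2f}, {hi:.2f}] deg C")
```

The interval narrows as more shells are picked. This sketch leaves out analytical error, sedimentation rate, and species preferences, so it should be read only as an illustration of the resampling logic.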

Armed with this algorithm, we tested various cases in the Pacific Ocean to obtain better estimates of past changes in the El Niño/Southern Oscillation, a powerful mode of present-day climate variability. We found that the interpretation of IFA statistics is tightly linked to the study location's climate signal. Namely, we found that the ratio of seasonality¹ to interannual variability² at a site controlled the IFA signal for a given species occurring throughout the year. We then demonstrated that this technique is far more sensitive to changes in El Niño amplitude than to changes in its frequency.
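As a rough illustration of that ratio, the Python sketch below builds two invented monthly series and compares them. The definitions used here are assumptions made for the sketch: seasonality is taken as the range of the mean annual cycle, and interannual variability as the standard deviation of the monthly anomalies about that cycle. The paper's exact definitions may differ, and the amplitudes are made up.

```python
import math


def monthly_series(seasonal_amp, interannual_amp, n_years=48):
    """Toy monthly anomalies: annual cycle plus a 4-year oscillation (assumed shape)."""
    return [
        seasonal_amp * math.cos(2 * math.pi * m / 12)
        + interannual_amp * math.sin(2 * math.pi * m / 48)
        for m in range(12 * n_years)
    ]


def seasonality_to_interannual_ratio(series):
    """Range of the mean annual cycle divided by the std of the monthly anomalies."""
    n_years = len(series) // 12
    # Mean annual cycle: average of each calendar month over all years.
    climatology = [sum(series[k::12]) / n_years for k in range(12)]
    seasonality = max(climatology) - min(climatology)
    # Anomalies: what is left after removing the mean annual cycle.
    anomalies = [v - climatology[m % 12] for m, v in enumerate(series)]
    mean_anom = sum(anomalies) / len(anomalies)
    interannual = math.sqrt(sum((a - mean_anom) ** 2 for a in anomalies) / len(anomalies))
    return seasonality / interannual


# Hypothetical sites: weak vs. strong annual cycle, same year-to-year signal.
weak_cycle_site = seasonality_to_interannual_ratio(monthly_series(0.3, 1.5))
strong_cycle_site = seasonality_to_interannual_ratio(monthly_series(2.0, 1.5))
print(f"weak annual cycle:   ratio = {weak_cycle_site:.2f}")
print(f"strong annual cycle: ratio = {strong_cycle_site:.2f}")
```

A ratio well below 1 means year-to-year changes dominate the spread of monthly values, while a ratio well above 1 means the annual cycle does.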

In the central equatorial Pacific, where the seasonal cycle is minimal and year-to-year changes are strong, we showed that IFA is skillful at reconstructing El Niño. In contrast, the eastern equatorial Pacific surface ocean is a region where El Niño anomalies are superimposed on a large annual cycle. Here, IFA is better suited to estimating past seasonality, and attempting to reconstruct El Niño is problematic. Such a pursuit becomes more complicated due to past changes in the synchrony of El Niño and seasonality.

Our results also suggest that different species of foraminifera, found at different depths in the water column, or with a particular seasonal preference for calcification, might have more skill at recording past changes in El Niño. However, care should be taken in these interpretations too because these preferences (which are biological in nature) might have changed in the past as well (with or without changes in El Niño).

You can use our MATLAB™ code, called INFAUNAL, to generate your own probability distributions of the sensitivity of IFA towards seasonality or interannual variability for a given sedimentation rate, number of foraminifera, and climate signal at a core location in the Pacific. Do let me know if you have any difficulties running the code!

1 - The difference in environmental conditions between summer and winter, averaged over multiple years

2 - Changes from year to year (could be winter-to-winter or summer-to-summer etc.) within the time period represented by the sediment sample

Book Review: Indica by Pranay Lal


Balancing the nuanced and involved intricacies of the scientific method versus proselytizing the fantastic “factoids” of popular science is a tough act. Having to straddle this line to focus on the geology and geobiological history of the Indian subcontinent, an ambitiously multidisciplinary topic, on which there are scant accessible texts (popular science or not), is an even tougher act to follow. Fortunately, Pranay Lal manages to achieve such a balance and convey his infectious enthusiasm about the subject matter rather effectively for the most part of Indica’s ~400 pages.

It was refreshing and enjoyable to learn new geological and paleontological information about the Indian subcontinent - a topic dear to my heart. The detailed place-markers and the McPhee-esque narratives of sites where geological features are found scattered throughout India were highly interesting. The accompanying photographs and schematics are also very nicely done. You can quickly see that Lal put hours and hours of (non-book-based) research into Indica, and it shows. It felt as if Indica was an attempt to channel Sagan or Bryson or Winchester but with a focus on the history of the Indian subcontinent. This is a fantastic idea, and frankly, it’s puzzling that it took so long for someone to do it. However, it becomes apparent through Lal’s reporting that it is quite challenging to piece together and chronicle information on such a vastly “big-picture” topic, especially when construction, urban expansion, and apathy are on a path to eroding many of India’s geological marvels.

Lal is a geneticist by training and his disposition towards anthropology, biology, and paleontology becomes discernible as his writing on these topics shines. For example, his narrative on the evolutionary history of the recently discovered Indian purple frog (Nasikabatrachus sahyadrensis), its evolutionary ties to another frog found in Seychelles, and its parallels to the tuatara or kiwi was a treat to read. Moreover, the lengthy descriptions of India’s Phanerozoic paleoenvironment and the medley of dinosaurs that walked on the subcontinent were entertaining. The closing chapters on hominid evolution and India’s potential contribution to this story were thought-provoking.

As a downside to Indica, there are many small inaccuracies, and several points are conveyed with more certainty than the evidence warrants. My friend Suvrat Kher has an excellent blog post on many problematic sections dealing with sedimentology, tectonics, and mantle dynamics. I can echo Suvrat’s concerns in the paleomonsoon and paleoclimate domain where, amongst other things, Lal makes it seem as if we have a more concrete picture of the vagaries of the monsoon, its initiation, and its intensification than we actually do. Many of these points amount to more than sheer nitpicking. Ultimately, these inaccuracies are a significant downside to Indica, and I wonder about errors revolving around geobiology and other realms removed from my own field. Nevertheless, these inaccuracies did not prevent me from puzzling about them for a few minutes and moving on, driven by Lal’s ardor (one day, on my second read, I might find the time to write down my concerns as well and as thoroughly as Suvrat did).

As a closing statement, Indica is for anyone and everyone interested in the geological natural history of the Indian subcontinent. It should be mandatory reading for anyone working on the topic, and more importantly, for students/workers who do read it, I recommend trying to spot the inaccuracies and perhaps making a list.

Pubsplained #1: How to fit a straight line through a set of points with uncertainty in both directions?


Thirumalai, K., Singh, A., & Ramesh, R. (2011). A MATLAB™ code to perform weighted linear regression with (correlated or uncorrelated) errors in bivariate data. Journal of the Geological Society of India, 77(4), 377–380. doi: 10.1007/s12594-011-0044-1


We present a code that fits a line through a set of points (“linear regression”). It is based on math first described in 1966 that provides general and exact solutions to the multitude of linear regression methods out there. Here is a link to our code.


Fitting a straight line through a bunch of points with X and Y uncertainty.

My first peer-reviewed publication in the academic literature described a procedure to perform linear regression, or, in other words, build a straight line (of “best fit”) through a set of points. We wrote our code in MATLAB and applied it to a classic dataset from Pearson (1901).

“Why?”, you may ask, perhaps followed by “doesn’t MATLAB have linear regression built into it already?” or “wait a minute, what about polyfit?!”

Good questions, but here’s the kicker: our code numerically solves this problem when there are errors in both x and y variables… and… get this, even when those errors might be correlated! And if someone tells you that there is no error in the x measurement or that errors are rarely correlated - I can assure you that they are most probably erroneous.

York was the first to find general solutions for the “line of best fit” problem when he was working with isochron data where the abscissa (x) and ordinate (y) axis variables shared a common term (and hence resulted in correlated errors). He first published the general solutions to this problem in 1966 and subsequently published the solutions to the correlated-error problem in 1969.

If these solutions were published so long ago, why are there so many different regression techniques detailed in the literature? Well, it’s always useful to have different approaches to solving numerical problems, but as Wehr & Saleska (2017) point out in a nifty paper from last year, the York solutions have largely remained internal to the geophysics community (in spite of 2000+ citations), escaping even the famed “Numerical Recipes” textbooks. Furthermore, they state that there is abundant confusion in the isotope ecology & biogeochemistry community about the myriad available linear regression techniques and which one to use when. I can somewhat echo that feeling when it comes to calibration exercises in the (esp. coral) paleoclimate community. A short breakdown of these methods follows.

Ordinary Least Squares (OLS) or Orthogonal Distance Regression (ODR) or Geometric Mean Regression (GMR): which one to use?!

Although each one of these techniques might be more appropriate for certain sets of data versus others, the ultimate take-home message here is that all of these methods are approximations of York’s general solutions, when particular criteria are matched (or worse, unknowingly assumed).

  • OLS provides unbiased slope and intercept estimates only when the x variable has negligible errors and when the y error is normally distributed and does not change from point to point (i.e. no heteroscedasticity).
  • ODR, formulated by Pearson (1901), works only when the variances of the x and y errors do not change from point-to-point, and when the errors themselves are not correlated. ODR also fails to handle scaled data, i.e., slopes and intercepts derived from ODR do not scale if the x or y data are scaled by some factor. Note that ODR is also called “major axis regression”.
  • GMR transforms x and y data and can thus scale estimates of the slope and intercept, but works only when the ratio of the standard deviation of x to the standard deviation of the error on x is equal to that same ratio in the y coordinate.
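To see how the three estimators differ on the same data, here is a small Python sketch using the textbook closed-form slopes (written for this post, not taken from our MATLAB code). The data are synthetic and the noise levels are arbitrary assumptions. It also rescales y by a factor of 10 to show the scaling behavior described above.

```python
import math
import random


def moments(x, y):
    """Centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares of y on x."""
    sxx, _, sxy = moments(x, y)
    return sxy / sxx


def major_axis_slope(x, y):
    """Orthogonal distance (major axis) regression slope."""
    sxx, syy, sxy = moments(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


def gmr_slope(x, y):
    """Geometric mean (reduced major axis) regression slope."""
    sxx, syy, sxy = moments(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)


# Synthetic data: true line y = 2x + 1, with noise added to both x and y.
rng = random.Random(42)
x_true = [10.0 * i / 49 for i in range(50)]
x = [xt + rng.gauss(0.0, 0.5) for xt in x_true]
y = [2.0 * xt + 1.0 + rng.gauss(0.0, 2.0) for xt in x_true]
y_scaled = [10.0 * yi for yi in y]  # same data, y expressed in different units

for name, fit in [("OLS", ols_slope), ("ODR", major_axis_slope), ("GMR", gmr_slope)]:
    print(f"{name}: slope = {fit(x, y):.3f}, slope after scaling y by 10 = {fit(x, y_scaled):.3f}")
```

The OLS and GMR slopes are multiplied by exactly 10 when y is rescaled, whereas the ODR slope is not. None of the three functions takes the per-point measurement uncertainties as input.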

Most importantly, and perhaps quite shockingly, NONE of these methods involve the actual measurement uncertainty from point-to-point in the construction of the ensuing regression. Essentially, each method is an algebraic approximation of York’s equations, and whereas his equations have to be solved numerically in their most general form, they provide the least biased estimates of the slope and intercept for a straight line. In 2004, York and colleagues showed that his 1969 equations (based on least-squares estimation) were also consistent with (newer) methods based on maximum likelihood estimation when dealing with (correlated or uncorrelated) bivariate errors. Our paper in 2011 provides a relatively fast way to iteratively solve for the slope and intercept.
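For readers who want to see the shape of the iteration, below is a minimal Python sketch of the York-type solution in the unified form given by York and colleagues (2004). It is an illustrative re-implementation, not our published MATLAB code. The function name york_fit, the tolerance, and the example numbers are assumptions made for the sketch.

```python
import math


def york_fit(x, y, sx, sy, r=None, tol=1e-10, max_iter=200):
    """Fit y = a + b*x with errors in x and y (1-sigma sx, sy) and correlations r.

    Returns (a, b, sigma_a, sigma_b).
    """
    n = len(x)
    r = [0.0] * n if r is None else r
    wx = [1.0 / s ** 2 for s in sx]  # weights from the x uncertainties
    wy = [1.0 / s ** 2 for s in sy]  # weights from the y uncertainties
    alpha = [math.sqrt(p * q) for p, q in zip(wx, wy)]

    # Starting guess: ordinary least-squares slope of y on x.
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))

    for _ in range(max_iter):
        # Combined weight of each point for the current slope estimate.
        W = [p * q / (p + b * b * q - 2.0 * b * ri * ai)
             for p, q, ri, ai in zip(wx, wy, r, alpha)]
        sw = sum(W)
        xbar = sum(w * xi for w, xi in zip(W, x)) / sw
        ybar = sum(w * yi for w, yi in zip(W, y)) / sw
        U = [xi - xbar for xi in x]
        V = [yi - ybar for yi in y]
        beta = [w * (u / q + b * v / p - (b * u + v) * ri / ai)
                for w, u, v, p, q, ri, ai in zip(W, U, V, wx, wy, r, alpha)]
        b_new = (sum(w * be * v for w, be, v in zip(W, beta, V))
                 / sum(w * be * u for w, be, u in zip(W, beta, U)))
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if converged:
            break

    a = ybar - b * xbar
    # Standard errors from the adjusted (least-squares) x values.
    x_adj = [xbar + be for be in beta]
    x_adj_bar = sum(w * xa for w, xa in zip(W, x_adj)) / sw
    sigma_b = math.sqrt(1.0 / sum(w * (xa - x_adj_bar) ** 2 for w, xa in zip(W, x_adj)))
    sigma_a = math.sqrt(1.0 / sw + x_adj_bar ** 2 * sigma_b ** 2)
    return a, b, sigma_a, sigma_b


# Made-up example: five points with errors in both coordinates, correlated at r = 0.3.
a, b, sigma_a, sigma_b = york_fit(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1],
    sx=[0.1] * 5, sy=[0.2] * 5, r=[0.3] * 5,
)
print(f"intercept = {a:.3f} +/- {sigma_a:.3f}, slope = {b:.3f} +/- {sigma_b:.3f}")
```

Unlike the closed-form estimators, every point enters with its own x error, y error, and error correlation. When the x errors are made negligible and the y errors equal, the slope from this sketch collapses to the OLS value.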

In our publication, besides the Pearson data, we also applied our algorithm to perform “force-fit” regression - a unique case where one point is almost exactly known (i.e. very little error and near-infinite weight) - on meteorite data and showed that our results were consistent with published data.
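The force-fit idea can be illustrated with a deliberately simplified Python sketch. To keep it short, it assumes errors in y only (ordinary weighted least squares) rather than the full bivariate solution, and the data, the anchor point, and the name weighted_line_fit are all made up. Giving one point a tiny uncertainty, and hence an enormous weight, pins the line to that point.

```python
def weighted_line_fit(x, y, sigma_y):
    """Weighted least squares for y = a + b*x, with errors in y only (a simplification)."""
    w = [1.0 / s ** 2 for s in sigma_y]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    return ybar - b * xbar, b


# Hypothetical data: the first point is the "almost exactly known" anchor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.1, 3.9, 5.2, 5.8]
sigma_y = [1e-6, 0.2, 0.2, 0.2, 0.2]  # tiny error on the anchor -> near-infinite weight

intercept, slope = weighted_line_fit(x, y, sigma_y)

# For comparison: slope of the best line constrained to pass exactly through the anchor.
x0, y0 = x[0], y[0]
w_rest = [1.0 / s ** 2 for s in sigma_y[1:]]
forced_slope = (sum(wi * (xi - x0) * (yi - y0) for wi, xi, yi in zip(w_rest, x[1:], y[1:]))
                / sum(wi * (xi - x0) ** 2 for wi, xi in zip(w_rest, x[1:])))

print(f"heavily weighted fit: slope = {slope:.4f}, value at anchor = {intercept + slope * x0:.4f}")
print(f"constrained fit:      slope = {forced_slope:.4f}")
```

The two slopes agree to many decimal places, which is the sense in which a near-infinite weight "forces" the fit through the known point.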

All in all, if you want to fit a line through a bunch of points in an X-Y space, you won’t be steered too far off course by using our algorithm.