Channel: Species In Space

Visualizing ENMs in environment space

Hey, wouldn't it be nice if you could look at the predictions your ENM makes in environment space?  Well now you can!*

*two-dimensional plots only, only works with enmtools.model objects, offer void in Nebraska

With a new function that I just uploaded last night, you can take any enmtools.model object and a set of environment layers, and you can visualize the response of your model to those two layers.  Cool, huh?  Check this out:

allogus.glm = enmtools.glm(pres ~ layer.1 + layer.2 + layer.3 + layer.4, allogus, env)

plot(allogus.glm)




env.plots = visualize.enm(allogus.glm, env, layers = c("layer.1", "layer.2"))

env.plots

The first plot shows us the predicted suitability in environment space for two variables (layer.1 and layer.2), while holding the remaining variables constant at their mean value across all presence points.  Here's that GLM:





OH GOD THAT'S APPALLING.  It does make sense given our geographic projection of the model, though - many occurrence points have low suitability scores.  So what happened?  The second plot gives us some insight.




The colored background here shows the relative density of our background points in environment space.  And this really points up one of the most important conceptual issues with ENMs, one that's worth having a good think about: presence/background methods are trying to estimate a function that distinguishes your occurrence points from your background points.  Many of the occurrences for this species are very similar to the bulk of the background data.  As a result, these models tend to emphasize occurrences that fall in areas of environment space that are under-represented in the background.  The model is essentially being "pulled" towards the points that occur in the black/purple areas of the background density plot, and as a result it extrapolates heavily into the bottom left of environment space.  Where, it should be noted, we have no data of any sort.  Yikes.

Cathy Newman asked via Twitter whether there were any of these plots that don't look funky, and the answer so far is "not many"!  For Bioclim and Domain models you often get something that looks fairly reasonable, even though the geographic prediction may not be great.  For example, here's a Domain model for another species:




Doesn't look as insane in environment space as that GLM up there, but as you can see the predicted habitat suitability is not a great reflection of the species' distribution.  Which one of those models is more believable and/or useful is a very good question that I'm not going to delve into just now.  I do think these plots are really useful and interesting for thinking about the modeling process, even though what they usually tell us is that our pretty maps are often associated with shockingly weird estimates of the underlying ecology.

All of the above is in the current version of ENMTools on GitHub.  It does only work with enmtools.model objects, so you're going to need to walk through how to build those first.  There's a nice readme on the GitHub landing page that should explain a lot.




Side note: the limits of the x and y axes are set by the max and min for each layer in the environmental layers you provide.  If you want to zoom in, you can add xlim and ylim arguments after the fact.  For instance, to zoom into the first environment space plot up at the top there, we could do:

env.plots$suit.plot + xlim(c(1500,3000)) + ylim(c(900, 2100))






Automatic formula building for enmtools.glm()

I just added a new bit of functionality to enmtools.glm.  Nothing major, but it's kinda cool and it saves a bit of time when building GLMs that are strictly an additive function of a set of environment layers.  It used to be that you'd have to pass enmtools.glm() a formula object in order to build a model, e.g.:

ahli.glm = enmtools.glm(f = pres ~ layer.1 + layer.2 + layer.3 + layer.4, species = ahli, env = env, test.prop = 0.2)

You CAN still do that, but now you no longer have to.  Now if you don't pass it a formula, it assumes that your formula takes the form:

presence ~ layer.1 + layer.2 + ...

For every layer in your environment stack.  That means you can just call it like the other enmtools modeling functions, e.g.,

ahli.glm = enmtools.glm(species = ahli, env = env, test.prop = 0.2)

And it will use all the layers in env.  If you want a strictly additive formula that just uses a subset of those layers, you can just pass a subset of them to the env argument, like this:

ahli.glm = enmtools.glm(species = ahli, env = env[[c(1,3,5)]], test.prop = 0.2)

The function will then build a model using the formula

presence ~ env[[1]] + env[[3]] + env[[5]]

And ignore all the other layers.
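For the curious, building that additive formula from a stack's layer names is basically a one-liner in base R.  Here's a sketch of the idea (my own illustrative helper, not necessarily the exact code enmtools.glm uses internally):

```r
# Build an additive formula programmatically from a vector of layer names.
# Illustrative helper; not necessarily enmtools.glm's internal code.
build.additive.formula <- function(layer.names, response = "presence") {
  as.formula(paste(response, "~", paste(layer.names, collapse = " + ")))
}

f <- build.additive.formula(c("layer.1", "layer.2", "layer.3", "layer.4"))
deparse(f)
# [1] "presence ~ layer.1 + layer.2 + layer.3 + layer.4"
```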

If you want some sort of more complicated functional response (polynomials, interactions, etc.), you'll still need to supply the formula manually.

Check out those GAMs!

As of yesterday afternoon (Australia time), ENMTools can now do GAMs as well!  That includes all of the hypothesis tests, visualization, etc. that you get with the other methods.

Just like the recent update to the enmtools.glm() function, enmtools.gam() has the ability to automatically build a formula if you don't supply one.  For instance, if you have four layers in a stack called "env" named "layer.1", "layer.2", etc. and call enmtools.gam() thusly:

ahli.gam = enmtools.gam(ahli, env)

The function will automatically build the formula:

presence ~ s(layer.1, k = 4) + s(layer.2, k = 4) + s(layer.3, k = 4) + s(layer.4, k = 4)

As you can see above, the default value for k (the basis dimension for each smooth term) is 4.  This is not necessarily optimal, though, and it's definitely worth exploring for your specific data.  You can either supply GAM formulas manually (using the "f" argument to enmtools.gam), or you can just provide a "k" argument, e.g.,

ahli.gam = enmtools.gam(ahli, env, k = 6)

Which produces

presence ~ s(layer.1, k = 6) + s(layer.2, k = 6) + s(layer.3, k = 6) + s(layer.4, k = 6)

Formula arguments or k values can also be passed to the hypothesis testing functions (identity, background, rangebreak, etc.).
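Under the hood these are mgcv-style GAMs (the UBRE scores and smooth-term tables in the model summaries are mgcv output), so if you want to get a feel for what k does you can play with the same kind of model directly.  A self-contained sketch on made-up data:

```r
# Self-contained mgcv sketch of the kind of model enmtools.gam() builds:
# a binomial GAM of presence on a smooth with basis dimension k.
# The data here are simulated purely for illustration.
library(mgcv)

set.seed(1)
n <- 500
layer.1 <- runif(n, 0, 100)
presence <- rbinom(n, 1, plogis(-4 + 0.05 * layer.1))

m <- gam(presence ~ s(layer.1, k = 4), family = binomial)
summary(m)$s.table  # edf, Ref.df, Chi.sq and p-value for the smooth term
```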

Ecospat tests!

I haven't added this to the vignette on GitHub yet (because I just literally got it working a couple of minutes ago), but I've leveraged the enmtools.species object structure to make the equivalency and similarity tests from ecospat much more accessible.

For those that aren't familiar, these tests essentially do the same sort of thing that the identity/equivalency and background/similarity tests in good ol' ENMTools do, but they do them directly in environment space with no need to build an ENM.  Basically they make kernel density estimates of your species' occurrence in environment space, the available habitat for your species in environment space, and the environment space itself.  Then they basically "correct" the density of your species for the density of available habitat and measure overlap using I and D.  Those overlaps are then tested against a null distribution from a permutation test, much like the ones in ENMTools.
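For reference, once you have those corrected densities the D and I overlaps themselves are simple functions of the two normalized surfaces.  D here is Schoener's D and I is the statistic from Warren et al. 2008 (the standard formulas; the ecospat-specific density correction happens upstream of this):

```r
# Overlap metrics computed from two density surfaces defined over the same
# grid of environment space.  Both inputs get normalized to sum to 1.
# D is Schoener's D; I is the I statistic of Warren et al. 2008.
overlap.metrics <- function(z1, z2) {
  p1 <- z1 / sum(z1)
  p2 <- z2 / sum(z2)
  c(D = 1 - 0.5 * sum(abs(p1 - p2)),
    I = 1 - 0.5 * sum((sqrt(p1) - sqrt(p2))^2))
}

# Identical densities give perfect overlap:
overlap.metrics(matrix(1, 10, 10), matrix(1, 10, 10))
# D I 
# 1 1 
```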

I really liked the Broennimann et al. idea when I read about it, and was super excited when ecospat came out.  The interface is a bit clunky, though, and requires a lot of setup to run.  The nice thing about the enmtools.species object structure is that I can actually automate all of this setup for you, and in the end you get a really cool ecospat test with basically zero hassle!

Oh, one note, though: ecospat only works in two environmental dimensions.  Keep that in mind, because if you try to pass the enmtools.ecospat functions more or less than two dimensions they will barf and yell at you.  If, for instance, you have two enmtools.species objects with backgrounds and presence points (named ahli and allogus here) you can call the ecospat tests like this:
esp.id = enmtools.ecospat.id(ahli, allogus, env[[c("layer.1", "layer.3")]])

esp.bg.sym = enmtools.ecospat.bg(ahli, allogus, env[[c("layer.1", "layer.3")]], test.type = "symmetric")

esp.bg.asym = enmtools.ecospat.bg(ahli, allogus, env[[c("layer.1", "layer.3")]], test.type = "asymmetric")

And you get the good stuff you're used to by now...

ecospat.id test p-values:
   D    I 
0.02 0.02 




As well as this cool sucker:


This plot shows the availability and occupancy of the environment for each species.  Neat, huh?  

Note that these scaled densities look a little weird.  That's because I left the low-density thresholds for ecospat at zero (the th.sp and th.env options for the function ecospat.grid.clim.dyn).  Those are definitely worth messing with; otherwise you get (as I have here) situations where the habitat that contributes most to the I and D metrics is in fact some of the most marginal habitat for the species.
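The thresholding those options control is conceptually simple.  Here's a toy sketch of zeroing out low-density cells (my own illustrative version of the idea, not ecospat's actual code):

```r
# Illustrative low-density thresholding of a density grid: cells below the
# th quantile of the nonzero densities get set to zero.  This shows the
# general idea behind th.sp/th.env; it is not ecospat's exact code.
threshold.density <- function(z, th = 0.05) {
  cutoff <- quantile(z[z > 0], th)
  z[z < cutoff] <- 0
  z
}

set.seed(42)
z <- matrix(runif(100), 10, 10)
z.th <- threshold.density(z, th = 0.10)
mean(z.th == 0)  # roughly 10% of cells zeroed out
```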

The flood becomes a trickle

Just a quick update: the flood of posts is going to slow down a bit from here on out, because I've now built enough of the framework of ENMTools that I can start working on the really cool experimental stuff I've got planned.  I will put some more time into polishing up what's already available, beefing up the help files, etc., but I won't be adding new features anywhere near as quickly.
Please absolutely do let me know when you run into problems with the R version, though!  I've got a few test data sets I can use for evaluation purposes, but it's almost certain that some of you out there will run into new issues using your own data sets.  I want to figure out what those issues are and catch them as soon as possible, so that the end product is as easy to use as I can make it.

Automatic report generation

One of the things I'm working on now on the develop branch of enmtools is code to automatically generate html reports on model structure and performance.  The goal here is to provide an accessible maxent-style output with as little hassle as possible.  The current structure is just a skeleton, but I think it's already pretty neat.  Here's a sample html report for a GAM, exactly as it comes out of enmtools:


Summary of ENMTools gam object for allogus


Spatial prediction

plot of chunk plot-suitability






Model: presence ~ s(layer.1, k = 4) + s(layer.2, k = 4) + s(layer.3, k = 4) + s(layer.4, k = 4)

plot of chunk response-plots

## 
## Family: binomial
## Link function: logit
##
## Formula:
## presence ~ s(layer.1, k = 4) + s(layer.2, k = 4) + s(layer.3,
## k = 4) + s(layer.4, k = 4)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.3732 0.1911 -17.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(layer.1) 1.641 1.994 1.101 0.57526
## s(layer.2) 1.000 1.001 26.379 2.81e-07 ***
## s(layer.3) 2.850 2.963 10.804 0.00856 **
## s(layer.4) 2.725 2.922 8.414 0.03309 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0661 Deviance explained = 12.7%
## UBRE = -0.63881 Scale est. = 1 n = 1052







Evaluation

Geographic space

plot of chunk eval-geo-train

## class          : ModelEvaluation 
## n presences : 52
## n absences : 1000
## AUC : 0.7715385
## cor : 0.2319033
## max TPR+TNR at : -2.695771



## 
##
## Proportion of data withheld for model testing:
## [1] 0.2

plot of chunk eval-geo-test

## class          : ModelEvaluation 
## n presences : 13
## n absences : 1000
## AUC : 0.7910385
## cor : 0.1300592
## max TPR+TNR at : -3.014335





Environment space

plot of chunk eval-env-train

## class          : ModelEvaluation 
## n presences : 52
## n absences : 10000
## AUC : 0.5357038
## cor : -0.0329415
## max TPR+TNR at : 0.01546464


## 
##
## Proportion of data withheld for model testing:
## [1] 0.2

plot of chunk eval-env-test

## class          : ModelEvaluation 
## n presences : 13
## n absences : 10000
## AUC : 0.5380154
## cor : -0.01697503
## max TPR+TNR at : 0.01534051





Model fit using gam.check

plot of chunk model-fit

## 
## Method: UBRE Optimizer: outer newton
## full convergence after 9 iterations.
## Gradient range [-4.371328e-07,2.558592e-06]
## (score -0.6388078 & scale 1).
## Hessian positive definite, eigenvalue range [4.369759e-07,0.0004284836].
## Model rank = 13 / 13
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(layer.1) 3.000 1.641 0.932 0.47
## s(layer.2) 3.000 1.000 0.927 0.36
## s(layer.3) 3.000 2.850 0.862 0.02
## s(layer.4) 3.000 2.725 0.806 0.00





Notes

## [1] "No formula was provided, so a GAM formula was built automatically"





Citations

Warren, D.L. (2016) Package ‘enmtools’. Available online at: https://github.com/danlwarren/ENMTools

Hijmans, R.J., Phillips, S., Leathwick, J. and Elith, J. (2011) Package ‘dismo’. Available online at: http://cran.r-project.org/web/packages/dismo/index.html.

Passing args to enmtools.maxent()

Thanks to comments from Matthew King and Nicholas Huron, I've been chasing down some bugs in how enmtools.maxent passes arguments to the "args" parameter of dismo's maxent function.  There were a couple of issues here: first, I'd just flat-out screwed something up.  That's fixed now, so go get the newest version before you do any maxenting.  Second, enmtools.maxent() doesn't recognize "args" positionally, so you need to name it explicitly when you call the function.  For instance, this doesn't work:

my.args = c("betamultiplier=0.5", "product=FALSE", "hinge=FALSE", "threshold=FALSE")

allogus.mx.args = enmtools.maxent(allogus, env, my.args)


But this does:

my.args = c("betamultiplier=0.5", "product=FALSE", "hinge=FALSE", "threshold=FALSE")

allogus.mx.args = enmtools.maxent(allogus, env, args = my.args)


Happy maxenting!

Quick FYI about lat/lon data in the ENMTools R package

In case any of you are hitting errors with the ENMTools R package, just be aware that at present it assumes that your data has longitude as the first column and latitude as the second column.  I've got some code to fix this, but I'm traveling at the moment and don't have the ability to merge it into the master branch quite yet.  I'll get to it soon (promise!) but for now just format your data in the way it expects to see it.  Thanks to Utku Perktas for reminding me!

Age-overlap correlation tests and building an enmtools.clade object


NOTE: I wrote much of the core code for this at the PhyloDevelopeR workshop/hackathon in Nantucket last week, and I just wanted to express my appreciation for all I learned there.  Thanks to Liam Revell for hosting the workshop, and to April Wright and Klaus Schliep for organizing it and teaching the classes.  It was a great experience, and I learned a lot!



2nd note: Blogger keeps absolutely wrecking my R code.  Sorry about that.  Try copy/pasting the bits in grey boxes into a text editor, they should come out fine there.



The ENMTools R package now allows you to do age-overlap correlation tests.  This is a generalization of the age-range correlation tests of Barraclough and Vogler, Fitzpatrick and Turelli, etc.  Basically what these methods do is take the average (topologically corrected) overlap across each node in the tree, and perform a linear regression of overlap as a function of time.  The significance of the departure of that slope and intercept from zero is estimated via a Monte Carlo test, in which the tips of the tree are shuffled randomly.
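The Monte Carlo machinery behind these tests is easy to sketch in base R.  This toy version permutes overlap values across nodes rather than shuffling tips and recomputing topologically corrected overlaps, so it's a simplification of what enmtools.aoc actually does, and the ages and overlaps are made up:

```r
# Simplified sketch of the AOC Monte Carlo test: regress overlap on node age,
# then compare the empirical slope against a null distribution.  The real test
# shuffles tip labels and recomputes topologically corrected overlaps; here we
# just permute overlap values across nodes, with made-up data, for illustration.
set.seed(1)
node.age <- c(1.2, 2.5, 3.1, 4.8)        # hypothetical node ages
node.overlap <- c(0.8, 0.6, 0.5, 0.2)    # hypothetical mean overlap per node

emp.slope <- coef(lm(node.overlap ~ node.age))[2]

null.slopes <- replicate(999, coef(lm(sample(node.overlap) ~ node.age))[2])

# Two-tailed Monte Carlo p-value:
p <- (sum(abs(null.slopes) >= abs(emp.slope)) + 1) / (length(null.slopes) + 1)
p
```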

Although this method was originally developed for use with overlap between species' geographic ranges, it has since been adapted for use with niche overlap metrics (Knouft et al. 2006, Warren et al. 2008) and metrics for similarity of point distributions (Cardillo and Warren 2016).  The current implementation in the ENMTools R package does all of the above, with one slick interface.

To run an AOC test, first you need to build an enmtools.clade object.  This involves building a set of enmtools.species objects, and then lumping them together into an enmtools.clade object along with a phylogeny that includes all of those species.  It's important that the tip labels of the tree and the names of the species objects match, of course.  Here we'll build a clade for five Hispaniolan anoles, using a tree called "hisp.anoles".

brev.clade = enmtools.clade(species = list(brevirostris, marron, caudalis, websteri, distichus), tree = hisp.anoles)
check.clade(brev.clade)

## 
##
## An enmtools.clade object with 5 species
##
## Species names:
## brevirostris caudalis distichus marron websteri
##
## Tree:
##
## Phylogenetic tree with 5 tips and 4 internal nodes.
##
## Tip labels:
## [1] "brevirostris" "caudalis" "distichus" "marron"
## [5] "websteri"
##
## Rooted; includes branch lengths.
##
##
## Data Summary:
## species.names in.tree presence background range
## brevirostris "brevirostris" TRUE 175 0 "present"
## caudalis "caudalis" TRUE 16 0 "present"
## distichus "distichus" TRUE 628 0 "present"
## marron "marron" TRUE 11 0 "present"
## websteri "websteri" TRUE 17 0 "present"

Easy, right?  Pay attention to the summary table at the end there, because it can help you to spot problems before they happen.  These AOC analyses can take a LONG time and if you have a run that aborts halfway through it really sucks.  Regardless of what kind of analysis you're doing you want to be sure that "in.tree" reads as "TRUE" for every species.  The "presence" and "background" columns tell you how many points each of your species has for the presence and background points, and the "range" column tells you whether each species has a range raster.

You don't need all of those data types for every analysis - which ones you do need will be determined by the type of analysis you end up doing.  For the methods that use ENMs (overlap.source = "bc", "dm", "glm", "gam", and "mx"), you will need presence points at a minimum and will have to supply a set of environmental layers for the function's "env" argument.  For overlap.source = "range", every species needs to have a range raster with numerical values within the species range and NA values outside of it.  For the point overlap method, all you need is presence points.

Okay, let's do an old fashioned age-range correlation!

range.aoc = enmtools.aoc(clade = brev.clade, nreps = 50, overlap.source = "range")
summary(range.aoc)
## 
##
## Age-Overlap Correlation test
##
## 50 replicates
##
## p values:
## (Intercept) age
## 0.07843137 0.03921569


Cool, right?  This shows that the slope of the relationship between range overlap and node age is statistically significantly different from zero, meaning species ranges tend to overlap more as they become more distantly related.  It's a bit iffy to infer anything too meaningful from five species, but that suffices to demonstrate the functionality anyway.

You can do the same thing with point overlap metrics:

point.aoc = enmtools.aoc(clade = brev.clade, nreps = 50, overlap.source = "points")

Or with ENM overlaps:

bc.aoc = enmtools.aoc(clade = brev.clade, env = hisp.env, nreps = 50, overlap.source = "bc")

For ENM overlaps you can use any method of ENM construction that ENMTools knows ("bc", "dm", "gam", "glm", "mx"), and any metric of overlap that ENMTools knows ("D", "I", "cor", "env.D", "env.I", "env.cor").  You can also provide a formula for GLM and GAM models.  For instance:

glm.aoc = enmtools.aoc(clade = brev.clade, env = hisp.env, nreps = 50, overlap.source = "glm", model = presence ~ snv_1 + snv_10, metric = "I")

This produces an analysis of ENM overlap as a function of time, with ENMs built as GLMs in which presence is modeled as a function of snv_1 + snv_10, using the I overlap metric from Warren et al. 2008.

Just to reiterate: these analyses can get VERY time-consuming, particularly for clades with a lot of species.



Addendum: In the future it'd be pretty interesting to hook this whole thing up to GLM instead of regular old lm, so we could get away from the assumption that the relationship between time and overlap is necessarily just a straight line.  Shouldn't be too hard to do, but I have so many other plans at the moment that it's not super high on my priorities list.

Small but useful change: ENMTools R package now auto-recognizes lat and lon columns

Just a quick little note to let you know that the ENMTools R package now automatically looks for columns named "x" and "y", or beginning with "lat" and "lon" as part of its pre-analysis check of the species objects.  It's a little thing, but it should prevent a lot of errors and confusion.
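The detection presumably boils down to a bit of name matching.  Here's a toy sketch of the idea (a hypothetical helper of my own, not the actual ENMTools code):

```r
# Hypothetical sketch of auto-detecting coordinate columns by name: look for
# "x"/"y", or names beginning with "lon"/"lat".  Not the actual ENMTools code,
# just the general idea.
find.coord.cols <- function(df) {
  nm <- tolower(names(df))
  lon <- which(nm == "x" | startsWith(nm, "lon"))[1]
  lat <- which(nm == "y" | startsWith(nm, "lat"))[1]
  c(lon = lon, lat = lat)
}

pts <- data.frame(Latitude = c(20.1, 20.3), Longitude = c(-75.9, -76.2))
find.coord.cols(pts)
# lon lat 
#   2   1 
```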

Many core functions now parallelized

I've now got parallelized code running for the background and identity tests, as well as the linear and blob rangebreak tests.  The ribbon rangebreak test is going to take longer, because it contains some necessary failure detection code that needs to be wrapped differently than the other tests.

This code is not yet on the main branch on GitHub, it's on the branch named "apply".

As it stands, each of the functions by default uses all of the cores available in the system.  You can decrease that by just supplying a "cores = x" argument, where x is however many cores you want it to use.  If you're happy using all of the cores on your system, you can just call the functions exactly as before.

Obviously the speed differences here are going to depend on how many cores you have on your system.  I've got a 24-core machine I'm working on right now, and going from 1 core to 24 on my test data results in massive speed increases - identity and background tests for 20 reps drop from ~10 minutes to ~1 minute.  Pretty slick!
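The pattern here is the standard one from the base parallel package.  A minimal sketch of how replicates get farmed out across cores (much simplified relative to the actual test code):

```r
# Minimal sketch of the mclapply pattern used for parallelizing replicates.
# Each "replicate" here is a stand-in for one Monte Carlo rep of an identity
# or background test; the real ENMTools functions do far more work per rep.
library(parallel)

run.one.rep <- function(i) {
  # placeholder for "permute the data, rebuild the ENMs, measure overlap"
  mean(rnorm(1000))
}

# mclapply forking isn't available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1 else 2
reps <- mclapply(1:20, run.one.rep, mc.cores = cores)
length(reps)  # 20 replicate results
```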

Anyway, give it a shot if you can and let me know if you run into any issues with it.  Thanks to Nick Huron for reminding me!

Note: Parallelization not working with Maxent models

For the time being and for the foreseeable future, Maxent models aren't working with multiple cores.  This is due to an issue with the mclapply and rJava functions in R; rJava just straight-up does not work with mclapply, and as far as I can tell there's no way to make it do so.  As it stands, ENMTools just sets the number of cores to 1 for any of the tests when "type" is set to "mx".  If anyone knows, or discovers, a workaround for this please do let me know!

RWTY: R We There Yet? A package for looking at MCMC chain performance in Bayesian phylogenetics

In case you're wondering why the ENMTools posts and Git commits have slammed to a halt, it's because my other R package (RWTY) just took focus.  We got really good reviewer comments back, but they require a bit of work and have a deadline so for the moment they take precedence.  I've also decided I'm going to start blogging about RWTY here along with ENMTools, because RWTY is cool as can be and I'll be darned if I want to start another blog for it.

You can find RWTY at https://github.com/danlwarren/RWTY

It's a collaboration between me, Rob Lanfear, and Anthony Geneva, and I think it's pretty darn special.

RWTY post: New vignette for diagnosing MCMC convergence and lack thereof

This was requested by a couple of reviewers on the forthcoming RWTY app note.  It's something we had discussed doing anyway, but it's good that they forced us to sit down and do it, because it's super helpful.  Basically it's a graphical rundown of two different data sets and what you can learn from the absolute legion of plots RWTY produces.  This is very much a first draft, but it's all there.  My hands are aching from typing; it was 5000 words' worth of yammering within the course of about eight hours.

http://danwarren.net/plot-comparisons.html

It's also available in the newest version of RWTY on GitHub; once you've installed it, you can view it with browseVignettes("rwty")

Nice tutorial on conducting background tests using the Perl version of ENMTools

Here's a really nice tutorial by Daniel Romero on how to do the background test in the standalone version of ENMTools.  He walks you through setting up the program, running the analysis, and how to avoid some of the errors that might crop up when trying to run the software.

Thanks, Daniel!

Solution to erroneous "please update your maxent program to version 3.3.3b or later" error when running maxent from dismo & ENMTools

I originally encountered this problem when trying to run Maxent from ENMTools, but it turns out it was a combination of a problem with the dismo package and my Mac having two different versions of Java installed (1.6 and 1.8).

I am pasting the solution that worked for me below, and to other relevant lists if I find them, since I didn't find a direct answer online.

Thanks - Nick


Solution to erroneous "please update your maxent program to version 3.3.3b or later" error when running maxent from dismo


Hi all,

I spent about 2 hours figuring this out on my Mac, so I might as well share it, as I didn't find anything else on specifically this problem online. 


SUMMARY: After installing rJava and dismo, and downloading/pasting the new maxent.jar file into the dismo install, I still got this error: "please update your maxent program to version 3.3.3b or later" 

RESOLUTION: The error message was erroneous; the failure was actually due to a Java version mismatch in R/R.app on Mac OS X

I am posting my notes on this for google-ability, perhaps it will help others, or perhaps it will help myself when I forget all of this and get stuck with the same error on a mac laptop or something in 6 months!



I. SETUP

MacOSX El Capitan, Version 10.11.6
Use R from Terminal, and from R.app



II. PROBLEM:

I got set up to run maxent from R/dismo. Steps:

- installed rJava 
- installed dismo
- Downloaded the new maxent.jar (3.4.0, released December 2016 I think) from:
http://biodiversityinformatics.amnh.org/open_source/maxent/
- copied the jar file into the java directory of the dismo install:
/Library/Frameworks/R.framework/Versions/3.3/Resources/library/dismo/java/


But when running the example code in ?maxent, I got:

Loading required namespace: rJava
Error in .getMeVersion() : 
  please update your maxent program to version 3.3.3b or later. This version is no longer supported. 
You can download it here: http://www.cs.princeton.edu/~schapire/maxent/'

Seeing as I had the newest version of maxent, this was confusing.




III. DIAGNOSIS

It turns out the version check itself was to blame.  Inside dismo's maxent.R is the following:

https://github.com/cran/dismo/blob/master/R/maxent.R
=============================
.getMeVersion <- function() {
  jar <- paste(system.file(package="dismo"), "/java/maxent.jar", sep='')
  if (!file.exists(jar)) {
    stop('file missing:\n', jar, '.\nPlease download it here: http://www.cs.princeton.edu/~schapire/maxent/')
  }
  .rJava()
  mxe <- rJava::.jnew("meversion")
  v <- try(rJava::.jcall(mxe, "S", "meversion"))
  if (class(v) == 'try-error') {
    stop('"dismo" needs a more recent version of Maxent (3.3.3b or later) \nPlease download it here: http://www.cs.princeton.edu/~schapire/maxent/\n and put it in this folder:\n',
         system.file("java", package="dismo"))
  } else if (v == '3.3.3a') {
    stop("please update your maxent program to version 3.3.3b or later. This version is no longer supported. \nYou can download it here: http://www.cs.princeton.edu/~schapire/maxent/'")
  }
  return(v)
}
=============================


This bit of the code determines the version of maxent.jar being used:

mxe <- rJava::.jnew("meversion") 
v <- try(rJava::.jcall(mxe, "S", "meversion") )

...however, "rJava::.jcall" was giving an error due to a mismatch in java versions.  Mac OS X comes with Java 1.6 installed, but I had installed the newest java for various other applications (Beast2 etc.)

The messages were slightly different in Terminal R and R.app, but they were messages like this:

java.lang.UnsupportedClassVersionError: Unsupported major.minor version 52.0
java.lang.UnsupportedClassVersionError: density/Utils

These produce a "try-error" in "v", leading the code to mistakenly assume that the problem is an old maxent.jar:

===============
mxe <- rJava::.jnew("meversion")
v <- try(rJava::.jcall(mxe, "S", "meversion"))
if (class(v) == 'try-error') {
  stop('"dismo" needs a more recent version of Maxent (3.3.3b or later) \nPlease download it here: http://www.cs.princeton.edu/~schapire/maxent/\n and put it in this folder:\n',
       system.file("java", package="dismo"))
===============





I was able to find my java installs, on my Mac, with:

cd /Library/Java/JavaVirtualMachines/
ls

I had:

1.6.0.jdk
jdk1.8.0_51.jdk




IV. SOLUTION(S)

Various hints were on this page:
http://stackoverflow.com/questions/26948777/how-can-i-make-rjava-use-the-newer-version-of-java-on-osx

...but I did *not* find that I had to use sudo to re-install anything.  For me, what worked was:





SOLUTION: R from Terminal

1. From the Terminal, run: "R CMD javareconf"

2. Enter R, reinstall rJava *from source*, e.g.:

install.packages('rJava',,'http://www.rforge.net/')

3. I also re-installed dismo from source, and then re-pasted maxent.jar into /Library/Frameworks/R.framework/Versions/3.3/Resources/library/dismo/java/

Then, everything worked:

library(rJava)
library(dismo)
maxent()
> This is MaxEnt version 3.4.0 





V. NEW PROBLEM

However, when I opened R.app and tried running maxent(), I now got a new error on library(rJava):

========================================
library(rJava)

Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so, 6): Library not loaded: @rpath/libjvm.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so
  Reason: image not found
Error: package or namespace load failed for ‘rJava’
========================================



VI. NEW SOLUTION

The solution to this, in R.app, was:

========================================
dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/jre/lib/server/libjvm.dylib')

library(rJava)

maxent()
> This is MaxEnt version 3.4.0 
========================================


To make this "permanent", I added:

dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/jre/lib/server/libjvm.dylib')

...to my .Rprofile file, an invisible file (you might have to create it in a plain-text editor) in your mac user directory ("cd ~").


This works, but presumably will have to be changed if/when I update java.




VII. RECOMMENDATION

This bit of code in the dismo R package should probably be updated:

===============
mxe <- rJava::.jnew("meversion")
v <- try(rJava::.jcall(mxe, "S", "meversion"))
if (class(v) == 'try-error') {
  stop('"dismo" needs a more recent version of Maxent (3.3.3b or later) \nPlease download it here: http://www.cs.princeton.edu/~schapire/maxent/\n and put it in this folder:\n',
       system.file("java", package="dismo"))
===============

...to (1) distinguish between Java errors and old Maxent versions, and (2) replace the old link (http://www.cs.princeton.edu/~schapire/maxent/) with the new one for the open-source Maxent, hosted at the American Museum: http://biodiversityinformatics.amnh.org/open_source/maxent/
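A sketch of what that distinction might look like: wrap the rJava call and inspect the error before blaming the Maxent version.  This is hypothetical code of my own, not a patch to dismo, and fake.jcall below stands in for rJava::.jcall purely so the sketch is self-contained and runnable:

```r
# Hypothetical sketch of error handling that separates a Java failure from a
# genuinely outdated maxent.jar, rather than lumping them together.  None of
# this is actual dismo code; fake.jcall stands in for rJava::.jcall so the
# sketch runs on its own.
check.version <- function(jcall) {
  v <- tryCatch(jcall(), error = function(e) e)
  if (inherits(v, "error")) {
    msg <- conditionMessage(v)
    if (grepl("UnsupportedClassVersion", msg)) {
      stop("Java version mismatch - check your JDK, not your maxent.jar")
    }
    stop("could not query maxent.jar: ", msg)
  }
  if (v == "3.3.3a") stop("please update maxent to 3.3.3b or later")
  v
}

fake.jcall <- function() stop("java.lang.UnsupportedClassVersionError: density/Utils")
try(check.version(fake.jcall))  # reports the Java mismatch, not an old Maxent
```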


Thanks,
Nick

Nick Matzke contact info
Usually in Australia:
================================
Nicholas J. Matzke

Professional introduction: http://www.nickmatzke.net
Active work website: http://phylo.wikidot.com/nicholas-j-matzke

Discovery Early Career Researcher Award (DECRA) Fellow
DECRA granted by ARC (Australian Research Council)
Division of Ecology and Evolution (E&E)
College of Medicine, Biology & Environment (CMBE)
Research School of Biology (RSB)
The Australian National University

Room 208, Building 116, Gould Wing
The Australian National University
ACT 2601 AUSTRALIA

Email: nickmatzke.ncse@gmail.com
nick.matzke@anu.edu.au
Skype: nicholas.matzke
Cell (preferred): +61 0410-726-191
Office (not preferred): +61 02 612 52 450
================================

Sometimes at meetings in the U.S.:
================================
Emails: same
Phone: 510-301-0179
================================

Model-based inference in historical and ecological biogeography 2017!

Matthew van Dam and I will be offering a new class in Barcelona with Transmitting Science this year, based in large part on the class that Nick Matzke and I taught last year.  It's called Model-based Inference in Historical and Ecological Biogeography, and if last year was any indication it's going to be a lot of fun.  We'll mostly be focusing on BioGeoBEARS and the new ENMTools R package, and there's a whole lot of very cool stuff going on with both.  Join us if you can!

http://www.transmittingscience.org/courses/biogeography/model-based-statistical-inference-ecological-evolutionary-biogeography/

Pardon our dust

We're doing a whole bunch of stuff to the ENMTools R package right now.  A lot of it is under the hood (cleaning up dependencies for CRAN and whatnot), but some of it is in service of implementing new functionality.  Anyway, in the process of fixing a few things we also broke a few things, so if you've downloaded ENMTools from the main branch in the past few days you might be having some trouble.  I've fixed the bits that I have found so far, and the new code is now up on the main branch.  We're going to make a real push over the next little while to get some testthat code working and get set up with continuous integration to avoid these mishaps in the future.

Best to avoid using B1 breadth metric in environment space

This just came to light relatively recently: the Latin hypercube version of the B1 metric in environment space is probably not trustworthy as currently implemented.  Due to the combination of standardizing the distribution and the use of logs in the calculation, there's a dependence on sample size that makes the metric fail to converge.  For an illustration, here's B2 as a function of sample size:


That's behaving as you'd like it to - seems to be converging on a relatively stable value, not changing much with additional sampling (note the scale of the Y axis).

Now look at B1:

There's an obvious trend here with increasing sample size, and the scale of the Y axis is such that those differences could be quite significant. 

At some future date we may figure out how to adjust for this, but for now I'd say just avoid using B1 in environment space altogether.
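To get a feel for why a log-based metric can misbehave like this, here's a toy demonstration comparing a Shannon-style (log-based) breadth against a Levins-style (inverse concentration) breadth as sample size grows.  These are illustrative stand-ins I've chosen for the demonstration, not the exact B1/B2 formulas ENMTools uses:

```r
# Toy demonstration of sample-size dependence using stand-in metrics (NOT the
# exact ENMTools B1/B2 calculations): a Shannon-style breadth built on logs vs.
# a Levins-style breadth built on squared proportions, computed on simulated
# "suitability" values at increasing sample sizes.
set.seed(1)
breadths <- sapply(c(100, 1000, 10000, 100000), function(n) {
  suit <- runif(n)         # simulated suitability values
  p <- suit / sum(suit)    # standardize to proportions
  c(shannon = -sum(p * log(p)),     # keeps growing with n (log-based)
    levins  = 1 / (n * sum(p^2)))   # settles near a stable value
})
round(breadths, 3)
```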

Massive wad of ENMTools-R updates just published

I've spent the last month relentlessly tweaking ENMTools-R's code to make it CRAN-compatible, and we're pretty much there now.  Most of the changes aren't visible from the user's end of things, but they're necessary to make sure that it's suitable for wider distribution.  I've tested everything, and it seems to all be working.

THAT SAID, it's entirely possible that something has been borked up that isn't popping up in my own code.  If you download ENMTools from the GitHub repository and notice it acting weird in some way, please don't hesitate to raise a GitHub issue about it.

Also, there's a nice new function called raster.cor.plot that does this:


Which is pretty darn cute if I do say so myself.  It's visualizing the correlations between a set of predictor rasters.

