A Conversation on Georeferencing, TCNs, and the HUB

The following is an attempt to shed some light on georeferencing, an important and sometimes confusing aspect of the digitization process, in the context of the Advancing Digitization of Biological Collections (ADBC) program. Georeferencing, in our discipline, refers to the assignment of an analytical representation of a place where an event, such as a collecting event, occurred on the earth. The primary purpose of this posting is to give a brief statement about the state of the art, along with a few metrics and perspectives that may help those writing TCNs or other digitization proposals. This posting was motivated by several requests for information on the subject, of which the following is a good example.
Corinna Gries wrote:
Now questions regarding georeferencing.

John, are you anticipating the HUB developing/automating workflows for that? And should we anticipate those to be available somewhat later in the process? We have now put in for georeferencing during the later part of the project and are anticipating that we still need some workforce on the ground. But should we also anticipate to be involved with developing technology to automate workflows somewhat more than they currently are? I am not talking developing new approaches. But given the large number of herbaria we are working with and their various approaches to data management we need to have those workflows on-line rather than connected to a desktop application. Is that something you are anticipating as well?


John Wieczorek wrote:

A HUB will definitely have to better document known-good georeferencing workflows. A reasonable HUB would also consult with projects to optimize for their specific situation and provide the necessary support to make sure that any proposed georeferencing work plan is successful. As there is no way for a HUB to anticipate the georeferencing workload and budget for it, the TCNs and other digitization projects will have to budget to get the work done, whether through personnel within the projects, or from others willing to do it.

The three fundamental steps in the georeferencing process will be Prepare, Collaborate, and Repatriate. Anyone who will do georeferencing should definitely plan to participate in a week-long workshop to be trained as part of the Prepare phase. Anyone who will georeference and has never been to one and thinks s/he doesn’t need to is a prime candidate to attend. The training workshops are quite mature and proven effective. More than twenty of them have been given internationally and over 500 people have been trained. The latest, co-sponsored by GBIF and TanBIF, took place 26-29 Oct 2010 in Dar es Salaam, Tanzania.

The longer you can wait, the better documented the georeferencing workflows will be. The tools are all fully functional now, and further  development is already funded from grants outside the HUB, so the tools WILL only get better. But there is another reason to wait as long as you can to georeference. The more you have digitized when you start to georeference, the greater the economies of scale, as you want to make your first pass georeferencing locations, not specimens.
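To illustrate why a first pass over locations rather than specimens pays off, here is a minimal Python sketch. The record layout, the catalog numbers, and the `normalize_locality` helper are all hypothetical and are not part of any of the tools mentioned in this post; real locality matching is considerably fuzzier than simple case and whitespace folding.

```python
from collections import defaultdict


def normalize_locality(text):
    """Fold case and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def group_by_locality(records):
    """Map each distinct locality string to the specimen IDs that share it."""
    groups = defaultdict(list)
    for specimen_id, locality in records:
        groups[normalize_locality(locality)].append(specimen_id)
    return dict(groups)


# Hypothetical specimen records: (catalog number, verbatim locality)
records = [
    ("X-1", "5 mi N of Boulder, Colorado"),
    ("X-2", "5 mi N of  Boulder, Colorado"),
    ("X-3", "5 MI N OF BOULDER, COLORADO"),
    ("X-4", "Dar es Salaam, Tanzania"),
]

groups = group_by_locality(records)
print(len(records), "specimens,", len(groups), "localities to georeference")
```

Each distinct locality is georeferenced once and the result is then applied to every specimen in its group, so the more records are digitized before this step, the more specimens each georeference tends to cover.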

Ideally, you will also throw in your lot with others with the same georeferencing challenges and achieve further economies of scale by collaborating to georeference your combined holdings together. This sort of thinking will make the most of the Collaborate phase. The three biggest georeferencing collaborations to date have all used a similar workflow and the same best practices, with spectacular results. All of them are still in the lengthy repatriation process. The Prepare and Collaborate phases were easy by comparison.

As for metrics, the mean georeferencing rate for localities in the US and Canada is 30 localities per hour (following complete best practices and using the BioGeomancer Workbench batch processing). The most difficult localities (China, Russia) have been the worst-case scenario, with a rough rate of 6 localities per hour. So the numbers you need to make a reasonable estimate of the costs are the number of localities and the rate for the region of the earth where they occur. Those may be very difficult numbers to get for undigitized material, but it may be worth some research to estimate them based on the mean number of specimens per locality. For terrestrial vertebrates the ratio seems to hold fairly well at 6 specimens per locality, but this is likely quite different in other taxonomic groups.
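The arithmetic behind such an estimate can be sketched in a few lines of Python. The rates and the specimens-per-locality figure are the ones quoted above; the specimen count and the function names are hypothetical and only serve to illustrate the calculation. Treat this as a rough planning aid under those assumptions, not a budgeting tool.

```python
import math

# Rates quoted above, in localities per hour.
RATE_US_CANADA = 30.0
RATE_WORST_CASE = 6.0

# Figure reported above for terrestrial vertebrates; other groups likely differ.
SPECIMENS_PER_LOCALITY = 6.0


def estimate_localities(specimen_count, specimens_per_locality=SPECIMENS_PER_LOCALITY):
    """Estimate the number of distinct localities from a specimen count."""
    return math.ceil(specimen_count / specimens_per_locality)


def estimate_hours(locality_count, localities_per_hour):
    """Estimate person-hours needed to georeference the given localities."""
    return locality_count / localities_per_hour


specimens = 90_000  # hypothetical collection size
localities = estimate_localities(specimens)
best_case_hours = estimate_hours(localities, RATE_US_CANADA)
worst_case_hours = estimate_hours(localities, RATE_WORST_CASE)

print("localities:", localities)
print("hours at US/Canada rate:", best_case_hours)
print("hours at worst-case rate:", worst_case_hours)
```

Multiplying the resulting hours by a local hourly labor cost gives a first budget figure. A collection spanning several regions would need one such estimate per region, each with its own rate.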

You can see the documentation and get an idea of what’s possible at the project web sites listed below. I think you would be safe to write into your proposal that there are proven solutions with well-known metrics for georeferencing as demonstrated by the NSF-funded vertebrate networks participating in VertNet (http://vertnet.org).



GEOLocate: http://www.museum.tulane.edu/geolocate/
BioGeomancer: http://bg.berkeley.edu/latest
Georeferencing Calculator: http://manisnet.org/gci2.html

Latest Workshop


White House Message on Scientific Collections

This post courtesy of David Schindel:

I am pleased to attach the policy on scientific collections that was signed recently by Dr. John Holdren, Science Advisor to the President and Director of the White House Office of Science and Technology Policy.  The memo was distributed to all federal agencies this week. Those of us involved in the Interagency Working Group on Scientific Collections are very pleased that it raises the profile of scientific collections, and sets deadlines for implementation of three recommendations from the IWGSC report (http://www.whitehouse.gov/sites/default/files/sci-collections-report-2009-rev2.pdf).

This message demonstrates the seriousness attached to scientific collections by the Obama Administration.  I hope you will transmit it to colleagues in your respective countries and scientific disciplines along with the IWGSC report.


A New HUB Proposal Takes Form

Three weeks ago we announced that individuals from a number of institutions across the United States got together in Boulder, Colorado, to discuss how the biological and informatics communities might go about responding to the Advancing Digitization of Biological Collections (ADBC) solicitation from NSF. For many, this solicitation represents a once-in-a-lifetime opportunity to tackle, in a coordinated manner, a national leadership challenge. If we can work together effectively, we can make significant inroads towards digitizing all biological specimens and data, both recent and paleontological, in collections around the United States.

When the meeting in Boulder concluded, it was agreed by all participants that high levels of transparency, communication, and involvement by the community were both needed and expected in the development of any and all HUB proposals. Since then, many of us have come to feel that these levels have not been addressed in a meaningful way and the resulting silence over the last three weeks has become a liability to the community.

We believe that it is vital for able members of the community to step up and move the process forward and to keep the broad community abreast of developments.  To wait to begin the processes of community organization and communication until after all of the awards are announced would be to waste the first six months to one year of the HUB’s limited time.  Therefore, it is critical that the community begin to come together now to support the success of the HUB, regardless of who is selected to lead.

In this spirit, we’d like to announce that as the end result of many discussions over the last couple of weeks, a proposal for a HUB is going forward with CU Boulder as the lead institution and with one of us (Guralnick) as the lead Principal Investigator. As yet no one is bound to remain in this collaboration, nor are we certain that others will not join, but there was broad support from Yale, University of Kansas, Berkeley, the Field Museum, University of New Mexico, Tulane, and Harvard for this HUB arrangement.

No doubt there are others of you considering submitting HUB proposals, and if so we’d like to hear from you via this blog. Why is this? The simple answer is that we firmly believe that regardless of who finally obtains funding from NSF, more feedback from the community will lead to a better HUB. This blog was started to further that discussion.

Some of the things we want to know from members of the broad community:

  • What do you want a Home Uniting Biocollections (HUB) to do?
  • How might we start that process now?
  • What kinds of things will help you the most, whether you are planning to submit a TCN or not?

Of course, we have our ideas (and we’ll want to share them), but there is a chance for us to begin syncing up now as opposed to later.

As we move forward, we’ll be using the blog forum here, listservers, and the NSF wiki, to engage you and ask questions. It is our belief that community participation now will pay big dividends when this process actually begins, no matter who ends up leading the HUB. We hope that you, individuals and groups from all biocollections, will take the time to comment and provide input.


Rob Guralnick, University of Colorado

Christopher Norris, Yale University/SPNHC

David Bloom, VertNet


An Open Letter About ADBC

The role of ADBC in the larger digitization context: a beginning, not the end

The ADBC solicitation from NSF represents a tremendous opportunity to begin the process of digitizing our nation’s biodiversity and paleodiversity collections.  We believe tremendous progress will be made over the multi-year time span of the ADBC program, leading to a very significant increase in digital and mobile holdings. Yet, we caution the community not to make the mistake of overestimating OR underestimating what ADBC can do.  ADBC should be seen as a starting point for the enormous task of digitizing our natural heritage, not the sole solution.  We argue below that the community must use ADBC to leverage other opportunities and work towards an inclusive view of supporting multiple collections communities.

ADBC came out of a community-led process that has its roots in a set of reports that assess the state of federally held collections.  The report of the Interagency Working Group on Scientific Collections determined a compelling need for “the creation of an online clearinghouse of information about Federal scientific collections”.  A subsequent NSF Scientific Collections Survey concluded that a key need is “coordination and interoperability of data networks critical for effective use of collections in research.”  Federal agency support led to two workshops held at NESCent to develop a strategic plan for a national digitization effort. This strategic plan led to ADBC, but the aims and objectives of the plan are much wider and more ambitious.

To be successful, a national digitization effort must do more than just capture collections data.  It must generate tools to access and mobilize these data and build user communities around the data without simultaneously diverting critical resources from the care and maintenance of the collections themselves.  ADBC will address some of these needs, but digitization of biological and paleontological collections will need to be pursued through Biological Research Collections grants and other systematics, survey, and biodiversity-based granting mechanisms.  Existing and new projects funded in programs such as Advances in Biological Informatics and the burgeoning numbers of cyberinfrastructure programs will address the technological pieces of the puzzle needed to increase efficiency rates, data improvements, and data mobilization.

ADBC is not the only game in town at NSF for biological digitization, and NSF is not the only mechanism to support the larger mission of completing the digitizing task.  The Institute of Museum and Library Services (IMLS) may support digitization efforts, as might other federal funding agencies (e.g., National Institutes of Health, Department of the Interior via USGS, National Biological Information Infrastructure).  Opportunities abound to use ADBC as a springboard to approach foundations that may want to support such efforts.  Partnerships with private industry are also well worth considering.

Our conclusion is that we should be operating as a community, not only to develop the best set of proposals for ADBC but also to address the ultimate challenge: completing the digitization of all of our nation’s natural heritage in the next ten years. A Home Uniting Biocollections (HUB) and a set of Thematic Collections Networks (TCNs) are a start, but they will not be enough for success.  We need to harness ADBC as a flash point to catalyze our efforts with other agencies and potential funders through which we can begin to assemble a broader view of the potential opportunities for our community.  ADBC is a rare opportunity to leverage a strong federal focus on the digitization of biological collections and, most importantly, to mobilize the community to do it the right way.

We appreciate all the comments, thoughts, etc. that you can muster here.  We want your feedback!

Best regards,

Rob Guralnick, University of Colorado Boulder

Christopher Norris, Yale University/SPNHC

David Bloom, VertNet


HUB Open Discussion

The ADBC solicitation invites proposals for the creation of a Home Uniting Biocollections (HUB).  Specifically,

The national HUB will coordinate the digitization effort, fostering partnerships, training, and innovations, facilitating workflows, serving as a central site for integrating data and techniques, monitoring data online in a timely manner and regular schedule, and establishing cohesion and interconnectivity among digitization projects funded by this program or other existing digitization activities.  In addition, the HUB will coordinate activities with the thematic collections networks, described below, enable ongoing communication between partners in the digitization activity, and help to identify gaps and priorities for digitization efforts. Innovative proposals for this entity are strongly encouraged and can come from a single institution, an institution with partners through subawards, a virtual organization, or other creative models that will provide unity and oversight for the national resource.

Are you planning to submit a HUB proposal?  Would you like to collaborate on a HUB proposal?  Do you have any questions or ideas about the HUB or the ADBC solicitation?  Please let us know using the comment string.

If you are planning a proposal, please let us know, at a high level, what you propose to accomplish, how you intend to create a HUB, and the philosophy(ies) behind your actions (e.g., how you would engage the community or TCNs?).

An open and frank conversation with the community will only make the HUB a welcome asset to our work with collections and informatics.


TCN Open Discussion

The ADBC solicitation invites proposals for the creation of Thematic Collections Networks (TCNs).  Specifically,

Thematic Collections Network (TCN) proposals will be submissions for two-to-four year awards based on size of the collections to be digitized. Recipients will perform fundamental collections digitization but will also be engaged in training activities and the development of appropriate technology and standards to produce an interoperable network. Collaborative TCN proposals are strongly encouraged.

Are you planning to submit a proposal for a TCN?  Would you like to collaborate on a TCN proposal?  Do you have any questions or ideas about TCNs or the ADBC solicitation?  Please let us know using the comment string.

If you are planning a proposal, please let us know, at a high level, what you propose to accomplish, what research question you intend to address, and what types of collections, geographic region(s), and technologies will be involved.

An open and frank conversation with the community will only make each TCN a welcome asset to our work with collections and informatics.


Informal Survey of HUB Roles and Responsibilities

Prior to convening the round-table in Boulder, a simple survey was sent out to all of the participants to gauge the support, or lack thereof, for specific roles and responsibilities that a HUB might perform.

Of the 30 individuals asked to complete the survey, 17 replied.  The survey asked four questions:

  1. Which roles (listed) do you believe should be a role or responsibility of a HUB?
  2. Which roles should be a centralized responsibility of a HUB?
  3. Are there additional roles or responsibilities that you believe should be the responsibility of the HUB?
  4. Please identify an organization, agency, or individual that you believe is best qualified to address this issue or role.

The survey was intended to encourage thoughtful discussion during the meeting in Boulder.  All answers to the survey were anonymous and no decisions were made based solely on its results.

We invite the community to review the raw data from the survey.  If you have any questions, please contact David Bloom.
