New toy in town – GNI

Some days ago – well, maybe weeks by now – I touched on the usefulness of ZooBank, Catalog of Fishes, and friends. The biggest of them all is, however, GNI, a pronounceable acronym, a component of the GNA (the Global Names Architecture), but unrelated to GNU (GNU’s Not Unix). The Global Names Index is a name aggregator for scientific names of organisms. It contains 12 million names. You will now know why it takes a special category of wizards to practice taxonomy. These gentle people are managing 12 million names, and of course they will love this new toy brought to them by GBIF and EoL. I say “them” because GNI seems not to have much appeal beyond the professional taxonomist and biodiversity informatician.

GNI is a necessity when trying to build large systems of biological information, because everything is indexed against names of organisms. To be sure, specialized systems like FishBase realized this many, many years ago and have systems that are superior within their domain. In the long run, however, a common approach may be the only one worth endorsing.

GNI is already searchable. Try Astyanax kullanderi, a fish you have not heard of before. Does it exist? One chance in 12 million. Enter it and be confirmed.

It is there, in uBio, with one NameBank record drawn from Catalogue of Life and ultimately FishBase. It has an LSID there, but this is not the ZooBank LSID. We do not want to be confused, so we click back to find two GBIF records, neither georeferenced. It is the holotype, catalogued as NRM 21000 and served by GBIF-Sweden, but also in the GBIF edition of FishBase, which happens to be served also by GBIF-Sweden although the entry says it is served by FishBase Philippines. And at NRM this catalog number refers to four paratypes.

Amazing, no?
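For the machine-minded, the difference between the two LSIDs can be read off the identifier itself. An LSID is a URN of the form urn:lsid:authority:namespace:object, optionally followed by a revision. The sketch below, in Python, splits one into its parts so that the issuing authority can be compared. The function names and the example identifiers are made up for illustration and are not real records.

```python
def parse_lsid(lsid):
    """Split an LSID into its named components.

    Expects urn:lsid:authority:namespace:object[:revision].
    """
    parts = lsid.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % lsid)
    return {
        "authority": parts[2].lower(),  # who issued the identifier
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def same_authority(lsid_a, lsid_b):
    """True when both LSIDs were issued by the same authority."""
    return parse_lsid(lsid_a)["authority"] == parse_lsid(lsid_b)["authority"]


# Hypothetical identifiers, for illustration only.
aggregator_lsid = "urn:lsid:aggregator.example.org:names:12345"
registry_lsid = "urn:lsid:registry.example.org:act:ABCDEF"

print(parse_lsid(aggregator_lsid)["authority"])
print(same_authority(aggregator_lsid, registry_lsid))
```

Two LSIDs from different authorities can point at the same name, which is exactly why it pays to check who issued each one before treating them as interchangeable.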

Of course this tool is meant more for machine use than for humans to click around in. Or, as David Remsen, the architect behind this construction, puts it:

GNI was developed because of the central importance of the names of organisms in the management of data about organisms. The primary users of this site are not people, but other machines, so please don’t complain because the site is boring.

As a tool for testing the existence of names, it is already worth being bored a bit. If the result is positive, that is reassuring. If negative, apply the precautionary principle and ask your favorite taxonomist.
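Since the primary users are supposed to be machines, here is a minimal sketch, in Python, of what such an existence test looks like from the machine side. It does not talk to GNI at all: the index is a tiny hypothetical local set standing in for the 12 million names, and the function names are my own. What it shows is the logic of the previous paragraph, where a hit is reassuring and a miss is not proof of absence.

```python
def normalize(name):
    """Collapse whitespace and use 'Genus species' capitalization."""
    return " ".join(name.split()).capitalize()


def check_name(name, index):
    """Look a name up in a set of known names.

    A hit is reassuring. A miss only means the name is not in this
    index, so the caller is told to ask an expert rather than 'no'.
    """
    if normalize(name) in index:
        return "found"
    return "not found: ask your favorite taxonomist"


# Hypothetical stand-in for a real names index.
INDEX = {"Astyanax kullanderi", "Monopterus albus"}

print(check_name("  astyanax   KULLANDERI ", INDEX))
print(check_name("Astyanax imaginarius", INDEX))
```

The asymmetry is deliberate: the function never returns a flat "does not exist", because no index, however large, can promise that.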

YouTube has this video of David Remsen explaining how the GNA works.

This is not Astyanax kullanderi, but a species of Synbranchus from Brazil,
closely related to Monopterus albus from Asia. Photo A. Kullander, CC-BY-NC

From fishes to ZooBank

Fishes are among the most informatized organisms that I know. There may be a number of reasons for that, but reasons aside, the fact is that Daniel Pauly and Rainer Froese created FishBase independently of Bill Eschmeyer’s Catalog of Fishes, and ichthyologist Julian Humphries created the museum collection database with the collection management system MUSE back in the late 1980s, giving fish collections a head start in informatics. Since 1976 Joseph Nelson has published the Fishes of the World, now in its fourth edition, as an index to systematic ichthyology and with an eclectic classification.

Whereas FishBase has a reported hit rate of 20 million per month and so is doing fine, the Catalog of Fishes is maybe less well known. It started as a catalog of genera of fishes, but was expanded and eventually published as three huge volumes with scientific names of fishes, over 50 000, complete with type locality, current status, and literature reference. It is presently a web resource and updated frequently. For the layman it may look just too boring, but for scientists it is a goldmine that saves an enormous amount of time otherwise spent finding information about specific species and their names.

Compilations of this kind are important, because biodiversity research now faces an enormous problem with names. There may be 1.8 million named species, and many more millions out there to be found, but only a million or so species have been secured in databases. And every year at least 16 000 new species are described.

GBIF have an initiative called the GNI (Global Names Index) to harvest all names, and a structure, the GNA (Global Names Architecture), to manage them. They will do this together with other acronyms such as PESI. In the meantime, the Catalogue of Life, a collaboration including the US ITIS and the global consortium Species2000, has a checklist of the world’s species with just a little over 1 million species in it, of which FishBase is one of the best parts.

But we cannot have it like this, endlessly chasing names that people drop here and there in more or less obtainable publications. Zoological and Botanical nomenclature have to go modern and collaborate with the information society. There has to be a registration system for names, and the habit of paper publication has to go away in favor of digital publication.

To those not familiar with nomenclature, the situation is the following: For a name to be available, and thus accepted for use as a scientific name for a species, genus, or family, it has to be published on paper and meet a few more simple conditions, such as a certain number of copies and a degree of obtainability. It is perfectly OK to publish 25 copies of a species description and give them away to 25 people who all except one throw their copy away within 24 hours. The single surviving copy is now the globally accepted token for the name of that species. It is not surprising that many taxonomists spend most of their time searching for publications instead of doing real research.

Digital-only publishing is not permitted. Well, there is an exception for CD-ROMs with deposition in libraries, but it is a bit awkward and it may be difficult to find those CD-ROMs.

The International Commission on Zoological Nomenclature is the body that writes the rules for Zoological Nomenclature – of course with the needs and well-being of the taxonomic community taken good care of. The Commission is now seriously considering digital-only, what we call e-only publishing of zoological names; and seriously considering a registration system for old and new names.

Both proposals are controversial. Concerning e-only publishing there is now a proposed amendment to the Code, and the Commission has invited comments and discussions. Some of the discussion is now published, and worth reading.

Formalisation of ZooBank as a registry for new names is maybe a bit further away, but unavoidable. In contrast to a Code amendment, it requires an infrastructure and running funds that are not immediately available. Nevertheless, Richard Pyle, ichthyologist at the Bernice P. Bishop Museum in Hawai’i, is working day and night to build up the structure for ZooBank. You can already get a glimpse of the future from the development site. There are already many more than 5 000 nomenclatural acts registered.

ZooBank will have a healthy starter boost from Catalog of Fishes, so from a fish perspective this is perhaps no big step forward. But notice, there is an ichthyologist programming ZooBank!

Yes, Ichthyology rules biodiversity informatics …

In the beginning …

This is a quick first post to introduce myself (only a glimpse) and the kind of writing that can be expected here.

As an ichthyologist, I will write mainly about fish. I manage two e-mail lists, my twitter, and blogs for two projects. Let’s see if there is more to say.

As a biodiversity informatician, I will try to connect fish with computers. I already post biodiversity informatics news on a Swedish language blog. Let’s see if there is more to say.

Naturally, I must first introduce you to those wonderful resources.

cichlid-l is the discussion list for professionals and others interested in cichlids. Cichlids are freshwater fishes found in Africa, South and Central America, Madagascar and parts of Asia. They form the second or third most speciose family of fishes (and vertebrates). This list is fairly old, started in January 1995.

eurofish-l is the discussion list for all other ichthyologists, but with an intended focus on Europe and particularly the activities of the European Ichthyological Society.

I will come back to the matter of access to these lists, since they currently seem to be locked up behind the firewall.

The FishBase Blog is the news blog in FishBase. FishBase contains information about all the world’s fishes, available for free on the web, e.g., on the Swedish FishBase server.

The Swedish FishBase team also maintains its own news blog, in the Swedish language.

And finally, GBIF-Sweden serves news from the biodiversity informatics world in the form of a blog.

Ah, twitter, somewhat neglected: http://twitter.com/svenok