DataCite

Python API wrapper for the DataCite REST and Metadata Store APIs as well as DataCite JSON and XML generation.

Installation

The datacite package is on PyPI so all you need is:

$ pip install datacite

Usage

The datacite package implements a Python client for DataCite REST and MDS APIs.

Below is a usage example of the DataCite REST client API wrapper. Please see the DataCite REST API documentation for further information.

import os
from datacite import DataCiteRESTClient, schema45

data = {
    "creators": [
        {"name": "Smith, John"},
    ],
    "titles": [
        {
            "title": "Minimal Test Case",
        }
    ],
    "publisher": {"name": "Invenio Software"},
    "publicationYear": "2015",
    "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
    "schemaVersion": "http://datacite.org/schema/kernel-4",
}

# Validate dictionary
schema45.validator.validate(data)

# Generate DataCite XML from dictionary.
doc = schema45.tostring(data)

print(doc)

# Initialize the REST client.
d = DataCiteRESTClient(
    username="DATACITE.ACCOUNT",
    password="mypassword",
    prefix="10.12345",
    test_mode=True,
)

# Mint a DOI
doi = d.public_doi(data, "http://example.org/test-doi")
print(doi)

# Reserve a draft DOI
doi = d.draft_doi(data)
print(doi)

# Make the DOI public
url = d.update_url(doi, url="http://example.org/test-doi2")
d.show_doi(doi)

# Get the DOI metadata
doc = d.get_metadata(doi)

# Hide the DOI
d.hide_doi(doi)

Below is a full usage example of the DataCite MDS client API wrapper. Please see the DataCite MDS API documentation for further information.

from datacite import DataCiteMDSClient, schema45

prefix = "10.1234"

data = {
    "doi": f"{prefix}/test-doi",
    "creators": [
        {"name": "Smith, John"},
    ],
    "titles": [
        {
            "title": "Minimal Test Case",
        }
    ],
    "publisher": {"name": "Invenio Software"},
    "publicationYear": "2015",
    "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
    "schemaVersion": "http://datacite.org/schema/kernel-4",
}

# Validate dictionary
assert schema45.validate(data)

# Generate DataCite XML from dictionary.
doc = schema45.tostring(data)

# Initialize the MDS client.
d = DataCiteMDSClient(
    username="DATACITE.ACCOUNT",
    password="mypassword",
    prefix=prefix,
    test_mode=True,
)

# Set metadata for DOI
d.metadata_post(doc)

# Mint new DOI
d.doi_post(f"{prefix}/test-doi", "http://example.org/test-doi")

# Get DOI location
location = d.doi_get(f"{prefix}/test-doi")

# Set alternate URL for content type (available through content negotiation)
d.media_post(
    f"{prefix}/test-doi",
    {
        "application/json": "http://example.org/test-doi/json/",
        "application/xml": "http://example.org/test-doi/xml/",
    },
)

# Get alternate URLs
mapping = d.media_get(f"{prefix}/test-doi")
assert mapping["application/json"] == "http://example.org/test-doi/json/"

# Get metadata for DOI
doc = d.metadata_get(f"{prefix}/test-doi")

# Make DOI inactive
d.metadata_delete(f"{prefix}/test-doi")

Please see the DataCite Testing guide to learn how to test this client with your test credentials.
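Test credentials are usually better kept out of source code. As a minimal sketch, the helper below collects the client arguments from environment variables. The function name and the variable names (DATACITE_USERNAME, DATACITE_PASSWORD, DATACITE_PREFIX, DATACITE_TEST_MODE) are assumptions made for this example; the package does not define them.

```python
import os


def client_kwargs_from_env(environ=os.environ):
    """Collect client keyword arguments from the environment.

    The variable names are hypothetical; pick whatever fits your deployment.
    """
    required = ("DATACITE_USERNAME", "DATACITE_PASSWORD", "DATACITE_PREFIX")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "username": environ["DATACITE_USERNAME"],
        "password": environ["DATACITE_PASSWORD"],
        "prefix": environ["DATACITE_PREFIX"],
        # Default to the test environment unless explicitly disabled with "0".
        "test_mode": environ.get("DATACITE_TEST_MODE", "1") != "0",
    }
```

The resulting dictionary matches the documented constructor arguments, so it could be passed on as DataCiteRESTClient(**client_kwargs_from_env()).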

DataCite API Wrappers

Python API wrapper for the DataCite API.

class datacite.DataCiteMDSClient(username, password, prefix, test_mode=False, url=None, timeout=None)[source]

DataCite MDS API client wrapper.

Warning: The DataCite MDS API is being maintained but is no longer actively developed.

doi_get(doi)[source]

Get the URL where the resource pointed to by the DOI is located.

Parameters:

doi – DOI name of the resource.

doi_post(new_doi, location)[source]

Mint new DOI.

Parameters:
  • new_doi – DOI name for the new resource.

  • location – URL where the resource is located.

Returns:

“CREATED” or “HANDLE_ALREADY_EXISTS”.

media_get(doi)[source]

Get list of pairs of media type and URLs associated with a DOI.

Parameters:

doi – DOI name of the resource.

media_post(doi, media)[source]

Add or update media type/URL pairs for a DOI.

Standard domain restrictions check will be performed.

Parameters:

media – Dictionary of (mime-type, URL) key/value pairs.

Returns:

“OK”
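For orientation, the MDS media endpoint takes a plain-text body with one mime-type=URL pair per line. The sketch below shows how such a dictionary could be serialized. media_body is a hypothetical helper written for this illustration and is not part of the package.

```python
def media_body(media):
    """Serialize {mime-type: URL} pairs, one 'mime-type=URL' entry per line."""
    return "\n".join(f"{mime_type}={url}" for mime_type, url in media.items())


body = media_body(
    {
        "application/json": "http://example.org/test-doi/json/",
        "application/xml": "http://example.org/test-doi/xml/",
    }
)
print(body)
```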

metadata_delete(doi)[source]

Mark the metadata set of a DOI resource as ‘inactive’.

Parameters:

doi – DOI name of the resource.

Returns:

“OK”

metadata_get(doi)[source]

Get the XML metadata associated with a DOI name.

Parameters:

doi – DOI name of the resource.

metadata_post(metadata)[source]

Set new metadata for an existing DOI.

Metadata should follow the DataCite Metadata Schema: http://schema.datacite.org/

Parameters:

metadata – XML format of the metadata.

Returns:

“CREATED” or “HANDLE_ALREADY_EXISTS”

class datacite.DataCiteRESTClient(username, password, prefix, test_mode=False, url=None, timeout=None)[source]

DataCite REST API client wrapper.

check_doi(doi)[source]

Check doi structure.

Check that the DOI has the form 12.12345/123 and uses the prefix defined for the client.
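The check can be pictured roughly as follows. This is a sketch, assuming that a bare suffix is accepted and completed with the configured prefix; normalize_doi is a hypothetical stand-alone function, not the method's actual code.

```python
def normalize_doi(doi, prefix):
    """Return a full DOI, adding the prefix when only a suffix is given.

    Hypothetical helper for illustration; raises ValueError on a foreign
    prefix or an empty suffix.
    """
    if not doi:
        raise ValueError("DOI must not be empty")
    if "/" in doi:
        found_prefix, _, suffix = doi.partition("/")
        if found_prefix != prefix:
            raise ValueError(
                f"Wrong DOI prefix {found_prefix!r}, expected {prefix!r}"
            )
        if not suffix:
            raise ValueError("DOI suffix must not be empty")
        return doi
    # No slash: treat the value as a suffix and prepend the prefix.
    return f"{prefix}/{doi}"
```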

delete_doi(doi)[source]

Delete a doi.

This only works for draft DOIs.

Parameters:

doi – DOI (e.g. 10.123/456)

Returns:

doi_get(doi)[source]

Get the URL where the resource pointed to by the DOI is located.

Parameters:

doi – DOI name of the resource.

draft_doi(metadata=None, doi=None)[source]

Create a draft doi.

A draft DOI can be deleted

If doi is not provided, DataCite will automatically create a DOI with a random, recommended DOI suffix

Parameters:
  • metadata – metadata for the DOI

  • doi – DOI (e.g. 10.123/456)

Returns:

get_doi(doi)[source]

Get the URL where the resource pointed to by the DOI is located.

Parameters:

doi – DOI name of the resource.

get_media(doi)[source]

Get list of pairs of media type and URLs associated with a DOI.

Parameters:

doi – DOI name of the resource.

get_metadata(doi)[source]

Get the JSON metadata associated with a DOI name.

Parameters:

doi – DOI name of the resource.

hide_doi(doi)[source]

Hide a previously registered DOI.

This DOI will no longer be found in DataCite Search

Parameters:

doi – DOI to hide e.g. 10.12345/1.

Returns:

media_get(doi)[source]

Get list of pairs of media type and URLs associated with a DOI.

Parameters:

doi – DOI name of the resource.

metadata_get(doi)[source]

Get the JSON metadata associated with a DOI name.

Parameters:

doi – DOI name of the resource.

post_doi(data)[source]

Post a new JSON payload to DataCite.

private_doi(metadata, url, doi=None)[source]

Publish a doi in a registered state.

A DOI generated by this method will not be found in DataCite Search

This DOI cannot be deleted

If doi is not provided, DataCite will automatically create a DOI with a random, recommended DOI suffix

Metadata should follow the DataCite Metadata Schema: http://schema.datacite.org/

Parameters:

metadata – JSON format of the metadata.

Returns:

public_doi(metadata, url, doi=None)[source]

Create a public doi.

This DOI will be public and cannot be deleted

If doi is not provided, DataCite will automatically create a DOI with a random, recommended DOI suffix

Metadata should follow the DataCite Metadata Schema: http://schema.datacite.org/

Parameters:
  • metadata – JSON format of the metadata.

  • doi – DOI (e.g. 10.123/456)

  • url – URL where the doi will resolve.

Returns:

put_doi(doi, data)[source]

Put a JSON payload to DataCite for an existing DOI.

show_doi(doi)[source]

Show a previously registered DOI.

This DOI will be found in DataCite Search

Parameters:

doi – DOI to show e.g. 10.12345/1.

Returns:

update_doi(doi, metadata=None, url=None)[source]

Update the metadata or url for a DOI.

Parameters:
  • url – URL where the doi will resolve.

  • metadata – JSON format of the metadata.

Returns:

update_url(doi, url)[source]

Update the url of a doi.

Parameters:
  • url – URL where the doi will resolve.

  • doi – DOI (e.g. 10.123/456)

Returns:

Errors

Errors for the DataCite API.

MDS error responses are converted into an exception from this module. Connection issues raise datacite.errors.HttpError, while DataCite MDS error responses raise a subclass of datacite.errors.DataCiteError.

exception datacite.errors.DataCiteBadRequestError[source]

Bad request error.

Bad requests can include e.g. invalid XML, wrong domain, wrong prefix. Request body must be exactly two lines: DOI and URL. One or more of the specified mime-types or URLs are invalid (e.g. unsupported mime-type, disallowed URL domain, etc.).
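The two-line body mentioned above is the MDS format for registering a DOI: a doi= line followed by a url= line. A minimal sketch, with doi_body as a hypothetical helper that is not part of the package:

```python
def doi_body(doi, url):
    """Build the two-line 'doi=...' / 'url=...' request body."""
    for value in (doi, url):
        # A stray line break would produce more than two lines.
        if "\n" in value or "\r" in value:
            raise ValueError("DOI and URL must not contain line breaks")
    return f"doi={doi}\nurl={url}"


body = doi_body("10.1234/test-doi", "http://example.org/test-doi")
print(body)
```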

exception datacite.errors.DataCiteError[source]

Exception raised when the server returns a known HTTP error code.

Known HTTP error codes include:

  • 204 No Content

  • 400 Bad Request

  • 401 Unauthorized

  • 403 Forbidden

  • 404 Not Found

  • 410 Gone (deleted)

static factory(err_code, *args)[source]

Create exceptions through a Factory based on the HTTP error code.
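A factory of this kind can be sketched as a lookup from status code to exception class. The classes below are local stand-ins that mirror the documented hierarchy; their names, the error_for function, and the fallback rules (unknown 4XX codes map to the request error, everything else to the server error) are assumptions for this illustration, not the module's actual code.

```python
class SketchError(Exception):
    """Stand-in base class (hypothetical)."""


class SketchRequestError(SketchError):
    """4XX-style errors, plus 204."""


class SketchServerError(SketchError):
    """5XX-style errors."""


class SketchNoContentError(SketchRequestError):
    """204 No Content."""


class SketchBadRequestError(SketchRequestError):
    """400 Bad Request."""


class SketchUnauthorizedError(SketchRequestError):
    """401 Unauthorized."""


class SketchForbiddenError(SketchRequestError):
    """403 Forbidden."""


class SketchNotFoundError(SketchRequestError):
    """404 Not Found."""


class SketchGoneError(SketchRequestError):
    """410 Gone."""


_BY_CODE = {
    204: SketchNoContentError,
    400: SketchBadRequestError,
    401: SketchUnauthorizedError,
    403: SketchForbiddenError,
    404: SketchNotFoundError,
    410: SketchGoneError,
}


def error_for(err_code, *args):
    """Return an exception instance chosen by HTTP status code."""
    cls = _BY_CODE.get(err_code)
    if cls is None:
        # Fallback rules are an assumption of this sketch.
        cls = SketchRequestError if 400 <= err_code < 500 else SketchServerError
    return cls(*args)
```

Because every specific class derives from a common base, callers can catch either a precise error (for example the not-found stand-in) or the whole family with a single except clause.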

exception datacite.errors.DataCiteForbiddenError[source]

Login problem, dataset belongs to another party or quota exceeded.

exception datacite.errors.DataCiteGoneError[source]

Requested dataset was marked inactive (using DELETE method).

exception datacite.errors.DataCiteNoContentError[source]

DOI is known to MDS, but not resolvable.

This might be due to handle’s latency.

exception datacite.errors.DataCiteNotFoundError[source]

DOI does not exist in the database.

exception datacite.errors.DataCitePreconditionError[source]

Metadata must be uploaded first.

exception datacite.errors.DataCiteRequestError[source]

A DataCite request error. You made an invalid request.

Base class for all 4XX-related HTTP error codes as well as 204.

exception datacite.errors.DataCiteServerError[source]

An internal server error happened on the DataCite end. Try later.

Base class for all 5XX-related HTTP error codes.

exception datacite.errors.DataCiteUnauthorizedError[source]

Bad username or password.

exception datacite.errors.HttpError[source]

Exception raised when a connection problem happens.

DataCite v4.5 Metadata Management

DataCite v4.5 JSON to XML transformations.

datacite.schema45.dump_etree(data)[source]

Convert JSON dictionary to DataCite v4.5 XML as ElementTree.

datacite.schema45.tostring(data, **kwargs)[source]

Convert JSON dictionary to DataCite v4.5 XML as string.

datacite.schema45.validate(data)[source]

Validate DataCite v4.5 JSON dictionary.
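To give a feel for what a JSON to XML transformation involves, the sketch below renders a handful of fields with the standard library. It is not the package's implementation: minimal_xml is a hypothetical function, it covers only creators, titles, publisher and publication year, and it skips everything else a complete DataCite record would carry.

```python
import xml.etree.ElementTree as ET

data = {
    "creators": [{"name": "Smith, John"}],
    "titles": [{"title": "Minimal Test Case"}],
    "publisher": {"name": "Invenio Software"},
    "publicationYear": "2015",
}


def minimal_xml(record):
    """Render a few fields of a metadata dictionary as DataCite-style XML."""
    root = ET.Element("resource", {"xmlns": "http://datacite.org/schema/kernel-4"})
    creators = ET.SubElement(root, "creators")
    for creator in record.get("creators", []):
        node = ET.SubElement(creators, "creator")
        ET.SubElement(node, "creatorName").text = creator["name"]
    titles = ET.SubElement(root, "titles")
    for title in record.get("titles", []):
        ET.SubElement(titles, "title").text = title["title"]
    ET.SubElement(root, "publisher").text = record["publisher"]["name"]
    ET.SubElement(root, "publicationYear").text = record["publicationYear"]
    return ET.tostring(root, encoding="unicode")


xml_doc = minimal_xml(data)
print(xml_doc)
```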

DataCite v4.3 Metadata Management

DataCite v4.3 JSON to XML transformations.

datacite.schema43.dump_etree(data)[source]

Convert JSON dictionary to DataCite v4.3 XML as ElementTree.

datacite.schema43.tostring(data, **kwargs)[source]

Convert JSON dictionary to DataCite v4.3 XML as string.

datacite.schema43.validate(data)[source]

Validate DataCite v4.3 JSON dictionary.

DataCite v4.2 Metadata Management

DataCite v4.2 JSON to XML transformations.

datacite.schema42.dump_etree(data)[source]

Convert JSON dictionary to DataCite v4.2 XML as ElementTree.

datacite.schema42.tostring(data, **kwargs)[source]

Convert JSON dictionary to DataCite v4.2 XML as string.

datacite.schema42.validate(data)[source]

Validate DataCite v4.2 JSON dictionary.

DataCite v4.1 Metadata Management

DataCite v4.1 JSON to XML transformations.

datacite.schema41.dump_etree(data)[source]

Convert JSON dictionary to DataCite v4.1 XML as ElementTree.

datacite.schema41.tostring(data, **kwargs)[source]

Convert JSON dictionary to DataCite v4.1 XML as string.

datacite.schema41.validate(data)[source]

Validate DataCite v4.1 JSON dictionary.

DataCite v4.0 Metadata Management

DataCite v4.0 JSON to XML transformations.

datacite.schema40.dump_etree(data)[source]

Convert JSON dictionary to DataCite v4.0 XML as ElementTree.

datacite.schema40.tostring(data, **kwargs)[source]

Convert JSON dictionary to DataCite v4.0 XML as string.

datacite.schema40.validate(data)[source]

Validate DataCite v4.0 JSON dictionary.

Changes

Version v1.3.1 (released 2025-07-23):

  • remove deprecated schema 31 test files

  • tests: reorganize test files and remove http schema access that was removed in lxml

  • fix: replaced pkg_resources with importlib.resources

Version v1.3.0 (released 2025-07-03):

  • Removes deprecated schema 3.1

  • Adds minimum python version 3.9

  • Switches 4.5 schema to https

  • Documentation cleanup and improvements

Version v1.2.0 (released 2024-10-17):

  • Updates package setup and adds black formatting

  • Adds support for DataCite Metadata Schema v4.5. The version 4.5 jsonschema includes a number of changes and improvements:

    • Switches to jsonschema 2019-09 and adds more complete validation to catch mistyped elements

    • Switches publisher from a string to an object. This means you will need to change publisher to be structured like “publisher”: {“name”: “Invenio Software”} when you use version 4.5. This change is needed to support the addition of publisher identifiers.

    • Removes the identifiers field and adds doi, prefix, and suffix fields. These fields are clearer, and DataCite appears to be moving away from the combined identifiers field. doi is not a required field since you may or may not have a DOI depending on your workflow.

    • Adds new relatedItem elements for publication metadata

    • Switches geolocation point values to numbers. This is to enable validation and is consistent with GeoJson and InvenioRDM. It is different from the DataCite REST API which uses strings, and submitted numbers will be turned into strings by DataCite.

    • Reorganizes geolocationPolygon to how DataCite is currently rendering this metadata

    • Adds support for the new resourceTypeGeneral and relationType values

    • General jsonschema organization improvements
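Two of the changes listed above, the publisher object and numeric geolocation points, can be applied to an existing dictionary mechanically. The following is a sketch only: upgrade_to_45 is a hypothetical helper, and the geoLocations / geoLocationPoint / pointLongitude / pointLatitude layout is assumed from the DataCite JSON naming rather than taken from this document.

```python
import copy


def upgrade_to_45(record):
    """Return a copy of a pre-4.5 style record with two 4.5 changes applied."""
    new = copy.deepcopy(record)
    # publisher: plain string -> object with a "name" key
    if isinstance(new.get("publisher"), str):
        new["publisher"] = {"name": new["publisher"]}
    # geolocation points: numeric strings -> numbers
    for location in new.get("geoLocations", []):
        point = location.get("geoLocationPoint")
        if point:
            for key in ("pointLongitude", "pointLatitude"):
                if isinstance(point.get(key), str):
                    point[key] = float(point[key])
    return new


old = {
    "publisher": "Invenio Software",
    "geoLocations": [
        {"geoLocationPoint": {"pointLongitude": "31.233", "pointLatitude": "-67.302"}}
    ],
}
new = upgrade_to_45(old)
```

The input is deep-copied, so the original dictionary is left untouched and can still be validated against the older schema.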

Version v1.1.3 (released 2023-03-20):

  • Updates dependency versions and adds python 3.9 support

  • Changes internal definition name for affiliation in 4.3 schema

Version v1.1.2 (released 2021-06-22):

  • Standardizes function names in DataCiteRESTClient. Old functions will be deprecated in a future release

Version v1.1.1 (released 2021-04-20):

  • Fixes DataCiteRESTClient attributes’ type. Prefix, username and password are always cast to string.

Version v1.1.0 (released 2021-04-15):

  • Adds full support for DataCite Metadata Schema v4.2 and v4.3 XML generation.

  • Uses Official DataCite JSON Schema, which has the following notable changes from the previous schema:

    • Uses “identifiers” which is a combination of the XML “identifier” and “alternativeIdentifiers” elements

    • “creatorName” is now “name”

    • “contributorName” is now “name”

    • “affiliations” is now “affiliation” (is still an array)

    • “affiliation” is now “name”

    • There is no longer a funder identifier object (the identifier and type are just elements)

  • Removes Python 2 support

  • Removes the old way of testing with DataCite: test mode for the MDS APIs and the test DOI 10.5072
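For records written against the previous schema, the two simplest renames in the list above (“creatorName” and “contributorName” becoming “name”) can be sketched as below. rename_name_keys is a hypothetical helper; the other changes in the list are not handled.

```python
def rename_name_keys(record):
    """Rename 'creatorName'/'contributorName' keys to 'name' in a new dict."""
    renames = {"creators": "creatorName", "contributors": "contributorName"}
    new = dict(record)
    for field, old_key in renames.items():
        if field in record:
            new[field] = [
                {("name" if key == old_key else key): value
                 for key, value in person.items()}
                for person in record[field]
            ]
    return new


converted = rename_name_keys(
    {
        "creators": [{"creatorName": "Smith, John"}],
        "titles": [{"title": "Minimal Test Case"}],
    }
)
```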

Version v1.0.1 (released 2018-03-08):

  • Fixes schema location url for DataCite v4.1

Version v1.0.0 (released 2018-02-28):

  • Adds full support for DataCite Metadata Schema v4.1 XML generation.

Version v0.3.0 (released 2016-11-18):

  • Adds full support for DataCite Metadata Schema v4.0 XML generation.

  • Adds the message from the server in the error exceptions.

Version v0.2.2 (released 2016-09-23):

  • Fixes issue with generated order of nameIdentifier and affiliation tags.

Version v0.2.1 (released 2016-03-29):

  • Fixes issue with JSON schemas not being included when installing from PyPI.

Version v0.2.0 (released 2016-03-21):

  • Adds DataCite XML generation support.

Version 0.1 (released 2015-02-25):

  • Initial public release.

Contributing

Bug reports, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the code of this library, please:

  1. Search for already reported problems.

  2. Check if the issue has been fixed or is still reproducible on the latest master branch.

  3. Create an issue with a test case.

If you create a feature branch, you can run the tests to ensure everything is operating correctly:

$ python setup.py test

License

DataCite is free software; you can redistribute it and/or modify it under the terms of the Revised BSD License quoted below.

Copyright (C) 2015-2018 CERN. Copyright (C) 2018 Center for Open Science. Copyright (C) 2019-2024 Caltech. Copyright (C) 2024 Institute of Biotechnology of the Czech Academy of Sciences.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Note

In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.

Thank you to all the contributors <https://github.com/inveniosoftware/datacite/graphs/contributors>!