Research data support at the Bolin Centre

Publish at the Bolin Centre Database

How to publish data

Getting started

It is easy: click 'Contribute' in the upper right corner of our website, https://bolin.su.se/data/, to reach our data entry form, and follow the instructions there.

Please fill in as much metadata as you can. Follow the short instructions under each entry field.

Save and update

Click on the green button ‘Save draft’ at the bottom of the Data Entry Form, and then click on ‘Preview your data’. You will then see a preview of your draft metadata. The long and cryptic address shown in your browser's address bar leads to your draft metadata. Save it as a bookmark so that you can come back and make further revisions later.

Each time you save the metadata, a new revision with a new web address is created. You can find all revisions in the Revision list shown at the bottom of the metadata presentation. Start from the most recent revision when you want to update the metadata.

Share draft

You can share the address of the draft dataset with someone who needs to see it before it is published. This is the long and cryptic address that you find in the address bar of your browser. Note that this is not the final address of the published dataset. Please do not use it in your publication, and do not share it with anyone other than reviewers and collaborators.

Upload data files

If the total data size is smaller than 10 GB, you can upload it directly in the Data Entry Form, provided that you are connected to the wired Stockholm University computer network. If your data is contained in more than one file, please upload a single zip file containing all data. If you cannot upload the data directly, please send us a link to your data. We also offer other ways to transfer data if needed.
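
As an illustration of packing several data files into one zip file before upload, here is a minimal Python sketch using only the standard library. The function name `zip_directory` and the directory layout are assumptions made for this example and are not part of the Data Entry Form.

```python
import zipfile
from pathlib import Path


def zip_directory(data_dir: Path, archive: Path) -> list[str]:
    """Pack every file under data_dir into one zip archive.

    Returns the file names stored in the archive, relative to data_dir.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(data_dir.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # was created inside data_dir.
            if path.is_file() and path.resolve() != archive.resolve():
                zf.write(path, arcname=path.relative_to(data_dir).as_posix())
        return zf.namelist()
```

For example, `zip_directory(Path("my_dataset"), Path("my_dataset.zip"))` would produce one file to upload, where `my_dataset` is a placeholder for your own data folder.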

Publish

Send an email to bolindata@su.se when you have entered as much information as you can on your own, and tell us when you want the dataset to be published.

We will then review and edit your metadata and data so that they fulfil the general requirements for data publishing. This is called data curation. It is an iterative process between you and us, which can be quick (a day) or slow (weeks) depending on the complexity of the dataset and how familiar you are with our procedures.

We will publish your dataset when we agree that it can be made public.

DOI

Each dataset in the Bolin Centre Database is assigned a unique DOI. This makes it easy to cite and find your data. If a dataset has more than one version, each version will have its own DOI. The DOI will be active when the dataset is published.

We will decide the final DOI after we have understood the data topic. This decision might involve some correspondence. You can provide the final DOI to a journal once we have decided what it will be. The safest approach is to publish your dataset first, so that the DOI is certain, and then provide it to the journal.

Please note that a preliminary DOI is always shown in your draft metadata, from the very first revision you save. This preliminary DOI is not active: it will not work until the dataset is published, and it might change before publication. Do not give the DOI of your data to anyone before we have decided the final DOI.

Get support

Please do not hesitate to contact us at bolindata@su.se if you have questions or need assistance with filling in the Data Entry Form. We will be happy to hear from you, and we will answer as soon as possible. We love data!

Examples

We recommend that you look at some recently published datasets to see different examples of how metadata can be written. Our five latest datasets are always found at the bottom of our start page.

Large data

If your dataset is large (>10 GB), or if you are working from a computer outside Stockholm University, we will need to be in contact about how to transfer your data files to us. We can also talk about possible ways to reduce the data size, if needed.

If your data is very large (>10 TB) and it is more appropriate to store it permanently somewhere other than in the Bolin Centre Database, your metadata should include an internet address leading to the data.

Data size reduction

Some research projects result in very large datasets. Not all data can, or should, be stored forever. It may therefore be necessary to reduce the data size before the data are published and archived. For datasets larger than 200 GB we recommend reducing the size before publication. However, for datasets that are considered particularly important we can accept substantially larger data volumes, on the order of terabytes.

Primarily, we ask you to keep the data values but reduce the file sizes by:

  • Compressing data
  • Reducing data precision
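
The effect of these two options can be sketched in Python with the standard library alone. The example below writes a list of numbers as text, optionally rounded to a fixed number of decimals, and compares the gzip-compressed sizes. The function names and the choice of a plain-text, one-value-per-line format are assumptions for illustration. Real datasets would more likely use the compression and packing options of their own file format.

```python
import gzip


def to_text_bytes(values, decimals=None):
    """Serialise floats one per line, optionally rounded to `decimals` places."""
    if decimals is None:
        lines = (repr(v) for v in values)  # full double precision
    else:
        lines = (f"{v:.{decimals}f}" for v in values)  # reduced precision
    return "\n".join(lines).encode("ascii")


def compressed_size(values, decimals=None):
    """Return the size in bytes of the gzip-compressed text representation."""
    return len(gzip.compress(to_text_bytes(values, decimals)))
```

Comparing `compressed_size(values)` with `compressed_size(values, decimals=2)` for your own data gives a rough idea of how much space a given precision saves. Choose the number of decimals so that it still matches the real accuracy of the measurements.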

Observations

Publish everything or as much as possible, including raw data.

Processed data

Publish entire large data products such as interpolated grids, re-analyses, and other important large datasets which are highly requested.

Model simulations

We divide model data into two types:

  • Permanent small subset for publication, from which the results may not be fully reproducible
  • Temporary extensive model output for the scientific review process, from which the results can be reproduced

Subset for publication

Most simulation data, except CMIP and similar, is unlikely to be reused by future scientists. We therefore publish only a small subset. We encourage scientists to include somewhat more than only the “figure data”, while trying very hard to stay under 200 GB. This may mean that the results cannot be fully reproduced.

We strongly urge scientists to provide source code, model configurations and setup in our code repository.

Some ways to reduce the data size:

  1. Regional subset
  2. Parameters subset
  3. Temporal subset
  4. Mixture of complete data for some variables, but reduced for others
  5. Figure data
  6. Summary statistics of raw data (processed data)
  7. Compress data
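
Several of the options above (regional, parameter and temporal subsets, and summary statistics) can be sketched in a few lines of Python. The `Record` layout, the field names and the variable names used below are hypothetical and chosen only for this illustration. Real model output would normally be subsetted with the tools that belong to its file format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    """One model value (hypothetical layout for illustration)."""
    year: int
    lat: float
    lon: float
    variable: str
    value: float


def subset(records, *, lat_range=None, lon_range=None, years=None, variables=None):
    """Keep only records inside the given region, years and variables.

    Any criterion left as None is not applied.
    """
    def keep(r):
        return (
            (lat_range is None or lat_range[0] <= r.lat <= lat_range[1])
            and (lon_range is None or lon_range[0] <= r.lon <= lon_range[1])
            and (years is None or r.year in years)
            and (variables is None or r.variable in variables)
        )
    return [r for r in records if keep(r)]


def yearly_mean(records):
    """Summary statistic: mean value per year over the given records."""
    by_year = {}
    for r in records:
        by_year.setdefault(r.year, []).append(r.value)
    return {year: mean(vals) for year, vals in sorted(by_year.items())}
```

A call such as `subset(records, lat_range=(55, 70), variables={"t2m"})` combines a regional and a parameter subset, and `yearly_mean` then reduces the result to one value per year.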

Temporary extensive model output

Provide temporary extensive model output, generously subsetted, for the scientific review process. This output will be deleted after one year.

How to publish source code

Browse to git.bolin.su.se/bolin, register and upload your code. You can use the web interface or a command line interface, whichever you prefer.

Using git

You can work on your code project, share it with colleagues, and create new revisions just as you would in any other git workflow. Currently, each project is limited to 100 MB.
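
To check in advance whether a project is likely to fit within the 100 MB limit, a small Python sketch like the following can be used. It rests on two assumptions that are not stated on this page: that 100 MB means 100,000,000 bytes, and that the `.git` directory is left out of the count. The repository itself may measure the size differently.

```python
from pathlib import Path

# Assumption: 100 MB is taken as 10**8 bytes.
PROJECT_LIMIT_BYTES = 100 * 1000 * 1000


def project_size(root: Path, skip_dirs=(".git",)) -> int:
    """Sum the sizes of all files under root, ignoring the skipped directories."""
    total = 0
    for path in root.rglob("*"):
        parts = path.relative_to(root).parts
        if path.is_file() and not any(part in skip_dirs for part in parts):
            total += path.stat().st_size
    return total


def fits_limit(root: Path, limit: int = PROJECT_LIMIT_BYTES) -> bool:
    """Return True if the project files fit within the limit."""
    return project_size(root) <= limit
```

Calling `fits_limit(Path("."))` from the top of a working copy gives a quick indication before you upload.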

You can choose to set the project visibility to private or internal. If you want it to be public, we can help you.

Citing source code

If you want to make it possible to cite a particular version of your source code, we can provide a DOI. To do so, we carry out the following steps.

  1. Review project: We will curate your project according to our standard.
  2. Clone project: We will clone your project to an account that we own, to ensure that the project and the specific version remain permanently accessible. Nobody else can then delete the project.
  3. Tag version: We will create a tag, named for example '1.0.0', that identifies a specific version. The tag 'freezes' a specific point in the project's history, and this frozen point can then be cited.
  4. Assign DOI: We will assign a DOI to the specific tag.

DOI

We can provide a DOI for a specific version published by the Bolin Centre.

License

Please make sure to choose a license for your code. We recommend the MIT license.

README file

Please use our template for a README.md file that briefly describes the code repository and how to use it.

Jupyter Notebooks

We encourage you to publish Jupyter Notebooks along with your code.

More information

There are many places with more information about git on the web, e.g. git-scm.com or a short cheat sheet. There is also a simple tutorial for GitLab.