Description of image

How to Create Datasets for Notebooks

Notebooks are a web-based Jupyter IDE with shared persistent storage for long-term development and inter-notebook collaboration, backed by accelerated compute.


In addition to the available public datasets, you can create your own datasets using the Paperspace console or the CLI.

Create Dataset and Dataset Version Using the Paperspace Console

To create a new dataset, select the Data tab in a project and click Create a Dataset. In the Create a new dataset window, specify a name, an optional description, and a storage provider.

Create new dataset

If the team already has datasets, click Add to create a new dataset. On the next screen, you can drag and drop or upload your files and then click Upload.

Import dataset

Note the ID of the dataset to use elsewhere.

You can also create a dataset in a notebook as described in Connect Data Sources.

Note
Importing data or adding it in some other way such as a Workflow output creates a new version of the dataset.

Create Dataset and Dataset Version Using the CLI

To create a new dataset that does not yet have an ID in the CLI, use the gradient datasets create command:

$ gradient datasets create --name democli --<storage-provider-id> ssfe843ndkjdsnr

The output looks similar to the following:

Created dataset: dsr5zdx0thjhfe2

All Gradient datasets are versioned. To make any changes to data in a dataset, first you need to create a new version of the dataset. Use the following command, replacing <dataset-id> with the ID of the dataset you want to update:

$ gradient datasets versions create --id <dataset-id>

Once the version is created, you can then add files to the dataset version.

$ gradient datasets files put --id dst364npcw6ccok:fo5rp4m --source-path ./some-data/

Once all desired files are uploaded to the version, commit the version to the dataset.

$ gradient datasets versions commit --id dst364npcw6ccok:fo5rp4m

Once the dataset version is committed, the data is available in the Paperspace console for you to reference in Notebooks, Workflows, and Deployments.

View Datasets Using the CLI

To view a list of all datasets, use one of the following commands:

  • gradient datasets list

    The output looks similar to the following:

    +------+-----------------+-------------------------+
    | Name | ID              | Storage Provider        |
    +------+-----------------+-------------------------+
    | test | dst364npcw6ccok | test1 (splgct3arqdh77c) |
    +------+-----------------+-------------------------+
    
  • gradient datasets details --id <dataset-id>

    The output looks similar to the following:

    +-----------------+-------------------------+
    | Name            | test                    |
    +-----------------+-------------------------+
    | ID              | dst364npcw6ccok         |
    | Description     |                         |
    | StorageProvider | test1 (splgct3arqdh77c) |
    +-----------------+-------------------------+
    

View Dataset Files Using the CLI

To view dataset files, use one of the following command:

$ gradient datasets files list --id <dataset-id>

The output looks similar to the following:

+-----------+------+
| Name      | Size |
+-----------+------+
| hello.txt | 12   |
+-----------+------+