`rhino_health.lib.endpoints.dataset.dataset_dataclass`#

Module Contents#

Classes#

`DatasetCreateInput`	Input arguments for adding a new Dataset
`Dataset`

class rhino_health.lib.endpoints.dataset.dataset_dataclass.DatasetCreateInput(**data)#

Bases: BaseDataset

Input arguments for adding a new Dataset

csv_filesystem_location: str | None#: The on-prem location of the Dataset data. The file should be a CSV.

method: typing_extensions.Literal['DICOM', 'filesystem'] = 'filesystem'#: The source to import imaging data from: either a DICOM server or the local file system

is_data_deidentified: bool | None = False#: Whether the data is already deidentified

image_dicom_server: str | None#: The DICOM Server URL to import DICOM images from

image_filesystem_location: str | None#: The on-prem location to import DICOM images from

file_base_path: str | None#: The on-prem location of the non-DICOM files listed in the Dataset data CSV

sync: bool | None = True#: Whether to perform this import request synchronously

name: str#: The name of the Dataset

description: str#: The description of the Dataset

base_version_uid: str | None#: The original Dataset this Dataset is a new version of, if applicable

project_uid: typing_extensions.Annotated[str, Field(alias='project')]#: The unique ID of the Project this Dataset belongs to.

workgroup_uid: typing_extensions.Annotated[str, Field(alias='workgroup')]#: The unique ID of the Workgroup this Dataset belongs to. Warning: workgroup_uid may change to primary_workgroup_uid in the future.

data_schema_uid: typing_extensions.Annotated[Any, Field(alias='data_schema')]#: The unique ID of the DataSchema this Dataset follows

import_args()#
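
As a rough illustration of how the fields above fit together, the sketch below collects them into keyword arguments without importing the SDK. The helper `build_dataset_create_kwargs`, the UIDs, and the paths are hypothetical, and the check that a DICOM import needs a server URL is an assumption, not documented library behavior.

```python
from typing import Any, Dict, Optional


def build_dataset_create_kwargs(
    name: str,
    description: str,
    project_uid: str,
    workgroup_uid: str,
    data_schema_uid: str,
    csv_filesystem_location: Optional[str] = None,
    method: str = "filesystem",
    image_dicom_server: Optional[str] = None,
    image_filesystem_location: Optional[str] = None,
    is_data_deidentified: bool = False,
    sync: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: gather keyword arguments for DatasetCreateInput."""
    if method not in ("DICOM", "filesystem"):
        raise ValueError("method must be 'DICOM' or 'filesystem'")
    # Assumption: a DICOM import is only meaningful with a server URL.
    if method == "DICOM" and not image_dicom_server:
        raise ValueError("method='DICOM' needs image_dicom_server")
    kwargs = {
        "name": name,
        "description": description,
        "project_uid": project_uid,
        "workgroup_uid": workgroup_uid,
        "data_schema_uid": data_schema_uid,
        "csv_filesystem_location": csv_filesystem_location,
        "method": method,
        "image_dicom_server": image_dicom_server,
        "image_filesystem_location": image_filesystem_location,
        "is_data_deidentified": is_data_deidentified,
        "sync": sync,
    }
    # Drop unset optional values so the documented defaults apply.
    return {key: value for key, value in kwargs.items() if value is not None}


# Placeholder UIDs and paths, for illustration only.
kwargs = build_dataset_create_kwargs(
    name="Example Dataset",
    description="Imported from the local file system",
    project_uid="project-uid-placeholder",
    workgroup_uid="workgroup-uid-placeholder",
    data_schema_uid="data-schema-uid-placeholder",
    csv_filesystem_location="/placeholder/path/dataset.csv",
)
# With the SDK installed, these would be passed as DatasetCreateInput(**kwargs).
```

The three UID fields declare aliases (`project`, `workgroup`, `data_schema`); which spelling the constructor accepts depends on the model configuration, so it is worth checking against the installed version.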

class rhino_health.lib.endpoints.dataset.dataset_dataclass.Dataset(**data)#

property primary_workgroup#

property dataset_info#: Sanitized metadata information about the Dataset.

property data_schema: DataSchema#

Return the DataSchema Dataclass associated with data_schema_uid

Warning

The result of this function is cached. Be careful calling this function after making changes. All dataclasses must already exist on the platform before making this call.

Returns:

data_schema: DataSchema: Dataclass representing the DataSchema

property project: Project#

Return the Project Dataclass associated with project_uid

Warning

The result of this function is cached. Be careful calling this function after making changes. All dataclasses must already exist on the platform before making this call.

Returns:

project: Project: Dataclass representing the Project

property workgroup: Workgroup#

Return the Workgroup Dataclass associated with workgroup_uid

Warning

The result of this function is cached. Be careful calling this function after making changes. All dataclasses must already exist on the platform before making this call.

Returns:

workgroup: Workgroup: Dataclass representing the Workgroup

property creator: User#

Return the User Dataclass associated with creator_uid

Warning

The result of this function is cached. Be careful calling this function after making changes. All dataclasses must already exist on the platform before making this call.

Returns:

creator: User: Dataclass representing the User
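
The caching warning on the properties above can be illustrated with a stand-in. The sketch below uses `functools.cached_property` and two hypothetical classes (`FakePlatform`, `DatasetSketch`); it does not reproduce how the SDK caches results, only the general effect of reading a cached value after the underlying object has changed.

```python
from functools import cached_property


class FakePlatform:
    """Hypothetical stand-in for the remote platform; counts lookups."""

    def __init__(self):
        self.project_names = {"project-1": "Original name"}
        self.calls = 0

    def get_project_name(self, uid):
        self.calls += 1
        return self.project_names[uid]


class DatasetSketch:
    """Hypothetical stand-in for a dataclass with a cached related object."""

    def __init__(self, platform, project_uid):
        self._platform = platform
        self.project_uid = project_uid

    @cached_property
    def project(self):
        # Fetched on first access, then served from the per-instance cache.
        return self._platform.get_project_name(self.project_uid)


platform = FakePlatform()
dataset = DatasetSketch(platform, "project-1")

first = dataset.project                        # triggers one lookup
platform.project_names["project-1"] = "Renamed"
second = dataset.project                       # cached: still the old value

# A newly created object has an empty cache and sees the change.
fresh = DatasetSketch(platform, "project-1").project
```

In this sketch, creating a new object is what yields the updated value; the reference above does not state whether the SDK offers another way to reset the cache.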

uid: str#: The unique ID of the Dataset

version: int | None = 0#: Which revision this Dataset is

num_cases: int#: The number of cases in the Dataset

import_status: str#: The import status of the Dataset

data_schema_uid: str#

name: str#: The name of the Dataset

description: str#: The description of the Dataset

base_version_uid: str | None#: The original Dataset this Dataset is a new version of, if applicable

creator_uid: str#: The UID of the creator of this dataclass on the system

created_at: str#: When this dataclass was created on the system

data_schema_name: str#: The data_schema name

project_name: str#: The project name

workgroup_name: str#: The workgroup name

creator_name: str#: The creator name

run_code(run_code, print_progress=True, **kwargs)#

Create and run code on this dataset using defaults that can be overridden

Warning

This function relies on the dataset’s metadata, so make sure to create the input dataset first.

Warning

This feature is under development and the interface may change

run_code: str: The code that will run in the container
print_progress: bool = True: Whether to print how long has elapsed since the start of the wait
name: Optional[str] = “{dataset.name} (v.{dataset.version}) containerless code”: Model name - Uses the dataset name and version as part of the default (e.g., for the first version of a dataset named dataset_one, the name will be “dataset_one (v.1) containerless code”)
description: Optional[str] = “Python code run”: Model description
container_image_uri: Optional[str] = “{ENV_URL}/rhino-gc-workgroup-rhino-health:generic-python-runner”: URI of the container that should be run - ENV_URL is the environment ECR repo URL
input_data_schema_uid: Optional[str] = dataset.data_schema_uid: The data_schema used for the input dataset - By default uses the data_schema used to import the dataset
output_data_schema_uid: Optional[str] = None (Auto generate data schema): The data_schema used for the output dataset - By default generates a schema from the dataset_csv
output_dataset_names_suffix: Optional[str] = “containerless code”: String that will be added to output dataset name
timeout_seconds: Optional[int] = 600: Amount of time before timeout in seconds

Returns:

Tuple: (output_datasets, code_run)
output_datasets: List of Dataset Dataclasses
code_run: A CodeRun object containing the run outcome

Examples

dataset.run_code(run_code="df['BMI'] = df.Weight / (df.Height ** 2)")
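
For snippets longer than one line, it may help to build the `run_code` string locally and check its syntax before submitting it. The sketch below is one way to do that with the standard library. The rounding line and the keyword values are illustrative, and the assumption that the data is exposed as `df` comes from the example above.

```python
import textwrap

# Multi-line snippet; `df` is assumed to be provided by the runner.
run_code = textwrap.dedent(
    """
    df['BMI'] = df.Weight / (df.Height ** 2)
    df['BMI'] = df['BMI'].round(1)
    """
).strip()

# Parse the snippet without executing it, to catch syntax errors early.
compile(run_code, "<run_code>", "exec")

# Keyword arguments taken from the parameter list above; values are illustrative.
run_kwargs = {
    "run_code": run_code,
    "print_progress": False,
    "output_dataset_names_suffix": "with BMI",
    "timeout_seconds": 900,
}

# With a Dataset object from the SDK, the call would look like:
# output_datasets, code_run = dataset.run_code(**run_kwargs)
```

`compile` only checks syntax, so a misspelled column name would still surface only when the code runs against the data.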

get_metric(metric_configuration: rhino_health.lib.metrics.base_metric.BaseMetric)#

Queries the on-prem data and returns the result for this Dataset based on metric_configuration.
