Welcome to Rhino SDK’s documentation!#
Why use the SDK?#
The Rhino SDK is a comprehensive Python library for programmatic interaction with the Rhino API. It lets you use the functionality of the Rhino Federated Computing Platform (FCP) from your preferred development environment, including calculating metrics and running code on distributed data in a privacy-preserving fashion.
Simplified Integration: Incorporate FCP functionalities into Python projects to execute tasks, manage projects, and access datasets directly within a coding environment.
Programmatic Control: Automate complex operations, from project creation to workflow orchestration, using Python’s power and flexibility.
Custom Workflows: Develop and manage workflows that align with unique research needs, leveraging the SDK’s robust features.
Data Exploration: Unlock deeper insights by combining the SDK with Python’s analytical tools, enabling seamless data exploration and visualization on distributed datasets.
What can you do with the SDK?#
With the Rhino SDK you can do almost everything that you can do in the Rhino web UI, but programmatically, which makes actions easier to automate. In addition, there are several SDK-only capabilities (e.g. Federated Analytics). Common actions that can be performed using the Rhino SDK are shown in the Quickstart Guide and Examples sections of this page.
Quickstart Guide#
Install the SDK#
Install the Python SDK using pip, the Python package manager. The SDK package includes all the necessary modules to interact with the Rhino API.
pip install rhino-health
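To confirm that the install succeeded before going further, the installed version can be read from package metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is an illustrative assumption and is not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "rhino-health" is the distribution name used in the pip command above.
    print(installed_version("rhino-health") or "rhino-health is not installed")
```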
Authentication & Project Selection#
Authentication is performed by passing the login email and password that you use when accessing the Rhino web UI.
Rather than hardcoding your password in an SDK script, we suggest using getpass().
import rhino_health as rh
from getpass import getpass
# Enter Rhino username and password
my_username = "my_email@example.com" # REPLACE
session = rh.login(username=my_username, password=getpass())
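For unattended scripts, where an interactive prompt is not always available, one option is to read credentials from environment variables and fall back to a prompt only when they are missing. This is a minimal sketch: `load_credentials` and the variable names `RHINO_USERNAME` and `RHINO_PASSWORD` are hypothetical choices made for this example, not names defined by the SDK.

```python
import os
from getpass import getpass
from typing import Callable, Mapping, Tuple


def load_credentials(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> Tuple[str, str]:
    """Read credentials from the environment, prompting only for what is missing.

    RHINO_USERNAME and RHINO_PASSWORD are hypothetical variable names
    chosen for this sketch.
    """
    username = env.get("RHINO_USERNAME") or input("Rhino username (email): ")
    password = env.get("RHINO_PASSWORD") or prompt("Rhino password: ")
    return username, password
```

The returned pair can then be passed to `rh.login(username=..., password=...)` exactly as in the snippet above.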
After authentication, you can use the name of your project to retrieve your Project’s UUID and subsequently your Workgroup UUID. Alternatively, you can retrieve each object’s UUID from the Rhino web platform by following these instructions.
# Identify project by name
project = session.project.get_project_by_name('My Project') # REPLACE
workgroup = session.project.get_collaborating_workgroups(project.uid)[0]
# Alternatively, identify project by UID
project_uid = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX' # REPLACE
workgroup_uid = session.project.get_collaborating_workgroups(project_uid)[0].uid
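Indexing with `[0]` selects whichever collaborating workgroup happens to be listed first. When a project has several collaborators, selecting by name is less fragile. The sketch below assumes each returned object exposes `name` and `uid` attributes (`uid` is used above; `name` is an assumption) and uses stand-in objects so that it runs without a live session. `find_by_name` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def find_by_name(items: Iterable[Any], name: str) -> Any:
    """Return the single item whose `name` attribute equals `name`."""
    matches = [item for item in items if getattr(item, "name", None) == name]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match for {name!r}, found {len(matches)}")
    return matches[0]


# Stand-in objects so the sketch runs without a live session.
workgroups = [
    SimpleNamespace(name="Site A", uid="uid-a"),
    SimpleNamespace(name="Site B", uid="uid-b"),
]
print(find_by_name(workgroups, "Site B").uid)  # uid-b
```

In a real script, the list returned by `session.project.get_collaborating_workgroups(project.uid)` would take the place of `workgroups`.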
Examples#
Please navigate to the Rhino user-resources repository on GitHub for more examples and notebooks demonstrating the use of the SDK.
Import Data from SQL#
Create data pipelines from your data warehouse to your Rhino client. This example creates a dataset on the Rhino FCP by executing a SQL query against a database.
from rhino_health.lib.endpoints.sql_query.sql_query_dataclass import (
    SQLQueryImportInput,
    SQLQueryInput,
    SQLServerTypes,
    ConnectionDetails,
)
# Set up the database connection
connection_details = ConnectionDetails(
    server_user="my_db_user", # REPLACE
    password=getpass(),
    server_type=SQLServerTypes.POSTGRESQL, # REPLACE (as needed)
    server_url="mydb.example.org", # REPLACE
    db_name="my_db_name") # REPLACE
# Define a query
my_sql_query = "SELECT * FROM my_schema.my_table" # REPLACE
# Establish project and dataset
import_run_params = SQLQueryImportInput(
    session=session,
    project=project_uid,
    workgroup=workgroup_uid,
    connection_details=connection_details,
    dataset_name="My SQL Dataset", # REPLACE
    timeout_seconds=600,
    is_data_deidentified=False,
    sql_query=my_sql_query)
# Execute the query and import the results as a dataset
response = session.sql_query.import_dataset_from_sql_query(import_run_params)
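`SELECT *` imports every column of the table, which may be more than the project needs. One way to keep the query explicit is to assemble it from a fixed list of columns and reject anything that is not a plain identifier. This is a sketch only: `build_select` is a hypothetical helper, the column names are placeholders, and it handles unquoted identifiers but not quoted names or dialect-specific syntax.

```python
import re
from typing import Sequence

# Plain, unquoted SQL identifiers only: letters, digits and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select(schema: str, table: str, columns: Sequence[str]) -> str:
    """Build a SELECT statement from validated, unquoted SQL identifiers."""
    if not columns:
        raise ValueError("at least one column is required")
    for identifier in (schema, table, *columns):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe SQL identifier: {identifier!r}")
    return f"SELECT {', '.join(columns)} FROM {schema}.{table}"


# Placeholder column names; replace with columns from your own table.
my_sql_query = build_select("my_schema", "my_table", ["patient_id", "height", "weight"])
print(my_sql_query)
# SELECT patient_id, height, weight FROM my_schema.my_table
```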
Create + Run Python Code Objects#
Users of the Rhino SDK can create and run Python code objects from either the SDK or the UI. When creating your CodeObject, you will give it a name, description, project_uid, code_object_type, and a config in which you provide your Python code.
from rhino_health.lib.endpoints.code_object.code_object_dataclass import (
    CodeObject,
    CodeObjectCreateInput,
    CodeTypes,
    CodeRunType,
    CodeObjectRunInput,
)
# Create Python code object
python_code = "print('Hello, World')" # REPLACE
code_object_params = CodeObjectCreateInput(
    name = "CODE_OBJECT_NAME", # REPLACE
    description = "CODE_OBJECT_DESCRIPTION", # REPLACE
    project_uid = project.uid,
    code_object_type = CodeTypes.PYTHON_CODE,
    input_data_schema_uids=[input_data_schema_uid], # REPLACE
    output_data_schema_uids=[output_data_schema_uid], # REPLACE
    config = {
        "code_run_type": CodeRunType.DEFAULT,
        "python_code": python_code
    }
)
code_object = session.code_object.create_code_object(code_object_params)
print(f"Got Code Object '{code_object.name}' with UID {code_object.uid}")
# Configure Code Object Run
run_params = CodeObjectRunInput(
    code_object_uid = code_object.uid,
    input_dataset_uids = [dataset1.uid, dataset2.uid], # REPLACE
    output_dataset_naming_templates=['{{ input_dataset_names.0 }}-out'], # REPLACE
)
# Run Python code object
code_run = session.code_object.run_code_object(run_params)
run_result = code_run.wait_for_completion()
print(f"Result status is '{run_result.status.value}', errors={run_result.results_info.get('errors') if run_result.results_info else None}")
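Because the Python code is passed as a plain string, a typo in it may only surface once the run has started. A quick local syntax check before creating the code object can catch such mistakes earlier. The sketch below uses the standard-library `ast` module; `check_python_source` is a hypothetical helper, and it checks syntax only, not whether the code does what you intend.

```python
import ast


def check_python_source(source: str, filename: str = "<code object>") -> str:
    """Return `source` unchanged if it parses as Python; raise SyntaxError otherwise."""
    ast.parse(source, filename=filename)
    return source


# The checked string can be used as the "python_code" value in the config.
python_code = check_python_source("print('Hello, World')")
```

For longer scripts, the source can be kept in its own file and loaded with `pathlib.Path("my_code.py").read_text()` before being checked (the file name is a placeholder).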
Create + Run Generalized Compute Code Objects#
In the Rhino Federated Computing Platform (FCP), the Generalized Compute (GC) Code Object is a versatile way to execute pre-built container images within the FCP environment. The SDK can be used to seamlessly build and push containers into projects from container images that have been uploaded to the Rhino container service.
from rhino_health.lib.endpoints.code_object.code_object_dataclass import CodeObjectCreateInput, CodeObjectRunInput, CodeTypes
# Create Generalized Compute code object
my_container_image_name = "container_image_name" # REPLACE
creation_params = CodeObjectCreateInput(
    name="My Code Object", # REPLACE
    code_type=CodeTypes.GENERALIZED_COMPUTE,
    config={"container_image_uri": session.get_container_image_uri(my_container_image_name)},
    project_uid=project_uid,
    input_data_schema_uids=[input_data_schema_uid], # REPLACE
    output_data_schema_uids=[output_data_schema_uid], # REPLACE
)
code_object = session.code_object.create_code_object(creation_params)
# Run Generalized Compute code object
code_object_params = CodeObjectRunInput(
    code_object_uid = code_object.uid,
    input_cohort_uids = [dataset1.uid, dataset2.uid], # REPLACE
    output_dataset_names_suffix = "OUTPUT_SUFFIX" # REPLACE
)
code_run = session.code_object.run_code_object(code_object_params)
run_result = code_run.wait_for_completion()
print(f"Result status is '{run_result.status.value}', errors={run_result.results_info.get('errors') if run_result.results_info else None}")
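The status and error lookup in the `print` call above is repeated for every run, so it can be convenient to wrap it in a small helper. The sketch below relies only on the attributes already used above (`status.value` and `results_info`); `summarize_run` is a hypothetical helper, and the stand-in object and its `"Completed"` status value are placeholders so that the example runs without a live session.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_run(run_result: Any) -> Dict[str, Any]:
    """Collect the status and any reported errors from a completed run result."""
    results_info = getattr(run_result, "results_info", None) or {}
    return {
        "status": run_result.status.value,
        "errors": results_info.get("errors"),
    }


# Stand-in result; the status string is a placeholder, not a documented value.
fake_result = SimpleNamespace(
    status=SimpleNamespace(value="Completed"),
    results_info={"errors": []},
)
print(summarize_run(fake_result))  # {'status': 'Completed', 'errors': []}
```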
Perform Federated Statistical Analyses#
Use Rhino’s federated statistical methods to analyze distributed data and generate insights.
Basic Metrics: To calculate basic numeric metrics such as count, mean, sum, and standard deviation, use the following syntax to create the metric configuration and retrieve the results:
from rhino_health.lib.metrics import Count, Mean, StandardDeviation
dataset_uids = ["dataset_id_1", "dataset_id_2"] # REPLACE
# Calculate Mean
mean_config = Mean(variable="Height")
response_mean = session.project.aggregate_dataset_metric(dataset_uids, mean_config)
# Calculate Standard Deviation
stddev_config = StandardDeviation(variable="Height") # Replace with actual variable name
response_stddev = session.project.aggregate_dataset_metric(dataset_uids, stddev_config)
# Calculate Count
count_config = Count(variable="id")
response_count = session.project.aggregate_dataset_metric(dataset_uids, count_config)
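To build intuition for why such metrics can be computed without moving row-level data, the following conceptual sketch pools a mean and a sample standard deviation from per-site aggregates. It illustrates the general idea only and does not describe how the Rhino FCP computes these metrics internally; all names in it are hypothetical, and it assumes at least two values in total.

```python
import math
from typing import Iterable, NamedTuple, Sequence, Tuple


class SiteSummary(NamedTuple):
    """Aggregate values a site could share instead of its raw rows."""
    count: int
    total: float
    total_sq: float


def summarize(values: Sequence[float]) -> SiteSummary:
    """Computed locally at each site; only the three aggregates leave it."""
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_std(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Combine per-site aggregates into a global mean and sample standard deviation."""
    summaries = list(summaries)
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Guard against tiny negative values caused by floating-point rounding.
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, math.sqrt(variance)


site_a = summarize([160.0, 170.0, 180.0])
site_b = summarize([165.0, 175.0])
mean, std = pooled_mean_std([site_a, site_b])
print(round(mean, 2), round(std, 2))  # 170.0 7.91
```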
Advanced Metrics: More complex metrics with filters and groupings can be calculated when simple metrics aren’t sufficient. The group_by parameter organizes metrics by specific categorical variables, segmenting the results, while the data_filters parameter refines the analysis by restricting it to rows that meet the given conditions.
# Calculate mean Height among persons within a specific Weight range, segmented by Gender
mean_config = Mean(
    variable="Height",
    group_by={"groupings": ["Gender"]},
    data_filters=[
        {
            "filter_column": "Weight",
            "filter_type": ">",
            "filter_value": 50
        },
        {
            "filter_column": "Weight",
            "filter_type": "<",
            "filter_value": 80
        }])
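Range conditions like the one above always come as a pair of filters, so a small helper can keep them consistent. The sketch below builds the same two dictionaries using the keys shown above; `range_filters` is a hypothetical helper, and it assumes, as the example above implies, that multiple filters are applied together.

```python
from typing import Any, Dict, List


def range_filters(column: str, low: float, high: float) -> List[Dict[str, Any]]:
    """Build two filters that keep rows where low < column < high (both exclusive)."""
    if low >= high:
        raise ValueError("low must be smaller than high")
    return [
        {"filter_column": column, "filter_type": ">", "filter_value": low},
        {"filter_column": column, "filter_type": "<", "filter_value": high},
    ]


weight_filters = range_filters("Weight", 50, 80)
```

The resulting list can be passed as `data_filters=weight_filters` when constructing a metric such as `Mean`.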
Statistical Testing: A wide breadth of statistical algorithms is available via the SDK to generate insights across collaborating sites in a federated computing project.
from rhino_health.lib.metrics.statistics_tests import TTest, ChiSquare
t_test = TTest(numeric_variable="Spo2 Level", categorical_variable="Pneumonia")
session.project.aggregate_dataset_metric(dataset_uids, t_test)
chi_square_config = ChiSquare(variable="id", variable_1="Pneumonia", variable_2="Gender")
result = session.project.aggregate_dataset_metric(dataset_uids, chi_square_config)
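As a conceptual illustration of how a test statistic can be obtained across sites without sharing individual records, the sketch below sums per-site contingency tables and computes a Pearson chi-square statistic on the pooled counts. It is not a description of the SDK's internal implementation; the function names and the counts are made up for the example.

```python
from typing import List, Sequence

Table = List[List[int]]


def add_tables(tables: Sequence[Table]) -> Table:
    """Sum per-site contingency tables cell by cell."""
    rows, cols = len(tables[0]), len(tables[0][0])
    return [[sum(t[r][c] for t in tables) for c in range(cols)] for r in range(rows)]


def chi_square_statistic(table: Table) -> float:
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts: rows are one categorical variable, columns the other.
site_a = [[4, 12], [11, 3]]
site_b = [[6, 8], [9, 7]]
pooled = add_tables([site_a, site_b])  # [[10, 20], [20, 10]]
print(round(chi_square_statistic(pooled), 3))  # 6.667
```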
API Reference#
This section provides an overview of all public objects, functions, and methods. All classes and functions exposed in the rhino_health.* namespace are public.