arthurai.core.data_service.DatasetService

class arthurai.core.data_service.DatasetService

Bases: object

Methods

  • chunk_image_set – Return type: str

  • chunk_image_set_with_directory_path_or_files – Takes in a directory path with parquet and/or JSON files containing image attributes.

  • files_size – Return type: int

  • send_files_from_dir_iteratively – Sends parquet or JSON files iteratively from a specified directory to a specified URL for a given model.

  • send_files_iteratively – Return type: Tuple[List[Any], Dict[str, Any]]

Attributes

  • COUNTS

  • DEFAULT_MAX_IMAGE_DATA_BYTES

  • FAILURE

  • FAILURES

  • MAX_ROWS_PER_FILE

  • ROW_GROUP_SIZE

  • SUCCESS

  • TOTAL

static chunk_image_set_with_directory_path_or_files(image_attribute, directory_path=None, files=None, max_image_data_bytes=300000000)

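Takes in a directory path with parquet and/or JSON files containing image attributes (alternatively, the files can be passed through files). Divides the images into chunks of up to 300 MB (max_image_data_bytes, 300000000 bytes by default) and zips each chunk; the parquet/JSON data is split up to match. Each output file gets a random filename, and each image zip shares the name of its matching data file.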

Return type

str
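The chunking step can be pictured with a minimal, stdlib-only sketch. It is not the SDK's implementation: the function names chunk_rows_by_image_size and write_chunks are hypothetical, each row is assumed to be a plain dict whose image_attribute value is a local image path, and chunk data is written as JSON rather than parquet to avoid extra dependencies. Rows are grouped greedily so that the images in each chunk total at most max_image_data_bytes, and each chunk's data file and image zip share one random name.

```python
import json
import os
import uuid
import zipfile
from typing import Dict, List, Sequence


def chunk_rows_by_image_size(rows: Sequence[Dict], image_attribute: str,
                             max_image_data_bytes: int = 300_000_000) -> List[List[Dict]]:
    """Hypothetical helper: greedily group rows so each group's images fit the byte limit."""
    chunks: List[List[Dict]] = []
    current: List[Dict] = []
    current_bytes = 0
    for row in rows:
        size = os.path.getsize(row[image_attribute])  # assumes a local file path
        # Close the current chunk if adding this image would exceed the limit.
        if current and current_bytes + size > max_image_data_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(row)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks: List[List[Dict]], image_attribute: str, out_dir: str) -> List[str]:
    """Hypothetical helper: write <name>.zip of images plus a matching <name>.json of rows."""
    names: List[str] = []
    for chunk in chunks:
        name = uuid.uuid4().hex  # random name shared by the zip and its data file
        with zipfile.ZipFile(os.path.join(out_dir, f"{name}.zip"), "w") as zf:
            for row in chunk:
                path = row[image_attribute]
                zf.write(path, arcname=os.path.basename(path))
        # JSON stands in for parquet here to keep the sketch dependency-free.
        with open(os.path.join(out_dir, f"{name}.json"), "w") as f:
            json.dump(chunk, f)
        names.append(name)
    return names
```

In this sketch an image that is larger than the limit on its own ends up alone in its own chunk rather than being rejected.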

static send_files_from_dir_iteratively(model, directory_path, endpoint, upload_file_param_name, additional_form_params=None, retries=0)

Sends parquet or JSON files iteratively from a specified directory to a specified URL for a given model.

Parameters
  • model (ArthurModel) – The arthurai.client.apiv2.model.ArthurModel

  • directory_path (str) – Local path containing parquet and/or JSON files to send

  • endpoint (str) – POST URL endpoint to send files to

  • upload_file_param_name (str) – Key to use in the body with each attached file

  • additional_form_params (Optional[Dict[str, Any]]) – Dictionary of additional form file params to send along with each parquet or JSON file

  • retries (int) – Number of times to retry the request if it results in a 400 or higher response code

Raises

MissingParameterError – the request failed

Returns

A list of files which failed to upload

Return type

Tuple[List[Any], Dict[str, Any]]
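The retry behavior described for retries can be sketched without any network access. This is a hypothetical illustration, not the SDK's code: send_files_from_dir and the injected post_file callable (assumed to return an HTTP status code) are made up for the example, and the keys of the counts dictionary are assumptions that only mirror the shape of the documented Tuple[List[Any], Dict[str, Any]] return type. Each parquet or JSON file is attempted at most 1 + retries times, and a response code of 400 or higher counts as a failure.

```python
import os
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes a file path, returns an HTTP status code.
PostFile = Callable[[str], int]


def send_files_from_dir(directory_path: str, post_file: PostFile,
                        retries: int = 0) -> Tuple[List[str], Dict[str, int]]:
    """Send each .parquet/.json file in directory_path, retrying on status >= 400."""
    failed: List[str] = []
    counts = {"success": 0, "failure": 0, "total": 0}  # assumed key names
    for name in sorted(os.listdir(directory_path)):
        if not name.endswith((".parquet", ".json")):
            continue  # skip anything that is not a parquet or JSON file
        path = os.path.join(directory_path, name)
        counts["total"] += 1
        for _attempt in range(retries + 1):
            if post_file(path) < 400:
                counts["success"] += 1
                break
        else:
            # No attempt succeeded: record the file as failed.
            failed.append(path)
            counts["failure"] += 1
    return failed, counts
```

Injecting post_file keeps the loop testable; a real caller would wrap a POST to endpoint that attaches the file under upload_file_param_name.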