vardb.metadata_wrangling.oasis package¶

The oasis package is a collection of scripts for cleaning data from the Oasis database at BCCA and creating tables for loading into vardb.

Submodules¶

vardb.metadata_wrangling.oasis.demographics module¶

vardb.metadata_wrangling.oasis.demographics.add_biopsy_number_column(dataframe)¶

Add the biopsy_number column to the Demographics dataframe sorted by the biopsy_date in ascending order

Parameters:	dataframe – Demographics dataframe
Returns:	Demographics dataframe with the biopsy_number column added to it
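
The numbering step can be illustrated with a minimal pandas sketch. It assumes that biopsies are numbered separately for each patient_id, which the description above does not state, and the sample data is invented for illustration.

```python
import pandas as pd


def add_biopsy_number_column(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Number biopsies 1, 2, ... in ascending biopsy_date order.

    Assumption: numbering restarts for every patient_id.
    """
    # Sort so that each patient's earliest biopsy comes first.
    result = dataframe.sort_values(["patient_id", "biopsy_date"]).reset_index(drop=True)
    # cumcount() is zero-based within each group, so add 1.
    result["biopsy_number"] = result.groupby("patient_id").cumcount() + 1
    return result


# Invented sample data.
demographics = pd.DataFrame({
    "patient_id": ["POG001", "POG002", "POG001"],
    "biopsy_date": pd.to_datetime(["2016-05-01", "2015-03-10", "2014-01-20"]),
})
numbered = add_biopsy_number_column(demographics)
```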

vardb.metadata_wrangling.oasis.demographics.extract_biopsy_columns(dataframe_columns)¶

Extract the names of the Clinical dataframe columns that store the Biopsy information

Parameters:	dataframe_columns – Columns of the Clinical dataframe
Returns:	List of columns that store the Biopsy information

vardb.metadata_wrangling.oasis.demographics.extract_data(dataframe)¶

Extract demographic data from the clinical dataframe

Parameters:	dataframe – Clinical dataframe
Returns:	Demographics dataframe

vardb.metadata_wrangling.oasis.demographics.get_demographics_data(clinical_dataframe)¶

Build the validated Demographics dataframe from the clinical dataframe

Parameters:	clinical_dataframe – The original clinical dataframe
Returns:	Validated Demographics dataframe

vardb.metadata_wrangling.oasis.demographics.validate_biopsy_date_less_than_or_equal_pog_report_date(row)¶

Iterate over each row of the dataframe to validate biopsy_date <= pog_report_date

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row
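
The row-level validators in this module take a single row and return an error code string, which suggests they are applied with DataFrame.apply(axis=1). The sketch below shows that pattern for the biopsy_date check. The error code text and the handling of missing dates are assumptions, since the package's real codes are not listed here.

```python
import pandas as pd

# Hypothetical error code; the package's real codes are not shown in this documentation.
BIOPSY_AFTER_POG_REPORT = "BIOPSY_DATE_AFTER_POG_REPORT_DATE"


def validate_biopsy_date_less_than_or_equal_pog_report_date(row):
    """Return an error code string for one row, or '' when the row passes."""
    biopsy_date = row["biopsy_date"]
    pog_report_date = row["pog_report_date"]
    # Assumption: a missing date is left to the mandatory-column checks.
    if pd.isna(biopsy_date) or pd.isna(pog_report_date):
        return ""
    return "" if biopsy_date <= pog_report_date else BIOPSY_AFTER_POG_REPORT


# Invented sample data.
demographics = pd.DataFrame({
    "patient_id": ["POG001", "POG002", "POG003"],
    "biopsy_date": pd.to_datetime(["2016-01-10", "2016-06-01", None]),
    "pog_report_date": pd.to_datetime(["2016-03-01", "2016-05-01", "2016-02-01"]),
})
demographics["error_code"] = demographics.apply(
    validate_biopsy_date_less_than_or_equal_pog_report_date, axis=1
)
```

The blood_collection_date and consent_date checks would follow the same shape with a different column name.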

vardb.metadata_wrangling.oasis.demographics.validate_blood_collection_date_less_than_or_equal_pog_report_date(row)¶

Iterate over each row of the dataframe to validate blood_collection_date <= pog_report_date

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.demographics.validate_consent_date_less_than_or_equal_pog_report_date(row)¶

Iterate over each row of the dataframe to validate consent_date <= pog_report_date

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.demographics.validate_data(dataframe)¶

Validate Demographics data

Parameters:	dataframe – Demographics dataframe
Returns:	Validated Demographics dataframe

vardb.metadata_wrangling.oasis.demographics.validate_mandatory_columns(row)¶

Iterate over each row of the dataframe to validate the mandatory columns in the demographic data: patient_id, sex, consent_date, consent_age

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.demographics.validate_pog_report_date_mandatory_columns(row)¶

Iterate over each row of the dataframe to validate the columns that become mandatory if pog_report_date exists in the demographic data: blood_collection_date, biopsy_date, bx_loc_radiated, prior_primary_tumour, biopsy_site, post_pog_activities, diag_changed, re_bx_prog1

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.demographics.validate_post_pog_treatment_columns_for_null(row)¶

Iterate over each row of the dataframe to validate that when post_pog_activities is not “POG informed treatment not given”, all of the “post_pog_treatment_*” columns are null.

Columns used: patient_id, post_pog_treatment_deceased, post_pog_treatment_sick, post_pog_treatment_decision_pt, post_pog_treatment_decision_phys, post_pog_treatment_na, post_pog_treatment_cost, post_pog_treatment_travel, post_pog_treatment_unknown

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.demographics.validate_post_pog_treatment_columns_for_ys(row)¶

Iterate over each row of the dataframe to validate that when post_pog_activities is “POG informed treatment not given”, exactly one of the “post_pog_treatment_*” columns has a “Y” and the rest are null.

Columns used: post_pog_treatment_deceased, post_pog_treatment_sick, post_pog_treatment_decision_pt, post_pog_treatment_decision_phys, post_pog_treatment_na, post_pog_treatment_cost, post_pog_treatment_travel, post_pog_treatment_unknown

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row
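
The "exactly one Y" rule can be sketched as follows. The column names and the “POG informed treatment not given” value come from the description above. The error code string is hypothetical.

```python
import pandas as pd

POST_POG_TREATMENT_COLUMNS = [
    "post_pog_treatment_deceased",
    "post_pog_treatment_sick",
    "post_pog_treatment_decision_pt",
    "post_pog_treatment_decision_phys",
    "post_pog_treatment_na",
    "post_pog_treatment_cost",
    "post_pog_treatment_travel",
    "post_pog_treatment_unknown",
]
NOT_GIVEN = "POG informed treatment not given"
# Hypothetical error code, used only for this illustration.
NOT_EXACTLY_ONE_Y = "POST_POG_TREATMENT_NOT_EXACTLY_ONE_Y"


def validate_post_pog_treatment_columns_for_ys(row):
    """Return '' when the rule holds for this row, else an error code."""
    # The rule only applies when treatment was not given.
    if row["post_pog_activities"] != NOT_GIVEN:
        return ""
    non_null = row[POST_POG_TREATMENT_COLUMNS].dropna()
    # Exactly one column may be filled in, and it must hold "Y".
    if len(non_null) == 1 and non_null.iloc[0] == "Y":
        return ""
    return NOT_EXACTLY_ONE_Y


def make_row(activity, **filled):
    """Build an invented test row with every treatment column null by default."""
    values = {column: None for column in POST_POG_TREATMENT_COLUMNS}
    values.update(filled)
    values["post_pog_activities"] = activity
    return pd.Series(values)


one_y = make_row(NOT_GIVEN, post_pog_treatment_cost="Y")
two_ys = make_row(NOT_GIVEN, post_pog_treatment_cost="Y", post_pog_treatment_sick="Y")
no_y = make_row(NOT_GIVEN)
```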

vardb.metadata_wrangling.oasis.demographics.validate_re_bx_date_for_not_null(row)¶

Iterate over each row of the dataframe to validate that when re_bx_prog1, re_bx_prog2, etc. is ‘Y’, the corresponding re_bx_date1, re_bx_date2, etc. is not null

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.demographics.validate_re_bx_date_greater_than_or_equal_biopsy_date(row)¶

Iterate over each row of the dataframe to validate re_bx_date1, re_bx_date2, etc. >= biopsy_date

Parameters:	row – Each row of the demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.diagnosis module¶

vardb.metadata_wrangling.oasis.diagnosis.extract_data(dataframe)¶

Extract diagnosis data from the clinical dataframe

Parameters:	dataframe – Clinical dataframe
Returns:	Diagnosis dataframe

vardb.metadata_wrangling.oasis.diagnosis.get_diagnosis_data(clinical_dataframe)¶

Build the validated Diagnosis dataframe from the clinical dataframe

Parameters:	clinical_dataframe – The original clinical dataframe
Returns:	Validated Diagnosis dataframe

vardb.metadata_wrangling.oasis.diagnosis.reshape_diagnosis_data(dataframe)¶

Reshapes the Diagnosis dataframe from wide to long format using the pandas wide_to_long function

Parameters:	dataframe – Diagnosis dataframe
Returns:	Reshaped Diagnosis dataframe

vardb.metadata_wrangling.oasis.diagnosis.validate_data(dataframe)¶

Validate Diagnosis data

Parameters:	dataframe – Diagnosis dataframe
Returns:	Validated Diagnosis dataframe

vardb.metadata_wrangling.oasis.diagnosis.validate_mandatory_columns(row)¶

Iterate over each row of the dataframe to validate the mandatory columns in the diagnosis data: site_desc, tumour_group, diagnosis_date, age_at_diagnosis

Parameters:	row – Each row of the diagnosis dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.drug_map module¶

vardb.metadata_wrangling.oasis.drug_map.drop_comma_separated_drugs_column(drug_treatment_dataframe)¶

Drop the comma_separated_drugs column from the Drug Treatment dataframe

Parameters:	drug_treatment_dataframe – Drug Treatment dataframe

vardb.metadata_wrangling.oasis.drug_map.eliminate_duplicates_and_sort_drug_list(drug_list)¶

Eliminate duplicate drug names and sort the remaining names alphabetically

Parameters:	drug_list – The list of drug names
Returns:	The sorted drug list

vardb.metadata_wrangling.oasis.drug_map.get_longest_length_matched_token(matched_tokens)¶

Get the token with the longest length that matched

Parameters:	matched_tokens – List of matched drug tokens
Returns:	Matching token

vardb.metadata_wrangling.oasis.drug_map.get_tokens_matching_the_drug_map(drug_tokens)¶

Get the list of tokens that match entries in the drug map YAML file

Parameters:	drug_tokens – Drug name tokens
Returns:	List of matching tokens

vardb.metadata_wrangling.oasis.drug_map.insert_original_drug_string_column_before_error_column(drug_treatment_dataframe)¶

Copy the ‘drug_list’ column under the name ‘original_drug_string’ and insert the ‘original_drug_string’ column before the ‘error’ column

Parameters:	drug_treatment_dataframe – Drug Treatment dataframe
Returns:	Drug Treatment dataframe with the ‘original_drug_string’ column

vardb.metadata_wrangling.oasis.drug_map.map_oasis_drug_to_ontology(row)¶

Map Oasis drug names to Ontology

Parameters:	row – Each row of the Drug Treatment Dataframe
Returns:	List of Ontology-mapped drugs

vardb.metadata_wrangling.oasis.drug_map.map_oasis_drugs_using_drug_map(drug_treatment_dataframe)¶

Create the Drug Map and update the Drug Treatment table

Parameters:	drug_treatment_dataframe – Drug Treatment dataframe

vardb.metadata_wrangling.oasis.drug_map.split_and_strip_drug_names(drug_names)¶

Split the comma-separated drug names into a list and strip the whitespace from the beginning and end of each drug name

Parameters:	drug_names – Comma-separated drug names from Oasis
Returns:	List of drug names with empty strings filtered out, e.g. ‘a’,,’ b’ –> [‘a’,’b’], or NaN for empty cells
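
A minimal sketch of the split-and-strip behaviour described above. It assumes that empty cells arrive as NaN (as pandas produces for blank spreadsheet cells) and are passed through unchanged.

```python
import math


def split_and_strip_drug_names(drug_names):
    """Turn 'a,, b' into ['a', 'b']; pass NaN through for empty cells."""
    # Assumption: anything that is not a string is an empty cell.
    if not isinstance(drug_names, str):
        return math.nan
    stripped = (name.strip() for name in drug_names.split(","))
    # Drop the empty strings left behind by consecutive commas.
    return [name for name in stripped if name]
```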

vardb.metadata_wrangling.oasis.drug_map.split_drug_names_on_special_characters(oasis_drug_name)¶

Split the drug names on special characters (e.g. ‘ ‘, ‘(‘, ‘)’). For example, a (b) –> [‘a’, ‘b’]

Parameters:	oasis_drug_name – Oasis drug name
Returns:	List of split drug names
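
One way to get this behaviour is a regular-expression split. The sketch below handles only the three characters named above (space and the two parentheses). The real function may split on more.

```python
import re


def split_drug_names_on_special_characters(oasis_drug_name):
    """Split on runs of spaces and parentheses, e.g. 'a (b)' -> ['a', 'b']."""
    parts = re.split(r"[ ()]+", oasis_drug_name)
    # re.split leaves an empty string when the name ends with a separator.
    return [part for part in parts if part]
```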

vardb.metadata_wrangling.oasis.drug_map.tokenize_drug_name(oasis_drug_name)¶

Tokenize the drug names. For example, the input ‘a b’ produces the output [‘a’, ‘b’, ‘ab’]

Parameters:	oasis_drug_name – Drug name from Oasis
Returns:	Drug name tokens
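
The single example above (‘a b’ giving ‘a’, ‘b’, ‘ab’) is consistent with the following sketch, which splits on whitespace and then appends the concatenation of all parts. How the real function treats names with three or more words is not documented, so that part is an assumption.

```python
def tokenize_drug_name(oasis_drug_name):
    """Return the whitespace-separated parts plus their concatenation."""
    tokens = oasis_drug_name.split()
    # Assumption: only the concatenation of all parts is added, not every pair.
    if len(tokens) > 1:
        tokens.append("".join(tokens))
    return tokens
```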

vardb.metadata_wrangling.oasis.error_code module¶

vardb.metadata_wrangling.oasis.error_code.append_error_codes(old_error_code, new_error_code)¶

Append the error code strings together

Parameters:

old_error_code – Old error code
new_error_code – New error code

Returns:

Appended error code strings with or without comma(s)
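
The phrase "with or without comma(s)" suggests that a comma is only inserted when both sides are non-empty. A minimal sketch under that assumption:

```python
def append_error_codes(old_error_code, new_error_code):
    """Join two error code strings, adding a comma only when both are non-empty."""
    if not old_error_code:
        return new_error_code
    if not new_error_code:
        return old_error_code
    return f"{old_error_code},{new_error_code}"
```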

vardb.metadata_wrangling.oasis.error_code.collect_error_codes(error_reporting_dataframe, demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, diagnosis_error_dataframe, drug_treatment_error_dataframe, radiation_error_dataframe)¶

Perform operations to generate and process the error code column in the error and individual tables

Parameters:

error_reporting_dataframe – Clinical dataframe where the errors are reported
demographics_dataframe – Demographics dataframe
diagnosis_dataframe – Diagnosis dataframe
drug_treatment_dataframe – Drug Treatment dataframe
radiation_dataframe – Radiation dataframe
diagnosis_error_dataframe – Diagnosis Error dataframe
drug_treatment_error_dataframe – Drug Treatment Error dataframe
radiation_error_dataframe – Radiation Error dataframe

Returns:

Clinical dataframe where the errors are reported

vardb.metadata_wrangling.oasis.error_code.concatenate_error_codes(row)¶

Iterate over every row of the error dataframe to concatenate the individual error codes

Parameters:	row – Each row of the error code dataframe
Returns:	Row information

vardb.metadata_wrangling.oasis.error_code.generate_error_dataframe(dataframe, demographics_dataframe, diagnosis_error_dataframe, drug_treatment_error_dataframe, radiation_error_dataframe)¶

Generate the error code reporting dataframe

Parameters:

dataframe – Copy of the Clinical dataframe
demographics_dataframe – Demographics error code dataframe
diagnosis_error_dataframe – Diagnosis error code dataframe
drug_treatment_error_dataframe – Drug Treatment error dataframe
radiation_error_dataframe – Radiation error dataframe

Returns:

Generated error code dataframe

vardb.metadata_wrangling.oasis.error_code.group_and_aggregate_error_codes(dataframe, aggregate_column)¶

Group by and aggregate the error codes from individual dataframes

Parameters:

dataframe – Input dataframe
aggregate_column – The column on which to perform the aggregate operation

Returns:

Dataframe grouped by the error codes for each patient id and reset index
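
A sketch of the group-and-aggregate step with pandas groupby. The aggregation rule used here (de-duplicate, sort, join with commas) is an assumption. The documentation only says that codes are aggregated per patient id and the index is reset.

```python
import pandas as pd


def join_codes(codes):
    """Join the distinct non-empty codes of one patient with commas."""
    distinct = {code for code in codes if isinstance(code, str) and code}
    return ",".join(sorted(distinct))


def group_and_aggregate_error_codes(dataframe, aggregate_column):
    """Collapse a long table to one row of error codes per patient_id."""
    return (
        dataframe.groupby("patient_id")[aggregate_column]
        .agg(join_codes)
        .reset_index()
    )


# Invented sample data with hypothetical error codes E1 and E2.
diagnosis_errors = pd.DataFrame({
    "patient_id": ["POG001", "POG001", "POG002"],
    "error_code": ["E1", "E2", ""],
})
aggregated = group_and_aggregate_error_codes(diagnosis_errors, "error_code")
```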

vardb.metadata_wrangling.oasis.error_code.rename_error_code_column_to_errors(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe)¶

Rename the error code columns in the individual tables

Parameters:

demographics_dataframe – Demographics dataframe
diagnosis_dataframe – Diagnosis dataframe
drug_treatment_dataframe – Drug treatment dataframe
radiation_dataframe – Radiation dataframe

vardb.metadata_wrangling.oasis.error_code.replace_nan_with_empty_string(dataframe)¶

Replace string Nan’s with empty string ‘’

Parameters:	dataframe – Error dataframe
Returns:	Error dataframe with string Nan’s replaced with ‘’

vardb.metadata_wrangling.oasis.helpers module¶

vardb.metadata_wrangling.oasis.helpers.extract_column_names_from_base_names(dataframe_columns, base_names, pattern_match)¶

Extracts column names from base names

Parameters:

dataframe_columns – Columns of the dataframe
base_names – Base names for that dataframe
pattern_match – Matching pattern for that dataframe column name

Returns:

List of column names with patient_id

vardb.metadata_wrangling.oasis.helpers.extract_date_columns(dataframe)¶

Extract the columns of the Clinical dataframe that store dates

Parameters:	dataframe – Clinical dataframe
Returns:	List of columns that store dates

vardb.metadata_wrangling.oasis.helpers.reshape_dataframe(dataframe, stub_names, id_variable, sub_observation, separator='', suffix='\\d+')¶

Reshapes a dataframe from wide to long, drops NaN rows and resets the indices

Parameters:

dataframe – The dataframe to be reshaped
stub_names – Column names in the reshaped dataframe
id_variable – Column to use as id variable
sub_observation – Column name that you wish to name your suffix in the long format.
separator – A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format.
suffix – A regular expression capturing the wanted suffixes.

Returns:

Reshaped dataframe
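
The parameters of reshape_dataframe line up with those of pandas.wide_to_long (stubnames, i, j, sep, suffix), so a plausible sketch is a thin wrapper around it. This is an illustration under that assumption, not the package's actual code, and the column names in the sample data are invented apart from site_desc and diagnosis_date.

```python
import pandas as pd


def reshape_dataframe(dataframe, stub_names, id_variable, sub_observation,
                      separator="", suffix=r"\d+"):
    """Reshape wide to long, drop all-NaN rows and reset the index."""
    long_dataframe = pd.wide_to_long(
        dataframe,
        stubnames=stub_names,
        i=id_variable,
        j=sub_observation,
        sep=separator,
        suffix=suffix,
    )
    # Rows where every stub column is missing carry no observation.
    return long_dataframe.dropna(how="all").reset_index()


# Invented sample data: one row per patient, numbered columns per diagnosis.
wide = pd.DataFrame({
    "patient_id": ["POG001", "POG002"],
    "site_desc1": ["Lung", "Breast"],
    "site_desc2": ["Liver", None],
    "diagnosis_date1": ["2014-01-20", "2015-03-10"],
    "diagnosis_date2": ["2016-05-01", None],
})
long = reshape_dataframe(
    wide, ["site_desc", "diagnosis_date"], "patient_id", "diagnosis_number"
)
```

POG002 has no second diagnosis, so the long table holds three rows: two for POG001 and one for POG002.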

vardb.metadata_wrangling.oasis.oasis module¶

vardb.metadata_wrangling.oasis.oasis.main()¶

vardb.metadata_wrangling.oasis.oasis.parse_oasis_data(oasis_file_path, output_path)¶

Parameters:

oasis_file_path – Input OASIS file path
output_path – The folder path to store the output

vardb.metadata_wrangling.oasis.output module¶

vardb.metadata_wrangling.oasis.output.dataframe_to_tsv(dataframe, file_path, date_stamp)¶

Write the dataframe to a TSV file

Parameters:

dataframe – Input dataframe
file_path – File path where to write it
date_stamp – YYYYMMDD date format of the file

vardb.metadata_wrangling.oasis.output.filter_non_pediatric_ids(error_dataframe)¶

Filter out non pediatric ids from the error reporting dataframe

Parameters:	error_dataframe – Error reporting dataframe
Returns:	Filtered dataframe without pediatric ids

vardb.metadata_wrangling.oasis.output.output_to_tsv(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, error_dataframe, output_path)¶

Write the output tables to TSV files

Parameters:

demographics_dataframe – Demographics dataframe
diagnosis_dataframe – Diagnosis dataframe
drug_treatment_dataframe – Drug Treatment dataframe
radiation_dataframe – Radiation dataframe
error_dataframe – Clinical dataframe where the errors are reported
output_path – The folder path to store the output

vardb.metadata_wrangling.oasis.output.write_output_to_tsv(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, error_dataframe, output_path)¶

Convert the error code lists to strings in the individual tables

Parameters:

demographics_dataframe – Demographics dataframe
diagnosis_dataframe – Diagnosis dataframe
drug_treatment_dataframe – Drug treatment dataframe
radiation_dataframe – Radiation dataframe
error_dataframe – Error dataframe
output_path – The folder path to store the output

vardb.metadata_wrangling.oasis.preprocess module¶

vardb.metadata_wrangling.oasis.preprocess.append_patient_id_column(dataframe)¶

Append a column named ‘patient_id’ that stores a condensed form of the gsc_pog_id value, e.g. ‘POG 001-GIC’ is stored as ‘POG001’

Parameters:	dataframe – Input dataframe
Returns:	Dataframe with the appended column
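
The example ‘POG 001-GIC’ giving ‘POG001’ can be reproduced with a regular expression that keeps the ‘POG’ prefix and the digits that follow it. The pattern below is an assumption inferred from that single example. The second sample value is invented.

```python
import pandas as pd


def append_patient_id_column(dataframe):
    """Add a patient_id column built from the prefix and digits of gsc_pog_id."""
    result = dataframe.copy()
    # Capture 'POG' and the number, ignoring whitespace between them.
    parts = result["gsc_pog_id"].str.extract(r"^\s*(POG)\s*(\d+)")
    # Rows that do not match the pattern yield NaN.
    result["patient_id"] = parts[0] + parts[1]
    return result


clinical = pd.DataFrame({"gsc_pog_id": ["POG 001-GIC", "POG 123-XYZ"]})
with_ids = append_patient_id_column(clinical)
```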

vardb.metadata_wrangling.oasis.preprocess.clean_data(clinical_dataframe)¶

Clean the OASIS data

Parameters:	clinical_dataframe – Clinical dataframe
Returns:	(Clinical dataframe, Error reporting dataframe)

vardb.metadata_wrangling.oasis.preprocess.compare_death_dates_to_all_dates(row, date_columns)¶

Compare death date to the rest of the dates to validate death_date > all other dates

Parameters:

row – Each row of the Error Reporting dataframe
date_columns – List of date columns from the Error Reporting Dataframe

Returns:

Error code string for that row

vardb.metadata_wrangling.oasis.preprocess.compare_tumour_group_to_pog_tumour_groups(row, pog_tumour_group_columns)¶

Compare the tumour group in the gsc_pog_id column to pog_tumour_groups from the treatment data

Parameters:

row – Each row of the Error reporting dataframe
pog_tumour_group_columns – List of pog_tumour_group columns from the treatment data

Returns:

Error reporting dataframe with the error reported

vardb.metadata_wrangling.oasis.preprocess.drop_duplicate_pogs_with_same_biopsy_dates(dataframe)¶

Drop multiple POG biopsies with the same biopsy date. All rows that share the date are dropped.

Parameters:	dataframe –
Returns:

vardb.metadata_wrangling.oasis.preprocess.drop_duplicate_rows(dataframe)¶

Drop duplicate rows

Parameters:	dataframe – Input dataframe
Returns:	Dataframe without duplicate rows

vardb.metadata_wrangling.oasis.preprocess.drop_empty_pog_id_rows(dataframe)¶

Drop empty rows with only GSC POG ID

Parameters:	dataframe – Input dataframe
Returns:	Dataframe with the dropped rows

vardb.metadata_wrangling.oasis.preprocess.drop_missing_pog_id_rows(dataframe)¶

Drop the rows with missing GSC POG IDs

Parameters:	dataframe – Input dataframe
Returns:	New dataframe excluding the erroneous rows

vardb.metadata_wrangling.oasis.preprocess.identify_empty_pog_ids(row)¶

Iterate over each row of the dataframe to identify empty rows with only GSC POG IDs to label them as ‘Empty’

Parameters:	row – Each row of the Clinical dataframe
Returns:	Error code string for that row

vardb.metadata_wrangling.oasis.preprocess.identify_missing_pog_ids(row)¶

Iterate over each row of the Clinical dataframe to identify missing gsc_pog_ids

Parameters:	row – Each row of the Clinical dataframe
Returns:	Error code string for that row

vardb.metadata_wrangling.oasis.preprocess.read_data(oasis_file)¶

Read the input OASIS Excel file

Parameters:	oasis_file – Input OASIS file
Returns:	Clinical Dataframe

vardb.metadata_wrangling.oasis.preprocess.split_gsc_pog_id_and_drop_it(dataframe)¶

Split the gsc_pog_id column into tumour_group, patient_id and pediatric_id. Drop the gsc_pog_id after that. Move the new columns as the first three columns

Parameters:	dataframe – Input dataframe
Returns:	Dataframe with the new columns and without gsc_pog_id column

vardb.metadata_wrangling.oasis.preprocess.split_strip_and_join_column_data(column_cell_data)¶

Split the comma-separated strings, strip out the whitespace, and join them back together as a comma-separated string

Parameters:	column_cell_data – Data from the column cell
Returns:	Processed column cell data as a comma-separated string with empty strings filtered out, e.g. ‘a’,,’ b’ –> ‘a’,’b’, or NaN for empty cells

vardb.metadata_wrangling.oasis.preprocess.strip_1_from_date(dataframe)¶

Strip off a trailing ‘ 1’ from the dates

Parameters:	dataframe – Clinical dataframe
Returns:	Date formatted Clinical dataframe

vardb.metadata_wrangling.oasis.preprocess.strip_whitespace_and_consecutive_commas_from_column_data(dataframe)¶

Strips whitespace and consecutive commas from the column data, e.g. ‘a’,,’ b’ –> ‘a’,’b’

Parameters:	dataframe – Clinical Dataframe
Returns:	Clinical Dataframe with whitespace stripped from the data and consecutive commas (,,) eliminated

vardb.metadata_wrangling.oasis.preprocess.strip_whitespace_from_beginning_and_end(dataframe)¶

Strips the whitespace from the data for all the columns

Parameters:	dataframe – Input dataframe
Returns:	Dataframe with whitespace removed from the beginning and end of the data

vardb.metadata_wrangling.oasis.preprocess.uniform_date_format(dataframe, date_columns)¶

Format all the dates to a uniform pattern YYYY-MM-DD

Parameters:

dataframe – Clinical dataframe
date_columns – List of dataframe columns that store dates

Returns:

Date formatted Clinical dataframe
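
A sketch of the date normalisation using pandas.to_datetime followed by strftime. Turning unparseable values into missing values (errors="coerce") is an assumption, since the documentation does not say how bad dates are handled at this step.

```python
import pandas as pd


def uniform_date_format(dataframe, date_columns):
    """Rewrite every listed date column as YYYY-MM-DD strings."""
    result = dataframe.copy()
    for column in date_columns:
        # Assumption: values that cannot be parsed become missing (NaT).
        parsed = pd.to_datetime(result[column], errors="coerce")
        result[column] = parsed.dt.strftime("%Y-%m-%d")
    return result


# Invented sample data.
clinical = pd.DataFrame({
    "patient_id": ["POG001", "POG002", "POG003"],
    "biopsy_date": ["2016/05/01", "2015/03/10", None],
})
formatted = uniform_date_format(clinical, ["biopsy_date"])
```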

vardb.metadata_wrangling.oasis.preprocess.validate_data(clinical_dataframe, error_reporting_dataframe)¶

Validate the Clinical dataframe and report errors on the Error Reporting dataframe

Parameters:

clinical_dataframe – Clinical dataframe
error_reporting_dataframe – Error Reporting dataframe

Returns:

Updated Clinical dataframe and errors reported on the Clinical dataframe

vardb.metadata_wrangling.oasis.preprocess.validate_death_date(dataframe)¶

Validate death_date > all other date columns (except pog_report_date as that can be reported anytime)

Parameters:	dataframe – Error Reporting dataframe
Returns:	Updated Error Reporting dataframe with the appropriate error code

vardb.metadata_wrangling.oasis.preprocess.validate_duplicate_pogs_with_same_biopsy_dates(dataframe)¶

Identify multiple POG biopsies with the same biopsy date. Flag all rows with the error code.

Parameters:	dataframe – Error dataframe
Returns:	Error Reporting Dataframe updated with multiple POG biopsies with the same biopsy dates identified
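
Flagging all rows of a duplicate group, rather than all but the first, corresponds to DataFrame.duplicated with keep=False. The sketch below assumes that duplicates are identified by patient_id plus biopsy_date and uses a hypothetical error code. For brevity it overwrites any existing code instead of appending to it.

```python
import pandas as pd

# Hypothetical error code, used only for this illustration.
DUPLICATE_BIOPSY_DATE = "DUPLICATE_POG_SAME_BIOPSY_DATE"


def validate_duplicate_pogs_with_same_biopsy_dates(dataframe):
    """Flag every row whose (patient_id, biopsy_date) pair occurs more than once."""
    result = dataframe.copy()
    # keep=False marks all members of a duplicate group, not just the repeats.
    is_duplicate = result.duplicated(subset=["patient_id", "biopsy_date"], keep=False)
    result.loc[is_duplicate, "error_code"] = DUPLICATE_BIOPSY_DATE
    return result


# Invented sample data.
errors = pd.DataFrame({
    "patient_id": ["POG001", "POG001", "POG002"],
    "biopsy_date": ["2016-01-10", "2016-01-10", "2016-01-10"],
    "error_code": ["", "", ""],
})
flagged = validate_duplicate_pogs_with_same_biopsy_dates(errors)
```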

vardb.metadata_wrangling.oasis.preprocess.validate_duplicate_rows(dataframe)¶

Identify and iterate over each row of the duplicate dataframe and label them as ‘Duplicate’

Parameters:	dataframe – Error Reporting Dataframe
Returns:	Error Reporting Dataframe updated with duplicate records identified

vardb.metadata_wrangling.oasis.preprocess.validate_empty_pog_ids(error_reporting_dataframe)¶

Identify the empty POG ID rows in the Clinical dataframe, report them in the Error Reporting dataframe, and drop them from the Clinical dataframe

Parameters:	error_reporting_dataframe – Error Reporting dataframe
Returns:	Updated Error Reporting dataframe

vardb.metadata_wrangling.oasis.preprocess.validate_missing_pog_ids(error_reporting_dataframe)¶

Identify the missing POG ID rows in the Clinical dataframe, report them in the Error Reporting dataframe, and drop them from the Clinical dataframe

Parameters:	error_reporting_dataframe – Error Reporting dataframe
Returns:	Updated Error Reporting dataframe

vardb.metadata_wrangling.oasis.preprocess.validate_tumour_groups(dataframe)¶

Validate same tumour groups in the gsc_pog_id column and treatment columns

Parameters:	dataframe – Error reporting Dataframe
Returns:	Error reporting dataframe with the error code reported

vardb.metadata_wrangling.oasis.radiation module¶

vardb.metadata_wrangling.oasis.radiation.extract_data(dataframe)¶

Extract radiation data from the clinical dataframe

Parameters:	dataframe – Clinical dataframe
Returns:	Radiation dataframe

vardb.metadata_wrangling.oasis.radiation.get_radiation_data(clinical_dataframe)¶

Build the validated Radiation dataframe from the clinical dataframe

Parameters:	clinical_dataframe – The original clinical dataframe
Returns:	Validated Radiation dataframe

vardb.metadata_wrangling.oasis.radiation.reshape_radiation_data(dataframe)¶

Reshapes the Radiation dataframe from wide to long format using the pandas wide_to_long function

Parameters:	dataframe – Radiation Dataframe
Returns:	Reshaped Radiation Dataframe

vardb.metadata_wrangling.oasis.treatment module¶

vardb.metadata_wrangling.oasis.treatment.count_non_null_treatment_type_for_treatment_groups(pog_informed_group)¶

For each treatment group (based on patient_id), count the number of non-NULL treatment_type entries

Parameters:	pog_informed_group – pog_informed group for Treatment groups (based on patient_id)
Returns:	Count of non NULL treatment_type entries

vardb.metadata_wrangling.oasis.treatment.count_pog_informed_for_treatment_groups(pog_informed_group)¶

For each treatment group (based on patient_id), count the number of pog_informed entries

Parameters:	pog_informed_group – pog_informed group for Treatment groups (based on patient_id)
Returns:	Count of pog_informed entries

vardb.metadata_wrangling.oasis.treatment.extract_data(dataframe)¶

Extract drug treatment data from the clinical dataframe

Parameters:	dataframe – Clinical dataframe
Returns:	Drug Treatment dataframe

vardb.metadata_wrangling.oasis.treatment.get_drug_treatment_data(clinical_dataframe)¶

Build the validated Drug Treatment dataframe from the clinical dataframe

Parameters:	clinical_dataframe – The original clinical dataframe
Returns:	Validated Drug Treatment dataframe

vardb.metadata_wrangling.oasis.treatment.reshape_treatment_data(dataframe)¶

Reshapes the Drug Treatment dataframe from wide to long format using the pandas wide_to_long function

Parameters:	dataframe – Drug Treatment Dataframe
Returns:	Reshaped Drug Treatment Dataframe

vardb.metadata_wrangling.oasis.treatment.validate_best_response_should_not_be_null(row)¶

Validate best_response should not be null for pog_informed (Y) entries

Parameters:	row – Each row of the Drug Treatment dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.treatment.validate_data(drug_treatment_dataframe, demographics_dataframe)¶

All validations pertaining to the Drug Treatment dataframe

Parameters:

drug_treatment_dataframe – Drug Treatment dataframe
demographics_dataframe – Demographics dataframe supplied for cross validation with treatment table

Returns:

Validated Drug Treatment dataframe

vardb.metadata_wrangling.oasis.treatment.validate_demographics_post_pog_activity_categories(pog_informed_y_dataframe, demographics_dataframe)¶

Validate that when pog_informed = ‘Y’ for at least one treatment, the Demographics data Post POG activities is one of ‘POG informed out of province’, ‘ST/CT therapy at BCCA’, ‘POG informed compassionate access therapy’, or ‘POG informed private pay’

Parameters:

pog_informed_y_dataframe – Filtered Drug treatment dataframe with pog_informed = ‘Y’
demographics_dataframe – Demographics dataframe

Returns:

Demographics dataframe with the error code reported

vardb.metadata_wrangling.oasis.treatment.validate_demographics_with_treatment(demographics_dataframe, drug_treatment_dataframe)¶

Validation performed on a dataframe obtained by merging Demographics with Drug Treatment dataframe and reporting error_codes on the Demographics dataframe

Parameters:

demographics_dataframe – Demographics dataframe
drug_treatment_dataframe – Drug Treatment dataframe

Returns:

Demographics dataframe with the error codes reported

vardb.metadata_wrangling.oasis.treatment.validate_drug_treatment_for_bcca_treatment_type(pog_informed_y_dataframe, demographics_dataframe)¶

Validate Drug Treatment data for bcca treatment type

Parameters:

pog_informed_y_dataframe – Filtered Drug treatment dataframe with pog_informed = ‘Y’
demographics_dataframe – Demographics dataframe

Returns:

Demographics dataframe with the error code reported

vardb.metadata_wrangling.oasis.treatment.validate_for_bcca_treatment_type_should_not_be_null(row)¶

Iterate over each row of the merged dataframe to validate when demographics.post_pog_activities is ‘ST/CT therapy at BCCA’ then for at least one entry where drug_treatment.pog_informed = ‘Y’, drug_treatment.treatment_type should not be null

Parameters:	row – Each row of the merged Demographics dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.treatment.validate_for_post_pog_activities_bcca_province_compassionate_private(row)¶

Iterate over each row of the dataframe to validate whether Post POG activities is either ‘POG informed out of province’ or ‘ST/CT therapy at BCCA’

Parameters:	row – Each row of the merged Demographics dataframe
Returns:	The error string for that row

vardb.metadata_wrangling.oasis.treatment.validate_mandatory_columns(row)¶

Iterate over each row of the dataframe to validate the mandatory columns in the drug treatment data: tumour_group, pog_tumour_group, course_begin_on, course_end_on, drug_list, intent, treatment_time, pog_informed

Parameters:	row – Each row of the drug treatment dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.treatment.validate_progression_documentation_is_not_null(row)¶

Iterate over each row of the dataframe to validate that when progression_on is present, progression_documentation is also present

Parameters:	row – Each row of the drug treatment dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.treatment.validate_treatment_data(drug_treatment_dataframe)¶

Validate Drug Treatment data

Parameters:	drug_treatment_dataframe – Drug Treatment dataframe
Returns:	Validated Drug Treatment dataframe

vardb.metadata_wrangling.oasis.treatment.validate_treatment_time_either_pre_or_post_pog_report(row)¶

Iterate over each row of the dataframe to validate when course_begin_on date is <= demographics.pog_report_date then treatment_time should be ‘Pre POG Report’; otherwise, treatment_time should be ‘Post POG Report’

Parameters:	row – Each row of the drug treatment dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.treatment.validate_treatment_time_for_pre_or_post_pog_report(drug_treatment_dataframe, demographics_dataframe)¶

Validate that if course_begin_on date is <= demographics.pog_report_date, treatment_time is ‘Pre POG Report’; otherwise, treatment_time should be ‘Post POG Report’.

Parameters:

drug_treatment_dataframe – Drug treatment dataframe
demographics_dataframe – Demographics dataframe

Returns:

Drug treatment dataframe with the error code reported
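
Because pog_report_date lives in the Demographics table, this check needs the two tables joined on patient_id before the row-level rule can run. The sketch below shows one way to do that. It assumes a single Demographics row per patient_id and uses a hypothetical error code.

```python
import pandas as pd

# Hypothetical error code, used only for this illustration.
TREATMENT_TIME_MISMATCH = "TREATMENT_TIME_MISMATCH"


def validate_treatment_time_either_pre_or_post_pog_report(row):
    """Row-level rule: treatment_time must agree with the two dates."""
    if pd.isna(row["course_begin_on"]) or pd.isna(row["pog_report_date"]):
        return ""
    if row["course_begin_on"] <= row["pog_report_date"]:
        expected = "Pre POG Report"
    else:
        expected = "Post POG Report"
    return "" if row["treatment_time"] == expected else TREATMENT_TIME_MISMATCH


def validate_treatment_time_for_pre_or_post_pog_report(drug_treatment_dataframe,
                                                       demographics_dataframe):
    """Merge in pog_report_date, apply the row rule, report on the treatment table."""
    # Assumption: demographics holds exactly one row per patient_id.
    merged = drug_treatment_dataframe.merge(
        demographics_dataframe[["patient_id", "pog_report_date"]],
        on="patient_id",
        how="left",
    )
    result = drug_treatment_dataframe.copy()
    result["error_code"] = merged.apply(
        validate_treatment_time_either_pre_or_post_pog_report, axis=1
    ).to_numpy()
    return result


# Invented sample data.
drug_treatment = pd.DataFrame({
    "patient_id": ["POG001", "POG001", "POG002"],
    "course_begin_on": pd.to_datetime(["2016-01-01", "2016-06-01", "2016-06-01"]),
    "treatment_time": ["Pre POG Report", "Pre POG Report", "Post POG Report"],
})
demographics = pd.DataFrame({
    "patient_id": ["POG001", "POG002"],
    "pog_report_date": pd.to_datetime(["2016-03-01", "2016-03-01"]),
})
checked = validate_treatment_time_for_pre_or_post_pog_report(drug_treatment, demographics)
```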

vardb.metadata_wrangling.oasis.treatment.validate_treatment_time_is_post_pog_report(row)¶

Iterate over each row of the dataframe to validate that when pog_informed = ‘Y’, treatment_time is ‘Post POG Report’

Parameters:	row – Each row of the drug treatment dataframe
Returns:	The error code string for that row

vardb.metadata_wrangling.oasis.treatment.validate_treatment_with_demographics(drug_treatment_dataframe, demographics_dataframe)¶

Validation performed on a dataframe obtained by merging Drug Treatment with Demographics dataframe and reporting error_codes on the Drug Treatment dataframe

Parameters:

drug_treatment_dataframe – Drug Treatment dataframe
demographics_dataframe – Demographics dataframe

Returns:

Drug Treatment dataframe with the error codes reported