vardb.metadata_wrangling.oasis package

The oasis package is a collection of scripts that clean data from the Oasis database at BCCA and create tables for loading into vardb.

Submodules

vardb.metadata_wrangling.oasis.demographics module

vardb.metadata_wrangling.oasis.demographics.add_biopsy_number_column(dataframe)

Add the biopsy_number column to the Demographics dataframe sorted by the biopsy_date in ascending order

Parameters:dataframe – Demographics dataframe
Returns:Demographics dataframe with the biopsy_number column added to it
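
A minimal sketch of how such a column could be derived with pandas. It assumes biopsies are numbered per patient_id in order of biopsy_date; the helper name number_biopsies and the per-patient grouping are assumptions for illustration, not the package's implementation.

```python
import pandas as pd


def number_biopsies(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: number each patient's biopsies by ascending biopsy_date."""
    # Sort so that earlier biopsies come first within each patient.
    result = dataframe.sort_values(["patient_id", "biopsy_date"]).reset_index(drop=True)
    # cumcount() is zero-based, so add 1 to start the numbering at 1.
    result["biopsy_number"] = result.groupby("patient_id").cumcount() + 1
    return result


demo = pd.DataFrame(
    {
        "patient_id": ["POG001", "POG002", "POG001"],
        "biopsy_date": ["2016-05-01", "2015-01-10", "2015-03-20"],
    }
)
numbered = number_biopsies(demo)
```
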
vardb.metadata_wrangling.oasis.demographics.extract_biopsy_columns(dataframe_columns)

Extract the names of the columns in the Clinical dataframe that store Biopsy information

Parameters:dataframe_columns – Columns of the Clinical dataframe
Returns:List of columns that store the Biopsy information
vardb.metadata_wrangling.oasis.demographics.extract_data(dataframe)

Extract demographic data from the clinical dataframe

Parameters:dataframe – Clinical dataframe
Returns:Demographics dataframe
vardb.metadata_wrangling.oasis.demographics.get_demographics_data(clinical_dataframe)

Work with Demographics data

Parameters:clinical_dataframe – The original clinical dataframe
Returns:Validated Demographics dataframe
vardb.metadata_wrangling.oasis.demographics.validate_biopsy_date_less_than_or_equal_pog_report_date(row)

Iterate over each row of the dataframe to validate biopsy_date <= pog_report_date

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
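
The row-wise validators in this module share one shape: they receive a row and return an error code string. A hedged sketch of that pattern for the date comparison above follows. The error code value and the helper name are hypothetical, and skipping rows with missing dates is an assumption.

```python
import pandas as pd

# Hypothetical error code; the codes used by the package are not documented here.
BIOPSY_AFTER_REPORT = "BIOPSY_DATE_AFTER_POG_REPORT_DATE"


def check_biopsy_before_report(row: pd.Series) -> str:
    """Return an error code string if biopsy_date is later than pog_report_date."""
    biopsy = pd.to_datetime(row["biopsy_date"], errors="coerce")
    report = pd.to_datetime(row["pog_report_date"], errors="coerce")
    # Missing dates cannot be compared, so this check leaves them alone.
    if pd.isna(biopsy) or pd.isna(report):
        return ""
    return "" if biopsy <= report else BIOPSY_AFTER_REPORT


demo = pd.DataFrame(
    {
        "biopsy_date": ["2016-01-05", "2016-03-01", None],
        "pog_report_date": ["2016-02-01", "2016-02-01", "2016-02-01"],
    }
)
# axis=1 passes each row to the validator as a Series.
demo["error_code"] = demo.apply(check_biopsy_before_report, axis=1)
```
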
vardb.metadata_wrangling.oasis.demographics.validate_blood_collection_date_less_than_or_equal_pog_report_date(row)

Iterate over each row of the dataframe to validate blood_collection_date <= pog_report_date

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row

Iterate over each row of the dataframe to validate consent_date <= pog_report_date

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.demographics.validate_data(dataframe)

Validate Demographics data

Parameters:dataframe – Demographics dataframe
Returns:Validated Demographics dataframe
vardb.metadata_wrangling.oasis.demographics.validate_mandatory_columns(row)

Iterate over each row of the dataframe to validate the mandatory columns in the demographic data: patient_id, sex, consent_date, consent_age

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.demographics.validate_pog_report_date_mandatory_columns(row)

Iterate over each row of the dataframe to validate the columns that are mandatory if pog_report_date exists in the demographic data: blood_collection_date, biopsy_date, bx_loc_radiated, prior_primary_tumour, biopsy_site, post_pog_activities, diag_changed, re_bx_prog1

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.demographics.validate_post_pog_treatment_columns_for_null(row)

Iterate over each row of the dataframe to validate that when post_pog_activities is not “POG informed treatment not given”, all of the “post_pog_treatment_*” columns are null

patient_id, post_pog_treatment_deceased, post_pog_treatment_sick, post_pog_treatment_decision_pt, post_pog_treatment_decision_phys, post_pog_treatment_na, post_pog_treatment_cost, post_pog_treatment_travel, post_pog_treatment_unknown

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.demographics.validate_post_pog_treatment_columns_for_ys(row)

Iterate over each row of the dataframe to validate that when post_pog_activities is “POG informed treatment not given”, exactly one of the “post_pog_treatment_*” columns has a “Y” and the rest are null

post_pog_treatment_deceased, post_pog_treatment_sick, post_pog_treatment_decision_pt, post_pog_treatment_decision_phys, post_pog_treatment_na, post_pog_treatment_cost, post_pog_treatment_travel, post_pog_treatment_unknown

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
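
The “exactly one Y, rest null” rule can be sketched as follows. The column names are taken from the list above; the error code value and the helper names check_exactly_one_y and make_row are hypothetical.

```python
import pandas as pd

# Column names copied from the list above.
TREATMENT_COLUMNS = [
    "post_pog_treatment_deceased",
    "post_pog_treatment_sick",
    "post_pog_treatment_decision_pt",
    "post_pog_treatment_decision_phys",
    "post_pog_treatment_na",
    "post_pog_treatment_cost",
    "post_pog_treatment_travel",
    "post_pog_treatment_unknown",
]
NOT_GIVEN = "POG informed treatment not given"
# Hypothetical error code, used only for this illustration.
NOT_EXACTLY_ONE_Y = "POST_POG_TREATMENT_NOT_EXACTLY_ONE_Y"


def check_exactly_one_y(row: pd.Series) -> str:
    """Return an error code unless exactly one reason is 'Y' and the rest are null."""
    if row["post_pog_activities"] != NOT_GIVEN:
        return ""  # The rule only applies to the "not given" category.
    values = row[TREATMENT_COLUMNS]
    yes_count = int((values == "Y").sum())
    filled_count = int(values.notna().sum())
    # Valid only when a single cell is filled in and that cell is "Y".
    return "" if yes_count == 1 and filled_count == 1 else NOT_EXACTLY_ONE_Y


def make_row(activity, **flags):
    """Build a demo row with every treatment column null except those given."""
    data = {column: None for column in TREATMENT_COLUMNS}
    data.update(flags)
    data["post_pog_activities"] = activity
    return pd.Series(data, dtype=object)
```
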
vardb.metadata_wrangling.oasis.demographics.validate_re_bx_date_for_not_null(row)

Iterate over each row of the dataframe to validate when re_bx_prog1, re_bx_prog2, etc is ‘Y’, then re_bx_date1, re_bx_date2, etc should not be null

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.demographics.validate_re_bx_date_greater_than_or_equal_biopsy_date(row)

Iterate over each row of the dataframe to validate re_bx_date1, re_bx_date2, etc. >= biopsy_date

Parameters:row – Each row of the demographics dataframe
Returns:The error code string for that row

vardb.metadata_wrangling.oasis.diagnosis module

vardb.metadata_wrangling.oasis.diagnosis.extract_data(dataframe)

Extract diagnosis data from the clinical dataframe

Parameters:dataframe – Clinical dataframe
Returns:Diagnosis dataframe
vardb.metadata_wrangling.oasis.diagnosis.get_diagnosis_data(clinical_dataframe)

Work with Diagnosis data

Parameters:clinical_dataframe – The original clinical dataframe
Returns:Validated Diagnosis dataframe
vardb.metadata_wrangling.oasis.diagnosis.reshape_diagnosis_data(dataframe)

Reshapes the Diagnosis dataframe from wide to long format using the pandas wide_to_long function

Parameters:dataframe – Diagnosis dataframe
Returns:Reshaped Diagnosis dataframe
vardb.metadata_wrangling.oasis.diagnosis.validate_data(dataframe)

Validate Diagnosis data

Parameters:dataframe – Diagnosis dataframe
Returns:Validated Diagnosis dataframe
vardb.metadata_wrangling.oasis.diagnosis.validate_mandatory_columns(row)

Iterate over each row of the dataframe to validate the mandatory columns in the diagnosis data: site_desc, tumour_group, diagnosis_date, age_at_diagnosis

Parameters:row – Each row of the diagnosis dataframe
Returns:The error code string for that row

vardb.metadata_wrangling.oasis.drug_map module

vardb.metadata_wrangling.oasis.drug_map.drop_comma_separated_drugs_column(drug_treatment_dataframe)

Drop the comma_separated_drugs column from the Drug Treatment dataframe

Parameters:drug_treatment_dataframe – Drug Treatment dataframe
vardb.metadata_wrangling.oasis.drug_map.eliminate_duplicates_and_sort_drug_list(drug_list)

Eliminate duplicate drug names. Sort the drug names alphabetically

Parameters:drug_list – The list of drug names
Returns:The sorted drug list
vardb.metadata_wrangling.oasis.drug_map.get_longest_length_matched_token(matched_tokens)

Get the token with the longest length that matched

Parameters:matched_tokens – List of matched drug tokens
Returns:Matching token
vardb.metadata_wrangling.oasis.drug_map.get_tokens_matching_the_drug_map(drug_tokens)

Get the list of tokens that match to the drug map YAML file

Parameters:drug_tokens – Drug name tokens
Returns:List of matching tokens
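
A rough sketch of matching tokens against a drug map and then picking the longest match. The in-memory DRUG_MAP dictionary stands in for the drug map YAML file, whose real structure is not shown here. Its entries, the lower-casing, and both helper names are assumptions.

```python
# Hypothetical stand-in for the drug map loaded from the YAML file:
# keys are lower-case tokens, values are the mapped names.
DRUG_MAP = {
    "alpha": "ALPHA",
    "alphabeta": "ALPHA-BETA",
}


def tokens_in_drug_map(drug_tokens, drug_map):
    """Keep only the tokens that appear as keys in the drug map."""
    return [token for token in drug_tokens if token.lower() in drug_map]


def longest_token(matched_tokens):
    """Return the longest matched token, or None when nothing matched."""
    return max(matched_tokens, key=len) if matched_tokens else None


matched = tokens_in_drug_map(["alpha", "beta", "alphabeta"], DRUG_MAP)
best = longest_token(matched)
```

Preferring the longest token favours the most specific entry when both a single word and a combined form are present in the map.
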
vardb.metadata_wrangling.oasis.drug_map.insert_original_drug_string_column_before_error_column(drug_treatment_dataframe)

Copy the ‘drug_list’ column and rename it to ‘original_drug_string’. Insert the ‘original_drug_string’ column before the ‘error’ column

Parameters:drug_treatment_dataframe – Drug Treatment dataframe
Returns:Drug Treatment dataframe with the ‘original_drug_string’ column
vardb.metadata_wrangling.oasis.drug_map.map_oasis_drug_to_ontology(row)

Map Oasis drug names to Ontology

Parameters:row – Each row of the Drug Treatment Dataframe
Returns:List of Ontology-mapped drugs
vardb.metadata_wrangling.oasis.drug_map.map_oasis_drugs_using_drug_map(drug_treatment_dataframe)

Create the Drug Map and update the Drug Treatment table

Parameters:drug_treatment_dataframe – Drug Treatment dataframe
vardb.metadata_wrangling.oasis.drug_map.split_and_strip_drug_names(drug_names)

Split the comma-separated drug names into a list. Strip the whitespace from the beginning and end of each drug name

Parameters:drug_names – List of drug names from Oasis
Returns:Processed column cell data with empty strings filtered out, e.g. ‘a’,,’ b’ –> [‘a’,’b’], or NaN for empty cells
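
One way to get the behaviour described, as a sketch only. The helper name split_and_strip is hypothetical, and returning NaN for non-string cells is an assumption.

```python
import math


def split_and_strip(cell):
    """Split a comma-separated cell into stripped names, dropping empty entries."""
    # Empty or non-string cells (e.g. NaN) have no drug names to split.
    if not isinstance(cell, str) or not cell.strip():
        return math.nan
    names = [part.strip() for part in cell.split(",")]
    names = [name for name in names if name]
    # A cell containing only commas also counts as empty.
    return names if names else math.nan
```
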
vardb.metadata_wrangling.oasis.drug_map.split_drug_names_on_special_characters(oasis_drug_name)

Split the drug names on special characters (e.g. ‘ ‘, ‘(‘, ‘)’). For example, a (b) –> [‘a’, ‘b’]

Parameters:oasis_drug_name – Oasis drug name
Returns:List of split drug names
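
A sketch using the standard re module. It covers only the three characters named in the example (space and parentheses); the full set of special characters handled by the package is not documented here. The helper name is hypothetical.

```python
import re


def split_on_special_characters(oasis_drug_name):
    """Split a drug name on runs of spaces and parentheses, dropping empty pieces."""
    pieces = re.split(r"[ ()]+", oasis_drug_name)
    return [piece for piece in pieces if piece]
```
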
vardb.metadata_wrangling.oasis.drug_map.tokenize_drug_name(oasis_drug_name)

Tokenize the drug name, e.g. input ‘a b’ gives output [‘a’, ‘b’, ‘ab’]

Parameters:oasis_drug_name – Drug name from Oasis
Returns:Drug name tokens
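
A sketch that reproduces the documented example (‘a b’ gives ‘a’, ‘b’, ‘ab’). How names with three or more words are tokenized is not documented; appending a single concatenation of all words is an assumption, as is the helper name.

```python
def tokenize(oasis_drug_name):
    """Return the individual words plus their concatenation as tokens."""
    words = oasis_drug_name.split()
    tokens = list(words)
    # Only add the joined form when it differs from the single word.
    if len(words) > 1:
        tokens.append("".join(words))
    return tokens
```
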

vardb.metadata_wrangling.oasis.error_code module

vardb.metadata_wrangling.oasis.error_code.append_error_codes(old_error_code, new_error_code)
Append the error code strings together
Parameters:
  • old_error_code – Old error code
  • new_error_code – New error code
Returns:

Appended error code strings with or without comma(s)
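
“With or without comma(s)” suggests the separator is added only when both sides are non-empty. A minimal sketch under that assumption, with a hypothetical helper name:

```python
def append_codes(old_error_code, new_error_code):
    """Join two error code strings with a comma, skipping empty sides."""
    if not old_error_code:
        return new_error_code
    if not new_error_code:
        return old_error_code
    return f"{old_error_code},{new_error_code}"
```
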

vardb.metadata_wrangling.oasis.error_code.collect_error_codes(error_reporting_dataframe, demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, diagnosis_error_dataframe, drug_treatment_error_dataframe, radiation_error_dataframe)

Perform operations to generate and process the error code column in the error and individual tables

Parameters:
  • error_reporting_dataframe – Clinical dataframe where the errors are reported
  • demographics_dataframe – Demographics dataframe
  • diagnosis_dataframe – Diagnosis dataframe
  • drug_treatment_dataframe – Drug Treatment dataframe
  • radiation_dataframe – Radiation dataframe
  • diagnosis_error_dataframe – Diagnosis Error dataframe
  • drug_treatment_error_dataframe – Drug Treatment Error dataframe
  • radiation_error_dataframe – Radiation Error dataframe
Returns:

Clinical dataframe where the errors are reported

vardb.metadata_wrangling.oasis.error_code.concatenate_error_codes(row)

Iterate over every row of the error dataframe to concatenate the individual error codes

Parameters:row – Each row of the error code dataframe
Returns:Row information
vardb.metadata_wrangling.oasis.error_code.generate_error_dataframe(dataframe, demographics_dataframe, diagnosis_error_dataframe, drug_treatment_error_dataframe, radiation_error_dataframe)

Generate the error code reporting dataframe

Parameters:
  • dataframe – Copy of the Clinical dataframe
  • demographics_dataframe – Demographics error code dataframe
  • diagnosis_error_dataframe – Diagnosis error code dataframe
  • drug_treatment_error_dataframe – Drug Treatment error dataframe
  • radiation_error_dataframe – Radiation error dataframe
Returns:

Generated error code dataframe

vardb.metadata_wrangling.oasis.error_code.group_and_aggregate_error_codes(dataframe, aggregate_column)

Group by and aggregate the error codes from individual dataframes

Parameters:
  • dataframe – Input dataframe
  • aggregate_column – The column on which to perform the aggregate operation
Returns:

Dataframe grouped by the error codes for each patient id and reset index
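
A sketch of the group-and-aggregate step with pandas. Grouping on a patient_id column and joining the codes with commas are assumptions; the helper name is hypothetical.

```python
import pandas as pd


def aggregate_codes(dataframe, aggregate_column):
    """Collapse each patient's error codes into one comma-separated string."""

    def join_codes(codes):
        # Skip nulls and empty strings so they do not produce stray commas.
        return ",".join(code for code in codes if isinstance(code, str) and code)

    grouped = dataframe.groupby("patient_id")[aggregate_column].agg(join_codes)
    # reset_index turns patient_id back into an ordinary column.
    return grouped.reset_index()


demo = pd.DataFrame(
    {
        "patient_id": ["POG001", "POG002", "POG001"],
        "error_code": ["E1", "", "E2"],
    }
)
aggregated = aggregate_codes(demo, "error_code")
```
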

vardb.metadata_wrangling.oasis.error_code.rename_error_code_column_to_errors(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe)

Rename the error code columns in the individual tables

Parameters:
  • demographics_dataframe – Demographics dataframe
  • diagnosis_dataframe – Diagnosis dataframe
  • drug_treatment_dataframe – Drug treatment dataframe
  • radiation_dataframe – Radiation dataframe
vardb.metadata_wrangling.oasis.error_code.replace_nan_with_empty_string(dataframe)

Replace string Nan’s with empty string ‘’

Parameters:dataframe – Error dataframe
Returns:Error dataframe with string Nan’s replaced with ‘’

vardb.metadata_wrangling.oasis.helpers module

vardb.metadata_wrangling.oasis.helpers.extract_column_names_from_base_names(dataframe_columns, base_names, pattern_match)

Extracts column names from base names

Parameters:
  • dataframe_columns – Columns of the dataframe
  • base_names – Base names for that dataframe
  • pattern_match – Matching pattern for that dataframe column name
Returns:

List of column names with patient_id

vardb.metadata_wrangling.oasis.helpers.extract_date_columns(dataframe)

Extract the columns from the Clinical dataframe that store dates

Parameters:dataframe – Clinical dataframe
Returns:List of columns that store dates
vardb.metadata_wrangling.oasis.helpers.reshape_dataframe(dataframe, stub_names, id_variable, sub_observation, separator='', suffix='\\d+')

Reshapes a dataframe from wide to long, drops NaN rows and resets the indices

Parameters:
  • dataframe – The dataframe to be reshaped
  • stub_names – Column names in the reshaped dataframe
  • id_variable – Column to use as id variable
  • sub_observation – Column name that you wish to name your suffix in the long format.
  • separator – A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format.
  • suffix – A regular expression capturing the wanted suffixes.
Returns:

Reshaped dataframe
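
The parameters above line up with those of pandas.wide_to_long (stubnames, i, j, sep, suffix). A sketch of a wide-to-long reshape on that basis follows. The helper name and demo data are hypothetical, and dropping only rows in which every value is NaN is an assumption.

```python
import pandas as pd


def reshape_wide_to_long(dataframe, stub_names, id_variable, sub_observation,
                         separator="", suffix=r"\d+"):
    """Reshape wide columns such as site_desc1, site_desc2 into long rows."""
    long_frame = pd.wide_to_long(
        dataframe,
        stubnames=stub_names,
        i=id_variable,
        j=sub_observation,
        sep=separator,
        suffix=suffix,
    )
    # Drop rows in which every stub value is missing, then flatten the index.
    return long_frame.dropna(how="all").reset_index()


wide = pd.DataFrame(
    {
        "patient_id": ["POG001", "POG002"],
        "diagnosis_date1": ["2015-01-01", "2014-06-01"],
        "diagnosis_date2": ["2016-02-02", None],
        "site_desc1": ["Lung", "Breast"],
        "site_desc2": ["Liver", None],
    }
)
long_frame = reshape_wide_to_long(
    wide, ["diagnosis_date", "site_desc"], "patient_id", "diagnosis_number"
)
```

Here POG002 has no second diagnosis, so its second row is dropped and three rows remain.
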

vardb.metadata_wrangling.oasis.oasis module

vardb.metadata_wrangling.oasis.oasis.main()
vardb.metadata_wrangling.oasis.oasis.parse_oasis_data(oasis_file_path, output_path)
Parameters:
  • oasis_file_path – Input OASIS file path
  • output_path – The folder path to store the output

vardb.metadata_wrangling.oasis.output module

vardb.metadata_wrangling.oasis.output.dataframe_to_tsv(dataframe, file_path, date_stamp)

Write the dataframe to a TSV file

Parameters:
  • dataframe – Input dataframe
  • file_path – File path where to write it
  • date_stamp – YYYYMMDD date format of the file
vardb.metadata_wrangling.oasis.output.filter_non_pediatric_ids(error_dataframe)

Filter out non pediatric ids from the error reporting dataframe

Parameters:error_dataframe – Error reporting dataframe
Returns:Filtered dataframe without non-pediatric ids
vardb.metadata_wrangling.oasis.output.output_to_tsv(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, error_dataframe, output_path)

Write the output tables to TSV files

Parameters:
  • demographics_dataframe – Demographics dataframe
  • diagnosis_dataframe – Diagnosis dataframe
  • drug_treatment_dataframe – Drug Treatment dataframe
  • radiation_dataframe – Radiation dataframe
  • error_dataframe – Clinical dataframe where the errors are reported
  • output_path – The folder path to store the output
vardb.metadata_wrangling.oasis.output.write_output_to_tsv(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, error_dataframe, output_path)

Convert the error code lists to strings in the individual tables

Parameters:
  • demographics_dataframe – Demographics dataframe
  • diagnosis_dataframe – Diagnosis dataframe
  • drug_treatment_dataframe – Drug treatment dataframe
  • radiation_dataframe – Radiation dataframe
  • error_dataframe – Error dataframe
  • output_path – The folder path to store the output

vardb.metadata_wrangling.oasis.preprocess module

vardb.metadata_wrangling.oasis.preprocess.append_patient_id_column(dataframe)

Append a column named ‘patient_id’ that stores a shortened form of the gsc_pog_id value, e.g. ‘POG 001-GIC’ is saved as ‘POG001’

Parameters:dataframe – Input dataframe
Returns:Dataframe with the appended column
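
A sketch of deriving the identifier with a regular expression. The assumed pattern, ‘POG’ followed by optional whitespace and digits, is inferred from the single documented example; the helper name is hypothetical.

```python
import pandas as pd


def add_patient_id(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Add a patient_id column built from the prefix and number in gsc_pog_id."""
    result = dataframe.copy()
    # Two capture groups yield a two-column frame: the prefix and the digits.
    parts = result["gsc_pog_id"].str.extract(r"^\s*(POG)\s*(\d+)")
    # Rows that do not match the pattern end up with a missing patient_id.
    result["patient_id"] = parts[0] + parts[1]
    return result


demo = pd.DataFrame({"gsc_pog_id": ["POG 001-GIC", None]})
with_ids = add_patient_id(demo)
```
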
vardb.metadata_wrangling.oasis.preprocess.clean_data(clinical_dataframe)

Clean the OASIS data

Parameters:clinical_dataframe – Clinical dataframe
Returns:(Clinical dataframe, Error reporting dataframe)
vardb.metadata_wrangling.oasis.preprocess.compare_death_dates_to_all_dates(row, date_columns)

Compare death date to the rest of the dates to validate death_date > all other dates

Parameters:
  • row – Each row of the Error Reporting dataframe
  • date_columns – List of date columns from the Error Reporting Dataframe
Returns:

Error code string for that row

vardb.metadata_wrangling.oasis.preprocess.compare_tumour_group_to_pog_tumour_groups(row, pog_tumour_group_columns)

Compare the tumour group in the gsc_pog_id column to pog_tumour_groups from the treatment data

Parameters:
  • row – Each row of the Error reporting dataframe
  • pog_tumour_group_columns – List of pog_tumour_group columns from the treatment data
Returns:

Error reporting dataframe with the error reported

vardb.metadata_wrangling.oasis.preprocess.drop_duplicate_pogs_with_same_biopsy_dates(dataframe)

Drop multiple POG biopsies with the same biopsy date. Drop all rows.

Parameters:dataframe
Returns:
vardb.metadata_wrangling.oasis.preprocess.drop_duplicate_rows(dataframe)

Drop duplicate rows

Parameters:dataframe – Input dataframe
Returns:Dataframe without duplicate rows
vardb.metadata_wrangling.oasis.preprocess.drop_empty_pog_id_rows(dataframe)

Drop empty rows with only GSC POG ID

Parameters:dataframe – Input dataframe
Returns:Dataframe with the dropped rows
vardb.metadata_wrangling.oasis.preprocess.drop_missing_pog_id_rows(dataframe)

Drop the rows with missing GSC POG IDs

Parameters:dataframe – Input dataframe
Returns:New dataframe excluding the erroneous rows
vardb.metadata_wrangling.oasis.preprocess.identify_empty_pog_ids(row)

Iterate over each row of the dataframe to identify empty rows with only GSC POG IDs to label them as ‘Empty’

Parameters:row – Each row of the Clinical dataframe
Returns:Error code string for that row
vardb.metadata_wrangling.oasis.preprocess.identify_missing_pog_ids(row)

Iterate over each row of the Clinical dataframe to identify missing gsc_pog_ids

Parameters:row – Each row of the Clinical dataframe
Returns:Error code string for that row
vardb.metadata_wrangling.oasis.preprocess.read_data(oasis_file)

Read the input OASIS Excel file

Parameters:oasis_file – Input OASIS file
Returns:Clinical Dataframe
vardb.metadata_wrangling.oasis.preprocess.split_gsc_pog_id_and_drop_it(dataframe)

Split the gsc_pog_id column into tumour_group, patient_id and pediatric_id. Drop the gsc_pog_id after that. Move the new columns as the first three columns

Parameters:dataframe – Input dataframe
Returns:Dataframe with the new columns and without gsc_pog_id column
vardb.metadata_wrangling.oasis.preprocess.split_strip_and_join_column_data(column_cell_data)

Split the comma-separated strings. Strip out the whitespace. Join them together as a comma-separated string

Parameters:column_cell_data – Data from the column cell
Returns:Processed column cell data as a comma-separated string with empty strings filtered out, e.g. ‘a’,,’ b’ –> ‘a’,’b’, or NaN for empty cells
vardb.metadata_wrangling.oasis.preprocess.strip_1_from_date(dataframe)

Strip off a trailing ‘ 1’ from the dates

Parameters:dataframe – Clinical dataframe
Returns:Date formatted Clinical dataframe
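
A sketch of removing the trailing ‘ 1’ with a pandas string replace. Unlike the documented function, this hypothetical helper takes the list of date columns explicitly, and it assumes the dates are stored as strings.

```python
import pandas as pd


def strip_trailing_one(dataframe, date_columns):
    """Remove a trailing ' 1' (space followed by 1) from string dates."""
    result = dataframe.copy()
    for column in date_columns:
        # The pattern is anchored to the end, so a date ending in '11' is untouched.
        result[column] = result[column].str.replace(r" 1$", "", regex=True)
    return result


demo = pd.DataFrame({"biopsy_date": ["2016-05-01 1", "2016-05-11"]})
cleaned = strip_trailing_one(demo, ["biopsy_date"])
```
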
vardb.metadata_wrangling.oasis.preprocess.strip_whitespace_and_consecutive_commas_from_column_data(dataframe)

Strips whitespace and consecutive commas from the column data, e.g. ‘a’,,’ b’ –> ‘a’,’b’

Parameters:dataframe – Clinical Dataframe
Returns:Clinical Dataframe with whitespace stripped from the data and consecutive commas (,,) eliminated
vardb.metadata_wrangling.oasis.preprocess.strip_whitespace_from_beginning_and_end(dataframe)

Strips the whitespace from the data for all the columns

Parameters:dataframe – Input dataframe
Returns:Dataframe sans whitespaces from the beginning and end
vardb.metadata_wrangling.oasis.preprocess.uniform_date_format(dataframe, date_columns)

Format all the dates to a uniform pattern YYYY-MM-DD

Parameters:
  • dataframe – Clinical dataframe
  • date_columns – List of dataframe columns that store dates
Returns:

Date formatted Clinical dataframe
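
A sketch of normalising date columns to YYYY-MM-DD with pandas. The input format in the demo is made up, since the formats found in the Oasis export are not documented here. Turning unparseable values into missing values is an assumption, as is the helper name.

```python
import pandas as pd


def to_iso_dates(dataframe, date_columns):
    """Rewrite each date column as YYYY-MM-DD strings."""
    result = dataframe.copy()
    for column in date_columns:
        # errors="coerce" turns values that cannot be parsed into NaT.
        parsed = pd.to_datetime(result[column], errors="coerce")
        result[column] = parsed.dt.strftime("%Y-%m-%d")
    return result


demo = pd.DataFrame({"biopsy_date": ["2016/05/01", "2015/12/31", None]})
formatted = to_iso_dates(demo, ["biopsy_date"])
```
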

vardb.metadata_wrangling.oasis.preprocess.validate_data(clinical_dataframe, error_reporting_dataframe)

Validate the Clinical dataframe and report errors on the Error Reporting dataframe

Parameters:
  • clinical_dataframe – Clinical dataframe
  • error_reporting_dataframe – Error Reporting dataframe
Returns:

Updated Clinical dataframe and errors reported on the Clinical dataframe

vardb.metadata_wrangling.oasis.preprocess.validate_death_date(dataframe)

Validate death_date > all other date columns (except pog_report_date as that can be reported anytime)

Parameters:dataframe – Error Reporting dataframe
Returns:Updated Error Reporting dataframe with the appropriate error code
vardb.metadata_wrangling.oasis.preprocess.validate_duplicate_pogs_with_same_biopsy_dates(dataframe)

Identify multiple POG biopsies with the same biopsy date. Flag all rows with the error code.

Parameters:dataframe – Error dataframe
Returns:Error Reporting Dataframe updated with multiple POG biopsies with the same biopsy dates identified
vardb.metadata_wrangling.oasis.preprocess.validate_duplicate_rows(dataframe)

Identify and iterate over each row of the duplicate dataframe and label them as ‘Duplicate’

Parameters:dataframe – Error Reporting Dataframe
Returns:Error Reporting Dataframe updated with duplicate records identified
vardb.metadata_wrangling.oasis.preprocess.validate_empty_pog_ids(error_reporting_dataframe)

Identify the empty POG ID rows in the Clinical dataframe. Report them in the Error Reporting dataframe. Drop them from the Clinical dataframe

Parameters:error_reporting_dataframe – Error Reporting dataframe
Returns:Updated Error Reporting dataframe
vardb.metadata_wrangling.oasis.preprocess.validate_missing_pog_ids(error_reporting_dataframe)

Identify the missing POG ID rows in the Clinical dataframe. Report them in the Error Reporting dataframe. Drop them from the Clinical dataframe

Parameters:error_reporting_dataframe – Error Reporting dataframe
Returns:Updated Error Reporting dataframe
vardb.metadata_wrangling.oasis.preprocess.validate_tumour_groups(dataframe)

Validate same tumour groups in the gsc_pog_id column and treatment columns

Parameters:dataframe – Error reporting Dataframe
Returns:Error reporting dataframe with the error code reported

vardb.metadata_wrangling.oasis.radiation module

vardb.metadata_wrangling.oasis.radiation.extract_data(dataframe)

Extract radiation data from the clinical dataframe

Parameters:dataframe – Clinical dataframe
Returns:Radiation dataframe
vardb.metadata_wrangling.oasis.radiation.get_radiation_data(clinical_dataframe)

Work with Radiation data

Parameters:clinical_dataframe – The original clinical dataframe
Returns:Validated Radiation dataframe
vardb.metadata_wrangling.oasis.radiation.reshape_radiation_data(dataframe)

Reshapes the Radiation dataframe from wide to long format using the pandas wide_to_long function

Parameters:dataframe – Radiation Dataframe
Returns:Reshaped Radiation Dataframe

vardb.metadata_wrangling.oasis.treatment module

vardb.metadata_wrangling.oasis.treatment.count_non_null_treatment_type_for_treatment_groups(pog_informed_group)

For each treatment group (based on patient_id) count the number of non-NULL treatment_type entries

Parameters:pog_informed_group – pog_informed group for Treatment groups (based on patient_id)
Returns:Count of non NULL treatment_type entries
vardb.metadata_wrangling.oasis.treatment.count_pog_informed_for_treatment_groups(pog_informed_group)

For each treatment group (based on patient_id) count the number of pog_informed entries

Parameters:pog_informed_group – pog_informed group for Treatment groups (based on patient_id)
Returns:Count of pog_informed entries
vardb.metadata_wrangling.oasis.treatment.extract_data(dataframe)

Extract drug treatment data from the clinical dataframe

Parameters:dataframe – Clinical dataframe
Returns:Drug Treatment dataframe
vardb.metadata_wrangling.oasis.treatment.get_drug_treatment_data(clinical_dataframe)

Work with Drug Treatment data

Parameters:clinical_dataframe – The original clinical dataframe
Returns:Validated Drug Treatment dataframe
vardb.metadata_wrangling.oasis.treatment.reshape_treatment_data(dataframe)

Reshapes the Drug Treatment dataframe from wide to long format using the pandas wide_to_long function

Parameters:dataframe – Drug Treatment Dataframe
Returns:Reshaped Drug Treatment Dataframe
vardb.metadata_wrangling.oasis.treatment.validate_best_response_should_not_be_null(row)

Validate best_response should not be null for pog_informed (Y) entries

Parameters:row – Each row of the Drug Treatment dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.treatment.validate_data(drug_treatment_dataframe, demographics_dataframe)

All validations pertaining to the Drug Treatment dataframe

Parameters:
  • drug_treatment_dataframe – Drug Treatment dataframe
  • demographics_dataframe – Demographics dataframe supplied for cross validation with treatment table
Returns:

Validated Drug Treatment dataframe

vardb.metadata_wrangling.oasis.treatment.validate_demographics_post_pog_activity_categories(pog_informed_y_dataframe, demographics_dataframe)

Validate that when pog_informed = ‘Y’ for at least one treatment, the Demographics post_pog_activities value is one of ‘POG informed out of province’, ‘ST/CT therapy at BCCA’, ‘POG informed compassionate access therapy’ or ‘POG informed private pay’

Parameters:
  • pog_informed_y_dataframe – Filtered Drug treatment dataframe with pog_informed = ‘Y’
  • demographics_dataframe – Demographics dataframe
Returns:

Demographics dataframe with the error code reported

vardb.metadata_wrangling.oasis.treatment.validate_demographics_with_treatment(demographics_dataframe, drug_treatment_dataframe)

Validation performed on a dataframe obtained by merging Demographics with Drug Treatment dataframe and reporting error_codes on the Demographics dataframe

Parameters:
  • demographics_dataframe – Demographics dataframe
  • drug_treatment_dataframe – Drug Treatment dataframe
Returns:

Demographics dataframe with the error codes reported

vardb.metadata_wrangling.oasis.treatment.validate_drug_treatment_for_bcca_treatment_type(pog_informed_y_dataframe, demographics_dataframe)

Validate Drug Treatment data for bcca treatment type

Parameters:
  • pog_informed_y_dataframe – Filtered Drug treatment dataframe with pog_informed = ‘Y’
  • demographics_dataframe – Demographics dataframe
Returns:

Demographics dataframe with the error code reported

vardb.metadata_wrangling.oasis.treatment.validate_for_bcca_treatment_type_should_not_be_null(row)

Iterate over each row of the merged dataframe to validate when demographics.post_pog_activities is ‘ST/CT therapy at BCCA’ then for at least one entry where drug_treatment.pog_informed = ‘Y’, drug_treatment.treatment_type should not be null

Parameters:row – Each row of the merged Demographics dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.treatment.validate_for_post_pog_activities_bcca_province_compassionate_private(row)

Iterate over each row of the dataframe to validate whether Post POG activities is either ‘POG informed out of province’ or ‘ST/CT therapy at BCCA’

Parameters:row – Each row of the merged Demographics dataframe
Returns:The error string for that row
vardb.metadata_wrangling.oasis.treatment.validate_mandatory_columns(row)

Iterate over each row of the dataframe to validate the mandatory columns in the drug treatment data: tumour_group, pog_tumour_group, course_begin_on, course_end_on, drug_list, intent, treatment_time, pog_informed

Parameters:row – Each row of the drug treatment dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.treatment.validate_progression_documentation_is_not_null(row)

Iterate over each row of the dataframe to validate that when progression_on is present, progression_documentation is also present

Parameters:row – Each row of the drug treatment dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.treatment.validate_treatment_data(drug_treatment_dataframe)

Validate Drug Treatment data

Parameters:drug_treatment_dataframe – Drug Treatment dataframe
Returns:Validated Drug Treatment dataframe
vardb.metadata_wrangling.oasis.treatment.validate_treatment_time_either_pre_or_post_pog_report(row)

Iterate over each row of the dataframe to validate when course_begin_on date is <= demographics.pog_report_date then treatment_time should be ‘Pre POG Report’; otherwise, treatment_time should be ‘Post POG Report’

Parameters:row – Each row of the drug treatment dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.treatment.validate_treatment_time_for_pre_or_post_pog_report(drug_treatment_dataframe, demographics_dataframe)

Validate if course_begin_on date is <= demographics.pog_report_date, then treatment_time should be ‘Pre POG Report’ otherwise, treatment_time should be ‘Post POG Report’.

Parameters:
  • drug_treatment_dataframe – Drug treatment dataframe
  • demographics_dataframe – Demographics dataframe
Returns:

Drug treatment dataframe with the error code reported
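
The cross-table rule above can be sketched by merging the two tables on patient_id and applying a row-wise check. The merge key, the error code value and the helper name are assumptions; the two treatment_time labels come from the description.

```python
import pandas as pd

PRE = "Pre POG Report"
POST = "Post POG Report"
# Hypothetical error code, used only for this illustration.
TREATMENT_TIME_MISMATCH = "TREATMENT_TIME_MISMATCH"


def check_treatment_time(row: pd.Series) -> str:
    """Return an error code if treatment_time disagrees with the two dates."""
    begin = pd.to_datetime(row["course_begin_on"], errors="coerce")
    report = pd.to_datetime(row["pog_report_date"], errors="coerce")
    # Without both dates the expected label cannot be determined.
    if pd.isna(begin) or pd.isna(report):
        return ""
    expected = PRE if begin <= report else POST
    return "" if row["treatment_time"] == expected else TREATMENT_TIME_MISMATCH


treatment = pd.DataFrame(
    {
        "patient_id": ["POG001", "POG001", "POG002"],
        "course_begin_on": ["2016-01-01", "2016-06-01", "2016-06-01"],
        "treatment_time": [PRE, PRE, POST],
    }
)
demographics = pd.DataFrame(
    {
        "patient_id": ["POG001", "POG002"],
        "pog_report_date": ["2016-03-01", "2016-03-01"],
    }
)
# A left merge keeps every treatment row and attaches the patient's report date.
merged = treatment.merge(demographics, on="patient_id", how="left")
merged["error_code"] = merged.apply(check_treatment_time, axis=1)
```
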

vardb.metadata_wrangling.oasis.treatment.validate_treatment_time_is_post_pog_report(row)

Iterate over each row of the dataframe to validate that when pog_informed = ‘Y’, treatment_time = ‘Post POG Report’

Parameters:row – Each row of the drug treatment dataframe
Returns:The error code string for that row
vardb.metadata_wrangling.oasis.treatment.validate_treatment_with_demographics(drug_treatment_dataframe, demographics_dataframe)

Validation performed on a dataframe obtained by merging Drug Treatment with Demographics dataframe and reporting error_codes on the Drug Treatment dataframe

Parameters:
  • drug_treatment_dataframe – Drug Treatment dataframe
  • demographics_dataframe – Demographics dataframe
Returns:

Drug Treatment dataframe with the error codes reported