Machine and deep learning libraries and industry APIs on a highly elastic and scalable platform bring transformational change in cognitive computing
Analytics framework overview
Supervised and unsupervised machine learning techniques were tested and applied for master record match and merge. Several phonetic, edit, token and domain-dependent measurement techniques were also applied, but machine learning worked best in prediction.
Machine learning outperformed the other techniques in efficiency and accuracy when matching and merging records, based on the true positive and false positive test results. Records were matched on the distance between characters and words for transposition, substitution, weight or length, using Levenshtein distance, Jaro distance, the Sorensen index, the cosine function and Dice's coefficient, with weight- or domain-based measures for more general-purpose matching. The Python library sklearn was used for prediction, with support vector machines in different dimensions and logistic regression. The algorithms were tested for accuracy based on the true positive and false positive ratio.
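The distance measures and the true positive / false positive check described above can be illustrated with a minimal pure-Python sketch. It is not the original implementation: the function names, the use of character bigrams for Dice's coefficient, the threshold matcher and the sample names are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(s: str) -> set:
    """Set of lower-cased character bigrams (an assumed tokenisation)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_coefficient(a: str, b: str) -> float:
    """Sorensen-Dice coefficient on character bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def match_rates(pairs, labels, threshold):
    """True positive rate and false positive rate of a Dice threshold matcher."""
    tp = fp = fn = tn = 0
    for (a, b), is_match in zip(pairs, labels):
        predicted = dice_coefficient(a, b) >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted and not is_match:
            fp += 1
        elif is_match:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical record pairs with hand-assigned match labels.
pairs = [("Jon Smith", "John Smith"), ("Jon Smith", "Mary Jones")]
labels = [True, False]
print(levenshtein("Jon Smith", "John Smith"))
print(match_rates(pairs, labels, threshold=0.5))
```

In a learned matcher, scores such as these would typically serve as input features to a classifier (for example logistic regression or a support vector machine) rather than being compared against a fixed threshold.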
A cohort study was conducted on patients with tumours, and biopsies were performed to predict invasive breast cancer risk. The data collected was used to analyse and predict the mortality rate, relapse rate or incurability. The analysis, covering BARC1, HER2 and chemotherapy, was first done on Asian American patients, and the model was later applied to other races.
Exclusion and inclusion criteria were incorporated during the data preparation stage, and prediction and classification were done using various algorithms. Patients were classified based on demographics, hyper and hypo stroma, and tissue grade from the test results. A supervised model was trained with a probabilistic method such as Naïve Bayes to predict the outcome and categorize the patients. Images were handled using Apache Hipi, and KNN models were used to classify the tumour cells. Various algorithms, such as watermark and edge detection, were also applied. Decision trees were derived using RPART, C4.5 and J48. Remedial therapies were developed and designed for patients identified as triple negative, false positive or true negative.
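The KNN classification step can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the two numeric features, the class labels and the sample values are hypothetical, and a real workflow would first extract features from the images.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature measurements per cell (for example size and texture).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_labels = ["benign", "benign", "benign",
                "malignant", "malignant", "malignant"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.0, 4.9)))
```

The choice of k and of the distance metric are tuning decisions. An odd k avoids tied votes when there are two classes.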
Biological, drug and genomics data were graphically integrated and analysed using reverse regression, ANN, KNN, probabilistic inference, HMM, Bayes nets, LDS and Kalman filter models to extrapolate relationships and to mine and predict a possible epidemic in a population. Relationship extraction and prediction were implemented using graph and probabilistic approaches.
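As an illustration of the Kalman filter component, a minimal one-dimensional sketch is shown below. It assumes a random-walk state model with scalar noise variances. The function name, parameters and sample readings are hypothetical and are not taken from the original work.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Filter noisy scalar measurements under an assumed random-walk model."""
    estimate, variance = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but its uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        estimates.append(estimate)
    return estimates


# Hypothetical noisy readings of a quantity whose true level is near 5.
readings = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]
smoothed = kalman_1d(readings, process_var=1e-3, measurement_var=0.1)
print(smoothed[-1])
```

A larger `process_var` makes the filter follow new measurements more closely, while a larger `measurement_var` makes it trust its running estimate more.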