Random Forest

Explored by Melissa Keller

At first, I fed just the 10 questions into the Random Forest Classifier and got a Training Score of 1.0 and a Testing Score of 0.978. While those are really good results, I wanted to see what would happen with different inputs. Adding all the demographic information brought the Testing Score down to 0.964. Just for fun, I also tried using whether a person has autism to predict whether they have a family history of it, which gave a Testing Score of 0.842.
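
For readers who want to see what computing these two scores looks like, here is a minimal sketch using scikit-learn. It is not the code behind the numbers above: the data is synthetic, and the 0/1 answer columns, the labeling rule, and the parameter values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the questionnaire (NOT the real dataset):
# 700 hypothetical respondents, ten yes/no answers coded as 0/1.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(700, 10))

# Assumed labeling rule, for illustration only:
# a respondent is labeled positive when more than 6 answers are 1.
y = (X.sum(axis=1) > 6).astype(int)

# Hold out part of the data so the testing score reflects unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# Fixing random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# score() returns mean accuracy on the given rows.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Training Score: {train_score:.3f}")
print(f"Testing Score:  {test_score:.3f}")
```

A training score at or near 1.0 is common for random forests, since the trees are grown until they fit the training rows, so the testing score is the more informative of the two.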

I noticed that, while the ranking varied each time I ran the model, A9 was usually the question that ranked highest in importance. I decided to compare people who gave the ASD-associated answer to A9 and were found to have ASD with those who gave the same answer but didn't have ASD. A Z-test for proportions gave a p-value of about 5.98e-59. Since the p-value is so small, we reject the null hypothesis that the two proportions are equal, meaning that someone who gives the ASD-associated response is more likely to have ASD.
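
The two steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original analysis: the data is synthetic and built so that the ninth column matters most, the counts passed to the test are made up, and `two_proportion_ztest` is a hypothetical helper that implements the standard pooled two-proportion Z-test by hand.

```python
import math
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Part 1: rank questions by importance (synthetic data, not the real dataset).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10))
# Assumed rule: the label follows the ninth answer, flipped 10% of the time.
flip = rng.random(500) < 0.1
y = np.where(flip, 1 - X[:, 8], X[:, 8])

names = [f"A{i}" for i in range(1, 11)]
# Without a fixed random_state the ranking can shift from run to run.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(
    zip(names, model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
for name, importance in ranking[:3]:
    print(f"{name}: {importance:.3f}")


# Part 2: pooled two-proportion Z-test (hypothetical helper).
def two_proportion_ztest(count_a, n_a, count_b, n_b):
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration only:
# 160 of 190 people with ASD gave the ASD-associated answer,
# versus 120 of 510 people without ASD.
z, p_value = two_proportion_ztest(count_a=160, n_a=190, count_b=120, n_b=510)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

A p-value below the chosen significance level (commonly 0.05) leads to rejecting the hypothesis that the two groups give the ASD-associated answer at the same rate.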

Click below to view the code.