Description
In this report, I focus on the former algorithm: acceptance vs. denial prediction. To predict whether a case is accepted or denied, we used natural language processing (NLP) techniques to convert each litigated patent document into thousands of numeric features. After combining these text-based features with patent metadata, we applied two primary machine learning algorithms, support vector classification and random forests, to classify the documents by their acceptance/denial outcome. I describe both the work required to wrangle the data and the hyperparameters we tuned for these two algorithms. The resulting classifiers achieved accuracy slightly better than the base rate set by the skew in the data (the accuracy of always predicting the more common outcome), although there is room for further improvement. As the post-grant review process matures, there will be further opportunity to gather more case data, refine the tools we have built over the past year, and increase the confidence associated with post-grant review analytics.
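To make the base-rate comparison concrete, here is a minimal sketch of how a classifier's accuracy can be measured against the majority-class baseline. The labels and predictions are hypothetical toy values, not data or results from this project, and the function names are illustrative assumptions rather than part of our tooling.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (assumption): 1 = case accepted, 0 = case denied.
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
# Hypothetical classifier output, for illustration only.
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

baseline = majority_baseline_accuracy(y_true)
model_acc = accuracy(y_true, y_pred)
lift = model_acc - baseline  # improvement over always guessing the majority class

print(f"baseline={baseline:.2f} model={model_acc:.2f} lift={lift:+.2f}")
```

On skewed data, raw accuracy alone can look strong even for a trivial classifier, which is why the difference from the baseline is the more informative number.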