ASReview Model Selection Guide

Tags: ASReview · Machine Learning · Systematic Reviews

Earlier this year, I wrote a guest blog for the ASReview website on how to choose models within ASReview.
Since model selection is one of the most common questions new users have, I wanted to make sure it’s referenced here as well.

Highlights from the blog

  • ASReview as a Swiss Army knife — it supports multiple feature extractors and classifiers, giving researchers freedom to tailor the pipeline.
  • Feature extractors:
    • TF-IDF → lightweight, fast, and interpretable, but ignores context.
    • Doc2Vec → learns semantics from scratch, more context-aware, but computationally heavy.
    • SBERT → transformer-based, multilingual, powerful on semantics, but memory-intensive.
  • Classifiers:
    • Naive Bayes → very fast, pairs well with TF-IDF.
    • Random Forest → robust ensemble, balances accuracy and stability.
    • SVM → effective in high dimensions, but slower on large datasets.
    • Logistic Regression → solid statistical baseline, efficient.
    • Neural Networks → powerful, but require more data and compute.
  • Processing times vary: TF-IDF features take seconds to compute, Doc2Vec minutes, SBERT hours. The classifiers themselves usually retrain in under 10 seconds per screening cycle.
  • Best model? → There isn’t one. The “best” choice depends on your dataset and research question. ASReview offers verified combinations (like TF-IDF + Naive Bayes, or SBERT + XGBoost) as starting points.
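To make the TF-IDF + Naive Bayes starting point concrete, here is a minimal sketch of that combination using scikit-learn. This is an illustration of the general technique, not ASReview's internal code, and the toy abstracts and labels are made up for the example:

```python
# Illustrative sketch (not ASReview internals): TF-IDF features
# feeding a fast Naive Bayes classifier, as in the verified
# TF-IDF + Naive Bayes starting combination.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = relevant, 0 = irrelevant
abstracts = [
    "systematic review of screening tools for depression",
    "machine learning for active learning in text classification",
    "a cookbook of pasta recipes from northern Italy",
    "travel guide to hiking trails in the Alps",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each abstract into a sparse weighted word-count
# vector; Multinomial Naive Bayes trains on those vectors in
# well under a second, which is why this pairing is so fast.
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(abstracts, labels)

# Rank an unseen record by its predicted probability of relevance,
# the core of prioritised screening.
unseen = ["active learning to screen abstracts in reviews"]
prob_relevant = pipeline.predict_proba(unseen)[0, 1]
print(prob_relevant)
```

In a real screening loop, the classifier is refit after each newly labelled record and the remaining records are re-ranked by `prob_relevant`, which is what makes lightweight models attractive for interactive use.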

👉 You can read the full post on the ASReview Blog.