Presentation at IC2S2 in Copenhagen, July 2023
This tutorial builds on a paper recently published in Sociological Methods and Research, which demonstrated how an expert can train an efficient automatic classifier in a limited amount of time. Using the Augmented Python package, we organized the tutorial in two parts.
- The first one made a case for the wide use of sequential transfer learning in the human and social sciences.
- Reviewing the existing literature on the topic, both classic and recent, we showed the promise of these recently developed approaches. Not only can a social scientist train an algorithm that correctly annotates hundreds of thousands of texts in a limited amount of time, but the algorithm often does so better than a human annotator (who can get tired, bored, or inattentive).
- Click here to see the slides (1. General Presentation).
- How do these results compare to those of models like ChatGPT and other zero- or few-shot learning approaches? We ran an experiment to find out.
- We then demonstrated how to use a BERT model on text data in a first hands-on session with the Augmented Python package. To make it easy to follow, we used a Google Colab notebook.
- The second part of the tutorial discussed practical questions that arise while carrying out annotation: What should one annotate (sentences, paragraphs, articles)? How does one create a well-defined indicator? When and how should one use active learning? How does one tune a model? What are the classic mistakes, and how can they be avoided? We concluded with a few words on when not to use transfer learning, briefly covering the downsides of this approach. Finally, we demonstrated, on an extended example, how to use this method for a personal project.
- Click here to see the slides (2. Some ML Notions).
- Click here to access Lab 2 on Google Colab Notebook.
- Click here to see the slides (3. Tips & Tricks).
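
To give a flavor of the active learning question raised in the second part, here is a minimal sketch of one common strategy, uncertainty sampling: the texts whose predicted class probabilities are closest to uniform are sent to the annotator first. This is an illustration only, not the tutorial's own code or the package's API. The function names, the example sentences, and the probabilities are all hypothetical; in practice the probabilities would come from the classifier trained so far.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(texts, predicted_probs, k=2):
    """Return the k texts with the most uncertain predictions (highest entropy)."""
    ranked = sorted(
        zip(texts, predicted_probs),
        key=lambda pair: entropy(pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical unlabeled sentences (assumption: binary classification task).
unlabeled = [
    "The minister announced a new budget.",
    "Critics say the reform may or may not help.",
    "The team won the championship.",
    "Observers remain divided on the outcome.",
]

# Made-up class probabilities standing in for a model's output.
probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
]

to_annotate = select_for_annotation(unlabeled, probs, k=2)
for text in to_annotate:
    print(text)
```

Annotating these uncertain cases and retraining tends to improve the model faster than annotating texts at random, although random samples are still needed to evaluate it without bias.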