The NLP & Social Sciences Seminar meets regularly throughout the semester. The sessions where we welcome external speakers are open to the public (via Zoom; for a link, please send an email to felix.lennert[at]ensae.fr). Here is the program for the upcoming semester:
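25 Oct 2023, 5.15PM (CET) – Pierre-Carl Langlais (Head of Research, OpSic): “De l’opérationnalisation au fine-tuning : créer des LLM pour l’analyse de corpus en sciences sociales” (“From operationalization to fine-tuning: building LLMs for corpus analysis in the social sciences”; talk in French)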
Over the past few months, ChatGPT has faced competition from a new generation of open LLMs. Llama, Mistral, Falcon: these more compact models can be adapted to a wide variety of tasks, provided they are trained beforehand. This talk describes early experimental attempts at fine-tuning for the annotation of large corpora in the social sciences and humanities: literary texts, social media posts, or exchanges with public services. The wider context window (up to 3,000 words for Llama) and the growing sophistication of LLMs now make it possible to operationalize complex analytical categories (sarcasm, conspiracism, intertextuality, diegetic time). Alongside these first results, we will also discuss the methodological issues raised by training LLMs today, including the increasingly frequent use of synthetic data.
08 Nov 2023, 5.15PM (CET) – Yiwei Luo (Stanford University): “Othering and low prestige framing of immigrant cuisines in US restaurant reviews and large language models” (with Kristina Gligorić and Dan Jurafsky)
Identifying and understanding implicit attitudes toward food can help efforts to mitigate social prejudice due to food’s pervasive role as a marker of cultural and ethnic identity. Stereotypes about food are a form of microaggression that contribute to harmful public discourse that may in turn perpetuate prejudice toward ethnic groups and negatively impact economic outcomes for restaurants. Through careful linguistic analyses, we evaluate social theories about attitudes toward immigrant cuisine in a large-scale study of framing differences in 2.1M English language Yelp reviews of restaurants in 14 US states. Controlling for factors such as restaurant price and neighborhood racial diversity, we find that immigrant cuisines are more likely to be framed in objectifying and othering terms of authenticity (e.g., authentic, traditional), exoticism (e.g., exotic, different), and prototypicality (e.g., typical, usual), but that non-Western immigrant cuisines (e.g., Indian, Mexican) receive more othering than European cuisines (e.g., French, Italian). We further find that non-Western immigrant cuisines are framed less positively and as lower status, being evaluated in terms of affordability and hygiene. Finally, we show that reviews generated by large language models (LLMs) reproduce many of the same framing tendencies. Our results empirically corroborate social theories of taste and gastronomic stereotyping, and reveal linguistic processes by which such attitudes are reified.
22 Nov 2023, 5.15PM (CET) – Isabelle Augenstein (University of Copenhagen): “Transparent Cross-Domain Stance Detection”
Description: Understanding attitudes expressed in text is an important task for content moderation, market research, and the detection of false information online. Stance detection has been framed in many different ways, e.g. targets can be explicit or implicit, and contexts can range from short tweets to entire articles. Moreover, datasets differ by domain, use varying label inventories and annotation protocols, and cover different languages. This requires novel methods that can bridge domains as well as languages. In addition, for content moderation, it can be useful to have a model that provides a reason for the stance it predicts.
In this talk, I will present our research on cross-domain as well as cross-lingual stance detection, as well as on methods for creating transparent predictions by additionally providing explanations.
06 Dec 2023, 5.15PM (CET) – Antonin Descampe & Louis Escouflaire (UC Louvain): “Analyzing Subjectivity in Journalism: A Multidisciplinary Discourse Analysis Using Linguistics, Machine Learning, and Human Evaluation”
Description: We present the results of three experiments on subjectivity detection in French press articles. Our research lies at the crossroads of journalism studies and linguistics and aims to uncover the mechanisms of objective writing in journalistic discourse. First, we evaluated a range of linguistic features for a text classification task of news articles and opinion pieces. Then, we fine-tuned a transformer model (CamemBERT) on the same task and compared it with the feature-based model in terms of accuracy, computational cost and explainability. We used model explanation methods to extract linguistic patterns from the transformer model in order to build a more accurate and more transparent hybrid classification model. Finally, we conducted an annotation experiment in which 36 participants were tasked with highlighting “subjective elements” in 150 press articles. This allowed us to compare human-based and machine-derived insights on subjectivity, and to confront these results with journalistic guidelines on objective writing.
Feel free to reach out to us.