Researcher, MediaFutures: Research Centre for Responsible Media Technology & Innovation
- Phone: +47 55 58 41 31
- Visitor Address: Fosswinckels gate 6, Lauritz Meltzers hus, 5007 Bergen, Room 516
- Postal Address: Postboks 7802, 5020 Bergen
Samia Touileb is currently a researcher in MediaFutures WP5 on Norwegian Language Technologies. Prior to this she was a postdoc at the Language Technology Group (LTG), Department of Informatics, University of Oslo. She holds a PhD in Natural Language Processing (NLP) from the University of Bergen and has worked on NLP research and applications for almost a decade.
Her main research interests are information extraction, sentiment analysis, bias and fairness in NLP, and the application of NLP and machine learning methods to social science research. She works mainly on under- and mid-resourced languages such as Norwegian.
- (2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling. NIKT: Norsk IKT-konferanse for forskning og utdanning.
- (2014). Inducing Information Structures for Data-driven Text Analysis. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings.
- (2014). Applying grammar induction to text mining. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings. 712-717.
- (2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.
- (2012). Networks of texts and people.
- (2018). Operationalising Diversity for Big Data Policy Research.
- (2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.
- (2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.
- (2013). Inducing local grammars from n-grams.
Academic anthology/Conference proceedings
- (2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop. Association for Computational Linguistics.
- (2017). Automatically Inducing Information Structures. A Text Mining Approach Based on the Distributional Hypothesis.
Academic chapter/article/Conference paper
- (2023). Measuring normative and descriptive biases in language models using census data.
- (2022). Occupational Biases in Norwegian and Multilingual Language Models. 12 pages.
- (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. 10 pages.
- (2022). NERDz: A Preliminary Dataset of Named Entities for Algerian. 7 pages.
- (2022). Measuring Harmful Representations in Scandinavian Language Models. 8 pages.
- (2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes. 7 pages.
- (2022). EventGraph: Event Extraction as Semantic Graph Parsing. 9 pages.
- (2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction. 6 pages.
- (2022). Annotating Norwegian language varieties on Twitter for Part-of-speech. 6 pages.
- (2021). Using Gender- and Polarity-Informed Models to Investigate Bias. 9 pages.
- (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus. 13 pages.
- (2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use. 7 pages.
- (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach. 16 pages.
- (2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier. 7 pages.
- (2020). Identifying Sentiments in Algerian Code-switched User-generated Comments. 8 pages.
- (2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews. 14 pages.
- (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. 8 pages.
- (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach. 12 pages.
- (2018). NoReC: The Norwegian Review Corpus. 6 pages.
- (2018). Automatic identification of unknown names with specific roles. 9 pages.
- (2014). Constructions: a new unit of analysis for corpus-based discourse analysis. 11 pages.
- (2021). Using Gender- and Polarity-informed Models to Investigate Bias.
- (2018). Automatically identifying names of unrecognized politicians.
- (2015). A computational approach to organize and analyze online communication data.
- (2013). Applying Corpus Techniques to Climate Change Blogs.
More information in national current research information system (CRIStin)
Fields of competence