Samia Touileb

Researcher, MediaFutures: Research Centre for Responsible Media Technology & Innovation
  • E-mail: samia.touileb@uib.no
  • Phone: +47 55 58 41 31
  • Visitor Address
    Fosswinckels gate 6
    Lauritz Meltzers hus
    5007 Bergen
  • Postal Address
    Postboks 7802
    5020 Bergen

Samia Touileb is a researcher in MediaFutures Work Package 5 (WP5), Norwegian Language Technologies. Before this, she was a postdoctoral researcher at the Language Technology Group (LTG), Department of Informatics, University of Oslo. She holds a PhD in Natural Language Processing (NLP) from the University of Bergen and has worked on NLP research and applications for almost a decade.

Her main research interests are information extraction, sentiment analysis, bias and fairness in NLP, and the application of NLP and machine learning methods to social science research. She works mainly on under- and mid-resourced languages such as Norwegian.

Academic article
  • (2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling. NIKT: Norsk IKT-konferanse for forskning og utdanning.
  • (2014). Inducing Information Structures for Data-driven Text Analysis. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings.
  • (2014). Applying grammar induction to text mining. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings. 712-717.
  • (2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.
  • (2012). Networks of texts and people.
Academic lecture
  • (2018). Operationalising Diversity for Big Data Policy Research.
  • (2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.
  • (2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.
  • (2013). Inducing local grammars from n-grams.
Academic anthology/Conference proceedings
  • (2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop. Association for Computational Linguistics.
Doctoral dissertation
  • (2017). Automatically Inducing Information Structures. A Text Mining Approach Based on the Distributional Hypothesis.
Academic chapter/article/Conference paper
  • (2023). Measuring normative and descriptive biases in language models using census data.
  • (2022). Occupational Biases in Norwegian and Multilingual Language Models. 12 pages.
  • (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. 10 pages.
  • (2022). NERDz: A Preliminary Dataset of Named Entities for Algerian. 7 pages.
  • (2022). Measuring Harmful Representations in Scandinavian Language Models. 8 pages.
  • (2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes. 7 pages.
  • (2022). EventGraph: Event Extraction as Semantic Graph Parsing. 9 pages.
  • (2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction. 6 pages.
  • (2022). Annotating Norwegian language varieties on Twitter for Part-of-speech. 6 pages.
  • (2021). Using Gender- and Polarity-Informed Models to Investigate Bias. 9 pages.
  • (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus. 13 pages.
  • (2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use. 7 pages.
  • (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach. 16 pages.
  • (2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier. 7 pages.
  • (2020). Identifying Sentiments in Algerian Code-switched User-generated Comments. 8 pages.
  • (2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews. 14 pages.
  • (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. 8 pages.
  • (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach. 12 pages.
  • (2018). NoReC: The Norwegian Review Corpus. 6 pages.
  • (2018). Automatic identification of unknown names with specific roles. 9 pages.
  • (2014). Constructions: a new unit of analysis for corpus-based discourse analysis. 11 pages.
  • (2021). Using Gender- and Polarity-informed Models to Investigate Bias.
  • (2018). Automatically identifying names of unrecognized politicians.
  • (2015). A computational approach to organize and analyze online communication data.
  • (2013). Applying Corpus Techniques to Climate Change Blogs.

More information is available in the national Current Research Information System (CRIStin).