Samia Touileb

Associate Professor, Natural Language Processing

Department of Information Science and Media Studies

E-mail: samia.touileb@uib.no
Visitor Address
Fosswinckels gate 6

Lauritz Meltzers hus

5007 Bergen

Room
516
Postal Address
Postboks 7802

5020 Bergen

Samia Touileb is an Associate Professor in Natural Language Processing (NLP). Before this, she was a researcher in MediaFutures WP5 on Norwegian Language Technologies and a postdoc at the Language Technology Group (LTG), Department of Informatics, University of Oslo. She holds a PhD in NLP from the University of Bergen and has worked on NLP research and applications for almost a decade.

Her main research interests are bias and fairness in NLP, information extraction, summarization, and applications of NLP and machine learning methods to tasks in social science research. She works mainly on under- and mid-resourced languages such as Norwegian.

Academic article

(2023). Learning Horn envelopes via queries from language models. International Journal of Approximate Reasoning. 20 pages.

(2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling. NIKT: Norsk IKT-konferanse for forskning og utdanning.

(2014). Inducing Information Structures for Data-driven Text Analysis. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings.
(2014). Applying grammar induction to text mining. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings. 712-717.

Lecture

(2023). The Societal and Ethical Implications of Language Models.
(2023). The Ethics of Large Language Models.
(2023). Sosiale og etiske utfordringer med språkmodeller som ChatGPT.
(2023). Når kunstig intelligens inntar redaksjonen.
(2023). Demystifying ChatGPT and language models.
(2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
(2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
(2023). ChatGPT & AI in education.
(2023). Big Science Gullgruve eller fallgruve?
(2023). Benchmarking the societal and ethical implications of large language model.

(2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.

(2012). Networks of texts and people.

Popular scientific lecture

(2023). Store språkmodeller: muligheter og utfordringer.
(2023). Sosiale og etiske utfordringer med språkmodeller.
(2023). Hva er ChatGPT og hvordan fungerer det og lignende verktøy?
(2023). Blir vi overflødige? En samtale om kunstig intelligens og utdanning.

Academic lecture

(2023). Large Language models: What are they, and what are their ethical implications?

(2018). Operationalising Diversity for Big Data Policy Research.

(2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.

(2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.

(2013). Inducing local grammars from n-grams.

Academic anthology/Conference proceedings

(2023). Proceedings of the 5th Symposium of the Norwegian AI Society (NAIS 2023). NAIS Norwegian Artificial Intelligence Society.

(2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop. Association for Computational Linguistics.

Feature article

(2023). KI-dyret må mates med varsomhet. M24.
(2023). Chat GPT egner seg dårlig til eksamenssensuren. Morgenbladet.

Doctoral dissertation

(2017). Automatically Inducing Information Structures. A Text Mining Approach Based on the Distributional Hypothesis.

Interview

(2023). Kunstig intelligens: Krever åpenhet og integritet.

Academic chapter/article/Conference paper

(2023). NorBench – A Benchmark for Norwegian Language Models. 16 pages.
(2023). Measuring normative and descriptive biases in language models using census data.
(2023). Making sense of nonsense: Integrated gradient-based input reduction to improve recall for check-worthy claim detection. 13 pages.
(2023). JSEEGraph: Joint Structured Event Extraction as Graph Parsing.
(2023). Identifying Token-Level Dialectal Features in Social Media. 13 pages.
(2023). Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models.
(2023). Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus. 15 pages.

(2022). Occupational Biases in Norwegian and Multilingual Language Models. 12 pages.
(2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. 10 pages.
(2022). NERDz: A Preliminary Dataset of Named Entities for Algerian. 7 pages.
(2022). Measuring Harmful Representations in Scandinavian Language Models. 8 pages.
(2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes. 7 pages.
(2022). EventGraph: Event Extraction as Semantic Graph Parsing. 9 pages.
(2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction. 6 pages.
(2022). Annotating Norwegian language varieties on Twitter for Part-of-speech. 6 pages.

(2021). Using Gender- and Polarity-Informed Models to Investigate Bias. 9 pages.
(2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus. 13 pages.
(2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use. 7 pages.

(2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach. 16 pages.
(2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier. 7 pages.
(2020). Identifying Sentiments in Algerian Code-switched User-generated Comments. 8 pages.
(2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews. 14 pages.

(2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. 8 pages.
(2019). Lexicon information in neural sentiment analysis: a multi-task learning approach. 12 pages.

(2018). NoReC: The Norwegian Review Corpus. 6 pages.
(2018). Automatic identification of unknown names with specific roles. 9 pages.

(2014). Constructions: a new unit of analysis for corpus-based discourse analysis. 11 pages.

Poster

(2021). Using Gender- and Polarity-informed Models to Investigate Bias.

(2018). Automatically identifying names of unrecognized politicians.

(2015). A computational approach to organize and analyze online communication data.

(2013). Applying Corpus Techniques to Climate Change Blogs.

Chapter

(2024). Large Language Models and their usage in EAL education. 139-160. In:
- (2024). Current Issues in English Teaching. Fagbokforlaget.

More information in national current research information system (CRIStin)

OPINION COST action: https://www.cost.eu/actions/CA21129/

MediaFutures: https://mediafutures.no/2021/01/20/postdoc-samia-touileb/

NorDial: https://github.com/jerbarnes/nordial

Fields of competence

Artificial Intelligence

Computational linguistics

Information Science

Natural Language Processing

Research groups

Intelligent Information Systems (I2S)