Samia Touileb
- E-mail: samia.touileb@uib.no
- Visitor Address: Fosswinckels gate 6, Lauritz Meltzers hus, 5007 Bergen, Room 516
- Postal Address: Postboks 7802, 5020 Bergen
Samia Touileb is an Associate Professor in Natural Language Processing (NLP). Previously, she was a researcher in MediaFutures WP5 on Norwegian Language Technologies and a postdoc at the Language Technology Group (LTG), Department of Informatics, University of Oslo. She holds a PhD in NLP from the University of Bergen and has worked on NLP research and applications for almost a decade.
Her main research interests are bias and fairness in NLP, information extraction, summarization, and applications of NLP and machine learning methods to social science research. She works mainly on under- and mid-resourced languages such as Norwegian.
- (2023). Learning Horn envelopes via queries from language models. International Journal of Approximate Reasoning. 20 pages.
- (2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling. NIKT: Norsk IKT-konferanse for forskning og utdanning.
- (2014). Inducing Information Structures for Data-driven Text Analysis. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings.
- (2014). Applying grammar induction to text mining. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings. 712-717.
- (2023). The Societal and Ethical Implications of Language Models.
- (2023). The Ethics of Large Language Models.
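- (2023). Sosiale og etiske utfordringer med språkmodeller som ChatGPT [Social and ethical challenges with language models such as ChatGPT].
- (2023). Når kunstig intelligens inntar redaksjonen [When artificial intelligence enters the newsroom].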
- (2023). Demystifying ChatGPT and language models.
- (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet [ChatGPT: the technology, the dataset, and what we (don't) know].
- (2023). ChatGPT & AI in education.
- (2023). Big Science: Gullgruve eller fallgruve? [Big Science: Goldmine or pitfall?]
- (2023). Benchmarking the societal and ethical implications of large language models.
- (2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.
- (2012). Networks of texts and people.
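- (2023). Store språkmodeller: muligheter og utfordringer [Large language models: opportunities and challenges].
- (2023). Sosiale og etiske utfordringer med språkmodeller [Social and ethical challenges with language models].
- (2023). Hva er ChatGPT og hvordan fungerer det og lignende verktøy? [What is ChatGPT, and how do it and similar tools work?]
- (2023). Blir vi overflødige? En samtale om kunstig intelligens og utdanning [Will we become redundant? A conversation about artificial intelligence and education].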
- (2023). Large Language models: What are they, and what are their ethical implications?
- (2018). Operationalising Diversity for Big Data Policy Research.
- (2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.
- (2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.
- (2013). Inducing local grammars from n-grams.
- (2023). Proceedings of the 5th Symposium of the Norwegian AI Society (NAIS 2023). NAIS Norwegian Artificial Intelligence Society.
- (2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop. Association for Computational Linguistics.
- (2023). KI-dyret må mates med varsomhet [The AI beast must be fed with care]. M24.
- (2023). Chat GPT egner seg dårlig til eksamenssensuren [ChatGPT is poorly suited for exam grading]. Morgenbladet.
- (2017). Automatically Inducing Information Structures. A Text Mining Approach Based on the Distributional Hypothesis.
- (2023). Kunstig intelligens: Krever åpenhet og integritet [Artificial intelligence: Requires openness and integrity].
- (2023). NorBench – A Benchmark for Norwegian Language Models. 16 pages.
- (2023). Measuring normative and descriptive biases in language models using census data.
- (2023). Making sense of nonsense: Integrated gradient-based input reduction to improve recall for check-worthy claim detection. 13 pages.
- (2023). JSEEGraph: Joint Structured Event Extraction as Graph Parsing.
- (2023). Identifying Token-Level Dialectal Features in Social Media. 13 pages.
- (2023). Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models.
- (2023). Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus. 15 pages.
- (2022). Occupational Biases in Norwegian and Multilingual Language Models. 12 pages.
- (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. 10 pages.
- (2022). NERDz: A Preliminary Dataset of Named Entities for Algerian. 7 pages.
- (2022). Measuring Harmful Representations in Scandinavian Language Models. 8 pages.
- (2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes. 7 pages.
- (2022). EventGraph: Event Extraction as Semantic Graph Parsing. 9 pages.
- (2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction. 6 pages.
- (2022). Annotating Norwegian language varieties on Twitter for Part-of-speech. 6 pages.
- (2021). Using Gender- and Polarity-Informed Models to Investigate Bias. 9 pages.
- (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus. 13 pages.
- (2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use. 7 pages.
- (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach. 16 pages.
- (2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier. 7 pages.
- (2020). Identifying Sentiments in Algerian Code-switched User-generated Comments. 8 pages.
- (2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews. 14 pages.
- (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. 8 pages.
- (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach. 12 pages.
- (2018). NoReC: The Norwegian Review Corpus. 6 pages.
- (2018). Automatic identification of unknown names with specific roles. 9 pages.
- (2014). Constructions: a new unit of analysis for corpus-based discourse analysis. 11 pages.
- (2021). Using Gender- and Polarity-informed Models to Investigate Bias.
- (2018). Automatically identifying names of unrecognized politicians.
- (2015). A computational approach to organize and analyze online communication data.
- (2013). Applying Corpus Techniques to Climate Change Blogs.
- (2024). Large Language Models and their usage in EAL education. 139-160. In: Current Issues in English Teaching. Fagbokforlaget.
More information in the national current research information system (CRIStin)
OPINION COST action: https://www.cost.eu/actions/CA21129/
MediaFutures: https://mediafutures.no/2021/01/20/postdoc-samia-touileb/
NorDial: https://github.com/jerbarnes/nordial