Samia Touileb

Associate Professor, Language Technology

Department of Information Science and Media Studies

Email: samia.touileb@uib.no
Visiting address
Fosswinckels gate 6

Lauritz Meltzers hus

5007 Bergen

Room
516
Postal address
Postboks 7802

5020 Bergen

Samia Touileb is an associate professor of language technology (Natural Language Processing). Before this, she was a researcher at MediaFutures (WP5: Norwegian language technology) and a postdoctoral researcher in the Language Technology Group (LTG), Department of Informatics, University of Oslo. She holds a PhD in language technology from the University of Bergen.

Her main research interests include bias and fairness in language technology models, information extraction, automatic summarization, and applications of language technology and machine learning methods in social science research.

Academic article

(2023). Learning Horn envelopes via queries from language models. International Journal of Approximate Reasoning. 20 pages.

(2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling. NIKT: Norsk IKT-konferanse for forskning og utdanning.

(2014). Inducing Information Structures for Data-driven Text Analysis. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings.
(2014). Applying grammar induction to text mining. Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings. 712-717.

Lecture

(2023). The Societal and Ethical Implications of Language Models.
(2023). The Ethics of Large Language Models.
(2023). Sosiale og etiske utfordringer med språkmodeller som ChatGPT.
(2023). Når kunstig intelligens inntar redaksjonen.
(2023). Demystifying ChatGPT and language models.
(2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
(2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
(2023). ChatGPT & AI in education.
(2023). Big Science Gullgruve eller fallgruve?
(2023). Benchmarking the societal and ethical implications of large language model.

(2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.

(2012). Networks of texts and people.

Popular science lecture

(2023). Store språkmodeller: muligheter og utfordringer.
(2023). Sosiale og etiske utfordringer med språkmodeller.
(2023). Hva er ChatGPT og hvordan fungerer det og lignende verktøy?
(2023). Blir vi overflødige? En samtale om kunstig intelligens og utdanning.

Academic lecture

(2023). Large Language models: What are they, and what are their ethical implications?

(2018). Operationalising Diversity for Big Data Policy Research.

(2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.

(2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.

(2013). Inducing local grammars from n-grams.

Academic anthology/Conference proceedings

(2023). Proceedings of the 5th Symposium of the Norwegian AI Society (NAIS 2023). NAIS Norwegian Artificial Intelligence Society.

(2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop. Association for Computational Linguistics.

Feature article

(2023). KI-dyret må mates med varsomhet. M24.
(2023). Chat GPT egner seg dårlig til eksamenssensuren. Morgenbladet.

Doctoral dissertation

(2017). Automatically Inducing Information Structures. A Text Mining Approach Based on the Distributional Hypothesis.

Interview

(2023). Kunstig intelligens: Krever åpenhet og integritet.

Academic chapter/article/Conference paper

(2023). NorBench – A Benchmark for Norwegian Language Models. 16 pages.
(2023). Measuring normative and descriptive biases in language models using census data.
(2023). Making sense of nonsense: Integrated gradient-based input reduction to improve recall for check-worthy claim detection. 13 pages.
(2023). JSEEGraph: Joint Structured Event Extraction as Graph Parsing.
(2023). Identifying Token-Level Dialectal Features in Social Media. 13 pages.
(2023). Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models.
(2023). Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus. 15 pages.

(2022). Occupational Biases in Norwegian and Multilingual Language Models. 12 pages.
(2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. 10 pages.
(2022). NERDz: A Preliminary Dataset of Named Entities for Algerian. 7 pages.
(2022). Measuring Harmful Representations in Scandinavian Language Models. 8 pages.
(2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes. 7 pages.
(2022). EventGraph: Event Extraction as Semantic Graph Parsing. 9 pages.
(2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction. 6 pages.
(2022). Annotating Norwegian language varieties on Twitter for Part-of-speech. 6 pages.

(2021). Using Gender- and Polarity-Informed Models to Investigate Bias. 9 pages.
(2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus. 13 pages.
(2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use. 7 pages.

(2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach. 16 pages.
(2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier. 7 pages.
(2020). Identifying Sentiments in Algerian Code-switched User-generated Comments. 8 pages.
(2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews. 14 pages.

(2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. 8 pages.
(2019). Lexicon information in neural sentiment analysis: a multi-task learning approach. 12 pages.

(2018). NoReC: The Norwegian Review Corpus. 6 pages.
(2018). Automatic identification of unknown names with specific roles. 9 pages.

(2014). Constructions: a new unit of analysis for corpus-based discourse analysis. 11 pages.

Poster

(2021). Using Gender- and Polarity-informed Models to Investigate Bias.

(2018). Automatically identifying names of unrecognized politicians.

(2015). A computational approach to organize and analyze online communication data.

(2013). Applying Corpus Techniques to Climate Change Blogs.

Chapter

(2024). Large Language Models and their usage in EAL education. 139-160. In:
- (2024). Current Issues in English Teaching. Fagbokforlaget.

See the complete overview of publications in CRIStin.

OPINION COST action: https://www.cost.eu/actions/CA21129/

MediaFutures: https://mediafutures.no/2021/01/20/postdoc-samia-touileb/

NorDial: https://github.com/jerbarnes/nordial

Fields of competence

Automated text analysis

Computational linguistics

Information science

Artificial intelligence

Natural Language Processing

Language technology

Research groups

Intelligent Information Systems (I2S)