Samia Touileb participates in the NTAP (Networks of Texts and People) project. The project is developing methods and tools to detect, analyse and visualize the distribution, flow and development of knowledge and opinions across online social networks. NTAP is funded by the Norwegian Research Council's VERDIKT program and runs from January 2012 to July 2015.
Samia will be working on the Natural Language Processing (NLP) part of the project. She will examine how statements and opinions are expressed in blogs by developing:
1. methods that enable the automatic induction of the discursive structures present in blogs, using grammar induction techniques. Grammar induction is the process of generating a grammar (usually in the form of re-write rules or productions) from observations in a given corpus.
2. unsupervised methods to extract data about the occurrence of statements and opinions. Information extraction is the process of automatically extracting pre-specified types of information (sets of entities, relations or events) from natural language texts, and of storing and representing this information in structures called templates.
Our main idea is to use grammar induction to create statement extraction templates that capture the typical expressions around a topic. We want our technique to be as portable as possible between topics and languages.
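To make the idea concrete, here is a toy sketch in Python. It is not the project's method: it replaces real grammar induction with simple counting of the word sequences that precede a topic word, and treats the frequent ones as extraction templates. The function names, the example sentences and the parameters `width` and `min_count` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def induce_templates(sentences, topic, width=2, min_count=2):
    """Return the word sequences of length `width` that precede `topic`
    at least `min_count` times. This is a crude stand-in for grammar
    induction, assumed here only for illustration."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token == topic and i >= width:
                counts[tuple(tokens[i - width:i])] += 1
    return {context for context, n in counts.items() if n >= min_count}


def extract(sentence, templates, width=2):
    """Return every word that directly follows one of the templates."""
    tokens = tokenize(sentence)
    return [
        tokens[i]
        for i in range(width, len(tokens))
        if tuple(tokens[i - width:i]) in templates
    ]


# Hypothetical blog sentences about one topic ("taxes").
corpus = [
    "I really think that taxes are too high",
    "Many people think that taxes should fall",
    "We pay taxes every year",
]

templates = induce_templates(corpus, "taxes")

# Apply the templates learned from one topic to a sentence about another.
print(extract("Some bloggers think that oil is the future", templates))
```

The templates are learned from sentences about one topic and then applied to a sentence about a different one, which is the kind of portability between topics the approach aims for. A real system would induce far richer structures than fixed-width word contexts.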
The resulting data will feed into visualizations to be developed by others in the project. These aim to elucidate the social and epistemological contexts of information in social media, e.g. for general users seeking information, for social science research and for media monitoring companies interested in information diffusion.
More details about the project can be found at www.ntap.no