Capitalizing on genre-based corpora with the use of the AI-powered research tool Notebook LM

Belén Labrador
https://orcid.org/0000-0002-3341-1661

Abstract

Syntagmatic relations have always been at the core of corpus linguistics, where the most important element to study is the co-text, mainly collocates and patterns identified in concordance lines. The advent of artificial intelligence (AI) has opened new possibilities for exploiting corpora beyond traditional browser and concordancer functionalities without abandoning this approach, as AI-powered research tools offer a deeper, more holistic understanding of corpora. This paper explores the potential of Notebook LM, a generative AI (GenAI) application, to complement data retrieved by traditional corpus-analysis toolkits. Our initial hypothesis is that combining corpus and GenAI approaches leads to a more comprehensive understanding of specific genre-based texts and the language used within them. A key advantage of Notebook LM over large language models (LLMs) like ChatGPT is that it processes only uploaded texts, giving researchers precise control over the sources used. The case study uses a comparable English-Spanish corpus of online cheese descriptions, which contains 400 Spanish texts (121,461 words) and 600 English texts (111,871 words). This corpus was uploaded to Notebook LM, and its functions were used to summarize, explain, and retrieve key themes such as historical significance and production methods. The power of prompting within Notebook LM is also demonstrated, and the results suggest that GenAI tools are more suitable for qualitative than quantitative analysis in this context. Comparing writing conventions in the two languages revealed distinct differences: English descriptions tend to be more concise and informative, while Spanish texts use more subjective, evocative language, often incorporating cultural context and a more enthusiastic tone. These findings can assist in second-language writing and translation training by pinpointing the appropriate stylistic characteristics of promotional texts in English and Spanish. In conclusion, Notebook LM provides complementary affordances which, when combined with other corpus-analysis applications, constitute a powerful tool for leveraging genre-based corpora.

Article details

How to cite
Labrador, B. (2026). Capitalizing on genre-based corpora with the use of the AI-powered research tool Notebook LM. Alfinge. Revista De Filología, 37, pp. 51–75. https://doi.org/10.21071/arf.v37i.18449
Section
Monographs
