SIMPA: 100634 Software Impacts xxx (xxxx) xxx

Contents lists available at ScienceDirect

Software Impacts

journal homepage: www.journals.elsevier.com/software-impacts

Original Software Publication

gtrendsAPI: An R wrapper for the Google Trends API

Ricardo A. Correia ∗

Biodiversity Unit, University of Turku, 20014 Turku, Finland
Helsinki Lab of Interdisciplinary Conservation Science (HELICS), Department of Geosciences and Geography, University of Helsinki, 00014 Helsinki, Finland
Helsinki Institute of Sustainability Science (HELSUS), University of Helsinki, 00014 Helsinki, Finland

∗ Correspondence to: Biodiversity Unit, University of Turku, 20014 Turku, Finland. E-mail address: raheco@utu.fi.

The code (and data) in this article has been certified as Reproducible by Code Ocean (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.

ARTICLE INFO

Keywords: Google Trends; Search engine; Application programming interface; Internet searches; Data access

ABSTRACT

Search engine data is a prime source of insights on human information-seeking behaviour and such information is instrumental for the scientific study of human culture and behaviour. The gtrendsAPI R software package aims to facilitate programmatic access to data available from the Google Trends API. Here, I introduce the functions available through this software and provide worked examples of how to use it. I also discuss some of the potential research applications and caveats of this software and the data available through it.

Code metadata

Current code version: 1.0.0
Permanent link to code/repository used for this code version: https://github.com/SoftwareImpacts/SIMPAC-2024-32
Permanent link to reproducible capsule:
Legal code license: MIT license
Code versioning system used: Git
Software code languages, tools and services used: R
Compilation requirements, operating environments and dependencies: jsonlite (>= 1.5), httr (>= 1.4.1)
If available, link to developer documentation/manual: https://github.com/racorreia/gtrendsAPI/README.md
Support email for questions: raheco@utu.fi

1. Introduction

In an increasingly digitized world, the growing availability of data generated online is providing new opportunities for scientific research. Estimates suggest about two-thirds of the global population has access to the internet nowadays [1] and many people use digital online platforms such as search engines, social media and video streaming applications on a daily basis. These habits are providing a new wealth of data about people’s interests, preferences and opinions that can generate unique insights on contemporary societies [2].

Search engine data is a particularly useful source of data in this context because it reflects people’s information-seeking behaviour, and is thus associated with a conscious effort to acquire information about a topic that is considered of interest or for which there is a perceived information gap [3]. While there are multiple search engines available, and several that provide access to data, chief among all is Google’s search engine. Google Search has an estimated 80%–90% share of the global search engine market [4] and is the dominant search engine in the large majority of countries across the world, with the exception of a few countries such as Russia or China. The Google Trends platform provides access to summary data from searches carried out through Google Search.
Google Trends data has been widely used for research in various fields of science, including for example medicine [e.g., [5,6]], economics [e.g., [7,8]], political studies [e.g., [9,10]], psychology [e.g., [11,12]], environmental sciences [e.g., [13,14]], biodiversity conservation [e.g., [15,16]] and sustainability [17].

Please cite this article as: R.A. Correia, gtrendsAPI: An R wrapper for the Google Trends API, Software Impacts (2024) 100634, https://doi.org/10.1016/j.simpa.2024.100634.
Received 5 February 2024; Received in revised form 21 February 2024; Accepted 16 March 2024
2665-9638/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Fig. 1. Schematic overview of the gtrendsAPI software structure. The figure outlines the seven functions available in the software, each designed to access a specific API data node, and how they relate to the output available through the Google Trends website.

Data from Google Trends is most commonly accessed through a dedicated website (https://trends.google.com/trends/). This platform allows users to query, visually explore and extract search volume data. Global data is available from 2004 onwards and users can select the time period, region, and keyword or topic of interest, but only up to five keywords or topics can be queried simultaneously. This is a limitation for work that requires multiple queries for a large number of keywords or topics, as implementing various combinations of searches manually can be laborious and time consuming. One alternative is to access the data programmatically through an Application Programming Interface (API).
It is possible to request access to the Google Trends API for research purposes, but there is a limited number of software packages allowing access to this data and they are restricted to a few programming languages. Here, I introduce the gtrendsAPI software package for the R programming language, which aims to facilitate access to Google Trends data through the official API. This programming language is widely used by scientists in ecology, environmental sciences, and other affiliated areas, and the software may be particularly useful to scientists engaging in culturomics [18] or iEcology [19] research where these data are frequently used.

2. Software description

The gtrendsAPI software package contains seven functions, each designed to interact with a specific Google Trends API data node (Fig. 1). The software includes functions to access a time series of relative search volume data for a given query (getGraph function) or the average search volume for the terms included in the query (getGraphAverages function) over a time period of interest. It also includes a function to extract relative search volume data at the country or region level based on a query of interest (getRegions function). Finally, it includes functions to extract the top (getTopTopics function) or rising (getRisingTopics function) topics, and the top (getTopQueries function) or rising (getRisingQueries function) queries, associated with a query of interest. As depicted in Fig. 1, each function provides access to data that is available through the dedicated Google Trends website and thus it is possible to collect the same data using either approach.

Each function in the gtrendsAPI package contains the same set of key parameters that users need to define to query the API. The ‘terms’ parameter identifies the keywords or topics to be queried.
The Google Trends API accepts both keyword-based queries (e.g., “apple”) and topic-based queries using Google Knowledge Graph identifiers (e.g., “/m/014j1m” is the identifier used for the fruit apple). Users can query the Google Knowledge Graph API, for example through the gkgraphR R package [20], to find the identifiers of their topics of interest. The ‘geo’ parameter identifies a country or region of interest for the query. This parameter is set to NULL by default, which queries the API for worldwide data, but it is also possible to use country or region codes to define the scope of the search. A list of valid country and region codes is available in the gtrendsAPI package (see Section 3, Illustrative example, for details). The ‘startDate’ and ‘endDate’ parameters define the temporal scope of the query. These parameters take year and month information as input, in the form of a “YYYY-MM” string, and both parameters are set to NULL by default, which represents January 2004 as the start date and the current month and year as the end date. Finally, the ‘api.key’ parameter defines the user’s API key necessary to obtain access to the API. Use of the API is currently restricted to users with approved access, and at the time of writing access can be requested for research purposes through an online form (available here: https://support.google.com/trends/contact/trends_api). All functions also contain the ‘property’ and ‘category’ parameters that users can use to identify the property of interest or filter results to a specific category. The ‘property’ parameter defaults to web searches, and the ‘category’ parameter defaults to an unfiltered query [as recommended; see [21]].

Using any of the functions once is equivalent to one API call and thus consumes one unit of the API quota.
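As a rough illustration of how calls translate into quota use, the sketch below splits a longer vector of search terms into batches of at most five, the maximum that a single query accepts, and counts the calls this would require. It is only a sketch under stated assumptions: `batch_terms` is a hypothetical helper that is not part of gtrendsAPI, and the example terms are arbitrary.

```r
# Hypothetical helper (not part of gtrendsAPI): split a character vector of
# search terms into batches of at most `size` terms each.
batch_terms <- function(terms, size = 5L) {
  stopifnot(is.character(terms), length(terms) > 0, size >= 1)
  # Assign each term a batch index (1, 1, 1, 1, 1, 2, 2, ...) and split on it
  unname(split(terms, ceiling(seq_along(terms) / size)))
}

# Arbitrary example terms
fruits <- c("apple", "pear", "plum", "cherry", "peach", "fig", "quince")
batches <- batch_terms(fruits)

# One API call per batch, so the number of batches is the quota consumed
n_calls <- length(batches)

# With approved API access, each batch could then be passed to getGraph():
# results <- lapply(batches, function(b) getGraph(terms = b, api.key = key))
```

Counting the batches before running them gives a simple way to check the planned number of calls against the available quota.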
While it is possible to implement multiple calls in succession, thus allowing for swift and automated data extraction, users should be mindful of their respective daily and monthly API quota limits as these will delimit data access.

3. Illustrative example

To use the gtrendsAPI package, users must first install the package from its GitHub repository. This is possible to do directly from within the R environment by running the following code (please note that you may also need to install the R package devtools beforehand):

R> # install.packages("devtools")
R> devtools::install_github("racorreia/gtrendsAPI", build_vignettes = TRUE)

It is then possible to load the package by calling the following code:

R> library(gtrendsAPI)

After loading the package, users will also need to record their Google Trends API key to allow access to the API. This is possible to do directly in the code by creating an object with the key, or by creating an environment variable in the system. The latter approach is advised as it allows users to share their code without revealing the API key. These approaches are exemplified below:

R> # Create object with API key
R> key <- "YOUR_API_KEY"
R> # Create an environment variable with the API key
R> Sys.setenv(GTkey = "YOUR_API_KEY")
R> # Call the key from the set of environment variables and store it in an object
R> key <- Sys.getenv("GTkey")

Table 1
Example of the summary output obtained by calling function getGraph. The output includes information on the relative search volume (column ‘value’) for a given search term (column ‘keyword’), time (column ‘date’) and geography (column ‘geo’). The remaining columns hold information about the scope of the call, including the requested time period of interest (column ‘time’), property (column ‘gprop’) and category (column ‘category’). Note that the figures representing search volume in the example below were randomly generated and do not represent actual data from Google Trends.

Value  Date        Geo    Time             Keyword  Gprop  Category
31     2004-01-01  world  2004-01 2024-01  apple    web    All categories
71     2004-02-01  world  2004-01 2024-01  apple    web    All categories
32     2004-03-01  world  2004-01 2024-01  apple    web    All categories
69     2004-04-01  world  2004-01 2024-01  apple    web    All categories
95     2004-05-01  world  2004-01 2024-01  apple    web    All categories

It is then possible to call any of the functions in the gtrendsAPI package by simply defining the search terms and API key.
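Because every call depends on the key being available, it may help to fail early when the environment variable has not been set. The following is a minimal sketch of such a guard, not a feature of the package: `get_gt_key` is a hypothetical helper, and it assumes the key is stored in an environment variable named GTkey as in the example above.

```r
# Hypothetical helper (not part of gtrendsAPI): read the API key from an
# environment variable and stop with a clear message if it is missing.
get_gt_key <- function(var = "GTkey") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable '", var, "' is not set; ",
         "define it with Sys.setenv() or in your .Renviron file.",
         call. = FALSE)
  }
  key
}

# Usage: set a placeholder key for this session, then retrieve it
Sys.setenv(GTkey = "YOUR_API_KEY")
key <- get_gt_key()
```

Storing the key in an .Renviron file rather than calling Sys.setenv() in a script keeps it out of shared code altogether.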
For example, if a user was interested in getting a time series of relative search volume for the term “apple” it would be possible to do so using the following code:

R> # Get time series of search volume for the keyword "apple" and store in an object
R> apple_keyword <- getGraph(terms = "apple", api.key = key)
R> # Check the output of the call
R> head(apple_keyword)

The above code would print the first lines of the function output as stored in the object, which is exemplified in Table 1.

It is also possible to call the function using an identifier, or a list of up to five keywords and/or identifiers, and to define additional parameters as exemplified below:

R> # Get time series of search volume for the topic "apple (fruit)" for the year 2020 and store in an object
R> apple_topic <- getGraph(terms = "/m/014j1m", startDate = "2020-01", endDate = "2020-12", api.key = key)
R> # Get time series of search volume for the keywords "apple" and "pear" targeting news and store in an object
R> apple_vs_pear <- getGraph(terms = c("apple", "pear"), property = "news", api.key = key)

The same approach can be used to customize the call of any other function included in the gtrendsAPI R package.

4. Impact

The use of Google Trends data in scientific research has grown steadily since the platform first became available, and has expanded to numerous fields of inquiry covering the natural and social sciences [22]. It has for example been widely used after the onset of the COVID-19 pandemic to explore disease incidence dynamics [5,6] and the impact of the pandemic on people’s interests and well-being [12,23,24]. This package aims to further stimulate its uptake by facilitating programmatic access to the Google Trends API, thereby streamlining data access and enabling the integration of data extraction and analysis pipelines. It contains a set of custom functions that provide access to the existing Google Trends API nodes and allow access to data pertaining to temporal and spatial patterns of search interest for a query of interest, and to topics and terms associated with that query. These features are a novel functionality in R software, as currently there seems to be no other software package available that is designed to interact directly with the Google Trends API, but they parallel the functionalities of software available for other programming languages such as Python. Some of the functions available in the gtrendsAPI package have already been used to support scientific research, for example to obtain data on search interest towards hundreds of animal and plant species [e.g., [25,26]].

Still, there are a few limitations that users of this software should take into account. Unlike other software packages that exploit the API that was designed for the Google Trends website, this package was designed to interact with a separate and dedicated API, which is only available to users who have secured access to it. Users of this software should therefore abide by the terms of use of the Google Trends API and must be mindful of any limitations they impose, for example with regards to commercial applications or sharing any data that were directly derived from the API. Time series of relative search volume data are particularly suited to the analysis of temporal trends in search interest, but users should take care in validating that the data represent their topic of interest [27] and in interpreting the observed trends [28,29]. It should also be noted that Google supports and maintains a set of functions designed to interact with the Google Trends API in Python and that users should refer to this software for future updates. Finally, this package is designed to provide access only to the Google Trends API and not to the Google Health Trends API, which provides access to a different dataset. Users interested in the latter API can see for example [30].

5. Conclusions

As access to the internet continues to grow worldwide, search engine data is likely to remain a prime source of insights on the interests, opinions, and preferences of people from across the world as reflected through their information-seeking behaviour. This information can be of great utility to a broad range of research areas, particularly those related to the scientific study of human culture and behaviour [31]. The gtrendsAPI package aims to facilitate and further stimulate research in these fields by providing programmatic access to data available from the Google Trends API using the R programming language. These data represent searches carried out through Google Search, the most widely used search engine across the world, and are thus uniquely suited to explore large-scale patterns of information-seeking behaviour.

CRediT authorship contribution statement

Ricardo A. Correia: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Resources, Methodology, Funding acquisition, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

RAC acknowledges support from the Academy of Finland (Grant agreement #348352) and the KONE Foundation (Grant agreement #202101976). The author would also like to thank Google for making the Google Trends tool and API available to researchers.
References

[1] International Telecommunication Union, Measuring Digital Development, Facts and Figures 2022, International Telecommunication Union, 2022.
[2] R.A. Correia, et al., Digital data sources and methods for conservation culturomics, Conserv. Biol. 35 (2021) 398–411.
[3] D.O. Case, L.M. Given, Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior, fourth ed., Emerald, 2016.
[4] StatCounter, Market share of leading desktop search engines worldwide from 2015 to 2023, 2023, Statista. [Online]. Available: https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/. (Accessed 2 January 2024).
[5] L.B. Amusa, et al., Modeling COVID-19 incidence with Google Trends, Front. Res. Metr. Anal. 7 (2022).
[6] A. Mavragani, K. Gkillas, COVID-19 predictability in the United States using Google Trends time series, Sci. Rep. 10 (2020) 20693.
[7] D. Borup, E.C.M. Schütte, In search of a job: Forecasting employment growth using google trends, J. Bus. Econom. Statist. 40 (2022) 186–200.
[8] H. Choi, H. Varian, Predicting the present with google trends, Econ. Rec. 88 (2012) 2–9.
[9] J. Mellon, Internet search data and issue salience: The properties of google trends as a measure of issue salience, J. Elect. Public Opin. Parties 24 (2014) 45–72.
[10] J. Mellon, Where and when can we use google trends to measure issue salience? PS: Political Sci. Politics 46 (2013) 280–290.
[11] R.A. Correia, S. Mammola, The searchscape of fear: A global analysis of internet search trends for biophobias, People Nat. (2023).
[12] E.A. Halford, et al., Google searches for suicide and suicide risk factors in the early stages of the COVID-19 pandemic, PLoS One 15 (2020) e0236777.
[13] W.R.L. Anderegg, G.R. Goldsmith, Public interest in climate change over the past decade and the effects of the ‘climategate’ media event, Environ. Res. Lett. 9 (2014) 054005.
[14] J. Kam, et al., Monitoring of drought awareness from google trends: A case study of the 2011–17 california drought, Weather Clim. Soc. 11 (2019) 419–429.
[15] G.H. de Oliveira Caetano, et al., Evaluating global interest in biodiversity and conservation, Conserv. Biol. 37 (2023) e14100.
[16] L.T.P. Nghiem, et al., Analysis of the capacity of google trends to measure interest in conservation topics and the role of online news, PLoS One 11 (2016) e0152802.
[17] R.A. Correia, E. Di Minin, Tracking worldwide interest in sustainable development goals using culturomics, PLOS Sustain. Transf. 2 (2023) e0000070.
[18] R.J. Ladle, et al., Conservation culturomics, Front. Ecol. Environ. 14 (2016) 269–275.
[19] I. Jarić, et al., iEcology: Harnessing large online resources to generate ecological insights, Trends Ecol. Evol. (2020) http://dx.doi.org/10.1016/j.tree.2020.03.003.
[20] R. Correia, gkgraphR: Accessing the official google knowledge graph API, in: R Package Version 1.0.2, Zenodo, 2021.
[21] A. Van Huynh, Use of google trends categories in conservation culturomics, Conserv. Biol. 37 (2023) e14103.
[22] S.-P. Jun, et al., Ten years of research change using Google Trends: From the perspective of big data utilizations and applications, Technol. Forecast. Soc. Change 130 (2018) 69–87.
[23] A. Brodeur, et al., COVID-19, lockdowns and well-being: Evidence from Google Trends, J. Public Econ. 193 (2021) 104346.
[24] C.N. Souza, et al., No visit, no interest: How COVID-19 has affected public interest in world’s national parks, Biol. Cons. 256 (2021) 109015.
[25] M. Adamo, et al., Dimension and impact of biases in funding for species and habitat conservation, Biol. Cons. 272 (2022) 109636.
[26] S. Mammola, et al., Towards a taxonomically unbiased European union biodiversity strategy for 2030, Proc. R. Soc. Lond. [Biol] 287 (2020) 20202166.
[27] R.A. Correia, Google trends data need validation: Comment on Durmuşoğlu (2017), Hum. Ecol. Risk. Assess. 25 (2019) 787–790.
[28] R.A. Correia, et al., Inferring public interest from search engine data requires caution, Front. Ecol. Environ. 17 (2019) 254–255.
[29] G.F. Ficetola, Is interest toward the environment really declining? The complexity of analysing trends using internet search data, Biodivers. Conserv. 22 (2013) 2983–2988.
[30] J.E. Raubenheimer, Google trends extraction tool for google trends extended for health data, Softw. Impacts 8 (2021) 100060.
[31] U.P. Albuquerque, et al., Exploring large digital bodies for the study of human behavior, Evol. Psychol. Sci. 9 (2023) 385–394.