Counting Tourists

The holy grail of quantitative tourism would be a near-objective measure of “too many” tourists. Such information would allow airlines, tour providers, and municipalities to direct and redirect tourists to an optimal experience. However, there is no easy measure for tourist volumes. 

Simple metrics, such as counting tourists per capita, are too crude to serve as a measure. Many national parks contain no permanent residents, so how many tourists per tree is too many? Measuring tourist flows relative to available infrastructure, such as the number of beds, leads to estimation issues, such as accounting for visitors who do not spend the night.

Without access to a more sophisticated dataset, I can only attempt to proxy tourist flows with limited effect. My data comes mainly from the German Federal Statistical Office (Destatis) and other government sources. The statistics are available at the German county level of administration, though not every county has every statistic available.

The code for all of the graphs can be seen in my GitHub repository. I tried to optimize them as much as possible for mobile view, but some will be better seen in desktop mode.


To measure actual tourism, I have several statistics available:

  • Overnight Stays (2019)
    • With foreign and domestic categories
  • Arrivals (2019)
    • With foreign and domestic categories
  • Average Length of Stay (2019)
    • With foreign and domestic categories
  • Instagram Tags (2022)
    • Available only for Cities
  • Tourist “Momentum”
    • The one- and five-year difference in tourist overnight stays. The idea is to capture the effect of “hype” on tourism.

Overnight Stays and Arrivals are almost the same statistics. Both measure tourist visits to hotels and related businesses, though the latter requires that the tourist stay overnight. This value excludes people visiting spas or conferences that might also be offered at the hotel or resort.
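The three accommodation statistics are linked. Assuming the usual definition, average length of stay is simply overnight stays divided by arrivals. The sketch below illustrates that relationship; the function name and the figures are made up for illustration and are not taken from the dataset.

```python
def average_length_of_stay(overnight_stays: int, arrivals: int) -> float:
    """Average nights per arriving guest (assumed: overnight stays / arrivals)."""
    if arrivals <= 0:
        raise ValueError("arrivals must be positive")
    return overnight_stays / arrivals


# Hypothetical county: 250,000 overnight stays from 100,000 arrivals.
print(average_length_of_stay(250_000, 100_000))  # 2.5
```

Two counties with the same number of arrivals can therefore differ widely in overnight stays if guests tend to stay longer in one of them.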

Instagram tags are both an outcome and a predictor of tourism, and I will examine both aspects. Social media both promotes locations by framing mental images in foreign audiences and fills the “conspicuous” role in conspicuous consumption.

Tourist momentum should play an essential role in creating the virtual identity of a location on the internet. Rather like a snowball, the more people that travel to a location and post pictures that successfully signal the desired status, the more people will follow. A destination’s growth rate should translate to more social media presence, and more overnight stays.
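As a rough illustration of the momentum variable, the sketch below computes the change in overnight stays over a one-year and a five-year window. The article describes momentum only as a "difference," so treating it as a relative change (a growth rate) is an assumption here, as are the function name and the sample figures.

```python
def momentum(overnights_by_year: dict[int, float], year: int, lag: int) -> float:
    """Relative change in overnight stays between (year - lag) and year.

    Assumption: momentum is a growth rate, not an absolute difference.
    """
    current = overnights_by_year[year]
    past = overnights_by_year[year - lag]
    if past <= 0:
        raise ValueError("past overnight stays must be positive")
    return (current - past) / past


# Hypothetical overnight stays for a single county.
stays = {2014: 800_000, 2018: 950_000, 2019: 1_000_000}

one_year = momentum(stays, 2019, lag=1)   # growth since 2018
five_year = momentum(stays, 2019, lag=5)  # growth since 2014
print(round(one_year, 4), round(five_year, 4))
```

A relative measure keeps small destinations comparable with large ones; an absolute difference would mostly reflect the size of the destination.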

Tourist Infrastructure

Some statistics can also measure tourist infrastructure, which may imply a certain level of actual tourism:

  • Number of Hotels (2019)
    • With Hotel, Resort, and Pension subcategories
  • Number of Beds (2019)
  • Geographic Size of the City/County (Km2)
  • GDP Per Capita
  • Number of Airports per Region
  • Total Length of Rail Track (Km)
    • Aggregated over each German county using QGIS

These are some standard statistics that indicate how many tourists a particular region can support. The number of hotels includes three categories of establishments, two of which are irrelevant to this discussion. We must treat leisure travel and tourism as separate phenomena, as they serve different markets and have different underlying motivations. For analyses in this series, I will use only hotels as a measure of tourist infrastructure.

Tourist Attraction

  • Monuments
    • This variable counts cultural and historical monuments designated as especially important by the German government. The count was done in QGIS over the German county administrative regions.
  • Wartime Destruction
    • This statistic is the percentage of a city destroyed by Allied bombing raids in World War II and was collated from various sources.
  • Old Buildings (2019)
    • This data comes from an administrative survey counting the number of buildings in each county with a title that predates 1918. 
  • National Parkland
    • This is the percent of each county designated as a national park or protected nature preserve, summed over the counties in QGIS.
  • Vineyards
    • This is the percent of each county set aside for viniculture, which usually means vineyards.
  • Capital City
    • A binary variable that indicates whether a city is a regional capital.
  • City
    • This is the percentage area of a county classified as “Urban” by the German government, summed up in QGIS.
  • Coastal Province
    • This binary variable indicates if the area is on the ocean. Beaches may attract tourists interested in leisure travel rather than active tourism.

These statistics will be discussed in more detail in later articles.

Data Manipulations

The data presented here has several undesirable characteristics for the analyses I want to conduct. Most notably, many of the data points are driven by outliers, particularly the city of Berlin. Since the outliers are essential to the analysis, I want to keep them. Instead, I have adjusted the data to align with a more normal distribution. The final variables follow:

  • Ln(Overnights)
  • Ln(Overnights per Capita)
  • Ln(Hotels per Capita)
  • Ln(GDP per Capita)
  • Ln(Railroad per Km2)
  • Monuments
  • Number of Airports in a Region
  • Old Buildings per Capita
  • Parks as a percentage of county
  • Vineyards as a percentage of the county
  • Cities as a percentage of county

The first significant manipulation is calculating most of the statistics per capita. Removing the effect of population disparities between regions will help isolate the desired analysis on tourist outcomes and remove the need for an additional variable. I also tried calculating the values per Km2 but logically, per capita values make the most sense.

The other change was taking the natural logarithm of several variables to reduce the effect of outliers. While this is a common transformation in econometrics, it makes the model harder to interpret. The coefficients are less intuitive, because for a logged predictor each coefficient describes the relative change in the outcome associated with a relative change in that predictor, rather than a change in absolute units.

In this case, though, I am not especially interested in the exact magnitude of the coefficient, as the data is already an imprecise proxy. I am more interested in examining the direction, strength, and significance of the relationships between these variables.
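The two manipulations, and the way a log-log coefficient reads, can be sketched as follows. This is a minimal illustration rather than the code behind the article: the function names and the county figures are hypothetical.

```python
import math


def log_per_capita(value: float, population: float) -> float:
    """Natural log of a per capita rate; undefined for non-positive inputs."""
    if value <= 0 or population <= 0:
        raise ValueError("log is undefined for non-positive inputs")
    return math.log(value / population)


def relative_effect(coef: float, pct_change: float) -> float:
    """Percent change in the outcome implied by a log-log coefficient
    when the predictor changes by pct_change percent."""
    return ((1 + pct_change / 100) ** coef - 1) * 100


# Hypothetical counties: a large metropolis and a small coastal district.
metropolis = log_per_capita(34_000_000, 3_600_000)
coast = log_per_capita(2_000_000, 90_000)
print(round(metropolis, 3), round(coast, 3))

# Reading a log-log coefficient: effect of 10% more hotels per capita.
print(round(relative_effect(1.0927, 10), 2))
```

Taking the Log Hotels p.C. coefficient of 1.0927 from the first regression table, a 10% increase in hotels per capita is associated with roughly 11% more overnight stays per capita. The raw overnight figures of the two hypothetical counties differ by a factor of 17, while their logged per capita values sit much closer together, which is the point of the transformation.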

When are some tourists too many?

When looking at the primary variable of interest, overnight stays, there are different ways of interpreting it. The absolute numbers tell of overall tourist travel, but nothing about whether or not it is too many. Larger cities have more things to do, so the population is an important statistic, but only some cities are worth visiting as tourists. Ideally, I need a measure that encapsulates the effect of tourists per unit of tourist infrastructure, as this is more important than the number of permanent residents. I will explore this in a different article, so I will stick with the crude measure of overnight stays per capita for now.

Below are the results of a linear regression model with the base variables. If we use overnight stays per capita as our variable to represent relative tourist volumes, then the model tells us some superficial facts. The obvious one is that beaches attract tourists, and beaches have few permanent residents. With this accounted for, tourist infrastructure is the most important determinant. Railway density, hotels per capita, and the existence of famous monuments are all significant contributors to variation in overnight stays. If you build it, they will come, it seems.

Log Overnights per Capita

Y Var: Overnights     R-Squared: 0.779
Model: OLS            Adj. R-Squared: 0.774
No. Obs: 384          Covar. Type: HC3

X Var                  Coef     Std Err   t        P>|t|
Momentum 1Yr           0.5189   0.491     1.057    0.291
Momentum 5Yr*          0.3527   0.164     2.148    0.032
Log Railroad p.Km2     0.0038   0.057     0.068    0.946
Log GDP p.C.*          0.1782   0.071     2.511    0.012
Log Hotels p.C.***     1.0927   0.033     33.596   0.000
Urban Area (%)***      0.0100   0.002     4.804    0.000

Log Foreign Overnights per Capita

Y Var: Overnights     R-Squared: 0.599
Model: OLS            Adj. R-Squared: 0.590
No. Obs: 384          Covar. Type: HC3

X Var                  Coef     Std Err   t        P>|t|
Momentum 1Yr          -0.6086   0.797    -0.763    0.446
Momentum 5Yr***        1.0902   0.279     3.904    0.000
Log Railroad p.Km2     0.0166   0.102     0.163    0.870
Log GDP p.C.***        0.9858   0.131     7.528    0.000
Log Hotels p.C.***     0.9820   0.067     14.667   0.000
Urban Area (%)***      0.0155   0.003     4.742    0.000
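Both tables report OLS estimates with HC3 (heteroskedasticity-robust) standard errors. The article does not say which software produced them, so the sketch below is only an illustration of the technique in plain Python. For brevity it fits a single predictor with an intercept, whereas the models above use six predictors; the function name and the sample data are hypothetical.

```python
import math


def ols_hc3(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Fit y = a + b*x by OLS and return (a, b, HC3 standard error of b).

    Needs at least three observations and a non-constant x.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar

    # HC3 rescales each residual by 1 / (1 - leverage) before squaring,
    # so high-leverage observations (think Berlin) are penalized more heavily.
    meat = 0.0
    for xi, yi in zip(x, y):
        resid = yi - (a + b * xi)
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        meat += (xi - x_bar) ** 2 * (resid / (1 - leverage)) ** 2
    se_b = math.sqrt(meat) / sxx
    return a, b, se_b


# Hypothetical data: log hotels per capita vs. log overnights per capita.
log_hotels = [-8.1, -7.6, -7.2, -6.9, -6.3, -5.8]
log_overnights = [0.4, 1.1, 1.3, 1.9, 2.4, 3.1]
intercept, slope, se = ols_hc3(log_hotels, log_overnights)
print(round(slope, 3), round(se, 3), round(slope / se, 2))  # coef, std err, t
```

The t column in the tables is the coefficient divided by its standard error, which is what the last printed value reproduces for the toy data.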

More interesting is “tourist momentum,” my proxy variable for the social media effect on travel. At least, it seems more important in determining relative tourist flows than absolute ones. I hypothesize that social media serves as a self-fulfilling prophecy for destinations that do well. More tourists will increasingly crowd popular destinations as these places have an extensive reach on social media. Large population centers already have a significant presence via their residents, so the social media effect would be more potent for locations with fewer people. When we account for the population in a per capita statistic, momentum should play a more prominent role in explaining variance between destinations.

Also relevant to the social media effect is GDP per capita. While I consider the wealth of a travel destination to reflect more on the infrastructure available, it may also serve as a travel incentive. If tourist travel is mostly about conspicuous consumption, wealthier locales will attract more tourists as they have more opportunities to flaunt wealth on Instagram.

The initial conclusion is that wealthy destinations with extensive travel infrastructure that have experienced rapid growth in tourist volumes over the preceding year will be more likely to have more overnight stays per capita on average. This sounds obvious, so the next step will be unpacking the information in this dataset to see if we can draw other conclusions about travel patterns. If I can understand more precisely where tourists are going, then I can build a better scoring system.

Returning to authenticity, I surmise that a destination with fewer tourists per capita will offer greater “objective authenticity.” By this, I mean that the tourist experience will be more immersive and enjoyable in a place where ordinary people live. However, the idea of “existential authenticity,” or Instagram likes per photo, is not linked to any notion of over-tourism. It will be interesting to see what the data tells us about this concept.
