Counting the Hidden Gems

The final step in this model-building process is to rank the German counties and aggregate the results into my geographic schema. The goal is to build a metric that helps me determine how “over-traveled” a region or destination is. In general, the results indicate that rural regions are more easily overwhelmed by tourists and that foreign tourists congregate primarily in urban areas.

The conclusion is paradoxical but expected. While seeking out lesser-known natural wonders is advisable, visiting cities presents a dichotomy. The most prominent tourist attractions are those capable of handling the tourist load, whereas mediocre destinations can be overwhelmed. So sticking either to the most popular destinations or to the unknown ones will give the best travel experience.

To reach this conclusion, I used the linear regression from my analysis of tourist infrastructure. The model relates overnight stays to various predictor variables that make logical sense. You can see the other articles in this series for more information about the model and the variables.

The regression has only middling explanatory power (an R-squared of 0.423 for the general model and 0.579 for the foreign one), but the explained variance is sufficient to stratify locations based on their residuals. There are two models here: one where the dependent variable is the log of overnight stays per hotel and the other where it is the log of overnight stays per hotel by foreigners. The differences between the models are as expected, with physical infrastructure (airports, for example) being more significant for foreign visitors than in the general case.

The code for all of the graphs can be seen in my GitHub repository. I tried to optimize them as much as possible for mobile view, but some will be better seen in desktop mode.

Log Overnights p.H. and Infrastructure

Y Var:    Overnights    R-Squared:       0.423
Model:    OLS           Adj. R-Squared:  0.407
No. Obs:  384           Covar. Type:     HC3

X Var                       Coef  Std Err       t  P>|t|
Constant                  6.6615    0.963   6.920  0.000
Momentum 1Yr              0.4967    0.506   0.982  0.327
Momentum 5Yr*             0.3675    0.165   2.224  0.027
Log Railroad p.Km2       -0.0140    0.058  -0.241  0.810
Log GDP p.C.**            0.2146    0.076   2.842  0.005
Monuments***              0.0178    0.004   4.482  0.000
Old Buildings p.C.         0.000    0.000  -0.281  0.779
Urban Area (%)***         0.0079    0.002   3.843  0.000
Coast***                  0.7116    0.106   6.690  0.000
Vineyard Area (%)*         0.000    0.000  -1.747  0.082
Airports                  0.1032    0.066   1.554  0.121

Log Foreign Overnights p.H. and Infrastructure

Y Var:    Overnights    R-Squared:       0.579
Model:    OLS           Adj. R-Squared:  0.567
No. Obs:  384           Covar. Type:     HC3

X Var                       Coef  Std Err       t  P>|t|
Constant                  1.5833    1.792   0.884  0.377
Momentum 1Yr              0.1082    0.818   0.132  0.895
Momentum 5Yr*             0.6795    0.310   2.193  0.029
Log Railroad p.Km2        0.1305    0.101   1.295  0.196
Log GDP p.C.***           0.6440    0.130   4.973  0.000
Monuments**               0.0199    0.006   3.244  0.001
Old Buildings p.C.***    -0.0012    0.000  -6.655  0.000
Urban Area (%)**          0.0110    0.003   3.501  0.001
Coast                    -0.1223    0.212  -0.578  0.564
Vineyard Area (%)          0.000    0.000   0.461  0.645
Airports*                 0.2847    0.110   2.593  0.010

Overall Rankings

Ranking destinations based on the model residuals is intuitive. The residual is simply the difference between the observed value and the model’s predicted value. If the model predicts a certain number of overnight stays for a city, but the actual value is much higher, then the destination is overrated. If the reverse is true, then it is underrated. Because the residuals are scattered randomly around the fitted values, the size of each residual allows for a rudimentary ranking of how overrated or underrated a destination is.
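The ranking logic can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repository: the function name `rank_by_residual`, the county labels, and all numbers are hypothetical, and the residuals are standardized as simple z-scores, which is only an assumption about how the “Std. Residual” column was produced.

```python
from statistics import mean, stdev


def rank_by_residual(observed, predicted, top_n=3):
    """Return (most overrated, most underrated) lists of (name, z-score).

    Assumption: residuals are standardized as plain z-scores.
    """
    # Residual = observed value minus the model's predicted value.
    residuals = {name: observed[name] - predicted[name] for name in observed}
    values = list(residuals.values())
    mu, sigma = mean(values), stdev(values)
    z_scores = {name: (r - mu) / sigma for name, r in residuals.items()}

    # Largest positive residuals first: more stays than the model expects.
    ordered = sorted(z_scores.items(), key=lambda item: item[1], reverse=True)
    overrated = ordered[:top_n]
    underrated = ordered[::-1][:top_n]
    return overrated, underrated


# Hypothetical log overnight stays per hotel (not values from the data set).
observed = {"A": 9.1, "B": 7.2, "C": 8.0, "D": 8.4, "E": 6.9}
predicted = {"A": 8.0, "B": 7.9, "C": 8.1, "D": 8.3, "E": 7.8}

overrated, underrated = rank_by_residual(observed, predicted, top_n=2)
print("Overrated:", [name for name, _ in overrated])
print("Underrated:", [name for name, _ in underrated])
```

In this toy data, county A receives far more stays than predicted and lands at the top of the overrated list, while E falls furthest below its prediction.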

Below is the residual/fitted plot. Perfectly predicted values will have a residual of 0 and lie on the horizontal zero line. “Over-touristy” destinations will be above the line, and hidden gems will be below. The graph also offers some insight into the quality of the regression, as any discernible pattern in the residuals implies bias from missing variables or a misspecified functional form. In this case, the spread of the residuals is quite wide due to the limited explanatory power, but they appear randomly dispersed, which suggests that the linear specification is reasonable.

Overall model, all counties

Rank  Top Overrated         Std. Residual  Top Underrated          Std. Residual
1     St. Wendel            4.43           Emden                   -2.79
2     Bad Kissingen         2.89           Mülheim an der Ruhr     -2.30
3     Heidekreis            2.42           Neuwied                 -2.22
4     Vulkaneifel           2.27           Neustadt a.d.Waldnaab   -2.18
5     Potsdam               2.27           Forchheim               -2.14
6     Oder-Spree            2.14           Neuburg-Schrobenhausen  -1.94
7     Grafschaft Bentheim   2.01           Sömmerda                -1.92
8     Merzig-Wadern         1.91           Landshut                -1.89
9     Freiburg im Breisgau  1.91           Steinburg               -1.81
10    Leipzig               1.90           Cologne                 -1.94

Overall model, cities only

Rank  Top Overrated         Std. Residual  Top Underrated          Std. Residual
1     Potsdam               2.27           Mülheim an der Ruhr     -2.30
2     Freiburg im Breisgau  1.91           Cologne                 -1.94
3     Leipzig               1.90           Herne                   -1.65
4     Münster               1.88           Solingen                -1.47
5     Dresden               1.64           Leverkusen              -0.92
6     Frankfurt am Main     1.51           Kiel                    -0.84
7     Wiesbaden             1.28           Hamm                    -0.56
8     Mainz                 1.13           Osnabrück               -0.48
9     Oberhausen            1.13           Bremen                  -0.48
10    Mannheim              0.99           Mönchengladbach         -0.40

The results for the top and bottom 10 German counties in the residual ranking are mixed. The model has singled out popular destinations for outdoor activities, such as Heidekreis, Vulkaneifel, Oder-Spree, and Merzig-Wadern. This doesn’t make that much sense from a tourist perspective, especially as it is missing much more popular destinations such as Saxon Switzerland and Garmisch-Partenkirchen. However, since the model captures tourists per hotel, it is plausible that those more popular destinations have better infrastructure to handle the tourists. Aside from Grafschaft Bentheim, which makes no sense, the others on the list are popular but small cities.

More interesting is the most underrated list, as the city of Cologne makes a prominent appearance at the bottom. This makes sense, though, as on paper, Cologne has everything it takes to be a top-tier destination. It has historical monuments, an airport, many hotels, and a brand name. However, anyone who has ever been to Cologne will tell you that it’s underwhelming and not worth the time. Capturing that qualitative aspect may be beyond this model, but I can believe that Cologne does not get as many tourists as it theoretically should.

A brief look at the cities-only ranking tells us what we already know: big popular cities have too many tourists, and big cities without a brand name have too few. In this case, seven of the ten cities on the underrated list are not tourist destinations. The only cities worth visiting would be Bremen, Osnabrück, and possibly Cologne. The rest had their historical and cultural heritage destroyed in WW2 and are now sleepy industrial towns. Bremen is the only city on the list that I would consider a genuinely “underrated” destination. This list is thus not especially helpful, and the statistics may be driven by local tourism for reasons not visible in the data, e.g., a festival or market. Let’s use foreign tourist overnight stays instead and see if that gives more meaningful results.

Foreign Rankings

Using the second model yields essentially the same results. There is still a strong focus on regions with lesser-known national parks, such as the Vulkaneifel, Birkenfeld, and Eifelkreis. Other places have at least a plausible, if unsatisfying, explanation for their inclusion. Trier-Saarburg and Cochem-Zell are popular vineyard regions, Baden-Baden is a famous resort town, Kaiserslautern hosts the American air force base at Ramstein, and Schleswig-Flensburg has beaches… I suppose.

The underrated destination list is slightly better, with at least a few remarkable destinations. Again, though, the regions are famous for natural beauty, which is not something originally captured in the model. Regen, Ammerland, Elbe-Elster, Spree-Neiße, Südliche Weinstraße, and Forchheim are all rural regions with remarkable natural landscapes. Emden, Delmenhorst, and Aurich are unremarkable cities in northern Germany, which is not especially helpful.

Foreign model, all counties

Rank  Top Overrated         Std. Residual  Top Underrated          Std. Residual
1     St. Wendel            3.94           Emden                   -2.43
2     Vulkaneifel           3.93           Regen                   -2.20
3     Grafschaft Bentheim   3.14           Herne                   -2.13
4     Trier-Saarburg        2.89           Ammerland               -2.02
5     Schleswig-Flensburg   2.68           Forchheim               -2.01
6     Kaiserslautern        2.57           Elbe-Elster             -2.00
7     Birkenfeld            2.54           Delmenhorst             -1.95
8     Baden-Baden           2.44           Spree-Neiße             -1.88
9     Cochem-Zell           2.44           Südliche Weinstraße     -1.97
10    Eifelkreis            2.43           Aurich                  -1.89

Foreign model, cities only

Rank  Top Overrated         Std. Residual  Top Underrated          Std. Residual
1     Baden-Baden           2.44           Cologne                 -1.19
2     Freiburg im Breisgau  1.94           Kassel                  -1.10
3     Heidelberg            1.40           Braunschweig            -0.82
4     Frankfurt am Main     1.37           Wolfsburg               -0.78
5     Berlin                1.27           Osnabrück               -0.76
6     Offenbach am Main     1.27           Erfurt                  -0.74
7     Aachen                1.22           Halle                   -0.71
8     Dresden               1.19           Bremen                  -0.52
9     Mainz                 1.15           Neustadt a.d.Aisch      -0.37
10    Lübeck                1.13           Koblenz                 -0.33

Restricting the view to only the cities gives a much more interesting and useful set of results. The top-ten list is nearly spot-on. Only Offenbach stands out, likely because it is a suburb of Frankfurt, though it is hard to see why foreign overnight stays here are so high.

The list highlights how important infrastructure is for a destination. Absent from the list is Germany’s second most popular tourist destination: the city of Munich. Because the model controls for infrastructure and other factors, Munich’s absence suggests that it is better prepared than Berlin for the tourists it receives. The implication is that Munich will likely be a more enjoyable, i.e., immersive, experience for a foreign tourist than Berlin.

On the flip side, the most underrated cities list includes several uninteresting cities such as Cologne, Kassel, and Wolfsburg. For the most part, though, the list seems accurate enough. Erfurt is one of the best destinations in Germany and is underrated by all subjective measures. Braunschweig, Osnabrück, Bremen, and Koblenz are all cities destroyed in the war but retain enough historical or cultural heritage to be worth visiting in a spare afternoon. Halle and Neustadt are charming but small-ish towns I would consider underrated, but maybe not that underrated.

Rating my Regions

The next step is to apply this model to my disaggregation of Germany. To do this, I used QGIS to aggregate the variables over my regions and used them as an out-of-sample input for the linear model I trained previously.

This approach is not technically correct, as aggregating the data differently may change its variance structure relative to the original data set. Nevertheless, I have too few observations to train a new model, so this will suffice.
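As a rough illustration of the out-of-sample step, the sketch below applies fixed coefficients to aggregated regional predictors and takes the residual against the observed value. The coefficient values are copied from the first regression table, but only a subset of the predictors is used, and the region names, inputs, and observed values are made up, so the output means nothing beyond showing the mechanics.

```python
# Subset of coefficients from the first regression table (illustration only;
# the full model has more predictors than these).
COEFFICIENTS = {
    "constant": 6.6615,
    "log_gdp_pc": 0.2146,
    "monuments": 0.0178,
    "urban_area_pct": 0.0079,
    "coast": 0.7116,
}


def predict_log_overnights(features):
    """Linear prediction: constant plus the sum of coefficient * predictor."""
    total = COEFFICIENTS["constant"]
    for name, value in features.items():
        total += COEFFICIENTS[name] * value
    return total


# Hypothetical aggregated predictors for two regions (not real data).
regions = {
    "Region X": {"log_gdp_pc": 10.4, "monuments": 12, "urban_area_pct": 8.0, "coast": 0},
    "Region Y": {"log_gdp_pc": 10.7, "monuments": 30, "urban_area_pct": 35.0, "coast": 1},
}
# Hypothetical observed log overnight stays per hotel.
observed = {"Region X": 9.6, "Region Y": 9.9}

# Out-of-sample residual: observed value minus the prediction.
residuals = {
    name: observed[name] - predict_log_overnights(features)
    for name, features in regions.items()
}
for name, residual in residuals.items():
    label = "overrated" if residual > 0 else "underrated"
    print(f"{name}: residual {residual:+.3f} ({label})")
```

Because the coefficients were estimated on county-level data, residuals computed this way inherit the caveat above: the regional aggregates may not follow the same distribution as the training sample.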

The first graph shows the residual plot for the total out-of-sample overnight stays and the second for the overnight stays by foreign guests.

General model, top five and bottom five regions

Region Name           Norm. Percentile
Mosel Valley          98.00%
Upper-Franconia       94.23%
Upper Palatinate      91.11%
Eastphalia            85.23%
Electoral Saxony      83.85%
Holstein              8.64%
Lake Constance        6.71%
Sleswig               2.98%
Brandenburg           1.25%
Uckermark             0.44%
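The article does not define how the “Norm. Percentile” column is computed. One plausible reading, and it is only an assumption here, is that each region’s standardized residual is mapped to a percentile of the standard normal distribution. A minimal sketch of that conversion, with a hypothetical function name and hypothetical inputs:

```python
from statistics import NormalDist


def normal_percentile(std_residual):
    """Percentile of a standardized residual under a standard normal.

    Assumption: this is one possible definition of "Norm. Percentile",
    not necessarily the one used for the table.
    """
    return 100 * NormalDist().cdf(std_residual)


# Hypothetical standardized residuals (not taken from the data set).
for z in (-2.0, 0.0, 2.0):
    print(f"z = {z:+.1f} -> {normal_percentile(z):.2f}%")
```

Under this reading, a region near the 98th percentile would sit roughly two standard deviations above its predicted value, and one near the 1st percentile well below it.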

Unlike the previous models, the results in the two graphs indicate structural differences between the out-of-sample and in-sample data. However, the results are still interesting, and I think they can be helpful, even if they lack the external validity that I was looking for.

The general model shows results focused on rural regions with well-known nature parks. None of the top five contenders have a major city besides Electoral Saxony, which is home to Leipzig and Dresden but also to Saxon Switzerland National Park and the Erzgebirge. At the bottom of the list, we have Holstein (Hamburg), Brandenburg (Berlin), and three other rural regions. The general impression is that rural regions are more exposed to over-tourism than major cities.

Foreign model, top five and bottom five regions

Region Name           Norm. Percentile
Westphalia            87.08%
Weser-Engern          86.71%
Württemberg           83.08%
Vorpommern            82.23%
Upper-Franconia       81.68%
Electoral Palatinate  5.97%
Frisia                5.09%
Upper Bavaria         1.64%
Altmark               0.70%
Thüringen             0.08%

When limiting the model to foreign tourist stays, the picture is entirely different, though several significant outliers indicate an improperly specified model. Westphalia is a region with rich cultural and natural beauty, but it is not a location I would consider overrated. The result may be due to the lack of hotels and infrastructure relative to other regions. Württemberg is home to Stuttgart and part of the Black Forest. Vorpommern has Germany’s best beaches. Upper-Franconia and Weser-Engern are mostly rural regions with nature parks.

The most ‘underrated’ destinations are more interesting, as I find them all on-point. Thuringia deserves its position as one of Germany’s most beautiful regions that nobody really knows about. Throughout the analysis, Munich has come across as well built for tourists, and given that few tourists venture out of the city, Upper Bavaria also deserves an underrated rating. In contrast, Berlin has much less to offer the average tourist, and there is much less to do around it in Brandenburg.

Conclusion

The lack of external validity of the primitive linear regression model used here severely limits the ability to draw general conclusions about a region’s relative level of tourism. I designed the model for an out-of-sample analysis, in which I predicted the number of overnight stays per hotel and compared it to the actual value. It seems consistent across all models that there are some rural regions with national parks or other attractions that the available infrastructure cannot cope with. In terms of cities, there is much less evidence to support any particular conclusion. Generally, it seems that foreign tourists congregate in a specific subset of cities and do not diversify. As a result, cities that have built infrastructure to host and entertain them do well, e.g., Munich, and those that have not, e.g., Berlin or Stuttgart, suffer.

The rating is less profound than I had hoped, but I will continue expanding it as I gather more data. France, Belgium, and the Netherlands also offer similar datasets with slightly different definitions. I will return to this page as I continue to update the model. The final article in this series will outline the final rating model that I will use on this site.

Articles in this Series

Ranking the Regions

The website uses a simple ranking methodology to help categorize travel destinations into various categories. People travel for different reasons and have different expectations. Some travelers do so with a…

If You Build It – Will They Come?

“If you build it – they will come” is a quote often used satirically to deride investors in white elephant projects. However, logically, the most critical determinant in measuring tourist…

Do Tourists like Nature?

Do tourists care about national parks? I think the answer is obviously yes, as the global success of the American National Park system suggests. The draw for travelers is obvious…

Counting Tourists

The holy grail of quantitative tourism would be a near-objective measure of “too many” tourists. Such information would allow airlines, tour providers, and municipalities to direct and redirect tourists to…

Is Anything Authentic Anymore?

Have you ever been to a tourist trap or a location with so many people that the entire trip felt pointless or disappointing? Maybe you walked into a local shop…
