German Bundestag: Speeches from 2009-2013 (17. EP) now searchable

Project Updates
Timeline of electoral periods and photos of politicians speaking in parliament

As something of a by-product of our preparations for integrating new parliaments, we were able to make over 20,000 additional speeches accessible. This means that 17 years of Bundestag debates (more than 100,000 speeches) are now searchable word by word on Open Parliament TV.

We have long been looking for ways to make older debates accessible as well. The Bundestag currently provides no machine-readable proceedings for them: these have only existed since the 18th electoral period, and earlier sessions are available only as PDF files. Although the video recordings have been available in the Bundestag media library since 2009 (17th electoral period), without the proceedings we lack the textual basis for making the speeches searchable.

PolMine Project: Proceedings in ParlaMint Format

For a few months now, however, there has been a corpus of plenary proceedings of the German Bundestag that is not only machine-readable but also published in an internationally recognised format. The PolMine project has for many years maintained and continuously developed a comprehensive corpus of plenary proceedings under the name GermaParl. These proceedings are now available, explicitly as a beta version, in the ParlaMint XML format: https://github.com/PolMine/ParlaMint-DE_beta.

The ParlaMint format was developed to make parliamentary proceedings from different countries interoperable (see ParlaMint: Comparable and Interoperable Parliamentary Corpora). Since we are already working intensively with the ParlaMint format as part of our efforts to integrate new parliaments, we spontaneously experimented with the data and found that we can integrate it very well into our workflows.
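To give a rough idea of what working with this kind of data looks like, here is a minimal sketch in Python. It assumes the usual ParlaMint convention of TEI-namespaced `<u>` (utterance) elements with a `who` attribute and `<seg>` children. The sample document, the speaker IDs, and the function name `extract_speeches` are made up for illustration, and this is not our actual import pipeline.

```python
import xml.etree.ElementTree as ET

# ParlaMint files are TEI documents, so elements live in the TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny hypothetical sample; real files carry far more metadata.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="debateSection">
        <u who="#SpeakerA" xml:id="u1">
          <seg xml:id="u1.s1">First paragraph of the speech.</seg>
          <seg xml:id="u1.s2">Second paragraph.</seg>
        </u>
        <u who="#SpeakerB" xml:id="u2">
          <seg xml:id="u2.s1">A reply.</seg>
        </u>
      </div>
    </body>
  </text>
</TEI>"""


def extract_speeches(xml_text):
    """Return one dict per utterance with the speaker ID and the joined text."""
    root = ET.fromstring(xml_text)
    speeches = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        # Collect the text of each segment and collapse internal whitespace.
        segments = [
            " ".join("".join(seg.itertext()).split())
            for seg in u.iterfind("tei:seg", TEI_NS)
        ]
        speeches.append({
            "speaker": u.get("who", "").lstrip("#"),
            "text": " ".join(s for s in segments if s),
        })
    return speeches


speeches = extract_speeches(SAMPLE)
for speech in speeches:
    print(speech["speaker"], "->", speech["text"])
```

Because the structure is the same across ParlaMint corpora, a reader like this should in principle carry over to other parliaments with few changes, which is what makes the format attractive for us.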

Just like the original dataset, the data we have derived from it should be understood as a “beta” version. After many tests, corrections to the data processing, and several audits of the speeches using automated speech recognition (Whisper), however, we are confident that the data is good enough for integration into the platform.
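One simple way such an audit could work is to compare the proceedings text of a speech with a speech-recognition transcript of the matching video and flag pairs that differ too much. The sketch below is only an illustration under that assumption. The function names, the example sentences, and the threshold of 0.6 are hypothetical, the transcript is passed in as a plain string, and this is not a description of our actual checks.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def similarity(proceedings_text, asr_text):
    """Word-level similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    matcher = SequenceMatcher(
        None, normalize(proceedings_text), normalize(asr_text), autojunk=False
    )
    return matcher.ratio()


def needs_review(proceedings_text, asr_text, threshold=0.6):
    """Flag a speech when the texts overlap less than the (assumed) threshold."""
    return similarity(proceedings_text, asr_text) < threshold


# Hypothetical example: the second pair looks like a misaligned video.
print(needs_review("Sehr geehrte Damen und Herren", "sehr geehrte damen und herren!"))
print(needs_review("Wir beraten heute den Haushalt", "Vielen Dank Frau Präsidentin"))
```

Proceedings are edited and never match the spoken word exactly, so a threshold like this would need tuning on real data before it says anything reliable.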

Thanks to the PolMine project for its ongoing work and continued development of the GermaParl corpus, and of course to the Bundestag media library, where all speeches since 2009 are available. The media library is also gradually publishing the historical debates (already online: 1st–5th electoral period, 1949–1969): https://www.bundestag.de/mediathek/plenarsitzungen. We naturally have our eye on these debates too ;).

Enjoy browsing the debates at

https://de.openparliament.tv