Deep Learning for Language Analysis Summer School at UoC – #DL4LASS

This September, the Summer School on Deep Learning for Language Analysis took place at the University of Cologne for the second time, and this year I was able to participate in the five-day program. Since it was organized by people from different research areas and promoted through the channels of multiple departments[^1], it attracted participants with a wide range of academic backgrounds and interests. The topics on offer followed the same pattern: while all the workshops revolved around language, each focused on a different representation of it. After an introduction to deep learning (DL) that brought everyone to approximately the same level, each participant had to choose one of three parallel tracks: 1) DL for NLP and argument mining, 2) DL with audio and speech data, and 3) DL in text recognition (OCR/HTR). In other words, you could decide to work with textual, audio, or graphic data. The tracks were briefly introduced on the second day of the summer school by either the lecturers or the organizers. As expected, the NLP and argument mining track attracted the most students. Thanks to the limited number of participants (50 in total), though, students had enough time to experiment and ask questions, even in the fullest class.

I opted for the audio and speech analysis track since there are fewer opportunities to learn about this topic than about, for example, argument mining, and I didn’t see myself needing text recognition skills any time soon. The audio and speech analysis class was taught by Abdullah from Fraunhofer IAIS. I particularly appreciated his openness in presenting the DL approach: he pointed out the shortcomings of the method and noted that in many cases a non-machine-learning solution might still be more suitable.

For me, it was very exciting to learn about analyzing language in audio format since my understanding of this domain was limited to the basic knowledge I had obtained in the introductory phonetics classes at my university. In the computational context, I had only worked with textual data, which is the case for many computational linguists. What struck me the most, however, was how the same algorithms and workflows can be applied to a variety of data categories. The parts that differ from one data type to the next, and therefore require domain knowledge, are of course the data preparation step and, if applicable, the feature engineering. I was fascinated to hear that one of the people in my class was planning to use the methods for biological data analysis: he had noticed some structural similarities between the audio data and his research data and was curious to know whether this approach would lead to interesting results (don’t ask me about the details though). So, creative and interdisciplinary thinking was definitely in the air.

I have to admit that, as a bachelor’s student, I sometimes found it difficult to understand the details of the algorithms and to know exactly which parameter to tweak to get the results I wanted. So it was reassuring to know that 1) I was in the same boat as other students who, for example, had a pure linguistics background and had never dealt with DL in depth, and 2) there were enough experts around who use DL in their research projects and whom we newbies could annoy with our questions; again, thanks to the diverse backgrounds and educational levels of the participants. Despite the difficulties, it was very enriching to expand the theoretical knowledge I had about machine learning and neural networks and to complement it with some entry-level practical experience. I plan to use the tools in future projects, including my bachelor’s thesis, so I used this opportunity to engage with the topic a bit more. Moreover, I can always refer to my notes and the programming examples whenever I need some machine learning inspiration.

There is still much left to say about the summer school: the meet and greet with industry professionals, the social events and the stimulating conversations with other students and scientists, or how Docker prevented many technical problems, though not all of them. But instead of expanding on these points here, I suggest that you come to the next one in August/September 2020 yourself. Follow the Twitter account of the Institute of Digital Humanities at UoC or check out the website to stay up to date on the event details!

[^1]: The summer school was organized in the framework of the University of Cologne’s Competence Area III (CA3: Quantitative Modeling of Complex Systems) by Jürgen Hermes, Claes Neuefeind (Institute of Digital Humanities, UoC), and Felix Rau (Institute of Linguistics, UoC).