Introduction to programming and natural language processing (NLP) for the Humanities

We are pleased to announce a six-day course providing a hands-on introduction to Python programming and textual data analysis. We will hold this course on-site at ETH Zürich (OCT in Oerlikon and ETZ in Zentrum):

Part 1: Introduction to programming in Python
Monday, 04 March and Wednesday, 06 March 2024

Part 2: Working with text data
Wednesday, 17 April and Monday, 22 April 2024

Part 3: Machine learning for text data
Monday, 06 May and Wednesday, 15 May 2024

Dr Agnieszka Ilnicka and Dr Tarun Chadha from the Scientific IT Services of ETH Zürich will lead the course.

The course is supported by the project “Exploring Collections as Data”, funded by Swissuniversities as part of the Open Research Data Grants.

The course aims to introduce the Python Programming Language to people with no prior programming knowledge who plan to do research with digitised texts. The course will introduce basic concepts and the packages used for natural language processing and machine learning. In-depth hands-on sessions will provide practical guidance on using programming in different contexts, from small automation tasks to analysing text data using machine learning methods. The main focus of the course will be on practical aspects.

The course consists of three blocks of two full days each. On all six days, the course will start at 9.00 and is expected to end around 17.00 with a lunch break around 12.30 and two coffee breaks (in the morning and afternoon).

Part 1. Introduction to programming in Python:

Monday, 04 March 2024 in OCT E36 (Oerlikon)

  • what is programming
  • introduction to Jupyter notebooks
  • basics of Python: variables, types, functions
  • flow control: conditions and loops

Wednesday, 06 March 2024 in ETZ E81 (Zentrum)

  • compound data types (such as lists and dictionaries)
  • creating functions
  • working with files
  • Python packages

Part 2. Working with text data:

Wednesday, 17 April 2024 in OCT E36 (Oerlikon)

  • string manipulation and regular expressions
  • pandas package and working with DataFrames
  • optical character recognition and assessing its quality
  • managing text files (formatting, merging, dividing)

Monday, 22 April 2024 in OCT E36 (Oerlikon)

  • preprocessing of text (such as lemmatisation and tokenisation)
  • introduction to NLTK and SpaCy
  • basic text data representations (bag of words, term frequency-inverse document frequency)
  • visualisation of data analysis with Matplotlib

Part 3. Machine learning for text data:

Monday, 06 May 2024 in OCT E36 (Oerlikon)

  • what is machine learning
  • word embeddings
  • topic modelling

Wednesday, 15 May 2024 in OCT E36 (Oerlikon)

  • document similarity
  • named entity recognition

All participants must bring their own laptops to the course. The laptop should have a reasonably recent Mac, Windows or Linux operating system and at least 20 GB of free disk space. The software needed for Python programming must be installed before the course starts. Installation instructions (for Windows and Mac) will be provided to participants after successful registration.

We will assign homework after each course block. Completing the homework exercises is a prerequisite to attending the next block.

We will issue a certificate of attendance for the course. To obtain it, participants must attend all course blocks and submit the homework assignments.

Registration

The course is aimed at people who do not have prior programming knowledge and plan to process and analyse digitised textual data.

Registration is now closed.

The course is free of charge.

FAQ

We expect full attendance at the course. Otherwise, we cannot grant a certificate of attendance.

The course is held on-site without the option to join remotely.

The course is offered as a whole; no course blocks should be missed. During the registration process, preference will be given to people who want to attend all three parts. In exceptional cases, we can agree to skip the first part (the homework will still be mandatory to confirm the required programming skills). To discuss this option, please contact Dr Agnieszka Ilnicka.

Unfortunately, we are not able to provide a computer.

We cannot award ECTS for this course. However, each participant who attends the entire course and submits homework assignments will receive a certificate of attendance.

Contact

Dr Agnieszka Ilnicka
Course content
Christiane Sibille
Course organisation