Connecting People and Words in Medieval Documents, with Python!

[Splash image: 2,500 Anglo-Saxons linked by appearances as witnesses in grants dating from the 7th-10th c. AD.]

The Goods:

Welcome to my relaunch! This first post is mostly just a link to my project on GitHub, but it is a doozy. You can download the repo and run it locally, or run it in-browser by going to the readme and clicking the Run in Binder link, or just click here.

This repository scrapes data from two websites about hundreds of medieval documents involving thousands of people from Anglo-Saxon England, then stores it in a local database before performing network and text analysis.

I will be adding more images, commentary, and conclusions directly into this post, so check back later if the notebook overwhelms you.

I can’t give you the data directly for all kinds of legitimate copyright reasons, but with Binder/Jupyter I can give you the tools to exactly recreate my steps!

Historical Source:

Nearly 500 charters from Anglo-Saxon England, c. 600-900 AD, and the more than 2,500 people who appear in them. Charters were elaborate documents recording grants of land, property, etc., and they always carry a large number of very important witnesses who helped guarantee their legitimacy. These charters, in aggregate, contain a wealth of information about reciprocity, relationships, and social display among medieval elites.

Data Source:

Two databases. (1) The Anglo-Saxon Charters Database (ASC), the focus of our study. It contains the full text of hundreds of charters along with metadata. For this study, we will limit our purview to these charters and the individuals who appear in them. However, we need to round out our metadata on the charters and individuals. For that we will use (2) the Prosopography of Anglo-Saxon England (PASE) database. Between the two we can get a significant amount of information on texts, people, and relationships.
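
To give a flavor of the scraping step, here is a minimal sketch of the approach with requests and BeautifulSoup4. The URL and the tags I pull out are placeholders for illustration, not the real ASC markup; the notebook works against the sites' actual page structure.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL for illustration only; the notebook builds the real
# charter URLs from the site's index pages.
url = "https://example.org/charters/charter-1.html"

response = requests.get(url, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Hypothetical markup: assume the charter title sits in an <h1> and the
# text in <p> tags. The real pages need their own selectors.
title = soup.find("h1").get_text(strip=True)
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

print(title)
print(paragraphs[:3])  # first few paragraphs of the charter text
```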

Methods and Packages:

Web Scraping: BeautifulSoup4
Database ORM: SQLAlchemy
Network Analysis: networkx
Text Analysis: nltk & cltk
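
To show roughly how those pieces fit together, here is a minimal, self-contained sketch: a SQLAlchemy schema for charters and their witnesses, plus a networkx graph that links two people whenever they witness the same charter. The table and column names here are hypothetical placeholders; the repo defines its own models.

```python
from itertools import combinations

import networkx as nx
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical many-to-many link between people and the charters
# they witnessed; for illustration only.
witnessed = Table(
    "witnessed",
    Base.metadata,
    Column("person_id", ForeignKey("people.id"), primary_key=True),
    Column("charter_id", ForeignKey("charters.id"), primary_key=True),
)

class Person(Base):
    __tablename__ = "people"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    charters = relationship("Charter", secondary=witnessed, back_populates="people")

class Charter(Base):
    __tablename__ = "charters"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    text = Column(String)
    people = relationship("Person", secondary=witnessed, back_populates="charters")

engine = create_engine("sqlite:///charters.db")  # the local database
Base.metadata.create_all(engine)

# Build the co-witness network: two people share an edge whenever they
# appear on the same charter, weighted by how often that happens.
G = nx.Graph()
with Session(engine) as session:
    for charter in session.query(Charter):
        for a, b in combinations(sorted(p.name for p in charter.people), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)

# Degree centrality hints at who anchored these elite witness networks.
print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:10])
```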

Skills Required:

  • Beginner Python to run the notebook and get data
  • Intermediate/Advanced Python to understand and master every skill

To FULLY understand everything, you probably need intermediate Python skills. BUT, even if you are a beginner, you will be able to learn some useful tricks, work out larger strategies, and see the potential of Python for historical insight! If you want to learn how something is done, pay attention to the comments in the code, look up the official documentation on the package websites listed above, or follow the tutorials I link inside the notebook. I hope this is useful to everyone from beginners to experts.

Directions:

Simply go to the GitHub repository and follow the directions there, or just click this link to launch a new virtual server that powers the demonstration in Jupyter Notebook using a free service called Binder.

If you run it through Binder, it will take a minute to start up. Binder is creating a virtual server just for you: installing Linux on it, adding essential packages, downloading this repository, and then installing any Python dependencies. Once it is finished it will start Jupyter.

If you install it locally, go to the directory of the repo and type pip install -r requirements.txt to download all Python package dependencies. Then type jupyter notebook and it should open a browser tab.

In either case, once the server is launched, go inside the /notebooks folder and launch the first notebook, 1-scrape.ipynb. Click inside each code block and click the run button. Note that some cells, especially those that run scraping, network analysis, or text analysis operations, can take a while. Be patient and let them finish before moving on to the next cell.
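
If you are curious what a text-analysis cell is doing while you wait, here is a minimal sketch with nltk (the Latin-aware processing in the notebook relies on cltk). The sample sentence is an invented stand-in for a charter text pulled from the database:

```python
from collections import Counter

from nltk.tokenize import wordpunct_tokenize

# Invented Latin sentence standing in for a real charter text;
# the notebook loops over every charter in the database.
charter_text = (
    "Ego Offa rex Merciorum hanc meam donationem signo sanctae crucis confirmo."
)

# Tokenize, lowercase, and drop punctuation, then count word frequencies.
tokens = [t.lower() for t in wordpunct_tokenize(charter_text) if t.isalpha()]
print(Counter(tokens).most_common(5))
```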
