This is a write-up of a script I wrote for my RA work, demonstrating how regular expressions in Python can be used to clean and process OCR text with many errors in order to generate a workable dataset. The goal in this specific example is to clean US Senate testimony to make a dataset listing the speaker in one column with their testimony in the next column. I also show how to categorize the comments by the section of testimony they are in and how to give an index for those sections. The script is available on GitHub.
The script as written requires an input file called "V1". In this case, the file is OCR'd text of Senate testimony from 1913. The text names the speaker at the start of each comment (e.g. "Senator Gallinger") and then gives the comment. There are also various section breaks (e.g. "TESTIMONY OF TRUMAN G. PALMER—Continued."). These comments and section breaks are the text data we are interested in...
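To give a flavor of the approach, here is a minimal sketch of splitting testimony into speaker and comment rows with a running section index. It is not the script from GitHub: the two patterns, the function name `parse_testimony`, and the speaker prefixes "Mr." and "The" are my assumptions for illustration, and the sample lines are made up (only the names come from the examples above).

```python
import re

# Hypothetical patterns for illustration; the real script's expressions may differ.
# A section break is assumed to be an all-caps "TESTIMONY OF ..." heading.
SECTION_RE = re.compile(r"^TESTIMONY OF [A-Z .,']+(?:—Continued)?\.?$")
# A comment is assumed to open with a title, a surname, and a period.
SPEAKER_RE = re.compile(r"^((?:Senator|Mr\.|The) [A-Z][A-Za-z]+)\.\s+(.*)$")


def parse_testimony(text):
    """Return rows of [section_index, section, speaker, comment]."""
    rows = []
    section, section_index = None, 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if SECTION_RE.match(line):
            # A new heading starts a new section and bumps the index.
            section = line
            section_index += 1
            continue
        speaker_match = SPEAKER_RE.match(line)
        if speaker_match:
            rows.append([section_index, section,
                         speaker_match.group(1), speaker_match.group(2)])
        elif rows:
            # No speaker prefix: treat the line as a continuation
            # of the previous comment.
            rows[-1][3] += " " + line
    return rows


# Made-up sample lines, not taken from the actual 1913 testimony.
sample = (
    "TESTIMONY OF TRUMAN G. PALMER—Continued.\n"
    "Senator Gallinger. I have a question.\n"
    "It concerns sugar.\n"
    "Mr. Palmer. Yes, sir."
)
rows = parse_testimony(sample)
```

Real OCR output will need looser patterns than these (misread letters, stray punctuation, broken headings), which is where most of the cleaning effort goes.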
For my RA work this summer, I developed this Python script which links a historical firm to its suggested Wikipedia information in order to predict the firm's founding date. Here is a quick write-up on the script, which is available on GitHub.
The script as written requires an input file called "CompanyList". This file should be a utf-16 encoded .txt file containing a list of the companies to be searched, with one company on each line.
The output file will be a utf-8 encoded .txt file called "WikipediaFoundingDates". You can initialize this file beforehand by creating a blank .txt file of this name. After the script runs, the updated file will have a semicolon-separated list that can be easily imported into other software for analysis...
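As a rough sketch of the file handling only (the Wikipedia lookup itself is not shown), the input and output formats described above could be read and written like this. The function names and the two-column row layout are my assumptions, not necessarily what the script on GitHub does.

```python
import csv
from pathlib import Path


def read_company_list(path):
    """Read one company name per line from a utf-16 encoded text file."""
    text = Path(path).read_text(encoding="utf-16")
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_founding_dates(path, rows):
    """Write rows as semicolon-separated lines in a utf-8 encoded text file.

    Each row is assumed to be a (company, founding_date) pair.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter=";", lineterminator="\n")
        writer.writerows(rows)
```

Using the `csv` module rather than joining strings by hand means a company name that itself contains a semicolon gets quoted instead of silently breaking the columns.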
Exactly a year ago, I was lucky enough to meet Sir Roger Penrose after a talk he gave in Oxford on his book Fashion, Faith, and Fantasy in the New Physics of the Universe. This was a huge deal for me because I've been so inspired by Penrose over the years. I was recently digging through some old files and found this biographical essay I wrote on Penrose in 2013, which I thought I would share in honor of the occasion. Funnily enough, I contacted him when writing this essay to see if I could get some quotes but never got a response!
If the multitudes of LaTeX bibtex* bibliography styles don't suit you, never fear! It's easy and exciting to make your own bibliography style (.bst) in just a couple of minutes, or even hours if you really get into it! Here's how I did it:
WHAT YOU'LL NEED
A Unix computer (e.g. a Mac)
With MacTeX installed (MacTeX is the free LaTeX distribution for Macs)
In late March 2017, I flew alone to Kathmandu and then to Lukla to complete the Everest Base Camp trek in 13 days. I wrote this quick guide for others interested in the trek, particularly for those considering trekking independently (no porter, no guide) and solo (no trekking partners). I hope it can serve as a resource for you as you prepare for your adventure.
One of the most difficult aspects of the transition from undergraduate work to graduate work has been the massive increase in papers and books I have to read for class. After talking with my coursemates and exploring methods suggested on the internet, I developed this simple note-taking "grid" to use to standardize the information I collect from the various readings I do. The main point of the grid is to make reading more efficient (i.e. to prevent me from taking notes on every little thing, which is my default), to focus attention on deeper questions about the work, and to neatly keep track of the connections between readings. Feel free to download and edit this for your own use if you think it might be helpful!
I am currently serving as one of the bar managers for the student-run bar at my Oxford college. We needed some new stock-keeping books, so I developed this system after reading online about some best practices. I wanted to share this template I made in case other bar managers need a similar Excel sheet and don't want to reinvent the wheel!
The basic idea is to record a stock inventory at the beginning of a time period, to note any purchases made throughout the period, and then to record stock at the end of the period. The sheet calculates the "Usage Cost" of the goods purchased as well as the expected revenue from the sales. For each period, use a new Excel sheet, leaving the basic information (in green) unchanged. Annual totals can then be calculated across sheets in the workbook.
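For anyone who prefers to see the arithmetic spelled out, here is a small Python sketch of the calculation for a single item. The formulas are my assumption of the usual stock-take logic (units used = opening stock + purchases - closing stock) and the function and field names are hypothetical; the Excel template itself may lay things out differently.

```python
def period_summary(opening_stock, purchases, closing_stock, unit_cost, sale_price):
    """Summarize one item over one stock-keeping period.

    Assumes units used = opening stock + purchases - closing stock,
    and that every unit used was sold at the listed sale price.
    """
    units_used = opening_stock + purchases - closing_stock
    return {
        "units_used": units_used,
        "usage_cost": units_used * unit_cost,
        "expected_revenue": units_used * sale_price,
    }


# Made-up figures: 10 bottles at the start, 24 bought, 8 left at the end.
summary = period_summary(10, 24, 8, unit_cost=1.5, sale_price=4.0)
```

Comparing the expected revenue against what the till actually took in is one way to spot wastage or mis-rung sales over the period.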