Blog

This is an update to my November 21, 2022 post.

Unfortunately, after several unsuccessful tries, I discovered that the function MAX (like any other date function) does not work in Excel for dates before 1900. In summary, Excel reads dates as numbers, and it is designed to handle only dates between 1900 and 9999. It's so incredibly useful to have 9999 and not 1899 (sarcasm). The program does not work well for analyzing historical dates written in the format "mmm dd, yyyy" or ANY other format with a year value of less than 1900. According to several blogs, there is a way around it: transforming the year by adding 1,000 (or 2,000 according to others, for some issue regarding leap years that I refuse to try to understand). In this way, 1876 becomes 2876 or 3876. Now, for my specific purpose (finding the most recent date among a series and automatically pasting it in a specific cell), this is too much of a complication.
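For comparison, Python's datetime types have no 1900 cutoff, so the "latest date" lookup that fails in Excel is trivial there. (Incidentally, the 2,000-year offset mentioned in those blogs works because the Gregorian leap-year pattern repeats every 400 years, and 2,000 is a multiple of 400; 1,000 is not.) A minimal sketch, with made-up sample dates:

```python
from datetime import date

# Hypothetical end-of-term dates for one convict (discharge, pardon,
# escape, transfer) -- values invented for illustration.
candidate_dates = [date(1878, 3, 23), date(1876, 1, 2), date(1879, 12, 30)]

# Unlike Excel's MAX, Python's max() handles pre-1900 dates with no workaround.
last_date = max(candidate_dates)
print(last_date)  # → 1879-12-30
```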

So, not only did I not save any time in building my database, but I even lost five hours figuring out that it was not possible. Nevertheless, it was worth trying.

In the past, I used historical data in my databases (the 900 members of the labor union in Galveston, and the 5,000 deserters among Connecticut Union soldiers), but at that time I was using Numbers (the Mac spreadsheet) instead of Excel. I wonder if I should use Numbers and then save the database as an Excel file. I will run some tests over the Thanksgiving break.

During my consultation with Catherine Barber, she came up with the idea of using the function MAX to automatically copy the date of discharge, pardon, escape, or transfer and paste it as the last date of work at a given work camp. It seems like a minor achievement, but it can amount to many hours when we think of doing the task manually 40,000 times. It is not just the typing time; it is also the scrolling time in a spreadsheet whose columns run from A to UM, and retrieving the date in the original document a second time because I cannot trust my memory. Overall, I think I can save about 110 hours by using that little trick (at a rate of 6 dates per minute). That motivates me to think about more ways I can make my life easier by using Excel functions well before the data analysis begins.

When I applied the formula, I realized that several date columns were improperly formatted (somehow Excel read the dates as text even though the cells were formatted as dates). Not only did that cause the formula to fail, but the problem also matters because it prevents other calculations (like the time duration of events) that I want to perform once the database is completed. I had to reformat the cells properly, choosing a custom format that allows me to write the dates exactly as they appear in the documents (short month, day, year: e.g., Mar 23, 1878). I had to re-learn (bad memory sucks) that Excel uses a code for dates based on the repetition of the basic symbols for day, month, and year ("d," "m," "y") to understand which format I want: "d," for example, stands for the format 1-31, "dd" for the format 01-31; "mmmm" is the format for the full name of the month, "mmm" the one for the abbreviated name (the one I am using: e.g., Mar and not March), and so on. The format I am currently using is: [$-en-US]mmm d, yyyy;@ --> I am writing it here so I won't forget it, and I can apply it again if I introduce new columns along the way.
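For future reference, Excel's "mmm d, yyyy" format maps directly onto the codes Python's strptime uses, which again have no trouble with pre-1900 years. A small sketch using a date from the post:

```python
from datetime import datetime

# Excel date code → Python strptime code:
#   "mmm"  (abbreviated month, e.g. Mar)  → %b
#   "d"    (day, 1-31)                    → %d (strptime also accepts 1-digit days)
#   "yyyy" (four-digit year)              → %Y
parsed = datetime.strptime("Mar 23, 1878", "%b %d, %Y")
print(parsed.year, parsed.month, parsed.day)  # → 1878 3 23
```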

Attached below is the chart that Microsoft Support posts (follow the link for more).

Formatting Dates in Excel

 

Today, I met with Catherine Barber on Zoom to

  • discuss my project with a data specialist, since I am not one myself
  • understand if there are any changes I should make in the way I gather data to be able to analyze them proficiently
  • and see if and how Rice can support me through the process. 

These were the questions I brought to the table:

For the analysis, I could select different typologies of convicts to answer different questions. Just as an example: the role of race in punishment and allocation of the convicts; mass escapes vs. individual escapes; the time between acts of pardon and effective discharge; change over time in the type and length of convictions by county; comparison between counties in the punishment of crimes; assessing rehabilitative claims by looking at reincarceration rates; etc.


These are the problems I see:
1) I have 12 Excel files, one for every ledger of the prison, for a total of about 40,000 convicts. I prefer 12 files to one file with 12 sheets just in case something bad happens. At some point, I will need to combine those spreadsheets into one with 40,000 names. I already see that Excel lags when I make changes to files with only 3,000-3,500 names. Is there a better program than Excel for this purpose?
2) I would benefit from some shortcuts in inserting the data. Excel does not seem very helpful for what I would like to automate, but maybe that is because I do not know the program well. Any help or advice on that?
3) I would like to make the database accessible to researchers, when my dissertation is done, by uploading it (or linking to it) on a dedicated website. Will that ever be possible, or does it require platforms too expensive for me to afford?
4) I want to map in GIS the location of the work camps AND their change over time. Moreover, I would like to map the convicts' birthplaces, places of residence, places of conviction, and movements after discharge, and represent those data in a meaningful way. Any advice?
The meeting went great, and I now have a clear path to follow.
Catherine suggested:
1) continue keeping my Excel spreadsheets in separate files with identical variables, so that it will be easy to merge them when needed. Although Excel does not offer many shortcuts, it is still pretty solid software for my needs
2) make multiple backups of my files
3) start preparing now for the time when I will analyze the data. She suggested two open-source languages for data analysis: Python and R. While R is a better tool for statistical analysis, Python is more flexible and better suited to transforming, filtering, selecting, and visualizing big data. Python also works with many other applications, which makes it very adaptable
4) code written in Python is shareable, so it is possible to link the data analysis/visualization to any website/wiki page
5) start learning Python. Rice offers classes online (I already enrolled in a beginner class at the end of November; a two-day full-immersion class will be available in February). She will share with me her list of resources for self-teaching
6) when the database is ready, Fondren Library can register it with a digital object identifier (DOI) that protects my intellectual property by making it easier for other researchers to cite the database
7) look into courses for learning GIS (I did a couple in the past, but I did not have data ready for mapping, so I just forgot everything).
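Suggestion 1) is exactly what makes the eventual merge easy: if all twelve files share the same columns, the pandas library can stack them in a few lines. A minimal sketch; the file names and columns below are hypothetical, and in practice each ledger would be loaded with pd.read_excel() instead of being built by hand:

```python
import pandas as pd

# In practice, something like:
#   frames = [pd.read_excel(f"ledger_{i:02d}.xlsx") for i in range(1, 13)]
# Here, two tiny stand-in frames with identical (hypothetical) columns:
ledger_1 = pd.DataFrame({"name": ["A. Smith", "B. Jones"],
                         "county": ["Harris", "Galveston"]})
ledger_2 = pd.DataFrame({"name": ["C. Brown"], "county": ["Travis"]})

# Stacking works cleanly only because the column names match across files,
# which is why keeping identical variables in every ledger matters.
combined = pd.concat([ledger_1, ledger_2], ignore_index=True)
print(len(combined))  # → 3
```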
The conversation was exciting and very useful. I am really grateful for the time and this tremendous resource that Rice offers. Good to know that I am on track. 
Why this space

Why am I creating this space, and why now, already five months into dissertation work?

First of all, the task of building this database is overwhelming: the long hours in front of the computer working on a massive Excel spreadsheet are taking a toll on my mental and physical health. Reading primary sources is a wonderful way to stay motivated, but the reading comes too early in the process: I have no doubt I will forget sources and questions if I am not organized (which is tough for someone with ADHD), and I do not want to waste time re-reading or searching for a source "I know it's somewhere, but where?" I want to gather notes on the sources in a way that will allow me to easily retrieve those pieces of information at the right time--when I start writing, maybe one year from now--and use them proficiently.

Second, the pandemic has made "virtual" the norm, so I feel that limiting my commuting time and working in my home office is not just acceptable but also the best use of my time. This comes with an annoying feeling of isolation from my professors and my peers. Ideally, this space will allow me to share my findings and processes, and maintain a fruitful conversation with my advisors, W. Caleb McDaniel and Randal Hall.

Third, I hope that by uploading my findings and questions and pinning down my curiosity, I will keep myself accountable to my advisors, my peers, and, foremost, myself.