This is an update to my November 21, 2022 post.
Unfortunately, after several unsuccessful tries, I discovered that the function MAX (and any other function) does not work in Excel for dates before 1900. In summary, Excel reads dates as numbers, and it is created to read dates between 1900 and 9999. It's so incredibly useful to have 9999 and not 1899 (sarcasm). The program does not work well for analyzing historical dates written in the format "mmm dd, yyyy" and ANY other format with a year value of less than 1900. According to several blogs, there is a way around it: Transforming the year by adding 1000 (or 2000 according to others for some issue regarding leap years that I refuse to try to understand). In this way, 1876 becomes 2876 or 3876. Now, for my specific purpose (finding the more recent days among a series and automatically pasting it in a specific cell) this is too much of a complication.
So, now, not only I did not save any time in building my database but I even lost 5 hours trying to figure out it was not possible. Nevertheless, it was worth trying.
In the past, I used historical data in my databases (the 900 members of the labor union in Galveston, and the 5,000 deserters among Connecticut Union soldiers), but at that time I was using Numbers (Mac spreadsheet) instead of Excel. I wonder if I should use Numbers and then save the database as an Excel file. I will make some tests over the Thanksgiving break.
After the consultation with Catherine Barber, she came up with the idea of using the function MAX to automatically copy either the date of discharge, pardon, escape, or transfer and paste it as the last date of work at a given work camp. It seems like a minor achievement, but it can amount at some hours when we think of doing the task manually 40,000 times. It is not just the typing time, it is also the scrolling time in a spreadsheet that has columns from A to UM, and retreaving the date in the original document a second time because I cannot trust my memory. Overall, I think I can save about 110 hours by using that little trick (at a rate of 6 dates per minute). That stimulates me to think about more ways I can make my life easier by using excel functions well before the data analysis begins.
When I applied the formula, I realized that several date columns were improperly formatted (somehow Excel read the dates as text although formatted as dates properly). Not only that caused the formula to fail, but the problem is relevant to me because it prevents other calculations (like time duration of events) that I want to perform when the database is completed. I had to reformat the cells properly, choosing a custom format that allows me to write the dates exactly as they appear in the documents (short month day, year: e.g., Mar 23, 1878). I had to re-learn (bad memory sucks) that Excel uses a code for dates based on the repetition of the basic symbol for day, month, and year–"d," "m," "y"–to make Excel understand which format I want: "d," for example, is for the format 1-31,"dd" is for the format 01-31; while "mmmm" is the format for the full name of the month, "mmm" is the one for the abbreviate name (the one I am using: e. g., Mar and not March), and so on. The format I am currently using is: [$-en-US]mmm d, yyyy;@ --> I am writing it here so I won't forget it, and I can apply it again if I introduce new columns on the way.
Attached below, the chart that Microsoft Support posts (follow the link for more)
Today, I met with Catherine Barber on Zoom to
- discuss my project with a data specialist since I am not
- understand if there are any changes I should make in the way I gather data to be able to analyze them proficiently
- and see if and how Rice can support me through the process.
THESE WERE THE QUESTIONS I BROUGHT TO THE TABLE
For the analysis, I could select different typologies of convicts to answer different questions. Just as an example: role of race in punishment and allocation of the convicts; mass escapes v. Individual escapes; the time between acts of pardon and effective discharge; change over time of type and length of convictions by county; comparison between counties in punishing crimes; assessing re-habilitative claims by looking at re-incarceration rates; etc.
Why am I creating this space, and why now, already five months into dissertation work?
First of all, the task of building this database is overwhelming: The long hours in front of the computer working on a massive Excel spreadsheet are paying a toll on my mental and physical health. Reading primary sources is a wonderful way for self-motivating, but the reading comes too soon into the process: I have no doubt I will forget sources and questions if I am not organized (which is tough for someone with ADHD), and I do not want to waste time re-reading or searching for a source "I know it's somewhere but where?" I want to gather notes on the sources in a way that will allow me to easily retrieve those pieces of information at the right time--when I start writing, maybe one year from now--and use them proficiently.
Second, the pandemic has made "virtual" the norm, so I feel that limiting my commuting time and work in my home office, is not just acceptable, but also the best use of my time. This comes with an annoying feeling of isolation from my professors and my peers. Ideally, this space will allow me to share my findings and processes, and maintain a proficuous conversation with my advisors, W. Caleb McDaniel and Randal Hall.
Third, I hope that by uploading my findings and questions and pinning down my curiosity, I will keep myself accountable to my advisors, my peers, and foremost myself.