Today, I met with Catherine Barber on Zoom to

  • discuss my project with a data specialist, since I am not one myself
  • understand whether there are any changes I should make in the way I gather data so that I can analyze it proficiently
  • and see if and how Rice can support me through the process. 

THESE WERE THE QUESTIONS I BROUGHT TO THE TABLE

For the analysis, I could select different categories of convicts to answer different questions. Just as an example: the role of race in punishment and in the allocation of convicts; mass escapes v. individual escapes; the time between acts of pardon and effective discharge; change over time in the type and length of convictions by county; comparison between counties in punishing crimes; assessing rehabilitative claims by looking at re-incarceration rates; etc.


These are the problems I see:
1) I have 12 Excel files, one for every ledger of the prison, for a total of about 40,000 convicts. I prefer 12 files to one file with 12 sheets, just in case something goes wrong with one of them. At some point, I will need to combine those spreadsheets into one with 40,000 names. I can already see Excel lagging when I make changes to files with only 3,000-3,500 names. Is there a better program than Excel for this task?
2) I would benefit from some shortcuts for entering the data. Excel does not seem very helpful for what I would like to automate, but maybe that is because I do not know the program well. Any help or advice on that?
3) I would like to make the database accessible to researchers, once my dissertation is done, by uploading it (or linking to it) on a dedicated website. Will it ever be possible to do so, or does that require platforms too expensive for me to afford?
4) I want to map in GIS the locations of the work camps AND how they changed over time. Moreover, I would like to map the convicts' birthplaces, places of residence, places of conviction, and movements after discharge, and represent those data in a meaningful way. Any advice? (A rough sketch of the kind of map I am imagining follows.)
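Just to make that last question concrete, here is a rough sketch of the kind of map I have in mind. Everything in it is hypothetical: the file name, the latitude/longitude columns (which would only exist after geocoding the place names in the ledgers), and the county shapefile are all stand-ins.

```python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Hypothetical input: the merged ledger data, with birthplace coordinates
# added by a geocoding step I have not done yet.
convicts = pd.read_excel("ledgers_combined.xlsx")

# Turn the longitude/latitude pairs into point geometries.
births = gpd.GeoDataFrame(
    convicts,
    geometry=gpd.points_from_xy(convicts["birthplace_lon"], convicts["birthplace_lat"]),
    crs="EPSG:4326",  # plain latitude/longitude coordinates
)

# A county shapefile (another stand-in file) provides the base map.
counties = gpd.read_file("counties.shp")
ax = counties.plot(color="whitesmoke", edgecolor="grey", figsize=(8, 8))
births.plot(ax=ax, markersize=2, color="darkred")
ax.set_title("Convict birthplaces (sketch)")
plt.savefig("birthplaces_map.png", dpi=200)
```

The same pattern, repeated for place of residence, place of conviction, and post-discharge locations, would give me the comparative maps I am after.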
The meeting went great, and I now have a clear path to follow.
Catherine suggested:
1) continuing to keep my Excel spreadsheets in separate files with identical variables, so that it will be easy to merge them when needed (a small sketch of how that merge might look follows this list). Although Excel does not offer many shortcuts, it is still pretty solid software for my needs
2) making multiple backups of my files
3) starting to prepare now for the time when I begin analyzing the data. She suggested two open-source languages for data analysis: Python and R. While R is a better tool for statistical analysis, Python is more flexible and better suited to transforming, filtering, selecting, and visualizing big data. Python also integrates with many other applications, which makes it very adaptable.
4) noting that Python code is shareable, so it is possible to link the data analysis/visualization to any website or wiki page
5) starting to learn Python. Rice offers classes online (I already enrolled in a beginner class at the end of November; a two-day full-immersion class will be available in February). She will share her list of self-teaching resources with me.
6) when the database is ready, having Fondren Library register it with a digital object identifier (DOI), which protects my intellectual property by making it easier for other researchers to cite the database
7) looking into courses for learning GIS (I took a couple in the past, but I did not have data ready for mapping, so I have since forgotten everything).
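As a first taste of what suggestions 1 and 3 point toward, here is a minimal sketch of how the 12 ledgers could be merged and queried in Python once I have learned the basics. The file names (ledger_*.xlsx) and column names (county, conviction_year) are hypothetical stand-ins for whatever I end up using.

```python
import glob
import pandas as pd

# Read every ledger spreadsheet; this works only because all 12 files
# keep identical column names, as Catherine advised.
frames = [pd.read_excel(path) for path in sorted(glob.glob("ledger_*.xlsx"))]
convicts = pd.concat(frames, ignore_index=True)
print(len(convicts))  # should approach 40,000 rows once every ledger is entered

# A toy query: convictions per county per decade
# ("county" and "conviction_year" are placeholder column names).
convicts["decade"] = (convicts["conviction_year"] // 10) * 10
summary = convicts.groupby(["county", "decade"]).size().unstack(fill_value=0)
summary.to_csv("convictions_by_county_decade.csv")
```

Even this toy query would speak to one of the questions above (change over time in convictions by county) without asking Excel to chew through 40,000 rows by hand.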
The conversation was exciting and very useful. I am really grateful for her time and for this tremendous resource that Rice offers. It is good to know that I am on track.