In 1998, when I was completing an M.A. thesis on Classical farmsteads, I compiled hundreds of relevant Greek and Latin texts on handwritten 4 x 6” notecards. Running searches on Greek keywords for farms and rural life via the CD-ROM produced by the Thesaurus Linguae Graecae, a comprehensive library of all Greek literary texts produced before 1453, I meticulously copied out notes or transcribed translations of relevant evidence from Thucydides, Demosthenes, Xenophon, etc…
While very little has changed in my method of research (consulting ancient texts to make arguments), it’s amazing how much the basic process of conducting research has shifted in the last two decades. By the time I started work on my doctoral dissertation on the Late Roman Corinthia, I had switched to a laptop and was dumping translations and notes about ancient literary texts directly into Microsoft Word documents. Mining the comprehensive digital library of Greek texts via the TLG and the somewhat less comprehensive collection of Latin texts in the PHI Latin Library, I created a complete list of ancient literary citations related to the island of Kythera and the sites of the Corinthia. As Greek and Latin references to the Corinthia number over 5,000, I was able to type out or copy English translations and notes on a tiny portion of these. I have used these documents for much of the work on Corinth I’ve completed since 2006, but they are pretty messy documents running dozens, if not hundreds, of pages long. One can search these texts via Windows Explorer or Control-F, but they are not easy to browse or search in complex ways.
Last year, I became interested in how to use freeware like Zotero (or commercial software like EndNote) to organize, tag, and annotate large bodies of ancient literary sources. My ambitious plan was to look up and analyze all the Greek and Latin references to isthmuses in antiquity, and I needed a better system for organizing my findings and translations than word processing software. I was particularly interested in creating a large body of English translations that would facilitate complex word searches. I was also interested in creating a library of Corinthia-related texts that would serve the public.
I decided to go with Zotero because it was free and because I was already using it for the Corinthian Studies Library. As an experiment, I timed how long it would take to manually add the ancient citations from my earlier Word documents into Zotero. I experimented for two hours (mind you, I was at the start of a sabbatical). By the second hour, I was able to do about 50 records per hour. Allowing some time for distraction, breaks for coffee, Facebook, and changing diapers (of my 3-month-old), I figured that I might maintain a rate of 30 records per hour, which would require about 100 hours of work to create records for all the ancient citations for the keyword “isthmus” relevant to the Roman period. Add another 100 hours for references from the Archaic to Hellenistic periods and from Late Antiquity. Thus, just creating the records would require hundreds of hours of work.
If my research project had depended on a few dozen citations, it would have been easier to enter these all manually into Zotero, or simply take notes in Word, consulting English translations at Perseus, LacusCurtius, and the Internet Archive, or translating the passages most important to my work. But the project I had in mind was to study the history of the Isthmus of Corinth (and isthmuses generally) through a large-scale compilation of texts, to look at patterns in use over time, and to return frequently to the translations. And I was interested in making some of these English translations public in the end.
I was surprised that there was so little information online about creating massive citation libraries with either Zotero or EndNote. With a little snooping, I discovered the problems with importing bibliography / texts created in Word documents into either Zotero or EndNote (see discussion here and here) and knew I’d need to clean up the records in Excel (I don’t know programming).
After a day or two of experimentation and failure, I figured out how to convert a standard list of citations generated via the TLG database or the PHI Latin database into EndNote and Zotero. Once I figured out the steps, it took me only about a day to import 5,800 records and edit and standardize the citations, and it took that long mainly because I ran into additional problems. Still, even a day was faster than entering them all manually. Now that I’ve done it, and learned the functions in Excel, I can bring additional records into Zotero in an hour or two (longer if the base lists are messy).
Since others may have major research projects in mind (like dissertations or theses) that require mining large quantities of ancient texts, I figured there might be some interest in creating one of these citation libraries. Others may have different methods for doing so, especially if they can program, so this is just the way that I did it.
I’ll write today about how I imported PHI records into Zotero, and in future posts, will cover TLG and English translations.
To replicate this process, you will need MS Word, Excel 2010 (or something comparable), a copy of EndNote (30-day trial versions available for download), and Zotero if you plan to use Zotero in addition to or instead of EndNote. I couldn’t figure out how to import data to Zotero without the aid of EndNote, so that was a necessary middle step. I’ll also assume you have some knowledge of PHI, and know how to write basic functions in Excel 2010 or an equivalent spreadsheet program. If you know how to program (I don’t), a language like Python may simplify the following steps.
*Warning and Disclaimer: Given different operating systems, program versions, etc., I cannot guarantee the following steps will work successfully. I’ll be curious to see if anyone can replicate the steps. Please comment if you do. Remember to save your documents as you go along.
Step 1: Select and Copy Latin Citations via Concordance Feature in PHI
The texts of PHI Latin are freely available and searchable online and can be queried and copied provided that you “use this web site only for personal study and not to make copies except for my personal use under “Fair Use” principles of Copyright law.” If you have the Latin CD-ROM and a program like Silver Mountain Software, the steps are comparable to the online version of PHI. I’ll use the online version as the example.
In PHI, the concordance search returns Author-Work-Citation-Latin text format. Run a keyword search using the Concordance.
Copy the Texts by selecting and scrolling down while holding the shift key, right click – Copy.
Paste into MS Excel (Paste to “Match Destination Formatting” to eliminate the hyperlinks and color). You should end up with something like this:
If there are additional Latin keywords you’d like to include (e.g., Cenchreae, Lechaeum, Isthm-), just repeat this step and dump into Excel.
Step 2: Concatenate Fragmented Latin Text
Excel divides the copied text into three columns, incorrectly deducing a break at the bold word “Corinth”. You’ll need to convert the texts in two ways now: separate author and work-citation from one another via commas, and combine the two columns into a single text using the Concatenate function of Excel (I’m assuming you will want to keep the Latin text).
Start with the latter: concatenate cells B1 and C1 by entering =CONCATENATE(B1,C1) in the first cell of the fourth column (D1), then copy and paste the same formula to all the other cells in that column.
This results in:
Copy this new column 4. Paste “Values” into a new, fifth column (you do this because Column 4 is a Formula that produces values dependent on the other two columns. Once you delete those columns, the formula will not work). Delete Columns 2-4. Result is that we’ve reconnected the Latin, which Excel originally separated:
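For readers who do program, the rejoining in Step 2 can be sketched in a few lines of Python. This is only a sketch under assumptions: it supposes each pasted row has already been read in as a list of cells (citation first, then the text fragments), and the sample row is an invented placeholder, not real PHI output. The function name `join_fragments` is hypothetical.

```python
def join_fragments(row):
    """Rejoin text that the paste split across the later columns."""
    citation, *fragments = row
    # Join with no separator, which is what Excel's CONCATENATE does.
    return [citation, "".join(fragments)]


# Invented placeholder row for illustration only (not real PHI output).
rows = [["Auth.Work.1.2", "text before ", "Corinth and text after"]]
joined = [join_fragments(row) for row in rows]
print(joined[0][1])
```

Because the result is a plain value rather than a formula, there is no need for the extra “Paste Values” pass that the spreadsheet route requires.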
Step 3: Separate Author from Work via Comma
Now, the other conversion is to separate the author from the work + citation. Since the PHI database has output this in a standard way, with the first “.” in column A as separating the author from the work, we just need to change this period to a comma, so that, for example, “Liv.Perioch.1b.9” becomes “Liv,Perioch.1b.9”. That comma will become the basis for delimiting the two in the next step.
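Insert a new column to the right of column A. Then, use the SUBSTITUTE function to replace the first period of A1 with a comma. Here’s the formula you would enter: =SUBSTITUTE(A1,".",",",1). Type the quotation marks in the formula as plain straight quotes; curly quotes copied from a web page will not work in Excel. Looks like this: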
Then copy that function to all the other cells, and you will have commas after each author name. Copy new Column B and Paste Value into new Column C (again, for reasons noted above, so that you can delete the old columns).
Then, delete Columns A and B, and insert a new blank Column to the right of Column A. Once you delimit the “work” from the “author,” the Work will occupy this new column.
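As a rough Python equivalent of this step, the sketch below replaces only the first period with a comma, mirroring the SUBSTITUTE formula, and also shows that a program can split author from work directly at the first period without the intermediate comma. The function names are hypothetical; the sample citation is the one used above.

```python
def first_period_to_comma(citation):
    # The count argument of 1 changes only the first period,
    # like the final argument in =SUBSTITUTE(A1,".",",",1).
    return citation.replace(".", ",", 1)


def split_author_work(citation):
    # partition() splits at the first period only, so the periods
    # inside the work and citation number are left untouched.
    author, _, work = citation.partition(".")
    return author, work


print(first_period_to_comma("Liv.Perioch.1b.9"))  # Liv,Perioch.1b.9
print(split_author_work("Liv.Perioch.1b.9"))      # ('Liv', 'Perioch.1b.9')
```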
Step 4: Delimit Author from Work in Separate Columns
Select Column A. Select the Data tab and then “Text to Columns” option. Where it says “Choose the File Type,” select the Delimited Button.
Select “Next” and check the box next to Comma as your Delimiter:
Hit “Next” and then “Finish”. Result is:
Step 5: Edit Authors and Titles
This is your chance to edit the text before you import it into EndNote and Zotero. It’s much easier to edit all the records now than to edit records individually in Zotero or EndNote. For example, you may want to use Replace All to change name abbreviations like “Cic” to “Cicero”, or work titles like “Sat.” to “Satyricon”. Or you may want to sort by author or work, and add a Year column for the work (I’ve simply inserted “1” as the year for the sake of explaining this in the image below). You might also add a Keywords column with a value like “PHI: Corinth”.
Insert a new row 1 at top of spreadsheet with the key heading words shown in the image below. EndNote will use these headers to interpret where the values go during the import. The spelling of these headers must be exact or there will be problems in importing.
Finally, insert a new column 1 and title it “Reference Type.” Beneath this, for all the records, paste the value “Ancient Text” like the following (“Book” will also work as a recognized value):
When you are finished editing, save as a Text (Tab Delimited File).
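If you are working in Python rather than Excel, the same tab-delimited file can be produced with the standard `csv` module. Treat the following as a sketch: the header names in `HEADERS` are my assumption of plausible EndNote field names, since the exact headers appear only in the image, and the abstract text is a placeholder. The function name `to_tab_delimited` is hypothetical.

```python
import csv
import io

# ASSUMED header names; check them against the headers EndNote expects.
HEADERS = ["Reference Type", "Author", "Title", "Year", "Abstract", "Keywords"]


def to_tab_delimited(records, reference_type="Ancient Text"):
    """Return the records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    for rec in records:
        # The same reference type is written for every record.
        writer.writerow([reference_type, rec["author"], rec["title"],
                         rec["year"], rec["abstract"], rec["keywords"]])
    return buffer.getvalue()


# Placeholder record for illustration; the abstract is not real Latin text.
sample = [{"author": "Liv", "title": "Perioch.1b.9", "year": "1",
           "abstract": "placeholder text", "keywords": "PHI: Corinth"}]
print(to_tab_delimited(sample))
```

To save the result, write the returned string to a `.txt` file and select that file in the import step.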
Step 6: Clean Up in Word
I am not sure this step is necessary, but this YouTube tutorial video suggests you need to clean up the text by eliminating or replacing all quotations, apostrophes, wildcards, and the word “and”. I was able to import texts successfully without this step, so if you have problems in Step 7, return to Step 6 and see if it makes a difference. Note that replacing the word “and” with \\ as the video recommends will affect some words: e.g., “Periandrus” would become “Peri\\drus”.
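One way to avoid mangling names like “Periandrus” is to replace “and” only where it stands as a whole word. The sketch below does this with a regular-expression word boundary in Python. It assumes the replacement really should be two backslashes, as described above; whether EndNote needs this cleanup at all remains uncertain. The function name is hypothetical.

```python
import re

REPLACEMENT = "\\\\"  # two literal backslash characters


def replace_whole_word_and(text):
    # \b matches only at word boundaries, so the letters "and"
    # inside a longer word such as "Periandrus" are left alone.
    # A function supplies the replacement so that the backslashes
    # are inserted literally rather than read as escape sequences.
    return re.sub(r"\band\b", lambda match: REPLACEMENT, text)


print(replace_whole_word_and("Periandrus and Cypselus"))
```

The match is case-sensitive, so a capitalized “And” would be left as it is.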
Step 7: Import to EndNote
To import into EndNote, select “File” tab –> “Import” –> “File.” Select your tab-delimited text file. For Import Option, select “Tab Delimited.” Duplicates: “Import All”. Text Translation: “No Translation.”
You should end up with something like the following. You can tinker with the options at top to display abstract and title.
Step 8: Export to Zotero
For this step, Zotero has provided documentation here.
To export to Zotero, click on Edit –> Output Styles –> Open Style Manager. Make sure RefMan (RIS) Export is selected. Close the Style Manager. Another acceptable export Style is BibTeX.
Select File –> Export. Select file. Save as type: Select “Text” (give file a new name). Output Style: RefMan (RIS) Export. Uncheck the box “Export Selected Records,” and EndNote will assume you want to export all records. Click “Save.”
EndNote exports it as a text file in a new format.
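For orientation, RIS is a plain-text format in which each line carries a two-letter tag, two spaces, a hyphen, a space, and then the value, and each record ends with an ER tag. The Python sketch below writes one such record so you can recognize what the exported file should roughly look like. It is an illustration, not the route I used: I have not checked that it matches EndNote’s export field for field, and the choice of the BOOK type and of these particular tags is an assumption. The function name `to_ris` is hypothetical.

```python
def to_ris(record):
    """Format one record as a minimal RIS entry (assumed tag choices)."""
    lines = [
        "TY  - BOOK",                      # reference type (assumed)
        f"AU  - {record['author']}",       # author
        f"TI  - {record['title']}",        # title
        f"PY  - {record['year']}",         # year
        f"AB  - {record['abstract']}",     # abstract
        f"KW  - {record['keywords']}",     # keyword
        "ER  - ",                          # end of record
    ]
    return "\n".join(lines) + "\n"


# Placeholder record for illustration; the abstract is not real Latin text.
record = {"author": "Liv", "title": "Perioch.1b.9", "year": "1",
          "abstract": "placeholder text", "keywords": "PHI: Corinth"}
print(to_ris(record))
```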
Step 9: Import to Zotero
The last step is to open Zotero for Firefox or Zotero Stand-Alone.
Click on File –> Import –> select file and click “Open”.
Import should begin immediately.
This is how the records appear afterwards in the Zotero for Firefox version:
And in the Stand-Alone version:
Problems in Importing
Note that there are some bugs with moving RIS files between EndNote and Zotero. The new version of Zotero Stand-Alone sometimes stalls out for users so that the import never completes. See recent discussion at Zotero here, here, and here.
If this happens to you, as it did to me yesterday as I tried to replicate my steps from last fall, you can either try importing the RIS file into Zotero on another computer, or download Firefox and Zotero for Firefox and repeat the import. I did the latter and successfully imported the file right away. I can sync my Zotero Stand-Alone and Zotero for Firefox.
Good luck! Please comment here if you try this with or without success.
Reformatting and converting text in Excel is always fun. For a project I am working on, I am taking care to load primary sources as PDFs into Zotero, which can do a full text search in them. It’s not perfect, but it is useful to be able to search them all at once. I also am scanning archival sources into Evernote, which is remarkably good at handwriting decipherment. I’m picking up references to place names and people this way. So in theory, if you had PDFs of all the ancient sources in Zotero, you could search by ‘Corinth’ and pull them up. That is, of course, not the same as a citation index, but could be useful for someone. There is a lot of power in these tools for people working with a limited corpus.
Richard, great idea, and I’ll mention it as I continue this series about building a library of texts. Have you had any experience using Paper Machines?
I’ll give Paper Machines a whirl. Thanks for the tip.