Another time sink: Project Gutenberg

1 bnielsen
Edited: Sep 16, 4:42 pm

In this earlier thread, Joeb1934 had some interesting ideas about tag mirrors:
https://www.librarything.com/topic/343419

This made me explore my own tag mirrors (i.e. tag mirrors for some of the works in my library), and tags like "Gutenberg" were quite useful. Some of the books tagged with Gutenberg by other LT users couldn't be found on the main Project Gutenberg site, which made me explore a bit more. I found quite a few books and stories on Gutenberg Australia and Gutenberg Canada that are only released there (due to differences in copyright law between the US, Canada and Australia).

I just wanted to share this here in case others, like me, would like to have the text of, say, Brave New World handy while reading a physical copy of the book.

As an aside I found that "Büchergilde Gutenberg" is a publisher and "Gutenberg Books, Rochester" is a book shop :-)

The next part of my time sink project was to add a comment on my books like "Gutenberg, bind 71372" and write a script that looks up 71372 in the Project Gutenberg catalogue and returns the title, giving me a list like this:


...
Gutenberg, bind 70815 I Jordens Indre
Tarzan at the Earth's core
Gutenberg, bind 70815 Tarzan finder en ny Verden
Tarzan at the Earth's core
Gutenberg, bind 71370 Tarzan og hans Dobbeltgænger
Tarzan and the Lion Man
Gutenberg, bind 71372 Tarzan, den sejrende
Tarzan triumphant
...


The "Gutenberg, bind" lines are from my LibraryThing catalogue, and the line after each one is the title Project Gutenberg gives for that number.

The current Gutenberg catalogue is regenerated every day and can be downloaded here:
https://www.gutenberg.org/cache/epub/feeds/pg_catalog.csv

The format is CSV with embedded newlines, so I used the csvformat command (from csvkit) rather than writing a parser myself. That gave me a tab-separated file like:


Text#	Title	Authors
1	The Declaration of Independence of the United States of America	Jefferson, Thomas, 1743-1826
2	The United States Bill of Rights\nThe Ten Original Amendments to the Constitution of the United States	United States
3	John F. Kennedy's Inaugural Address	Kennedy, John F. (John Fitzgerald), 1917-1963
...
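
With the catalogue in that shape, the lookup itself can be quite small. This is only a rough sketch, not the script I actually run: it assumes a tab-separated file with Text#, Title and Authors in that order, and the names pg_number and pg_title are made up for the example.

```shell
# Sketch only: pg_number and pg_title are hypothetical helper names.

# Extract the ebook number from a comment such as "Gutenberg, bind 71372".
pg_number() {
    printf '%s\n' "$1" | sed -n 's/.*Gutenberg, bind \([0-9][0-9]*\).*/\1/p'
}

# Print the title (field 2) of the first row whose first field equals the number.
# $1 = ebook number, $2 = tab-separated file with Text#, Title, Authors.
pg_title() {
    awk -F '\t' -v id="$1" '$1 == id { print $2; exit }' "$2"
}

# Demo on a tiny stand-in file built from two of the catalogue rows shown above.
sample=$(mktemp)
printf '%s\t%s\t%s\n' \
    'Text#' 'Title' 'Authors' \
    '1' 'The Declaration of Independence of the United States of America' 'Jefferson, Thomas, 1743-1826' \
    '3' "John F. Kennedy's Inaugural Address" 'Kennedy, John F. (John Fitzgerald), 1917-1963' \
    > "$sample"
pg_title "$(pg_number 'Gutenberg, bind 3')" "$sample"
# prints: John F. Kennedy's Inaugural Address
rm -f "$sample"
```

If a number is not in the file, pg_title prints nothing, which makes it easy to spot comments with a mistyped number.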

2 bnielsen
Edited: Sep 16, 4:53 pm

Here is the csvformat command I used, in case others find it useful:


cat pg_catalog.csv | tr -d "\r" | csvformat -M '@' -T | tr '\n@' '@\n' | sed -e 's/@/\\n/g' | cut -f1,4,6
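
The tr and sed steps in the middle are easier to follow on a tiny input. A minimal sketch, assuming csvformat -M '@' ends each record with '@' and leaves embedded newlines as they are: the printf input below is a hand-made stand-in for that output, and flatten_records is a made-up name. tr swaps the two characters so every record ends up on one line, and sed then rewrites the leftover '@' as a literal \n.

```shell
# Stand-in for the tail of the pipeline: records end in '@',
# and a field may contain a real newline.
flatten_records() {
    tr '\n@' '@\n' | sed -e 's/@/\\n/g'
}

# Two records; the second has a title that spans two lines.
printf '1\tShort title@2\tFirst line\nSecond line@' | flatten_records
# prints (fields separated by a tab):
# 1	Short title
# 2	First line\nSecond line
```

One caveat with this trick: a '@' that already occurs inside a field would be turned into a line break by tr, so it relies on '@' not appearing in the catalogue data.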