I am looking for a corpus of fiction book blurbs.

It should have at least 10,000 entries, preferably many more.

I do not care whether it comes from a single publisher or many.

It should be annotated with author, publisher, etc. Ideally it would be annotated with genre and subgenre as well.

1 Answer

Blurbs, or short descriptive material to promote a book, may be impossible to share legally due to the individual copyrights of the authors or publishers.

The Goodreads API has many endpoints, and they include this note:

Book cover images, descriptions, and other data from third party sources might be excluded, because we do not have a license to distribute these data via our API.

In contrast, the book metadata can be shared (ISBN, author, publisher, etc). See, for example, the Book-Crossing Dataset.
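As a rough illustration of working with shareable metadata, here is a minimal sketch that indexes book records by ISBN. It assumes a semicolon-delimited, double-quoted CSV with the column names `ISBN`, `Book-Title`, `Book-Author`, and `Publisher`, which is how I understand the Book-Crossing books file to be laid out; check the header of the actual download before relying on it. The two rows, the `SAMPLE` string, and the `load_book_metadata` helper are made up for the example.

```python
import csv
import io

# Placeholder rows in the ASSUMED column layout; not real catalogue data.
SAMPLE = '''"ISBN";"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher"
"0000000001";"Example Title One";"Author One";"2001";"Example Press"
"0000000002";"Example Title Two";"Author Two";"2002";"Sample House"
'''


def load_book_metadata(handle):
    """Return a dict mapping ISBN to title, author, and publisher."""
    reader = csv.DictReader(handle, delimiter=";", quotechar='"')
    return {
        row["ISBN"]: {
            "title": row["Book-Title"],
            "author": row["Book-Author"],
            "publisher": row["Publisher"],
        }
        for row in reader
    }


books = load_book_metadata(io.StringIO(SAMPLE))
print(books["0000000001"]["publisher"])
```

For a real file, you would pass an open file handle instead of the `io.StringIO` wrapper. Note that this gives you metadata only; the blurbs themselves are not part of it.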

So, to get a large data set of blurbs, you would have to contact (large) publishers and ask for access. I noticed you have a university affiliation, so you should mention that the data is for non-commercial purposes.

philshem