
I started this as a class project, but now I want to add items to my movie award database. I got super excited when I saw "Where to get IMDb datasets", but there wasn't birthday information in any of the datasets. Can someone help me find where I can get this data from IMDb?

I also found ftp.fu-berlin.de/pub/misc/movies/database/frozendata, and I appreciate the bio data, but I can't weed through all of that just for birthdays and (maybe) locations.

Thank you

csk
I saw ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/ and I appreciate the bio data, but I can't weed through all of that just for birthdays and (maybe) locations. – tangerine7199 May 21 '19 at 12:14

1 Answer


The R code below worked. It might be a bit cumbersome, but it did the trick:

library(tidyverse)
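# Reading the file in: download and decompress the frozen biographies list
con <- gzcon(url(paste0("ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/",
                        "biographies.list.gz")))
txt <- readLines(con)
close(con)

# Keep each raw line whole in column V1 (read.csv would split the lines at
# commas and cut the birthplace off the "DB:" lines)
biographies <- data.frame(V1 = txt, stringsAsFactors = FALSE)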

# Extract the lines that start with a name (NM:) or a birth date (DB:);
# anchoring with ^ skips bio text that merely contains those letters
test_2 <- biographies %>%
  rownames_to_column("biographies") %>%
  filter(stringr::str_detect(V1, "^(NM|DB):"))

# Create a column that marks whether each line is a name or a birthday
test_2$starts_with <- substr(test_2$V1, 1, 2)

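# Create a grouping variable: each NM line starts a new person, so a running
# count of NM lines gives every person and their DB line the same group number
test_2$group <- cumsum(test_2$starts_with == "NM")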

# Dataframe of just names
test_3 <- subset(test_2, starts_with == "NM")

# Dataframe of just birthdates
test_4 <- subset(test_2, starts_with == "DB")

# Merging them together by group
birthday <- merge(x=test_3, y=test_4, by.x="group", by.y="group", all.x = TRUE)

# Clean up the intermediate data frames (there is no object called `test`)
rm(test_2, test_3, test_4)
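For anyone who wants to check the NM:/DB: pairing logic without tidyverse or the FTP download, here is a minimal base-R sketch of the same technique (keep the tagged lines, group them with a running count of NM: lines, then left-merge names with birth dates). It assumes, without confirming it against the file, that DB: lines look like "DB: <date>, <place>", and splits them at the first comma into a date and a birthplace. The function name `parse_birthdays` and the sample lines are hypothetical.

```r
# Hypothetical helper: pair each "NM:" line with the "DB:" line that follows it.
# Assumption: DB lines have the form "DB: <date>, <place>".
parse_birthdays <- function(lines) {
  # Keep only lines that start with a name or a birth-date tag
  keep <- lines[grepl("^(NM|DB):", lines)]
  tag <- substr(keep, 1, 2)
  value <- trimws(substring(keep, 4))
  # Each NM line starts a new person, so a running count groups the lines
  group <- cumsum(tag == "NM")
  names_df <- data.frame(group = group[tag == "NM"], name = value[tag == "NM"],
                         stringsAsFactors = FALSE)
  births_df <- data.frame(group = group[tag == "DB"], born = value[tag == "DB"],
                          stringsAsFactors = FALSE)
  # all.x = TRUE keeps people with no DB line (their birth fields become NA)
  out <- merge(names_df, births_df, by = "group", all.x = TRUE)
  # Split "date, place" at the first comma; place is NA when there is no comma
  has_place <- grepl(",", out$born)
  out$birth_date <- ifelse(has_place, sub(",.*$", "", out$born), out$born)
  out$birth_place <- ifelse(has_place, trimws(sub("^[^,]*,", "", out$born)), NA)
  out[, c("name", "birth_date", "birth_place")]
}

# Made-up sample lines in the assumed biographies.list layout
sample_lines <- c(
  "NM: Doe, Jane",
  "RN: Jane Q. Doe",
  "DB: 5 May 1950, Springfield, Illinois, USA",
  "BG: Some biography text that mentions DB: but is not a tag.",
  "NM: Roe, Richard",
  "NM: Poe, Pat",
  "DB: 1 January 1900"
)
result <- parse_birthdays(sample_lines)
print(result)
```

If the assumed DB: layout holds, calling `parse_birthdays(txt)` on the downloaded lines should apply the same logic to the full file and return one row per person.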