Scrape Textbooks, Save Money

by th0tnet

The school year too often brings high expenses.  Education should not be exclusive to the privileged!  Projects like Alexandra Elbakyan's Sci Hub (@sci_hub) have done well to liberate millions of excellent research papers from paid, closed sources.  A typical American student may spend hundreds of dollars on textbooks per semester, but it is hard to disrupt an industry that works hard to ensure its products are sold en masse to schools.

Lots of textbooks are available as Kindle e-books.  What's great about Kindle is that it's cross-platform, so you can read books with a native Mac OS X app.  What's also great about Kindle is that it often offers trials of unlimited reading, and sometimes trials of entire books.  This means that for a handful of days, you can browse an entire textbook for free.  And if you can browse it, you can scrape it.

So!

Below is an AppleScript that will open the Kindle.app application on your Mac OS X system and proceed to screenshot every page of your textbook.  It asks for your OS X username and the number of pages, then saves one screenshot per page into a folder named "textbook" on your Desktop.  Make sure you have the textbook ready in your Kindle app, open to the page where you want to start, and make sure not to mess with the computer while the script is running!  It needs some time to do its thing uninterrupted.  The script saves each page as its own PDF, so once it's done, all that's left is to combine the pages into a single textbook PDF.

That last part is a little wonky, so feel free to reach out anytime!

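-- Ask for the account's short name (the home folder name under /Users)
display dialog "enter osx username" default answer ""
set uname to text returned of result

-- Ask for the page count and coerce it to a number for the repeat loop
display dialog "enter number of pages" default answer ""
set pnum to (text returned of result) as integer

-- Create the "textbook" folder on the Desktop to hold the page captures
tell application "Finder"
  activate
  make new folder at folder "Desktop" of folder uname of folder "Users" of startup disk with properties {name:"textbook"}
end tell

-- Bring Kindle to the front and give its window a moment to draw
activate application "Kindle.app"
delay 1

set counter to 0
repeat pnum times
  set counter to counter + 1
  -- Capture the screen as a one-page PDF named after the page counter
  do shell script "screencapture -t pdf " & quoted form of ("/Users/" & uname & "/Desktop/textbook/" & counter & ".pdf")
  -- Key code 124 is the right arrow key, which turns to the next page
  tell application "System Events" to key code 124
  delay 0.3
end repeat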
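
One reason the combining step can feel wonky is ordering: the script names its files 1.pdf, 2.pdf, ... 10.pdf, and a plain alphabetical listing puts 10.pdf before 2.pdf.  Here is a minimal sketch of one way to list the pages in true page order before handing them to whatever PDF merger you like.  The function name pages_in_order is made up for illustration, and the sketch assumes every file in the folder is named <number>.pdf, as the script above names them.

```shell
# List page captures in numeric page order (1, 2, ..., 10) rather than
# alphabetical order (1, 10, 2).
# Assumption: files are named <number>.pdf inside the given folder.
pages_in_order() {
  dir="$1"
  # Reduce each path to its bare page number, sort numerically,
  # then rebuild the full path for each page.
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern if no PDFs exist
    basename "$f" .pdf
  done | sort -n | while read -r n; do
    printf '%s/%s.pdf\n' "$dir" "$n"
  done
}
```

Running pages_in_order "$HOME/Desktop/textbook" prints one path per line, in page order, ready to feed to a merge tool of your choice.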

Code: Scrape_Textbooks.scpt
