Legal Issues

Many sites have implemented mechanisms that attempt to detect and prevent web scraping. InternAloha is in an unusual position because our goal is to increase the visibility of internship opportunities to students. We are therefore confident that any company that offers an internship would be happy for InternAloha to make that opportunity more visible. Furthermore, we use internship position data in a non-commercial manner.

It is unclear what kinds of controls can legally be placed on web scraping.

For our purposes, the takeaway from these sites seems to be:

  1. Internship listings that we can access without logging in are "public", and our specific use is probably protected under "fair use".

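  2. We must avoid committing "Trespass to Chattels", a legal claim (a tort rather than a specific statute) that covers interference with someone else's property, which here means the site's servers. In practice, we need to limit our activities on the site, scrape it at a moderate ("human") rate, and not visit the site too frequently (even while doing development).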

  3. If a site prohibits scraping under its terms of service, and we need to log in to access the data, then we need to obtain explicit permission from the service.

  4. It's not a bad idea for us to contact all of the sites anyway. It's not like we're doing anything bad. In fact, we're doing something they should want to support.
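
The "moderate, human rate" requirement in item 2 can be sketched in code. The following is a minimal illustration only, not InternAloha's actual scraper: the names `PoliteThrottle` and `visit_all` are hypothetical, and the 5 to 15 second delay range is an assumed placeholder rather than a value taken from any site's policy. The idea is to wait a randomized interval between consecutive page visits so that requests never arrive in rapid bursts.

```python
import random
import time
from typing import Callable, List, Optional


class PoliteThrottle:
    """Enforce a randomized minimum gap between successive page visits."""

    def __init__(
        self,
        min_delay_s: float = 5.0,   # assumed placeholder, tune per site
        max_delay_s: float = 15.0,  # assumed placeholder, tune per site
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0 <= min_delay_s <= max_delay_s:
            raise ValueError("need 0 <= min_delay_s <= max_delay_s")
        self._min = min_delay_s
        self._max = max_delay_s
        self._clock = clock
        self._sleep = sleep
        self._rng = rng or random.Random()
        self._last_visit: Optional[float] = None

    def wait(self) -> float:
        """Block until the next visit is allowed; return the seconds slept."""
        slept = 0.0
        if self._last_visit is not None:
            # Pick a fresh random gap each time so visits are not evenly spaced.
            gap = self._rng.uniform(self._min, self._max)
            remaining = gap - (self._clock() - self._last_visit)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_visit = self._clock()
        return slept


def visit_all(
    urls: List[str],
    fetch: Callable[[str], str],
    throttle: PoliteThrottle,
) -> List[str]:
    """Call the caller-supplied fetch function on each URL, pausing between visits."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

The `fetch` argument is whatever function actually retrieves a page, so the same throttle can be shared between production runs and development runs. The `clock` and `sleep` parameters exist only so the pacing logic can be tested without real waiting.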