Review (2021-11-08)
This page provides a written and video summary of a review of scraper code from October 8, 2021.
Choose between super.getValues() and super.getValue().
If you expect a list of matching elements, use super.getValues(). If you are looking for only a single match, then use super.getValue().
Don't use super.getValues() when you only need a single element.
Don't navigate to pages unnecessarily.
Future developers reading your code are trying to understand why each line of code is there. Don't put in lines of code that don't mean anything.
Don't override superclass methods unnecessarily.
Don't include code like this:
async login() {
super.login();
}
A future developer will not understand why this code was written.
Don't propogate the "weird for-loop".
There are a few instances of code like this from the previous scraper implementation:
for (let i = 0; i < 5; i++) {
results.push(await super.getValues( 'h2[class="top-card-layout__title topcard__title"]', 'innerHTML'));
results.push(await super.getValues('a[class="topcard__org-name-link topcard__flavor--black-link"]', 'innerHTML'));
results.push(await super.getValues('span[class="topcard__flavor topcard__flavor--bullet"]', 'innerHTML'));
results.push(await super.getValues('span.topcard__flavor--metadata.posted-time-ago__text', 'innerHTML'));
results.push(await super.getValues('div[class="show-more-less-html__markup show-more-less-html__markup--clamp-after-5"]', 'innerHTML'));
}
This for-loop makes no sense.
Use vertical space efficiently
Don't use up half a screen of code on logging and unused code:
this.log.debug('Got data:');
this.log.debug('position', position);
this.log.debug('company', company);
this.log.debug('location', location);
this.log.debug('posted', posted);
this.log.debug('description', description);
// let state = '';
// if (!location.match(/([^,]*)/g)[2]) {
// state = 'United States';
// } else {
// state = location.match(/([^,]*)/g)[2].trim();
// }
Be kind to future developers by not filling up a half a screen when one line could do the trick.
Avoid eslint-disable-next-line if possible
And get rid of the comment if it's no longer doing anything:
// eslint-disable-next-line prefer-const
const position = await super.getValues('h1[class="topcard__title"]', 'innerText');
Understand the page navigation patterns
Page navigation can occur in two ways, and you need to understand the differences. See Understand Page Navigation Patterns for details.