Using Node.js to find a new apartment

Introduction

A while ago I decided to look for a new apartment to rent in Zagreb, Croatia. The problem is, finding an apartment in any capital city is very hard. Many apartment owners are switching to services like Airbnb because it can be more profitable for them. Therefore any new apartment whose advertisement pops up on Croatian advertisement sites such as oglasnik.hr or njuskalo.hr gets taken VERY fast.

I have a friend who was looking for an apartment, and he decided to write a script: every few minutes the server would visit a particular njuskalo.hr URL that contained filters in its params (example: ?max_price=2500 etc.) and check for new apartment ads. If there were any new ones, it would notify him by email. He was a Ruby developer, so he wrote that script in Ruby. I am primarily a JavaScript frontend developer with more or less basic knowledge of Node.js. So after a dozen or so phone calls that ended with the apartment already being taken, I decided to write a similar script myself, this time for oglasnik (although I plan to expand my script to support Njuškalo too).

First attempts

The first version of the script was a simple frontend React app with one component. On component mount I would start an interval that fetched the page’s HTML using axios. When the response arrived, I compared it to the current state, and if they weren’t the same (a literal === comparison of two HTML strings) I would show a simple alert, and that’s it.

Obviously, this solution had flaws:

  • The initial state was empty, so the very first response always resulted in an alert (not a big flaw but still).
  • For the code to run I had to keep the app open in a browser tab all the time. Otherwise it wasn’t checking.
  • It threw an error at first (a CORS error), which I fixed with the Allow CORS: Access-Control-Allow-Origin Chrome extension.
  • It was just comparing an HTML string; it wasn’t parsing it for new ads or anything.

Instead, I wanted my solution to be able to:

  • Check for new ads all the time, not just when the app is running in a browser.
  • Actually parse the HTML, detect new ads and send them to me via email.

So I decided to play around with Node.js a bit. At the end of it all, I was surprised at how easy it was. Sure, there were some roadblocks with AWS because I’ve never worked with it before but overall, I would describe the entire process as easy.

Creating base concept for the real thing

I decided fairly quickly that I might want to support more than one user looking for new ads. Also, one user might be watching multiple different searches. So, to start, I made this structure:

const users = [
  {
    email: "exampleFirstEmail@gmail.com",
    oglasnikURLs: ["https://www.oglasnik.hr/exampleUrl1"]
  },
  {
    email: "exampleSecondEmail@gmail.com",
    oglasnikURLs: [
      "https://www.oglasnik.hr/exampleUrl2",
      "https://www.oglasnik.hr/exampleUrl3"
    ]
  }
];

module.exports = users;

So the final goal is to have one main function that will run every 5 minutes and do the following:

  1. Get the “old” state
    • Before the very first run this will be just an empty array
    • The state holds the last known list of ads and is saved in a JSON file. For local testing we can read it with the classic fs.readFileSync. In production we will run on AWS, so we will define an S3 bucket and read the file with a small getS3Object helper built on the AWS SDK
  2. Get the new state
    • We will call all the URLs, parse their data and get the new state, which is structurally the same as the old state
  3. Check if old state and new state are equal
    • If they are, it means there are no new ads. Do nothing then (or just log it in the console).
  4. If old state and new state are not equal
    • Get their differences, which will be the new ads
    • Send emails containing the new ads
    • Overwrite the old state file with the new state. As with reading, locally we can use fs.writeFileSync, but in production we will need a writeS3Object helper
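The steps above can be sketched as one small function. This is only a rough outline under my own assumptions: runOnce and all of its parameter names are hypothetical, every dependency is passed in so the flow can be tried without network, S3 or SES, and Node’s util.isDeepStrictEqual stands in for the equality check.

```javascript
const { isDeepStrictEqual } = require("util");

// One pass of the check. All dependencies are injected (hypothetical names),
// so the same flow works with local files or with AWS helpers.
async function runOnce({ readState, fetchState, diffStates, notify, writeState }) {
  const oldState = await readState(); // step 1: last known ads
  const newState = await fetchState(); // step 2: ads currently on the site

  // step 3: nothing changed, nothing to do
  if (isDeepStrictEqual(oldState, newState)) {
    console.log("New state and old state are equal, not sending any emails.");
    return false;
  }

  // step 4: find the new ads, notify, then remember the new state
  const differences = diffStates(oldState, newState);
  await notify(differences);
  await writeState(newState);
  return true;
}
```

Returning true or false is just a convenience for logging and testing; the real function does not need to return anything.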

Making it work locally

Before going into the main function, install the serverless npm package globally, create a new folder and run serverless create my-app-name. That will create a few files we will need later for setting up AWS.

Now, let’s create our function which we will (for now) run in node locally.

// app.js
function run() {
}

run()

Let’s create a file called state.json that contains just an empty array to start with. Then read the file in our function:

const fs = require("fs");

const oldStateFile = fs.readFileSync("./state.json");
const oldState = JSON.parse(oldStateFile);

Now we have to get the new state. We are going to create a function that takes our users array from the start, visits each user’s oglasnik URLs and gives us the new state:

async function getAdsForAllUsers(users) {
  const adsForAllUsers = users.map(async user => {
    const oglasnikURLs = user.oglasnikURLs.map(async url => ({
      url,
      ads: await getListOfAdsFromOglasnik(url)
    }));

    return {
      email: user.email,
      oglasnikURLs: await Promise.all(oglasnikURLs)
    };
  });

  return Promise.all(adsForAllUsers);
}

getListOfAdsFromOglasnik uses axios for http requests and cheerio for parsing the HTML:

const axios = require("axios");
const cheerio = require("cheerio");

async function getListOfAdsFromOglasnik(url) {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);

    const listOfAds = $("#ads-list a")
      .get()
      .map(el => ({
        title: $("h3.classified-title", el).text(),
        price: $("span.price-kn", el).text(),
        info: $("div.info-wrapper", el)
          .text()
          .replace(/  +/g, " ")
          .replace(/\n/g, " "),
        link: $(el).attr("href"),
        image: `https://www.oglasnik.hr${$("div.image-wrapper", el).data(
          "src"
        )}`,
        createdDate: $("span.date", el).text()
      }));
    return listOfAds;
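  } catch (error) {
    // Re-throw with the URL attached so the log shows which page failed.
    throw new Error(`Failed to get ads from ${url}: ${error.message}`);
  }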
}

As you can see, we are getting the title, price, info, link, image and creation date for every ad in the response.

Here is an example of what the new state will look like:

[
  {
    "email": "exampleSecondEmail@gmail.com",
    "oglasnikURLs": [
      {
        "url": "https://www.oglasnik.hr/exampleUrl2",
        "ads": [
          {
            "title": "Example apartment 2 title",
            "price": "2.500 kn",
            "info": "Apartment 2 info",
            "link": "https://www.oglasnik.hr/stanovi-najam/link2",
            "image": "https://www.oglasnik.hr/repository/images/image2.jpg",
            "createdDate": "26.11.2018."
          },
          ...
        ]
      },
      {
        "url": "https://www.oglasnik.hr/exampleUrl3",
        "ads": [
          {
            "title": "Example apartment 3 title",
            "price": "2.100 kn",
            "info": "Apartment 3 info",
            "link": "https://www.oglasnik.hr/stanovi-najam/link3",
            "image": "https://www.oglasnik.hr/repository/images/image3.jpg",
            "createdDate": "26.11.2018."
          },
          ...
        ]
      }
    ]
  }
]

Ok, so now we have old state (which is empty for now) and a new state. Let’s check if they are equal first.

const _ = require("lodash");

...

    if (!_.isEqual(oldState, newState)) {
      ...
    } else {
      console.log("New state and old state are equal, not sending any emails.");
    }
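One thing worth keeping in mind: this check only tells us that something changed, not that something was added. Here is a small sketch of that, using Node’s built-in util.isDeepStrictEqual as a stand-in for _.isEqual (for plain JSON-like data like ours I’d expect them to agree, but that is my assumption). The ads are made up.

```javascript
const { isDeepStrictEqual } = require("util");

// Made-up ad lists for a single URL.
const oldAds = [{ link: "/ad-1" }, { link: "/ad-2" }];
const sameAds = [{ link: "/ad-1" }, { link: "/ad-2" }];
const oneRemoved = [{ link: "/ad-1" }];
const reordered = [{ link: "/ad-2" }, { link: "/ad-1" }];

// Identical content counts as unchanged.
const unchanged = isDeepStrictEqual(oldAds, sameAds);
// A removed ad or a different order both count as a change,
// even though neither one contains a new ad.
const changedByRemoval = !isDeepStrictEqual(oldAds, oneRemoved);
const changedByOrder = !isDeepStrictEqual(oldAds, reordered);

console.log({ unchanged, changedByRemoval, changedByOrder });
```

In the last two cases we still go into the “not equal” branch, but the list of new ads comes out empty, so no email goes out and the state file is simply refreshed.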

If they are not equal, we will get the differences between the two states and send emails with new ads.

Let’s compare them and get their differences:

function compareState(oldState, newState) {
  return newState.reduce((prev, user) => {
    const sameUserInOldState = oldState.find(u => user.email === u.email);
    if (!sameUserInOldState) {
      return [...prev, user];
    }

    const userWithJustNewAds = getUserWithJustNewAds(user, sameUserInOldState);

    return [...prev, userWithJustNewAds];
  }, []);
}

To start, we check whether a user in the new state is missing from the old state. If so, all the ads in all of that user’s URLs are treated as new. If we don’t want to treat all of those ads as “new”, we can return prev unchanged in this branch instead. The best approach, in my opinion, would be a config file that controls this behavior.
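A config-controlled version of that check could look roughly like this. It is only a sketch: the config object, the treatNewUserAdsAsNew flag and the diffUser parameter (a stand-in for the function that picks out one user’s new ads) are names I made up for illustration.

```javascript
// Hypothetical config; the script itself has no such file yet.
const config = { treatNewUserAdsAsNew: false };

// Same idea as compareState, but a user who is missing from the old state
// only produces notifications when the flag is on.
function compareStateWithConfig(oldState, newState, diffUser, cfg = config) {
  return newState.reduce((result, user) => {
    const oldUser = oldState.find(u => u.email === user.email);

    if (!oldUser) {
      // New user: either report everything or report nothing.
      return cfg.treatNewUserAdsAsNew ? [...result, user] : result;
    }

    return [...result, diffUser(user, oldUser)];
  }, []);
}
```

With the flag off, a freshly added user gets no email on the first run; their current ads are just saved as the baseline when the new state is written.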

If the user already exists, we will get just new ads:

function getUserWithJustNewAds(user, sameUserInOldState) {
  const userWithNoAds = {
    ...user,
    oglasnikURLs: []
  };

  return user.oglasnikURLs.reduce((prev, oglasnikURL) => {
    const sameURLInOldState = sameUserInOldState.oglasnikURLs.find(
      oldStateURL => oldStateURL.url === oglasnikURL.url
    );

    if (!sameURLInOldState) {
      return {
        ...prev,
        oglasnikURLs: [...prev.oglasnikURLs, oglasnikURL]
      }
    }

    const newAdsInURL = _.differenceBy(
      oglasnikURL.ads,
      sameURLInOldState.ads,
      "link"
    );

    const URLWithJustNewAds = {
      ...oglasnikURL,
      ads: newAdsInURL
    };

    return {
      ...prev,
      oglasnikURLs: [...prev.oglasnikURLs, URLWithJustNewAds]
    };
  }, userWithNoAds);
}

So when getting only new ads, we first define a user with nothing in their oglasnikURLs array, which we then fill with the new ads found for that user in the new state. Similarly to checking for a new user, we check whether the new state has a URL that isn’t in the old one. If it does, we add every ad under it, because it’s a new URL and all of its ads are new.

And finally, if the same URL exists in both states, we look for new ads using lodash’s differenceBy. It compares the two arrays by link (since we know that every ad has a unique link). Below you can see an example result of comparing two states:

example oldState:

[
  {
    email: "email1@gmail.com",
    oglasnikURLs: [
      {
        url: "url1",
        ads: [{ ad1 }, { ad2 }]
      }
    ]
  }
]

example newState:

[
  {
    email: "email1@gmail.com",
    oglasnikURLs: [
      {
        url: "url1",
        ads: [{ ad1 }, { ad2 }, { ad3 }]
      },
      {
        url: "url2",
        ads: [{ ad4 }, { ad5 }]
      }
    ]
  },
  {
    email: "email2@gmail.com",
    oglasnikURLs: [
      {
        url: "url3",
        ads: [{ ad6 }, { ad7 }]
      }
    ]
  }
]

example stateDifferences:

[
  {
    email: "email1@gmail.com",
    oglasnikURLs: [
      {
        url: "url1",
        ads: [{ ad3 }]
      },
      {
        url: "url2",
        ads: [{ ad4 }, { ad5 }]
      }
    ]
  },
  {
    email: "email2@gmail.com",
    oglasnikURLs: [
      {
        url: "url3",
        ads: [{ ad6 }, { ad7 }]
      }
    ]
  }
]

Notice how stateDifferences doesn’t have ad1 and ad2. That’s because they were already in the old state.
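If you want to see the url1 part of that example actually run, here is a small sketch in plain JavaScript. newAdsByLink is my own stand-in for _.differenceBy(newAds, oldAds, "link"), and the ad objects are made up.

```javascript
// Keep every new ad whose link does not occur in the old list.
function newAdsByLink(newAds, oldAds) {
  const knownLinks = new Set(oldAds.map(ad => ad.link));
  return newAds.filter(ad => !knownLinks.has(ad.link));
}

// Made-up ads mirroring url1 from the example above.
const ad1 = { link: "/ad-1", title: "Ad 1" };
const ad2 = { link: "/ad-2", title: "Ad 2" };
const ad3 = { link: "/ad-3", title: "Ad 3" };

const fresh = newAdsByLink([ad1, ad2, ad3], [ad1, ad2]);
console.log(fresh.map(ad => ad.link)); // [ '/ad-3' ]
```

Because only the link is compared, an ad whose price or title changes but keeps the same link is not reported as new.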

All we have to do now is send emails:

function sendEmailsWithNewAds(stateDifferences) {
  stateDifferences.forEach(user => {
    user.oglasnikURLs.forEach(url => {
      if (url.ads.length > 0) {
        const mailContent = generateMailContent(url);

        const mailOptions = {
          from: "myEmail@gmail.com",
          subject: "New oglasnik ads found!",
          html: mailContent,
          to: user.email
        };

        transporter.sendMail(mailOptions, (err, info) => {
          if (err) {
            console.log("FAILED SENDING EMAIL", mailOptions);
            console.log("FAILED SENDING EMAIL ERROR", err);
          } else {
            console.log(
              "FOLLOWING EMAIL SENT SUCCESSFULLY:",
              mailOptions,
              "------INFO-------",
              info
            );
          }
        });
      }
    });
  });
}

There are two things to notice in the function above. The first one is generateMailContent, which just returns an HTML string:

function generateMailContent(url) {
  let mailContent = `<p>There are new oglasnik ads found on this page: <a target='_blank' href='${
    url.url
  }'>Link</a></p>`;

  url.ads.forEach(ad => {
    const adInMail = `<h3>${ad.title}</h3><h4>${ad.info}</h4><img src='${
      ad.image
    }' /><h5>${ad.price}</h5><p>Ad created date: ${
      ad.createdDate
    }</p><p><a target='_blank' href='${ad.link}'>Link to this ad</a></p><hr />`;

    mailContent += adInMail;
  });

  return mailContent;
}
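One possible improvement here: the title, info and price are scraped text that gets dropped straight into HTML, so a stray < or & in an ad could mess up the email. A minimal sketch of escaping those values first could look like this. escapeHtml and renderAd are hypothetical helpers, not part of the script above.

```javascript
// Minimal HTML escaping for text interpolated into the mail body.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical, shortened variant of a single ad block with escaped fields.
function renderAd(ad) {
  return (
    `<h3>${escapeHtml(ad.title)}</h3>` +
    `<p><a target='_blank' href='${escapeHtml(ad.link)}'>Link to this ad</a></p>`
  );
}
```

The same helper could wrap every ${...} in generateMailContent without changing anything else about it.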

And the second one is the transporter which is just nodemailer’s transport with AWS’s Simple Email Service (SES):

const nodemailer = require("nodemailer");
const aws = require("aws-sdk");

const ses = new aws.SES();

const transporter = nodemailer.createTransport({ SES: ses });

module.exports = transporter;

Note that this won’t work locally out of the box: the transporter calls AWS’s SES, so it needs AWS credentials and a configured SES account, which we only set up in the AWS section below.

And the last thing left to do is to overwrite the old state with the new one:

fs.writeFileSync(
  "./state.json",
  JSON.stringify(newState, null, 2)
);

Setting it up for AWS

Before going into this, I feel like I have to repeat what I said earlier. I am primarily a frontend developer and I am still very unfamiliar with AWS. Therefore, this part was fairly tricky for me and I probably shouldn’t be the one teaching you about AWS. That’s kinda the whole point though. You can relatively easily make this even if you’re not an experienced backend developer.

So I’m just going to list out the things you have to do, as well as the changes in your code you have to make in order for it to work.

If you remember, at the start I told you to create a serverless project, which generated a few files. The files we will be looking into now are serverless.yml and handler.js.

Go to serverless.yml and change the name of the service to whatever name you want. Then scroll down and look for something like this:

functions:
  hello:
    handler: handler.hello

You can also change the name of that handler, I didn’t purely because it was my first time with AWS and I didn’t dare to touch too many things.

Next, make sure that you uncomment events and schedule and set it to this:

    events:
      - schedule: rate(5 minutes)

This will make your lambda function run every 5 minutes.

Next, delete the run() call at the bottom of app.js and export the function instead (module.exports = run). Then import it in your handler.js and run it there.

const run = require("./app");

// An async handler lets Lambda wait until run() has finished
// before the invocation ends.
module.exports.hello = async () => {
  await run();
};

For the next part, you’re gonna have to register on AWS and setup the following services:

  • Simple Email Service (SES) for sending emails containing new ads
  • S3 access for getting the old state and overwriting it with the new state

When creating an S3 bucket, you’re gonna have to pick a unique name, as well as set up the permissions for it. You have to be able to read and write files. Also, you’re gonna have to replace fs.readFileSync and fs.writeFileSync:

const AWS = require("aws-sdk");

const S3 = new AWS.S3();

function writeS3Object(bucket, key, body) {
  return S3.putObject({
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentType: "application/json"
  }).promise();
}

function getS3Object(bucket, key) {
  return S3.getObject({
    Bucket: bucket,
    Key: key,
    ResponseContentType: "application/json"
  })
    .promise()
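    .then(file => file.Body.toString("utf-8"))
    // If the object can't be read (e.g. on the very first run), create it
    // and return the same empty-array JSON string the caller expects.
    .catch(() => writeS3Object(bucket, key, "[]").then(() => "[]"));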
}

module.exports = {
  BUCKET: "name-of-my-bucket",
  OBJECT_KEY: "state.json",
  getS3Object,
  writeS3Object
};

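So, instead of fs.readFileSync and fs.writeFileSync (note that run now has to be an async function, since both helpers return promises):

// "./awsFs" is an assumed path for the module defined above.
const awsFs = require("./awsFs");
const { getS3Object, writeS3Object, BUCKET, OBJECT_KEY } = awsFs;

    const file = await getS3Object(BUCKET, OBJECT_KEY);
    const oldState = JSON.parse(file);
    ...
    await writeS3Object(BUCKET, OBJECT_KEY, JSON.stringify(newState, null, 2));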
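If you want to keep testing locally after this change, one option is a file-based pair of helpers with the same call shape as getS3Object and writeS3Object. This is just a sketch under my own assumptions: getLocalObject and writeLocalObject are hypothetical names, and the “bucket” here is simply a directory.

```javascript
const fs = require("fs");
const path = require("path");

// Local stand-in for writeS3Object: "bucket" is a directory,
// "key" is a file name inside it.
function writeLocalObject(bucket, key, body) {
  return fs.promises.writeFile(path.join(bucket, key), body);
}

// Local stand-in for getS3Object. If the file can't be read (for example
// on the very first run), create it and return an empty-array JSON string.
function getLocalObject(bucket, key) {
  return fs.promises
    .readFile(path.join(bucket, key), "utf-8")
    .catch(() => writeLocalObject(bucket, key, "[]").then(() => "[]"));
}
```

Since both pairs return promises and take the same arguments, the main function can use either one without any other changes.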

And that’s it. You can check out the whole code on https://github.com/WhitePointX/OglasnikServerless. Let me know what you think or how I can improve my code :)
