Building an OpenAI-Powered Product Review Analysis Web App: A Step-by-Step Guide

In this tutorial, we’ll walk through the process of creating a web application that scrapes product reviews from a website and utilizes OpenAI’s API to generate a summarized review opinion. This application can be a valuable tool for consumers looking to quickly understand the sentiments surrounding a product based on existing reviews.

Important Disclaimer: This tutorial is for educational purposes only. Scraping data from websites may violate their terms and conditions (T&Cs). It’s crucial to always check the T&Cs of any website before scraping data.

Here we will demonstrate by scraping the product page reviews on my website, which I built specifically for this post, so you are free to scrape it. Below is the page link:

https://codewithmarish.com/playground/scrape-reviews

Prerequisites

  • You need an OpenAI API key. Please refer to the official docs for creating the key and setting it up for your project:

https://platform.openai.com/

Step 1: Writing the Backend Code

Let's start by creating a Node.js project (run npm init in your project directory) and installing the dependencies express, cors, openai, and puppeteer.
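The setup above can be run as follows; the -y flag simply accepts npm's default answers (a plain npm init works too):

```shell
# Create a new Node.js project with default settings
npm init -y

# Install the dependencies used in this tutorial
npm install express cors openai puppeteer
```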

Importing Dependencies

import cors from "cors";
import express from "express";
import puppeteer, { ElementHandle } from "puppeteer";
import OpenAI from "openai";
  • cors: Middleware for handling cross-origin resource sharing.
  • express: Node.js framework for building web applications.
  • puppeteer: A Node.js library that lets you automate tasks such as web scraping, form submission, UI testing, and general website interaction.
  • openai: Node.js client for interacting with the OpenAI API.

Initializing OpenAI Client and Defining Prompt

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const prompt =
  "Below are the reviews for a product, provide your review in a short paragraph";
  • Initialize the OpenAI client with your API key, read from the OPENAI_API_KEY environment variable.
  • Define a prompt that will be used to generate a summarized review opinion.

Web Scraping Function

const crawlWebsite = async (url: string) => {
  // Puppeteer code for scraping product reviews
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();
  console.log("Page loading...");
  await page.setViewport({ width: 1280, height: 968, deviceScaleFactor: 1 });
  await page.goto(url, {
    waitUntil: "networkidle0",
  });
  console.log("scraping...");
  const data = [];

  let filterSel = await page.$("select");
  await filterSel?.scrollIntoView();
  const filters = ["neutral", "positive", "negative"];
  for (let j = 0; j < filters.length; j++) {
    await filterSel?.scrollIntoView();
    await filterSel?.type(filters[j]);
    //Based on the html structure, our reviews div is a sibling of the div that has a child h2 with text Customer Reviews
    let customerReviewSelector = `//div[h2[text()='Customer Reviews']]/following-sibling::div`;
    await page.waitForXPath(customerReviewSelector);
    let [firstel, secondel] = await page.$x(customerReviewSelector);
    await (secondel as ElementHandle<Element>).scrollIntoView();

    let paginationButtons = await secondel.$$("button");
    await (firstel as ElementHandle<Element>).scrollIntoView();
    for (let k = 0; k < paginationButtons.length; k++) {
      await paginationButtons[k].scrollIntoView();
      await paginationButtons[k].click();
      let reviewsComp = await firstel.$$("div");
      for (let i = 0; i < reviewsComp.length; i++) {
        try {
          if (!(await reviewsComp[i].isVisible())) {
            await reviewsComp[i].scrollIntoView();
          }
          let childDivs = await reviewsComp[i].$$("p");

          if (childDivs.length >= 2) {
            let review = await childDivs[0]?.evaluate((t) => {
              return t.innerText;
            });
            let rating = await childDivs[1]?.evaluate((t) => {
              return t.innerText;
            });
            data.push(`${review} with ${rating}.`);
          }
        } catch (err) {
          console.log("Error: ", err, i, j, k);
        }
      }
    }
  }
  await browser.close();

  return data;  
};

This function is responsible for scraping product reviews from a given URL using Puppeteer, a headless browser automation library. Let's get into it step by step:

  1. Launching Browser:
  • Puppeteer’s launch method is used to launch a new browser instance.
  • { headless: "new" } runs the browser in Puppeteer's new headless mode, i.e. without a visible browser window.

2. Creating Page:

  • browser.newPage() creates a new page instance within the browser.
  • await page.setViewport({ width: 1280, height: 968, deviceScaleFactor: 1 }) sets the viewport size of the page.

3. Navigating to URL:

  • page.goto(url, { waitUntil: "networkidle0" }) tells the browser to navigate to the specified URL and wait until there are no remaining network connections. This ensures that all the content on the page is available before proceeding.

4. Scraping Reviews:

  • const filters = ["neutral", "positive", "negative"] defines an array of review filters. A loop iterates over each filter to fetch reviews of different sentiments.
  • page.$("select") finds the <select> element containing the review filter options. We bring the select element into view with the scrollIntoView() method before interacting with it; otherwise we could face exceptions/errors when interacting with an off-screen element. We then select the current filter in the loop using filterSel?.type(filters[j]).
  • customerReviewSelector = "//div[h2[text()='Customer Reviews']]/following-sibling::div" is the XPath selector we use to locate the reviews. Our HTML structure has no specific class names or IDs to identify the review text, so we select based on structure: the div element wrapping the reviews is a sibling of the div that has a child h2 with the text content "Customer Reviews". This selector gives us both the reviews element and the pagination element, as both are siblings of the h2's parent. The technique is helpful because most websites have dynamically generated class names, or lack specific class names or IDs that uniquely identify each element.
  • page.waitForXPath(customerReviewSelector) waits for the presence of the element matching the XPath expression //div[h2[text()='Customer Reviews']]/following-sibling::div.
  • page.$x(customerReviewSelector) finds the elements matching the XPath expression.
  • We also call the scrollIntoView method on these elements. Before calling scrollIntoView, we could first use one of element.isIntersectingViewport(), element.isVisible(), or element.isHidden() to check whether the element is already in view.
  • It then finds the section of the page containing customer reviews and extracts the review text and rating for each review. If there are multiple pages of reviews, it iterates through the pagination buttons and clicks each one with paginationButtons[k].click() to load and scrape reviews from each page.
  • During each iteration, it collects the review text and rating for each review and stores them in an array called data.
  • Finally, the browser instance is closed using browser.close().
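For reference, below is the rough HTML shape the XPath selector assumes (an illustration only; the demo page's real markup and text may differ slightly). The first following sibling holds the review cards, each with two p tags (review text, then rating); the second holds the pagination buttons.

```html
<div>
  <h2>Customer Reviews</h2>
</div>
<!-- first following sibling: review cards, each with two <p> tags -->
<div>
  <div>
    <p>Great sound quality</p>
    <p>5 stars</p>
  </div>
</div>
<!-- second following sibling: pagination buttons -->
<div>
  <button>1</button>
  <button>2</button>
</div>
```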

Generating Review Summary Function

const getAIReview = async (data: string[]) => {
  const completion = await openai.chat.completions.create({
    messages: [
      {
        role: "user",
        content: `${prompt} ${data}`,
      },
    ],
    model: "gpt-3.5-turbo",
  });
  let message = completion.choices[0].message.content;
  return message;
};

The getAIReview function uses OpenAI's GPT-3.5 model to generate a summarized review from the provided review data. It calls openai.chat.completions.create(), passing the prompt and the scraped data together in the messages array. The model property specifies which GPT model to use ("gpt-3.5-turbo" in this case). Upon receiving the completion from the OpenAI API, the generated message is extracted from completion.choices[0].message.content and returned.
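One detail worth noting: the expression ${prompt} ${data} interpolates the array directly into a template literal, which relies on JavaScript's default array-to-string conversion and joins the reviews with commas. The sketch below (with hypothetical sample reviews) shows this, along with an explicit newline join that is often easier for the model to read:

```typescript
// Sketch: how the scraped review strings end up inside the prompt message.
// The sample reviews below are hypothetical placeholders.
const prompt =
  "Below are the reviews for a product, provide your review in a short paragraph";
const data: string[] = [
  "Great sound quality with 5 stars.",
  "Battery drains fast with 2 stars.",
];

// Interpolating an array calls Array.prototype.toString, which joins
// elements with commas and no spaces:
const implicitMessage = `${prompt} ${data}`;

// An explicit join keeps each review on its own line:
const explicitMessage = `${prompt}\n${data.join("\n")}`;
```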

Express App Setup

const app = express();
app.use(cors());
app.use(express.json());
app.post("/reviews-ai-scanner", async (req, res) => {
  try {
    const data = await crawlWebsite(req.body.url);
    const message = await getAIReview(data);
    return res.json({ summary: message });
  } catch (error) {
    console.error("Error:", error);
    return res.status(500).json({ error: "An error occurred" });
  }
});

Set up an Express application and apply middleware for handling CORS and parsing JSON request bodies. The POST endpoint /reviews-ai-scanner handles requests for analyzing product reviews: it calls the crawlWebsite function to scrape the reviews, then the getAIReview function to generate a review summary using OpenAI.

Final code

import cors from "cors";
import express from "express";
import puppeteer, { ElementHandle } from "puppeteer";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const prompt =
  "Below are the reviews for a product, provide your review in a short paragraph";

const crawlWebsite = async (url: string) => {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();
  console.log("Page loading...");
  await page.setViewport({ width: 1280, height: 968, deviceScaleFactor: 1 });
  await page.goto(url, {
    waitUntil: "networkidle0",
  });
  console.log("scraping...");
  const data: string[] = [];

  let filterSel = await page.$("select");
  await filterSel?.scrollIntoView();
  const filters = ["neutral", "positive", "negative"];
  for (let j = 0; j < filters.length; j++) {
    await filterSel?.scrollIntoView();
    await filterSel?.type(filters[j]);
    //Based on the html structure, our reviews div is a sibling of the div that has a child h2 with text Customer Reviews
    let customerReviewSelector = `//div[h2[text()='Customer Reviews']]/following-sibling::div`;
    await page.waitForXPath(customerReviewSelector);
    let [firstel, secondel] = await page.$x(customerReviewSelector);
    await (secondel as ElementHandle<Element>).scrollIntoView();

    let paginationButtons = await secondel.$$("button");
    await (firstel as ElementHandle<Element>).scrollIntoView();
    for (let k = 0; k < paginationButtons.length; k++) {
      await paginationButtons[k].scrollIntoView();
      await paginationButtons[k].click();
      let reviewsComp = await firstel.$$("div");
      for (let i = 0; i < reviewsComp.length; i++) {
        try {
          if (!(await reviewsComp[i].isVisible())) {
            await reviewsComp[i].scrollIntoView();
          }
          let childDivs = await reviewsComp[i].$$("p");

          if (childDivs.length >= 2) {
            let review = await childDivs[0]?.evaluate((t) => {
              return t.innerText;
            });
            let rating = await childDivs[1]?.evaluate((t) => {
              return t.innerText;
            });
            data.push(`${review} with ${rating}.`);
          }
        } catch (err) {
          console.log("Error: ", err, i, j, k);
        }
      }
    }
  }
  await browser.close();

  return data;
};

const getAIReview = async (data: string[]) => {
  const completion = await openai.chat.completions.create({
    messages: [
      {
        role: "user",
        content: `${prompt} ${data}`,
      },
    ],
    model: "gpt-3.5-turbo",
  });
  console.log(completion.choices[0].message);
  let message = completion.choices[0].message.content;
  return message;
};
const app = express();
app.use(cors());
app.use(express.json());
app.post("/reviews-ai-scanner", async (req, res) => {
  console.log(req.body);
  try {
    const data = await crawlWebsite(req.body.url);
    const message = await getAIReview(data);
    return res.json({ summary: message });
  } catch (error) {
    console.error("Error:", error);
    return res.status(500).json({ error: "An error occurred" });
  }
});

app.listen(3001, () => {
  console.log("Server running on 3001");
});

Step 2: Creating the Frontend

"use client";
import React, { useState } from "react";

const ReviewsScan = () => {
  const [urlInput, setUrlInput] = useState("");
  const [output, setOutput] = useState("");
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    setLoading(true);
    const res = await fetch("http://localhost:3001/reviews-ai-scanner", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: urlInput }),
    });

    const respData = await res.json();

    setOutput(respData.summary);

    setLoading(false);
  };
  return (
    <div className="max-w-5xl px-4 container mx-auto flex flex-col">
      <form className="mb-4 flex flex-col text-center" onSubmit={handleSubmit}>
        <label className="mb-6 text-2xl" htmlFor="url">
          Enter the URL
        </label>
        <input
          required
          className="px-3 py-2 text-center outline-none"
          onChange={(e) => {
            setUrlInput(e.target.value);
          }}
          placeholder="http://codewithmarish.com/playground/scrape-reviews"
          name="url"
          type="url"
        />
        <button
          type="submit"
          className="px-4 py-2 bg-black rounded text-white w-fit tracking-widest self-center mt-4 uppercase"
        >
          Submit
        </button>
      </form>
      {loading && <p className="text-center">Loading...</p>}
      <div className="mt-6 h-72 bg-white rounded overflow-auto">
        <p className="font-light tracking-widest p-2">{output}</p>
      </div>
    </div>
  );
};

export default ReviewsScan;

Let's create a new component named ReviewsScan, which contains a form for accepting a URL as input and a div for showing the AI response.

State Variables:

  • urlInput: State variable to store the input URL provided by the user.
  • output: State variable to store the output summary generated by the backend.
  • loading: State variable to track whether the data is being loaded or not.

Handle Submit Function:

  • handleSubmit: This function is called when the form is submitted. It sends a POST request to the backend server with the user's URL input. Upon receiving a response, it updates the output state variable with the summary generated by the backend.

JSX:

  • The JSX contains a form with an input field for the user to enter the URL and a submit button. It uses the handleSubmit function as the onSubmit event handler. If loading is true, it displays a "Loading..." message to indicate that data is being fetched. Finally, we have a container to display the output summary generated by the backend. The summary is stored in the output state variable.

Now you can start your Node.js server and Next.js application, enter the URL (https://codewithmarish.com/playground/scrape-reviews), and see it in action.

Congratulations, you have now successfully built an AI-powered web application for analyzing product reviews. Along the way we've explored the intersection of web development and artificial intelligence by building a reviews analyzer web app with OpenAI, Node.js, and Next.js. Happy coding!

CodeWithMarish