README.txt - Submission Screenshot Processor & PDF Generator
------------------------------------------------------------

Author: Lawrence Goetz
Institution: Brooklyn College
Department: Computer & Information Science
Contact: lgoetz@brooklyn.cuny.edu

Description:
------------
This program automates the process of capturing screenshots from provided URLs and compiling them into a structured PDF report. It ensures submissions are documented with clickable links and error messages for failed attempts.

Features:
---------
- Reads titles and submission URLs from `urls.txt`.
- Uses Selenium to navigate to each URL and take a screenshot.
- Handles slow-loading pages with a timeout mechanism.
- Saves screenshots in the `screenshots` folder.
- Generates a PDF with titles displayed *above* images.
- Adds URLs below images in the PDF, making them clickable.
- Marks failed screenshot attempts with error messages (displayed in red in the PDF).
- Ensures failed screenshots still list their title and URL in the PDF.
- **Validates URLs** to allow only `http` or `https` formats.
- **Prevents accidental overwriting** by checking if a PDF already exists before execution.

Prerequisites:
--------------
1. Install Python 3.
2. Before running the program, install the following Python libraries:
   - Selenium (for web automation)
   - Pillow (for image processing)
   - ReportLab (for PDF generation)

       pip install selenium pillow reportlab

3. Install Chrome WebDriver:
   - Check your Google Chrome version under "Settings > About Chrome".
   - Download the matching Chrome WebDriver from:
     https://chromedriver.chromium.org/downloads
   - Place `chromedriver.exe` (Windows) or `chromedriver` (macOS/Linux) in your project directory or system PATH.
   - Getting started guide:
     https://developer.chrome.com/docs/chromedriver/get-started

Setup Instructions:
-------------------
1. **Create a `urls.txt` file** in the same folder as this script.
2. **File Format for `urls.txt`:**
   - Each **title** should be on one line.
   - The corresponding **URL** should be on the next line.
   - You can **leave blank lines between entries** for readability.

Example `urls.txt`:

    John Doe
    https://student-submission-link.com/johndoe

    Anna Smith
    https://student-submission-link.com/annasmith

Please note that URLs must start with http:// or https://. You may adjust the following line of code as necessary:

    if not url or not (url.startswith("http://") or url.startswith("https://")):

Running the Program:
---------------------
1. Ensure that **Google Chrome** is installed on your system.
2. **Run the script using Python**:

       python process.py

3. The program will:
   - Take screenshots for each URL listed in `urls.txt`.
   - Save them in the `screenshots` folder.
   - Generate a PDF (`Submissions.pdf`) listing all titles, images, and links.
   - Show an error message in red in the PDF for any page that cannot be processed.

Troubleshooting & Common Issues:
--------------------------------
**Chrome WebDriver Not Found:**
- Ensure ChromeDriver is installed and **matches your Chrome version.**
- Place ChromeDriver in the same directory as the script or **add it to your system PATH.**

**Python Library Not Installed:**
- If you get `ModuleNotFoundError`, install missing dependencies with:
  ```
  pip install selenium pillow reportlab
  ```

**PDF Already Exists Error:**
- The program **stops execution** to prevent overwriting. **Delete or rename the existing PDF** before running again.

**Screenshots Not Capturing:**
- Ensure **URLs are valid** and **pages are accessible** (check manually).
- Try **disabling headless mode** in Selenium for debugging:
  ```python
  from selenium import webdriver

  options = webdriver.ChromeOptions()
  # Leave the headless flag commented out to see the browser open
  # options.add_argument("--headless")
  ```

**Timeout Errors:**
- Increase the timeout limit in the script (default is 10 seconds) on this line:

      WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "body")))

Enhancements & Customization:
-----------------------------
**Adjust Screenshot Size**
- Modify `img.thumbnail((500, 500))` in the PDF section.

**Change Font & Colors**
- Modify `pdf.setFont()` or `pdf.setFillColorRGB()` for styling.

**Add Custom Headers to PDF**
- Insert a **title page** or branding text at the beginning. Requires custom coding.
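
**Adjust URL Validation**
- The `urls.txt` format (a title line, then a URL line, with optional blank lines) and the http/https check can be tried out on their own before you change the script. The sketch below is not the code in `process.py`. The function names `is_valid_url` and `parse_entries` and the sample data are hypothetical, and skipping a trailing title that has no URL is an assumption of this sketch.

```python
def is_valid_url(url):
    """Mirror the README's check: non-empty and starting with http:// or https://."""
    return bool(url) and url.startswith(("http://", "https://"))


def parse_entries(lines):
    """Pair each title line with the URL line that follows it.

    Blank lines are skipped. A trailing title with no URL is ignored
    (an assumption of this sketch). Returns (title, url, ok) tuples.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    entries = []
    for i in range(0, len(cleaned) - 1, 2):
        title, url = cleaned[i], cleaned[i + 1]
        entries.append((title, url, is_valid_url(url)))
    return entries


# Hypothetical sample: the second URL uses ftp:// and should be rejected.
sample = """John Doe
https://student-submission-link.com/johndoe

Anna Smith
ftp://student-submission-link.com/annasmith
"""

for title, url, ok in parse_entries(sample.splitlines()):
    print(title, url, "OK" if ok else "INVALID")
```

  To accept another scheme, add its prefix to the tuple passed to `startswith`.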