README.txt - Submission Screenshot Processor & PDF Generator
------------------------------------------------------------

Author: Lawrence Goetz  
Institution: Brooklyn College  
Department: Computer & Information Science  
Contact: lgoetz@brooklyn.cuny.edu  

Description:
------------
This program automates the process of capturing screenshots from provided URLs 
and compiling them into a structured PDF report. Each submission is documented 
with a clickable link, and failed attempts are recorded with an error message.

Features:
---------
- Reads titles and submission URLs from `urls.txt`.
- Uses Selenium to navigate to each URL and take a screenshot.
- Handles slow-loading pages with a timeout mechanism.
- Saves screenshots in the `screenshots` folder.
- Generates a PDF with titles displayed *above* images.
- Adds URLs below images in the PDF, making them clickable.
- Marks failed screenshot attempts with error messages (displayed in red in the PDF).
- Ensures failed screenshots still list their title and URL in the PDF.
- **Validates URLs** to allow only `http` or `https` formats.
- **Prevents accidental overwriting** by checking if a PDF already exists before execution.

Prerequisites:
--------------
1. Install Python 3  
2. Before running the program, install the following Python libraries:
   - Selenium (for web automation)
   - Pillow (for image processing)
   - ReportLab (for PDF generation)

   pip install selenium pillow reportlab
   
3. Install Chrome WebDriver:
   - Check your Google Chrome version under "Settings > About Chrome"
   - Download the matching Chrome WebDriver. For Chrome 114 and earlier:
     https://chromedriver.chromium.org/downloads
     For Chrome 115 and later, follow the instructions at:
     https://developer.chrome.com/docs/chromedriver/get-started
   - Place `chromedriver.exe` (Windows) or `chromedriver` (macOS/Linux) 
     in your project directory or system PATH


Setup Instructions:
-------------------
1. **Create a `urls.txt` file** in the same folder as this script.
2. **File Format for `urls.txt`:**
   - Each **title** should be on one line.
   - The corresponding **URL** should be on the next line.
   - You can **leave blank lines between entries** for readability.

Example `urls.txt`:
John Doe  
https://student-submission-link.com/johndoe  

Anna Smith  
https://student-submission-link.com/annasmith

Note: URLs must start with http:// or https://. To accept other formats, adjust the following line of code in the script:
if not url or not (url.startswith("http://") or url.startswith("https://")):  
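
The file format and URL check described above can be sketched as follows. This is a minimal illustration, not the code in `process.py`: the function names `is_valid_url` and `parse_entries` are hypothetical, and the sample lines (including the `ftp://` entry) are made up to show a rejected URL.

```python
from typing import Iterable, List, Tuple


def is_valid_url(url: str) -> bool:
    """Mirror the README's check: only http:// or https:// URLs pass."""
    return bool(url) and url.startswith(("http://", "https://"))


def parse_entries(lines: Iterable[str]) -> List[Tuple[str, str, bool]]:
    """Pair each title line with the URL line that follows it.

    Blank lines are dropped first, so entries may be separated by any
    number of empty lines. Returns (title, url, is_valid) tuples.
    A trailing title with no URL after it is ignored.
    """
    # Strip surrounding whitespace and discard empty lines.
    cleaned = [line.strip() for line in lines if line.strip()]
    entries = []
    # Walk the remaining lines two at a time: title, then URL.
    for i in range(0, len(cleaned) - 1, 2):
        title, url = cleaned[i], cleaned[i + 1]
        entries.append((title, url, is_valid_url(url)))
    return entries


# Made-up sample input in the urls.txt layout (hypothetical data).
sample_lines = [
    "John Doe  ",
    "https://student-submission-link.com/johndoe  ",
    "",
    "Anna Smith",
    "ftp://student-submission-link.com/annasmith",
]

for title, url, ok in parse_entries(sample_lines):
    status = "ok" if ok else "invalid URL"
    print(f"{title}: {url} ({status})")
```

To read a real file, pass an open file object instead of the sample list, for example `parse_entries(open("urls.txt", encoding="utf-8"))`.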

Running the Program:
---------------------
1. Ensure that **Google Chrome** is installed on your system.
2. **Run the script using Python**:

   python process.py

3. The program will:
   - Take screenshots for each URL listed in `urls.txt`.
   - Save them in the `screenshots` folder.
   - Generate a PDF (`Submissions.pdf`) listing all titles, images, and links.
   - Show an error message in red in the PDF for any page that cannot be processed.

Troubleshooting & Common Issues:
--------------------------------
**Chrome WebDriver Not Found:**  
- Ensure ChromeDriver is installed and **matches your Chrome version.**  
- Place ChromeDriver in the same directory or **add it to your system PATH.**  

**Python Library Not Installed:**  
- If you get `ModuleNotFoundError`, install missing dependencies with:  
  ```
  pip install selenium pillow reportlab
  ```  

**PDF Already Exists Error:**  
- The program **stops execution** to prevent overwriting. **Delete or rename the existing PDF** before running again.  
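
For reference, an overwrite guard of this kind might look like the sketch below. It assumes the output file is `Submissions.pdf` as named in this README. The function names are hypothetical, and the actual check in `process.py` may be written differently.

```python
import os
import sys

PDF_NAME = "Submissions.pdf"  # output name taken from this README


def pdf_already_exists(path: str = PDF_NAME) -> bool:
    """Return True when a file is already present at the output path."""
    return os.path.isfile(path)


def guard_against_overwrite(path: str = PDF_NAME) -> None:
    """Exit with a message instead of overwriting an existing PDF."""
    if pdf_already_exists(path):
        sys.exit(f"Error: '{path}' already exists. Delete or rename it and run again.")
```

Calling `guard_against_overwrite()` before any screenshots are taken means the run stops early, so no time is spent capturing pages for a PDF that would not be written.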

**Screenshots Not Capturing:**  
- Ensure **URLs are valid** and **pages are accessible** (check manually).  
- Try **disabling headless mode** in Selenium for debugging:
  ```python
  options = webdriver.ChromeOptions()
  # Remove headless mode to see the browser open
  # options.add_argument("--headless")
  ```

**Timeout Errors:**  
- Increase the timeout limit in the script (default is 10 seconds) on this line:
  WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "body")))
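
If you change the timeout often, one option is to read it from an environment variable instead of editing the number by hand. The sketch below is a suggestion, not part of the original script: the helper `get_timeout` and the variable name `SCREENSHOT_TIMEOUT` are hypothetical.

```python
import os

DEFAULT_TIMEOUT = 10  # seconds; the default stated in this README


def get_timeout(env_var: str = "SCREENSHOT_TIMEOUT") -> float:
    """Read the page-load timeout in seconds from the environment.

    Falls back to DEFAULT_TIMEOUT when the variable is unset, is not a
    number, or is not positive. The variable name is hypothetical.
    """
    raw = os.environ.get(env_var, "")
    try:
        value = float(raw)
    except ValueError:
        return float(DEFAULT_TIMEOUT)
    return value if value > 0 else float(DEFAULT_TIMEOUT)


# Intended use in the script, replacing the hard-coded 10
# (Selenium imports as already present for the original line):
# WebDriverWait(driver, get_timeout()).until(
#     EC.presence_of_element_located((By.TAG_NAME, "body"))
# )
```

On macOS/Linux the script could then be started with `SCREENSHOT_TIMEOUT=30 python process.py` to allow 30 seconds per page.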


Enhancements & Customization:
-----------------------------
**Adjust Screenshot Size** - Modify `img.thumbnail((500, 500))` in the PDF section.  
**Change Font & Colors** - Modify `pdf.setFont()` or `pdf.setFillColorRGB()` for styling.  
**Add Custom Headers to PDF** - Insert a **title page** or branding text at the beginning (requires custom coding).