Streamlit, YouTube, and OpenAI for Video Summarization

In this blog post, we’ll dive deep into some code that illustrates the convergence of Streamlit, an open-source app framework for Machine Learning and Data Science projects, and Generative AI, powered by OpenAI’s models. This code snippet demonstrates how to create a Streamlit application that processes YouTube videos by downloading them, fetching their transcripts, and summarizing the content using OpenAI’s GPT models. I initially wrote this code to simplify a task I was doing on a regular basis, and to give myself an idea of whether a video would be worth watching or not. Since then, creating the app has been a learning experience, helping me become more acquainted not only with Python, but with using Streamlit to create simple web apps and interacting with OpenAI and GPT. Let’s break down the code, piece by piece, to understand its components and the power of Generative AI it showcases.

Import Statements

In my journey to build this application, I start by gathering my toolkit—think of it as grabbing your ingredients before whipping up a gourmet meal. The import statements are our recipe’s foundation, bringing together various Python libraries, each with its unique flavor. First, we have os, our Swiss Army knife for navigating the filesystem, necessary for saving files where we need them. Then, there’s streamlit, the library that transforms our script into a sleek, interactive web app with minimal effort. Next, openai connects us to OpenAI’s large language models, allowing our app to understand and summarize video content. With dotenv, we keep our secret ingredients safe, ensuring our API keys stay private. The pytube library is what we use for downloading YouTube videos, and finally, youtube_transcript_api and its companion, TextFormatter, help us fetch and clean video transcripts. Together, these imports set the stage for our app’s functionality in a neat, professional yet approachable manner.

import os
import streamlit as st
import openai
from dotenv import load_dotenv
from pytube import YouTube
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

This section imports necessary libraries:

  • os for interacting with the operating system,
  • streamlit for building the web app interface,
  • openai to access OpenAI’s API,
  • dotenv for managing environment variables,
  • pytube to download videos from YouTube,
  • youtube_transcript_api and its TextFormatter for fetching and formatting video transcripts.

Environment Variables and OpenAI API Key

In this part of our code, we’re bringing dotenv into play: we’re essentially whispering our secrets into a lockbox—our .env file. This is where we stash away our OpenAI API key. We need this key to communicate with OpenAI, but we hide it so others can’t use it. Just like a skilled magician doesn’t reveal their tricks, we use dotenv to keep our API key under wraps, away from prying eyes. Then, with os.getenv('OPENAI_API_KEY'), we fetch this secret key and entrust it to the variable openai.api_key, enabling our application to communicate with OpenAI’s servers. It’s pretty straightforward if you’ve done something like this before.

load_dotenv()
openai.api_key = os.getenv('OPENAI_API_KEY')

Really, to sum it up in simple terms, these lines load environment variables from a .env file and set the OpenAI API key, enabling secure API calls to OpenAI’s services without hardcoding sensitive information.
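
For reference, the .env file is just a plain text file sitting in the project root. Assuming you name the variable the same way the code does, it only needs a single line along these lines (with a placeholder in place of your real key):

OPENAI_API_KEY=your-api-key-here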

Streamlit Sidebar Setup

In this section, we’re setting up the sidebar for our application using Streamlit, a tool that allows developers to quickly turn data scripts into shareable web apps. Streamlit is all about simplicity and efficiency, making it easier for us to create interactive elements without getting bogged down in web development details. I’m not a web developer, so this is perfect for me.

The sidebar we’re building acts as a navigational panel for users, where I’m simply linking to my blog, GitHub, and social profiles. Really, I just wanted a little more on the page than a text entry box and a button.

To accomplish this, we use commands like st.sidebar.title for the heading and st.sidebar.markdown for the links. It’s a simple way to organize links and provide quick access to other resources, enhancing the user experience. Through this setup, we’re leveraging Streamlit’s capability to create interactive and aesthetically pleasing UI components with minimal code, making our app a bit more engaging and professionally structured.

st.sidebar.title('My Links')
st.sidebar.markdown('[Blog](https://brandonjcarroll.com)')
...

Directories Creation for Videos and Transcripts

Next, our code ensures that directories for storing downloaded videos and their transcripts exist, preventing errors during the download process.

os.makedirs('videos', exist_ok=True)
os.makedirs('transcripts', exist_ok=True)

Now we move on to the functions that are doing most of the work.

The Functions that do the Work

The first function you see in the code is the check_and_download_transcript function.

check_and_download_transcript(video_id)

# Function to check and download the transcript
def check_and_download_transcript(video_id):
    try:
        # Fetch the transcript
        transcript = YouTubeTranscriptApi.get_transcript(video_id)

        # Format the transcript into plain text
        formatter = TextFormatter()
        transcript_text = formatter.format_transcript(transcript)

        # Save the transcript to a file
        with open(f'transcripts/{video_id}.txt', 'w', encoding='utf-8') as file:
            file.write(transcript_text)

        st.success("Transcript downloaded successfully.")
        return transcript_text
    except Exception as e:
        st.warning("No transcript available or an error occurred.")
        return None

As you can see in the code snippet above, this function takes one argument: video_id, which is the unique identifier for a YouTube video. Its main job is to fetch, format, and save the video’s transcript to a file, providing feedback along the way.

Here’s a step-by-step breakdown of the logic:

  1. The Try-Except Block: The function is wrapped in a try-except block to handle any potential errors gracefully. If anything goes wrong during the process, instead of crashing, it will notify the user that either no transcript is available or some other error occurred.
  2. Fetching the Transcript: It uses the YouTubeTranscriptApi.get_transcript method to retrieve the transcript for the given video_id. This step assumes the video has a transcript available, which isn’t always the case. Interestingly enough, it seemed to be about 50/50 whether the videos had a transcript.
  3. Formatting the Transcript: Once the transcript is fetched, its format is not necessarily clean and readable. That’s where TextFormatter comes in. I used it to format the raw transcript data into plain text, making it more understandable and easier to work with.
  4. Saving the Transcript: The formatted transcript text is then saved to a file within a ‘transcripts’ directory. The filename is constructed using the video ID for easy identification later on. This is done using a context manager (with open(...) as file:) to ensure the file is properly opened and closed after writing, minimizing the chance of file corruption or other I/O errors. There’s probably a better way to do it, but this is what I knew how to do.
  5. Feedback to User: If everything goes smoothly, the function informs the user via Streamlit’s st.success method that the transcript was downloaded successfully. This feedback is important for a good user experience, letting them know the process worked as expected.
  6. Return Value: Finally, the function returns the formatted transcript text. In case of an error, it returns None. This return value can be useful if you want to further process the transcript text within your application.

In essence, this function encapsulates the whole process of dealing with YouTube video transcripts in a user-friendly way, abstracting away the complexities and potential pitfalls of API calls and file handling.

download_video(url)

The next function you see in the code is the download_video function.

# Function to download the video
def download_video(url):
    try:
        yt = YouTube(url)
        video = yt.streams.filter(file_extension='mp4', progressive=True).order_by('resolution').desc().first()
        if video:
            st.info(f"Downloading video: {yt.title}")
            safe_filename = ''.join(char for char in yt.title if char.isalnum() or char in " -_").rstrip()
            video.download(output_path='videos', filename=f"{safe_filename}.mp4")
            st.success("Download complete!")
            download_path = os.path.join('videos', f"{safe_filename}.mp4")
            return download_path, yt.title  # Return the download path and video title
        else:
            st.error("No downloadable video found.")
            return None, None
    except Exception as e:
        st.error(f"An error occurred while downloading the video: {e}")
        return None, None

This function downloads a YouTube video. Why? Well, the idea was that if I couldn’t get a transcript, I could download the video and then just transcribe it myself. That’s why it’s structured to handle the process from extracting the video streams to saving the video file, while also providing user feedback via Streamlit’s interface.

Here’s the details of how it operates:

  1. Try-Except Block: Again, I want to gracefully manage any errors that may arise during the video download process. If an error occurs, it notifies the user instead of letting the application crash.
  2. Extracting Video Information: I start by creating a YouTube object (from the pytube library) with the provided URL. This object allows access to various details and streams associated with the YouTube video.
  3. Selecting the Video Stream: I then filter the available video streams to only those that have an ‘mp4’ file extension and are progressive (meaning the video and audio are combined in a single file). Among these, the function chooses the one with the highest resolution by sorting the streams in descending order and picking the first one.
  4. Downloading the Video: If a suitable video stream is found, the function proceeds to download it. It first generates a “safe” filename by keeping only the characters from the video title that are alphanumeric or one of ” -_”. This is to prevent issues with file systems that may not support certain characters in file names. The video is then downloaded to a specified ‘videos’ directory, and the user is informed about the download start and completion via Streamlit’s st.info and st.success methods, respectively.
  5. Feedback and Return Values: The function also provides immediate feedback to the user:
    • If the video is successfully downloaded, it displays a success message and returns the path to the downloaded video file along with the video’s title.
    • If no downloadable video stream is found, it displays an error message and returns None for both the download path and video title.
    • In case of any other exceptions during the download process, it shows an error with the exception message and also returns None for both values.
  6. Error Handling: The error handling ensures that any issues encountered during the download process are communicated back to the user, maintaining transparency about the operation’s success or failure.

summarize_transcript(transcript)

Next, the summarize_transcript function is designed to leverage OpenAI’s GPT models to summarize a given piece of text, in this case, a transcript from a YouTube video. This process involves a few key steps and employs error handling similar to the previous functions (check_and_download_transcript and download_video). I’ll include a rough sketch of the function after the breakdown below.

Here’s how it works:

  1. Calling OpenAI’s API: At its core, the function interacts with the OpenAI API by sending a request to the openai.chat.completions.create endpoint. This request includes the transcript text along with a specific prompt that guides the AI on how to construct the summary. The choice of the “gpt-3.5-turbo” model is noteworthy for its efficiency and effectiveness in generating human-like text based on the provided instructions.
  2. Formatting the Prompt and Handling the Response: The prompt is carefully crafted to instruct the AI to provide a summary that’s concise and encourages the reader to check out the video, aiming for a balance between informativeness and engagement. Once the response is received, the function extracts the summary from the returned data structure, ensuring it’s neatly trimmed of any excess whitespace.
  3. Error Handling: Similar to the other functions, summarize_transcript includes a try-except block to gracefully handle any exceptions that might occur during the API call. This is important here because network issues or API limits could cause unexpected errors. If an error occurs, the function uses Streamlit’s st.error method to display an appropriate message, informing the user of the issue without causing the entire application to crash.
  4. Return Value: If successful, the function returns the generated summary, which can then be displayed to the user or used in further processing within the app. In case of an error, it returns None, allowing the calling code to handle the absence of a summary appropriately.
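
Since I walked through the logic without showing the function itself, here’s a minimal sketch of what summarize_transcript looks like. The prompt wording and parameters here are an approximation rather than an exact copy, so treat it as illustrative:

# Function to summarize the transcript (sketch; exact prompt wording is approximate)
def summarize_transcript(transcript):
    try:
        # Ask the model for a short, inviting summary of the transcript
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Summarize the following video transcript concisely, in a way "
                        "that encourages the reader to check out the video:\n\n"
                        f"{transcript}"
                    ),
                }
            ],
        )
        # Pull the summary text out of the response and trim extra whitespace
        return response.choices[0].message.content.strip()
    except Exception as e:
        st.error(f"An error occurred while summarizing the transcript: {e}")
        return None

One thing to keep in mind: very long transcripts can exceed the model’s context window, so for longer videos you may need to trim or chunk the text before sending it.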

OK, with that all said, let’s talk about the UI and the logic for processing.

Streamlit UI and Processing Logic

This section sets up the Streamlit user interface:

  • Displays a title, which I think is pretty self-explanatory in the code.
  • Provides an input field for the YouTube video URL, which is also self-explanatory in the code.
  • Includes a button that, when clicked, processes the video by first attempting to download its transcript and then, if available, summarizing it. If the transcript isn’t available, it attempts to download the video instead.

# Streamlit UI setup
st.title("YouTube Video and Transcript Processor")

# Input field for YouTube URL
video_url = st.text_input("Enter the YouTube video URL")

# Button to process the video
if st.button("Process Video"):
    if video_url:
        try:
            yt = YouTube(video_url)  # Create the YouTube object once; it provides both the video ID and the title
            video_id = yt.video_id
            transcript_text = check_and_download_transcript(video_id)

            if transcript_text:
                st.text_area("Transcript", transcript_text, height=300)
                summary = summarize_transcript(transcript_text)
                if summary:
                    # Display the video title, URL, and summary in formatted markdown
                    formatted_text = f"- [{yt.title}]({video_url}) - {summary}"
                    st.markdown(formatted_text, unsafe_allow_html=True)
                    
                    # Provide the same text in a text area for easy manual copying
                    copy_text = f"- {yt.title} ({video_url}) - {summary}"
                    st.text_area("Copy the text below:", copy_text, height=100)
            else:
                st.warning("No transcript is available for this video. Starting download...")
                download_path, video_title = download_video(video_url)  # Get video title from download function
                if download_path:
                    st.info(f"Video has been downloaded to: {download_path}")
                    # Optionally, you might want to display video info here as well
                else:
                    st.error("Failed to download the video.")
        except Exception as e:
            st.error(f"An error occurred: {e}")
    else:
        st.warning("Please enter a YouTube URL.")

This block of code effectively ties together the app’s functionalities, from interacting with YouTube for video and transcript data to leveraging AI for summarization, and finally, presenting the results in an interactive and user-friendly manner.
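
If you want to try something like this yourself, the workflow is straightforward: save the script (I’ll assume a filename of app.py here), add your OpenAI key to the .env file, install the dependencies, and launch the app with Streamlit:

pip install streamlit openai python-dotenv pytube youtube-transcript-api
streamlit run app.py

Streamlit will open the app in your browser, and from there it’s just a matter of pasting in a YouTube URL and clicking the button.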

Overall, I think the app demonstrates a practical application of Generative AI and just a glimpse of what’s possible.

Working with Streamlit has been a ton of fun. I hope this example helps give you an idea of what you can do with a bit of Python, Streamlit, and an LLM.

Got ideas or want to share some feedback? Please do, I welcome the words!
