AugAPI: Give AI Apps & Agents the Ability to Understand Technical Videos
We’re excited to announce our latest offering: a multimodal video understanding API designed specifically for text- and information-heavy videos, such as tutorials with screen recordings, presentations, and complex walkthroughs for developers. Turn any such video into knowledge: have our AI write a document from what it learned, or even use Q&A to get quick answers.
In a world where video content is exploding, extracting valuable insights from videos has become more critical than ever. Unlike existing solutions that focus primarily on object detection in real-world videos, or that rely purely on a transcript for tutorials, our API excels at combining information from on-screen text with the spoken discussion of that content. This allows for a deeper, more comprehensive understanding of video content, making it ideal for educational materials, software demos, and presentations.
Capabilities
AugAPI offers several powerful features:
- Create Documents: Generate concise summaries or highlights from your videos, complete with timestamps and key frames, or create a fully written document or tutorial. This feature is perfect for quickly grasping the essence of lengthy videos or transforming them into other formats for use elsewhere.
- Hook into Zapier: Use a Zapier “Zap” to automate video workflows. Generate a new blog post every time a video is added to a Google Drive folder, keep an up-to-date index of videos with short summaries, and more, all without writing a line of code.
- (Coming soon) LLM-Powered Q&A: Upload a video and ask questions about the content. Whether it’s text from slides or spoken words, our system leverages both OCR and speech recognition to provide accurate answers.
- (Coming soon) Semantic Search: Find specific points in a video. By focusing on textual and spoken content, our API returns the most relevant moments with high precision. Users can choose which embedding model is used in order to tune search results.
The best videos to use with our API contain both spoken context and on-screen text, where Augmend’s multimodal capabilities can combine information from both sources.
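Semantic search is not yet released, and this post does not describe how it works internally. To illustrate the general idea of ranking timestamped moments by similarity to a query, here is a minimal sketch. Everything in it is hypothetical: `Segment`, `search`, and the toy `bag_of_words` function, which stands in for a real embedding model. The `embed` parameter mirrors the idea of a configurable embedding model, but it is not Augmend's API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = Dict[str, float]


@dataclass
class Segment:
    """Hypothetical slice of a video: start time in seconds plus the text seen or heard there."""
    start: float
    text: str


def bag_of_words(text: str) -> Vector:
    """Toy stand-in for an embedding model: lowercase word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {word: float(n) for word, n in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    segments: List[Segment],
    query: str,
    embed: Callable[[str], Vector] = bag_of_words,  # any function with this signature can be swapped in
    top_k: int = 3,
) -> List[Segment]:
    """Return up to top_k segments most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(seg.text)), seg) for seg in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop segments that share nothing with the query.
    return [seg for score, seg in scored[:top_k] if score > 0.0]
```

A production system would use dense vectors from a trained embedding model and an index rather than scoring every segment, but the ranking step has the same shape.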
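The post does not describe Augmend's internal pipeline. As a rough illustration of why combining both sources helps, this sketch interleaves on-screen text (as OCR would produce it) with transcript lines by timestamp, so that a slide's text and the narration about it end up next to each other. The `Observation` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical unit of extracted content."""
    start: float  # seconds from the start of the video
    source: str   # "screen" for OCR text, "speech" for the transcript
    text: str


def merge_timeline(screen: List[Observation], speech: List[Observation]) -> List[Observation]:
    """Interleave on-screen text and speech into one time-ordered list."""
    # sorted() is stable, so at equal timestamps screen text stays ahead of speech.
    return sorted(screen + speech, key=lambda o: o.start)


def to_context(timeline: List[Observation]) -> str:
    """Render the merged timeline as plain text, e.g. for a language model prompt."""
    lines = []
    for o in timeline:
        minutes, seconds = divmod(int(o.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] ({o.source}) {o.text}")
    return "\n".join(lines)
```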
Usage
Using our API from Python takes just a few lines of code with our SDK:
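```python
# AugmendVideoClient comes from the aug-sdk package (installation below);
# see the aug-sdk README for its exact import path.
# API_KEY, root_host, and video_file must be defined before this point.
video_client = AugmendVideoClient(api_key=API_KEY, root_host=root_host)
wid = video_client.upload_video(video_file)       # upload a local video; returns its ID
doc = video_client.get_document(wid, "synopsis")  # fetch the generated synopsis for that video
```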
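The snippet above handles a single video. To process several files, you could wrap the same two calls in a loop. This is a minimal sketch that relies only on the `upload_video` and `get_document` methods shown above; `summarize_all` is a hypothetical helper, not part of aug-sdk, and it takes the client as a parameter so it can be tried with a stand-in object before you wire up a real API key.

```python
from typing import Any, Dict, Iterable


def summarize_all(video_client: Any, video_files: Iterable[str], doc_type: str = "synopsis") -> Dict[str, Any]:
    """Upload each video and fetch one generated document per file (hypothetical helper).

    Uses only the two client methods shown in the snippet above:
    upload_video(path) and get_document(video_id, doc_type).
    """
    results: Dict[str, Any] = {}
    for video_file in video_files:
        wid = video_client.upload_video(video_file)  # ID of the uploaded video
        # Whether get_document waits for processing to finish is not stated in this post;
        # check the aug-sdk docs before relying on this pattern.
        results[video_file] = video_client.get_document(wid, doc_type)
    return results


# Usage with a real client (API_KEY and root_host as in the snippet above):
# video_client = AugmendVideoClient(api_key=API_KEY, root_host=root_host)
# docs = summarize_all(video_client, ["intro.mp4", "setup.mp4"])
```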
To get started, add aug-sdk to your Python project from https://github.com/AugmendTech/aug-sdk:
```shell
pip install aug-sdk@git+https://github.com/AugmendTech/aug-sdk/
```
Then create an API key at https://augmend.com/settings and pass it into the SDK:
```python
API_KEY="8wdFExampleKeyChangeMe3xbsQM7XqMq"
```
You can also use this API key with our demo project. Set the API key as an environment variable and run the project from https://github.com/AugmendTech/aug-sdk-demo:
```shell
git clone https://github.com/AugmendTech/aug-sdk-demo
cd aug-sdk-demo
pip install -r requirements.txt
# export makes the key visible to the Python process started below
export API_KEY="8wdFExampleKeyChangeMe3xbsQM7XqMq"
cd src
python -m aug_sdk_demo myfile.mp4
```
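The demo setup above passes the key through the `API_KEY` environment variable. Your own code can follow the same pattern instead of hard-coding the key in source. A minimal sketch, where `load_api_key` is a hypothetical helper (not part of aug-sdk) and `API_KEY` is simply the variable name used in the commands above:

```python
import os


def load_api_key(var_name: str = "API_KEY") -> str:
    """Read the Augmend API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a hint rather than sending an empty key to the API.
        raise RuntimeError(
            f"{var_name} is not set; create a key at https://augmend.com/settings "
            f"and export it, e.g. export {var_name}=..."
        )
    return key


# Usage (root_host as in the SDK snippet above):
# video_client = AugmendVideoClient(api_key=load_api_key(), root_host=root_host)
```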
No-code with Zapier
If you use Zapier workflows, we also offer a no-code alternative.
Any video file can be uploaded to Augmend, and the easiest way to get videos into Zapier for testing is by watching a Google Drive or Dropbox folder for new files.
First, you will need a Zapier account. After creating one, use the following link to add the Augmend integration, which is currently in private beta:
https://zapier.com/developer/public-invite/207594/45e97af2537ee7d90d2a844005357d64/
When adding the Augmend step in Zapier, you’ll need to provide an API key, which you can create at https://augmend.com/settings.
Augmend currently supports only actions, not triggers, so first set up a trigger that provides a video file. Then, when adding an action, search for “Augmend” in the app list.
Next, configure the Augmend action: for the event, choose “Upload Video”.
Then create a new connection under the Account tab. This is where you paste the API key you created by following the instructions at https://github.com/AugmendTech/augmend-dev-docs/wiki/Manage-API-Keys
Once authentication is configured, you can set up the file upload. The “Video Name” field will be used as the initial name for the video on Augmend, but a new title will be generated once the video has been fully processed.
When the video is done processing, Augmend calls back to Zapier and your Zap continues running. The output of the Augmend step includes a generated title, chapter list, synopsis, and full Markdown summaries of the video. You can use these fields to consume the video analysis in different ways, such as using the title and Markdown to generate a blog post.
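As an illustration of the blog-post idea, here is a minimal sketch that assembles a Markdown post from those outputs. The post does not specify the exact shape of each field, so this assumes plain strings for the title, synopsis, and Markdown, and a list of strings for the chapters; `build_blog_post` is hypothetical.

```python
from typing import List


def build_blog_post(title: str, synopsis: str, chapters: List[str], markdown: str) -> str:
    """Assemble a Markdown blog post from video-analysis fields (assumed field shapes)."""
    parts = [f"# {title.strip()}", "", synopsis.strip(), ""]
    items = [chapter.strip() for chapter in chapters if chapter.strip()]
    if items:  # only add a chapter section when there is something to list
        parts.append("## Chapters")
        parts.extend(f"- {item}" for item in items)
        parts.append("")
    parts.append(markdown.strip())
    return "\n".join(parts) + "\n"
```

If you want to run this inside the Zap itself, a Code by Zapier Python step exposes the fields you map into it through an `input_data` dictionary and returns results via an `output` variable; which Augmend fields you map, and how they are formatted, is up to your Zap.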
We’re excited about the potential of our video understanding API and look forward to seeing how it is used to build new agents and AI-powered applications. Whether you're a developer looking to integrate advanced video understanding into your agent or an organization seeking to unlock the value hidden in your video content, our API is here to help.
For more details, check out the docs: Video Upload API Documentation in the AugmendTech/augmend-dev-docs wiki on GitHub.