Wednesday, February 28, 2024

Driving Efficiency with LLMs: Transforming Complex Content into Digestible Summaries

Hey everyone,

I recently shared my latest side project with you all here. It's a simple web page where I've been experimenting with various tools, and my latest addition focuses on AI. Lately, I've been diving deep into the realms of AI, machine learning, and data science, exploring the tools and libraries that are making waves in the tech world. Like many, I'm captivated by the possibilities offered by transformers and LLMs, and I'm eager to integrate them into real-world applications to tackle previously unsolvable problems. We have new tools now, so let's see what has changed.

Of course, mastering these technologies doesn't happen overnight. I'm dedicating time to different projects, using them as opportunities to learn, experiment, and refine my skills. In my last blog post, I discussed automating invoice entry, or perhaps "assisting" is a better term. The goal was to extract values from documents and streamline data entry, ultimately making life a little easier and saving time.

This time around, I'm delving into LLMs with a focus on summarization. I've developed and deployed a tool capable of summarizing content from YouTube videos, PDF files, or any web page. The concept is straightforward: extract the text and condense it down. To streamline the process, I'm leveraging 🦜️🔗 LangChain under the hood, which simplifies the task. Here's a glimpse into some of the tools and APIs I've utilized:

1 For YouTube transcript extraction, I've tapped into the youtube-transcript-api, which lets users download a transcript when one is available. A transcript can be uploaded by the user or auto-generated, it can exist in multiple languages, and it can be translated to other languages as well, all on the fly. Transcribing the audio/video yourself is also an option: there are many models trained for this purpose, both paid, like Oracle AI Speech, and open source, like Whisper from OpenAI, and many more...

2 Processing PDF documents has been an enjoyable challenge. I was already experimenting extensively with PyMuPDF, and now with PyPDFLoader. One thing I like about the Python world is that there are hundreds of libraries out there; you can find a huge list of document loaders here.

3 Web page processing required AsyncHtmlLoader, Html2TextTransformer and Beautiful Soup to sift through the HTML and focus solely on the textual content.

4 Summarization duties are handled by LLMs (such as OpenAI and Cohere), albeit with a limited context window. If the text fits into the LLM context, great. If not, I work around it with a technique called "MapReduce": chunk the text into manageable pieces, summarize each piece, then condense those summaries further. This might not be required at all in the near future; competition is tough, and every day we witness increasing model parameters and context windows.

Image taken from geeksforgeeks.org

5 I've added a few user-friendly features, such as automatically fetching and displaying thumbnails for pasted links. Whether it's a YouTube video, PDF file, or web page, users will see a visual preview to enhance their experience. YouTube offers thumbnails if you know the video_id, which is part of the URL. For PDF files I create a thumbnail image of the first page. Web pages were a bit tricky: I used headless Chromium to get a screenshot of the page. This Dockerfile was extremely helpful to achieve this.

6 To keep users engaged while awaiting results, I've enabled streaming via web sockets using socket.io for both console logs and chain responses. Although it does come with a cost, watching things happen under the hood is highly motivating for a user who has to keep waiting. And with ChatGPT, it has more or less become the de facto standard.

7 And finally, everything is nicely packed in a container and deployed on Oracle Cloud behind load balancers secured with TLS/SSL, Cloudflare, etc. You can read about the setup here.
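
The MapReduce idea from item 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the repo: it does not use LangChain, the summarize callable is a stand-in for whatever LLM call you use, and the names split_into_chunks, map_reduce_summarize and the character-based max_chars limit are made up for the example (real context limits are counted in tokens).

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_chars.

    A single word longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for an LLM call
    max_chars: int = 4000,            # assumed budget, in characters
) -> str:
    """Summarize text that may not fit into a single model call."""
    if len(text) <= max_chars:
        return summarize(text)
    # Map step: summarize every chunk on its own.
    partial = [summarize(chunk) for chunk in split_into_chunks(text, max_chars)]
    combined = "\n".join(partial)
    # Guard: if the summaries did not shrink the text, stop recursing.
    if len(combined) >= len(text):
        return summarize(combined)
    # Reduce step: condense the partial summaries, recursing if still too long.
    return map_reduce_summarize(combined, summarize, max_chars)
```

The recursion in the reduce step is one possible design choice: it keeps condensing until the combined summaries fit into one call, at the cost of extra model calls for very long inputs.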
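
The YouTube thumbnail lookup from item 5 boils down to pulling the video_id out of the pasted link. Here is a rough sketch using only the standard library. It assumes the common URL shapes (watch?v=, youtu.be/, /embed/, /shorts/) and the widely used img.youtube.com thumbnail path; the function names extract_video_id and thumbnail_url are hypothetical and not taken from the repo.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video id from a YouTube link, or None if it is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the id as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Regular links carry the id in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.split("/")
        if len(parts) >= 3 and parts[1] in ("embed", "shorts"):
            return parts[2] or None
    return None


def thumbnail_url(video_id: str) -> str:
    """Build a thumbnail URL for a known video id."""
    return f"https://img.youtube.com/vi/{video_id}/hqdefault.jpg"
```

Anything that is not recognized as a YouTube link returns None, so the caller can fall back to the PDF or web page thumbnail path.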

As always, I've shared all the code on my GitHub repository, along with the references that aided me at the bottom of this post. I'll also be putting together a quick demo to showcase these features in action. While these concepts may seem generic, practice truly does refine them. Implementing these tools has not only boosted my confidence but also honed my skills for future challenges.

Conclusion

I hope this project serves as inspiration for readers, sparking ideas for solving their own challenges. Perhaps it could fit any requirement that involves simplifying the review and categorization of lengthy text: documents like Market Research Reports, Legal Documents, Financial Reports, Training and Educational Materials, Meeting Minutes and Transcripts, Content Curation for your Social Media and even Competitive Intelligence Gathering...

All we need is a little creativity, and courage to tackle our old unsolvable problems with our new tools. LangChain itself is using this to improve documentation quality, combined with clustering/classification logic.

If you have any ideas or questions, feel free to discuss them in the comments. Don't hesitate to reach out; together, we might just find the solution you're looking for.

Happy coding!



References:
1. Developing Apps with GPT-4 and ChatGPT: Most of the foundation and ideas came from this book, highly recommended for beginners. It is available on the O'Reilly platform.
2. 🦜️🔗 LangChain: You will find almost all you need here: getting started guide, sample code for tools, agents, LLMs, loaders ...
3. PyMuPDF: PDF library
4. MapReduceDocumentsChain
5. Document Loaders
6. html2image: for web page thumbnails
7. Socket.io: Web Sockets
8. I, Robot: for PDF testing
9. Streaming for LangChain Agents: Video tutorial by James Briggs
10. YouTube Transcript API

Monday, February 12, 2024

APEX with AI Services: Automating invoice entry with AI assisted key/value extraction

For a couple of years now, almost everyone has been talking about AI, trying to understand how it can help both life and business. OpenAI ChatGPT made a huge impact on our lives. "You are a helpful assistant", thank you! I am using it daily. I have also been playing around with other language models like Llama 2 and actively learning ML from different resources. Hugging Face is a must-join platform, along with all the material on LangChain. Just these two got me far enough to train and run my own model to classify my emails within a week, and the results were far better than I could ever have expected. Besides, I am enjoying this a lot.

Nowadays the number of interested customers is increasing, and this post is about a very basic customer use case, a real one: invoice entry. I know, it doesn't sound interesting. At first I thought my days of technical consultancy for ERP were over, but I promise this is not boring. It has new challenges for me (and for most customers) and demonstrates the application of AI services to real-life problems. So let's dive into it!

Requirements

"...investigating the possibilities of automating / optimizing the reading and processing of PDF documents with the help of Optical Character Recognition (OCR)..."
The moment I saw this I could imagine what they wanted. After verifying their ideal solution in our discovery meeting, we planned a demo to prove it can work.
Here is a mock design: on the right side we display the PDF file, OCR'ed with all values extracted, and on the left side the form is populated with the extracted values. Ideally the operator will just click save, with a chance to fill in any missing information first, which is a huge time saving.

Challenges

The biggest challenge is lack of skills: the customer knows APEX inside out, but I am not an APEX developer. I understand the APEX environment, its components and how they work, and I have installed and configured it many times. I have followed and demonstrated many workshops, but never developed something from scratch. Yet APEX is low code, there are many samples, and I was able to complete this in 2 days.

Development

I will briefly mention the steps I've followed and highlight the important parts. Using cloud services makes it easy to start.
1 I start by creating an Autonomous Database and an APEX workspace. It takes minutes to start working on my APEX development.
2 Then I follow this LiveLab workshop to get a starter application. I tweaked the table structure according to my needs, but it gave me the foundation I needed. For interacting with the OCI Document Understanding service, using Object Storage is a good decision.
3 Using the Document Understanding service inside APEX was easy.
I added the API endpoint in the application definition.

Then I made changes to the saveDocument process to invoke the AI service after uploading the file to Object Storage. The process code prepares the JSON body, invokes the service's processorJobs endpoint, parses the returned message and updates the table. The execution flow and response time are not efficient enough for production but good enough for a POC. I am using only the Key Value Extraction feature, but you can also use all features, including generation of a searchable PDF file in your output bucket. The service is pretrained and capable of identifying common key/value pairs from an invoice document. My service call creates a JSON file, which can be located with the job_id. I've also loaded that JSON file into a blob, parsed it and created views on top of it just to make my life a bit easier. Chris Saxon has an excellent cheat sheet for that purpose.
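
The original process code runs inside APEX and is not reproduced here. As a language-neutral illustration of the "prepare the JSON body" step, here is a rough Python sketch of what a Key Value Extraction request for an invoice might look like. The field names follow my reading of the public Document Understanding API reference and should be checked against it before use; the helper name build_processor_job_body and all argument values are hypothetical.

```python
import json
from typing import Any, Dict


def build_processor_job_body(
    compartment_id: str,
    namespace: str,
    bucket: str,
    object_name: str,
    output_prefix: str,
) -> Dict[str, Any]:
    """Assemble a request body for a processor job (assumed field names)."""
    return {
        "compartmentId": compartment_id,
        "processorConfig": {
            "processorType": "GENERAL",
            "documentType": "INVOICE",
            # Only key/value extraction is requested, as in the post.
            "features": [{"featureType": "KEY_VALUE_EXTRACTION"}],
        },
        "inputLocation": {
            "sourceType": "OBJECT_STORAGE_LOCATIONS",
            "objectLocations": [
                {
                    "namespaceName": namespace,
                    "bucketName": bucket,
                    "objectName": object_name,
                }
            ],
        },
        # Assumption: results are written to the same bucket under a prefix.
        "outputLocation": {
            "namespaceName": namespace,
            "bucketName": bucket,
            "prefix": output_prefix,
        },
    }


def to_json(body: Dict[str, Any]) -> str:
    """Serialize the body the way it would be sent in the POST request."""
    return json.dumps(body, indent=2)
```

Keeping the body construction in one small function makes it easy to add more features later, for example the searchable PDF output mentioned in the post.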


4 I created a new page with 3 regions: two side by side (left for showing/entering extracted values, right an iframe to display the PDF file) and a last one for invoice lines, as designed in the mock wireframe.
For displaying the PDF inline on the right side of the page, I followed the instructions in this YouTube video. The only difference is that I didn't have a link item on the same page; the ID has to come as a page parameter. So I added the link on my home page, where all uploaded files are listed, and passed the document_id as a page parameter. Then I created a new Page Load Dynamic Action to get the ID and trigger the PDFViewer action to display the file.
After changing the theme to Redwood (with some modifications to make it dark) the application looks like this:

Conclusion

APEX and AI Services are a very powerful combination that can help you boost productivity. Please share your ideas and, of course, new use cases in the comments; maybe we can build one together!


References:
1. LiveLabs: Use OCI Object Storage to Store Files in Oracle APEX Applications
2. OCI Developer Documentation: Document Understanding API
3. OCI Developer Documentation: Key Value Extraction (Invoices)
4. Oracle Blog: How to Store, Query, and Create JSON Documents in Oracle Database
5. Oracle Developer YouTube Channel: PDF Viewer in Oracle APEX
6. Banner Image Credit: Dall-e 3
7. YouTube Background Soundtrack: BenSound: Royalty Free Music for Videos
8. Postman Collections: OCI REST API Postman Collection

Saturday, February 3, 2024

From Mock API Server to Mandelbrot Sets over the Load Balancer, CloudFlare, Let's Encrypt and Beyond...

Hello everyone! It's been a while since I last shared my thoughts, but I've been deep in the trenches of some fascinating new stuff. You know how things are: it started with a simple need for a mock API server and led me through setting up load balancers, domain management, and fun with Mandelbrot sets. Let me take you through it. This can be used as a blueprint for deploying your web workload securely on modern cloud environments.

Setting the Stage with a Python Flask App

Normally a quick and dirty Python http.server is enough for testing accessibility, but this one time we needed a bit more than that. It all began with a customer request: they needed to set up a mock API server. A Python Flask app seemed a suitable option at the time; I had never used it before. I must admit that curiosity and the growing Pythonic love inside me drove this choice. This initial step, though seemingly straightforward, laid the groundwork for the exciting challenges that followed. There are lots of tutorials to get you started, just start with this link.
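
For comparison, the standard-library route mentioned at the start of this section can be stretched into a tiny mock endpoint. This is only a sketch and not the Flask app from the repo: the routes in MOCK_ROUTES, the helper mock_response and the run function are hypothetical, and a real mock would mirror the customer's API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical canned responses, keyed by request path.
MOCK_ROUTES = {
    "/health": {"status": "ok"},
    "/api/items": {"items": [{"id": 1, "name": "sample"}]},
}


def mock_response(path: str) -> Tuple[int, bytes]:
    """Return (status code, JSON body) for a request path, ignoring the query string."""
    payload = MOCK_ROUTES.get(path.split("?", 1)[0])
    if payload is None:
        return 404, json.dumps({"error": "not found"}).encode()
    return 200, json.dumps(payload).encode()


class MockHandler(BaseHTTPRequestHandler):
    """Serve the canned JSON responses for GET requests."""

    def do_GET(self) -> None:
        status, body = mock_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Start the mock server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), MockHandler).serve_forever()
```

Calling run() starts the server. Once you need templates, blueprints or request validation, a framework such as Flask becomes the more comfortable choice.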

Load Balancers, Domains, and the Birth of CodeHarmony.Net

I was just playing around with my new Flask app, discovering blueprints, Jinja templates etc., you know, the usual things. Then another customer came with a fair request:

I want to use my load balancer with multiple backends accessible via different domains and subdomains.
This is possible with Virtual Hostnames. Start by adding your hostnames (actual DNS A records).
Then you need to decide whether
  • you want to use one listener for each backend set (which implies each listener will be using a different port), e.g. codeharmony.net:443, files.codeharmony.net:444, etc... Straightforward if you are okay with different ports. Just select the hostname(s) during listener creation.
  • you want to use one listener for all backend sets (which implies you should be able to distinguish by the requested URI). This one is a little tricky and requires proper planning. You need to combine hostnames and path route sets. Here is the official documentation.
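
To make the second option more concrete, here is a toy Python model of how a single listener could pick a backend set from the requested hostname and path. It is not load balancer configuration and does not claim to reproduce how the service evaluates its rules; the backend set names, the /api/ prefix and the longest-prefix rule are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

# (hostname, path prefix, backend set name); backend set names are hypothetical.
ROUTES: List[Tuple[str, str, str]] = [
    ("files.codeharmony.net", "/", "files-backendset"),
    ("codeharmony.net", "/api/", "api-backendset"),
    ("codeharmony.net", "/", "web-backendset"),
]


def pick_backend_set(host: str, path: str) -> Optional[str]:
    """Choose a backend set by exact hostname and longest matching path prefix."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    best: Optional[Tuple[str, str]] = None
    for route_host, prefix, backend in ROUTES:
        if hostname == route_host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, backend)
    return best[1] if best else None
```

Writing the routes down as a table like this before touching the console is one way to do the "proper planning" part: overlapping prefixes and missing hostnames become visible early.
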
For testing this in my tenancy I needed a domain, well, mmm... Maybe not, I could just edit my hosts file, but to be honest I also wanted to test deploying my Flask app and see it live instead of scrapping it. This prompted the creation of CodeHarmony.Net. I followed the hype, and the name was born out of a collaboration with ChatGPT. With a smooth purchase from Namecheap, CodeHarmony became the hub for my tech experiments.

Navigating the Cloud with Cloudflare and SSL/TLS Encryption

You know, these are the basic steps to ensure a secure and efficient web deployment.

  • I created a free Cloudflare account, with basic protection, to proxy connections to my load balancer, as I don't want it exposed directly to the whole internet.
  • Then I updated the nameservers at my domain name provider (Namecheap) with the ones Cloudflare provided.
  • Then I created the domain and subdomains (A records) using my load balancer's public IP.
  • Then I proxied the connection to the load balancer. I implemented Full (strict) SSL/TLS encryption, terminating at the load balancer (which means private subnet traffic is not encrypted).
  • On the load balancer end, I configured listeners for both HTTP and HTTPS. I mentioned a little about routing above.
  • To make sure all traffic is supported and encrypted, I also created an HTTP listener and redirected all HTTP to HTTPS using URL Redirect Rules.
  • Cloudflare brings its own TLS certificates, which is very cool. For my load balancer HTTPS listener I used a free Let's Encrypt certificate.
  • I restricted access to my load balancer to Cloudflare IP addresses (I am not sure if I whitelisted all of them though...). One cool thing I noticed while investigating load balancer access logs: Cloudflare sends the end client IP address in the forwardedForAddr field. Here is a simplified access log:
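
Setting the access log aside, the allowlisting idea from the last bullet can be sketched with the standard ipaddress module. This is a hedged illustration, not my actual security rules: CLOUDFLARE_RANGES is only a small illustrative subset of the published ranges (fetch the current full list from Cloudflare before relying on it), the function names are hypothetical, and treating a log entry as a flat dict with a forwardedForAddr key is an assumption about the log shape.

```python
import ipaddress
from typing import Iterable, Optional

# Illustrative subset only; the real list is longer and changes over time.
CLOUDFLARE_RANGES = ["173.245.48.0/20", "103.21.244.0/22", "104.16.0.0/13"]


def is_from_allowed_proxy(peer_ip: str, ranges: Iterable[str] = CLOUDFLARE_RANGES) -> bool:
    """Return True if the connecting address falls inside one of the allowed ranges."""
    address = ipaddress.ip_address(peer_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


def end_client_ip(log_entry: dict) -> Optional[str]:
    """Read the original client address from a log entry (assumed flat dict)."""
    return log_entry.get("forwardedForAddr")
```

A check like this is handy for auditing access logs: any request whose peer address is outside the allowed ranges did not come through the proxy.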

Flask Web App Deployed on Container Instances as Backend

For those who share my passion for coding adventures, all code is available on my GitHub repo. Dive in, explore, and feel free to contribute. The Dockerfile is also included, as the app is packaged as a container. You will also find a build-and-deploy.sh file, which builds the container image, pushes it to a private OCI registry and then creates the container instance. You need to prepare a .env file, which is not included in the repo; the file should contain the following values: Once set, you should be able to build and deploy your application. Note that I am using the London region; if you use another one, remember to change the values, including the OCI Registry.

It was super fun to design, code, build and deploy everything with the best practices applied. It felt good to put myself to the test and find out that I can still code full stack, with a basic Python/Flask backend and Bootstrap and jQuery for the frontend. I was a bit rusty in the beginning but progressed very fast. Admittedly, React or Vue might have been the trendy choices, but the experience proved invaluable, reigniting my confidence in full-stack development.

You will find many scripts converted into tools in these pages: manipulating PDF files, playing around with images. The captivating world of Mandelbrot sets, a beautiful mix of drawing, painting, and exploring color palettes, became my newfound fascination, leading to the discovery of hidden gems waiting to be unveiled.

Curious about the technical details? Check out the final topology:

Until next time, keep coding, stay curious, and embrace the ever-evolving tech landscape.

Happy coding! 🚀
