Mastering 3D Computer Vision & Point Cloud Processing - Module 1 - My Computer Vision Journey
Introduction:
Welcome to the ✍️”3D Computer Vision & Point Cloud Processing Blog Series”, your gateway to exploring the fascinating world of three-dimensional data. In this post, I’ll share my journey in computer vision, from how I started learning to what I’m doing now.
Topics to be discussed in this blog:
- During my College Days
- While Pursuing My PhD
- In my Corporate Career
- As a Reviewer
- To My Clients
- Where I Am Today
- Next Blog Preview
During my College Days,
I was an electrical engineering student, learning about power systems, electrical machines, renewable sources, and more. One day, while sitting in my college library, I overheard two people discussing image processing algorithms and analyzing their results. Intrigued, I wanted to learn more and began studying “Digital Image Processing” by Anil K. Jain. Though I initially found the concepts challenging to grasp, I was determined to learn. I enrolled in a signal and image processing course and began with the fundamentals. In the first exam, I scored only 1.5 out of 17.5 marks and was even advised to drop the course. However, I persisted in my pursuit of learning image processing. With the support of my classmates and seniors, I gradually improved my understanding, achieved good scores in later exams, and ultimately topped the class.
For my first project in image processing and computer vision, I chose to implement image compression techniques. I successfully implemented them in MATLAB and on Analog Devices ADSP-216xx series Processors. Witnessing the results of my first algorithm on the ADSP boards was an incredibly rewarding experience. Since then, I have actively pursued learning and working in the fields of image processing, computer vision, pattern recognition, and related areas.
While Pursuing My PhD,
During my PhD, my research focused on multi-scale and directional transforms and their applications in image/video processing and computer vision. Many image denoising algorithms struggle to remove heavy noise without over-smoothing the image. I developed a content-dependent noise removal algorithm (adapting to the image content, much as PCA does) built on multiscale transform principles. It demonstrated superior performance in removing noise while preserving edges, even under heavy noise conditions. I also enjoyed developing a content-dependent video denoising algorithm, based on directional and inter-frame features, that outperformed many existing video denoising techniques. Sample results are shown below.
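To give a feel for the multiscale idea (this is a generic textbook sketch, not my thesis algorithm), here is a minimal single-level 2D Haar wavelet denoiser in NumPy: transform, soft-threshold the detail subbands, and invert.

```python
import numpy as np

def haar2d(img):
    # single-level 2D Haar transform: returns (LL, LH, HL, HH) subbands
    a = (img[0::2, :] + img[1::2, :]) / 2   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2   # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    # exact inverse of haar2d
    a = np.zeros((LL.shape[0], LL.shape[1] * 2))
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d = np.zeros_like(a)
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    img = np.zeros((a.shape[0] * 2, a.shape[1]))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img

def soft(x, t):
    # soft-thresholding: shrink coefficients toward zero by t
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)

def denoise(img, threshold):
    # keep the approximation LL, shrink only the detail subbands
    LL, LH, HL, HH = haar2d(img)
    return ihaar2d(LL, soft(LH, threshold), soft(HL, threshold), soft(HH, threshold))
```

Real content-dependent methods would choose the threshold per region and per subband rather than using one global value.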
While studying feature extraction and its preservation in denoising applications, I encountered face-detection-based feature extraction algorithms. Face detection is a significant challenge, especially when faces are tilted beyond 45 degrees, poorly illuminated, or have occluded features such as the eyes and nose. Common feature descriptors like HOG, SIFT, SURF, and LBP, combined with classifiers like SVM or Haar cascades, often achieve limited accuracy even on well-controlled face databases. Imagine the results on uncontrolled images! Determined to address this challenge, I developed an algorithm in a multiscale and directional space that achieved significantly better results under poor illumination and with up to 50% feature occlusion.
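As context for why descriptors like HOG matter here, this is a bare-bones HOG-style feature extractor (a simplified sketch of the standard descriptor, without block normalization or the interpolation used in production implementations):

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    # minimal HOG sketch: per-cell histogram of gradient orientations,
    # weighted by gradient magnitude
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # central differences
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))  # L2-normalize
    return np.concatenate(feats)
```

A detector would slide a window over the image, compute these features, and feed them to a classifier such as a linear SVM.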
Once a robust face detection algorithm is established, extending it to face recognition in images and videos is a natural next step. While images present less difficulty, videos pose additional challenges. I was working on developing a face recognition algorithm based on multi-directional feature-dependent space, and the initial results were promising.
To tackle improper illumination, I proposed an Image-Dependent Brightness-Preserving Histogram Equalization (HE) method. Adapting to the image content and the extent of uneven illumination, this approach showed promising results, particularly on evenly illuminated images.
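For readers new to the idea, here is classic global histogram equalization plus a deliberately naive brightness-preserving twist (just shifting the output mean back to the input mean); my image-dependent method was considerably more involved than this sketch.

```python
import numpy as np

def equalize(img):
    # classic global histogram equalization for an 8-bit grayscale image
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size           # cumulative distribution in [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]

def equalize_brightness_preserving(img):
    # naive brightness preservation: shift the equalized output so its
    # mean matches the input mean (far simpler than an image-dependent method)
    eq = equalize(img).astype(float)
    out = eq + (img.mean() - eq.mean())
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

Plain HE tends to change the overall brightness of an image; brightness-preserving variants exist precisely because that shift is undesirable in consumer and medical imaging.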
Following my work on feature extraction algorithms for face detection and recognition, I delved deeper into content-based image retrieval (CBIR), where features are key to fetching relevant data. In CBIR, a query image is used to retrieve similar images from a database, much like Google Image Search. My CBIR methods achieved excellent results, demonstrating the effectiveness of the approach.
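The core CBIR loop is simple to sketch (this toy version uses global color histograms and a chi-square-style distance; real systems, including mine, used far richer features):

```python
import numpy as np

def color_hist(img, bins=8):
    # per-channel intensity histogram, L1-normalized, as a global descriptor
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    f = np.concatenate(feats).astype(float)
    return f / f.sum()

def retrieve(query, database, k=3):
    # rank database images by chi-square-style distance to the query descriptor
    q = color_hist(query)
    dists = [np.sum((color_hist(d) - q) ** 2 / (color_hist(d) + q + 1e-9))
             for d in database]
    return np.argsort(dists)[:k]   # indices of the k best matches
```

Swapping `color_hist` for a learned embedding turns this into the modern deep-feature retrieval pipeline.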
While developing feature extraction techniques for CBIR, I encountered challenges mainly arising from balanced background and foreground content. To address this, I studied different foreground/background separation techniques and eventually delved into image matting. By performing feature extraction in a projected space, I was able to extract the foreground without requiring prior information such as the matte region. While the results were positive, perfect separation remained out of reach.
During the exploration of image matting techniques, I became interested in image inpainting. I developed an inpainting method that yielded good results, although it did not quite reach the state-of-the-art performance.
Receiving genuine blind-review comments from more than 28 professors, reviewers, and veterans in the field, working for top journals/transactions from IEEE, Elsevier, Springer, EURASIP, and IET, as well as IEEE conferences, was an incredible experience. Their valuable feedback significantly enhanced the quality of my research, and seeing my manuscripts published was another truly rewarding moment. I extend my sincere thanks to all the reviewers from IEEE, Elsevier, Springer, EURASIP, IET, and elsewhere.
In my Corporate Career,
During my corporate career, I worked on a cutting-edge driverless-vehicle project focused on vehicle and lane detection under challenging conditions: daytime, nighttime, and near zebra crossings. Collaborating with a talented five-person team, we achieved remarkable results, completing the project in 50% less time than the target timeframe.
Our success was driven by two key factors: innovation & exceptional teamwork.
We developed highly accurate vehicle, road, and zebra crossing detection algorithms, as evidenced by the excellent performance on images captured from our in-car camera. While some minor mis-detections and non-detections occurred, our collaborative efforts ensured continuous improvement and optimization for speed and quality.
This dedication to speed and quality led to the initiation of patent drafting in the 4th/5th month, a testament to the project’s significant progress.
It’s important to note that unlike current practices with dedicated annotation teams, every engineer had to perform annotations as part of their algorithm development and optimization process.
After the Advanced Driver Assistance Systems (ADAS) project, I had the opportunity to propose a multi-class classification algorithm for food detection, competing for the project against nominations from various companies. I had to submit my proposal, requirements, algorithm pipeline, and metrics within 5 days. I proposed three distinct approaches: one based on RGB images, one based on infrared images, and one combining both. In the final acceptance, two of my pipelines were selected for further business-level discussions.
In the video surveillance project, various algorithms and techniques were implemented for modules such as theft detection, person tracking, and activity monitoring, yielding good results with some limitations in corner cases. The most demanding and fascinating tasks included tracking individuals who disappear from the view of one camera and reappear in another. Additionally, detecting theft, especially in scenarios with occlusions, presented significant challenges.
In query-based automatic video capturing, a set of query images is given to the system, which then automatically locates and captures the object. For example, if a bride and groom’s photo is given as a query, the system will capture them throughout the marriage ceremony whenever they are in the field of view, even amid a huge crowd.
As an image quality optimization engineer, my primary responsibility was to enhance the quality of ultrasound images, ensuring accurate detection of organs like the thyroid, kidney, and heart across different body types such as thin, medium, and heavy. Additionally, I developed computer vision algorithms for ongoing and upcoming projects. I succeeded in creating a feature measurement algorithm that was 400 times faster (yes, that's correct!) than the existing one while maintaining the same accuracy, leading to an invention disclosure. While working on image quality optimization, I realized the process itself could be automated, so I developed a content-dependent image optimization algorithm that achieved results comparable to manual optimization, which also led to an invention disclosure.
Routine tasks included checking the quality of ultrasound probe sensors and calibrating them. Performing these tasks manually became tedious, so I devised a simple image processing algorithm to automate the process. This not only saved time but also helped predict future sensor quality, streamlining the workflow and enhancing efficiency.
While working on mammography projects, I found the calibration of sensors and the development of algorithms adhering to medical standards and regulations such as HIPAA deeply engaging. Even more captivating was the work on tomography, which involved 3D imaging. Unlike the circular gantries of CT or MRI, these tomography systems sweep a semi-circular path that combines rotation and translation, necessitating a unique algorithmic approach and making the setup of the 3D reconstructed models fascinating. Applying machine learning to distinguish between malignant, benign, and healthy tissue added an extra layer of intrigue. Furthermore, architecting the flow from sensor data acquisition through 3D reconstruction, extensive image preprocessing, and display in the format radiologists prefer on multi-monitor systems was both challenging and interesting. We also received a lot of appreciation for the GUI design, which made it very easy for radiologists to view mammography and tomography images across multiple monitors, with numerous automatic options available.
In the Cone Beam Computed Tomography (CBCT) project, developing a solution to generate 3D models from the minimum number of projections was both challenging and fascinating.
While working on electronic medical documents, it was fascinating to process ECG, EEG, CT reports, lab reports, and handwritten medical prescriptions from doctors. The real challenge was the handwritten prescriptions, where achieving zero errors was crucial. Several OCR-based, computer vision, and machine learning techniques were developed and implemented to minimize errors, although absolute perfection was not achievable at the time. Nonetheless, the journey toward zero errors was both challenging and stimulating, driving innovation and creativity.
Levizi: Making medical emergencies less stressful, one call at a time. Levizi Healthcare Tech provided emergency assistance by connecting individuals, doctors, ambulances, hospitals, and family members.
During my tenure at Levizi Healthcare Tech, our focus was on providing immediate assistance to individuals in emergencies.
- When a person called for emergency help, our platform connected them with our doctors, who could provide advice based on the caller's medical history and the medications previously prescribed from their health assessments.
- If the situation was serious, our doctors would recommend that the person be taken to the appropriate hospital. Our nearest network of ambulances would then be dispatched to transport the patient to the recommended hospital.
- While the patient was en route, their health records and history would be shared with the emergency doctors at the hospital receiving them.
- Simultaneously, information about the situation and the hospital to which the patient was being taken would be relayed to the patient’s caregivers or family members.
- Once the patient arrived at the hospital, updates and information would continue to be shared with the caregivers until someone could physically be present at the hospital.
While working on a class-based object detection project for edge devices, I focused on developing algorithms that were both fast and efficient, considering the hardware limitations of edge devices. Various techniques were implemented to achieve the desired accuracy levels. It was rewarding to see the final product running smoothly on the code we had developed.
I automated many manual cloud tasks through code optimization and pipeline redefinition, significantly improving speed. By rewriting the pipeline and optimizing the code, I increased the speed by 590 times (you read that right!). Additionally, I automated data augmentation and train/test data selection, simplifying the AWS workflow, which eliminated manual errors, reduced manpower, and helped optimize the network model.
I also had the opportunity to reduce model complexity while preserving accuracy, cutting the number of trainable parameters from x million to y million. Through in-depth analysis of my code grounded in neural network fundamentals such as backpropagation, and with the support of existing explainable AI (XAI) techniques, I was able to exceed expectations, which was both challenging and rewarding.
Furthermore, I optimized the model architecture for subclass classification, for example, tuning the model for coconut-tree detection under the broader tree class. Conventional techniques were ineffective, so I took a slightly different, simple, and innovative approach that exceeded the expected detection accuracy. It was rewarding to tackle such challenging tasks and achieve successful outcomes.
While working on 3D human modeling with a minimal number of sensors and captures, the challenge was to achieve real-time processing with high accuracy. With the help of a talented team, we were able to accomplish this goal.
In the realm of 3D object and human detection, I utilized various state-of-the-art deep learning techniques and optimized the models based on project definitions and metrics.
Working on 3D modeling, 3D reconstruction, 3D image classification, 3D object detection, 3D segmentation, SLAM, MVS, PointNet++, NeRF, D-NeRF, and more has been both challenging and intriguing. In the 3D domain, data comes from various setups: stereo, RGB-D cameras, sets of RGB images, RGB video streams, LiDAR, depth-only, and depth with RGB. As a result, defining pipelines and combining them to meet specified accuracy targets and metrics has sharpened our skills and deepened our understanding of learning platforms. Pipelines could involve conventional 3D modeling, the latest deep learning techniques, or a combination of both. Different projects required different pipelines, and while we achieved good results in most cases, there were instances where the results were less than optimal.
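Whatever the capture setup, most point cloud pipelines begin with a downsampling step to tame the data volume. As a small illustration (a generic sketch, not any particular project's pipeline), here is voxel-grid downsampling in plain NumPy, the same operation libraries like Open3D provide out of the box:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    # assign each 3D point to a voxel, then replace each voxel's points
    # with their centroid
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)
    out = np.zeros((counts.size, 3))
    for dim in range(3):
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out
```

After downsampling, typical next steps are normal estimation, registration (e.g., ICP), and then the task-specific model such as PointNet++ for classification or segmentation.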
The latest deep learning techniques have significantly reduced development time while improving accuracy in many cases compared to conventional techniques.
As a Reviewer,
I am pleased to serve as a reviewer for the following journals, where I contribute to sharing my knowledge and staying updated with the latest research in the field of computer vision:
- IEEE Transactions on Image Processing
- IEEE Transactions on Consumer Electronics
- Elsevier Signal Processing
- Springer Signal, Image and Video Processing
- IET Image Processing
- IET Biometrics
- Wiley International Journal for Numerical Methods in Biomedical Engineering
- SPIE Optical Engineering
- EURASIP Journal on Image & Video Processing
- EURASIP Advances in Signal Processing
- ELCVIA
To My Clients,
I empower clients to build impactful computer vision applications by:
- Guiding architecture design: I collaborate with individuals, teams and clients to develop a comprehensive roadmap for their CV applications.
- Optimizing data strategies: I assist in selecting and capturing the most suitable datasets and guide through necessary processing for optimal results.
- Recommending tools and techniques: I assist in selecting the most appropriate tools and techniques based on specific project requirements and constraints.
- Selecting efficient algorithms: I help clients choose the most preferred and efficient algorithms for their desired tasks.
- Defining evaluation metrics: I help define relevant metrics to thoroughly assess the performance and effectiveness of the developed application.
- Facilitating deployment: I assist in deploying the applications seamlessly, whether it be on the web, mobile app, or at the edge.
Selecting the right data, tools, algorithms, metrics, and deployment strategies saves time, resources, and money, and shortens go-to-market. I help clients make these crucial decisions for successful projects.
Where I Am Today,
I am enriching my 3D computer vision lab: capturing data, teaching, mentoring, and working on projects in ML/DL/AI and 2D/3D computer vision.
Currently, I'm deeply immersed in AI, generative AI, 3D computer vision, and 3D deep learning, actively exploring their diverse aspects and applications through ongoing projects.
📝Next Blog Preview:
In the next post, 🚀“Mastering 3D Computer Vision & Point Cloud Processing- Module 2–Introduction,” we’ll embark on a journey of discovery, uncovering the transformative power of 3D point clouds and how they’re revolutionizing industries worldwide.
Topics to be discussed in the Next blog:
- Why a Series of Blogs on 3D Computer Vision and Point Cloud Processing?
- The Importance of 3D Point Clouds
- Applications of Point Cloud Data
- What This Series Offers and Who It Is For
- Structure of the Blog Series
- Your Learning Journey
- Hands-On Learning
- Your Journey Begins
- Next Blog Preview