This follow-up expands upon the foundational tutorial by diving into object-detection techniques. We walk through classical, machine-learning, and deep-learning approaches entirely in C++.
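Object detection identifies and localises one or more classes of objects in an image, usually by returning bounding boxes (bb) and class scores. In OpenCV we can:
- Apply classical image processing (background subtraction, contours).
- Use pre-trained machine-learning detectors (cv::CascadeClassifier, HOG + SVM).
- Run deep-learning models through cv::dnn.
Below, sections progress from simple background subtraction to state-of-the-art YOLO.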
cv::createBackgroundSubtractorMOG2 / KNN
cv::Ptr<cv::BackgroundSubtractor> backSub =
    cv::createBackgroundSubtractorMOG2(); // or cv::createBackgroundSubtractorKNN()
cv::VideoCapture cap(0);
cv::Mat frame, mask, fg;
while(cap.read(frame)){
    backSub->apply(frame, mask);                     // learn + segment
    cv::erode(mask, mask, cv::Mat(), { -1,-1 }, 1);  // remove speckle noise (default 3x3 kernel)
    cv::dilate(mask, mask, cv::Mat(), { -1,-1 }, 2); // grow the surviving blobs back
    fg.release();                                    // copyTo() zero-fills only a newly allocated output
    frame.copyTo(fg, mask);                          // show foreground pixels
    cv::imshow("FG Mask", mask);
    cv::imshow("Objects", fg);
    if(cv::waitKey(1)==27) break;                    // Esc quits
}
std::vector<std::vector<cv::Point>> contours;
cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
for(const auto &c : contours){
if(cv::contourArea(c) < 500) continue; // ignore small blobs
cv::Rect bb = cv::boundingRect(c);
cv::rectangle(frame, bb, {0,255,0}, 2);
}
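To make the "learn + segment" step less of a black box, here is a minimal sketch of the idea in plain C++. It is not how MOG2 or KNN work internally (those keep a richer per-pixel model); it assumes a single running mean per pixel, and the names `RunningAverageSubtractor`, `learningRate` and `threshold` are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a background subtractor: one running mean per pixel.
// (Assumption for illustration; the OpenCV subtractors use richer models.)
struct RunningAverageSubtractor {
    std::vector<float> background;  // per-pixel background estimate
    float learningRate;             // how quickly the model adapts, in (0, 1]
    float threshold;                // min |pixel - background| counted as foreground
    bool initialised = false;

    RunningAverageSubtractor(float lr, float thr) : learningRate(lr), threshold(thr) {}

    // Segments `gray` against the current model, then updates the model.
    // Returns a mask holding 255 for foreground pixels and 0 for background.
    std::vector<std::uint8_t> apply(const std::vector<std::uint8_t>& gray) {
        if (!initialised) {
            background.assign(gray.begin(), gray.end());  // first frame seeds the model
            initialised = true;
        }
        assert(gray.size() == background.size());
        std::vector<std::uint8_t> mask(gray.size(), 0);
        for (std::size_t i = 0; i < gray.size(); ++i) {
            const float pixel = static_cast<float>(gray[i]);
            if (std::fabs(pixel - background[i]) > threshold) mask[i] = 255;
            // Exponential moving average: blend the new frame into the model.
            background[i] += learningRate * (pixel - background[i]);
        }
        return mask;
    }
};
```

Feeding it a static frame first and then a frame with one bright pixel marks only that pixel as foreground. A higher `learningRate` absorbs stationary objects into the background sooner.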
cv::CascadeClassifier Methods
Below is a concise reference for the most-used public methods of cv::CascadeClassifier. Each entry lists the signature, purpose, and a small code snippet to cement the idea. (All snippets assume you have already included <opencv2/objdetect.hpp>.)
bool load(const std::string &xmlPath)
Loads the cascade from a trained XML file. Returns true on success. If you ship your model inside the bundle (iOS) or assets/ (Android), resolve it to an absolute file path first; Android assets live inside the APK, so copy the file to local storage before loading it.
cv::CascadeClassifier cascade;
if(!cascade.load("haarcascade_frontalface_default.xml")){
throw std::runtime_error("XML not found!");
}
bool empty() const
Quick guard to verify the classifier is ready:
if(cascade.empty()) { /* handle error */ }
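detectMultiScale: Core Detection Call
void detectMultiScale(
    InputArray img,
    std::vector<Rect>& objects,
    double scaleFactor = 1.1,
    int minNeighbors = 3,
    int flags = 0,
    Size minSize = Size(),
    Size maxSize = Size() )
Parameter | Meaning (default) |
---|---|
img | 8-bit input image (usually grayscale) in which objects are searched. |
objects | Output rectangles, one per detected object. |
scaleFactor | How much the image is shrunk at each pyramid level (1.1). Smaller values test more scales and run slower. |
minNeighbors | How many overlapping candidates a rectangle needs to be kept (3). Higher → fewer false positives. |
flags | Legacy parameter from the old C API; ignored by new-format cascades (0). |
minSize | Smallest object size considered; smaller objects are ignored (no limit). |
maxSize | Largest object size considered; larger objects are ignored (no limit). |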
Example — adjustable sensitivity slider
double sf = cv::getTrackbarPos("Scale", win)/100.0 + 1.05;
int neigh = cv::getTrackbarPos("Nbs", win);
cascade.detectMultiScale(gray, faces, sf, neigh);
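Why does a slider on scaleFactor change both speed and sensitivity? A rough way to see it is to list the window scales a pyramid scan would visit. The sketch below is a simplified model rather than OpenCV's actual implementation; `pyramidScales`, `baseWindow`, `minSide` and `maxSide` are hypothetical names, and it assumes a square window that grows by `scaleFactor` each level.

```cpp
#include <cassert>
#include <vector>

// Lists the scales a cascade-style detector would visit, assuming a square
// trained window of `baseWindow` pixels that grows by `scaleFactor` per level.
// Only scales whose window side lies in [minSide, maxSide] are kept.
std::vector<double> pyramidScales(int baseWindow, int minSide, int maxSide,
                                  double scaleFactor) {
    assert(baseWindow > 0 && scaleFactor > 1.0);
    std::vector<double> scales;
    for (double s = 1.0; baseWindow * s <= maxSide; s *= scaleFactor) {
        if (baseWindow * s >= minSide) scales.push_back(s);
    }
    return scales;
}
```

Under these assumptions, a 24-pixel window scanned up to 480 pixels visits far more levels at scaleFactor 1.1 than at 1.3. That is why a smaller factor finds objects at in-between sizes but costs more time per frame.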
void detectMultiScale(
InputArray img,
std::vector<Rect>& objects,
std::vector<int>& rejectLevels,
std::vector<double>& levelWeights,
double scaleFactor = 1.1,
int minNeighbors = 3,
int flags = 0,
Size minSize = Size(),
Size maxSize = Size(),
bool outputRejectLevels = false );
Handy when you need a score per box (e.g. to colour boxes by confidence). Pass outputRejectLevels = true so that rejectLevels and levelWeights are filled:
std::vector<cv::Rect> boxes;
std::vector<int> lvl;
std::vector<double> w;
cascade.detectMultiScale(gray, boxes, lvl, w, 1.1, 3, 0, cv::Size(), cv::Size(), true);
for(size_t i=0;i<boxes.size();++i){
    // Level weights are raw stage scores, not probabilities; tune the cut-off per cascade.
    cv::Scalar c = w[i] > .7 ? cv::Scalar(0,255,0) : cv::Scalar(0,165,255);
    cv::rectangle(frame, boxes[i], c, 2);
}
bool isOldFormatCascade() const
Tests whether the loaded XML uses the legacy (pre-OpenCV 2) Haar cascade format. Rarely needed; useful when migrating older models.
bool read(const FileNode& node)
Deserialises from a .yml/.json node (when you embed the model inside a larger OpenCV cv::FileStorage). Example:
cv::FileStorage fs("models.yml", cv::FileStorage::READ);
cascade.read(fs["face_cascade"]);
void setMaskGenerator(const Ptr<BaseCascadeClassifier::MaskGenerator>& maskGenerator)
When you wish to ignore parts of the frame (e.g. UI overlays), supply a custom MaskGenerator.
struct BlackCorners : cv::BaseCascadeClassifier::MaskGenerator{
cv::Mat generateMask(const cv::Mat& src) override{
cv::Mat mask(src.size(), CV_8UC1, cv::Scalar(255));
cv::rectangle(mask, {0,0,src.cols,40}, cv::Scalar(0), cv::FILLED);
return mask;
}
};
cascade.setMaskGenerator(cv::makePtr<BlackCorners>());
During detectMultiScale, windows fully inside black areas will be skipped, reducing false detections in HUDs or top-bar overlays.
OpenCV's cascade XMLs are trained with opencv_traincascade. You feed it thousands of positive patches plus negative images. The tool uses AdaBoost to select features, producing a multi-stage classifier that the above methods execute in real time on the CPU.
cv::HOGDescriptor hog;
hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
std::vector<cv::Rect> persons;
hog.detectMultiScale(frame, persons, 0, cv::Size(8,8),
cv::Size(32,32), 1.05, 2);
for(auto &bb : persons)
cv::rectangle(frame, bb, {0,0,255}, 2);
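Param | Meaning |
---|---|
hitThreshold | Threshold on the SVM decision margin. Lower → more detections (and more false positives). |
winStride | Sliding-window step size; must be a multiple of the block stride. |
padding | Extra border pixels added around the image so windows can cover objects at the edges. |
scale | Factor by which the detection window grows between pyramid levels. |
groupThreshold | Overlapping detections needed to keep a box (like minNeighbors); 0 disables grouping. |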
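The cost of winStride is easiest to see by counting windows. The sketch below is back-of-the-envelope arithmetic in plain C++. It assumes a single pyramid level and no padding, and the helper names `positionsAlongAxis` and `windowsPerLevel` are made up for this example.

```cpp
#include <cassert>

// Number of window positions along one axis of a sliding-window scan.
int positionsAlongAxis(int imageLen, int windowLen, int stride) {
    assert(windowLen > 0 && stride > 0);
    if (imageLen < windowLen) return 0;  // the window does not fit at all
    return (imageLen - windowLen) / stride + 1;
}

// Windows evaluated at one pyramid level (assumption: no padding applied).
long long windowsPerLevel(int imgW, int imgH, int winW, int winH,
                          int strideX, int strideY) {
    return static_cast<long long>(positionsAlongAxis(imgW, winW, strideX)) *
           positionsAlongAxis(imgH, winH, strideY);
}
```

With the default 64x128 people-detector window on a 640x480 frame, `windowsPerLevel(640, 480, 64, 128, 8, 8)` gives 73 x 45 = 3285 windows, while a 16-pixel stride gives 37 x 23 = 851. Doubling the stride cuts the work by roughly four, at the price of coarser localisation.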
cv::dnn
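auto net = cv::dnn::readNetFromCaffe("deploy.prototxt",
                                     "mobilenet_iter_73000.caffemodel");
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV); // or DNN_BACKEND_CUDA
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);      // or DNN_TARGET_CUDA
// Scale pixels to roughly [-1,1]. Caffe MobileNet-SSD weights are usually trained on
// BGR input, so swapRB is left false here (assumption: verify for your model).
cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0/127.5,
    cv::Size(300,300), cv::Scalar(127.5,127.5,127.5), false, false);
net.setInput(blob);
cv::Mat out = net.forward(); // shape: [1,1,N,7]
// View the output as N rows of [imageId, classId, confidence, x1, y1, x2, y2].
cv::Mat det(out.size[2], out.size[3], CV_32F, out.ptr<float>());
for(int i=0; i<det.rows; ++i){
    float conf = det.at<float>(i, 2);
    if(conf < 0.5f) continue;
    int x1 = (int)(det.at<float>(i, 3)*frame.cols);
    int y1 = (int)(det.at<float>(i, 4)*frame.rows);
    int x2 = (int)(det.at<float>(i, 5)*frame.cols);
    int y2 = (int)(det.at<float>(i, 6)*frame.rows);
    cv::Rect bb(cv::Point(x1,y1), cv::Point(x2,y2));
    // Clip to the frame so boxes that spill past the border stay valid.
    cv::rectangle(frame, bb & cv::Rect(0,0,frame.cols,frame.rows), {0,255,0}, 2);
}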
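The decoding loop above is easy to get subtly wrong (row stride, normalised coordinates, boxes that run off the frame), so it can help to isolate it as a pure function that is testable without a network. This is a sketch under the same assumed row layout, [imageId, classId, confidence, x1, y1, x2, y2]. `Detection`, `clampInt` and `decodeSsdRows` are hypothetical names, not OpenCV API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Detection { int classId; float confidence; int x, y, width, height; };

// Restricts v to the closed range [lo, hi].
static int clampInt(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

// Decodes `rows` rows of 7 floats each into pixel-space boxes.
// Corner coordinates are assumed to be normalised to [0,1].
std::vector<Detection> decodeSsdRows(const float* data, std::size_t rows,
                                     int imgW, int imgH, float minConf) {
    assert(imgW > 0 && imgH > 0);
    std::vector<Detection> out;
    for (std::size_t r = 0; r < rows; ++r) {
        const float* d = data + r * 7;
        if (d[2] < minConf) continue;  // drop low-confidence rows
        // Clamp so boxes that spill past the frame remain drawable.
        const int x1 = clampInt(static_cast<int>(d[3] * imgW), 0, imgW - 1);
        const int y1 = clampInt(static_cast<int>(d[4] * imgH), 0, imgH - 1);
        const int x2 = clampInt(static_cast<int>(d[5] * imgW), 0, imgW - 1);
        const int y2 = clampInt(static_cast<int>(d[6] * imgH), 0, imgH - 1);
        if (x2 <= x1 || y2 <= y1) continue;  // skip degenerate boxes
        out.push_back({static_cast<int>(d[1]), d[2], x1, y1, x2 - x1, y2 - y1});
    }
    return out;
}
```

With a real network output `out` of shape [1,1,N,7], the call would look like `decodeSsdRows(out.ptr<float>(), out.size[2], frame.cols, frame.rows, 0.5f)`.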
Convert the YOLO model to ONNX, then load it with readNetFromONNX.
Outputs require Non-Max Suppression:
std::vector<int> idx;
cv::dnn::NMSBoxes(boxes, scores, /*scoreThresh=*/0.25, /*nmsThresh=*/0.45, idx);
for(int i : idx) draw_box(boxes[i]);
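For readers wondering what the suppression step does, here is a minimal greedy NMS in plain C++. It is a sketch of the general technique, not OpenCV's implementation, which may differ in details such as tie handling. `Box`, `iou` and `greedyNms` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { float x, y, w, h; };  // top-left corner plus size

// Intersection-over-union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only
// if it does not overlap an already kept box by more than nmsThresh.
std::vector<int> greedyNms(const std::vector<Box>& boxes,
                           const std::vector<float>& scores,
                           float scoreThresh, float nmsThresh) {
    assert(boxes.size() == scores.size());
    std::vector<int> order;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i)
        if (scores[i] >= scoreThresh) order.push_back(i);  // drop weak candidates
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return scores[a] > scores[b]; });
    std::vector<int> keep;
    for (int idx : order) {
        bool suppressed = false;
        for (int k : keep) {
            if (iou(boxes[idx], boxes[k]) > nmsThresh) { suppressed = true; break; }
        }
        if (!suppressed) keep.push_back(idx);
    }
    return keep;
}
```

Two near-identical boxes collapse to the higher-scoring one, while a distant box survives. Lowering `nmsThresh` merges more aggressively, which helps with duplicate boxes but can swallow genuinely adjacent objects.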
- Use labelImg or Roboflow for annotation.
- Train custom cascades with the opencv_traincascade CLI.
- Prefer DNN_BACKEND_CUDA + DNN_TARGET_CUDA_FP16 when a GPU is available.

- Add opencv2.framework to the Xcode project (or compile & drag-in).
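// DetectorWrapper.mm (⚠️ .mm)
#include <opencv2/opencv.hpp>
#import <opencv2/imgcodecs/ios.h> // UIImageToMat / MatToUIImage
extern "C" UIImage * detectObjects(UIImage *imgIOS){
    cv::Mat src, frame;
    UIImageToMat(imgIOS, src); // colour images arrive as 4-channel RGBA
    // HOG accepts 8-bit input with 1 or 3 channels only, so drop the alpha channel.
    if(src.channels() == 4) cv::cvtColor(src, frame, cv::COLOR_RGBA2RGB);
    else frame = src;
    // HOGDescriptor has no constructor taking SVM coefficients, so configure it once.
    static cv::HOGDescriptor hog = []{
        cv::HOGDescriptor h;
        h.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
        return h;
    }();
    std::vector<cv::Rect> people; hog.detectMultiScale(frame, people);
    for(auto &bb: people) cv::rectangle(frame, bb, {0,255,0}, 2);
    return MatToUIImage(frame); // helper in OpenCV
}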
// DetectorWrapper.h (added to Bridging Header)
#import <UIKit/UIKit.h>
UIImage * _Nullable detectObjects(UIImage * _Nonnull imgIOS);
let processed = detectObjects(uiImage)
Note 🛈: keep UI work on the main thread and heavy OpenCV processing on a background DispatchQueue to avoid UI stutter.
# CMakeLists.txt (excerpt)
add_library( cvdetect SHARED Detector.cpp )
find_package( OpenCV REQUIRED )
target_link_libraries( cvdetect ${OpenCV_LIBS} )
// DetectorJNI.cpp
#include <jni.h>
#include <opencv2/opencv.hpp>
extern "C"
JNIEXPORT jintArray JNICALL
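Java_com_example_ObjDet_detect(JNIEnv *env, jobject, jlong addr){
    // addr is the native address of a cv::Mat; it must be 8-bit with 1 or 3 channels.
    cv::Mat &frame = *reinterpret_cast<cv::Mat*>(addr);
    // HOGDescriptor has no constructor taking SVM coefficients, so configure it once.
    static cv::HOGDescriptor hog = []{
        cv::HOGDescriptor h;
        h.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
        return h;
    }();
    std::vector<cv::Rect> boxes; hog.detectMultiScale(frame, boxes);
    // Flatten to [x, y, width, height, ...] and copy into a Java int[] in one call.
    std::vector<jint> buf; buf.reserve(boxes.size()*4);
    for(const auto &bb : boxes){
        buf.push_back(bb.x); buf.push_back(bb.y);
        buf.push_back(bb.width); buf.push_back(bb.height);
    }
    jintArray out = env->NewIntArray(static_cast<jsize>(buf.size()));
    if(out) env->SetIntArrayRegion(out, 0, static_cast<jsize>(buf.size()), buf.data());
    return out;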
}
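using System;
using System.Runtime.InteropServices;
public partial class Detector {
    // A JNI export expects (JNIEnv*, jobject, jlong), so it cannot be called via P/Invoke.
    // This sketch assumes the native library also exports a plain C function
    // (hypothetical name and signature):
    //   int cvdetect_detect(void* mat, int* outBoxes, int maxBoxes);
    [DllImport("cvdetect", EntryPoint="cvdetect_detect")]
    private static extern int Detect(IntPtr matAddr, int[] boxes, int maxBoxes);
    public static Rect[] Run(SKBitmap bitmap){
        // Convert SKBitmap → cv::Mat via OpenCV for Unity or AOT-friendly helpers
        IntPtr matAddr = /* ... */;
        const int maxBoxes = 64;
        var buf = new int[4 * maxBoxes]; // x, y, width, height per box
        int n = Detect(matAddr, buf, maxBoxes);
        var rects = new Rect[n];
        for (int i = 0; i < n; ++i)
            rects[i] = new Rect(buf[4*i], buf[4*i+1], buf[4*i+2], buf[4*i+3]);
        return rects;
    }
}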
- Copy the native .so builds for armeabi-v7a, arm64-v8a, & x86_64 → Platforms/Android.
- Declare <uses-feature android:name="android.hardware.camera" /> in AndroidManifest.
- Use -O3 -s flags for native code.
With these wrappers, your cross-platform MAUI UI remains in C# while the heavy lifting executes inside an optimised C++ OpenCV library.