mirror of https://github.com/davisking/dlib.git
updated docs and specs
parent 2bee86842a
commit e19f5d65fe
@@ -48,6 +48,7 @@ namespace dlib
 Beyond Bags of Features: Spatial Pyramid Matching for Recognizing
 Natural Scene Categories by Svetlana Lazebnik, Cordelia Schmid,
 and Jean Ponce
+It also includes the ability to represent movable part models.
 
 
 
@@ -88,8 +89,11 @@ namespace dlib
 score of the classifier. Note further that each of the movable feature extraction
 zones must pass a threshold test for it to be included. That is, if the score that a
 movable zone would contribute to the overall score for a sliding window location is not
 positive then that zone is not included in the feature vector (i.e. its part of the
-feature vector is set to zero. This way the length of the feature vector stays constant).
+feature vector is set to zero. This way the length of the feature vector stays
+constant). This movable region construction allows us to represent objects with parts
+that move around relative to the object box. For example, a human has hands but they
+aren't always in the same place relative to a person's bounding box.
 
 THREAD SAFETY
 Concurrent access to an instance of this object is not safe and should be protected
@@ -1574,6 +1574,7 @@
 Natural Scene Categories by Svetlana Lazebnik, Cordelia Schmid,
 and Jean Ponce
 </blockquote>
+It also includes the ability to represent movable part models.
 
 <br/><br/>
 The following feature extractors can be used with the scan_image_pyramid object:
@@ -121,22 +121,22 @@ int main()
 problem you are trying to solve.
 
 2. A detection template. This is a rectangle which defines the shape of a
-sliding window (the object_box), as well as a set of rectangles which
-envelop it. This set of enveloping rectangles defines the spatial
-structure of the overall feature extraction within a sliding window.
-In particular, each location of a sliding window has a feature vector
+sliding window (i.e. the object_box), as well as a set of rectangular feature
+extraction regions inside it. This set of regions defines the spatial
+structure of the overall feature extraction within a sliding window. In
+particular, each location of a sliding window has a feature vector
 associated with it. This feature vector is defined as follows:
-- Let N denote the number of enveloping rectangles.
-- Let M denote the dimensionality of the vectors output by feature_extractor_type
+- Let N denote the number of feature extraction zones.
+- Let M denote the dimensionality of the vectors output by Feature_extractor_type
 objects.
 - Let F(i) == the M dimensional vector which is the sum of all vectors
-given by our feature_extractor_type object inside the ith enveloping
-rectangle.
+given by our Feature_extractor_type object inside the ith feature extraction
+zone.
 - Then the feature vector for a sliding window is an M*N dimensional vector
 [F(1) F(2) F(3) ... F(N)] (i.e. it is a concatenation of the N vectors).
 This feature vector can be thought of as a collection of N "bags of features",
-each bag coming from a spatial location determined by one of the enveloping
-rectangles.
+each bag coming from a spatial location determined by one of the rectangular
+feature extraction zones.
 
 3. A weight vector and a threshold value. The dot product between the weight
 vector and the feature vector for a sliding window location gives the score
@@ -145,11 +145,27 @@ int main()
 parameters yourself. They are automatically populated by the
 structural_object_detection_trainer.
 
-Finally, the sliding window classifiers described above are applied to every level
-of an image pyramid. So you need to tell scan_image_pyramid what kind of pyramid
-you want to use. In this case we are using pyramid_down which downsamples each
-pyramid layer by half (dlib also contains other version of pyramid_down which result
-in finer grained pyramids).
+The sliding window classifiers described above are applied to every level of an image
+pyramid. So you need to tell scan_image_pyramid what kind of pyramid you want to
+use. In this case we are using pyramid_down which downsamples each pyramid layer by
+half (dlib also contains other version of pyramid_down which result in finer grained
+pyramids).
+
+Finally, some of the feature extraction zones are allowed to move freely within the
+object box. This means that when we are sliding the classifier over an image, some
+feature extraction zones are stationary (i.e. always in the same place relative to
+the object box) while others are allowed to move anywhere within the object box. In
+particular, the movable regions are placed at the locations that maximize the score
+of the classifier. Note further that each of the movable feature extraction zones
+must pass a threshold test for it to be included. That is, if the score that a
+movable zone would contribute to the overall score for a sliding window location is
+not positive then that zone is not included in the feature vector (i.e. its part of
+the feature vector is set to zero. This way the length of the feature vector stays
+constant). This movable region construction allows us to represent objects with
+parts that move around relative to the object box. For example, a human has hands
+but they aren't always in the same place relative to a person's bounding box.
+However, to keep this example program simple, we will only be using stationary
+feature extraction regions.
 */
 typedef hashed_feature_image<hog_image<3,3,1,4,hog_signed_gradient,hog_full_interpolation> > feature_extractor_type;
 typedef scan_image_pyramid<pyramid_down, feature_extractor_type> image_scanner_type;
@@ -167,8 +183,8 @@ int main()
 // we only added square detection templates then it would be impossible to detect this non-square
 // rectangle. The setup_grid_detection_templates_verbose() routine will take care of this for us by
 // looking at the contents of object_locations and automatically picking an appropriate set. Also,
-// the final arguments indicate that we want our detection templates to have 4 enveloping rectangles
-// laid out in a 2x2 regular grid inside each sliding window.
+// the final arguments indicate that we want our detection templates to have 4 feature extraction
+// regions laid out in a 2x2 regular grid inside each sliding window.
 setup_grid_detection_templates_verbose(scanner, object_locations, 2, 2);
 
 