I am a beginner in Blender and Blender scripting. I am currently trying to create a script that automatically renders images of a single object from different camera viewpoints. My setup is as follows:

- I have a text file with the different camera poses (position and orientation) w.r.t. the world frame (the object).
- I create the object (Monkey) using the Python API.
- I created functions such as add_camera(), add_light() and render_settings(), which create these 'objects' with the desired settings. The code for the add_camera() function is shown below.
def add_camera(name, focal_length, x, y, z, q0, q1, q2, q3):
    cam_data = bpy.data.cameras.new(name)
    cam = bpy.data.objects.new(name=name, object_data=cam_data)
    bpy.context.collection.objects.link(cam)
    cam.data.lens = focal_length
    cam.data.type = 'PERSP'
    cam.rotation_mode = 'QUATERNION'
    cam.location = mathutils.Vector((x, y, z))
    cam.rotation_quaternion = mathutils.Quaternion((q0, q1, q2, q3))
    bpy.context.scene.camera = cam
I then use a for loop to iterate over the different camera poses extracted from the text file, creating a new camera for each pose and setting it as the active camera. This works and produces the expected rendered images from the different camera views.
for pose in range(len(verts)):
    scene = bpy.context.scene
    cam = add_camera('Camera', 50, verts[pose][0], verts[pose][1], verts[pose][2], verts[pose][3], verts[pose][4], verts[pose][5], verts[pose][6])
    light = add_light('Light', verts[pose][0], verts[pose][1], verts[pose][2], 500, 'POINT')
    file = os.path.join('D:/Images/', 'image_' + str(pose))
    bpy.context.scene.render.filepath = file
    bpy.ops.render.render(write_still=True)
Furthermore, I want to generate and export the 2D bounding box around that object. For the bounding box generation I am using this script, which works perfectly for a single scene/instance (using e.g. the last created camera). However, when I try to incorporate it in the loop, it fails and outputs different coordinates compared to simply running the aforementioned bounding box script on a single scene (camera, object and light): it often outputs the entire image size as the bounding box.
I adapted the for loop as follows:
mesh_object = bpy.data.objects['Monkey']
for pose in range(len(verts)):
    scene = bpy.context.scene
    cam = add_camera('Camera', 50, verts[pose][0], verts[pose][1], verts[pose][2], verts[pose][3], verts[pose][4], verts[pose][5], verts[pose][6])
    cam = bpy.context.scene.camera
    light = add_light('Light', verts[pose][0], verts[pose][1], verts[pose][2], 500, 'POINT')
    file = os.path.join('D:/Images/', 'image_' + str(pose))
    print(camera_view_bounds_2d(scene, cam, mesh_object))
    bpy.context.scene.render.filepath = file
    bpy.ops.render.render(write_still=True)
I have no idea what is going wrong, so any help would be greatly appreciated! Furthermore, the end goal would be to automatically write the bounding boxes for each rendered image to a separate text file as well. The idea was to use the implementation shown here:
def write_bounds_2d(filepath, scene, cam_ob, mesh_object):
    with open(filepath, "w") as file:
        file.write("%i %i %i %i\n" % camera_view_bounds_2d(scene, cam_ob, mesh_object))
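(Note that `"w"` mode truncates the file on every call, so calling `write_bounds_2d` once per pose would leave only the last box in the file. A minimal pure-Python sketch that appends one line per pose instead; the helper name `append_bounds_2d` and the line format are my own assumptions, not part of the original script:

```python
def append_bounds_2d(filepath, pose_index, bbox):
    # bbox is the (x, y, width, height) tuple returned by
    # camera_view_bounds_2d, or None when the object is out of view.
    # "a" mode appends, so the file accumulates one line per pose.
    with open(filepath, "a") as f:
        if bbox is None:
            f.write("%i none\n" % pose_index)
        else:
            f.write("%i %i %i %i %i\n" % ((pose_index,) + tuple(bbox)))
```

Inside the render loop this would be called as `append_bounds_2d(file_boundingbox, pose, camera_view_bounds_2d(scene, cam, mesh_object))`.)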
Thanks!
EDIT: Below is the BoundingBox script I used, followed by the script that loops over the poses. The data file with the different poses is also included.
import bpy
import os
def clamp(x, minimum, maximum):
    return max(minimum, min(x, maximum))
def camera_view_bounds_2d(scene, cam_object, mesh_object):
    """
    Returns camera space bounding box of mesh object.

    Negative 'z' value means the point is behind the camera.

    Takes shift-x/y, lens angle and sensor size into account
    as well as perspective/ortho projections.

    :arg scene: Scene to use for frame size.
    :type scene: :class:`bpy.types.Scene`
    :arg obj: Camera object.
    :type obj: :class:`bpy.types.Object`
    :arg me: Untransformed Mesh.
    :type me: :class:`bpy.types.Mesh`
    :return: a Box object (call its to_tuple() method to get x, y, width and height)
    :rtype: :class:`Box`
    """
    # Get the camera frame bounding box, which by default is returned without
    # any transformations applied. Create a new mesh based on mesh_object and
    # undo any transformations so that it is in the same space as the camera
    # frame. Then find the min/max vertex coordinates of the mesh visible in
    # the frame, or None if the mesh is not in view.
    matrix = cam_object.matrix_world.normalized().inverted()
    # The evaluated depsgraph applies modifiers and animation for the current
    # frame; the scene is updated based on the dependency graph.
    depsgraph = bpy.context.evaluated_depsgraph_get()
    mesh_eval = mesh_object.evaluated_get(depsgraph)
    # A new mesh data block is created, using the inverse camera matrix to
    # undo any transformations.
    mesh = mesh_eval.to_mesh()  # No arguments need to be specified here
    mesh.transform(mesh_object.matrix_world)
    mesh.transform(matrix)

    camera = cam_object.data
    # Corners of the camera frame in camera coordinates, before any object
    # transformation; only the first three of the four corners are needed.
    frame = [-v for v in camera.view_frame(scene=scene)[:3]]
    # For a perspective camera the frame must be scaled per vertex
    camera_persp = camera.type != 'ORTHO'

    lx = []
    ly = []
    for v in mesh.vertices:  # Loop over all vertices of the object (8 for a cube)
        co_local = v.co  # Vertex location in camera space
        z = -co_local.z  # Flip the sign so that negative z means behind the camera

        if camera_persp:
            if z == 0.0:
                lx.append(0.5)
                ly.append(0.5)
            if z <= 0.0:
                # Vertex is behind the camera; ignore it.
                continue
            else:
                # Perspective division: scale the frame to the vertex depth
                frame = [(v / (v.z / z)) for v in frame]

        # Size of the camera frame at this depth
        min_x, max_x = frame[1].x, frame[2].x
        min_y, max_y = frame[0].y, frame[1].y

        # Normalize the vertex position to the frame: with the origin in a
        # corner of the frame, x and y lie in [0, 1] when the vertex projects
        # inside the image. This is done for each vertex.
        x = (co_local.x - min_x) / (max_x - min_x)
        y = (co_local.y - min_y) / (max_y - min_y)

        # Append the x, y location of the vertex in the image reference frame
        lx.append(x)
        ly.append(y)

    mesh_eval.to_mesh_clear()

    # Object is not in view if all the mesh verts were ignored
    if not lx or not ly:
        return None

    # Clamp the normalized bounding coordinates to the image
    min_x = clamp(min(lx), 0.0, 1.0)
    max_x = clamp(max(lx), 0.0, 1.0)
    min_y = clamp(min(ly), 0.0, 1.0)
    max_y = clamp(max(ly), 0.0, 1.0)

    # Object is not in view if both bounding points lie on the same side
    if min_x == max_x or min_y == max_y:
        return None

    # Figure out the rendered image size, to scale the normalized
    # coordinates to the render output size
    r = scene.render
    fac = r.resolution_percentage * 0.01
    dim_x = r.resolution_x * fac
    dim_y = r.resolution_y * fac

    # Object is not in view if the box rounds to zero pixels in either dimension
    if round((max_x - min_x) * dim_x) == 0 or round((max_y - min_y) * dim_y) == 0:
        return None

    # Scale the normalized image coordinates to pixel coordinates
    return (
        round(min_x * dim_x),            # X
        round(dim_y - max_y * dim_y),    # Y
        round((max_x - min_x) * dim_x),  # Width
        round((max_y - min_y) * dim_y)   # Height
    )
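For reference, the returned tuple is (x, y, width, height) in pixels, with the origin in the top-left corner of the image. If a normalized center format is needed later (e.g. for YOLO-style label files), the conversion is a small pure-Python step; `to_normalized_center` is a hypothetical helper for illustration, not part of the script above:

```python
def to_normalized_center(bbox, dim_x, dim_y):
    # bbox is the (x, y, width, height) pixel tuple returned by
    # camera_view_bounds_2d; the origin is the top-left image corner.
    x, y, w, h = bbox
    return (
        (x + w / 2.0) / dim_x,  # normalized box center x
        (y + h / 2.0) / dim_y,  # normalized box center y
        w / dim_x,              # normalized width
        h / dim_y,              # normalized height
    )

print(to_normalized_center((100, 150, 200, 100), 512, 512))
# → (0.390625, 0.390625, 0.390625, 0.1953125)
```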
The script I use to loop over the different poses:
import bpy
import os
import mathutils
import csv
import importlib
import sys
dir = os.path.dirname(bpy.data.filepath)
if dir not in sys.path:
    sys.path.append(dir)
import BoundingBox
import pipeline_arch
importlib.reload(BoundingBox)
importlib.reload(pipeline_arch)
from BoundingBox import *
config_poses = 'D:/cameraPoses.csv'
def add_light(name, x, y, z, energy = 500, light_type = 'POINT'):
    # Create light datablock
    light_data = bpy.data.lights.new(name=name, type=light_type)
    light_data.energy = energy
    # Create new object, pass the light data
    light_object = bpy.data.objects.new(name=name, object_data=light_data)
    # Link object to collection in context
    bpy.context.collection.objects.link(light_object)
    light_object.location = mathutils.Vector((x, y, z))
def add_camera(name, focal_length, x, y, z, q0 = 0, q1 = 0, q2 = 0, q3 = 0):
    # Create camera datablock
    cam_data = bpy.data.cameras.new(name)
    cam = bpy.data.objects.new(name=name, object_data=cam_data)
    # Link object to collection in context
    bpy.context.collection.objects.link(cam)
    # Set the camera parameters
    cam.data.lens = focal_length
    cam.data.type = 'PERSP'
    # Place the camera
    cam.rotation_mode = 'QUATERNION'
    cam.location = mathutils.Vector((x, y, z))
    cam.rotation_quaternion = mathutils.Quaternion((q0, q1, q2, q3))
    bpy.context.scene.camera = cam
def render_settings(resolution_x = 1024, resolution_y = 1024, image_type = 'PNG', color_mode = 'RGB'):
    bpy.context.scene.render.resolution_x = resolution_x
    bpy.context.scene.render.resolution_y = resolution_y
    bpy.context.scene.render.image_settings.file_format = image_type
    bpy.context.scene.render.image_settings.color_mode = color_mode
#Load the camera poses
with open(config_poses, 'r', newline = '') as csvfile:
    ofile = csv.reader(csvfile, delimiter = ',')
    next(ofile)  # Skip the header row
    rows = (r for r in ofile if r)
    verts = [[float(i) for i in r] for r in rows]
#Create the objects
bpy.ops.mesh.primitive_monkey_add(location = (0.0,0.0,0.0))
bpy.context.object.name = "Monkey"
bpy.context.object.scale = (2,2,2)
mesh_object = bpy.data.objects['Monkey']
#Running the script over the different poses with the desired settings
render_settings(512,512,'PNG','RGB')
#light = add_light('Light',20,1,1,5000, 'POINT')
file_boundingbox = os.path.join('D:/', 'bounding_box.txt')
for pose in range(len(verts)):
    scene = bpy.context.scene
    cam = add_camera('Camera', 50, verts[pose][0], verts[pose][1], verts[pose][2], verts[pose][3], verts[pose][4], verts[pose][5], verts[pose][6])
    cam = bpy.context.scene.camera
    light = add_light('Light', verts[pose][0], verts[pose][1], verts[pose][2], 2000, 'POINT')
    #mesh_object = bpy.data.objects['Monkey']
    file = os.path.join('D:/', 'imageMonkey_' + str(pose))
    print(camera_view_bounds_2d(scene, cam, mesh_object))
    #write_bounds_2d(file_boundingbox, scene, cam, mesh_object)
    bpy.context.scene.render.filepath = file
    bpy.ops.render.render(write_still=True)

#Clean the scene
bpy.ops.object.select_all(action = 'SELECT')
bpy.ops.object.delete()

#Clean the data
data = bpy.data
for camera in data.cameras:
    data.cameras.remove(camera, do_unlink = True)
Datafile:
x,y,z,q0,q1,q2,q3
20,0,0,0.5,0.5,0.5,0.5
0,20,0,0.0,0.0,-0.707107,-0.707107
-20,0,0,0.5,0.5,-0.5,-0.5
0,-20,0,0.707107,0.707107,0.0,0.0
0,0,20,0.707107,0.0,0.0,0.707107
0,0,20,1,0,0,0
0,0,-20,0,0.707107,-0.707107,0.0
0,0,-20,0,0.382683,-0.923880,0.0
10,-20,0,0.707107,0.707107,0.0,0.0
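One thing worth checking with a pose file like this: `mathutils.Quaternion((q0, q1, q2, q3))` expects the scalar part first (w, x, y, z order), and Blender will happily accept a non-unit quaternion. A small pure-Python sanity check, with pose values copied from the rows above, that each row's rotation has unit norm:

```python
import math

def is_unit_quaternion(row, tol=1e-4):
    # Columns 3..6 of a pose row are q0..q3, read here as (w, x, y, z),
    # which is the order mathutils.Quaternion expects.
    norm = math.sqrt(sum(q * q for q in row[3:7]))
    return abs(norm - 1.0) < tol

# Pose rows copied from the data file above
poses = [
    [20, 0, 0, 0.5, 0.5, 0.5, 0.5],
    [0, 0, -20, 0, 0.382683, -0.923880, 0.0],
]
print(all(is_unit_quaternion(r) for r in poses))  # → True
```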

Comments:

- `context.view_layer.update()` or `depsgraph.update()` used to be `scene.update()` or `mesh.update()`. Related re creating a 2D bbox: https://blender.stackexchange.com/questions/214572/constrain-a-camera-to-an-object-while-also-aligning-it-perfectly-to-the-center-o/214597#214597 – batFINGER Jul 01 '21 at 17:27
- `cam = bpy.context.scene.camera` should be vice versa. Adding a new camera, but always using the first one set as `scene.camera`. – batFINGER Jul 01 '21 at 17:46