4:4:4 10bit single CMOS HD project



Obin Olson
March 18th, 2005, 08:45 AM
we don't have pixel packing with CameraLink

Kyle, what is your project? are you working on software?

Kyle Granger
March 18th, 2005, 09:10 AM
Obin,
> we don't have pixel packing with CameraLink
Then you are unfortunately stuck with the 16-bit pixels. However, you might try a test: Do the 16-bit to 12-bit conversion in software. The CPU overhead should not be too bad: A couple of shifts and an add in C/C++, and you get 72MB/sec. Your programmer sounds very experienced. 10-bit packing (4 pixels in five bytes, 60MB/sec) is also a possibility, but that increases the bit-twiddling. Yet another possibility: "10.667-bit" packing -- 3 pixels in 4 bytes (64MB/sec).
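
The three data rates above all come from the number of bytes each scheme spends per group of pixels. A minimal sketch of that arithmetic, assuming the unpacked 16-bit stream runs at 96 MB/sec (the rate the 72, 60 and 64 MB/sec figures imply); the function name is made up for illustration:

```c
#include <assert.h>

/* Data rate after packing, given the rate of the unpacked stream
 * (2 bytes per pixel) and a scheme that stores `pixels` pixels in
 * `bytes` bytes. Hypothetical helper, for illustration only. */
double packed_rate_mb(double unpacked_mb_per_sec, int pixels, int bytes)
{
    assert(pixels > 0 && bytes > 0);
    return unpacked_mb_per_sec * bytes / (2.0 * pixels);
}
```

With a 96 MB/sec input, packed_rate_mb(96.0, 2, 3) gives 72 (12-bit), packed_rate_mb(96.0, 4, 5) gives 60 (10-bit) and packed_rate_mb(96.0, 3, 4) gives 64 ("10.667-bit").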

> Kyle, what is your project? are you working on software?
I am pretty much finished with the software. It is merely a grabber application, like yours, with capture, display and recording to disk. It works with one GigeLink interface (with SI-3300RGB or SI-1300M), or two cameras (stereo, 2x SI-1300).
It all works fine. There is also a converter dialog, which exports image sequences from the RAW file (BMP (24-bit), SGI (48-bit), and DPX (log 30-bit)).

You tend to get married to whatever grabber SDK you use. I lucked out with the GigeLink: It is well-written and well-documented.
Most of the CL boards I looked at last summer do perform pixel packing (BitFlow, Coreco, Leutron) --- not that that helps you.

Obin Olson
March 18th, 2005, 10:33 AM
Kyle are you using 10-12bit in your software? or are you using 8bit captures? what are you going to do with your software?

Kyle Granger
March 18th, 2005, 12:17 PM
> are you using 10-12bit in your software?
Application is hot-rodded for 12-bit packed. No 8-bit captures (but simple enough to implement). Device can be 10-bit (3300, 1300) or 12-bit.
> what are you going to do with your software?
I shoot with it almost every day. And a company is involved.

Wayne Morellini
March 19th, 2005, 08:10 AM
Can you keep us updated, and give details when the time comes?

Wayne.

Obin Olson
March 19th, 2005, 08:43 AM
Kyle what camera are you using?
do you have any clips I could see?
are you displaying in black and white or color?

thx for the info on packing we are playing with that ...

Kyle Granger
March 19th, 2005, 10:22 AM
Obin & Wayne,
I'll let you all know when the software goes public, or not.

> thx for the info on packing we are playing with that ...
cool. this means also, as you might guess, more work at the RAW conversion end.

> Kyle what camera are you using?

Two mono SI-1300's, one Bayer SI-3300.

> are you displaying in black and white or color?

both, depending on the camera. color display is quick and dirty. stereo display is red/cyan.

My clips are large, I am still learning, and I am a bit shy about posting clips. Right now I am mainly using the SI-1300's for stereo work. I could post a 16-bit linear mono still picture some place (about 2 MB), if anybody is interested. (where? login?).

Kyle Edwards
March 19th, 2005, 11:03 AM
www.uploadhouse.com or www.imageshack.us

Kyle Granger
March 19th, 2005, 11:27 AM
no luck uploading picture to uploadhouse: not allowed for PSD, SGI or TIF. Limit at imageshack is 1MB. If anyone wants the 1.4 MB (16-bit mono!) TIF file, just email me.

Kyle Granger
March 19th, 2005, 11:38 AM
http://img216.exs.cx/my.php?loc=img216&image=vasesample9lt.png

I cropped it. This is un-gamma corrected, un-color-corrected. I don't know if the 10-bit data in the 16-bit TIF survived, or if it's just a plain 8-bit image. Looking at an uncorrected 8-bit image would be a waste of time.

Obin Olson
March 19th, 2005, 03:40 PM
wooohooo..and who says digital is not as good as film??


www.dv3productions.com/pub/BAND.jpg

a photo from a photoshoot I just did...like? hate?

using cheapo Sigma digital lens on my Canon 10D 6megapixel with some NICE light ;) a bit of photoshop work on the curves and the contrast...

I LOVE cmos chips...so "organic" I really see CCD and CMOS now when I look at professional work still/motion pictures

Eric Gorski
March 20th, 2005, 12:10 PM
that looks tight obin. now we just need 24fps :)

Obin Olson
March 21st, 2005, 08:21 AM
yes 24fps....that is the ISSUE at hand...LOL

making more progress on save...Kyle we are working with the bit packing...looks like it may not save enough overhead/datarate to be worth it...

Kyle Granger
March 21st, 2005, 09:14 AM
Obin,
Sorry to hear the packing is not helping you with your problem. I was thinking that the 64MB/sec packing would be the best (packing 3 10-bit pixels into one DWORD). Read three DWORDS (6 pixels) and write two DWORDS. That puts additional bus bandwidth at (96 + 64) MB/sec. If you have a 533 MHz FSB (2133 MB/sec), I thought it just might work. The CPU should not be overloaded -- it sounds like a memory bottleneck. Oh well...

Obin Olson
March 21st, 2005, 04:06 PM
I am finally making some slow progress. I've spent the morning developing a more efficient way of packing bits. I'm in the process of validating the code right now to make sure I am not losing any data. This will take a bit as things are totally unintuitive and I have to check the bits every few seconds to make sure things work properly. The main logic is that pairs of pixels that are in short integers (16bit) are combined together in 3 bytes thus saving 25%

Obin Olson
March 21st, 2005, 04:35 PM
of the storage requirements. The only strange thing going on so far is that the order of these 3 byte packets is flipped in the conversion for some strange reason. It seems to be happening during the copying process. I'm debugging this right now, but in any case it might be irrelevant if they get flipped back during conversion back to 16 bit format to create the TIFFs.

Obin Olson
March 21st, 2005, 05:00 PM
I want to try the bytes to integer conversion first to make sure we do not lose any data on the trip back. I got some code this morning from that friend, but I cannot compile it as he forgot to send some header files with it. It looks like it may be even more efficient, but he is in meetings all day so I will not get the headers until tonight at the soonest.

Kyle Granger
March 21st, 2005, 05:08 PM
Obin,
You'll get it to work, I am sure! Two more points...
1) I wouldn't worry too much yet about doing it "perfectly correct", and confirming the unpacking during the export process. More important is "optimally", i.e., the inner loop code. You want to make sure that your performance is not compromised, and that you are truly gaining something by writing less to disk, which in your case will be 72MB/sec.
2) I have recorded about half a terabyte, so far. I just want to add that every byte you save is a byte earned: in the conversion process, transferring files, and archiving. It will be worth it.
My two cents.

Obin Olson
March 21st, 2005, 05:10 PM
That won't work as the pixels coming out of the SDK are 12bit even though the underlying data is 10bit. You would need to convert them from 12 to 10bit and that would waste too many cycles. If they were already 10bit this would be very easy to implement and would be just as fast as the 12 bit solution with much better performance. I will fiddle with the SDK a bit later to see if it is possible to get the actual 10bit pixels from the framebuffer. Since Xcap does not give that choice I doubt it.

Kyle Granger
March 21st, 2005, 05:17 PM
> That won't work as the pixels coming out of the SDK are 12bit even though the underlying data is 10bit.

your bit structure is then thus?

msb           lsb
0000bbbb bbbbbb00

> and that would waste too many cycles.

not necessarily. this depends on how you are masking and shifting.

but if the 12-bit packing works for you, then great!
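
To make the "masking and shifting" point concrete, here is a rough sketch of the 10-bit option (4 pixels in 5 bytes). It assumes every sample really arrives as 0000bbbb bbbbbb00, so one shift and one mask recover the 10 bits. The function names and the byte order inside each 5-byte group are choices made for this example, not anything the SDK prescribes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover the 10-bit value from a 16-bit sample laid out as
 * 0000bbbb bbbbbb00 (assumed layout). */
static uint32_t sample10(uint16_t s)
{
    return (s >> 2) & 0x3FFu;
}

/* Pack `count` samples (a multiple of 4) into count / 4 * 5 bytes,
 * first pixel in the lowest bits. Returns the bytes written. */
size_t pack10_4in5(const uint16_t *src, size_t count, uint8_t *dst)
{
    assert(count % 4 == 0);
    size_t out = 0;
    for (size_t i = 0; i < count; i += 4) {
        uint32_t a = sample10(src[i]);
        uint32_t b = sample10(src[i + 1]);
        uint32_t c = sample10(src[i + 2]);
        uint32_t d = sample10(src[i + 3]);
        dst[out++] = (uint8_t)(a & 0xFF);
        dst[out++] = (uint8_t)((a >> 8) | ((b & 0x3F) << 2));
        dst[out++] = (uint8_t)((b >> 6) | ((c & 0x0F) << 4));
        dst[out++] = (uint8_t)((c >> 4) | ((d & 0x03) << 6));
        dst[out++] = (uint8_t)(d >> 2);
    }
    return out;
}

/* Inverse of pack10_4in5: restores the 0000bbbb bbbbbb00 layout. */
void unpack10_4in5(const uint8_t *src, size_t count, uint16_t *dst)
{
    assert(count % 4 == 0);
    for (size_t i = 0; i < count; i += 4, src += 5) {
        uint32_t a = src[0] | ((src[1] & 0x03u) << 8);
        uint32_t b = (src[1] >> 2) | ((src[2] & 0x0Fu) << 6);
        uint32_t c = (src[2] >> 4) | ((src[3] & 0x3Fu) << 4);
        uint32_t d = (src[3] >> 6) | ((uint32_t)src[4] << 2);
        dst[i]     = (uint16_t)(a << 2);
        dst[i + 1] = (uint16_t)(b << 2);
        dst[i + 2] = (uint16_t)(c << 2);
        dst[i + 3] = (uint16_t)(d << 2);
    }
}
```

Whether the extra shifts cost too many cycles is something only a profile of the real inner loop can answer; the sketch only shows that the conversion from the 12-bit container is one shift per pixel.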

Obin Olson
March 21st, 2005, 07:51 PM
Right now I am testing with packing 2 pixels into 3 bytes thus saving 25%, but I do that with a single shift and masking. To do 10 bit packing I need to align into 8bit boundaries as you can only copy memory fast in 8bit chunks and I would have to pack 4 pixels in 5 bytes, as doing otherwise would leave dead space which would defeat the purpose of it all. This would mean some more shifts, but you would save 38% space instead of 25%. In any case I am testing data integrity now, as it is no use going very fast if the data gets corrupted in the first place while recording. I will write code to try 10bit packing too and then I will profile it to see if I gain enough speed from the extra work to make the space saving effective.

Regards,

Luc
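
For readers following along, the 2-pixels-in-3-bytes scheme described above can be sketched roughly as follows. This is not Luc's code: the function names are invented, and it assumes each 16-bit word carries a 12-bit value in its low bits. The bytes are written one at a time in a fixed order, which is one way to keep the packed layout independent of how the CPU stores multi-byte integers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack pairs of 12-bit samples (held in 16-bit words) into 3 bytes
 * per pair. `count` must be even. Returns the bytes written. */
size_t pack12_2in3(const uint16_t *src, size_t count, uint8_t *dst)
{
    assert(count % 2 == 0);
    size_t out = 0;
    for (size_t i = 0; i < count; i += 2) {
        uint32_t pair = (uint32_t)(src[i] & 0x0FFF)
                      | ((uint32_t)(src[i + 1] & 0x0FFF) << 12);
        dst[out++] = (uint8_t)(pair & 0xFF);        /* low byte first */
        dst[out++] = (uint8_t)((pair >> 8) & 0xFF);
        dst[out++] = (uint8_t)(pair >> 16);
    }
    return out;
}

/* Inverse: expand 3-byte groups back into 16-bit words. */
void unpack12_2in3(const uint8_t *src, size_t count, uint16_t *dst)
{
    assert(count % 2 == 0);
    for (size_t i = 0; i < count; i += 2, src += 3) {
        uint32_t pair = (uint32_t)src[0]
                      | ((uint32_t)src[1] << 8)
                      | ((uint32_t)src[2] << 16);
        dst[i]     = (uint16_t)(pair & 0x0FFF);
        dst[i + 1] = (uint16_t)(pair >> 12);
    }
}
```

Packing a buffer and immediately unpacking it into a second buffer, then comparing the two, is a cheap data integrity check that can run before any disk writing is involved.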

Obin Olson
March 21st, 2005, 09:25 PM
George Lucas will be re-issuing all the Star Wars films in 3D, one film per year, starting in 2007.

Kyle ?


"I'm a man on a mission when it comes to 3-D," Cameron said. "I will be making all of my films in 3-D in the future. We need exhibition to come in to own a big chunk of the (emerging 3-D) market."



hmm..Kyle works for a "firm" this firm or "group" wants "3d imaging" kyle gets on the message board and NEVER tells what he is really doing.....makes you wonder huh...

rock on Kyle! I wish you the best of luck... someday you can tell me what you're ---really doing--- ;)

Kyle Granger
March 21st, 2005, 10:03 PM
I know this is probably not the best place for code, but here goes anyway....

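// Packing three 10-bit pixels, received as shorts,
// into one DWORD.
// Assuming the 10 bits sit in each short like this:
// 0000bbbb bbbbbb00
// Savings of 33% over unpacked 16-bit format.

#include <stdint.h>

// pack3 is just a placeholder name. uint32_t is used instead of
// unsigned long, which is not 32 bits on every platform.
// Reads three shorts from src and writes one DWORD to dst.
void pack3(const uint16_t *src, uint32_t *dst)
{
    *dst = (uint32_t) *src++;
    // dst = 00000000 00000000 00001111 11111100

    *dst += ((uint32_t) *src++) << 10;
    // dst = 00000000 00222222 22221111 11111100

    *dst += ((uint32_t) *src++) << 20;
    // dst = 33333333 33222222 22221111 11111100
}

// The caller advances src by 3 and dst by 1 for each group.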


[ code is not tested, nor compiled...but no byte arithmetic is done...just longs...no masking...just adds...luc or you can email me, obin ]
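
At the RAW conversion end the same layout has to be undone. A possible counterpart to the packing above, again only a sketch: it assumes three pixels per 32-bit word with the first pixel in the lowest bits, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand 32-bit words holding three 10-bit pixels each back into
 * 16-bit samples laid out as 0000bbbb bbbbbb00.
 * `count` is the number of pixels and must be a multiple of 3. */
void unpack10_3in4(const uint32_t *src, size_t count, uint16_t *dst)
{
    assert(count % 3 == 0);
    for (size_t i = 0; i < count; i += 3, src++) {
        /* The mask 0x0FFC keeps bits 2..11, which also drops the
         * neighbouring pixel's bits after each shift. */
        dst[i]     = (uint16_t)(*src & 0x0FFCu);
        dst[i + 1] = (uint16_t)((*src >> 10) & 0x0FFCu);
        dst[i + 2] = (uint16_t)((*src >> 20) & 0x0FFCu);
    }
}
```

Here the masking that the packing side avoids does show up, but only in the offline export step where speed matters less.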

Kyle Granger
March 21st, 2005, 10:14 PM
>> hmm..Kyle works for a "firm" this firm
>> or "group" wants "3d imaging"

heh, heh...thanks, Obin.

Well, i used to live down the street from George in San Rafael. Now I live down the street from Arri.

However, the reality is much less exciting, since I'm just doing what all of you are doing....

WORKING ON MY SCRIPT!!!

But it's nice to see Jimbo and George "may the Force be with you" Lucas getting with the program. It makes me feel a little less insane.

Wayne Morellini
March 22nd, 2005, 04:56 AM
<<<-- Originally posted by Obin Olson : George Lucas will be re-issuing all the Star Wars films in 3D, one film per year, starting in 2007. -->>>

I wish he'd get over it, maybe even film the last three episodes. I am fed up with the reruns every time he releases a money spinning modification of the original (which looked better than the last mod anyway).

Maybe they can replace Princess Leia with a better looking computer generated character this time, and the dorky clothes, and replace Mark Hamill with a computer generated young Brad Pitt, or Matt Damon (the female audience would love that ;). You think I'm joking, but eventually this is what they will be able to do. Actually they could give Luke and Han a real challenge and replace Leia with Julia Roberts, or freak everybody out by replacing her with Meg Ryan (no Jennifer Lopez).

Actually I can almost feel this post being sucked away by the local Rob Star.

Anyway, Obin, do you program? I thought you had a programmer doing all of this for you.

Steve Nordhauser
March 22nd, 2005, 07:31 AM
Obin,
This might be the time to dust off some assembler skills. At least, look at the code generated by the C++ compiler. I haven't checked the instruction set on an x86 in years but I think that there was a barrel shifter that could do any shift in a single cycle. Sometimes the compilers don't pick up on these things and do 4 separate shifts.

The only place code really needs to be optimized is in the inner-most loops - the stuff you do a million times an image - packing is certainly one of them.

One trick I learned back when coding was a simpler place was to write a function in C and compile everything. Then I could change the function at the assembly level and know all my stack and data was correct.

Of course all this could just be showing how obsolete my programming skills have become.......

Brad Abrahams
March 22nd, 2005, 01:33 PM
Hi Obin, I posted this over on the Andromeda forums.

The source code for the DNxHD codec is freely available for download from:

http://www.avid.com/forms/DNxHDinfo.asp

I've already tested the 10-bit codec and the results were very promising.

Obin Olson
March 22nd, 2005, 06:25 PM
I am glad to hear that Brad as I am going to use that codec once we get the save working in CineLink...

Anders Holck Petersen
March 22nd, 2005, 09:54 PM
But this is 8 and 10 bit YUV only right?


<<<-- Originally posted by Brad Abrahams : Hi Obin, I posted this over on the Andromeda forums.

The source code for the DNxHD codec is freely available for download from:

http://www.avid.com/forms/DNxHDinfo.asp

I've already tested the 10-bit codec and the results were very promising. -->>>

Wayne Morellini
March 22nd, 2005, 11:14 PM
You guys realise that DNxHD 10-bit is 220Mb/s which is close to lossless compressed bayer 1080p?

Obin Olson
March 22nd, 2005, 11:28 PM
Don't think it's MB/sec Wayne

Wayne Morellini
March 22nd, 2005, 11:39 PM
I said bits per second.

Jason Rodriguez
March 24th, 2005, 09:25 AM
BTW Obin,

beware that you can't change the DNxHD codec in any way for redistribution, at least not without AVID's approval and licensing fees.

In other words, this isn't "open source" as in GPL, BSD, etc.

Wayne Morellini
March 24th, 2005, 11:35 AM
Kyle,

Yes, it is a pain. I prefer the way that traditional Internet newsgroups work. It would be good if the board could do the same in the editing window using ">" for indentation of quotes. But replace the edit view ">" indentations in the thread view, with box indentations around each subsequent quote level with the present reply with no box. I am sure I have seen this done on this software on other boards.

That thread, I don't know what it was about, apart from another project like this. It was mostly savaged (edited) by the time I got there, with very very little left about the actual project, we suspect it was just a troll that has been subsequently deleted. They should really make a general discussion forum called "Arguments" for each board and Usenet group, so all the Trolls and flamers can get together. That was a valid suggestion Rob.


Guys,

Boy this thread has slowed down, I'm glad. I looked over the thread history, and do you realise we laid down over 50% of this thread in the first three months?

Obin Olson
March 24th, 2005, 04:56 PM
It will be an "option" for export Jason ;) if you have the codec installed then you're golden ;)

Obin Olson
March 26th, 2005, 09:57 AM
ok..so we are doing well with the bitpacking(thanks Kyle) we have a solid 33fps 1080p 12bit at 88MB/sec..things seem to be on a good track again and we are looking at getting a working test version with display and save today..

Obin Olson
March 26th, 2005, 09:58 AM
...I am keeping my fingers crossed! our overhead for bitpacking and save is only 4-13% cpu..this is good! as we will have leftover CPU for other things...display is 60% cpu...

Obin Olson
March 26th, 2005, 09:59 PM
I got some updates but it's too late to test as I am at home now..I will go in the morning and see what we have! my programmer says the save and display is working well on his system...now I gotta test it on the Intel mobile board/chip here....

Wayne Morellini
March 27th, 2005, 01:11 AM
<<<-- Originally posted by Obin Olson : ...I am keeping my fingers crossed! our overhead for bitpacking and save is only 4-13% cpu..this is good! as we will have leftover CPU for other things...display is 60% cpu... -->>>

I thought the display was being done by the GPU, if so then display should be close to 4% also. This leads me to believe that the onboard GPU you are using does not support all the GPU features you are using, and is emulating the unsupported instructions in software. My experience with DirectX games is that when you switch from the custom made software renderer to the DirectX only software renderer, you get great slowdowns (I forget but at least half, maybe 4+ times). I think DirectX is good when it addresses actual hardware, but poor when it has to emulate it. So maybe similar things are happening here.

The best bet is to have a profiling program that looks to see what features are available in hardware that can't be emulated quicker on the local processors (some hardware is that slow compared to a high speed processor with speed left over going to waste). Have dynamic code that uses custom written hi-speed portions to emulate the missing hardware instead of DirectX. This may also be undoable with the GPU programming system you have. You might have to re-arrange the code so most of the portions that are compatible are together and separate from most of the emulated bits. One thing I think is happening in DirectX is that their dynamic code is not written for speed, and is getting context hits through the different abstraction layers they are using. So the methods used to jump between different code portions can also greatly slow you down.

If you can get this down to a low percentage then you will have enough to run lossless compression or very quiet, low speed processor on it.

Happy Easter

Radek Svoboda
March 27th, 2005, 07:59 AM
Is the Altasens CMOS available already? If no, when will it become available?

Steve Nordhauser
March 28th, 2005, 07:34 AM
Radek,
The short answer is yes. Shipping now. In the SI-1920HD camera. The longer answer is that we are still trying to bridge the gap between a working piece of hardware and a cinematography tool. Please contact me off-list for specific sales information.
Steve

Noah Yuan-Vogel
March 30th, 2005, 09:59 AM
I've been following these homemade HD threads for months and have been trying to decide what my best option is. I've been thinking about the SI-3300 for a while, but I am unsure as to whether it is in my price range with whatever other expenses it will require. I'm someone who would otherwise be buying a prosumer DV or HD camera but can't stand the limitations of the color sampling and compression of DV and HDV formats.
Do cameras (SI-3300 and 1920) from Silicon Imaging have built in GigE in the unit or do they use a cameralink to GigE adapter (is that then added to the price)?
Also, it sounds like there is no tried and true method of capturing video... It would be nice if someone could just condense all the best (and cheapest) options for each required piece of hardware/software necessary for getting an HD image onto HDD (camera, grabber, software).
Sounds like streampix (also expensive) and Xcap have their problems... but is such software adequate for adjusting, previewing, and capturing uncompressed video streams to HDD? Is a special framegrabber required for GigE cameras, or is a standard Gigabit ethernet connection all you need? Also, it is still unclear to me if these cameras can do 24 or 48fps while maintaining 1/48th shutter...

Maybe I just need to look harder for these answers, but if anyone can help me, thanks. I'm trying to learn as much as I can about this.

Anyone know anything about the Epix "silicon video" packages that have micron cmos sensor cameras bundled with cable, grabber and software? 1280x1024 at 30p over GigE with all components needed for capture at only $995 seems like a great deal.

http://www.epixinc.com/products/sv9m001.htm

Obin Olson
March 30th, 2005, 07:26 PM
the HUGE issue is SOFTWARE...and that is my focus..and as of now after 6 months we are VERY close but not done....

from my programmer:

I've just completed a new suite of tests on a new system and I am still running into the same problem even though the technology and approach is totally different. When calling the routine across threads it seems that there is a pretty big delay involved that ends up being close to the saving time itself. I'm building a fourth test case right now to see if I can find a way around it as I still have 3-4 different things to try.

can anyone on here HELP with this issue? it seems that call times across threads is our last problem to a working software...we are at the LIMIT of the CPU at this point and MUST increase the performance or we will not have a working system

Obin Olson
March 30th, 2005, 07:35 PM
I can tell you right now that the 1280 images are not going to be as clear as you may want...it's a single cmos camera not 3CMOS system...makes a BIG change in the resolution of your image (after all you're looking at RGGB in the space of 4 pixels instead of 4 pixels that EACH show RGB) I would try and go with the 1080P if you can...Kyle on the board has some software that will record the raw images from a GigaBit camera - the 3300rgb...The 3300rgb is a GREAT camera...if you can capture your data from it!!

Wayne Morellini
March 31st, 2005, 08:25 AM
<<<-- Originally posted by Obin Olson :
can anyone on here HELP with this issue? it seems that call times across threads is our last problem to a working software...we are at the LIMIT of the CPU at this point and MUST increase the performance or we will not have a working system -->>>

Hunt around for a real-time embedded system/kernel for Windows XP (used to be popular for previous versions of Windows). I do not know exactly what you are meaning, but I can guess. Yes, getting a routine to work on another thread should introduce a big latency problem. There is a large latency hit in subroutine calls in modern computers and memory systems (for other readers here, the further you get away from primary cache to storage the longer, of course). When you call these routines through windows/abstraction layers they add a major hit too. I imagine waking up another thread to do something also has a significant hit. Just calculate all the various hits and find the shortest path. Windows is not so good at these things, I don't know if you can really reduce it (apart from an embedded replacement kernel) apart from doing things differently. Also the programming technique you use to cause a thread to wait for something to happen can waste lots of cycles, but also putting something to sleep and waking it up can take a lot, but there are ways to do it with less latency and fewer cycles. I am used to thinking of a lot of latency in terms of less than one cycle, all this PC stuff is so primitive. Rob.L should know how to do these things on a PC, it would be good to ask him.

Your CPU overhead should be a lot less than what it is at the moment for view, what is Kyle doing on his system?

Thanks for your continuing efforts.

Rob.S, when are you coming back with yours. He might have some ideas that could help?

Juan M. M. Fiebelkorn
March 31st, 2005, 03:28 PM
I got an easy and foolproof solution.
It is called : LINUX

Wayne Morellini
April 1st, 2005, 02:01 AM
Yes, I think Mac OSX as well, but how do we get this Windows version completed?

I have news that a dual core Mac Mini will be out mid year season, that would be a good basis for the Mac version.

Have a technical update on the technical thread with other new Macs, batteries, JVC HD lower compression camera, interesting stuff:

http://www.dvinfo.net/conf/showthread.php?s=&postid=294872#post294872

By the way, where is Ronald Biese, he was interested in Linux version.

Obin Olson
April 1st, 2005, 11:03 AM
Juan how hard to convert our software to LINUX? we are still looking at other solutions..I am going to overclock the board a bit and see if that will be enough power

Obin Olson
April 1st, 2005, 11:43 AM
so anyone have some answers to our questions? we are a bit stuck at this point... I understand Linux but that is a last resort as we have everything working but this last mile...

Kyle Granger
April 1st, 2005, 01:20 PM
Hi Obin,
OK, you seem to have a problem. Before jumping to another OS or the Mac or overclocking your board, it may be wise to first determine WHAT the problem is.

It SOUNDS like there is a thread interaction problem (waiting for mutex, or some signal, not sleeping?), but you should first exclude a problem with one thread. Wayne is right about problems with subroutine calls and Windows programming in general, but I believe the problem is PROBABLY much less exotic. Also, if you assume the problem exists in your code, you then have a chance to fix it. These things for me have always turned out to be bugs. And bugs can be fixed.

For what it is worth, my application is extremely multi-threaded, and I have not experienced these problems.

You said earlier that the problem had to do with "When calling the routine across threads", but I don't really know what "across threads" means.

1) Profile each of your threads individually with a canonical test case (1920x1080x24p, or some standard config file) I assume there are three main threads or tasks: Capture, Display, and Writing. Get a rough CPU usage for each. Display can just read the same buffer, 24 times a second. Same with writing.

If the CPU usage is reasonable (say, 3% for Capture, 17% for Display, and 24% for Writing), then there is some interaction problem.

2) Try thread interactions, doing just Capture and Display.
Does that work with the same CPU usage as Capture only PLUS Display only?

3) Try Capture + Writing.

4) Try Display + Writing.

5) Look at how the threads get fed. Are they properly sleeping? If they are waiting for a signal, can you test just by polling and sleeping 1-10 ms?

6) Is one thread running at too high a priority?
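
The comparison behind steps 1 to 4 can be written down as a small check, sketched here under the assumption that CPU usage is measured in percent for each task alone and for each pair running together. The function names and the tolerance are invented for illustration:

```c
#include <assert.h>

/* CPU percentage points that a combined run costs beyond the sum of
 * the two tasks measured on their own. Hypothetical helper. */
double interaction_overhead(double alone_a, double alone_b, double together)
{
    assert(alone_a >= 0.0 && alone_b >= 0.0 && together >= 0.0);
    return together - (alone_a + alone_b);
}

/* Returns 1 when the overhead exceeds a tolerance picked by the
 * caller to cover measurement noise, 0 otherwise. */
int looks_like_interaction_problem(double alone_a, double alone_b,
                                   double together, double tolerance)
{
    return interaction_overhead(alone_a, alone_b, together) > tolerance;
}
```

With the example numbers above, Capture (3%) plus Display (17%) should land near 20% when run together. A measured 45% would leave 25 points unexplained and point at the way the two threads hand work to each other, rather than at either task alone.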

If you are using DirectX for the interface to the GPU, you will have to rewrite that for Linux.