DV Info Net

Les Dit August 3rd, 2004 12:42 AM

how fast: Packing < 16 bit pixels into words?
 
I have a programming question that maybe some of you can answer: How fast can we expect a high-end P4 system with dual-channel DDR to bit-shift and repack 12-bit data into a few words?
Is this something that MMX or SSE instructions can help with?
I suppose this would happen in cache if it's done a scan line at a time.

So, how many megapixels per second, for lack of a better metric?
It doesn't have to be real time; I just want to know how fast it is. I want to move images across a gigE network, and I'm weighing packing first against just transferring the unpacked data.

Thanks
-Les

Rob Lohman August 3rd, 2004 05:21 AM

If I understand you correctly this can be done very fast, especially
since it can be done in place in your buffer (i.e., no need to allocate
a second buffer). I can't tell you exactly how fast, but it is
probably fast enough to do in realtime over a gigE network,
especially if it were coded in assembly.

12 bits work out nicely as well, since you can store each pixel as
8 + 4 bits instead of 16 bits. So you will need to process two 16 bit
pixels at a time and pack them into 24 bits (3 bytes). That is not
hard to do in C or assembly as long as you know how many
pixels there are in the memory block.

I'm not sure if MMX/SSE could help here, I have no experience
with those. But if someone checks the spec on the instructions
available it shouldn't be too hard, basically in assembly it boils
down to something like this (the following code assumes the
lower 12 bits are to be used in Intel format and is not tested):


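MOV ESI, [source buffer]
MOV EDI, ESI                ; pack in place: destination = source
MOV ECX, [number of 16bit pixels] ; must be even
CLD                         ; string instructions move forward

Start:
; load 2 16 bit pixels, 32 bits
LODSD                       ; EAX = pixel1:pixel0
MOV BX,AX                   ; BX = pixel0
SHR EAX, 16                 ; AX = pixel1

; re-pack them as 24 bits
SHL BX, 4                   ; pixel0's 12 bits now sit at the top of BX
SHLD AX,BX,4                ; AX = (pixel1 << 4) | top 4 bits of pixel0
STOSW                       ; store 2 bytes

SHR BX, 4                   ; BX = low 12 bits of pixel0 again
MOV EAX,EBX ;probably faster than AL,BL
STOSB                       ; store low 8 bits of pixel0

DEC ECX                     ; LOOP decrements ECX once more,
LOOP Start                  ; so ECX drops by 2 per pixel pair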


Something like this. It can probably be optimized further, but this
should basically do what you want pretty fast (it has one 386
instruction in it, SHLD, to speed up some of the packing). This routine
overwrites the buffer it is given, transforming every 4 bytes into
3 bytes. So for every n pixels (which must be a multiple of 2!)
you will get a buffer back of (n * 2) - (n / 2) bytes, i.e. 1.5 bytes
per pixel. It might swap some pixels, but that is rarely a problem if
you have enough time on the other end to de-pack the data, which can
probably be done in realtime as well.
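For anyone who would rather try this from C, here is a minimal sketch of the same packing scheme plus the de-packing step for the receiving end. It assumes each pixel sits in the low 12 bits of a 16-bit word and that the pixel count is even. The names pack12_inplace and unpack12 are made up for this example, and the byte order is one reading of the assembly above on a little-endian machine (pixel 1 ends up ahead of pixel 0 inside each 3-byte group), so treat it as an illustration rather than a tested equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n pixels (12 significant bits in each 16-bit word) in place.
 * Every pair of pixels (4 bytes) becomes 3 bytes at the front of the
 * same buffer. Returns the number of packed bytes (n * 3 / 2). */
size_t pack12_inplace(uint16_t *pixels, size_t n)
{
    assert(n % 2 == 0);                 /* pixels are handled in pairs */
    uint8_t *out = (uint8_t *)pixels;   /* write pointer trails the reads */

    for (size_t i = 0; i < n; i += 2) {
        /* Read both pixels before any byte of this pair is overwritten. */
        uint16_t p0 = pixels[i] & 0x0FFF;
        uint16_t p1 = pixels[i + 1] & 0x0FFF;

        *out++ = (uint8_t)(((p1 & 0x0F) << 4) | (p0 >> 8)); /* p1 low 4, p0 high 4 */
        *out++ = (uint8_t)(p1 >> 4);                        /* p1 high 8 */
        *out++ = (uint8_t)(p0 & 0xFF);                      /* p0 low 8 */
    }
    return (size_t)(out - (uint8_t *)pixels);
}

/* Reverse of pack12_inplace: expand n pixels from packed bytes into
 * 16-bit words. Uses a separate output buffer, since expanding in place
 * would need to run back to front. */
void unpack12(const uint8_t *packed, uint16_t *pixels, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2, packed += 3) {
        uint8_t b0 = packed[0], b1 = packed[1], b2 = packed[2];
        pixels[i]     = (uint16_t)(((b0 & 0x0F) << 8) | b2);
        pixels[i + 1] = (uint16_t)((b1 << 4) | (b0 >> 4));
    }
}
```

The return value of pack12_inplace is the number of bytes to send; the receiver needs to know the pixel count n to call unpack12.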

I've looked around the web a bit regarding MMX and SSE and I don't
think they will help for this. MMX seems to largely operate on
words/double words/quad words which is not what we are
trying to do. SSE seems to be mostly for floating point work
which is also not what you are trying to do.

One last thing. In this case it will introduce another loop over
the data. Whenever possible you should try to integrate packing
into either the writing routine or the reading one so you don't
waste time going over the data again.

Rob Belics August 3rd, 2004 07:59 AM

Hey! A fellow asm coder! Have I seen you on hutch's board or win32asm before?

Rob Lohman August 3rd, 2004 08:28 AM

Hmmm, all Robs seem to be ASM coders thus far, hehe. Rob Scott
at least codes as well, and I think he can do ASM too.

Nope, never been to any of those boards. It has been a pretty
long time since I've done anything in ASM (at least 5+ years),
but did a lot of low-level stuff in the DOS days and whatnot.

Never really forgot it although I never ventured into the whole
386/protected mode/MMX/SSE stuff etc.

Gives you a great insight into computers / operating systems and
how things work, don't you think?

My programming experience went like this:

(Quick)basic -> assembly -> pascal/delphi -> C(++) -> Visual Basic -> C#

Rob Belics August 3rd, 2004 08:46 AM

I used to design hardware so assembly was part of the job. Begrudgingly learned C, then C++. Looks like I'll be starting a server business so need to get into C#, Java, etc. But I'd rather do it all in assembly.

But programming has taken a back seat for a few years now that I've gotten back into film. So I hesitate to answer MMX/SSE questions since it would only be from a foggy memory.

My foggy memory says SSE can do this 12-bit work on chip but, as I said, I don't recall.

Rob Lohman August 3rd, 2004 08:50 AM

It's like the other way around for me. I'm doing programming as
a job and hopefully will be moving to some film related stuff in
the near future. Too bad I ain't exactly on the right side of the
globe for that.

Les Dit August 3rd, 2004 11:51 AM

Thanks Rob,

I think it would be interesting to see what the ASM output of the Visual C compiler would look like to do the same thing!
A test for their optimizer?
I hear the Intel compiler is the best, but I don't think it's popular.

-Les

Rob Belics August 3rd, 2004 11:57 AM

The output of compilers can be really bizarre to look at, especially Microsoft's. Names get mangled and it can be hard to follow the logic flow. Though efficient, it is just hard to follow sometimes.

The optimizers are very good. But in critical timing, it can still be best to hand optimize it.

I just happen to think that, in the programming world, you get arguments about high-level languages vs assembly all the time. Just like the film/digital arguments.

Rob Lohman August 4th, 2004 02:05 PM

If you build the function correctly in C I think Microsoft's and Intel's
compilers will probably come close to what I've written. They
might even include some more tricks. I've been reading an
assembly optimization guide for Intel processors the other day.
Interesting stuff regarding cache misses etc. Such stuff will
take quite a lot of time if you want to do it correctly (i.e. do it in
C first, time that, do it in assembly, time again and see what can
be further improved, and so on).
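As a rough illustration of the "do it in C first, time that" step, the sketch below times a plain C packing loop with the standard clock() function and reports megapixels per second. Everything here is an assumption made for the example: the function names, the out-of-place pack12 helper, and the choice of buffer size and pass count. The result depends heavily on compiler flags and on whether the buffer fits in cache, so it is a starting point for comparison, not a benchmark of any particular machine.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Pack pairs of 12-bit pixels (held in 16-bit words) into 3 bytes per pair. */
static void pack12(uint8_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        uint16_t p0 = src[i] & 0x0FFF;
        uint16_t p1 = src[i + 1] & 0x0FFF;
        *dst++ = (uint8_t)(((p1 & 0x0F) << 4) | (p0 >> 8));
        *dst++ = (uint8_t)(p1 >> 4);
        *dst++ = (uint8_t)(p0 & 0xFF);
    }
}

/* Time `passes` packing runs over a buffer of n_pixels pixels.
 * Returns megapixels per second, or -1.0 if the buffers could not be
 * allocated, n_pixels is too small, or the timer saw no elapsed time. */
double pack12_mpixels_per_sec(size_t n_pixels, int passes)
{
    if (n_pixels < 2 || passes < 1)
        return -1.0;

    uint16_t *src = malloc(n_pixels * sizeof *src);
    uint8_t *dst = calloc(n_pixels / 2 * 3 + 3, 1);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < n_pixels; i++)
        src[i] = (uint16_t)(i & 0x0FFF);        /* arbitrary test pattern */

    volatile uint8_t sink = 0;                  /* keeps the work observable */
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++) {
        src[0] = (uint16_t)(p & 0x0FFF);        /* vary input between passes */
        pack12(dst, src, n_pixels);
        sink = (uint8_t)(sink ^ dst[p % 3]);
    }
    clock_t t1 = clock();

    free(src);
    free(dst);

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return -1.0;
    return (double)n_pixels * passes / secs / 1e6;
}
```

A call such as pack12_mpixels_per_sec(1u << 16, 200) keeps the working set small enough to stay in cache on most machines; a much larger n_pixels shows the memory-bound rate instead. The same harness can then be pointed at an assembly version to compare.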

Les Dit August 4th, 2004 02:11 PM

Thanks again for the info guys.

On a related note: I just did some network tests between 2 identical P4 3 GHz machines with a tool called iperf.
I'm getting 90 megabytes a second between the two !!

This has no file system overhead, it's just raw data passing between the two, but it looks very good!

-Les
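To put that figure against the packing question, here is a small back-of-the-envelope sketch in C. It assumes "megabyte" means 10^6 bytes and that 90 MB/s is sustained; the function names are invented for the example. The second function assumes the simplest case where packing and sending happen one after the other rather than overlapped, which is the worst case for packing.

```c
#include <stdio.h>

/* Pixels per second a link can carry at a given pixel size in bytes. */
double link_pixels_per_sec(double link_bytes_per_sec, double bytes_per_pixel)
{
    return link_bytes_per_sec / bytes_per_pixel;
}

/* If packing and sending are done back to back (no overlap), packing only
 * pays off when the packer is faster than this many pixels per second.
 * Derivation: 1/pack_rate + packed/link < raw/link. */
double breakeven_pack_rate(double link_bytes_per_sec,
                           double raw_bytes_per_pixel,
                           double packed_bytes_per_pixel)
{
    return link_bytes_per_sec / (raw_bytes_per_pixel - packed_bytes_per_pixel);
}

/* Print the comparison for a measured link speed in bytes per second. */
void print_packing_tradeoff(double link_bytes_per_sec)
{
    printf("unpacked 16-bit: %.1f Mpixel/s\n",
           link_pixels_per_sec(link_bytes_per_sec, 2.0) / 1e6);
    printf("packed 12-bit:   %.1f Mpixel/s\n",
           link_pixels_per_sec(link_bytes_per_sec, 1.5) / 1e6);
    printf("break-even pack rate (no overlap): %.1f Mpixel/s\n",
           breakeven_pack_rate(link_bytes_per_sec, 2.0, 1.5) / 1e6);
}
```

With 90e6 bytes per second this works out to 45 Mpixel/s unpacked, 60 Mpixel/s packed, and a break-even packing rate of 180 Mpixel/s when the two steps are not overlapped. If packing runs concurrently with the transfer, the packer only has to keep up with the 60 Mpixel/s the link can carry.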

Rob Lohman August 4th, 2004 02:25 PM

Is this through the Microsoft drivers and TCP/IP stack? What kind of
network is this exactly?

Les Dit August 4th, 2004 02:34 PM

Microsoft drivers, for the Marvell Yukon chip on the motherboards.
8 port gig E switch.
Nothing fancy!
-Les

Rob Scott August 9th, 2004 03:56 AM

Les, Rob Lohman is correct -- it should be possible to write a very efficient routine in assembler. In the ObscuraCapture app (tm :-) I've been able to pack the 10-bit data from the SI-1300 camera at over 250 MB/sec.
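For comparison with the 12-bit case, 10-bit data packs four pixels into five bytes (40 bits). The sketch below is not the ObscuraCapture code, which is not shown in this thread; it is only a plain C illustration under assumed conventions: pixels in the low 10 bits of 16-bit words, a pixel count that is a multiple of 4, and a little-endian bit order within each 5-byte group. The names pack10 and unpack10 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n 10-bit pixels (in 16-bit words) into dst, 4 pixels per 5 bytes.
 * Returns the number of bytes written (n * 5 / 4). */
size_t pack10(uint8_t *dst, const uint16_t *src, size_t n)
{
    assert(n % 4 == 0);
    uint8_t *out = dst;
    for (size_t i = 0; i < n; i += 4) {
        /* Gather four pixels into one 40-bit value, pixel 0 in the low bits. */
        uint64_t v = (uint64_t)(src[i]     & 0x3FF)
                   | (uint64_t)(src[i + 1] & 0x3FF) << 10
                   | (uint64_t)(src[i + 2] & 0x3FF) << 20
                   | (uint64_t)(src[i + 3] & 0x3FF) << 30;
        for (int k = 0; k < 5; k++)
            *out++ = (uint8_t)(v >> (8 * k));   /* low byte first */
    }
    return (size_t)(out - dst);
}

/* Reverse of pack10: expand n pixels from packed bytes into 16-bit words. */
void unpack10(uint16_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4, src += 5) {
        uint64_t v = 0;
        for (int k = 0; k < 5; k++)
            v |= (uint64_t)src[k] << (8 * k);
        for (int j = 0; j < 4; j++)
            dst[i + j] = (uint16_t)((v >> (10 * j)) & 0x3FF);
    }
}
```

This version writes to a separate buffer for clarity; it saves 3 bytes out of every 8 compared with sending the pixels as 16-bit words.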

Les Dit August 9th, 2004 11:44 AM

Thanks Rob, that's fast enough. I'm looking at options for speeding up my film scanner, it has an 8 megapixel camera on it.
Are you guys using GCC? I was wondering what the interactive debugger is like on that. Most of my code is written by my programmer, but I do mods and add features. I am OK with the MS visual debugger, but MS isn't issuing bug fixes on the C side of that dev system much anymore, so we want to switch to the GCC system. I'm not comfortable with their optimizer and I hear the debugger is command line.

What type of system are you getting 250MB a sec on? Dual DDR with 800 MHz FSB?

Thanks
-Les

Rob Scott August 9th, 2004 12:09 PM

Quote:

Les Dit wrote:
Are you guys using GCC?

I'm using MS VC++. I started out using MinGW (basically GCC for Windows) but had trouble calling DirectX. There is a very nice graphical IDE for MinGW, and it had a decent visual debugger IIRC.

Quote:

What type of system are you getting 250MB a sec on? Dual DDR with 800 MHz FSB?

No, it's just a basic laptop running an AMD Athlon XP-M 2500. I'm not sure about the memory configuration. (And no, I don't have the camera connected to the laptop. I have some code that simulates it.)


All times are GMT -6.

DV Info Net -- Real Names, Real People, Real Info!
1998-2025 The Digital Video Information Network