Graphics Programming...

Blazestorm

Most posts I see on here are about SQL, databases, or web pages, which I know nothing about.

Anyone else interested in graphics programming or game development?

I've been working on Tile-based Deferred Rendering over the weekend, and I'm glad to have it working for the most part (I need it finished for project credit). This is the same technique DICE used for Battlefield 3. I like to browse random whitepapers and talks by companies; most of the time I have no idea what they're talking about. Same for this, until I found an example article and program by Andrew Lauritzen. I knew nothing about how compute shaders worked or even what the algorithm was doing.

But seeing his code and stepping through it, I was able to implement my own version of it. It's pretty clever, in my opinion. Normally in a pixel shader, you write code to handle a single pixel and have no knowledge of what's happening in the other threads running that shader. You have no shared memory you can write to, just read from. But with compute shaders, your shader is written with the knowledge that you're working with a "thread group" with x, y, z dimensions you specify (z is usually just 1). So instead of running a full-screen pixel shader, you split the screen up into tiles and have each tile be executed by a compute shader thread group. In the end you're still doing per-pixel lighting, but your shader now has shared memory, a lot more information about neighboring threads, and so on. It just gives you the power to do a lot more. The compute shader isn't even part of the main DirectX pipeline, so it's kind of clever to just use it like a pixel shader.
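On the C++ side, the tiling mostly falls out of how you dispatch the thread groups. Here's a minimal sketch, assuming a 16x16 tile size that matches the shader's [numthreads] declaration; the shader, UAV, and light buffer are created and bound elsewhere, and the names are made up:

Code:
#include <d3d11.h>

// One thread group per screen tile; each group runs TILE_SIZE x TILE_SIZE threads,
// i.e. one thread per pixel of the tile.
void DispatchTiledLighting( ID3D11DeviceContext* ctx,
                            ID3D11ComputeShader* lightingCS,
                            ID3D11UnorderedAccessView* outputUAV,
                            UINT screenWidth, UINT screenHeight )
{
    const UINT TILE_SIZE = 16;

    // Round up so partial tiles along the right/bottom edges are still covered.
    UINT groupsX = ( screenWidth  + TILE_SIZE - 1 ) / TILE_SIZE;
    UINT groupsY = ( screenHeight + TILE_SIZE - 1 ) / TILE_SIZE;

    ctx->CSSetShader( lightingCS, nullptr, 0 );
    ctx->CSSetUnorderedAccessViews( 0, 1, &outputUAV, nullptr );

    ctx->Dispatch( groupsX, groupsY, 1 );
}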

Here are a couple of screenshots from a few hours ago, and yeah, my scene is pretty lame. But I'm lazy and just used procedurally generated meshes and lots of random numbers. This scene has 2048 point lights and still runs well above 60fps on my machine at this resolution; if I go fullscreen it sits at about 60fps when you're near the center (where most of the lights are). The colors of the tiles in the second screenshot are kind of like a "heatmap" of how many lights are affecting those tiles. Red is 256 or more lights, blue is 0 lights.


 
I wrote a tiny software renderer a while ago; its tiniest incarnation compiled to about 3KB but could only draw projected points. Later it blew up a couple more kilobytes (oh god no) and ended up significantly more flexible and structured internally; for example, there was some semblance of an index buffer, which made drawing wireframes easy. I never got around to actually adding polygon filling or texturing. The only Windows graphics-related call (past setting up a few things) is BitBlt(), nothing else. Drawing 2D stuff like a picture was just a straight memcpy() to the backbuffer (provided it was a bitmap).

It was also pretty fast, since after some time I eventually vectorized the entire thing, which wasn't much of a feat since it was so tiny. I remember trying to write a faster sincos() call and eventually coming up with what I thought was a decent implementation... until I profiled it and it was less accurate and not as fast as the native fsincos x87 instruction. My matrix / vector stuff was pretty fast though, and the innermost 3D transform was vectorized as well, so it ate 4 vertices at once instead of just one. I even had stuff like vectorized alpha-blending routines for drawing pictures.

sw.jpg

sw2.png


I wanted to try my hand at DX but never got far. I've taken a couple of stabs at it but never went much of anywhere.
 
Nice, we had to write a software rasterizer for one class, but it was just to learn how graphics worked. They were sllllooowww... most chugged along at 20-30 FPS with a few hundred triangles. But I eventually did a full triangle rasterizer, depth buffer, perspective-correct texturing, culling/clipping, etc. and got it to usable framerates. It was a good learning experience to do it in software first, nothing hidden behind APIs (besides sending the framebuffer to the video card).

The images in my original post were from my own engine; my math library is just plain x87 FPU. I could vectorize it, but I kinda like how it's set up, and forcing things into SSE registers takes some of the flexibility out. Because we did all the math in that class, I know how everything works with it... otherwise I'd probably just use XNAMath, which is just typedefs and wrapper functions for SSE intrinsics. I think it's cross-platform between Xbox 360 and PC (it handles endianness and such).
 
I used to do a lot of programming auxiliary to graphics programming back in my days at MN/DOT, but we used OpenSceneGraph to do all the actual rendering.
 

Pixel fill rate was probably my biggest enemy, or at least close to it, so there was some trickery there to keep things quick. For instance, plotting a single pixel was simple: the backbuffer was literally just an array of 32-bit pixels. If you move 0x00FFFFFF to the first entry, you get a white pixel in the corner. Dead simple, that's all there was to it... but...

You had to take some precautions, like doing a bounds check to make sure you don't write outside the array (and likely crash). I had a "safe" pixel-drawing function which did that for you and implemented some other stuff like alpha blending. If you were sure you knew what you were doing, you could just request a pointer to the backbuffer where you needed it and implement whatever you need yourself. For example, instead of calling the safe function thousands of times to draw a 2D picture, you could clip the picture once where it intersects the edge of the buffer, so you'd only compare against the edges of the picture up front instead of doing it per pixel.
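Roughly what that pattern looks like, as a sketch with made-up names: a bounds-checked plot for one-off pixels, and a blit that clips the rectangle once and then copies whole rows:

Code:
#include <cstdint>
#include <cstring>

struct Backbuffer
{
    uint32_t* pixels;   // 32-bit pixels, width * height entries
    int       width;
    int       height;
};

// "Safe" path: bounds check on every call, so you can never write outside the array.
inline void PutPixelSafe( Backbuffer& bb, int x, int y, uint32_t color )
{
    if ( x < 0 || y < 0 || x >= bb.width || y >= bb.height )
        return;
    bb.pixels[ y * bb.width + x ] = color;
}

// Fast path: clip the picture against the buffer edges once, then copy whole rows
// with no per-pixel comparisons.
inline void BlitRect( Backbuffer& bb, const uint32_t* src, int srcW, int srcH,
                      int dstX, int dstY )
{
    int x0 = dstX < 0 ? -dstX : 0;
    int y0 = dstY < 0 ? -dstY : 0;
    int x1 = ( dstX + srcW > bb.width )  ? bb.width  - dstX : srcW;
    int y1 = ( dstY + srcH > bb.height ) ? bb.height - dstY : srcH;
    if ( x0 >= x1 || y0 >= y1 )
        return;   // entirely off-screen

    for ( int y = y0; y < y1; ++y )
    {
        const uint32_t* s = src + y * srcW + x0;
        uint32_t*       d = bb.pixels + ( dstY + y ) * bb.width + ( dstX + x0 );
        std::memcpy( d, s, ( x1 - x0 ) * sizeof( uint32_t ) );
    }
}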

The actual geometry transform was very simple and very fast. There was no comparing or anything, just data ready to get brute-force plowed through. The engine was very, very simple though: no depth buffer or anything, and it didn't clip geometry, so strange things would happen when points went behind the camera.

As for the math library, changing from x87 to SSE was basically transparent. I didn't have to change anything in the end to accommodate vectorizing it. My matrix, for example, just did something like this:

Code:
#include <xmmintrin.h>   // __m128, SSE intrinsics

typedef float f32;

struct Matrix
{
	//functions and shit

	__declspec(align(16)) union
	{
		struct
		{
			f32	_11, _12, _13, _14,
				_21, _22, _23, _24,
				_31, _32, _33, _34,
				_41, _42, _43, _44;
		};

		f32 m[4][4];      // the same 16 floats as a row-major 4x4 array
		__m128 m128[4];   // ...or as one SSE register per row
	};
};

The compiler handles alignment for you, and if you use SSE intrinsics, everything "just works." The only downside was that intrinsic support was poor early on, but it seems much better now, especially in the latest dev preview of MSVC. It'll generally do the right thing (and do it well) behind the scenes without you having to change anything. XNA Math is a good idea too; it's probably very fast.
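As a tiny usage sketch (a hypothetical helper, not from the actual library): because each row of the union aliases an __m128, you can operate on a whole row per intrinsic:

Code:
// Add two matrices four floats at a time using the m128 rows of the union above.
static inline Matrix Add( const Matrix& a, const Matrix& b )
{
    Matrix r;
    for ( int i = 0; i < 4; ++i )
        r.m128[i] = _mm_add_ps( a.m128[i], b.m128[i] );   // one SSE add per row
    return r;
}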


When I'm done with class in a couple of weeks and have some free time, I might take a stab at DX11 again. I want to try my hand at a simple 2D tile engine, I think. Shaders are very :eek: whoa :eek: though, coming from fixed function.
 
I never had a chance to use fixed function; I started with DX9c vertex/pixel shaders. But jumping to DX11 was still a huge change: vertex, pixel, geometry, domain, hull, compute... six shader stages you can use now.

There are issues with mixing __m128 and f32 in a union like that though, at least according to one of my instructors, who's on the DirectX / Xbox team. That's why XNAMath is just typedefs of __m128 and __m128 array[4]. But you also waste space when you want float3s and float2s. Alignment is only handled on the stack; you still need _aligned_malloc and _aligned_free if you do heap allocation, at least that's how I remember it.
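For the heap side of that, a quick sketch using the MSVC CRT calls (reusing the Matrix struct from the post above as the element type; the count is just a placeholder):

Code:
#include <cstddef>
#include <malloc.h>   // _aligned_malloc / _aligned_free (MSVC CRT)

void Example()
{
    // __declspec(align(16)) covers stack/static storage, but plain malloc/new won't
    // guarantee 16-byte alignment, so heap allocations have to ask for it explicitly.
    const size_t count = 256;   // however many you need
    Matrix* matrices = static_cast<Matrix*>( _aligned_malloc( count * sizeof( Matrix ), 16 ) );

    // ... safe to use aligned SSE loads/stores on matrices[0 .. count-1] ...

    _aligned_free( matrices );
}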

Also according to him, the best size for a vertex is one that matches the memory bus of the GPU you're working on. The GTX 680 is 256-bit, the 5870 is 256-bit, the 7970 is 384-bit (48 bytes), so 32 bytes per vertex (without going over your bus width) is a nice size... or 8 floats. Lots of data can be packed into f16s / halves, so that gives you 16 pieces of data. Positions you probably want full precision for, but normals, texture coordinates, etc. can be packed down. It makes sense to match your memory bus though: one cache fetch per vertex instead of multiple. But then again, the vertex shader isn't usually the bottleneck.
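For illustration, a 32-byte vertex along those lines might look something like this (the exact fields and packing are hypothetical):

Code:
#include <cstdint>

// Hypothetical 32-byte vertex: full-precision position, everything else packed down.
struct PackedVertex
{
    float    px, py, pz;        // 12 bytes - position, full float precision
    uint32_t normal;            //  4 bytes - normal packed into 10:10:10:2
    uint16_t u, v;              //  4 bytes - texture coords as half floats
    uint16_t tx, ty, tz, tw;    //  8 bytes - tangent as half floats
    uint32_t color;             //  4 bytes - RGBA8 vertex color
};

static_assert( sizeof( PackedVertex ) == 32, "vertex should hit the 32-byte target" );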

Just rambling. You only need vertex/pixel shaders for basic drawing, and some stages in DX are handled for you: you can set states but you can't program them, so some parts are still fixed-function. For the longest time I couldn't understand what shaders were doing. People tried to explain it to me but it just went over my head, all using overly complicated explanations. But really, the vertex shader is just called for every vertex when you call a draw-primitive. If you're doing standard 3D rendering, you should output to the pixel shader, at minimum, the position of the vertex in homogeneous clip space (before the w-divide). Then, based on the primitive topology you set (triangle list, lines, etc.), it will start rasterizing the primitive and calling your pixel shader for every pixel. The output from your vertex shader should match the input of the pixel shader; interpolation of each component between the shaders is handled for you. The output from the pixel shader should be a pixel color.

Here's a super simple fullscreen vertex and pixel shader. I draw a 2D quad that covers x/y in NDC space (-1.0 to 1.0), so my vertex shader just passes the data straight through. Then in the pixel shader I sample a texture with the interpolated texture coordinates and return the color read from the texture.

You would use this kind of shader if you rendered your scene to a render target and want to do some fullscreen effects, so "gDiffuseTexture" would actually just be your normal back buffer that you're reading from.

There's a bit more to using Direct3D, but shaders themselves are usually pretty simple.

Code:
//--------------------------------------------------------------------------------------
// Resources and input/output structs (assumed declarations, not in the original post)
//--------------------------------------------------------------------------------------
Texture2D    gDiffuseTexture : register( t0 );
SamplerState gLinearSampler  : register( s0 );

struct mesh_vs_input
{
  float3 Pos : POSITION;
  float2 Tex : TEXCOORD0;
};

struct fullscreen_ps_input
{
  float4 Pos : SV_Position;
  float2 Tex : TEXCOORD0;
};

//--------------------------------------------------------------------------------------
// Fullscreen NDC Vertex Shader
//--------------------------------------------------------------------------------------
fullscreen_ps_input fullscreen_vs( mesh_vs_input input )
{
  fullscreen_ps_input output = (fullscreen_ps_input)0;

  output.Pos = float4( input.Pos, 1.0f );
  output.Tex = input.Tex;

  return output;
}

//--------------------------------------------------------------------------------------
// Fullscreen NDC Pixel Shader
//--------------------------------------------------------------------------------------
float4 fullscreen_ps( fullscreen_ps_input input ) : SV_Target
{
  return gDiffuseTexture.Sample( gLinearSampler, input.Tex );
}
 
Yeah, you need to use aligned malloc for the heap. Because I used SSE for the transform, my "vertex buffer" needed to be aligned. I also just dummied up extra vertices to keep the count divisible by 4.
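As a sketch of how a 4-wide transform like that can look (SoA layout and names are my guess, not the actual code): with the buffer aligned and padded to a multiple of 4, each batch of four vertices goes through the matrix together:

Code:
#include <xmmintrin.h>

// Four vertices in structure-of-arrays form: comp[0] holds four x's, comp[1] four y's,
// comp[2] four z's.
struct VertexBatch4
{
    __m128 comp[3];
};

// Transform four row vectors [x y z 1] by a row-major 4x4 matrix at once:
// out.x = x*m11 + y*m21 + z*m31 + m41, and likewise for y and z.
static inline VertexBatch4 TransformBatch4( const VertexBatch4& v, const float m[4][4] )
{
    VertexBatch4 out;
    for ( int c = 0; c < 3; ++c )   // c = output component (x, y, z)
    {
        __m128 r = _mm_mul_ps( v.comp[0], _mm_set1_ps( m[0][c] ) );
        r = _mm_add_ps( r, _mm_mul_ps( v.comp[1], _mm_set1_ps( m[1][c] ) ) );
        r = _mm_add_ps( r, _mm_mul_ps( v.comp[2], _mm_set1_ps( m[2][c] ) ) );
        r = _mm_add_ps( r, _mm_set1_ps( m[3][c] ) );   // w is implicitly 1
        out.comp[c] = r;
    }
    return out;
}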

I've also heard of potential problems with using a union like that and mixing things, but it seems to work fine. Same thing with directly accessing the __m128 innards: you're apparently not supposed to, but I checked and it generated pretty much exactly what you'd expect. I think the main worry is that you can potentially damage the compiler's ability to optimize it?


The only thing I don't like about DX11 is how verbose it is. There seems to be an awful lot of fluff (hope you love filling out structs!).
 
The method described to me for best performance with vectorized stuff is to use plain floats (x87) for storage and SSE for computation. That avoids those union issues and avoids relying on the compiler to generate what you expect. I've had situations where SSE was slower than x87 because it spent more time moving data between memory and the XMM registers than it did doing calculations. The best usage of SSE is when everything stays in the registers, which is why all intrinsics return "copies": usually the result is just sitting in a register, so no real copy is made. I haven't profiled / tested enough though. Some things vectorize well, others are worse than x87.
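A rough illustration of that idea (hypothetical names): keep vectors as plain floats in memory, and only go through the XMM registers when there's enough math to amortize the loads and stores:

Code:
#include <xmmintrin.h>

struct Vec3 { float x, y, z; };   // plain float storage, no alignment requirements

// a*b + c with the intermediate values kept in registers. Note that a single small
// operation like this can still lose to scalar code because of the set/store overhead
// at either end; the win comes from chaining more work while values stay in XMM.
inline Vec3 MulAdd( const Vec3& a, const Vec3& b, const Vec3& c )
{
    __m128 va = _mm_set_ps( 0.0f, a.z, a.y, a.x );
    __m128 vb = _mm_set_ps( 0.0f, b.z, b.y, b.x );
    __m128 vc = _mm_set_ps( 0.0f, c.z, c.y, c.x );
    __m128 r  = _mm_add_ps( _mm_mul_ps( va, vb ), vc );

    float out[4];
    _mm_storeu_ps( out, r );
    return Vec3{ out[0], out[1], out[2] };
}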

Yeah, it's a bit fluffy, but I actually prefer it over D3D9, which was a huge mess of fixed function mixed with shaders. D3D11 ensures both you and D3D know exactly how you're using resources, and with debugging turned on it will spew out error messages when you screw it up (like forgetting to unbind something). I've definitely had fewer bugs developing in D3D11 than D3D9. For example, you can create a 32-bit depth buffer and bind it as a "depthstencil", and then also bind it as a "shaderresource". That means in one pass I can write to it as a standard depth buffer, and then read from it in another pass. You couldn't do this easily in D3D9.
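For reference, here's roughly what that setup looks like in D3D11 (a sketch, error handling omitted): the trick is creating the texture with a typeless format so the depth view and the shader view can each reinterpret it:

Code:
#include <d3d11.h>

void CreateReadableDepthBuffer( ID3D11Device* device, UINT width, UINT height,
                                ID3D11Texture2D** outTex,
                                ID3D11DepthStencilView** outDSV,
                                ID3D11ShaderResourceView** outSRV )
{
    // Typeless 32-bit texture, bindable both as a depth-stencil and as a shader resource.
    D3D11_TEXTURE2D_DESC td = {};
    td.Width            = width;
    td.Height           = height;
    td.MipLevels        = 1;
    td.ArraySize        = 1;
    td.Format           = DXGI_FORMAT_R32_TYPELESS;
    td.SampleDesc.Count = 1;
    td.Usage            = D3D11_USAGE_DEFAULT;
    td.BindFlags        = D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE;
    device->CreateTexture2D( &td, nullptr, outTex );

    // Depth view for the pass that writes depth.
    D3D11_DEPTH_STENCIL_VIEW_DESC dsvd = {};
    dsvd.Format        = DXGI_FORMAT_D32_FLOAT;
    dsvd.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
    device->CreateDepthStencilView( *outTex, &dsvd, outDSV );

    // Shader resource view for the pass that reads it back.
    D3D11_SHADER_RESOURCE_VIEW_DESC srvd = {};
    srvd.Format              = DXGI_FORMAT_R32_FLOAT;
    srvd.ViewDimension       = D3D11_SRV_DIMENSION_TEXTURE2D;
    srvd.Texture2D.MipLevels = 1;
    device->CreateShaderResourceView( *outTex, &srvd, outSRV );
}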

Because of how they set it up though, it just comes naturally.
 
Fell off the front page :(. I should change the title to something broader. I'm not just interested in graphics, but real-time stuff in general, the most common application being game development.

Couple more screenshots from when I finally turned in the project.




Also starting a Source mod with a friend to finish over the summer, inspired by one of my favorite HL mods for LAN parties... Rocket Crowbar. The crowbar that shot rockets was actually a small part of it; the ridiculous weapons and the ability to tweak attributes for your character made it really fun. You could put all your points into jump and just start leaping hundreds of feet into the air, or all of your points into speed and be across the map in half a second. There were weapons like shrink rays, where you could squish enemies like bugs or trap them under parts of the level so they died when they unshrunk, or shotguns that shot exploding scientists. Rocket Crowbar Source never really took off though. We got our source control set up and got our builds working. Then we started playing with the code and this happened... wehavenoideawhatweredoing.
 