Homebrew microcomputers & video generation

Hardware, 57 posts, Feb 25, 2013 to Apr 16, 2013

#31 Sun, 3 Mar 2013 - 10:29

I'm not saying dual-ported VRAM is "historically authentic"; I'm saying it's reasonably available, today, for homebrew projects.

was once commonly used / invented /in 1980 / first commercial use /in 1986 / Prior to the development of VRAM, dual-ported memory was quite expensive / Through the 1990s, many graphic subsystems used VRAM

The linked article also goes into some depth about how to use it.

Meanwhile, here's DVI-D video out from an FPGA: http://hackaday.com/2012/08/03/moving-an-fpga-project-from-vga-to-dvi-d/

Perhaps outputting DVI-D is a worthwhile approach, whatever the base platform. Fewer pins, less messing about in the analog domain, and same signals as HDMI, so a simple adapter would let you hook up to a modern TV.

#32 Sun, 3 Mar 2013 - 11:16

http://www.pliner.com/macminix/

Why MacMinix?
In an educational environment, MacMinix is ideal. It's easy to install, it runs on top of the Mac OS, and it starts up very quickly. (Just like MachTen.) You can recompile the OS, quit it, and restart the MacMinix application, very simply, without restarting the whole computer. And if you mess something up, you can easily revert to an old version. This is one advantage MacMinix has over the other versions of MINIX. Plus, it utilizes 68K assembly code, which may be more suitable for an educational environment than PowerPC, or even Intel instruction sets.

There's also an official, active but unfinished ARM port of Minix3.

#33 Mon, 4 Mar 2013 - 12:30

Here is a DIY GPU to drive a classic Mac display, using a cheap 72MHz ARM over USB.

http://spritesmods.com/?art=macsearm&page=5

[ Related thread ]

Here's a roundup of DIY 68k systems:

http://hackaday.com/2012/09/04/homebrew-68k-extravaganza/

/ETA/ from the comment thread:

The student manual for The Art of Electronics provides a series of labs that result in a stand alone computer based upon the 68k series of chips…

For real old-school homebrewing of the 68000, you have to read the original, early 80's “DTACK Grounded” newsletters, written by one of the first hobbyists to try to use it in a simple system, not a heavy Unix or fancy GUI box. Heavy with the hardware and software details. Be prepared to set some time aside! All written by one engineer with very strong opinions, right in the middle of the 80's PC craze.
http://www.easy68k.com/paulrsm/dg/

Also, from one of the linked projects:

Now the one truly remarkable feature the 68K CPU S-100 board has going for it is the fact that back in 1987 a guy named Alan Wilcox wrote a whole book describing a complete S-100 based 68K CPU board. The book is the only one of its kind -- describing the construction of an S-100 CPU board, and goes into considerable detail chapter by chapter building up a completely functional board. The book is an absolute "must read" for anybody building a 68K system. The title is "68000 Microcomputer Systems Designing & Troubleshooting" by Alan D. Wilcox. Prentice-Hall Inc. Publishers, 1987. Copies can be obtained from time to time on eBay and Amazon.

#34 Mon, 4 Mar 2013 - 21:30

MacMinix

For the record, if you were porting Minix to a homebrew 68k machine MacMinix probably would not be the best place to start for the very reason that it *is* encapsulated inside of a MacOS program. There were other 68k versions (Atari ST and Amiga, for instance) that were more "bare metal" and thus would provide more in the way of useful bootloader/initialization/etc. code.

For anything more powerful like an ARM system... not to be cruel to it, but Minix is pretty straight-up a waste of time. Minix is slower than Linux, by design, doesn't exactly have much of a track record when it comes to multi-platform-ity, and suffers from an absolute dearth of device drivers even on its native 386 platform. If you absolutely abhor Linux but still want something UNIX-like there are ARM ports of FreeBSD and NetBSD, they'd probably be your best options. (But peripheral support is sorely lacking at this point for most "multimedia" SoCs, in large part because of "blob issues". But drivers *may* actually happen someday, unlike the prospects for Minix.)

Here is a DIY GPU to drive a classic Mac display, using a cheap 72MHz ARM over USB. http://spritesmods.com/?art=macsearm&page=5

[ Related thread ]

This project (which I'd forgotten about) is indeed interesting, if for no other reason than that it demonstrates the sort of overhead you can expect from "completely emulating" video hardware with a small 70-ish MHz ARM slaved to an external RAM via GPIO pins. You would have to optimize this a *lot* to squeeze color out of it, let alone substantially higher resolutions.

I look forward to your coredumps, myself, Gorgonops. You have a knack for explaining hellishly tricky low-level stuff in a clear and understandable way.

Thank you. Again, a lot of it is just thinking out loud. Occasionally when you do that you'll manage to have someone with a good idea or two overhear it.

I've been mulling over ways to substitute a quick MCU for all that timing hardware, because I badly want DMA video on *my* homebrew computer. The smallest flexible alternative is something like a 6845

Considered an SAA5050? Or possibly more useful, its descendant:

The SAA5050 was later superseded by the SAA5243, a similar teletext video generator chip, not only a character generator but a complete stand alone video generator, controlled through I²C.

The SAA5050 still requires something like a 6845 for video address generation. It's still a semi-interesting chip; it combines a "character generator" (which in most systems was literally nothing more than a ROM full of character glyphs) with the actual pixel generation hardware/shift register, so if you're happy with the text-mode character set it's a way to save a few chips. (Its successor basically has the 6845 built in. Again, if you're happy with a simple text display it'll do but if you're after graphics or want your own character set it's not ideal.)

The outline of my Propeller idea starts with the assumption that the Propeller itself is a *little* too slow to just act like a memory device itself, IE, present itself so a "master CPU" can just shove bytes into its internal HUB RAM. Projects like the Prop6502 deal with this by actually making the "master" CPU a slave from a clocking perspective, but from what I've seen of that approach it just doesn't seem to scale too well. My idea is to instead have the "master" CPU talk to plain old SRAM and have the Prop regularly copy segments into its internal buffer. For "text/tile" systems like the Commodore PET or TRS-80 doing that would actually cut the memory bandwidth required for video a *lot*. For instance, the TRS-80's 64x16 video system only uses 1K of RAM, but each character is 12 scanlines tall, so each 64 bytes on each character line is scanned 12 times, making for a total of 12K's worth of memory reads for each frame. At 60 FPS NTSC that's 720KB/sec. If you only had to read the 1K once per frame then you've cut the speed requirement to 60KB/sec. That's so low that you could easily allow the Prop to only steal a RAM cycle every ten CPU cycles or so and operate somewhat asynchronously. In *theory*.
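A quick back-of-the-envelope check of that arithmetic, as a Python sketch. The figures (64x16 characters, 12 scanlines per character row, 60 frames per second) are the ones from the paragraph above; the function names are made up for illustration.

```python
# Rough video bandwidth for a TRS-80-style 64x16 text display.
COLS, ROWS = 64, 16          # characters per line, character rows (1K of video RAM)
SCANLINES_PER_ROW = 12       # each character row is scanned 12 times
FPS = 60                     # nominal NTSC frame rate

def direct_scan_bytes_per_sec():
    """Hardwired video: every character row is re-read once per scanline."""
    return COLS * ROWS * SCANLINES_PER_ROW * FPS

def cached_bytes_per_sec():
    """Caching coprocessor: every screen byte is read once per frame."""
    return COLS * ROWS * FPS

if __name__ == "__main__":
    print(direct_scan_bytes_per_sec() // 1024, "KB/s scanning RAM directly")
    print(cached_bytes_per_sec() // 1024, "KB/s copying once per frame")
```

The 12x difference is simply the number of scanlines per character row, so a display with taller characters would gain even more from the copy-once approach.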

Obviously the limitation is that if you're doing full-screen graphics instead of characters/tiling pulling a whole framebuffer puts the bandwidth requirements back to even with "hardwired". My thought there is for graphics applications the Prop could be set up to accept *commands* written to some area in RAM and behave basically like a 9918A or similar. (Code already exists to emulate sprite engines like the Commodore VIC-II.) Unused HUB RAM on the Prop could be used as a tile cache and sprite buffer, meaning after an initial load the screen could be manipulated very quickly with minimal bandwidth requirements. (The design could include the capability for the Prop to write data back to RAM, in order to update things like sprite position, collisions, mouse cursors?, etc. That would also, again in theory, make it possible to do things like let a PS/2 keyboard on the Prop emulate a memory-mapped matrix with no additional hardware.)
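To make the "commands written to some area in RAM" idea a bit more concrete, here is a rough Python simulation of a command mailbox. Everything specific in it (the opcode numbers, the mailbox address and layout, the sprite table) is invented for illustration; it is not the 9918A's register set or any existing Propeller driver.

```python
# Hypothetical command mailbox in shared RAM: the host CPU writes a command,
# the video coprocessor polls the mailbox, acts on it, and writes results back.
CMD_NONE, CMD_MOVE_SPRITE = 0, 1      # invented opcodes
MAILBOX = 0x3F00                      # invented address of the command area

ram = bytearray(0x4000)               # shared SRAM (16K for this example)
sprites = {3: [100, 50]}              # coprocessor-side sprite table: id -> [x, y]

def signed8(b):
    """Interpret a byte as a two's complement signed value."""
    return b - 256 if b > 127 else b

def host_move_sprite(sprite_id, dx, dy):
    """Host side: fill in the arguments first, write the opcode last."""
    ram[MAILBOX + 1] = sprite_id
    ram[MAILBOX + 2] = dx & 0xFF
    ram[MAILBOX + 3] = dy & 0xFF
    ram[MAILBOX] = CMD_MOVE_SPRITE

def coprocessor_poll():
    """Coprocessor side: run once per frame, e.g. during vertical blanking."""
    if ram[MAILBOX] == CMD_MOVE_SPRITE:
        sid = ram[MAILBOX + 1]
        sprites[sid][0] += signed8(ram[MAILBOX + 2])
        sprites[sid][1] += signed8(ram[MAILBOX + 3])
        # Write the new position back so the host can read it from RAM.
        ram[MAILBOX + 4] = sprites[sid][0] & 0xFF
        ram[MAILBOX + 5] = sprites[sid][1] & 0xFF
        ram[MAILBOX] = CMD_NONE       # mark the mailbox as free again

host_move_sprite(3, -2, 0)            # move sprite 3 two pixels to the left
coprocessor_poll()
```

Writing the opcode last means the coprocessor never sees a half-filled command, and clearing it afterwards doubles as a "done" flag the host can poll.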

Anyway. Again, blawblaw.

#35 Tue, 5 Mar 2013 - 19:33

For the record, if you were porting Minix to a homebrew 68k machine MacMinix probably would not be the best place to start for the very reason that it *is* encapsulated inside of a MacOS program.

Ah, yeah. Didn't explain my train of thought there. I was thinking (for myself, anyway) that as an educational venture, building and tweaking Minix inside an app would be more appealing than doing so on DIY hardware that I would have to concurrently debug. And possibly useful in terms of understanding what the heck is involved when it comes time to do it in hardware.

not to be cruel to it, but Minix is pretty straight-up a waste of time.

It's mostly building it in conjunction with the Tanenbaum courseware that appeals to me, not setting it up as a rock-solid end-user experience.

My idea is to instead have the "master" CPU talk to plain old SRAM and have the Prop regularly copy segments into its internal buffer.

OK, pardon me if I've got this sideways -- why copy it into the Prop? So you can use the Prop's dedicated video timers/libraries and whatnot?

I'm imagining something where the Prop tickles the SRAM in just the right way to make it clock the data straight out to video, without having to ingest and regurgitate it.

My thought there is for graphics applications the Prop could be set up to accept *commands* written to some area in RAM

Sort of like -- "You there! I need a circle yea big, here and here, and move sprite #3 a smidge to the left. You handle the details, I have stuff to do." ?

#36 Tue, 5 Mar 2013 - 21:51

OK, pardon me if I've got this sideways -- why copy it into the Prop? So you can use the Prop's dedicated video timers/libraries and whatnot?
I'm imagining something where the Prop tickles the SRAM in just the right way to make it clock the data straight out to video, without having to ingest and regurgitate it.

So, my reasoning goes that if you're "tickling the SRAM directly" so it spits out bytes on cue then you're again having to make the memory system a slave to the very rigid timing requirements necessary for the video system. Which... on one hand doesn't seem like it should be a big deal, since that's how those 80's 8 bit computers were designed originally, but on the other it means that you're not really gaining much by using the Prop; you'll basically be using it as if it were something like a 6845.

The "caching" idea is that you can take a more relaxed approach: the Prop needs to grab one or two K inside of every 1/60th of a second window, but it could do it by, say, halting the CPU *once* and doing the transfer in one big chunk at the end of every vertical refresh cycle. IE, instead of having to master the intricate cadence of tossing the memory bus back and forth between CPU and video once every X cycles at precisely the right time you just halt everything for a thousand cycles once you're "offscreen", suck the contents of the RAM locations allocated to video at warp speed, or the closest it can achieve into the Prop, and then let everything go about its merry way again. That does have some implications for timing issues if you were explicitly trying to emulate an existing system, but I don't see it being a huge deal, and the "pauses" would at least be quite predictable.

Anyway, how well it would actually work in practice depends on just how quickly the Prop could suck bytes in from an SRAM. One thought I had would be you'd supplement the Prop with an 8 bit latch and a binary counter like a 74HC590; for a typical 16 bit address bus application before starting a transfer you'd load the high byte (let's call that the page address) into the latch (which sets the top 8 bits of the address bus) and reset the counter to zero. An actual transfer would consist of setting the 8 GPIO pins on the Prop to inputs (or outputs, depending on whether you're reading or writing) and then just *tearing* through 256 memory locations at a time by triggering the increment line on the counter (which is wired to the bottom 8 bits of the address bus) for each transaction. Hardware-assisted block transfers like this are essentially how the optional SRAM card for the "Hydra" game system works, and the literature for that claims that you can read from that card into a COG as fast as a COG can access HUB memory. (Which means in theory you could clock the RAM *really fast* when the Prop controls it, while using a more leisurely clock for the master CPU. Another bonus of using the latch/counter system is you only need to use about a dozen GPIO lines out of the Prop's 32, instead of the essentially all of them you suck up trying to directly use them as address/data lines, and it's still less complicated than the alternative of using multiple latches and multiplexing. That leaves you with lots of I/O lines to use for other things.)
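Here is the latch-plus-counter addressing scheme as a small Python simulation. It only models the idea described above, not the Hydra card's actual circuit; the class and function names are invented, and the 0x3C00 video address is just an example value.

```python
# Page latch drives address bits 15..8, an 8-bit counter drives bits 7..0.
class LatchCounterBus:
    def __init__(self, sram):
        self.sram = sram
        self.page = 0        # 8-bit latch: high byte of the address
        self.count = 0       # 8-bit counter: low byte of the address

    def load_page(self, page):
        """Load the latch and reset the counter before a transfer."""
        self.page = page & 0xFF
        self.count = 0

    def read_and_increment(self):
        """Sample the data pins, then pulse the counter's clock once."""
        value = self.sram[(self.page << 8) | self.count]
        self.count = (self.count + 1) & 0xFF
        return value

def read_block(bus, page):
    """Read one 256-byte page with one latch load and 256 count pulses."""
    bus.load_page(page)
    return bytes(bus.read_and_increment() for _ in range(256))

sram = bytearray(65536)
sram[0x3C00:0x4000] = bytes(i & 0xFF for i in range(1024))   # 1K of "video RAM"
bus = LatchCounterBus(sram)
video = b"".join(read_block(bus, page) for page in range(0x3C, 0x40))
```

Note that the inner loop never writes an address at all: per byte it is one read of the data pins and one pulse on the counter, which is where the hoped-for speed comes from.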

Again, that's totally just the *idea*.

Sort of like -- "You there! I need a circle yea big, here and here, and move sprite #3 a smidge to the left. You handle the details, I have stuff to do." ?

Basically, yeah. The truly classic 8 bit graphics chips (9918A, VIC-II, GTIA/Antic, etc.) all have capabilities like that. ("Okay, move the space invaders left two pixels, let me know if the missile fired by the player hit any of the invaders, and if not go ahead and move it up a few pixels.") The cool thing about the Propeller, of course, is it's a fully general purpose CPU so in theory you could order it to do things like line/circle drawing, vectors, etc. (Also, you could wire a mouse to the Propeller and let the Prop completely handle the cursor generation, tracking, and even some of the high-level aspects of a GUI like window drawing and text handling. Think of it as a sort of Fisher-Price My First NeXT.)

#37 Wed, 6 Mar 2013 - 17:31

Also, you could wire a mouse to the Propeller and let the Prop completely handle the cursor generation, tracking, and even some of the high-level aspects of a GUI like window drawing and text handling. Think of it as a sort of Fisher-Price My First NeXT.

Now that would be cool!

#38 Wed, 6 Mar 2013 - 20:15

How about looking at the video generation circuitry used in the B&W compact Macs? Schematics are out there, it's a simple, elegant design, built from commodity parts. It would make a reasonable starting point.

Another option is to use one of the various CRT controllers that were used in early video cards and some embedded appliances like word processors and terminals. Or you could try driving an ISA video card.

Seems like a giant project though, you might have better luck starting with something a bit simpler, maybe make a single board computer with just a serial port for a console to start with. If you expose the CPU bus on an expansion connector, you can interface anything you like to it later on.

#39 Thu, 7 Mar 2013 - 04:55

Hey all, I haven't popped in here in a while, so I wanted to give a quick update.

I have my Digilent Nexys 2 FPGA board set up, and I've got all the development tools installed in Linux. I tested out some VGA signal generation code from a friend, and it works! I'm not going to use his though; I need to spend a little time to learn Verilog on my own, and then I'll write my own implementation. Luckily VGA isn't exactly complex, so I think I'll be able to do that over the weekend (or maybe sooner.)

The current plan is to let the FPGA handle video generation and framebuffer, and use an Atmel SAM7SE as the CPU. $7 for the FPGA (which will be capable of VGA and HDMI) and $6 for the CPU. Not bad!

Hooray! I was afraid I wouldn't like playing with the FPGA, but this is pretty cool. :approve:

#40 Thu, 7 Mar 2013 - 12:28

Hey Grackle! Good to hear of your progress

The "caching" idea is that you can take a more relaxed approach: the Prop needs to grab one or two K inside of every 1/60th of a second window, but it could do it by, say, halting the CPU *once* and doing the transfer in one big chunk at the end of every vertical refresh cycle.

Ah ok, I'm with you now. That does sound elegant.

The cool thing about the Propeller, of course, is it's a fully general purpose CPU so in theory you could order it to do things like line/circle drawing, vectors, etc. (Also, you could wire a mouse to the Propeller and let the Prop completely handle the cursor generation, tracking, and even some of the high-level aspects of a GUI like window drawing and text handling. Think of it as a sort of Fisher-Price My First NeXT.)

Sounds more like a My First XTerm to me - one device to handle the whole UI.

Or you could try driving an ISA video card.

This is ... interesting.

maybe make a single board computer with just a serial port for a console to start with. If you expose the CPU bus on an expansion connector, you can interface anything you like to it later on.

This too.

#41 Thu, 7 Mar 2013 - 17:31

This article on developing new (but so minimalist it might as well be retro) game hardware is interesting; down the page a bit it goes into some detail about handling video on restrained hardware. The base unit is an 8051-family chip intended as a controller for wireless keyboards.

http://www.adafruit.com/blog/2012/12/05/how-we-built-a-super-nintendo-out-of-a-wireless-keyboard-sifteo-sifteo/

#42 Sat, 9 Mar 2013 - 23:26

Hooray, colors!

I worked on that last night and today. Lots of learning about the Xilinx toolchain, Verilog in general, and VGA signals. Finally got it!

#43 Sun, 10 Mar 2013 - 03:01

Couldn't leave well enough alone... I had to go back and add some counters and offsets to make it scroll.

I know everybody in the history of everything has done an XOR effect, but it's still pretty cool to do it yourself for the first time.
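For anyone who hasn't seen it, the XOR effect with scrolling boils down to something like the following. This is a Python model of the idea rather than the Verilog actually running on the board; the 640x480 mode, the 8-bit intensity and the single shared frame offset are all assumptions made for the sketch.

```python
# XOR test pattern with scrolling, modelled in software.
WIDTH, HEIGHT = 640, 480     # assumed VGA mode

def xor_pixel(x, y, frame):
    """8-bit intensity for pixel (x, y); the frame counter shifts the pattern."""
    return ((x + frame) ^ (y + frame)) & 0xFF

def render(frame, width=WIDTH, height=HEIGHT):
    """Build one frame as a list of rows of intensities."""
    return [[xor_pixel(x, y, frame) for x in range(width)] for y in range(height)]
```

In hardware, x and y would be the horizontal and vertical pixel counters that already exist for sync generation, and the frame offset would be a counter bumped once per vertical sync, which is why the effect costs almost no extra logic.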

Next step is to add a framebuffer. I'm not sure how involved that'll be; it means talking to memory but I think this device has a controller built in.

#44 Sun, 10 Mar 2013 - 03:25

This is absolutely fascinating!

How expensive are these FPGA things? I think I might get myself one, mostly because I'm curious.

I don't really have any productive purpose for one (yet!), but I probably never will unless I find out exactly what it is and learn how to use it.

OK, I don't want to hijack the thread, I just wanted to interject some of my thoughts on the matter.

Carry on...

c

#45 Sun, 10 Mar 2013 - 04:35

Hey no problem, we've touched on a bit of everything here!

The board I have is the Digilent Nexys2, which I got a few years ago for $100 with an academic discount. Apparently they have a Nexys3 now, with a Xilinx Spartan 6 (vs the Spartan 3 on my board). If you get that one you can use the new Xilinx "Vivado" development environment, which is supposed to have some improvements. There's also the Basys2 which is super cheap if you're a student.

The other direction to go is Altera, which I've heard has a better development environment. However, their boards are a little more expensive.

Both Xilinx and Altera have free (limited) licenses for their software. One of the biggest differences is that Altera's free license gives you access to a logic analyzer tool that lets you see signal timings on your device, which can be very handy.

#46 Sun, 10 Mar 2013 - 04:44

Hi,

I looked up your suggestions, and the Basys2 looks good. It's cheap, which is nice ($50 for a US Student isn't bad at all. One question though-- do I have to prove anything special (like what I'm majoring in, what school I'm in, etc.), or can I just say that I'm a regular college student?)

Quite frankly, the more expensive ones would probably be a waste for me, since I know almost nothing about them.

I could upgrade to one in the future, though.

Thanks!

c

#47 Sun, 10 Mar 2013 - 14:49

I think all you need is a .edu email address.

#48 Mon, 11 Mar 2013 - 15:54

You might also check out fpga4fun, which has some reasonably cheap entry-level devices and a few cool projects and tutorials.

#49 Tue, 19 Mar 2013 - 06:32

PropBerry – Propeller & Raspberry Pi combo

For PropBerry, I was thinking of using a Parallax Propeller* (or just Prop) as a super i/o co-processor for the RPi where the Prop would be used to offload the real-time I/O and let the RPi handle the higher program features. After talking about this combo on the Parallax forums, the Prop's VGA video capabilities were mentioned, which got me thinking about using the PropBerry as a VGA serial terminal console and shelving the i/o co-processor idea for now.

RBox: A diy 32 bit game console for the price of a latte

Uses the smallest and cheapest 32 bit CPU to generate 3D graphics and sound.
The RBox is a game console that is simple enough to build on the prototype area of an NXP LPC111X dev kit; no pcb required just a crystal, a few capacitors and resistors.

Features:

320x240 composite or s-video output generated entirely in software

256 colors with standard palette, up to 8k colors

8 bit 15khz stereo audio

~$1 Analog joystick

~$1 CPU

(from http://zuzebox.wordpress.com/2011/01/31/an-update-to-list-of-homebrew-video-games-consoles/ which has some dead links)

And this is probably about as minimalistic as you can get:

Bit banger

Bit banger is my most constrained and minimalistic microcontroller-based demo yet. It won the Oldschool 4k compo at Revision 2011.
Bit banger is built around an ATtiny15 microcontroller, which runs at 1.6 MHz and has 1 kB of flash ROM and a claustrophobic 32 bytes of RAM. / the entire demo is cycle counted.

At a clock rate of 1.6 MHz, the visible part of each line of the VGA signal swooshes by in exactly 36 clock cycles. The entire line, including horizontal blanking, is 51 clock cycles wide. During this time, both graphics and sound must be generated.
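The cycle counts in that quote pin down the timing completely, which is easy to check. A small Python sketch using only the numbers quoted above (the variable names are mine):

```python
# Line timing implied by the Bit banger figures.
CLOCK_HZ = 1_600_000         # ATtiny15 clock, from the quote
CYCLES_PER_LINE = 51         # whole line, including horizontal blanking
VISIBLE_CYCLES = 36          # visible part of the line

line_rate_hz = CLOCK_HZ / CYCLES_PER_LINE
line_period_us = 1e6 / line_rate_hz
blanking_cycles = CYCLES_PER_LINE - VISIBLE_CYCLES

if __name__ == "__main__":
    print(f"line rate   : {line_rate_hz / 1000:.2f} kHz")
    print(f"line period : {line_period_us:.3f} us")
    print(f"blanking    : {blanking_cycles} cycles for everything else")
```

That works out to a line rate of about 31.4 kHz, close to the roughly 31.5 kHz of standard 640x480 VGA, with only 15 cycles per line left over outside the visible area.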

His Craft demoboard, based on an ATmega88, is a bit more featureful.

#50 Tue, 19 Mar 2013 - 18:07

Oh, Linus Åkesson's stuff is awesome.

Recently I read about the Macintosh Display Card 8/24 GC, which is a NuBus display card that uses an Am29000 RISC CPU to do QuickDraw operations on its framebuffer. One of the particularly neat things about it is that it can operate as a NuBus master, which allows it to accelerate other dumb framebuffer cards you might have installed. Very cool! There is a MacTech article about it here.

Ideally I would like to do something a little like that, but I can't seem to get around needing a dual port RAM, and I don't have enough control over the external bus to do a funky workaround. Argh.

#51 Fri, 29 Mar 2013 - 11:51

Gameduino: a game adapter for microcontrollers

(spoiler: it's an FPGA. But it's also cheap-ish, powerful, programmed and ready to go)

Gameduino is a game adapter for Arduino - or anything else with an SPI interface - that has plugs for a VGA monitor and stereo speakers.

video output is 400x300 pixels in 512 colors

all color processed internally at 15-bit precision
compatible with any standard VGA monitor (800x600 @ 72Hz)

background graphics
512x512 pixel character background
256 characters, each with independent 4 color palette
pixel-smooth X-Y wraparound scroll

foreground graphics

each sprite is 16x16 pixels with per-pixel transparency
each sprite can use 256, 16 or 4 colors
four-way rotate and flip
96 sprites per scan-line, 1536 texels per line
pixel-perfect sprite collision detection

audio output is a stereo 12-bit frequency synthesizer

64 independent voices 10-8000 Hz
per-voice sine wave or white noise
sample playback channel

The adapter is controlled via SPI read/write operations, and looks to the CPU like a 32Kbyte RAM. There is a handy reference poster showing how the whole system works, and a set of sample programs and library.

#52 Fri, 29 Mar 2013 - 16:15

Ironically the Gameduino seems to roughly emulate the capabilities of a Propeller. (Even down to having 32k of RAM. Granted there's probably somewhat more actually available on this for tiles and sprites, since the Prop will be using at least some HUB RAM for the video and sound generation software.)

Still, I can see the attraction. One feature the Propeller could really benefit from is a hardware "SPI slave mode". It's possible to make it be one in software but the Prop is better at generating clocks than following them. (For that matter, it'd be nice if it were somewhat easier to make it follow an external clock for parallel transfers, so it'd be easier to use in place of things like 6522s.) That'd make it easier to use as a coprocessor in more "conventional" computer designs where the Prop doesn't run the whole show.

#53 Sun, 14 Apr 2013 - 12:16

So, I've been wracking my brains trying to come up with a fast way of clocking video data out of a low-end micro without tying up the CPU overmuch.

I was reading up about SPI, as I remembered that was the port Sprite was using on the SE/ARM to direct-drive the Mac's 1-bit video. And about DMA, as that is pretty much all about getting bits in and out of the micro with minimal CPU intervention.

Every micro under the sun seems to have SPI, and all but the very cheapest seem to have at least one or two DMA channels: the Cypress PSoCs for example allow DMA between any port and any internal logic block and in between and vice versa and so forth.

I can't find the relevant piece to quote here, but I recall Gorgonops mentioning that Sprite's SE/ARM (LPC ARM running a b&w compact Mac CRT, acting as an external GPU for another ARM via USB) only achieved four frames per second. Having read this though:

{SPI clock} frequencies are commonly in the range of 1–100 MHz. {*}

I thought to myself, huh? That should be plenty fast enough to drive a compact Mac CRT, so what's the glitch here?

* Yes, I know "clock" =/= bps

I went over to have a closer read of that part of his project, this being the most relevant section (my emphasis added):

I also had some speed issues. The LPC has no problem pushing the pixels to the display quickly enough thanks to the SPI controller I used, and even with the tedious task of fetching the data from the external RAM first, it ran just fine. Problems started appearing when I wanted to implement the USB-interface to actually make the Dockstar write to display RAM: The ARM still had enough power to actually perform the tasks, but I ran into timing issues. Basically, I couldn't handle the USB-transfers quickly enough to be done before I had to write another line to the CRT, thereby throwing off the timings and introducing many ugly glitches in the image. I solved that by creating a routine estimating the time I had left before I had to write another line, and only processing as many bytes as I could do in that time. The disadvantage was that a routine like that introduces a lot of overhead in switching over the DRAM; in the end I could only upload about 4 full frames per second to the GPU. Luckily, implementing RLE acceleration was already planned from the start, making me only hit the 4FPS worst-case-scenario when the complete screen had to be redrawn.

So, thoughts:

It seems a bit like by introducing a second device to act as a GPU, he's actually made this more complex than it needed to be. Now, I get that his desire was to offload video wrangling from the Dockstar, so it could just get on with the task of running smoothly as a server.

But, if that's not your desire, hijacking an SPI output pin from the main device to drive 1-bit video directly seems like a much less clunky approach that should in theory be fast enough to smoothly update a screen - especially if you can drive the SPI from DMA. Ditching the external SRAM (or not using it for video) and using a micro with enough on-die RAM that you can reserve 171 kbits/21kB for video (512*342) should speed things up too, neh?

Going for a smaller screen rez is also an option: QVGA (320x240) LCDs are common, and require only 9.6kB of RAM @1bpp. Alternatively, I *think* you should be able to make the bit width of the Mac screen any arbitrary size without any hardware mods, as it's just pulses to drive the level of the electron gun up and down. 480x342 for example, or 480x320 (half-VGA) with a few blank lines. Even if that's not possible, a smaller screen could be displayed within the 512x342 by just padding it out with zeros at either end.

NB: altering the vertical linecount of the Mac screen is basically not doable.
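The RAM figures for the resolutions mentioned above are quick to verify. A minimal Python sketch, assuming a packed 1 bit-per-pixel framebuffer with no padding between lines (the helper name is invented):

```python
def framebuffer_bytes(width, height, bits_per_pixel=1):
    """Bytes needed for a packed framebuffer, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8

MODES = {
    "Compact Mac": (512, 342),
    "QVGA": (320, 240),
    "480x342": (480, 342),
    "480x320": (480, 320),
}

if __name__ == "__main__":
    for name, (w, h) in MODES.items():
        size = framebuffer_bytes(w, h)
        print(f"{name:12s} {w}x{h}: {size} bytes ({size / 1024:.1f} KiB)")
```

The full 512x342 screen comes to 21,888 bytes and QVGA to 9,600 bytes, matching the figures quoted above; the narrower 480-pixel variants only save one or two kilobytes.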

Alternatively, I wonder why he didn't link the two devices directly via SPI rather than via USB, with all its issues.

One tidbit in particular from the SPI article on wikipedia caught my eye:

Arbitrary choice of message size, content, and purpose

So (and other text on the SE/ARM sort of implies this might have been Sprite's approach), it seems like it should be possible for the micro to do the following:

Set SPI clock to Mac CRT scan rate *
Initiate SPI transfer at the start of each scanline
Order DMA controller to begin transfer of 512 bits from location Y

And you're done.

* independently of CPU master clock, unless I'm reading everything wrong
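The three steps above could be modelled roughly like this. It is a Python sketch of the bookkeeping only: there is no real DMA or SPI API here, the framebuffer base address is invented, and the descriptor format is simply whatever was convenient for the example.

```python
# Hypothetical per-scanline DMA plan for a 512x342, 1 bit-per-pixel framebuffer.
WIDTH, HEIGHT = 512, 342
BYTES_PER_LINE = WIDTH // 8          # 512 bits = 64 bytes per scanline
FRAMEBUFFER_BASE = 0x1000_0000       # invented address of the on-chip buffer

def dma_descriptor(line):
    """(source address, byte count) for the transfer kicked off at this line's start."""
    if not 0 <= line < HEIGHT:
        raise ValueError("line outside the visible area")
    return FRAMEBUFFER_BASE + line * BYTES_PER_LINE, BYTES_PER_LINE

def frame_plan():
    """One descriptor per visible scanline, in display order."""
    return [dma_descriptor(line) for line in range(HEIGHT)]
```

On real hardware the per-line work would shrink to a timer interrupt that points the DMA channel at the next 64-byte row and starts it, with the SPI peripheral shifting the bits out on its own.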

#54 Sun, 14 Apr 2013 - 12:28

Incidentally, the paragraph following Sprite's quote above:

Later on, I'd changed my mind: it would probably be more challenging but in the end more satisfying and universal to make a kernel frame buffer driver. This way, when I would plug in the GPU, the kernel would recognize it and create a framebuffer device. A framebuffer device is an abstract representation of a graphic card plus display, and there are a lot of programs which can talk to that: MPlayer, image viewing tools and even X.org. Getting X.org running on the device made things much simpler: any program I would want to run, including webbrowsers and Macintosh emulators, could run on top of that without any modifications to the programs themselves. So I took to work, and the result was nice to look at: Firefox, running on my workstation, could render itself to a second X session and into the classic CRT of the Mac.

So there you have something fairly cool - with a $12 LPC dev board and a little soldering, your boxmac is now a plug & play external USB monitor (for *nix/X11 systems only), even without the main CPU in Sprite's SE/ARM. Albeit a somewhat slow one, if you follow his design to the letter. A faster micro - one with USB2.0 device, in particular - should get you something more performant.

#55 Sun, 14 Apr 2013 - 22:28

So (and other text on the SE/ARM sort of implies this might have been Sprite's approach), it seems like it should be possible for the micro to do the following:

Set SPI clock to Mac CRT scan rate *
Initiate SPI transfer at the start of each scanline
Order DMA controller to begin transfer of 512 bits from location Y

And you're done.

* independently of CPU master clock, unless I'm reading everything wrong

So, it's not a bad idea at all. There are a few... semi-gotchas:

1: Obviously it assumes that you're able to set the SPI speed to exactly the desired pixel clock. How trivially you can do that with a given MCU undoubtedly varies. (It may depend on using an external clock crystal of an odd frequency so that the dividers the hardware offers work out.)

2: You'll need a couple GPIO pins and some tightly-coded timing loops to generate the horizontal and vertical sync pulses. Not a huge problem certainly, but the video generation still won't be as "set and forget" as you might like.

3: The last bit. I hesitate to speak "authoritatively" on this point, but... the SPI protocol has "built in" to it the insertion of additional "0" bits in between data bytes. Which means that you can't just get a clean pixel stream out of it; you'll have a bit stuck on every 9th pixel. Here's someone complaining about that.
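To see what that does to a scanline, here is a small Python simulation of a pixel stream with one extra "0" bit after every byte. It models the behaviour as described in point 3 for the hardware being complained about; whether a particular MCU's SPI peripheral actually does this is something to check in its datasheet.

```python
def with_gap_bits(pixels, gap=0):
    """Insert one idle bit after every 8 data bits."""
    out = []
    for i in range(0, len(pixels), 8):
        out.extend(pixels[i:i + 8])
        out.append(gap)
    return out

line = [1] * 512                  # a scanline with every pixel set
on_wire = with_gap_bits(line)     # what the display would actually receive
```

A 512-pixel line turns into 576 bit times, so besides the stuck bit at every 9th position the picture is also stretched by 12.5% unless the clock is raised to compensate.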

Note that the thread calls out the solution: at least some microcontrollers use hardware called a USART to accelerate serial bitstreams (including SPI), and if your chosen hardware supports using the USART in "Raw" mode without the SPI overhead you're still okay.

(And in fact, using the USART for this purpose is apparently fairly common for "toy" MCU video setups. Honestly I'd be sort of surprised if Sprite wasn't using a USART for the pixel clocking on his hardware setup... in fact, he says "The LPC1343 had almost everything I needed: enough flash to store a large program in, USB for the connection to the Dockstar, hardware timers to make the timing to the CRT easier and a SPI-port I could abuse to output pixels to the display without too much CPU overhead."... I'm sure by "SPI" he's actually using it in RAW USART mode.)
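To put a number on gotcha 1, here is a small sketch that checks how close a plain integer divider can get to a target pixel clock. The 72 MHz figure is the ARM clock mentioned in this thread; the 15.6672 MHz target is the commonly published compact-Mac dot clock and is an assumption, as is the idea that the serial clock is simply the source clock divided by an integer. `best_divider` is a hypothetical helper, not part of any vendor library.

```python
def best_divider(source_hz, target_hz, max_div=256):
    """Return (divider, actual_hz, error_fraction) for the closest integer divider."""
    best = None
    for div in range(1, max_div + 1):
        actual = source_hz / div
        err = (actual - target_hz) / target_hz
        if best is None or abs(err) < abs(best[2]):
            best = (div, actual, err)
    return best


if __name__ == "__main__":
    TARGET_HZ = 15_667_200  # assumed Mac dot clock
    for source_hz in (72_000_000, 62_668_800):  # stock clock vs. an "odd" 4x crystal
        div, actual, err = best_divider(source_hz, TARGET_HZ)
        print("%d Hz / %d = %.0f Hz (%+.2f%%)" % (source_hz, div, actual, err * 100))
```

From a 72 MHz source the nearest integer divider lands about 8% off, while a clock that is an exact multiple of the target hits it dead on. Real peripherals often restrict the allowed dividers further, so treat this as the optimistic case.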

I suspect the real problem with his design achieving better than the "4FPS" raw framerate is the software overhead from driving the external RAM chip. An MCU that had enough onboard RAM to hold the framebuffer might well do much to solve that problem.

#56 Tue, 16 Apr 2013 - 00:16

Well, he seemed pretty clear that the "GPU" (which is the micro with the external RAM) could handle the video just fine by itself, but that the problems began when linking the two micros up over USB. Are you thinking that it's a combination of the two, and that dropping either of them would have helped?

I've left him a question in the comments over there; I'll pop in again in a bit and see if he's replied.

the SPI protocol has "built in" to it the insertion of additional "0" bits in between data bytes

Ah. Well, that's annoying. Thanks for the link; again, something I will follow up on.

Incidentally, I've gotten interested in building something myself, probably based around the eZ80 AcclaimPlus! micro. (ugh, awful name)

#57 Tue, 16 Apr 2013 - 22:21

Well, he seemed pretty clear that the "GPU" (which is the micro with the external RAM) could handle the video just fine by itself, but that the problems began when linking the two micros up over USB. Are you thinking that it's a combination of the two, and that dropping either of them would have helped?

It's not my design, but looking from the outside I'd almost definitely say "the combination". I don't think the villain here is USB per se (although being limited to USB 1.0 speeds undoubtedly doesn't help), but the fact that he's asking a single-core MCU to multitask pretty hard and it simply doesn't have the resources for it. Here's what he said on the blog:

"Basically, I couldn't handle the USB-transfers quickly enough to be done before I had to write another line to the CRT, thereby throwing off the timings and introducing many ugly glitches in the image. I solved that my creating a routine estimating the time I had left before I had to write another line, and only processing as many bytes as I could do in that time. The disadvantage was that a routine like that introduces a lot of overhead in switching over the DRAM; in the end I could only upload about 4 full frames per second to the GPU."

Note the line about "switching over the DRAM". One thing that may not be particularly obvious about his design until you think about it a bit is that the DRAM isn't really "RAM" so far as the LPC1343 is concerned. It's a random 4-bit wide DRAM device hanging off GPIO pins and thus requires a software loop to "bit-bang" values in and out of it. Without digging into his code I'd be hard-pressed to lay out in detail exactly where the worst problems are, but here's some things to ponder.

1: With the DRAM device being software-driven and not enough framebuffer memory available internally, he has to find time multiple times per frame to suck lines of pixels from DRAM into main memory so the USART hardware can clock them out using DMA. (He can't use DMA straight from DRAM.) This takes time, obviously, and it'd be interesting to know if it's something that can be done while the DMA loop is executing. (I.e., does the DMA controller cause RAM contention that would prevent the main CPU from filling the buffer that it's reading from while he's reading?)

2: USB has limited granularity when it comes to initiating transactions: you can only start one once per millisecond. (You can transfer a semi-arbitrary amount of data once you've started the transaction, but that's your granularity for starting and stopping one.) A single frame of video at 60 Hz is only 16.6 ms. Remember that even if we're using DMA to push pixels during the 342 active lines of the display, you're still going to have to spend some time sending the hsync pulses and setting up the DMA transfers for each line. (Hypothetically maybe the hardware offers a hardware timer able to handle the hsync? Again, I'm too lazy to look at the code.) So somehow you have to find time in all that shuffling data to and from RAM and twiddling hsync pulses to be able to take a transfer block "when you can" and gulp data from USB without breaking the refresh loop. Pure speculation, but I'm guessing what he ended up resorting to is only taking data during the front/back porches. (And being able to do *that* would still basically require being able to either set up timers for vsync and hsync and have them happen automagically, or being able to respond to interrupts to do the needful *during* a USB transfer. It does look like DMA for USB transfers is supported, but perhaps you can't do that at the same time you're using the USART.)

3: If you google for "LPC1343 usb transfer speed" the first few links will point to developers hashing around the various limitations that USB has on that device, including one thread where someone makes a pretty solid case that the best that device can do is around 4 Mb/s. That's 500 KB/s, which is only about 2/5 of what you'd need for 60 FPS. So even best case that class of device would only give you 24 FPS, USB working non-stop. Divide 24 by 4 and you come up with 6. It's an interesting coincidence that he's getting about 1/6th the theoretical bandwidth the chip is capable of while roughly 1/6th of your average video frame is vsync front/back porch/blanking area. That sort of supports the speculation above that he can't receive data from USB and refresh the screen at the same time, be that because of DMA memory contention, issues with driving his chosen RAM chip, whatever... But, again, it's pure speculation.
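The back-of-envelope numbers in point 3 can be worked through as below. The 512x342 1-bit frame, the rounded 60 Hz refresh and the "about 4 Mb/s" USB ceiling are all figures from this thread; the ceiling in particular is a forum estimate, not a datasheet value, and the variable names are just for illustration.

```python
WIDTH, HEIGHT = 512, 342        # 1 bit per pixel
REFRESH_HZ = 60                 # rounded, as in the post
USB_BITS_PER_SEC = 4_000_000    # the "around 4 Mb/s" estimate quoted above
OBSERVED_FPS = 4                # what the blog author reported

frame_bytes = WIDTH * HEIGHT // 8            # bytes in one full frame
needed_bytes_per_sec = frame_bytes * REFRESH_HZ
usb_bytes_per_sec = USB_BITS_PER_SEC // 8
max_fps = usb_bytes_per_sec / frame_bytes    # best case, USB saturated
fraction_of_60fps = usb_bytes_per_sec / needed_bytes_per_sec
observed_share = OBSERVED_FPS / max_fps      # share of the ceiling actually achieved

if __name__ == "__main__":
    print("frame size:       %d bytes" % frame_bytes)
    print("needed for 60fps: %d bytes/s" % needed_bytes_per_sec)
    print("USB ceiling:      %d bytes/s" % usb_bytes_per_sec)
    print("best-case fps:    %.1f" % max_fps)
    print("fraction of 60:   %.2f" % fraction_of_60fps)
    print("observed share:   %.2f" % observed_share)
```

Worked exactly, the best case comes out a little under the rounded 24 FPS (a shade under 23), and 4 FPS is a bit more than 1/6 of that ceiling, so the rough ratio in the post holds up.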

I'm sure he's doing about as well as his chosen hardware can handle, and I'm totally not criticizing the genius of his design, but... it is an interesting illustration of how a few TTL gates, a hardware shift register, and a few other little bits can make an 8 MHz 68000 (or, heck, a 2 MHz 6502) outperform a 72 MHz ARM when it comes to walking and chewing gum at the same time. This is why I'm just generically not that fond of software-generated video solutions. (With the partial exception of the Propeller, because it's basically built specifically for the job.)

Incidentally, I've gotten interested in building something myself, probably based around the eZ80 AcclaimPlus! micro. (ugh, awful name)

The eZ80 is cool. The only thing really not to like about it is it only comes in surface mount. (Yeah, yeah, I know, not a problem for these kids today.) I've been thinking about ordering a couple DIP-package Z180s just for kicks; it should be fairly straightforward to substitute one for a plain Z-80 in some of those single-board wire-wrap projects I've been looking at, and it includes an onboard MMU, DMA controller, and some UARTs that might come in handy and save a chip or two. It's "purely 8 bit" instead of "24 bit" like the eZ80, though.
