Fuse-cdfs
From BononWiki
Introduction
Hi,
I've been busy writing a decoder fs, but decided to write a cdrom read fs first, to read tracks from an audio cd. This audio cdrom fs is a first step towards programming a thread-safe reader in a filesystem, which I consider easier than writing a decoder fs.
It uses several techniques to smooth the read process.
Cache
Instead of reading the data from the cdrom directly after a read call, it uses a cache.
For example, the first track (which gets the name track-01.wav) is cached in a cache directory, like $HOME/.cache/fuse-cdfs/.
The cached file has exactly the same size as the file on the cdrom. The purpose is to make the cdromreader write everything it reads from the first track to this file (if not already written before).
A read call will first check whether the data is already present in the cached file. If that's the case (everything from offset to offset+size-1), it will take the data from there, which is of course very fast. If not, it sends a "read command" to read the missing sectors from the cdrom.
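To make the step from a byte range to "missing sectors" concrete, here is a minimal sketch in C. It is not taken from the fuse-cdfs source: the helper name sectors_for_read and the struct sector_range are hypothetical, and it assumes a 44-byte canonical PCM WAV header in front of the audio data. The 2352 bytes per sector is the raw size of an audio cd sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE     2352u /* raw bytes of audio data per audio cd sector */
#define WAV_HEADER_SIZE 44u   /* assumption: canonical PCM WAV header */

/* Hypothetical result type: sectors are counted from the start of the track. */
struct sector_range {
    uint32_t first; /* first sector needed */
    uint32_t count; /* number of sectors needed, 0 if none */
};

/* Map a read of `size` bytes at `offset` in track-xx.wav to the sectors
 * that hold the audio data. Bytes inside the header need no sector. */
static struct sector_range sectors_for_read(uint64_t offset, size_t size)
{
    struct sector_range r = {0, 0};
    if (size == 0)
        return r;

    uint64_t end = offset + size;          /* one past the last byte */
    if (end <= WAV_HEADER_SIZE)
        return r;                          /* read lies entirely in the header */

    /* Shift into audio-data coordinates, clamping the start at 0. */
    uint64_t data_start = offset > WAV_HEADER_SIZE ? offset - WAV_HEADER_SIZE : 0;
    uint64_t data_end = end - WAV_HEADER_SIZE; /* exclusive */

    uint64_t last = (data_end - 1) / SECTOR_SIZE;
    r.first = (uint32_t)(data_start / SECTOR_SIZE);
    r.count = (uint32_t)(last - r.first + 1);
    return r;
}
```

A read that crosses a sector boundary by even one byte needs both sectors, which is why the cache administration works in whole sectors rather than bytes.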
Special threads
To read the data there is one "cdromreader" thread. A queue is used to send read commands to it, and another thread is used to manage the cache.
An administration of already read bytes is kept. This is a doubly linked list of intervals which have already been read and are present in the cached file. The cached intervals use sectors as start point. A small extra complication is the header, which has to be added because it's not present in the tracks on the cd.
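The administration described above could look roughly like the following sketch. It is an illustration, not the actual fuse-cdfs code: the names cached_interval, interval_add and interval_covered are hypothetical, and it assumes the list is kept sorted with touching or overlapping intervals merged, so that a request is cached exactly when it fits inside a single node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: sectors first..last (inclusive) are in the cached file. */
struct cached_interval {
    uint32_t first;
    uint32_t last;
    struct cached_interval *prev;
    struct cached_interval *next;
};

/* Returns 1 when all sectors first..last are already cached. */
static int interval_covered(const struct cached_interval *head,
                            uint32_t first, uint32_t last)
{
    for (const struct cached_interval *i = head; i; i = i->next) {
        if (i->first > first)
            break; /* list is sorted: later nodes start even further on */
        if (last <= i->last)
            return 1;
    }
    return 0;
}

/* Record sectors first..last as cached, merging with neighbours that
 * overlap or touch. Returns 0 on success, -1 when out of memory. */
static int interval_add(struct cached_interval **head,
                        uint32_t first, uint32_t last)
{
    assert(first <= last);
    struct cached_interval *node = malloc(sizeof *node);
    if (!node)
        return -1; /* nothing has been modified yet */

    struct cached_interval *prev = NULL, *cur = *head;

    /* Skip intervals that end before the new one and do not touch it. */
    while (cur && (uint64_t)cur->last + 1 < first) {
        prev = cur;
        cur = cur->next;
    }
    /* Absorb every interval that overlaps or touches the new one. */
    while (cur && (uint64_t)last + 1 >= cur->first) {
        struct cached_interval *next = cur->next;
        if (cur->first < first) first = cur->first;
        if (cur->last > last) last = cur->last;
        free(cur);
        cur = next;
    }

    node->first = first;
    node->last = last;
    node->prev = prev;
    node->next = cur;
    if (prev) prev->next = node; else *head = node;
    if (cur) cur->prev = node;
    return 0;
}
```

In a threaded fs this list would have to be protected by a lock, or only be touched by the cache manager thread.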
When I've got this ready I will reuse these concepts, like:
- a fixed number of worker threads, using a queue
- cache everything
- an administration of already read and cached data
For a decoder there can be more worker threads.
And you can choose that by default it will only read the first x bytes (to offer a buffer to apps which look at the first x bytes to get metadata about the file).
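The "fixed number of worker threads, using a queue" concept can be sketched with POSIX threads as below. All names (job, job_queue, run_jobs, NUM_WORKERS) are hypothetical, and the "work" is just adding numbers as a stand-in for reading or decoding; it is only meant to show the pattern, not the fuse-cdfs implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 2 /* hypothetical fixed pool size */

struct job {
    int value;        /* stand-in for a read command */
    struct job *next;
};

struct job_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct job *head, *tail;
    int closed;       /* set when no more jobs will be pushed */
    long sum;         /* result of the demo work, protected by lock */
};

static int queue_push(struct job_queue *q, int value)
{
    struct job *j = malloc(sizeof *j);
    if (!j)
        return -1;
    j->value = value;
    j->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = j; else q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->nonempty); /* wake one waiting worker */
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Blocks until a job is available; returns NULL once closed and empty. */
static struct job *queue_pop(struct job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->head && !q->closed)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct job *j = q->head;
    if (j) {
        q->head = j->next;
        if (!q->head) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return j;
}

static void *worker(void *arg)
{
    struct job_queue *q = arg;
    struct job *j;
    while ((j = queue_pop(q)) != NULL) {
        pthread_mutex_lock(&q->lock);
        q->sum += j->value;            /* stand-in for the real work */
        pthread_mutex_unlock(&q->lock);
        free(j);
    }
    return NULL;
}

/* Push the values 1..njobs through the pool and return their sum,
 * or -1 when no worker thread could be started. */
static long run_jobs(int njobs)
{
    struct job_queue q = { .head = NULL, .tail = NULL, .closed = 0, .sum = 0 };
    pthread_t threads[NUM_WORKERS];
    int started = 0;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.nonempty, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        if (pthread_create(&threads[started], NULL, worker, &q) == 0)
            started++;
    if (started == 0)
        return -1;

    for (int v = 1; v <= njobs; v++)
        queue_push(&q, v);

    pthread_mutex_lock(&q.lock);
    q.closed = 1;                      /* let idle workers exit */
    pthread_cond_broadcast(&q.nonempty);
    pthread_mutex_unlock(&q.lock);

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_cond_destroy(&q.nonempty);
    pthread_mutex_destroy(&q.lock);
    return q.sum;
}
```

For the cdrom reader the pool size would be one, since there is only one drive; for a decoder NUM_WORKERS could be larger.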
News
23 September 2011
Rewritten the code on some important issues:
. added a separate thread to manage the cache
. added some extra structs like
struct read_call_struct
struct read_command_struct
struct read_result_struct
A read call will start with a read_call struct, to have a common reference for the other structs (read_command and read_result). The read call will send (if the data is not found in the cache) one or more "read_command" structs to the cdrom reader using a queue. The cdrom reader thread will try to read the requested sectors from the cdrom. Because the cdrom reader reads sectors in batches of a certain size (on my machine 25), one read_command can result in more than one read_result. The relations look like:
read_call 1:n read_command 1:n read_result
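As an illustration of these relations, the structs might be laid out as in the sketch below. Only the three struct names come from the code; every field name and the helper results_for_command are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One per fs read call; the common reference for commands and results. */
struct read_call_struct {
    unsigned int track;        /* hypothetical: track number */
    uint64_t offset;           /* hypothetical: requested offset in the file */
    size_t size;               /* hypothetical: requested number of bytes */
};

/* One per range of sectors that was not found in the cache (1:n). */
struct read_command_struct {
    struct read_call_struct *read_call; /* back reference to the read call */
    uint32_t first_sector;
    uint32_t nr_sectors;
    struct read_command_struct *next;   /* link in the reader's queue */
};

/* One per batch of sectors actually read from the cdrom (1:n). */
struct read_result_struct {
    struct read_command_struct *read_command; /* back reference */
    uint32_t first_sector;
    uint32_t nr_sectors;
    unsigned char *buffer;              /* nr_sectors * 2352 bytes */
    struct read_result_struct *next;    /* link in the cache manager's queue */
};

/* Number of read_results one command produces when the reader
 * works in batches of `batch` sectors. */
static uint32_t results_for_command(const struct read_command_struct *cmd,
                                    uint32_t batch)
{
    assert(batch > 0);
    return cmd->nr_sectors / batch + (cmd->nr_sectors % batch != 0);
}
```

With a batch size of 25, a command for 60 sectors gives three results: two full batches and one of 10 sectors.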
Every result the cdrom reader gets is sent directly to the cache manager thread. Here also a queue is used to send (by the cdrom reader) and receive (by the cache manager) read results. When a result is received, the cache manager will write the bytes read to the cache, update the cache administration and send a broadcast signal to waiting clients (= the original read calls) that bytes have become available in the cache. Each waiting read call will then test whether the cache now holds enough for its original read request. If this is the case, it reads the bytes; if not, it will wait again.
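The broadcast-and-retest pattern can be sketched with a POSIX condition variable as follows. This is a simplified illustration and not the fuse-cdfs code: the names cache_state, cache_publish and cache_wait are hypothetical, and the cache administration is reduced to a single counter (bytes 0..bytes_ready-1 are cached) instead of a list of intervals. The wait uses a timeout, since the read call in the fs also has one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

struct cache_state {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    uint64_t bytes_ready; /* simplification: bytes 0..bytes_ready-1 are cached */
};

/* Cache manager side: record progress and wake all waiting read calls. */
static void cache_publish(struct cache_state *c, uint64_t bytes_ready)
{
    pthread_mutex_lock(&c->lock);
    if (bytes_ready > c->bytes_ready)
        c->bytes_ready = bytes_ready;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

/* Read call side: wait until `needed` bytes are cached.
 * Returns 0 on success or ETIMEDOUT after timeout_ms milliseconds. */
static int cache_wait(struct cache_state *c, uint64_t needed, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&c->lock);
    /* Retest after every wakeup: a broadcast only says "something changed". */
    while (c->bytes_ready < needed && rc == 0)
        rc = pthread_cond_timedwait(&c->changed, &c->lock, &deadline);
    if (c->bytes_ready >= needed)
        rc = 0;
    pthread_mutex_unlock(&c->lock);
    return rc;
}
```

A broadcast is used rather than a signal because several read calls may be waiting on different parts of the same track, and only each caller knows whether the new data satisfies its own request.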
Orphaned read results
The fs uses a timeout for the read call. When this timeout expires the read call will return with an error and the read_call is destroyed, while there are still read_commands/read_results "out there" that refer to it, which will result in an error. I have no solution yet.
I'm thinking about making the struct read_call a pointer, which is not freed when the fs read call is finished.
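One possible way to work this idea out, shown here only as a sketch and not as something fuse-cdfs does, is to allocate the read_call on the heap and count references to it: one for the fs read call itself and one for every outstanding command. Whoever drops the last reference frees it. All names below (read_call, read_call_get, read_call_put, the abandoned flag) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct read_call {
    pthread_mutex_t lock;
    int refs;      /* one for the fs read call plus one per outstanding command */
    int abandoned; /* set when the fs read call has timed out */
};

static struct read_call *read_call_new(void)
{
    struct read_call *rc = malloc(sizeof *rc);
    if (!rc)
        return NULL;
    pthread_mutex_init(&rc->lock, NULL);
    rc->refs = 1;  /* the reference held by the fs read call */
    rc->abandoned = 0;
    return rc;
}

/* Take a reference, e.g. just before queueing a read_command. */
static void read_call_get(struct read_call *rc)
{
    pthread_mutex_lock(&rc->lock);
    rc->refs++;
    pthread_mutex_unlock(&rc->lock);
}

/* Drop a reference. Returns 1 when it was the last one and rc was freed. */
static int read_call_put(struct read_call *rc)
{
    pthread_mutex_lock(&rc->lock);
    int last = (--rc->refs == 0);
    pthread_mutex_unlock(&rc->lock);
    if (last) {
        pthread_mutex_destroy(&rc->lock);
        free(rc);
    }
    return last;
}

/* Called by the fs read call when its timeout expires. */
static void read_call_abandon(struct read_call *rc)
{
    pthread_mutex_lock(&rc->lock);
    rc->abandoned = 1;
    pthread_mutex_unlock(&rc->lock);
}

/* Lets a late result see that nobody is waiting for it any more. */
static int read_call_is_abandoned(struct read_call *rc)
{
    pthread_mutex_lock(&rc->lock);
    int abandoned = rc->abandoned;
    pthread_mutex_unlock(&rc->lock);
    return abandoned;
}
```

With this scheme a timed-out read call marks the struct as abandoned and drops its own reference; a late read_result can still write its bytes to the cache, skip the wakeup, and free the struct when it drops the last reference.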
How to get it
Source is available at Gitorious:
git clone git://gitorious.org/fusemodules/fuse-cdfs.git fuse-cdfs
cd fuse-cdfs
Required are of course FUSE (I'm using version 2.8.5) and libcdio, version 0.80 with cd-paranoia support.
Links on LinuxFromScratch:
To build, just:
make
and use the executable fuse-cdfs where you like.
Notes:
. the make command creates warnings. (2011-08-11)
Some notes
At this moment (2011-08-11) it works very well. The fs is mounted like:
/home/sbon/Programmeren/fuse-cdfs/fuse-cdfs --device=/dev/sr0 \
--cache-directory=/home/sbon/.cache/fuse-cdfs/ \
--logging=3 \
--progressfifo=/home/sbon/tmp/leesvoortgang /home/sbon/testmount
This command:
- mounts the cdrom reader fs at /home/sbon/testmount
- reads the cdrom present in /dev/sr0
- uses loglevel 3 (log messages are written to syslog; loglevels range from 0 to 3, with 0 meaning no logging)
- uses the directory ~/.cache/fuse-cdfs to store cached files
- writes progress (percentage) to the fifo ~/tmp/leesvoortgang (leesvoortgang: Dutch for read progress)
When entering /home/sbon/testmount, you'll see all the tracks found on the cdrom. The tracks are called track-xx.wav, with xx of course the number of the track. In the cache the exact same file is created.
Progressinfo
The progressinfo is available to read from the fifo, with a script like:
Extended Attributes
Like in most of my FUSE fs's, Extended Attributes are used to get/set various settings:
. logging
cd %mountpoint of fuse-cdfs%
setfattr --name system.workspace_logging --value 3 .
This will set the loglevel to 3, which is the maximum loglevel. Setting it to zero means no logging.
. hiding of entries
setfattr --name system.workspace_entry_hide --value 1 track-xx.wav
will hide the entry, i.e. it will not show up after a readdir call, but it's still present and can be read from.
setfattr --name system.workspace_global_nohide --value 1 .
will disable the hiding of entries globally, so no entry is hidden.
One of the things I want to add is the getting/setting of caching/readahead behaviour.
TODO/Improvements
I'm not completely happy yet. Some issues:
- there is some extra code (xattr code for example) which has no function (yet).
- some warnings during compilation. Most of them can be "fixed".
- the read process now reads the track in one batch. I want to change this into a more flexible readahead, where the reading/readahead stops when no reads have been done for some time.
- the cached data has to be reread after unmount. I want to look for a way to keep this data (a db?).
- have a unique id per track, which can be determined before reading the whole track. I got a suggestion to use libofa0.
Contact
I've consulted the libcdio mailing list about this fs: