A Wander through GHC’s New IO library
Simon Marlow
The 100-mile view
• the API changes:
– Unicode
• putStr "A légpárnás hajóm tele van angolnákkal" works! (if your editor is set up right…)
• locale-encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
• changing the encoding on the fly:
hSetEncoding :: Handle -> TextEncoding -> IO ()
hGetEncoding :: Handle -> IO (Maybe TextEncoding)

data TextEncoding
latin1, utf8, utf16, utf32, … :: TextEncoding
mkTextEncoding :: String -> IO TextEncoding
localeEncoding :: TextEncoding
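
The encoding functions above can be tried out with a small round trip. This is a minimal sketch using only System.IO; the helper names roundTrip and describeMode are hypothetical, not part of the library.

```haskell
import System.IO

-- Write a string with an explicit encoding, then read it back
-- through a Handle set to the same encoding.
-- (roundTrip is a hypothetical helper, not a library function.)
roundTrip :: TextEncoding -> FilePath -> String -> IO String
roundTrip enc path str = do
  hOut <- openFile path WriteMode
  hSetEncoding hOut enc
  hPutStr hOut str
  hClose hOut
  hIn <- openFile path ReadMode
  hSetEncoding hIn enc
  contents <- hGetContents hIn
  -- Force the lazy contents before closing the Handle.
  length contents `seq` hClose hIn
  return contents

-- hGetEncoding returns Nothing for a Handle in binary mode.
-- (describeMode is likewise a hypothetical helper.)
describeMode :: Handle -> IO String
describeMode h = do
  menc <- hGetEncoding h
  return (maybe "binary" show menc)
```

For example, roundTrip utf8 "demo.txt" "légpárnás" should return the original string, and an encoding obtained from mkTextEncoding "UTF-16LE" can be passed in the same way.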
The 100-mile view (cont.)
• Better newline support
– teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

hSetNewlineMode :: Handle -> NewlineMode -> IO ()

data Newline = LF {- "\n" -} | CRLF {- "\r\n" -}

nativeNewline :: Newline

data NewlineMode = NewlineMode {
    inputNL  :: Newline,
    outputNL :: Newline }

noNewlineTranslation = NewlineMode { inputNL = LF, outputNL = LF }
universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
nativeNewlineMode    = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }
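
To see what newline translation does in practice, here is a small sketch using only System.IO. It writes a file with CRLF line endings and reads it back under a chosen NewlineMode; writeCRLF and readWith are hypothetical helper names.

```haskell
import System.IO

-- Write each line followed by "\r\n", whatever the platform default is.
-- (writeCRLF is a hypothetical helper.)
writeCRLF :: FilePath -> [String] -> IO ()
writeCRLF path ls = do
  h <- openFile path WriteMode
  hSetNewlineMode h NewlineMode { inputNL = LF, outputNL = CRLF }
  mapM_ (hPutStrLn h) ls
  hClose h

-- Read the whole file back under the given NewlineMode.
-- (readWith is a hypothetical helper.)
readWith :: NewlineMode -> FilePath -> IO String
readWith mode path = do
  h <- openFile path ReadMode
  hSetNewlineMode h mode
  s <- hGetContents h
  -- Force the lazy contents before closing the Handle.
  length s `seq` hClose h
  return s
```

Reading with noNewlineTranslation leaves the "\r\n" pairs in the string, while universalNewlineMode turns each of them into a single "\n".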
The 10-mile view
• Unicode codecs:
– built-in codecs for UTF-8, UTF-16(LE,BE), UTF-32(LE,BE).
– Other codecs use iconv on Unix systems
– Built-in codecs only on Windows (no code pages)
• yet…
– The pieces for building a codec are provided…
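
One way to watch the built-in codecs at work is to write the same string under different encodings and compare the resulting file sizes. A minimal sketch using only System.IO; encodedSize is a hypothetical helper name.

```haskell
import System.IO

-- Write the string to the given file under the given encoding and
-- return the number of bytes that ended up on disk.
-- (encodedSize is a hypothetical helper, not a library function.)
encodedSize :: TextEncoding -> FilePath -> String -> IO Integer
encodedSize enc path str = do
  h <- openFile path WriteMode
  hSetEncoding h enc
  -- Keep newline translation out of the byte count.
  hSetNewlineMode h noNewlineTranslation
  hPutStr h str
  hClose h
  hb <- openBinaryFile path ReadMode
  n <- hFileSize hb
  hClose hb
  return n
```

For the two-character string "hé", this gives 3 bytes under utf8, 4 under utf16le and 8 under utf32be. The explicit-endianness encodings are used here because plain utf16 and utf32 also write a byte-order mark.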
The 10-mile view (cont.)
• Build your own codec: API in GHC.IO.Encoding
data BufferCodec from to state = BufferCodec {
    encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
    close    :: IO (),
    getState :: IO state,
    setState :: state -> IO ()
  }
• Saving and restoring state is important since Handles support buffering, random access, and changing encodings

type TextEncoder state = BufferCodec Char Word8 state
type TextDecoder state = BufferCodec Word8 Char state

data TextEncoding = forall dstate estate . TextEncoding {
    mkTextDecoder :: IO (TextDecoder dstate),
    mkTextEncoder :: IO (TextEncoder estate)
  }
The 1-mile view
• Make your own Handles!
mkFileHandle :: (IODevice dev,    -- type class providing I/O device operations: close, seek, getSize, …
                 BufferedIO dev,  -- type class providing buffered reading/writing
                 Typeable dev)    -- in case we need to take the Handle apart again later
             => dev
             -> FilePath              -- for error messages
             -> IOMode                -- ReadMode/WriteMode/…
             -> Maybe TextEncoding
             -> NewlineMode
             -> IO Handle
– why mkFileHandle, not mkHandle?
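
As a first use of mkFileHandle, a Handle can be built by hand on top of the stock file-descriptor device. This is a sketch under assumptions: it targets a recent base, where FD.openFile takes a third Bool argument (non-blocking mode) that the two-argument call shown later in these slides does not have. openViaDevice is a hypothetical name.

```haskell
import qualified GHC.IO.FD as FD
import GHC.IO.Handle (mkFileHandle)
import System.IO

-- Open a file descriptor ourselves, then wrap it in a Handle with an
-- explicit encoding and newline mode of our choosing.
-- (openViaDevice is a hypothetical helper; the 'False' argument to
-- FD.openFile assumes the three-argument form found in recent base.)
openViaDevice :: FilePath -> IOMode -> IO Handle
openViaDevice path mode = do
  (fd, _devType) <- FD.openFile path mode False
  mkFileHandle fd path mode (Just utf8) noNewlineTranslation
```

The resulting Handle works with the ordinary System.IO functions such as hGetLine and hClose.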
IODevice
-- | I/O operations required for implementing a 'Handle'.
class IODevice a where
  -- | closes the device. Further operations on the device should
  -- produce exceptions.
  close :: a -> IO ()

  -- | seek to the specified position in the data.
  seek :: a -> SeekMode -> Integer -> IO ()
  seek _ _ _ = ioe_unsupportedOperation

  -- | return the current position in the data.
  tell :: a -> IO Integer
  tell _ = ioe_unsupportedOperation

  -- | returns 'True' if the device is a terminal or console.
  isTerminal :: a -> IO Bool
  isTerminal _ = return False

  … etc …

• Default is for the operation to be unsupported
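
The IODevice operations can also be called directly on a device, with no Handle involved. The sketch below queries a file descriptor from GHC.IO.FD. It assumes a recent base, where FD.openFile takes a third Bool argument (non-blocking mode); describeFile is a hypothetical name.

```haskell
import qualified GHC.IO.Device as Dev
import qualified GHC.IO.FD as FD
import System.IO (IOMode (ReadMode))

-- Return (size in bytes, is it a terminal?, is it seekable?) for a file,
-- using only IODevice class methods on the raw FD device.
-- (describeFile is a hypothetical helper; the 'False' argument assumes
-- the three-argument FD.openFile of recent base.)
describeFile :: FilePath -> IO (Integer, Bool, Bool)
describeFile path = do
  (fd, _devType) <- FD.openFile path ReadMode False
  size     <- Dev.getSize fd
  terminal <- Dev.isTerminal fd
  seekable <- Dev.isSeekable fd
  Dev.close fd
  return (size, terminal, seekable)
```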
BufferedIO
class BufferedIO dev where
  newBuffer         :: dev -> BufferState -> IO (Buffer Word8)

  fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
  fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)

  emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)

  flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
  flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

• Device gets to allocate the buffer. This allows the device to choose the buffer to point directly at the data in memory, for example.
• 0-versions are non-blocking, non-0 versions must read or write at least one byte (but may transfer less than the whole buffer)
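
The BufferedIO methods can be driven by hand on the stock FD device, which shows the "device allocates the buffer, then fills it" protocol. A sketch assuming a recent base (three-argument FD.openFile); firstFill is a hypothetical name.

```haskell
import GHC.IO.Buffer (BufferState (ReadBuffer), bufferElems)
import GHC.IO.BufferedIO (fillReadBuffer, newBuffer)
import qualified GHC.IO.Device as Dev
import qualified GHC.IO.FD as FD
import System.IO (IOMode (ReadMode))

-- Ask the device for a fresh read buffer, fill it once, and report
-- (bytes read by this fill, bytes now waiting in the buffer).
-- (firstFill is a hypothetical helper; the 'False' argument assumes
-- the three-argument FD.openFile of recent base.)
firstFill :: FilePath -> IO (Int, Int)
firstFill path = do
  (fd, _devType) <- FD.openFile path ReadMode False
  buf0 <- newBuffer fd ReadBuffer
  (n, buf1) <- fillReadBuffer fd buf0
  Dev.close fd
  return (n, bufferElems buf1)
```

For a small regular file, both numbers are normally the file size, since one fill is enough to pull the whole file into the buffer.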
RawIO
-- | A low-level I/O provider where the data is bytes in memory.
class RawIO a where
  read             :: a -> Ptr Word8 -> Int -> IO Int
  readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
  write            :: a -> Ptr Word8 -> Int -> IO ()
  writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

readBuf :: RawIO dev
        => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

readBufNonBlocking :: RawIO dev
                   => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)

writeBuf :: RawIO dev
         => dev -> Buffer Word8 -> IO ()

writeBufNonBlocking :: RawIO dev
                    => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
Example: a memory-mapped Handle
• Random-access read/write doesn’t
perform very well with ordinary
buffered I/O.
– Let’s implement a Handle backed by a
memory-mapped file
– We need to
1. define our device type
2. make it an instance of IODevice and
BufferedIO
3. provide a way to create instances
Example: memory-mapped files
1. Define our device type
data MemoryMappedFile =
  MemoryMappedFile {
    mmap_fd     :: FD,            -- ordinary file descriptor, provided by GHC.IO.FD
    mmap_addr   :: !(Ptr Word8),  -- address in memory where our file is mapped,
    mmap_length :: !Int,          -- and its length
    mmap_ptr    :: !(IORef Int)   -- the current file pointer
  }
  deriving Typeable
• Handles have a built-in notion of the “current position” that we have to emulate
• Typeable is one of the requirements for making a Handle
aside: Buffers
module GHC.IO.Buffer ( Buffer(..), .. ) where

data Buffer e
= Buffer {
bufRaw :: !(ForeignPtr e),
bufState :: BufferState, -- ReadBuffer | WriteBuffer
bufSize :: !Int, -- in elements, not bytes
bufL :: !Int, -- offset of first item in the buffer
bufR :: !Int -- offset of last item + 1
}

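[Figure: layout of a Buffer. The storage starts at bufRaw and holds bufSize elements; the Data occupies the region from bufL to bufR.]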
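
The bufL/bufR bookkeeping can be explored with the helper functions that GHC.IO.Buffer exports. This sketch only moves the offsets and never touches the underlying memory; bufferWalk is a hypothetical name.

```haskell
import GHC.IO.Buffer

-- Track (elements in the buffer, free space after bufR) as data is
-- notionally added to and removed from a 16-byte buffer.
-- (bufferWalk is a hypothetical helper; only the offsets change here,
-- no bytes are actually written.)
bufferWalk :: IO [(Int, Int)]
bufferWalk = do
  buf0 <- newByteBuffer 16 WriteBuffer
  let buf1 = bufferAdd 5 buf0     -- bufR moves right by 5
      buf2 = bufferRemove 2 buf1  -- bufL moves right by 2
      buf3 = bufferRemove 3 buf2  -- the remaining 3 elements are consumed
  return [ (bufferElems b, bufferAvailable b) | b <- [buf0, buf1, buf2, buf3] ]
```

bufferElems is bufR minus bufL, and bufferAvailable is bufSize minus bufR, so the element counts along this walk are 0, 5, 3 and 0.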
Example: memory-mapped files
2. (a) make it an instance of BufferedIO
instance BufferedIO MemoryMappedFile where
  newBuffer m state = do
    fp <- newForeignPtr_ (mmap_addr m)
    return (emptyBuffer fp (mmap_length m) state)

  -- fillReadBuffer returns the entire file!
  fillReadBuffer m buf = do
    p <- readIORef (mmap_ptr m)
    let l = mmap_length m
    if (p >= l)
      then do return (0, buf{ bufL=p, bufR=p })
      else do writeIORef (mmap_ptr m) l
              return (l-p, buf{ bufL=p, bufR=l })

  -- flush is a no-op: just remember where to read from next
  flushWriteBuffer m buf = do
    writeIORef (mmap_ptr m) (bufR buf)
    return buf{ bufL = bufR buf }
Example: memory-mapped files
2. (b) make it an instance of IODevice
instance IODevice MemoryMappedFile where
  close = IODevice.close . mmap_fd

  seek m mode val = do
    let sz = mmap_length m
    ptr <- readIORef (mmap_ptr m)
    let off = case mode of
                AbsoluteSeek -> fromIntegral val
                RelativeSeek -> ptr + fromIntegral val
                SeekFromEnd  -> sz + fromIntegral val
    when (off < 0 || off >= sz) $ ioe_seekOutOfRange
    writeIORef (mmap_ptr m) off

  tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

  getSize = return . fromIntegral . mmap_length

  … etc …
Example: memory-mapped files
3. provide a way to create instances
mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
mmapFile filepath iomode binary = do
  -- Open the file and mmap() it
  (fd,_devtype) <- FD.openFile filepath iomode
  sz <- IODevice.getSize fd
  addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
  ptr <- newIORef 0

  let m = MemoryMappedFile {
            mmap_fd = fd,
            mmap_addr = castPtr addr,
            mmap_length = fromIntegral sz,
            mmap_ptr = ptr }

  -- Call mkFileHandle to build the Handle
  let (encoding, newline)
        | binary    = (Nothing, noNewlineTranslation)
        | otherwise = (Just localeEncoding, nativeNewlineMode)

  mkFileHandle m filepath iomode encoding newline
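
The slide does not show how c_mmap, prot and flags are defined. The sketch below is one possible way to bind mmap() through the FFI, not the binding used in the talk. It assumes a POSIX system, the constant values PROT_READ = 1 and MAP_SHARED = 1 (as found on Linux), and a recent base where FD.openFile takes a third Bool argument. The names protRead, mapShared, mapFailed and firstMappedByte are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Monad (when)
import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (Ptr, castPtr, nullPtr, plusPtr)
import Foreign.Storable (peek)
import qualified GHC.IO.Device as Dev
import qualified GHC.IO.FD as FD
import System.IO (IOMode (ReadMode))
import System.Posix.Types (COff (..))

foreign import ccall unsafe "mmap"
  c_mmap :: Ptr () -> CSize -> CInt -> CInt -> CInt -> COff -> IO (Ptr ())

foreign import ccall unsafe "munmap"
  c_munmap :: Ptr () -> CSize -> IO CInt

-- ASSUMPTION: numeric values as on Linux; check <sys/mman.h> elsewhere.
protRead, mapShared :: CInt
protRead  = 1
mapShared = 1

-- mmap() signals failure by returning (void *) -1.
mapFailed :: Ptr ()
mapFailed = nullPtr `plusPtr` (-1)

-- Map a non-empty file read-only and return its first byte.
-- (firstMappedByte is a hypothetical helper for illustration.)
firstMappedByte :: FilePath -> IO Word8
firstMappedByte path = do
  (fd, _devType) <- FD.openFile path ReadMode False
  sz <- Dev.getSize fd
  addr <- c_mmap nullPtr (fromIntegral sz) protRead mapShared (FD.fdFD fd) 0
  when (addr == mapFailed) $ ioError (userError "mmap failed")
  b <- peek (castPtr addr)
  _ <- c_munmap addr (fromIntegral sz)
  Dev.close fd
  return b
```

A read/write mapping, as the mmapFile example needs for ReadWriteMode, would additionally pass PROT_WRITE in the protection argument.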


Demo…
$ ./Setup configure
Configuring mmap-handle-0.0...
$ ./Setup build
Preprocessing library mmap-handle-0.0...
Building mmap-handle-0.0...
[1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs,
dist/build/System/Posix/IO/MMap.o )
Registering mmap-handle-0.0...
$ ./Setup register --inplace --user
Registering mmap-handle-0.0...
$ ghc-pkg list --user
/home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
mmap-handle-0.0
Demo…
$ cat test.hs
import System.IO
import System.Posix.IO.MMap
import System.Environment
import Data.Char

main = do
  [file,test] <- getArgs
  h <- if test == "mmap" then mmapFile file ReadWriteMode True
                         else openBinaryFile file ReadWriteMode

  sequence_ [ do hSeek h SeekFromEnd (-n)
                 c <- hGetChar h
                 hSeek h AbsoluteSeek n
                 hPutChar h c
            | n <- [ 1..10000] ]

  hClose h
  putStrLn "done"
$ ghc test.hs --make
[1 of 1] Compiling Main ( test.hs, test.o )
Linking test ...
Timings…

$ time ./test /tmp/words file
done
0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file
$ time ./test /tmp/words mmap
done
0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap
$ time ./test ./words file # ./ is NFS-mounted
done
10.44s real 0.20s user 0.52s system 6% ./test tmp file
$ time ./test ./words mmap # ./ is NFS-mounted
done
0.10s real 0.09s user 0.00s system 93% ./test tmp mmap
More examples
• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• Handle that reads from a ByteString/Text
• Handle that reads from text
The -1 mile view
• Inside the IO library
– The file-descriptor functionality is cleanly separated from the implementation of Handles:
• GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
• GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
• GHC.IO.Handle has nothing to do with FDs
Implementation of Handle
• Existential: packs up the IODevice, BufferedIO, Typeable dictionaries, and codec state is existentially quantified

data Handle__
  = forall dev enc_state dec_state . (IODevice dev, BufferedIO dev, Typeable dev) =>
    Handle__ {
      haDevice     :: !dev,
      haType       :: HandleType,                     -- read/write/append etc.
      haByteBuffer :: !(IORef (Buffer Word8)),        -- two buffers: one for bytes,
      haCharBuffer :: !(IORef (Buffer CharBufElem)),  -- one for Chars.
      haEncoder    :: Maybe (TextEncoder enc_state),
      haDecoder    :: Maybe (TextDecoder dec_state),
      haCodec      :: Maybe TextEncoding,
      haInputNL    :: Newline,
      haOutputNL   :: Newline,
      .. some other things ..
    }
    deriving Typeable
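
The role of the Typeable constraint is easier to see in miniature. The following standalone sketch is not GHC's Handle code. It packs a device behind an existential in the same style and then uses cast to take it apart again. All names in it (Device, FileDev, MemDev, AnyHandle, handleName, recoverFd) are hypothetical, and it assumes a GHC recent enough to provide Typeable instances automatically.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- A toy stand-in for the IODevice class.
class Device d where
  deviceName :: d -> String

newtype FileDev = FileDev Int     -- pretend file descriptor number
newtype MemDev  = MemDev String   -- pretend in-memory contents

instance Device FileDev where
  deviceName (FileDev fd) = "fd " ++ show fd

instance Device MemDev where
  deviceName (MemDev s) = "memory (" ++ show (length s) ++ " bytes)"

-- The wrapper hides the device type but keeps its class dictionaries.
data AnyHandle = forall dev . (Device dev, Typeable dev) => AnyHandle dev

-- Works for any device, through the packed-up Device dictionary.
handleName :: AnyHandle -> String
handleName (AnyHandle dev) = deviceName dev

-- Recover the concrete device, but only if it really is a FileDev.
-- This is what the Typeable dictionary makes possible.
recoverFd :: AnyHandle -> Maybe Int
recoverFd (AnyHandle dev) =
  case cast dev of
    Just (FileDev fd) -> Just fd
    Nothing           -> Nothing
```

Without the Typeable constraint, handleName would still work, but recoverFd could not be written: the existential would have forgotten which device type it holds.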
Where to go from here
• This is a step in the right direction, but there is still some obvious ugliness
– We haven’t changed the external API, only added to it
– There should be a binary I/O layer
• hPutBuf working on Handles is wrong: binary Handles should have a different type
• in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
– FilePath should be an abstract type.
• On Windows, FilePath = String, but on Unix, FilePath = [Word8].
– Should we rethink Handles entirely?
• OO-style layers: binary IO, buffering, encoding
• Separate read Handles from write Handles?
– read/write Handles are a pain
