Regular expression to free Alternative
, part 4
Before making RSeq
and RAlt
in the previous post more generic, let’s revisit our match functions.
match :: Regex a -> String -> Maybe a
match r s = listToMaybe $ mapMaybe f $ matchAlt r s
where
f (a, "") = Just a
f _ = Nothing
matchAlt :: RAlt a -> String -> [(a, String)]
matchAlt (RAlt seqs) s = concat [matchSeq seq s | seq <- seqs]
matchSeq :: RSeq a -> String -> [(a, String)]
matchSeq (REmpty a) s = [(a, s)]
matchSeq (RSeq char alt) s = do
(a1, s1) <- matchChar char s
(a2, s2) <- matchAlt alt s1
pure (a2 a1, s2)
matchChar :: RChar a -> String -> [(a, String)]
matchChar (RChar rc a) (c : cs)
| rc == c = [(a, cs)]
| otherwise = []
matchChar _ [] = []
As you can see, types of matchAlt
, matchSeq
and matchChar
are f a -> String -> [(a, String)]
for some type f
. They take a state (String
), and return a list of a pair of a value (a
) and a new state (String
). Yes, it’s a state monad on a list monad. It means that we can write them as f a -> StateT String [] a
. Let’s rewrite them.
matchChar
is the simplest. You can use get
and put
to work with a state.
matchChar :: RChar a -> StateT String [] a
matchChar (RChar rc a) = do
c : cs <- get
guard (rc == c)
put cs
pure a
matchSeq
has already used a list monad, so let’s make it handle state implicitly.
matchSeq :: RSeq a -> StateT String [] a
matchSeq (REmpty a) = pure a
matchSeq (RSeq char alt) = do
a1 <- matchChar char
a2 <- matchAlt alt
pure (a2 a1)
You can write it in Applicative-style, too.
matchSeq :: RSeq a -> StateT String [] a
matchSeq (REmpty a) = pure a
matchSeq (RSeq char alt) = (&) <$> matchChar char <*> matchAlt alt
The last one is matchAlt
. You’ll run each state in RAlt
and concatenate them.
matchAlt :: RAlt a -> StateT String [] a
matchAlt (RAlt seqs) = StateT $ \s -> concatMap (\seq -> runStateT (matchSeq seq) s) seqs
Or, you can take advantage of Alternative
instance of StateT
.
matchAlt :: RAlt a -> StateT String [] a
matchAlt (RAlt seqs) = foldr (\seq states -> matchSeq seq <|> states) empty seqs
match
runs a state monad with a given String
and get a result.
match :: Regex a -> String -> Maybe a
match r s = listToMaybe $ mapMaybe f $ runStateT (matchAlt r) s
where
f (a, "") = Just a
f _ = Nothing
Writing these functions using StateT
itself doesn’t mean anything, but this will take on the meaning later.
Now, let’s revisit RSeq
and RAlt
. Even though they work only with RChar
now, but they don’t depend on RChar
when we build them. I mean, pure
(rEmpty
), (<*>)
(rSeq
), empty
(rNever
), and (<|>)
(rAlt
) don’t use properties of RChar
. This means that we can add one more type parameter of kind Type -> Type
to RSeq
and RAlt
and use it instead of RChar
.
data RSeq f a where
REmpty :: a -> RSeq f a
RSeq :: f b -> RAlt f (b -> a) -> RSeq f a
deriving instance Functor (RSeq f)
newtype RAlt f a = RAlt [RSeq f a] deriving (Functor)
What changes should we need to make matchAlt
and matchSeq
work with any f
? These functions cannot call matchChar
directly now because matchChar
works with RChar
. Instead, we can pass matchChar
as a parameter to these functions.
matchAlt m (RAlt seqs) = foldr (\seq alt -> matchSeq m seq <|> alt) empty seqs
matchSeq _ (REmpty a) = pure a
matchSeq m (RSeq fa alt) = (&) <$> m fa <*> matchAlt m alt
match
will pass matchChar
to them.
match :: Regex a -> String -> Maybe a
match r s = listToMaybe $ mapMaybe f $ runStateT (matchAlt matchChar r) s
where
f (a, "") = Just a
f _ = Nothing
The question is, what type matchAlt
and matchSeq
will be. The concrete type of matchChar
was RChar a -> StateT String [] a
. Are these types generic enough?
matchAlt :: (f a -> StateT String [] a) -> RAlt f a -> StateT String [] a
matchSeq :: (f a -> StateT String [] a) -> RSeq f a -> StateT String[] a
Unfortunately, these types don’t work because we need m
to be polymorphic so that we can apply it to the first part of RSeq
whose type is b
, and the second part whose type is b -> a
. Let’s make it polymorphic.
matchAlt :: (forall x. f x -> StateT String [] x) -> RAlt f a -> StateT String [] a
matchSeq :: (forall x. f x -> StateT String [] x) -> RSeq f a -> StateT String[] a
But when you look at the implementations of matchAlt
and matchSeq
, you’ll find that they only use methods of Applicative
and Alternative
. They should work not only with StateT String []
, but with any Alternative
. Their types will be these types finally.
matchAlt :: (Alternative g) => (forall x. f x -> g x) -> RAlt f a -> g a
matchSeq :: (Alternative g) => (forall x. f x -> g x) -> RSeq f a -> g a
This means that matchAlt
can turn RAlt f a
to any g a
where g
is an instance of Alternative
.
For example, let’s define a function listChar
of type RChar a -> [a]
. This type matches (Alternative g) => f x -> g x
.
listChar :: RChar a -> [a]
listChar (RChar _ a) = [a]
With a regular expression regex
which expresses (a|b)c|d(e|f)g
,
regex :: Regex String
regex = (<>) <$> (rChar 'a' <|> rChar 'b') <*> rChar 'c' <|> (\a b c -> a <> b <> c) <$> rChar 'd' <*> (rChar 'e' <|> rChar 'f') <*> rChar 'g'
matchAlt listChar regex
returns ["ac","bc","deg","dfg"]
. This is a list which will match regex
. We can use regex
to match a regular expression as well as list strings that match it. Unfortunately, matchAlt listChar regex
doesn’t work when regex
contains many
because it produces an infinite number of strings.
We’ll put everything together in the next post.