"can NN be applied to X class of problems with no human-added domain specific knowledge"
Universal approximation theory suggests that for a significant class of problems, the answer is "obviously yes" [1]. The problem that remains is how effective the learning is.
I'm not convinced that applying them to areas like this one, where there are clearly much better approaches, teaches us anything useful for more interesting cases, but maybe it does.
[1] NB it's not immediately clear that OP's problem is in that class.
Maybe the useful thing here is that Sudoku is a problem space a lot of people understand pretty well even though it's not trivial. That makes it a nice domain for thinking about and understanding NNs. I'd guess that people who are familiar with solving Sudoku and similar puzzles but who don't know much about NNs will find this pretty interesting, while people who already understand NNs well (or aren't familiar with puzzles like Sudoku) won't.
I think your guess is reasonable. That's why I wasn't very prescriptive: a NN is not a very sensible approach if the goal is a great Sudoku solver, but the exercise might have other value I just can't immediately see.
Universal approximation just says there exists a solution, not that there's a reasonable way to find it with a bounded amount of training data. The networks used in proofs typically have contrived and impractical architectures.
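For concreteness, one standard form of the theorem (Cybenko/Hornik style, stated informally here: f continuous on a compact set K, sigma a fixed non-polynomial activation) is purely existential:

```latex
\forall \varepsilon > 0 \;\; \exists N,\, a_i,\, b_i,\, w_i : \quad
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} a_i \,\sigma(w_i^{\top} x + b_i) \,\Bigr| < \varepsilon
```

Nothing here bounds N or says how to find the weights, which is exactly the gap between "a network exists" and "training finds it".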
> The problem that remains is how effective is the learning.
The distinction I'm trying to draw is a tiny bit nuanced: since we know NNs are broadly applicable if we can figure out how to train them, the poster's "can NN be used here" is really "can we figure out how to train it". My question is, since Sudoku obviously has better, non-NN approaches, does spending time on that lead to anything generally useful, or would you be better off spending the same time working out how to train a NN on a more appropriate problem domain?
> I'm not convinced applying them to areas like this one where there are clearly much better approaches teaches us anything useful for more interesting cases, but maybe it does.
I don't think it teaches us much about sudoku, but any formerly 'hard' problem that we've solved using traditional means is perfect for rapidly generating arbitrary amounts of training data, which makes it very useful for learning about neural nets.
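As a sketch of that idea (the function names are my own, and naively blanking cells doesn't guarantee a unique-solution puzzle, which may or may not matter for training data): a randomized backtracking solver can mint as many (puzzle, solution) pairs as you like.

```python
import random

def valid(grid, r, c, v):
    """Check whether value v can be placed at (r, c) without conflicts."""
    if v in grid[r]:
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(grid):
    """Fill empty (0) cells by randomized backtracking; True on success."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                vals = list(range(1, 10))
                random.shuffle(vals)  # randomize so each run yields a different grid
                for v in vals:
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True  # no empty cells left

def make_example(n_blanks=40):
    """Generate one (puzzle, solution) training pair by blanking cells."""
    solution = [[0] * 9 for _ in range(9)]
    solve(solution)  # an empty grid is always completable
    puzzle = [row[:] for row in solution]
    cells = [(r, c) for r in range(9) for c in range(9)]
    for r, c in random.sample(cells, n_blanks):
        puzzle[r][c] = 0
    return puzzle, solution
```

Each call is cheap, so generating millions of labelled examples is trivial, which is the property that makes already-solved problems good NN playgrounds.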