Machine Learning with Scala


Good Example: REST API & Velox PPT, Paper, GitHub


Set up the project with SBT and dependencies

The reference example project can be found on my GitHub

  • think-bayes [github], [Repository]
    • Provides the probability density function (PDF) and cumulative distribution function (CDF) of the standard normal distribution
      - libraryDependencies += "net.ruippeixotog" % "think-bayes_2.11" % "0.1"
  • Other libraryDependencies
    spark-core_2.10, spark-mllib_2.10, jblas
  • Using IntelliJ IDEA to package Scala with Spark
    • Chosen because the Scala IDE for Eclipse is uncomfortable to use
    • Supports code inspection and CVS/Ant/Maven/Git integration
    • Requires the JDK and Scala
  • Censored regression model, made up of 5 Scala source files plus build.sbt
    • Main class -> TestTrainer.scala
    • optimization.scala
    • gradient.scala
    • linalg.scala
    • utils.scala
    • build.sbt
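
A minimal sketch of what the build.sbt for the files listed above might look like. The project name, project version, and every dependency version here are assumptions for illustration, not taken from the actual repository. think-bayes is left out because the published artifact is built for Scala 2.11 while the Spark artifacts are the _2.10 builds.

```scala
// build.sbt (sketch only; name and all versions below are assumptions)
name := "censored-regression"   // hypothetical project name

version := "0.1.0"              // hypothetical project version

scalaVersion := "2.10.4"        // Scala 2.10.x, to match the spark-*_2.10 artifacts

libraryDependencies ++= Seq(
  // "provided": the Spark installation supplies these jars at run time
  "org.apache.spark" % "spark-core_2.10"  % "1.5.2" % "provided", // version is an assumption
  "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" % "provided", // version is an assumption
  "org.jblas"        % "jblas"            % "1.2.4"               // version is an assumption
)
```

With a file like this in the project root, sbt package should produce a jar under target/scala-2.10/.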

Compile

  • Reference: spark-shell --packages com.databricks:spark-csv_2.11:1.2.0
  • spark-shell --packages net.ruippeixotog:think-bayes_2.11:0.1
  • Use think-bayes to check the Pdf() and Cdf() functions
    • Package think-bayes with SBT against Scala 2.10.4
    • ./bin/spark-shell --driver-class-path /home/azureuser/think-bayes-scala/target/scala-2.10/think-bayes_2.10-1.0-SNAPSHOT.jar
    • Then import thinkbayes._

sbt assembly methods

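// Resolve duplicate files when building the fat jar
mergeStrategy in assembly := {
  // Drop jar manifests and signature files
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "log4j.properties" => MergeStrategy.discard
  // Merge service registrations, keeping each distinct line once
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  // Concatenate Typesafe Config defaults
  case "reference.conf" => MergeStrategy.concat
  // Otherwise keep the first copy found
  case _ => MergeStrategy.first
}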
  • Dependencies
libraryDependencies ++= Seq(
  "com.github.wookietreiber" %% "scala-chart" % "0.5.0",
  "nz.ac.waikato.cms.weka" % "weka-stable" % "3.6.13",
  "org.apache.commons" % "commons-math3" % "3.5",
  "org.specs2" %% "specs2-core" % "3.6.5" % "test"
)
  • Push source to Github with SourceTree

    • new clone
    • repository
    • commit
    • push
  • Studying the PDF and CDF functions used in the censored regression model

    • Defined the functions by hand from the normal distribution
      • $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
      • then accumulate (integrate) f(x) to obtain the CDF
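
The hand-written PDF and CDF described above can be sketched roughly as follows. This is not the code from the repository: the object name NormalDist, the parameter names, the use of Simpson's rule, and the 10-sigma integration cutoff are all assumptions made for illustration.

```scala
// Sketch only: NormalDist, its parameters, and the integration scheme are illustrative assumptions.
object NormalDist {

  /** Density of the normal distribution N(mu, sigma^2) evaluated at x. */
  def pdf(x: Double, mu: Double = 0.0, sigma: Double = 1.0): Double = {
    require(sigma > 0, "sigma must be positive")
    val z = (x - mu) / sigma
    math.exp(-0.5 * z * z) / (math.sqrt(2 * math.Pi) * sigma)
  }

  /**
   * CDF obtained by accumulating the pdf numerically with composite
   * Simpson's rule, starting 10 standard deviations below the mean
   * (the probability mass below that point is negligible).
   */
  def cdf(x: Double, mu: Double = 0.0, sigma: Double = 1.0, steps: Int = 2000): Double = {
    require(steps > 0 && steps % 2 == 0, "steps must be a positive even number")
    val lower = mu - 10 * sigma
    if (x <= lower) 0.0
    else {
      // Cap the upper limit; beyond mu + 10 sigma the CDF is 1 to double precision
      val upper = math.min(x, mu + 10 * sigma)
      val h = (upper - lower) / steps
      // Interior points get weight 4 (odd index) or 2 (even index)
      val interior = (1 until steps).map { i =>
        val weight = if (i % 2 == 1) 4.0 else 2.0
        weight * pdf(lower + i * h, mu, sigma)
      }.sum
      val integral = h / 3 * (pdf(lower, mu, sigma) + interior + pdf(upper, mu, sigma))
      math.min(1.0, integral)
    }
  }
}
```

For example, NormalDist.cdf(0.0) should come out close to 0.5 and NormalDist.cdf(1.96) close to 0.975, which gives a quick sanity check against the Pdf() and Cdf() results from think-bayes.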