Old Notes circa 2020 - Haskell, Stats, Data Science, Bitcoin, etc

Push a local git repo to a server

## Create a bare repo on the server
git init --bare
## Add the remote origin on the local machine
git remote add origin shu@linode:git-repos/docs
## Push to the server; -u sets the upstream branch
git push -u origin master

git-annex workflow for me

  • Still set up a bare git repo in the cloud:

    git init --bare
    git annex init "cloud"
    
  • Clone the cloud repo to your laptop/desktop

    git clone shu@linode:annex-repo annex-repo-laptop
    git annex add .
    git commit -a -m "checking in on the laptop"
    git annex sync --content
    

IT tips

  • Curl: curl -L your_url > your.file : use -L to follow redirects

  • tar: tar -xjvf xxx.tar.bz2: use -j to uncompress bz2 files; -z for gz files.

  • Kill a bunch of processes: ps -ef | grep '[R]' | awk '{print $2}' | xargs kill -9 (the bracket trick keeps grep from matching its own process)

Vim tips

  • zR to open all folds; zM to close all folds; za to toggle.

Hakyll Notes

  • Provider is the list of files under the source dir, plus an in-memory store
  • ID is the cleaned-up version of a file path
  • Rule is an RWST monad: its environment has the provider/IDs/pattern, and its output is a rule set containing routes and compilers
  • There are two entries in the store for each item: one for the body and one for the metadata
  • Metadata itself has two parts: the YAML metadata, if any, and the modified time.

Postgres commands

    \l: list databases
    \c <db name>: connect to a database
    \dt: show tables
    \d <table name>: show the schema of a table
    copy (select * from projects_project where id = 640) to '/tmp/640.csv' with csv;

Process a file line by line

while read -r i ; do git clone "$i" ; done < selected.github

Existential types

(exists x. p x x) -> c  ≅  forall x. (p x x -> c)

It tells us that a function that takes an existential type is equivalent to a polymorphic function. This makes perfect sense, because such a function must be prepared to handle any one of the types that may be encoded in the existential type. It’s the same principle that tells us that a function that accepts a sum type must be implemented as a case statement, with a tuple of handlers, one for every type present in the sum. Here, the sum type is replaced by a coend, and a family of handlers becomes an end, or a polymorphic function.
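
A minimal Haskell sketch of this equivalence (the wrapper Ex and the names fromPoly/toPoly are my own, for illustration):

    {-# LANGUAGE ExistentialQuantification, RankNTypes #-}

    -- Ex p packs up (exists x. p x x).
    data Ex p = forall x. Ex (p x x)

    -- Consuming the existential is exactly providing a polymorphic handler.
    fromPoly :: (forall x. p x x -> c) -> (Ex p -> c)
    fromPoly handler (Ex pxx) = handler pxx

    toPoly :: (Ex p -> c) -> (forall x. p x x -> c)
    toPoly consume pxx = consume (Ex pxx)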

Free Monoid

  1. Constructed from the generator set; the generated tuples are identified only by the unit law and the associativity law. For example, (0, 3) is the same as (3, 0), and (1, (2, 3)) is the same as ((1, 2), 3).
  2. The intuition is that we don't collapse them further using the binary operator, as that would identify (0, 3) with (1, 2). We leave this step open, so that different operators generate different monoids.
  3. In essence, the free monoid is a list.
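
A sketch in Haskell: the free monoid on a is the list [a], and foldMap is the "leave it open" step, interpreting the list in any concrete monoid (collapse and the choice of Sum are mine, for illustration):

    import Data.Monoid (Sum (..))

    -- [a] is the free monoid: mempty = [] is the unit, (++) is the operation.
    -- foldMap extends a map from generators into any monoid m to a monoid
    -- homomorphism [a] -> m; collapsing only happens here.
    collapse :: [Int] -> Int
    collapse = getSum . foldMap Sum

    -- collapse [0,3] == 3 == collapse [1,2]: the operator (+) identifies them.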

Distributions

  1. Gamma is the sum of exponentials; and Beta is a gamma over the sum of two gammas: X/(X+Y).
  2. Geometric is the special case of negative binomial where the number of successes is one.
  3. Hypergeometric is almost like the binomial, but without replacement.
  4. Pareto is the exponential-gamma mixture -- the prior on the rate parameter follows a gamma distribution.
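
Spelling out item 1 (X and Y independent, with a common scale):

$$X \sim \mathrm{Gamma}(\alpha, \theta),\; Y \sim \mathrm{Gamma}(\beta, \theta) \implies X + Y \sim \mathrm{Gamma}(\alpha + \beta, \theta), \quad \frac{X}{X+Y} \sim \mathrm{Beta}(\alpha, \beta)$$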

End and Natural Transformation

  1. Yoneda lemma in Haskell as an End:

forall x. (a -> x) -> f x == f a

Think of the (endo)functor f as a container: it says that if, given any function from a to x, I can always give you a container of x, then this transformation is equivalent to having a container of a.

  2. Yoneda lemma in Haskell as a Coend:

exists x. ((x -> a), f x) == f a

Think of the (endo)functor f as a container: it says that if you give me a pair of a function from x to a and a container of x, then I can simply fmap the function and give you a container of a.
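
Both directions can be written down directly; a sketch of the standard construction (as found in, e.g., the kan-extensions package):

    {-# LANGUAGE RankNTypes, ExistentialQuantification #-}

    -- Yoneda as an end: holding (forall x. (a -> x) -> f x) is holding an f a.
    newtype Yoneda f a = Yoneda (forall x. (a -> x) -> f x)

    toYoneda :: Functor f => f a -> Yoneda f a
    toYoneda fa = Yoneda (\g -> fmap g fa)

    fromYoneda :: Yoneda f a -> f a
    fromYoneda (Yoneda k) = k id

    -- Co-Yoneda as a coend: a pair of (x -> a) and f x, for some hidden x.
    data CoYoneda f a = forall x. CoYoneda (x -> a) (f x)

    fromCoYoneda :: Functor f => CoYoneda f a -> f a
    fromCoYoneda (CoYoneda g fx) = fmap g fx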

Javascript notes

  1. JS's object model is prototype based, not class based. There is no real notion of class in JS (the ES2015 class syntax is sugar over prototypes). In essence an object is simply a list of key-value pairs.
  2. So inheritance works via prototypes: if a property is not found on the current object, the lookup continues in the prototype (parent) object, all the way up to null.
  3. To implement this design, each object has a __proto__ reference to its parent object.
  4. In JS, functions are first-class citizens, and in this sense it's sort of a functional language.
  5. A function is always an object, and its prototype property points to an object whose constructor points right back to the function.
  6. You can of course add properties to a function's prototype, and they will be shared by all the objects constructed with that function.
  7. On the other hand, if you need object-specific properties, set them in the constructor function body.

Modern C++

  • Use auto and decltype
  • Use the uniform initializer with { }
  • Be specific and uniform with the default special member functions (6 of them): default, delete, or custom.
  • Specify the owner of resources: owners are responsible for the life-cycle management of the underlying resources, non-owners are just users of resources.
  • Use the handle model instead of raw pointers
  • Only use raw pointers for dynamic dispatching
  • Only use pointers to point to ONE object, not a sequence of objects; in other words, stay away from pointer arithmetic
  • Whenever you use a pointer, carry the size info with it: span, view.
  • Use zstring for C-style zero-terminated strings
  • Use in-class member initialization first, then the initialization list (in the right order), and lastly assignment within the constructor.
  • DRY in constructors: call/delegate to simpler constructors in more complex ones
  • Use range-based for loops whenever possible: much better than the old-style for loop

Effective Modern C++

  • auto uses almost the same type-inference rules as templates
  • 3 cases of inference rules: the parameter is a reference or pointer; a universal reference; or a plain type
    • reference or pointer: remove the referenceness of the expression, then match with the parameter
    • universal reference: if the expr is an lvalue, then the type is an lvalue reference, otherwise use rule #1
    • plain type: drop constness, referenceness, volatileness...
  • auto is almost always what you want, except for proxy types, where you need a static cast.
  • Prefer alias declarations (using your-short-type = long-or-dependent-type) to typedef.
  • Use the { } initializer with care: the compiler has a strong preference for std::initializer_list-based constructors. So if this is a concern, call constructors the old-style ( ) way. On the other hand, uniform initialization is very handy for initializing containers.
  • Use auto for function pointers
  • Use nullptr instead of 0 or NULL
  • Use delete if you want to disable some functions
  • Use override in derived classes for virtual functions

More Effective Modern C++

  • The copy constructor is used to initialize a new object; the assignment operator overwrites the lhs object with the rhs one. Same for their move-semantics counterparts.
  • Explicitly use default and delete for all 5/6 members that could be auto-generated by the compiler.
  • Use unique_ptr for most owning pointers; use std::make_unique in C++14
  • Use weak_ptr to detect dangling pointers, implement caches, etc.
  • std::move unconditionally casts the parameter to an rvalue
  • std::forward<T> conditionally casts: only when the parameter is an rvalue.
  • && could be an rvalue ref or a universal one
  • The universal one is used with type deduction, in auto or template functions

Even More Effective Modern C++

  • Use the high-level async/future/promise abstractions for concurrent programming
  • Only use low-level std::thread if you need finer control like priority setting
  • Don't use raw mutexes; use an RAII guard (std::lock_guard) instead
  • Understand the 3 levels of threads: hardware, software (OS), and language-level (std::thread)
  • Understand joinable vs unjoinable threads

keyword static

  • static is confusing since it means different things in different contexts.
  • Static variable:
    • Static global variable: static means no access outside the file where it's defined; in this sense it is the opposite of extern. Constant global variables are static by default (use an explicit extern to override), and non-constant global variables are extern by default (use static to override).
    • Static local variable: static changes the duration of the variable - only one copy during the whole life cycle of the program. That's the usual meaning of being static.
  • Static function: this has a similar meaning as for global variables: static means internal linkage only; it can't be accessed outside the defining file.
  • Static for class members and class member functions: one copy shared across instances, and no this pointer passed in.

Haskell Exceptions

  • There are two types of exceptions: sync and async.
  • Sync exceptions are generated by throw (from pure code) or throwIO. There are two strategies for handling them:
    • clean up and rethrow: onException, finally, bracket
    • try to recover: catch, handle, try.
  • Async exceptions are generated by throwTo - it's a message from the outside world to you, and you have to rethrow it. If you try to recover from it, you are breaking this invariant.
  • You can tell whether it's sync or async from the type info, but it's not guaranteed.
  • The module Control.Exception.Safe (from the safe-exceptions package) provides wrappers around the functions in Control.Exception, so use the safe ones.
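
A minimal sketch of the two sync strategies (firstLine and the file path are my own example):

    {-# LANGUAGE ScopedTypeVariables #-}

    import Control.Exception (IOException, bracket, catch)
    import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)

    -- bracket guarantees the cleanup (hClose) runs on success or exception;
    -- catch then recovers from the synchronous IOException.
    firstLine :: FilePath -> IO String
    firstLine path =
      bracket (openFile path ReadMode) hClose hGetLine
        `catch` \(e :: IOException) -> return ("could not read: " ++ show e)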

Functor, Applicative, Monad, Monoid, Traversable

  • The essence of all these abstractions is dealing with effects: instead of having a plain function a -> b, we are dealing with an effectful function a -> F b where F captures the effect -- having a writer monad in mind helps.
  • Kleisli arrows a -> M b form a category (of arrows), with Kleisli composition (<=<) playing the role of (.), from which the bind operator >>= is derived
  • You can reduce the bind operator to the join operator M (M a) -> M a, and think of join as the mappend of a monoid (of endofunctors); that gives you the saying that a monad is just a monoid of endofunctors.
  • If we cannot get a full monad, in other words we cannot define a bind operator for the arrow a -> F b, one question we can still ask is: can we get some interesting functor for these arrows? How about a functor that can lift the arrow into T a -> F (T b)? It turns out this special functor is called Traversable, where the method traverse is like fmap for a normal functor.
  • Note there are two functors involved in traverse: an applicative functor F and a traversable functor T. Therefore there are two ways of composing them, F . T or T . F, and the natural transformation between them is sequenceA (illustrated below).
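
A quick illustration of sequenceA as that natural transformation T (F a) -> F (T a), with T = [] and F = Maybe:

    sequenceA [Just 1, Just 2]   -- Just [1,2]
    sequenceA [Just 1, Nothing]  -- Nothing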

Two examples with traversable

  • The most commonly used traversable is a list: the behavior of list traversal is defined as an applicative-style fold of (:), i.e., traverse f = foldr cons_f (pure []) where cons_f x ys = liftA2 (:) (f x) ys

  • The most straightforward example is for side effects:

    traverse (putStr . show) [1..10]
    
  • The applicative behavior of a list is the full combination: liftA2 f xs ys = [f x y | x <- xs, y <- ys]

    rep x = replicate x x
    traverse rep [1..3]
    -- [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
    
  • The applicative behavior of Maybe is aborting the moment one input is Nothing:

    half x = if even x then Just (x `div` 2) else Nothing
    traverse half [2,4..10]
    -- Just [1,2,3,4,5]
    traverse half [1..10]
    -- Nothing
    

Arrows

  • Arrow is different from monad: a monad is used to compose effectful functions, while an arrow is used to compose steps.
  • If you look at the signature of an effectful function a -> M b, a monad is only parameterized by the type of the output, thus has no control over the input.
  • On the other hand, an arrow takes two type parameters: arr (a -> b), so in this sense arrow is more general.
  • While the essential behavior of a monad is its bind operator, arrow needs two:
    • >>>, the usual composition operator
    • One of ***, &&&, first, or second. These are needed to forward output from previous steps (see the sketch below).
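
A minimal sketch with the plain function arrow (->); mean is my own example name:

    import Control.Arrow ((&&&), (>>>))

    -- (&&&) fans the input out to two steps; (>>>) chains steps left to right.
    mean :: [Double] -> Double
    mean = (sum &&& length) >>> \(s, n) -> s / fromIntegral n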

Examples of math in markdown

Hi $z = x + y$.

$$a^2 + b^2 = c^2$$

$$ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc $$

GPG usage

  1. gpg --sign: compresses the doc first, then signs it; the output contains both the compressed doc and the signature.
  2. gpg --detach-sign: generates only the signature.
  3. gpg --clearsign: puts the clear text and the signature in one file

Bitcoin Wallet Standards

  1. BIP 39 defines how an HD wallet's "root seed" is generated: from 24 mnemonic words + an optional passphrase.
  2. A cool idea, like the hidden volume of TrueCrypt (VeraCrypt): an intentionally designed password leads to the private key of a small decoy wallet, while the main wallet is controlled by another password; both use the same set of words.
  3. Three pieces of info are used to generate a child key: the parent private key (256 bits), the parent chain code (256 bits), and an index (32 bits); see the sketch below.
  4. Here is the cool part: an extended private key is the private key + chain code, and an extended public key is the public key + chain code, and all the public addresses can be generated from the extended public key without knowing the private key.
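
A rough sketch of that child derivation (BIP 32 hardened/normal serialization details elided), assuming the cryptonite package; ckd is my own name:

    import Crypto.Hash.Algorithms (SHA512)
    import Crypto.MAC.HMAC (HMAC, hmac)
    import Data.ByteArray (convert)
    import Data.ByteString (ByteString)
    import qualified Data.ByteString as BS

    -- I = HMAC-SHA512(chain code, serialized parent key || 4-byte index);
    -- the left half tweaks the parent key (added mod the curve order),
    -- the right half becomes the child chain code.
    ckd :: ByteString -> ByteString -> ByteString -> (ByteString, ByteString)
    ckd chainCode parentKey index =
      BS.splitAt 32 (convert (hmac chainCode (parentKey <> index) :: HMAC SHA512))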

Bitcoin Address

  1. A private key k is just a 256-bit random number.
  2. The public key K is derived from the private k: K = k * G, where G is one fixed point on the elliptic curve, which is defined by $y^2 = x^3 + 7 \pmod{p}$, with p a large prime number.
  3. Elliptic curve addition: P3 = P1 + P2; the line through P1 and P2 intersects the curve at a third point P3', and P3 is its reflection (x, -y).
  4. Elliptic curve multiplication: the point -2G is the intersection of the tangent at point G with the curve, and 2G is the negation of -2G, so multiplication works like this: G -> -2G -> 2G -> -4G -> 4G...
  5. Therefore the public key K is a coordinate on the curve: (x, y).
  6. An uncompressed K is just 04 (prefix) x y.
  7. Since the x coordinate is enough to recover the full public key, i.e., y = sqrt(x^3 + 7) mod p up to a choice of root, we only need the x coordinate plus the parity of y, so a compressed K is either prefix 02 x (y even) or prefix 03 x (y odd).
  8. A "compressed" private key is a misnomer; it's actually longer than an uncompressed private key: it has a 01 suffix added. This tells the wallet software that uncompressed private keys are intended to derive uncompressed public keys, while compressed private keys are intended to derive compressed public keys.
  9. A bitcoin address is a double hash of a public key: address == RIPEMD160(SHA256(K)); see the sketch after this list.
  10. Addresses and keys are shipped in Base58Check encoding: Base58 takes a few confusing ASCII characters (0, O, l, I) out of the set, and a version prefix and a trailing checksum are added. WIF (wallet import format) is this encoding applied to private keys.
  11. An uncompressed address == base58enc(00 address checksum), which starts with 1 for P2PKH (3 for P2SH).
  12. A WIF uncompressed private key == base58enc(0x80 private), which starts with 5.
  13. A compressed address == base58enc(00 compressed-address checksum), which still starts with 1.
  14. A WIF compressed private key == base58enc(0x80 private 01), which changes the starting char from 5 to K or L.
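
The double hash from item 9, again assuming the cryptonite package; hash160 is my own name:

    import Crypto.Hash (Digest, RIPEMD160, SHA256, hash)
    import Data.ByteString (ByteString)

    -- address payload = RIPEMD160(SHA256(serialized public key))
    hash160 :: ByteString -> Digest RIPEMD160
    hash160 pubkey = hash (hash pubkey :: Digest SHA256)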

RL Notes

  1. The framework of RL is an MDP: state, action, and reward.
  2. A policy is a set of rules for picking actions in states; it is distinct from the model, i.e., the transition matrix and associated rewards.
  3. The main task in RL is to learn the value functions V(s) and Q(s, a).
  4. The Bellman equation is the key idea here: for a known model, DP is just the iterative implementation of the Bellman equation (spelled out below).
  5. Two problems in RL: prediction and control.
  6. Prediction is to calculate values for a given policy: it differs from algorithm to algorithm.
  7. Control is to search for the optimal policy: it's pretty much the same across different learning algorithms.
  8. For training from episodes, MC can be used to calculate the sample average of values, just by walking through all the steps of each episode.
  9. TD, or TD(0), one-step prediction, is the key idea in RL: it combines DP (bootstrapping) and MC (learning from samples).
  10. For the control problem: there is on-policy vs off-policy.
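
The Bellman expectation equation from item 4, and the TD(0) update from item 9:

$$V^\pi(s) = \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V^\pi(s')\bigr]$$

$$V(s) \leftarrow V(s) + \alpha\,\bigl[r + \gamma V(s') - V(s)\bigr]$$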

Miscs

There is no such thing as a waste of time if:

  • you are not daydreaming, i.e., lost in discursive thoughts about the past or future
  • you are not feeling bored: feeling bored means you are NOT paying attention
  • you are not consuming senseless entertainment
  • then everything you do is just cross training, more or less. Take your time and engage in everything you do (wholesome, of course), and cultivate whatever skills are needed to do a beautiful job at hand.

What’s the practice all about? It’s about two things:

  • Make complex things simple, by investigation and deconstruction
  • Make simple things easy, by being mindful and still

Simple vs easy:

  • simple: untangled, not intertwined, orthogonal, composable
  • easy: handy, familiar, relative to the people in question

Foundations of healthy living:

  • Respect your circadian rhythm
  • No fuss over food: be primal, and an occasional slipup is fine
  • Taste the true hunger
  • No fuss over training: some strength training, and have fun
  • Move daily, slowly, walk a lot
  • Slow down time, no rush, live in the moment
  • Boredom and fear are evils

The culture of hacking:

  • solve new problems once only, with competency, and enjoy it, share it, and do everything on your terms. – Eric Raymond

Ajahn Brahm's meditation insights:

  • It's all about working on the five hindrances to get to a high level of concentration
  • Sensual desires: let go of the five senses
  • Ill will: compassion for everyone; be kind to yourself, forgive all the past unskillful things you've done (even a serial killer can be enlightened, remember this!); and most importantly be kind to your meditation object; think of your breath as your baby.
  • Restlessness: you want to be here and now, not somewhere else; contentment, contentment.
  • Doubt: speak from your own experience -- you cannot afford not to meditate, the benefits are so obvious, why doubt?

Three most important questions:

  • When is the best time? Now
  • Who is the most important person? The one you are with now
  • What’s the most important thing to do? Care (pay 100% attention, observe, not lost in your own thoughts)

Data Science Tips

  • Technical:
    • Check for distribution
    • check for outliers
    • slice and dice the data
    • give the intervals, not just the point estimate; don’t be fooled by randomness
    • does it make a difference, practically?
    • is it consistent across slices and time periods?
  • Process:
    • sanity check; description; then evaluation
    • understand the data collection process if possible
    • check the common/major metrics, not necessarily the target, but common-sense ones
    • hypothesis first, and either validate or reject it
    • run the standard tests; for example linear model as baseline
    • define multiple ways of measuring the impact
    • check the reproducibility with past estimations
  • communication:
    • question first, technical stuff second
    • educate the audience
    • clearly defining the ratios: numerator and denominator
    • document exactly how you filter the data to get final version fed into models
    • open minded: listen as if you were wrong, and speak as if you are right
    • no ego here: admit everybody screws up, and that's the nature of data analysis

Bengio's priors

The universal priors for representation learning (Bengio's 10 priors in my words):

  1. Sparsity of data
  2. Smoothness
  3. Lower-dimensional manifold
  4. Orthogonal nature of different factors, more or less
  5. Hierarchical nature
  6. Simple relationships at a high level of abstraction: linear, additive, etc.
  7. Semi-supervised learning: P(X) can help understand P(Y|X)
  8. Natural clustering for different categories: P(X|Y = Ci) is usually quite different from P(X|Y = Cj)
  9. Spatial and temporal patterns
  10. Shared patterns across tasks

Tobit, censored, hurdle

  1. Censored vs. truncated model: for a censored model we still observe the Xs, but the Ys are either zeros or capped for those outside the range; for a truncated model we observe neither Xs nor Ys outside the range.
  2. The Tobit model is a censored model with the error term assumed to be normal. With that assumption we can use the inverse Mills ratio to correct the bias (see below).
  3. Variants of the truncated model:
    • Tobit, as indicated above.
    • Two-equation selection model: this is Heckman's two-step selection model -- there is a separate mechanism driving the selection.
    • Hurdle model: similar to the Tobit model, but there is a separate mechanism for truncation.
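
The inverse Mills ratio from item 2, and the resulting Tobit conditional mean:

$$\lambda(z) = \frac{\phi(z)}{\Phi(z)}, \qquad E[y \mid y > 0, x] = x\beta + \sigma\,\lambda\!\left(\frac{x\beta}{\sigma}\right)$$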

DL notes

  1. CNN basically uses a sliding window (2d for images, usually 1d for NLP) to create learned filters that convert a k-dim window to a d-dim vector representation. Each dim of the d-dim vector is called a channel.
  2. RNN is more flexible than HMM in that it does NOT require the Markov assumption: it conditions on the whole history.
  3. LSTM splits the state into two parts: the long-term memory cell C, and the short-term state H. It uses three gates to evolve the state: f (forget), i (input), o (output); see the equations below.
  4. Cross entropy is just the NLL.
  5. The first layer is usually the embedding layer: converting a sparse one-hot coding into a low-dim embedding vector space.
  6. In DL, we only use core features, and let the network figure out the interactions.
  7. Theano & TensorFlow construct the computation graph, so the feed-forward computation and backward updating are performed based on the graph.
  8. biRNN traverses the sequence from both ends.
  9. An encoder-decoder RNN uses the encoded vector to feed the decoder: it's sort of like the idea of forward filtering and backward smoothing.
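
The LSTM update from item 3, written out (each gate has its own weight matrix; biases elided):

$$f_t, i_t, o_t = \sigma\bigl(W_{\{f,i,o\}}\,[X_t, H_{t-1}]\bigr), \qquad \tilde{C}_t = \tanh\bigl(W_C\,[X_t, H_{t-1}]\bigr)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad H_t = o_t \odot \tanh(C_t)$$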

Tennis mental game

  1. Split, not jump; TOH == ball on the string.
  2. Orienting the swing toward the ball, not the target. Plenty of space, swing out, 45-degree angle.
  3. The moment the shot is finished, recover to the middle, a few feet behind the baseline, slight adjustment for wide angle shots.
  4. Keep the racket in front of you until the last moment, then drop slowly and hit.
  5. Relax == focus == confidence
  6. Three option mental game:
    1. Scramble: touch or meet the ball in front of you for fast balls coming at you, slice or squash shots when pulled wide, or just lob as the last resort.
    2. Defense / rally: aim high in the middle.
    3. Offense: for short or easy balls, either go cross or down, make the decision as early as possible

Semi-supervised learning notes

  1. Well-separated clusters? Try EM with generative mixtures.
  2. Two independent sets of features? Try co-training.
  3. Does it make sense to separate in the low-density regions? Try transductive SVM.
  4. Does it make sense to assume similar features tend to have similar labels? Try graph-based methods.
  5. None of the above makes sense? Use a self-training wrapper.

Tips for tennis (2017)

  1. Preparation: two-stage preparation — after landing from the split step, the first stage is to turn your shoulder and shift weight onto the outside foot before the ball bounces; the second stage is to bend the knees when the ball bounces, and step in if you have time.
  2. Contact point: 45-degree angle to go after the ball; make sure you have enough space to hit the ball.
  3. Rhythm: slow and then fast; don’t rush the dropping of racquet; watch the full swing all the way while keeping your head still.
  4. Lifting the ball: bend the knees and push off; this also helps a lot with your timing.
  5. Follow through: hit through the ball, finish up with high elbows.
  6. Mental: relax and enjoy the game, never beat up yourself; when you miss a bad shot, have a good laugh.
  7. Mental: always go after the ball — play the ball rather than being played by the ball.
  8. Mental: never rush — you have plenty of time for most shots.
  9. Mental: no fear — fast balls are easy to handle actually, just don’t panic, pause, and use the pace. Use body weight to guide the ball, minimum back swing.
  10. Mental: focus on watching the ball all the way from the moment your opponent hits the ball until the ball leaves your string; don’t get lost enjoying your own shot, watch your opponent and get ready and time your split step instead.

Notes on Mixture models

  1. A mixture of unigrams is just simple clustering, or FMM; each word is probabilistically assigned to the topics.
  2. LDA/NMF/pLSI move one step beyond, to what's called a mixed membership model, or what I'd call co-clustering.
  3. In mixed membership models, both words and documents (columns and rows) are clustered to the latent topics.
  4. LDA is almost like NMF with KL as the underlying metric being optimized; as a matter of fact KL-NMF is just a Gamma-Poisson model, which is an unnormalized version of LDA, which is a Dirichlet-Multinomial model (objective spelled out below).
  5. LDA is just pLSI with a Dirichlet hyperprior on the topic proportions.
  6. If we also put a Dirichlet hyperprior on the topics, i.e., a prior on the chances that a word in the vocabulary belongs to a topic, then we have "smoothed LDA".
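
The generalized KL objective that KL-NMF minimizes (item 4), over nonnegative factors W and H:

$$\min_{W, H \ge 0} \; \sum_{ij} \left[ X_{ij} \log \frac{X_{ij}}{(WH)_{ij}} - X_{ij} + (WH)_{ij} \right]$$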

Notes on TS models

  1. A good value of λ is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler. (rationale behind this?)
  2. stl can't deal with fractional frequency. The basic idea is still to use a moving average as the trend — only the trend is smoothed over a window. Plus stl can only deal with additive decomposition.
  3. ets cannot handle high frequency.
  4. tbats (trigonometric seasonality, Box-Cox transform, ARMA errors, trend, seasonal components): it's an innovations SSM — single source of error.
  5. Residual checking: zero mean, no serial correlation, normality (for producing intervals)
  6. Accuracy measures: RMSE, MAPE, scaled error (MASE, spelled out below)
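
MASE, the scaled error from item 6, scales by the in-sample MAE of the naive forecast (the seasonal variant uses the seasonal naive in the denominator):

$$\text{MASE} = \frac{\operatorname{mean}(|e_t|)}{\frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|}$$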

Parenting

https://www.youtube.com/watch?v=NAu_KdiapFY

  1. 0 - 3; positive
  2. 3 - 13; still and study
  3. 14 - 30; ambition
  4. 30 - ; virtue

Check TCP port status

netstat -nap