Tuesday, April 2, 2019

My Journey to my ML Learning

Built-in Data Structures, Functions, and File Handling in Python

Data Structures and Sequences

  • Tuple {tup = (1,2,3)}
    • UNPACKING TUPLES {tup = (4, 5, 6); a, b, c = tup}
    • TUPLE METHODS
      • count { a = (1, 2, 2, 2, 3, 4, 2); a.count(2)}
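As a quick illustration of the tuple notes above, here is a minimal sketch of unpacking (including a nested tuple and a swap) and of the count method. The variable names are only for illustration.

```python
# Unpack a flat tuple into three names.
tup = (4, 5, 6)
a, b, c = tup

# Nested tuples can be unpacked in one step as well.
nested = (4, 5, (6, 7))
x, y, (z, w) = nested

# Unpacking also gives a swap without a temporary variable.
a, b = b, a

# count returns how many times a value occurs in the tuple.
values = (1, 2, 2, 2, 3, 4, 2)
twos = values.count(2)
print(a, b, c, twos)
```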
  • List {mylist = [1,2,3,4]}
    • ADDING AND REMOVING ELEMENTS { b_list.append('dwarf'); b_list.insert(1, 'red')}
      • insert is computationally expensive compared with append, because every element after the insertion point has to be shifted to make room
    • The inverse operation to insert is pop { b_list.pop(2)}
    • Elements can be removed by value with remove { b_list.remove('foo')}
    • Check if a list contains a value using the in keyword: { 'dwarf' in b_list}
    • CONCATENATING AND COMBINING LISTS { [4, None, 'foo'] + [7, 8, (2, 3)]}
    • You can append multiple elements to an existing list using the extend method: { x = [4, None, 'foo']; x.extend([7, 8, (2, 3)])}
    • Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable to concatenation with +, because concatenation creates a new list and copies the objects over
    • SORTING {a = [7, 2, 5, 1, 3];a.sort()}
    • Sort has a few options that will occasionally come in handy. One is the ability to pass a secondary sort key—that is, a function that produces a value to use to sort the objects { b = ['saw', 'small', 'He', 'foxes', 'six']; b.sort(key=len)}
    • BINARY SEARCH AND MAINTAINING A SORTED LIST { import bisect; c = [1, 2, 2, 2, 3, 4, 7]; bisect.bisect(c, 2)}
    • bisect.insort actually inserts the element into that location: { bisect.insort(c, 6)}
    • SLICING
      • start:stop is passed to the indexing operator []; the start index is included, the stop index is not { seq[1:5]}
      • A step can also be used after a second colon to, say, take every other element { seq[::2]}
      • A clever use of this is to pass -1 as the step, which has the useful effect of reversing a list or tuple { seq[::-1]}
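The list operations noted above can be tried together in one short sketch. The starting contents of b_list and seq are made up for illustration; the calls themselves are standard list methods and the standard-library bisect module.

```python
import bisect

# Adding and removing elements.
b_list = ['foo', 'peekaboo', 'baz']
b_list.append('dwarf')          # add to the end
b_list.insert(1, 'red')         # insert at index 1; later items shift right
popped = b_list.pop(2)          # remove and return the item at index 2
b_list.remove('foo')            # remove the first matching value
has_dwarf = 'dwarf' in b_list   # membership test

# extend appends in place instead of building a new list.
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])

# sort works in place and accepts a key function.
words = ['saw', 'small', 'He', 'foxes', 'six']
words.sort(key=len)

# bisect finds an insertion point in an already sorted list;
# insort inserts the value at that point.
c = [1, 2, 2, 2, 3, 4, 7]
pos = bisect.bisect(c, 2)
bisect.insort(c, 6)

# Slicing: start is included, stop is excluded, step is optional.
seq = [7, 2, 3, 7, 5, 6, 0, 1]
middle = seq[1:5]
every_other = seq[::2]
backwards = seq[::-1]
```

Note that bisect does not check whether the list is sorted, so using it on an unsorted list runs without error but can give a meaningless position.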
  • Built-in Sequence Functions
    • ENUMERATE: Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples: { for i, value in enumerate(collection):; #do something with value}
    • SORTED: The sorted function returns a new sorted list from the elements of any sequence { sorted([7, 1, 2, 6, 0, 3, 2])}
    • ZIP: zip “pairs” up the elements of a number of lists, tuples, or other sequences into tuples. In Python 3 it returns an iterator, so wrap it in list() to get a list of tuples {seq1 = ['foo', 'bar', 'baz']; seq2 = ['one', 'two', 'three']; zipped = zip(seq1, seq2); list(zipped)}
      • A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate: {for i, (a, b) in enumerate(zip(seq1, seq2)):;print('{0}: {1}, {2}'.format(i, a, b))}
    • REVERSED: reversed iterates over the elements of a sequence in reverse order. It returns an iterator, so use list() or a for loop to materialize it { list(reversed(range(10)))}
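Here is a small sketch of the four built-in sequence functions from the notes above, using the same seq1 and seq2 lists. The nums list is just sample data.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']

# zip is lazy in Python 3, so list() is needed to see the pairs.
pairs = list(zip(seq1, seq2))

# enumerate combined with zip: index plus one item from each sequence.
lines = []
for i, (a, b) in enumerate(zip(seq1, seq2)):
    lines.append('{0}: {1}, {2}'.format(i, a, b))

# sorted returns a new list and leaves the input untouched.
nums = [7, 1, 2, 6, 0, 3, 2]
ordered = sorted(nums)

# reversed also returns an iterator.
countdown = list(reversed(range(5)))

print(lines)
```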
  • Dict: dict is likely the most important built-in Python data structure. A more common name for it is hash map or associative array ( empty_dict = {}; d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]} )
    • You can delete values either using the del keyword or the pop method ( d1 = {5:'Some other','a' : 'some value', 'b' : [1, 2, 3, 4]}; del d1[5] or ret = d1.pop(5) )
    • You can merge one dict into another using the update method:( d1.update({'b' : 'foo', 'c' : 12}) )
    • CREATING DICTS FROM SEQUENCES { mapping = dict(zip(range(5), reversed(range(5)))) }
    • DEFAULT VALUES
      • the dict methods get and pop can take a default value to be returned {value = some_dict.get(key, default_value)}
      • The built-in collections module has a useful class, defaultdict { from collections import defaultdict; by_letter = defaultdict(list)}
    • VALID DICT KEY TYPES
      • The keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too) - term here is hashability {hash('string'); hash((1, 2, [2, 3])) # fails because lists are mutable}
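The dict notes above can be sketched in one place as follows. The words list and the 'fallback' default are sample values chosen for illustration, not something from the notes.

```python
from collections import defaultdict

d1 = {5: 'Some other', 'a': 'some value', 'b': [1, 2, 3, 4]}
ret = d1.pop(5)                     # removes key 5 and returns its value
d1.update({'b': 'foo', 'c': 12})    # overwrites 'b' and adds 'c'

# Build a dict from two sequences paired up with zip.
mapping = dict(zip(range(5), reversed(range(5))))

# get returns the default instead of raising KeyError for a missing key.
missing = d1.get('z', 'fallback')

# defaultdict(list) creates an empty list the first time a key is seen,
# which makes grouping straightforward.
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# A tuple containing a list is not hashable, so it cannot be a dict key.
try:
    hash((1, 2, [2, 3]))
    hashable = True
except TypeError:
    hashable = False
```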
  • set : A set is an unordered collection of unique elements {set([2, 2, 2, 1, 3, 3])}
    • Sets support mathematical set operations like union, intersection, difference, and symmetric difference. Consider these two example sets { a = {1, 2, 3, 4, 5}; b = {3, 4, 5, 6, 7, 8}; a.union(b); a | b; a.intersection(b); a & b;}
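A short sketch of the four set operations mentioned above, using the same two example sets. Each operator has an equivalent method (union, intersection, difference, symmetric_difference).

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

union = a | b                  # elements in either set
intersection = a & b           # elements in both sets
difference = a - b             # elements in a but not in b
symmetric = a ^ b              # elements in exactly one of the sets

# Building a set from a list drops the duplicates.
unique = set([2, 2, 2, 1, 3, 3])
```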
  • List, Set, and Dict Comprehensions
    • List comprehension: [expr for val in collection if condition] { strings = ['a', 'as', 'bat', 'car', 'dove', 'python']; [x.upper() for x in strings if len(x) > 2]}
    • Dict comprehension: dict_comp = {key-expr : value-expr for value in collection if condition}
    • Set comprehension: set_comp = {expr for value in collection if condition}
    • NESTED LIST COMPREHENSIONS {result = [name for names in all_data for name in names if name.count('e') >= 2] }
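The three comprehension forms and the nested case can be compared side by side in the sketch below. The all_data lists of names are sample data assumed for illustration, since the notes do not define them.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# List comprehension: filter, then transform.
upper = [x.upper() for x in strings if len(x) > 2]

# Set comprehension: the distinct string lengths.
lengths = {len(x) for x in strings}

# Dict comprehension: map each string to its position in the list.
loc = {val: idx for idx, val in enumerate(strings)}

# Nested list comprehension: the for clauses read left to right,
# in the same order as the equivalent nested for loops.
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]
result = [name for names in all_data for name in names
          if name.count('e') >= 2]
```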
  • Functions - Functions are declared with the def keyword and return a value with the return keyword:
      def my_function(x, y, z=1.5):
          if z > 1:
              return z * (x + y)
          else:
              return z / (x + y)
    • There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned automatically.
    • Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments
    • The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any).
    • Namespaces, Scope, and Local Functions
      • Assigning variables outside of the function’s scope is possible, but those variables must be declared as global via the global keyword:
          def bind_a_variable():
              global a
              a = []
          bind_a_variable()
      • Use of the global keyword is generally discouraged
    • Returning Multiple Values:
        def f():
            a = 5
            b = 6
            c = 7
            return a, b, c
        a, b, c = f()
    • Functions Are Objects { clean_ops = [str.title, str.strip]}
    • MAP function: applies a function to each element of a sequence and returns an iterator over the results { for x in map(str.title, value):; print(x)}
    • Anonymous (lambda) functions: a way of writing functions consisting of a single expression, the result of which is the return value:
        def short_function(x):
            return x * 2
        equiv_anon = lambda x: x * 2
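To tie the function notes together, here is a minimal sketch showing default arguments, functions stored in a list, map, and a lambda. The helpers remove_punctuation and clean_strings and the states sample data are hypothetical names introduced for this example.

```python
import re

def my_function(x, y, z=1.5):
    # z is a keyword argument with a default value.
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

def remove_punctuation(value):
    # Hypothetical helper: strip a few punctuation characters.
    return re.sub('[!#?]', '', value)

# Functions are objects, so they can be stored in a list and applied in order.
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    # Hypothetical helper: apply every function in ops to every string.
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

states = ['   Alabama ', 'Georgia!', 'georgia', 'FlOrIda']
cleaned = clean_strings(states, clean_ops)

# map applies one function to every element; list() materializes the iterator.
titled = list(map(str.title, ['alabama', 'georgia']))

# A lambda is an unnamed single-expression function.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
```

Keeping the cleaning steps in a list makes it easy to add, remove, or reorder them without touching clean_strings.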

NumPy Basics: Arrays and Vectorized Computation

  • From a performance perspective, NumPy-based code is typically 10 to 100 times faster than the equivalent pure Python code for vectorized computation
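One rough way to check the speed claim yourself is to run the same computation on a NumPy array and on a plain list and time both. This is only a sketch: the array size and repeat count are arbitrary choices, and the measured ratio depends on the machine, so no timings are quoted here.

```python
import timeit

import numpy as np

n = 100_000
my_arr = np.arange(n)
my_list = list(range(n))

# Same computation two ways: vectorized on the array,
# element by element on the list.
doubled_arr = my_arr * 2
doubled_list = [x * 2 for x in my_list]

# Time ten repetitions of each version.
t_arr = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)
print('numpy: {:.4f}s, list: {:.4f}s'.format(t_arr, t_list))
```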

The NumPy ndarray: A Multidimensional Array Object

  • Creating ndarrays
