bomonike

How to code Python the way that matters in practice, as shown in my sample code on GitHub: how best to use Keywords, arguments, Exception Handling, OS commands, Strings, Lists, Sets, Tuples, Files, and Timers


Overview

NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” here highlights hard-won, little-known but significant facts, based on my personal research and experience, that I haven’t seen elsewhere on the internet.

This is the last in my series of articles about Python:

Why This?

In my python-tutorials page I list the many tutorials on YouTube and paid subscription channels.

What I don’t like about them I aim to fix on this page.

PCEP-30-02 Exam Outline

This article contains topic names from, and links to, the outline of the PCEP™ – Certified Entry-Level Python Programmer exam (PCEP-30-02), last updated February 23, 2022. Its sections:

  1. 7 items (18%) Computer Programming and Python Fundamentals
  2. 8 items (29%) Control Flow – Conditional Blocks and Loops
  3. 7 items (25%) Data Collections – Tuples, Dictionaries, Lists, and Strings
  4. 8 items (28%) Functions and Exceptions

1: Computer Programming and Python Fundamentals

1.1 – Understand fundamental terms and definitions

2: Data Types, Evaluations, and Basic I/O Operations (20% - 6 exam items)

   >>> "{} {} cost ${}".format(6, "bananas", 1.74 * 6)
   '6 bananas cost $10.44'



2: Control Flow – Conditional Blocks and Loops (29% - 8 exam items)

2.1 – Make decisions and branch the flow with the if instruction

2.2 – Perform different types of iterations

3: Data Collections – Lists, Tuples, and Dictionaries (25% - 7 exam items)

3.1 – Collect and process data using lists

3.2 – Collect and process data using tuples

3.3 – Collect and process data using dictionaries

3.4 – Operate with Strings

4: Functions and Exceptions (28% - 8 exam items)

4.1 – Decompose the code using functions

4.2 – Organize interaction between the function and its environment

4.4 – Basics of Python Exception Handling

All exceptions in Python must be instances of a class that derives from BaseException. To catch an error such as division by zero, wrap the division in a try block:

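try:
    a = 10/0
    print(a)
except ArithmeticError:  # ZeroDivisionError is a subclass of ArithmeticError
    print("microbit-001: This raises an arithmetic exception.")
else:  # runs only when the try block raised no exception
    print("Success.")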

References about Exception Handling:

4.3 – Python Built-In Exceptions Hierarchy

locals()['__builtins__']

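All built-in exceptions derive from BaseException. All built-in, non-system-exiting exceptions derive from its subclass Exception, and user-defined exceptions should also derive from Exception.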
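To see where a given exception sits in that hierarchy, a class’s __mro__ (method resolution order) lists its ancestors. A minimal sketch; the helper name ancestry() is hypothetical:

```python
def ancestry(exc_type: type) -> list[str]:
    """Return the class names from exc_type up through object (its MRO)."""
    return [cls.__name__ for cls in exc_type.__mro__]

print(ancestry(ZeroDivisionError))
# ['ZeroDivisionError', 'ArithmeticError', 'Exception', 'BaseException', 'object']
print(ancestry(KeyError))
# ['KeyError', 'LookupError', 'Exception', 'BaseException', 'object']
```

This is also why except ArithmeticError in the divide-by-zero example catches a ZeroDivisionError.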

Abstract exceptions:


Debugging using IDE

See key/value pairs without typing print statements in code, like an Xray machine:

  1. Click next to a line number at the left to set a Breakpoint.
  2. Click “RUN AND DEBUG” to see variables: Locals and Globals.
  3. To expand and contract, click “>” and “V” in front of items.
  4. “special variables” are dunder (double underscore) variables.
  5. Under each are “function variables” and special variables of their own. For a list, these include append, clear, copy, etc.
  6. Under Globals are its special variables (such as __file__ for the file path of the program) and class variables, plus an entry for each class defined in the code (such as unittest).

Pydantic

Pydantic (at docs.pydantic.dev) is the most widely used data validation library for Python.

Use pydantic when you’re not in control of the data input.

Fast and extensible, Pydantic plays nicely with your linters/IDE/brain. Define how data should be in pure, canonical Python 3.8+; validate it with Pydantic. Its success means it suffers from feature creep. There’s a temptation to move other classes over to pydantic, just because pydantic also includes serialization.

It leans heavily on use of type hinting, which makes custom validation more complex than perhaps necessary.

So check if you can get away with dataclasses.
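A minimal sketch of the dataclass alternative, using only the standard library. The User class and its age rule are hypothetical; note that dataclasses do not enforce type hints, so any validation has to be written by hand in __post_init__:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    """Hypothetical record validated by hand instead of by Pydantic."""
    name: str
    age: int

    def __post_init__(self) -> None:
        # Type hints are NOT checked at run time, so check explicitly.
        if not isinstance(self.age, int) or self.age < 0:
            raise ValueError(f"age must be a non-negative int, got {self.age!r}")

print(User("Ada", 36))       # User(name='Ada', age=36)
try:
    User("Bob", -1)
except ValueError as err:
    print(err)               # age must be a non-negative int, got -1
```

If that hand-written checking grows large, or the input comes from outside your control, that is the point where Pydantic earns its place.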

Use Python Code Scans

mypy

Static Application Security Testing (SAST) looks for weaknesses in code and vulnerable packages.

Dynamic Application Security Testing (DAST) looks for vulnerabilities that occur at runtime.

https://www.statworx.com/en/content-hub/blog/how-to-scan-your-code-and-dependencies-in-python/

A. PEP 8 is Python’s style guide; linters such as pycodestyle (formerly named pep8) flag program code that violates it.

Other formatters: black, ruff.

B. Bandit (open-sourced at https://github.com/PyCQA/bandit) scans python code for vulnerabilities. It decomposes the code into its abstract syntax tree and runs plugins against it to check for known weaknesses. Among other tests it performs checks on plain SQL code, which could provide an opening for SQL injections, passwords stored in code and hints about common openings for attacks such as use of the pickle library. Bandit is designed for use with CI/CD:

bandit -c bandit_cfg.yml -r /path/to/python/files

The bandit_cfg.yml configuration file contains YAML lines such as these to specify which tests to skip:

# bandit_cfg.yml
skips: ["B101"] # skips the assert check

Bandit returns an exit status of 1 whenever it finds any issues, which fails the pipeline step.

The report it generates includes the number of issues, broken down by confidence and severity, each at three levels: low, medium, and high.

C. Safety checks dependencies for known vulnerabilities (CVEs).

https://pypi.org/project/scancode-toolkit/

D. ScanCode scans Python code for licenses, copyrights, packages and their documented dependencies, and other interesting facts.

E. GitHub’s Advanced Security scans Python code based on CodeQL logic specifications.

https://7451111251303.gumroad.com/l/wotve


Time Complexity Big Oh notation

There is time complexity, space complexity, etc.

Big-O notation summarizes Time Complexity analysis, which estimates how long it can take for an algorithm to complete based on its structure. That’s worst-case, before optimizations such as memoization.

From https://bigocheatsheet.com, in the list of Big O values for sorting:

[Figure: Big-O complexity chart]

BigO References: VIDEO

Let’s go from the most efficient (at the bottom-right) to the least efficient at the upper-left, where n is the number of input items in the list being processed:
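As a rough illustration of that ordering (the classes and sample sizes here are my assumptions, and constant factors are ignored), this sketch prints approximate step counts, from most to least efficient:

```python
import math

# Approximate step counts for common Big-O classes (constant factors ignored).
GROWTH = {
    "O(1)": lambda n: 1,
    "O(log n)": lambda n: math.log2(n),
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n ** 2,
    "O(2^n)": lambda n: 2 ** n,
}

for n in (8, 16, 32):
    row = ", ".join(f"{name}={f(n):,.0f}" for name, f in GROWTH.items())
    print(f"n={n}: {row}")
# First line: n=8: O(1)=1, O(log n)=3, O(n)=8, O(n log n)=24, O(n^2)=64, O(2^n)=256
```

Doubling n barely moves O(log n) but squares the O(2^n) count, which is why the curves spread apart toward the upper-left of the chart.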

Big-O describes asymptotic behavior: how the number of steps grows as n becomes extremely large, essentially infinite.

Depth-first trees would have steeper (logarithmic) Time Complexity.

References:

GitHub repos with the highest stars:

Faster routes to machine code

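By default, Python runs on the CPython interpreter, which compiles source code to bytecode rather than to machine code. When speed is needed, such as in loops, Cython (command cythonize) translates Python-like code into C extensions that are compiled to machine code, or custom C/C++ extensions are written. Additional speed is obtained by adding compiler directives and decorators before nested-loop code: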

# cython: language_level=3, boundscheck=False, wraparound=False
import cython
@cython.locals(i=cython.int,j=cython.int,a=list[cython.int],b=list[cython.int])

VIDEO: benchmarks Numba, mypyc, and Taichi (the fastest). Alternately, code compiled using the Codon tool by Exaloop is claimed to run “41,212 times faster” than the standard Python interpreter.

Codon is a new Python compiler that uses the LLVM framework to compile directly to machine code. Codon can also make use of the thousands of processors on a GPU to process matrix, graphical, and mathematical operations without using a library like numpy, scikit-learn, scipy, or the game library pygame. However, Codon cannot use some modules, such as typing, or functools features such as wraps, which provides contextual information for decorators.


Lexis

From https://learning.oreilly.com/library/view/python-in-a/0596100469/ch04s01.html

The syntax of the Python programming language is the set of rules that defines how a Python program will be written and interpreted (by both the runtime system and by human readers). The Python language has many similarities to Perl, C, and Java. However, there are some definite differences between the languages. It supports multiple programming paradigms, including structured, object-oriented programming, and functional programming, and boasts a dynamic type system and automatic memory management.

Python’s syntax is simple and consistent, adhering to the principle that “There should be one— and preferably only one —obvious way to do it.” The language incorporates built-in data types and structures, control flow mechanisms, first-class functions, and modules for better code reusability and organization. Python also uses English keywords where other languages use punctuation, contributing to its uncluttered visual layout.

The language provides robust error handling through exceptions, and includes a debugger in the standard library for efficient problem-solving.

Python’s syntax, designed for readability and ease of use, makes it a popular choice among beginners and professionals alike.

Reserved Keywords

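VIDEO: W: Here are the keywords Python reserves for itself, so they can’t be used as custom identifiers (such as variable names). The soft keywords (marked below) are reserved only in certain contexts, so they can still be used as names elsewhere.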

  1. _ (soft keyword)
  2. and
  3. as
  4. assert
  5. async
  6. await
  7. break - force escape from for/while loop
  8. case (soft keyword)
  9. class
  10. continue - force loop again next iteration
  11. def - define a custom function
  12. del - del list1[2] # delete 3rd list item, starting from 0.
  13. elif - else if
  14. else
  15. except
  16. False - boolean
  17. finally - of a try
  18. for - iterate through a loop
  19. from
  20. global - declares a variable as global in scope
  21. if
  22. import - make the specified package available
  23. in
  24. is
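  25. lambda - define an anonymous function in one expression, such as lambda x: x * 2 (the one-line if/else is the conditional expression: a if condition else b)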
  26. match (soft keyword)
  27. None - absence of value.
  28. nonlocal
  29. not
  30. or
  31. pass - (as in the game Bridge) instruction to do nothing (instead of return or yield with value)
  32. raise - raise NotImplementedError() throws an exception purposely
  33. return
  34. True - Boolean
  35. try - VIDEO
  36. while
  37. with
  38. yield - return a value to the caller, then resume where it left off when the next value is requested, producing a series of values over time (a generator).

NOTE: match, case and _ were introduced as soft keywords in Python 3.10.

The list above can be retrieved (as the Python lists kwlist and softkwlist) by running this code, or by pasting it after typing python for the REPL (Read Evaluate Print Loop) interactive prompt:

from keyword import kwlist, softkwlist

def display_keywords() -> None:
    print('Keywords:')  # False, None, True come first (uppercase sorts before lowercase)
    for i, kw in enumerate(kwlist, start=1):
        print(f'{i:2}: {kw}')
    print('Soft keywords:')
    for i, skw in enumerate(softkwlist, start=1):
        print(f'{i:2}: {skw}')

def main() -> None:
    display_keywords()

if __name__ == '__main__':
    main()

Soft keywords:

  1. _ (magic)
  2. case
  3. match
  4. type (added by Python 3.12)

Press Control+D (Control+Z then Enter on Windows) to exit the REPL anytime.

Built-in Methods/Functions

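Don’t name custom functions or variables after these built-in functions: they are not reserved keywords, so Python lets you reuse the names, but doing so shadows the built-in.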

Know what they do. See https://docs.python.org/3/library/functions.html
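A small sketch of what shadowing does in practice; shadowing_demo() is a hypothetical name, and the original built-in stays reachable through the builtins module:

```python
import builtins

def shadowing_demo() -> str:
    list = [1, 2, 3]      # this local name now hides the built-in list()
    try:
        list("abc")       # calls the [1, 2, 3] object, not the built-in
    except TypeError as err:
        return str(err)
    return "no error"

print(shadowing_demo())          # 'list' object is not callable
print(builtins.list("abc"))      # ['a', 'b', 'c']: the original is still in builtins
```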


The first thing that most tutorials cover is this:

PROTIP: Don’t just print out the value. Include the variable name, such as:

print("=== var1=",var1)
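Since Python 3.8, an f-string can do this labeling for you: putting = after the expression prints both the expression text and its value (var1 here is just a sample variable):

```python
var1 = 42
print("=== var1=", var1)   # === var1= 42
print(f"=== {var1=}")      # === var1=42
```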

While Loop

CAUTION: What’s wrong with this sample code?

Insecure While loop

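PROTIP: Passwords and other secrets should not be requested in an input() prompt, because input() echoes what is typed to the screen, where it can be seen or captured in CLI logs. The standard library’s getpass.getpass() prompts without echoing.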

PROTIP: Passwords and other secrets should not be stored in programming code.

PROTIP: Store passwords not as the raw text the user typed, but as a hash of it. The hash is also created with a random “salt” so that identical passwords yield different hashes. To verify a login attempt, the program adds the stored salt to the password the user provides, calculates its hash, then compares the two hashes.

PROTIP: The user should be allowed only a limited number of tries. When exceeded, the user and IP address used should be locked out, entered in central (SIEM) security logs, and reported as a security incident.
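The PROTIPs above can be combined into a sketch. It assumes PBKDF2-HMAC-SHA256 from hashlib as the salted hash (one reasonable choice, not the only one), an iteration count and three-try limit chosen for illustration, and hypothetical function names; the lockout and SIEM reporting are left as a comment:

```python
import getpass
import hashlib
import hmac
import secrets
from typing import Callable

ITERATIONS = 200_000   # assumed PBKDF2 work factor; tune for your hardware
MAX_TRIES = 3          # assumed lockout threshold

def hash_password(password: str, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 hash of a password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def make_record(password: str) -> tuple[bytes, bytes]:
    """Store only (salt, hash), never the raw password."""
    salt = secrets.token_bytes(16)
    return salt, hash_password(password, salt)

def login(record: tuple[bytes, bytes],
          ask: Callable[[str], str] = getpass.getpass) -> bool:
    """Allow at most MAX_TRIES attempts; getpass does not echo what is typed."""
    salt, stored = record
    for attempt in range(1, MAX_TRIES + 1):
        candidate = hash_password(ask(f"Password (try {attempt}/{MAX_TRIES}): "), salt)
        if hmac.compare_digest(candidate, stored):   # constant-time comparison
            return True
    # Hypothetical hook: lock the account, log user and IP to the SIEM here.
    return False
```

The ask parameter defaults to getpass.getpass; passing a different callable makes the loop testable without a keyboard, for example login(record, lambda prompt: "s3cret").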

Magic underlines

VIDEO from idently.co: Underscores in numeric literals are ignored by Python, so they can be used to group digits:

n: int = 1_000_000_000

Specify a thousands separator (here an underscore) in formatted output:

num: float = 1_000_000_000.342
print(f'{num:_.3f}')

Right-align within a width of 20 characters:

print(f'{var:>20}')

Center-align within a width of 20 characters, padding with the | (pipe) character:

print(f'{var:|^20}:')

The : after the closing brace is plain text, printed as-is.

A colon after a variable begins a formatting specification:

from datetime import datetime
now: datetime = datetime.now()
print(f'{now:%d.%m.%y(%H:%M:%S)}')
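The formatting rules above can be checked together in one runnable sketch; var and the fixed date are hypothetical sample values, and the comma separator is one more option of the same format mini-language:

```python
from datetime import datetime

n: int = 1_000_000_000
num: float = 1_000_000_000.342
var = "hi"   # hypothetical value to align

print(n)                   # 1000000000 (underscores in the literal are ignored)
print(f'{num:_.3f}')       # 1_000_000_000.342
print(f'{num:,.2f}')       # 1,000,000,000.34 (comma as the thousands separator)
print(f'[{var:>20}]')      # 'hi' right-aligned in 20 columns
print(f'{var:|^20}:')      # |||||||||hi|||||||||:
fixed = datetime(2024, 7, 4, 9, 5, 30)   # hypothetical fixed moment for repeatable output
print(f'{fixed:%d.%m.%y(%H:%M:%S)}')     # 04.07.24(09:05:30)
```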

Function return Not None

Returning 0 on error can be confused with the number 0 as a valid response.

So to avoid the confusion, return the Python reserved word “None”:

result = safe_square_root(4)
if result is not None:   # happy path
    value = result.pop()  # pop the value off the one-item list ("stack")
    print(value)
else:  # safe_square_root() returned None to signal an error
    # The error was encapsulated by the function, to be forwarded and processed here:
    print("unable to compute square root")

Function:

import math

def safe_square_root(x):
    try:
        return [math.sqrt(x)]   # wrap the result in a list ("stack")
    except ValueError:          # math.sqrt() raises ValueError for negative x
        return None             # None signals "no valid result"

The parameter (x) is the name declared in the function definition.

The value passed through when calling the function is called an argument.

Operators

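DEFINITION: Walrus operator := (an assignment expression, added in Python 3.8) assigns a value to a variable as part of a larger expression.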

VID1 VID2
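A minimal sketch of the walrus operator in a common pattern, capturing a regex match and testing it in the same if (the input lines are hypothetical):

```python
import re

lines = ["id=17", "no match here", "id=42"]   # hypothetical input
ids = []
for line in lines:
    # := assigns the match object to m AND lets the if test it, in one expression.
    if (m := re.search(r"id=(\d+)", line)) is not None:
        ids.append(int(m.group(1)))
print(ids)   # [17, 42]
```

Without :=, the search would need a separate assignment line before the if.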

Floor division Operators

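The // operator dates back to Python 2.2; since Python 3, the / operator always performs true division, so // is the way to get floor division.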

11 // 5 uses “floor division” to return 2, the integral part of the quotient, discarding the remainder (with negative operands it rounds down, so -11 // 5 is -3). This can be useful to efficiently solve the “Prefix Sums CountDiv” coding interview challenge: “Write a function … that, given three integers A, B and K, returns the number of integers within the range [A..B] that are divisible by K”:

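def solution(a, b, k):
    # Multiples of k in [0..b], minus multiples of k in [0..a-1].
    # No special case is needed: when a == 0, (a - 1) // k is -1, so 0 is counted.
    return b // k - (a - 1) // k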

Instead of a “brute force” approach, which has linear time complexity, O(n), the solution using floor division runs in constant time, O(1).
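To check that the constant-time formula agrees with brute force, here is a sketch with hypothetical function names count_div_brute() and count_div() (the latter uses the same formula as solution()):

```python
def count_div_brute(a: int, b: int, k: int) -> int:
    """O(n): test every integer in [a..b]."""
    return sum(1 for n in range(a, b + 1) if n % k == 0)

def count_div(a: int, b: int, k: int) -> int:
    """O(1): multiples of k up to b, minus multiples of k below a."""
    return b // k - (a - 1) // k

# Cross-check the constant-time formula against brute force on small ranges.
for a in range(0, 30):
    for b in range(a, 30):
        for k in range(1, 7):
            assert count_div(a, b, k) == count_div_brute(a, b, k)

print(count_div(6, 11, 2))   # 3  (6, 8, 10)
print(count_div(0, 0, 11))   # 1  (0 is divisible by 11)
```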

Modulo operator

11 % 5 uses the percent sign, the modulo operator, to divide 11 by the divisor 5 and return the remainder 1: two 5s go into 11, leaving 1 left over. Modulo is used in circular buffers and hashing algorithms. This function rotates an array to the right by K positions, wrapping indexes around with %:

def solution(A, K):
    # A is the array.
    # K is the number of positions to rotate right.
    result = [None] * len(A)   # initialize a result list with one slot per item

    for i in range(len(A)):
        # Use the % modulo operator to wrap the new index into 0 .. len(A)-1:
        result[(i + K) % len(A)] = A[i]
        print(f'i={i} A[i]={A[i]} K={K} result={result} ')
    return result

print(solution([7, 2, 8, 3, 5], 2))   # final line printed: [3, 5, 7, 2, 8]
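For the circular-buffer use of modulo, here is a minimal sketch; the RingBuffer class is hypothetical (for production use, collections.deque(maxlen=...) does the same job):

```python
class RingBuffer:
    """Fixed-size circular buffer: the write index wraps around with %."""

    def __init__(self, capacity: int) -> None:
        self.data: list = [None] * capacity
        self.capacity = capacity
        self.next = 0      # index of the slot to overwrite next
        self.count = 0     # number of filled slots

    def append(self, item) -> None:
        self.data[self.next] = item
        self.next = (self.next + 1) % self.capacity   # wrap to 0 after the last slot
        self.count = min(self.count + 1, self.capacity)

    def items(self) -> list:
        """Contents from oldest to newest."""
        start = (self.next - self.count) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(self.count)]

buf = RingBuffer(3)
for x in range(5):
    buf.append(x)
print(buf.items())   # [2, 3, 4]: the two oldest values were overwritten
```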

Modulo is also used in this


What Day and Time is it?

The ISO 8601 format can contain 6-digit microseconds (“123456”) and a time zone offset (“-05:00” being five hours West of UTC):

import datetime

start = datetime.datetime.now()
# do some stuff ...
end = datetime.datetime.now()
elapsed = end - start   # a datetime.timedelta
print(elapsed)          # such as 0:00:01.946339
# or
print(elapsed.seconds, ":", elapsed.microseconds)

Some prefer to display local time with a Time Zone code from Python package pytz or zulu.

PROTIP: Servers within enterprises and the military run on UTC time, and logs should be output in UTC time rather than local time.

datetime.datetime.now() provides microsecond precision:
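A sketch of producing timezone-aware ISO 8601 timestamps, in UTC for logs and with a fixed offset five hours West of UTC; the sample moment and the minus_five name are hypothetical:

```python
from datetime import datetime, timedelta, timezone

utc_now = datetime.now(timezone.utc)   # timezone-aware, in UTC
print(utc_now.isoformat())             # like 2024-07-04T14:05:30.123456+00:00

# A fixed offset five hours West of UTC (hypothetical sample moment):
minus_five = timezone(timedelta(hours=-5))
sample = datetime(2024, 1, 15, 9, 30, 0, 123456, tzinfo=minus_five)
print(sample.isoformat())              # 2024-01-15T09:30:00.123456-05:00
```

Plain datetime.now() returns a naive value with no offset, which is why the tzinfo argument matters for logs.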

References:

Timezone handling

During Debian OS 12 install from iso file, a time zone is requested to be manually selected. After boot-up:

  1. Check the current timezone with the command timedatectl
  2. Set the timezone to UTC with the command sudo timedatectl set-timezone Etc/UTC. Alternately, reconfigure the timezone data with sudo dpkg-reconfigure tzdata, then follow the prompts: select “Etc” or “None of the above” from the list of continents, then choose “UTC” from the second list.

NOTE: On macOS, timezone data are in a binary file at /etc/localtime.

Within Python, there are several ways to detect time zone:

from dateutil import tz

local_timezone = tz.tzlocal()
print("dateutil local_timezone=",local_timezone)

Python 3.9+ includes the zoneinfo module in its standard library. ZoneInfo("localtime") resolves only where the system’s time zone database contains a “localtime” entry (as on some Linux distributions); elsewhere it raises zoneinfo.ZoneInfoNotFoundError:

from zoneinfo import ZoneInfo
from datetime import datetime

local_timezone = datetime.now(ZoneInfo("localtime")).tzinfo
print("zoneinfo local_timezone=",local_timezone)

Use the tzlocal library to obtain the IANA time zone name (e.g., ‘America/New_York’). But it varies across operating systems.

import tzlocal
local_timezone = tzlocal.get_localzone_name()
print("tzlocal local_timezone=",local_timezone)

Calling astimezone() converts a datetime to another time zone by supplying a new tzinfo. With no argument, it uses the system’s local time zone:

# astimezone() defaults to the local time zone when no argument is provided.
from datetime import datetime

local_now = datetime.now().astimezone()
local_timezone = local_now.tzinfo
print("astimezone local_timezone=",local_timezone)

Timing Attacks

Timing Attacks maliciously use precise (microsecond) measurements of how long an application takes to check a password or token, to infer information about the secret or the algorithm that processes it. In the Keyczar vulnerability found by Nate Lawson, a simple break-on-inequality algorithm was used to compare a candidate HMAC digest with the calculated digest. A value which shares no bytes in common with the secret digest returns immediately; a value which shares the first 15 bytes returns 15 compares later.

Similarly, PDF: entropy

PROTIP: Use the secrets.compare_digest() function (the secrets module was introduced in Python 3.6) to check passwords and other private values. It avoids content-based short-circuiting, so its run time does not reveal how many leading bytes matched.

Functions hmac.compare_digest() and secrets.compare_digest() are designed to mitigate against timing attacks.
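A sketch contrasting the vulnerable break-on-inequality pattern with the standard-library comparison; naive_equal() and the HMAC key are hypothetical, and this code shows the pattern rather than measuring the timing difference:

```python
import hmac
import secrets

def naive_equal(a: bytes, b: bytes) -> bool:
    """Break-on-inequality (the vulnerable pattern): returns at the first
    differing byte, so its run time leaks how many leading bytes matched."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

expected = hmac.new(b"hypothetical-server-key", b"message", "sha256").digest()
candidate = bytes(32)   # an attacker's guess of the 32-byte digest

print(naive_equal(candidate, expected))             # False, after a data-dependent number of compares
print(secrets.compare_digest(candidate, expected))  # False, without content-based short-circuiting
print(hmac.compare_digest(expected, expected))      # True
```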

http://pypi.python.org/pypi/profilehooks

REMEMBER: Depth-First Search (DFS) uses a stack, whereas Breadth-First Search (BFS) uses a queue.
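A sketch on a small hypothetical graph (the GRAPH dict is made up for illustration), showing the stack in DFS and the queue (collections.deque) in BFS:

```python
from collections import deque

# Hypothetical adjacency list for a small graph.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

def dfs(start: str) -> list[str]:
    """Depth-first: a LIFO stack (list.append / list.pop)."""
    order, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(GRAPH[node]))   # reversed so the left neighbor pops first
    return order

def bfs(start: str) -> list[str]:
    """Breadth-first: a FIFO queue (deque.append / deque.popleft)."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(dfs("A"))   # ['A', 'B', 'D', 'C', 'E']
print(bfs("A"))   # ['A', 'B', 'C', 'D', 'E']
```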

VIDEO: The Sliding Window

VIDEO: FullStack’s REACTO framework during coding interviews:

  1. Repeat the question
  2. Examples
  3. Approach
  4. Code
  5. Test
  6. Optimization

Run Duration calculations

Several packages, functions, and methods are available. They differ by:

We want both reported.

timeit.default_timer() is time.perf_counter() on Python 3.3+.

The same program run several times would report similar CPU time but varying wall-clock times due to differences in what else was taking up resources during the runs.

To time the difference between calculation strategies, Python 3.7 added (in PEP 564) nanosecond-resolution functions such as time.perf_counter_ns() and time.time_ns().

time.perf_counter() (abbreviation of performance counter) is suited to measuring short elapsed durations because it has 82 nanosecond resolution on Fedora 4.12. It measures wall-clock time, which includes time elapsed during sleep, and is system-wide. The reference point of the returned value is undefined, so only the difference between the results of consecutive calls is valid. See https://docs.python.org/3/library/time.html#time.perf_counter

time.clock is no longer available since Python 3.8.

time.time() returns seconds since the epoch as a float, with sub-second resolution. Daylight saving time does not affect it, since it counts from the epoch regardless of local time zone, but if the system clock is adjusted during a measurement period (for example by NTP or by hand), the measurement is disrupted. Its resolution will only become coarser as years pass: every day adds 86,400,000,000,000 nanoseconds to the value, and a float holds only about 15 to 17 significant digits. It is called “non-monotonic” because a clock adjustment can make it report time going backwards.

timeit()

For more accurate wall-time capture, the timeit() functions disable the garbage collector.

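timeit returns elapsed seconds as a float. The nice output format 0:00:01.946339 (almost 2 seconds) comes from printing a datetime.timedelta, such as the elapsed value computed from two datetime.now() calls.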

timeit.timeit(stmt='pass', setup='pass', timer=<default timer>, number=1000000, globals=None)

PEP 418 in Python 3.3 added three timers: time.monotonic(), time.perf_counter(), and time.process_time().

time.process_time() measures the CPU time (system plus user) of the current process, with 1 nanosecond resolution on Linux 4.12. It does not include time elapsed during sleep.

import time

t = time.process_time()
# do some stuff ...
elapsed_time = time.process_time() - t

time.monotonic() is used for measurements on the order of hours/days, when you don’t care about sub-second resolution. It has 81 ns resolution on Fedora 4.12. BTW “monotonic” = only goes forward. See https://docs.python.org/3/library/time.html#time.monotonic
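A sketch that reports both wall-clock and CPU time for the same work, then uses timeit for a repeated micro-benchmark. The workload (a sum of squares plus a 0.2 second sleep) and the sorted() benchmark are hypothetical stand-ins:

```python
import time
import timeit

def busy_then_sleep() -> None:
    sum(i * i for i in range(200_000))   # CPU work
    time.sleep(0.2)                      # idle: counts toward wall-clock time only

wall_start, cpu_start = time.perf_counter(), time.process_time()
busy_then_sleep()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall-clock {wall:.3f}s, CPU {cpu:.3f}s")   # wall exceeds CPU by about the sleep

# timeit runs a snippet many times (garbage collection off by default)
# and returns the total seconds; divide by number for a per-call figure.
per_call = timeit.timeit("sorted(data)", setup="data = list(range(1000))[::-1]", number=1000) / 1000
print(f"sorted(): {per_call * 1e6:.1f} µs per call")
```

Run it several times: the CPU figure stays similar while the wall-clock figure varies with whatever else the machine is doing.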

References:


Pickle objects

Pickling is the process of converting (serializing) a Python object, especially a complex one (list, dict, set, tuple, matrix), into a byte stream that can be stored in a file or database, or transferred over the internet, then unpickled (deserialized) back into an equivalent object.

https://www.youtube.com/watch?v=wO_gVvINtg0
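A minimal pickle round trip, using a hypothetical record; the caution comment reflects the warning in the Python documentation, and it is why Bandit flags use of pickle:

```python
import pickle

record = {"name": "Ada", "scores": [90, 85], "tags": {"admin"}, "point": (3, 4)}

blob: bytes = pickle.dumps(record)   # serialize to a byte stream
restored = pickle.loads(blob)        # deserialize back into an equal object
print(restored == record)            # True

# CAUTION: only unpickle data you trust; a crafted pickle can execute arbitrary code.
```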

minmax

The difference between API, Library, Package, Module, Script, Frameworks.

A library is a collection of modules, grouped by their use.

http://docs.python.org/3/reference/import.html

A module is a bunch of related code saved in a file with the extension .py. Code in a module can be functions, classes, or variables.

The most popular imports include sys, time, random, datetime, argparse, re (regular expressions), math, xarray, polars (for computation), seaborn (charts with themes) on top of matplotlib, pytorch, pygame, result (exception handling), pydantic (data validation), missingno, sqlmodel (ORM for FastAPI), beautifulsoup, and python-dotenv (key-value pairs in environment variables).

Packages can also contain modules and other packages (subpackages).

Packages structure Python’s module namespace by using “dotted module names”.

The ___ VS Code extension sorts and reformats import statements. If the program needs only a single function, only that name would be imported (though Python still loads the whole module the first time it is imported).

Django, Flask, and Bottle are frameworks, which provide the basic flow and architecture of the application.

def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

try:
    celsius = float(input("Enter temperature in Celsius: "))
    fahrenheit = celsius_to_fahrenheit(celsius)
    print(f"{celsius}°C is equal to {fahrenheit:.2f}°F")
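    as_text = f"{fahrenheit:.2f}"     # alternative: a string rounded to 2 decimals
    as_number = round(fahrenheit, 2)  # alternative: a float rounded to 2 decimals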
except ValueError:
    print("Please enter a valid number for the temperature.")

Swapping

To swap values, here’s a straight-forward function:

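def swap1(var1, var2):
    var1, var2 = var2, var1   # tuple unpacking swaps in one step
    return var1, var2

>>> swap1(10, 20)
(20, 10)

def swap2(x, y):
    # XOR swap: works only for integers; shown as a curiosity
    x = x ^ y
    y = x ^ y
    x = x ^ y
    return x, y

>>> swap2(10, 20)
(20, 10)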

Sorting

Challenges:

Implement Bubble Sort

Implement Quick Sort

Implement Selection Sort

Implement Insertion Sort

Implement Merge Sort

Implement Binary Search and Quick Sort
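As one possible answer to the binary search challenge (a sketch, not the only approach; the standard library’s bisect module offers the same idea), each step halves the remaining range, giving O(log n):

```python
def binary_search(items: list[int], target: int) -> int:
    """Return the index of target in sorted items, or -1 if absent. O(log n)."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # floor division picks the middle index
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1              # discard the left half
        else:
            hi = mid - 1              # discard the right half
    return -1

data = sorted([42, 7, 19, 3, 88, 56])   # [3, 7, 19, 42, 56, 88]
print(binary_search(data, 42))          # 3
print(binary_search(data, 5))           # -1
```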

Reduce Time Complexity with Dynamic Programming

Calculations with nested loops are often used to show how to reduce run times with techniques that use more memory space, rather than “brute-force” repetitive computations. An example is calculating Fibonacci numbers, each of which by definition is the sum of the two numbers preceding it.

Naive recursive Fibonacci has one of the highest Big-O values, exponential O(2^n), because each call makes two more recursive calls that recompute the same values. VIDEO

Memoization (sounds like memorization) is the technique of writing a function that remembers the results of previous computations.
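A minimal sketch contrasting the two, assuming functools.lru_cache as the memoization mechanism (a hand-written dict cache would work the same way):

```python
from functools import lru_cache

def fib_naive(n: int) -> int:
    """Exponential time: recomputes the same subproblems again and again."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Linear time: each fib_memo(k) is computed once, then served from the cache."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(20))   # 6765
print(fib_memo(50))    # 12586269025 (fib_naive(50) would take far too long)
```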

Longest Increasing Subsequence (LIS)

That’s a technique of “Dynamic Programming” (See https://www.wikiwand.com/en/Dynamic_programming)

Dynamic programming is a catch phrase for solutions based on solving successively similar but smaller problems, using algorithmic tasks in which the solution of a bigger problem is relatively easy to find, if we have solutions for its sub-problems.
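As a sketch of dynamic programming applied to the Longest Increasing Subsequence problem (the O(n^2) formulation shown here is one of several; a faster O(n log n) variant exists), each entry reuses solutions to smaller sub-problems:

```python
def lis_length(nums: list[int]) -> int:
    """O(n^2) dynamic programming: best[i] = length of the longest strictly
    increasing subsequence that ends at nums[i]."""
    best = [1] * len(nums)
    for i in range(len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:
                best[i] = max(best[i], best[j] + 1)   # extend a solved sub-problem
    return max(best, default=0)

print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))   # 4  (for example 2, 3, 7, 18)
```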

Avoid divide by zero errors

Where a result of 0 is acceptable for a zero denominator, use a conditional expression so that the division falls into “else 0” rather than raising a “ZeroDivisionError” at run-time:

def weird_division(n, d):
    # n=numerator, d=denominator.
    return n / d if d else 0

Random

Flip a coin:

import random

if random.randint(0, 1) == 0:
    print("heads!")
else:
    print("tails!")

TODO: Roll a 6-sided die? See bomonike/memon

TODO: Roll a 20-sided die?
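A minimal sketch for both dice TODOs; roll() is a hypothetical helper. The random module is fine for games, but not for security (use secrets there):

```python
import random

def roll(sides: int = 6) -> int:
    """Return a uniformly random face from 1 to sides (inclusive)."""
    return random.randint(1, sides)   # randint includes both endpoints

print(roll())     # a value from 1 to 6
print(roll(20))   # a value from 1 to 20
```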


Environment Variable Cleansing

To read a file named “.env” at the $HOME folder, and obtain the value from “MY_EMAIL”:

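import os
from pathlib import Path

env_file = Path.home() / ".env"
for line in env_file.read_text().splitlines():
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        continue   # skip blank lines, comments, and malformed lines
    key, value = line.split("=", 1)   # split on the first = only
    os.environ[key.strip()] = value.strip()

print(os.environ.get('MY_EMAIL'))   # containing "johndoe@gmail.com"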

This code is important because it keeps secrets in your $HOME folder, away from folders that get pushed up to GitHub.

The python-dotenv package (with its load_dotenv() function) can do the above, but using only the standard library means fewer dependencies exposed to potential attacks.

Remember that attackers can use directory traversal sequences (../) to fetch the sensitive files from the server.

Sanitize user input destined for shell commands using shlex.quote() from the standard library.
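A sketch of both defenses: shlex.quote() for shell arguments, and a resolved-path check against ../ traversal. BASE_DIR and safe_path() are hypothetical; Path.is_relative_to() requires Python 3.9+:

```python
import shlex
from pathlib import Path

BASE_DIR = Path("/srv/app/data").resolve()   # hypothetical directory users may read from

def safe_path(user_supplied: str) -> Path:
    """Reject names that escape BASE_DIR via ../ sequences or absolute paths."""
    candidate = (BASE_DIR / user_supplied).resolve()
    if not candidate.is_relative_to(BASE_DIR):
        raise ValueError(f"path traversal attempt: {user_supplied!r}")
    return candidate

filename = "report; rm -rf ~"
print(shlex.quote(filename))         # 'report; rm -rf ~' (one quoted shell argument)
print(safe_path("reports/q1.txt"))   # /srv/app/data/reports/q1.txt
```

Calling safe_path("../../etc/passwd") raises ValueError instead of returning a path outside BASE_DIR.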


Object-oriented class functions

To use .maketrans() and .translate():
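Both are methods of the built-in str class: str.maketrans() (a static method) builds a translation table, and .translate() applies it to a string. A small sketch with a made-up mapping:

```python
# Map a->4, e->3, o->0, and delete every "!" (the third argument).
table = str.maketrans("aeo", "430", "!")
print("Hello, world!".translate(table))   # H3ll0, w0rld
```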

BTW not everyone is enamored with Object-Oriented Programming (OOP). Yegor Bugayenko in Russia recorded “The Pain of OOP” lectures “Algorithms hurt object thinking” May 2023 and #2 Static methods and attributes are evil, a repeat of his 11 March 2020: #1: Algorithms and Lecture #2: Static methods and attributes are evil. His 2016 ElegantObjects.org presents an object-oriented programming paradigm that “renounces traditional techniques like null, getters-and-setters, code in constructors, mutable objects, static methods, annotations, type casting, implementation inheritance, data objects, etc.”


Blob vs. File vs. Text

A “BLOB” (Binary Large OBject) is a data type that stores binary data such as mp4 videos, mp3 audio, pictures, and PDFs. So BLOBs are usually large: some databases allow up to 2 GB (2,147,483,647 bytes).

https://github.com/googleapis/google-cloud-python/issues/1216

https://towardsdatascience.com/image-processing-blob-detection-204dc6428dd


GUI

https://docs.python.org/3/using/ios.html

Not many develop iOS and iPad apps using Python vs. coding Swift, which is similar to Python. Learning Swift to develop an iOS application would be easier than figuring out how to develop an iOS application in Python.

But if you are hell-bent on it:

VIDEO: Qt Media player

create-gui-applications-pyside6.epub

https://github.com/mfitzp/books/tree/main/create-gui-applications/pyside6

More advanced developers integrate Python directly into an iOS project using a Python XCFramework.


Cloud

Azure storage

https://github.com/yokawasa/azure-functions-python-samples

https://chriskingdon.com/2020/11/24/the-definitive-guide-to-azure-functions-in-python-part-1/

https://chriskingdon.com/2020/11/30/the-definitive-guide-to-azure-functions-in-python-part-2-unit-testing/

https://github.com/Azure/azure-storage-python/blob/master/tests/blob/test_blob_storage_account.py

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

Azure Blobs

NOTE: Version 12 of azure-storage-blob replaced the deprecated BlockBlobService class with BlobServiceClient.

VIDEO: https://pypi.org/project/azure-storage-blob/

https://www.educative.io/edpresso/how-to-download-files-from-azure-blob-storage-using-python

https://github.com/Azure/azure-sdk-for-python/issues/12744 exists() new feature

import asyncio

async def check():
    from azure.storage.blob.aio import BlobClient
    # "my_connection_string", "mycontainer", and "myblob" are placeholders.
    blob = BlobClient.from_connection_string(conn_str="my_connection_string", container_name="mycontainer", blob_name="myblob")
    async with blob:
        exists = await blob.exists()
        print(exists)

asyncio.run(check())

Azure Streams

https://blog.siliconvalve.com/2020/10/29/reading-and-writing-binary-files-with-python-with-azure-functions-input-and-output-bindings/ Reading and writing binary files with Python with Azure Functions input and output bindings


Web Scraper

Beautiful Soup

Movie Recommender

A popular project is to combine, from Kaggle, a historical database of movies and TV shows from several streaming sites:

https://github.com/dataquestio/project-walkthroughs/blob/master/movie_recs/movie_recommendations.ipynb https://files.grouplens.org/datasets/movielens/ml-25m.zip

My rudimentary show-recommendations.py makes recommendations by identifying attributes of a single movie and showing others with the same attributes. https://www.youtube.com/watch?v=eyEabQRBMQA

It imports numpy and pandas for data handling.

Another advancement is to use the SurPRISE library (https://surpriselib.com/), named from the acronym Simple Python RecommendatIon System Engine. VIDEO

surprise -h

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with ‘pybind11>=2.12’.

If you are a user of the module, the easiest solution will be to downgrade to ‘numpy<2’ or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

An advancement is MovieLens (https://grouplens.org/datasets/movielens/). The load_builtin() method will offer to download the movielens-100k dataset if it has not already been downloaded, and it will save it in the .surprise_data folder in your home directory (you can also choose to save it somewhere else).

Surprise is a “scikit” (https://projects.scipy.org/scikits.html) which enables you to build your own cross-validation recommendation algorithm as well as use ready-to-use prediction algorithms such as:

Matrix factorization-based algorithms are used for collaborative filtering within recommender systems. The algorithms aim to decompose a large user-item interaction matrix into smaller matrices that capture latent factors. The four common matrix factorization algorithms are SVD, PMF, SVD++, NMF:

SVD (Singular Value Decomposition) decomposes the user-item matrix into three lower-dimensional matrices:
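In the standard truncated form (my assumption of the intended notation), with m users, n items, and k latent factors, the three matrices are user factors U, a diagonal matrix of singular values Σ, and item factors V:

```latex
R_{m \times n} \approx U_{m \times k} \, \Sigma_{k \times k} \, V_{n \times k}^{T}
```

Recommender libraries such as Surprise fit a related two-matrix model by stochastic gradient descent, which is what the prediction formula with q_i and p_u expresses.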

When applied to collaborative filtering, SVD aims to minimize the sum of squared errors between predicted and actual ratings for observed entries in the rating matrix.

QUESTION: The prediction for a user-item pair is calculated as $\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} p_u$, where $\mu$ is the overall mean rating, $b_u$ and $b_i$ are user and item biases, and $q_i$ and $p_u$ are item and user factor vectors.

SVD++ extends SVD to incorporate implicit feedback (e.g., which items a user has rated) in addition to explicit ratings. The prediction formula for SVD++ is:

$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} \left( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$, where $N(u)$ represents the set of items rated by user $u$, and $y_j$ are item factors that capture implicit feedback.

PMF (Probabilistic Matrix Factorization) is a model-based technique that assumes ratings are generated from a Gaussian (normal) distribution. So it factorizes the user-item matrix R into two lower-dimensional matrices: U (user factors) and V (item factors). PMF is particularly effective for large, sparse datasets and scales linearly with the number of observations.

NMF (Non-negative Matrix Factorization) factorizes a non-negative matrix V into two non-negative matrices W and H:

$V \approx W H^{T}$, where V is the user-item rating matrix, W represents user factors, and H represents item factors. The non-negativity constraint in NMF often leads to more interpretable and sparse decompositions compared to other techniques. Key advantages of NMF include:

  1. Reduced prediction errors compared to techniques like SVD when non-negativity is imposed
  2. Ability to work with compressed dimensional models, speeding up clustering and data organization
  3. Automatic extraction of sparse and significant features from non-negative data vectors

These matrix factorization algorithms have proven to be effective in capturing latent factors and similarities between users and items, making them powerful tools for building recommender systems. The choice of algorithm depends on the specific requirements of the application, such as dataset characteristics, computational resources, and desired interpretability of the results.

To evaluate the performance of regression models and recommender systems using Singular Value Decomposition (SVD), compute error metrics on each cross-validation fold:

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).
            Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
RMSE        0.9311  0.9370  0.9320  0.9317  0.9391  0.9342  0.0032
MAE         0.7350  0.7375  0.7341  0.7342  0.7375  0.7357  0.0015
Fit time    6.53    7.11    7.23    7.15    3.99    6.40    1.23
Test time   0.26    0.26    0.25    0.15    0.13    0.21    0.06

Lower RMSE and MAE values indicate better predictive accuracy.
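As a quick arithmetic check, the Mean and Std columns can be recomputed from the per-fold values. The Std column matches the population standard deviation (dividing by n) rather than the sample one (dividing by n - 1). A minimal sketch using only the standard library:

```python
import statistics

# Per-fold scores copied from the 5-fold table above
folds = {
    "RMSE": [0.9311, 0.9370, 0.9320, 0.9317, 0.9391],
    "MAE": [0.7350, 0.7375, 0.7341, 0.7342, 0.7375],
}

summary = {}
for name, scores in folds.items():
    mean = statistics.fmean(scores)   # arithmetic mean (Python 3.8+)
    std = statistics.pstdev(scores)   # population standard deviation: divides by n, not n - 1
    summary[name] = (round(mean, 4), round(std, 4))
    print(f"{name}: Mean {mean:.4f}  Std {std:.4f}")
```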

RMSE (Root Mean Square Error) is calculated as the square root of the average of squared differences between predicted and actual values. It gives higher weight to larger errors, making it more sensitive to outliers. The formula for RMSE is:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum (\text{predicted} - \text{actual})^2}$$

MAE (Mean Absolute Error) is the average of the absolute differences between predicted and actual values. It treats all errors equally, regardless of their magnitude. The formula for MAE is:

$$\text{MAE} = \frac{1}{n} \sum \lvert \text{predicted} - \text{actual} \rvert$$

RMSE is more sensitive to large errors, while MAE provides a more intuitive measure of average error magnitude.
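A minimal sketch of both formulas in plain Python. The two hypothetical prediction lists have the same total absolute error, so their MAE is identical, but the one with a single large miss gets double the RMSE:

```python
import math

def rmse(predicted, actual):
    """Root mean square error: square root of the mean of squared differences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

actual = [3, 3, 3, 3]
even_errors = [4, 2, 4, 2]     # four errors of size 1
one_big_error = [3, 3, 3, 7]   # one error of size 4

print(mae(even_errors, actual), rmse(even_errors, actual))        # 1.0 1.0
print(mae(one_big_error, actual), rmse(one_big_error, actual))    # 1.0 2.0
```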


GCP

https://gcloud.readthedocs.io/en/latest/storage-blobs.html

https://cloud.google.com/appengine/docs/standard/python/blobstore


OpenCV

A mobile app that recognizes your hand pattern to play Rock Paper Scissors Lizard Spock, using AI to guess what you will do next.

A macOS app that runs constantly to sound an alert if someone is looking over your shoulders.

https://learnopencv.com/blob-detection-using-opencv-python-c/

Scikit-Image

https://towardsdatascience.com/image-processing-with-python-blob-detection-using-scikit-image-5df9a8380ade

GIS

https://gsp.humboldt.edu/olm/Courses/GSP_318/11_B_91_Blob.html


String Handling

Regular Expressions

import re
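A minimal sketch of the re module; the sample text and date pattern are illustrative. findall() returns the capture groups of each match, and sub() can reorder those groups with backreferences:

```python
import re

text = "Released 2022-02-23, updated 2024-06-01."
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")   # groups: year, month, day

dates = date_pattern.findall(text)                # one (year, month, day) tuple per match
us_style = date_pattern.sub(r"\2/\3/\1", text)    # rewrite each match as MM/DD/YYYY
print(dates)
print(us_style)
```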

Handle Strings safely

Python has four different ways to format strings.

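Formatting a (potentially malicious) user-supplied format string with str.format() can be exploited, because its replacement fields can reach into the attributes of the objects passed in. The string.Template class only substitutes $-named placeholders: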

from string import Template
greeting_template = Template("Hello World, my name is $name.")
greeting = greeting_template.substitute(name="Hayley")

So use an approach that is less flexible with types and cannot reach into object attributes.
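A minimal sketch contrasting the two, assuming a hypothetical module-level SECRET_KEY and Event class: a hostile format string passed to str.format() reads the secret by walking from the object to its module globals, while string.Template only fills in $-named placeholders. safe_substitute() also leaves unknown placeholders untouched instead of raising KeyError:

```python
from string import Template

SECRET_KEY = "not-for-users"      # hypothetical secret held in module globals

class Event:
    def __init__(self, name):
        self.name = name

event = Event("launch")

# Hostile user-supplied format string: attribute and index lookups reach the globals
hostile = "{event.__init__.__globals__[SECRET_KEY]}"
leaked = hostile.format(event=event)

# Template performs no attribute or index lookups, only $name substitution
greeting = Template("Event: $name, owner: $owner").safe_substitute(name=event.name)
print(leaked)
print(greeting)
```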

Data Types

In Python 2, the int type was limited to sys.maxint (2^63 - 1 on 64-bit builds); larger results were automatically promoted to the separate long type.

Python 3 merged int and long into a single int type with no explicitly defined limit; the memory available on the machine Python runs on forms the practical limit.

0xa5 (the 0x prefix followed by two hexadecimal digits) represents a hexadecimal number, 165 in decimal.

3.2e-12 expresses a floating-point constant in exponential (scientific) notation: 3.2 × 10^-12.
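A minimal sketch of these points. Note that sys.maxsize bounds container sizes and indexes, not the size of an int:

```python
import sys

big = 2 ** 100                  # exact integer, far beyond any 64-bit value
print(big > sys.maxsize)        # True
print(type(big))                # <class 'int'>: no separate long type in Python 3

print(0xa5, 0o17, 0b101)        # 165 15 5 (hexadecimal, octal, binary literals)
print(hex(165))                 # 0xa5
print(type(3.2e-12))            # <class 'float'>
```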

https://docs.python.org/3/tutorial/introduction.html#lists

list methods

Slicing strings

Because Python 3 strings are sequences of Unicode code points, slicing works the same for other alphabets such as the Cyrillic (Russian) character set. To return just the first 3 characters of a string:

letters = "abcdef"
first_part = letters[:3]
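A minimal sketch with a Cyrillic word: the slice counts characters (code points), even though each of these letters takes two bytes once encoded as UTF-8:

```python
word = "привет"                             # Russian for "hello": 6 characters
first_part = word[:3]                       # 'при'
byte_count = len(word.encode("utf-8"))      # 12: 2 bytes per Cyrillic letter in UTF-8
print(first_part, len(word), byte_count)
```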
   

Unicode Superscript & Subscript characters

# Specify Unicode characters:
# superscript
print("x\u00b2 + y\u00b2 = 2")  # x² + y² = 2
 
# subscript
print(u'H\u2082SO\u2084')  # H₂SO₄

Superscript

# super-sub-script.py converts to superscript:
def conv_superscript(x):
    normal = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-=()"
    super_s = "ᴬᴮᶜᴰᴱᶠᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴾᴿˢᵀᵁⱽᵂˣʸᶻᵃᵇᶜᵈᵉᶠᵍʰᶦʲᵏˡᵐⁿᵒᵖ۹ʳˢᵗᵘᵛʷˣʸᶻ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾"
    res = x.maketrans(''.join(normal), ''.join(super_s))
    return x.translate(res)
 
print(conv_superscript('Convert all this2'))
# Or you can simply copy the text

Functions

Internationalization & Localization (I18N & L10N)

Internationalization, aka i18n for the 18 characters between i and n, is the process of adapting code to support various linguistic and cultural settings:

  1. Install

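    The gettext module ships with Python's standard library, so no pip install is needed. (The GNU gettext command-line tools, such as msgfmt, can be installed on macOS with brew install gettext.)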

    NOTE: pip is a recursive acronym that stands for either “Pip Installs Packages” or “Pip Installs Python”.

  2. Create a folder for each locale in the ./locale folder.

  3. Use Lokalise utility to manage translations through a GUI. It also has a CLI tool to automate the process of managing translations. https://lokalise.com/blog/lokalise-apiv2-in-practice/

    locale/
    ├── el
    │   └── LC_MESSAGES
    │       └── base.po
    └── en
        └── LC_MESSAGES
            └── base.po
    
  4. Add the library

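    import gettext
    # Set the locale directory
    localedir = './locale'
    # Look for ./locale/<language>/LC_MESSAGES/base.mo (compiled from base.po with msgfmt);
    # fallback=True returns untranslated strings when no catalog is found
    translate = gettext.translation('base', localedir, fallback=True)
    _ = translate.gettext
    # Translate message
    print(_("Hello World"))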
    

    See https://phrase.com/blog/posts/translate-python-gnu-gettext/

  5. Store the master list of translatable messages in a Portable Object Template (POT) file; translators copy it into a .po file for each locale and fill in each msgstr:

    #: src/main.py:12
    msgid "Hello World"
    msgstr "Translation in different language"
    
    >>> unicode_string = u"Fu\u00dfb\u00e4lle"
    >>> print(unicode_string)
    Fußbälle
    >>> type(unicode_string)
    <type 'unicode'>
    >>> utf8_string = unicode_string.encode("utf-8")
    >>> utf8_string
    'Fu\xc3\x9fb\xc3\xa4lle'
    >>> type(utf8_string)
    <type 'str'>
    
# ALTERNATIVE: TODO: http://babel.pocoo.org/en/latest/numbers.html
#from babel import numbers
# numbers.format_decimal(.2345, locale='en_US')
# Internationalization: http://babel.pocoo.org/en/latest/dates.html
# Requires: pip install Babel
# from babel import Locale
# NOTE: Babel generally recommends storing time in naive datetime, and treat them as UTC.
# from babel.dates import format_date, format_datetime, format_time
# d = date(2007, 4, 1)
# format_date(d, locale='en')     # u'Apr 1, 2007'
# format_date(d, locale='de_DE')  # u'01.04.2007'
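The Python 2 session above distinguishes unicode and str; in Python 3 the equivalent types are str (code points) and bytes. A minimal sketch:

```python
text = "Fu\u00dfb\u00e4lle"             # str holds code points: 'Fußbälle'
utf8_bytes = text.encode("utf-8")        # bytes: b'Fu\xc3\x9fb\xc3\xa4lle'
round_trip = utf8_bytes.decode("utf-8")  # back to the original str

print(text, type(text))                  # Fußbälle <class 'str'>
print(utf8_bytes, type(utf8_bytes))      # b'Fu\xc3\x9fb\xc3\xa4lle' <class 'bytes'>
```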

Switch language in browsers

Ensure that your program works correctly when another human language (such as “es” for Spanish, “ko” for Korean, “de” for German, etc.) is configured by the user:

A. English was selected in browser’s Preferences, but the app displays another language.

B. Another language was selected in browser’s preferences, and the app displays that language.

To simulate selecting another language in the browser’s Preferences in Firefox:

FirefoxOptions options = new FirefoxOptions();
options.addPreference("intl.accept_languages", language);
driver = new FirefoxDriver(options);

Alternately, in Chrome:

HashMap<String, Object> chromePrefs = new HashMap<String, Object>();
chromePrefs.put("intl.accept_languages", language);
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("prefs", chromePrefs);
driver = new ChromeDriver(options);

Version management

  1. To create a requirements.txt file listing the exact versions currently installed in your environment:

    pip freeze > requirements.txt
    
  2. Identify whether CVEs have been filed against each module in requirements.txt:

    sbom ???
    

If you’re writing a library that you intend to distribute and use in many places (or to be used by many people), the standard approach is to write a setup.py package manifest, and in the install_requires argument of setup() declare your dependencies. You should declare only direct dependencies, and declare the range of versions your library is compatible with.

If you’ve built something that you want to deploy, or otherwise reproduce as an environment somewhere else, the standard approach is to create a requirements file containing the full (direct and transitive) dependency tree, pinned to exact versions, with package hashes included. You can do this by writing a script that strings together several pip commands, or by using the pre-made “pip-compile” script from the pip-tools project.

This pyproject.toml file will work with modern versions of setuptools (61.0 and above). It replaces the need for a separate setup.py or setup.cfg file in many cases. However, if you need more complex build configurations or have custom build steps, you may still need to use a setup.py file alongside pyproject.toml.

Remember to adjust the content according to your specific project requirements. The pyproject.toml file is designed to be human-readable and writable, making it easier to manage your project’s metadata and build configuration.

PROTIP: I’ve found Poetry to be difficult to debug https://install.python-poetry.org:

   brew install poetry
  1. Verify:

    poetry --version
    

    Expected response like:

    Poetry (version 1.8.3)
    
  2. Initialize to be prompted to create a pyproject.toml file:

    poetry init
    
  3. Add a dependency (Poetry records it in pyproject.toml and poetry.lock), then update it:

    poetry add requests --no-interaction
    poetry update requests
    
  4. Export the locked dependencies to a requirements.txt file:

    poetry export -f requirements.txt --output requirements.txt
    

    Poetry writes its own [build-system] table (requiring poetry-core) into pyproject.toml instead of:

    [build-system]
    requires = ["setuptools>=61.0"]
    build-backend = "setuptools.build_meta"
    

Excel handling using Dictionary object

Alternately, the XlsxWriter library for working with Excel spreadsheets provides utility functions that translate between Excel cell addresses (such as “A1”) and zero-based Python (row, column) tuples:

from xlsxwriter.utility import xl_rowcol_to_cell, xl_cell_to_rowcol, xl_col_to_name

cell = xl_rowcol_to_cell(0, 0, row_abs=True, col_abs=True)  # $A$1
(row, col) = xl_cell_to_rowcol('A1')    # (0, 0)
column = xl_col_to_name(1, True)   # $B

However, if you want to avoid adding a dependency, this function uses a dictionary to convert Excel column letters (up to three, such as “XFD”) to a 1-based column number:

def letter_to_number(letters):
    letters = letters.lower()
    dictionary = {'a':1,'b':2,'c':3,'d':4,'e':5,'f':6,'g':7,'h':8,'i':9,'j':10,'k':11,'l':12,'m':13,'n':14,'o':15,'p':16,'q':17,'r':18,'s':19,'t':20,'u':21,'v':22,'w':23,'x':24,'y':25,'z':26}
    strlen = len(letters)
    if strlen == 1:
        number = dictionary[letters]
    elif strlen == 2:
        first_letter = letters[0]
        first_number = dictionary[first_letter]
        second_letter = letters[1]
        second_number = dictionary[second_letter]
        number = (first_number * 26) + second_number
    elif strlen == 3:
        first_letter = letters[0]
        first_number = dictionary[first_letter]
        second_letter = letters[1]
        second_number = dictionary[second_letter]
        third_letter = letters[2]
        third_number = dictionary[third_letter]
        number = (first_number * 26 * 26) + (second_number * 26) + third_number
    return number

REMEMBER: Square brackets look up a dictionary value by its key, as in dictionary[letters].

Instead of defining a dictionary, you can use a property of the ASCII character set: the Latin alphabet starts at code point 65 for “A” and at code point 97 for “a”, which the ordinal function ord() returns:

ord('a')  # returns 97
ord('A')  # returns 65

chr() does the reverse; this returns ‘a’:

chr(97)
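A minimal sketch of the ord()-based approach, generalized to any number of letters by treating column names as base-26 numbers with digits A=1 through Z=26. The function names are illustrative; number_to_column() uses chr() for the reverse conversion:

```python
def column_to_number(letters):
    """Convert Excel column letters ('A', 'AZ', 'XFD') to a 1-based number."""
    number = 0
    for letter in letters.upper():
        number = number * 26 + (ord(letter) - ord("A") + 1)   # 'A' -> 1 ... 'Z' -> 26
    return number

def number_to_column(number):
    """Convert a 1-based column number back to Excel column letters."""
    letters = ""
    while number > 0:
        # Subtract 1 first because this base-26 system has no zero digit
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

print(column_to_number("XFD"), number_to_column(28))   # 16384 AB
```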

More dictionaries:

# Eastern European countries and their populations (stored as strings):
ee_countries = {"Ukraine": "43.7M", "Russia": "143.8M", "Poland": "38.1M", "Romania": "19.5M", "Bulgaria": "6.9M", "Hungary": "9.6M", "Moldova": "4.1M"}
float(ee_countries["Moldova"].rstrip("M"))  # 4.1
ee_countries.get("Moldova")   # '4.1M'
len(ee_countries.items())     # 7 key-value pairs
min(ee_countries.items())     # ('Bulgaria', '6.9M'): tuples compare by key, so this is the alphabetically first country
max(ee_countries.values())    # '9.6M': strings compare character by character, not numerically
max(ee_countries, key=lambda k: float(ee_countries[k].rstrip("M")))  # 'Russia': the most populous
max(ee_countries.keys())      # 'Ukraine': the alphabetically last key
sorted(ee_countries.keys(), reverse=True)  # ['Ukraine', 'Russia', 'Romania', 'Poland', 'Moldova', 'Hungary', 'Bulgaria']
 
ee_countries.pop("Estonia", None)  # remove if present; del ee_countries["Estonia"] would raise KeyError
ee_countries.pop("Bulgaria")       # remove by key and return its value
ee_countries["Latvia"] = "1.9M"
ee_countries.update([["Lithuania", "2.8M"], ["Belarus", "9.4M"]])
ee_countries.popitem()     # remove the item added last (Belarus)
len(ee_countries.items())  # 8
ee_countries["Bulgaria"] = "7M"
 
ee2 = ee_countries.copy()
ee_countries.clear()  # remove all
print(ee_countries)   # {} means empty

https://www.codesansar.com/python-programming-examples/sorting-dictionary-value.htm
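The linked article covers sorting a dictionary by value. A minimal sketch using a few of the population strings above: sorted() with a key function that converts each value to a number, then dict() to keep the sorted order (dicts preserve insertion order since Python 3.7):

```python
populations = {"Ukraine": "43.7M", "Russia": "143.8M", "Poland": "38.1M", "Romania": "19.5M"}

def millions(item):
    """Sort key: turn a (country, '43.7M') pair into the float 43.7."""
    return float(item[1].rstrip("M"))

by_population = dict(sorted(populations.items(), key=millions, reverse=True))
print(by_population)
```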

File open() modes

The Python runtime does not enforce type annotations introduced with Python version 3.5. But type checkers, IDEs, linters, SASTs, and other tools can benefit from the developer being more explicit.

Use a Literal type so that a type checker (such as mypy) warns you when a parameter is outside the allowed set:

from typing import Literal

MODE = Literal['r', 'rb', 'w', 'wb']

def open_helper(file: str, mode: MODE) -> str:
    ...

open_helper('/some/path', 'r')  # Passes type check
open_helper('/other/path', 'typo')  # Error in type checker

BTW Literal[…] was introduced with version 3.8 and is not enforced by the runtime (you can pass whatever string you want in our example).
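Since the runtime ignores Literal, a function that must reject bad values at run time can check against the same definition with typing.get_args() (Python 3.8+). A minimal sketch; the placeholder return value is hypothetical:

```python
from typing import Literal, get_args

MODE = Literal["r", "rb", "w", "wb"]

def open_helper(file: str, mode: MODE) -> str:
    # get_args(MODE) == ('r', 'rb', 'w', 'wb'), so one definition drives both checks
    if mode not in get_args(MODE):
        raise ValueError(f"mode must be one of {get_args(MODE)}, not {mode!r}")
    return f"{file} opened with mode {mode}"   # placeholder body for illustration

print(open_helper("/some/path", "r"))
```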

PROTIP: Be explicit about using text (vs. binary) mode.

with open("D:\\myfile.txt", "wt", encoding="utf-8") as myfile:  # explicit text mode and encoding
    myfile.write("Hello")

Character  Meaning
b          binary mode (text mode is the default)
t          text mode (default)
r          open for reading only (default)
+          open for updating (reading and writing)
w          open for writing only, truncating the file first
a          open for appending to the end of the file
a+         open for both appending and reading at the same time
x          open for exclusive creation, failing if the file already exists
U          universal newlines mode (deprecated; removed in Python 3.11)

myfile.write() returns the count of codepoints (characters in the string), not the number of bytes.

myfile.read(3) returns up to 3 characters (in text mode) or 3 bytes (in binary mode), not 3 lines.

myfile.readlines() returns a list where each element of the list is a line in the file.

myfile.truncate(12) resizes the file to its first 12 bytes and deletes the remainder of the file.

myfile.close() flushes any buffered writes and closes the file (a with block does this automatically).

myfile.tell() tells the current position of the cursor.
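A minimal sketch exercising these methods on a throwaway file in a temporary folder (the file name is illustrative). tell() is shown in binary mode, where it is a plain byte offset; in text mode its value is an opaque cookie:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with open(path, "wt", encoding="utf-8") as f:
        written = f.write("Hello\nWorld\n")     # 12: count of characters written

    with open(path, "rt", encoding="utf-8") as f:
        first3 = f.read(3)                      # 'Hel': up to 3 characters, not 3 lines
        rest = f.readlines()                    # ['lo\n', 'World\n']: one element per line

    with open(path, "rb") as f:
        f.read(3)
        position = f.tell()                     # 3: byte offset after reading 3 bytes

    with open(path, "r+b") as f:
        f.truncate(5)                           # keep only the first 5 bytes
        remaining = f.read()                    # b'Hello' (position is still 0)

print(written, first3, rest, position, remaining)
```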

File Copy commands

The shutil package provides fine-grained control for copying files:

This table summarizes the differences among shutil commands:

Command             Dest. can be dir  Copies metadata  Preserves permissions  Accepts file object
shutil.copyfile     -                 -                -                      -
shutil.copyfileobj  -                 -                -                      Yes
shutil.copy         Yes               -                Yes                    -
shutil.copy2        Yes               Yes              Yes                    -

See https://docs.python.org/3/library/filesys.html

File Metadata

Metadata includes last-modified and last-accessed times (mtime and atime), which the file system maintains for each file.

For all commands, if the destination location is not writable, an OSError (IOError is an alias of OSError in Python 3), typically PermissionError, is raised.

Notice that shutil.copyfile and shutil.copyfileobj do not copy permissions from the source file. Folder-level copy commands such as shutil.copytree carry over permissions (copytree uses shutil.copy2 by default).

CAUTION: folder-level copy commands do not buffer.
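A minimal sketch of two rows of the table, run in a temporary folder with illustrative file names: shutil.copy accepts a directory as the destination but gives the copy a fresh modification time, while shutil.copy2 also carries over the metadata (a fixed mtime is set first so the comparison is exact):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("data")
    os.utime(src, (1_000_000_000, 1_000_000_000))   # set atime and mtime to a fixed timestamp

    dest_dir = os.path.join(tmp, "backup")
    os.mkdir(dest_dir)

    copied = shutil.copy(src, dest_dir)              # returns dest_dir/source.txt
    copied2 = shutil.copy2(src, os.path.join(dest_dir, "source2.txt"))

    copy_mtime = os.stat(copied).st_mtime            # time of the copy, not the source's
    copy2_mtime = os.stat(copied2).st_mtime          # same as the source's mtime
    copied_name = os.path.basename(copied)

print(copied_name, copy_mtime != copy2_mtime)
```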

Error Exception handling

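Handle an OSError (such as FileNotFoundError when the path does not exist, or FileExistsError when the folder is already there), and clean up no matter what:

# Create a folder inside another folder, restoring the working directory afterward:
import os
import sys
 
def make_at(path, dir_name):
    original_path = os.getcwd()
    try:
        os.chdir(path)
        os.mkdir(dir_name)
    except OSError as e:
        print(e, file=sys.stderr)
        raise
    finally:  # clean-up no matter what:
        os.chdir(original_path)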
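To handle only the file-not-found case, catch the narrower FileNotFoundError, a subclass of OSError. A minimal sketch; read_or_default is a hypothetical helper name:

```python
import os
import tempfile

def read_or_default(path, default=""):
    """Return the file's text, or default when the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:      # other OSErrors (e.g. PermissionError) still propagate
        return default

with tempfile.TemporaryDirectory() as tmp:
    result = read_or_default(os.path.join(tmp, "missing.txt"), default="n/a")

print(result)   # n/a
```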

Operating system

There are platform-specific modules:

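To determine which operating system the code runs on (for example, to decide how to wait for a keypress), use sys.platform (such as 'linux', 'darwin', or 'win32'), which has finer granularity than os.name ('posix' or 'nt'). For uname-style details, use platform.uname().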
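A minimal sketch that maps sys.platform to a coarse family name; the family labels are illustrative, not a standard:

```python
import os
import platform
import sys

def os_family():
    """Map sys.platform to a coarse operating system family."""
    if sys.platform.startswith("win"):
        return "windows"
    if sys.platform == "darwin":
        return "macos"
    if sys.platform.startswith("linux"):
        return "linux"
    return "other"

print(os.name, sys.platform, os_family())
print(platform.uname().system)   # e.g. 'Linux', 'Darwin', or 'Windows'
```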

psutil (https://github.com/giampaolo/psutil, formerly at http://code.google.com/p/psutil/) provides more in-depth process and system information.

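PROTIP: This is an example of Python code issuing a Linux operating system command:

import subprocess
result = subprocess.run(["which", "python3"], capture_output=True, text=True)
if "venv" not in result.stdout:
    print("Not executed from a venv")  # something when not executed from venv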
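An alternative that needs no external command and does not depend on the folder being named "venv": inside an environment created by the standard library venv module, sys.prefix points at the environment while sys.base_prefix points at the base installation. A minimal sketch:

```python
import sys

def in_virtualenv():
    """True when running inside a venv, whose sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix

if not in_virtualenv():
    print("Not executed from a venv")
```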

SECURITY PROTIP: Avoid using the built-in Python function eval() to execute a string. There are no controls on that operation, so malicious code can run without limits in the context of the user that loaded the interpreter (really dangerous).
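When the goal is only to turn a string into a Python value, ast.literal_eval() is a safer substitute: it accepts literals (numbers, strings, lists, dicts, tuples, sets, booleans, None) and raises ValueError for anything else, such as a function call. A minimal sketch:

```python
import ast

user_input = "[1, 2, {'debug': False}]"
data = ast.literal_eval(user_input)     # parsed, not executed

rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")   # a call: refused instead of run
except ValueError:
    rejected = True

print(data, rejected)
```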

Command generator

Create custom CLI commands by parsing a command help text into cli code that implements it.

Brilliant.

See docopt from https://github.com/docopt/docopt described at http://docopt.org

CLI code enhancement

Python’s built-in mechanism for coding command-line menus, etc. is difficult to understand. So some have offered alternatives:

Handling Arguments

For parsing the command-line arguments and options/flags supplied when invoking a Python program:

The argparse module has been in the standard library since Python 3.2 (and 2.7), alongside the older optparse module, but many find it difficult to understand and limited in functionality.
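For comparison with the alternatives below, a minimal argparse sketch. The program, its arguments, and the defaults are hypothetical; passing a list to parse_args() instead of reading sys.argv keeps the example testable:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Copy files (illustrative CLI).")
    parser.add_argument("source", help="file to copy")                    # positional argument
    parser.add_argument("-n", "--dry-run", action="store_true",
                        help="only show what would happen")               # flag: True when present
    parser.add_argument("--retries", type=int, default=3,
                        help="number of attempts")                        # converted to int
    return parser

args = build_parser().parse_args(["notes.txt", "--dry-run", "--retries", "5"])
print(args.source, args.dry_run, args.retries)   # notes.txt True 5
```

argparse derives the attribute name dry_run from --dry-run and generates -h/--help output from the help strings.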

https://www.geeksforgeeks.org/argparse-vs-docopt-vs-click-comparing-python-command-line-parsing-libraries/

Alternatives to argparse include Docopt, Click, Client, argh, and many more.

Instead, Dan Bader recommends the Click package from Armin Ronacher (see click.pocoo.org/6/why).

Click is a “Command Line Interface Creation Kit” that supports arbitrary nesting of commands and automatic help page generation. It supports lazy loading of subcommands at runtime. It comes with common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.).

Click provides decorators, which make the code very easy to read.

The @click.command() decorator turns a function into a command-line command:

# cli.py
import click
 
@click.command()
def main():
    print("I'm a beautiful CLI ✨")
 
if __name__ == "__main__":
    main()
   

Python in the Cloud

On AWS:

Tutorials:

import boto3
s3_client = boto3.client('s3', region_name='eu-west-1')  # region must match the LocationConstraint below
s3_client.create_bucket(Bucket="johnny-chivers-test-1-boto", CreateBucketConfiguration={'LocationConstraint':'eu-west-1'})
response = s3_client.list_buckets()
print(response)

On Azure:

  1. https://portal.azure.com/
  2. Sign in
  3. https://portal.azure.com/#view/Microsoft_Azure_Billing/SubscriptionsBlade
  4. https://aka.ms/azsdk/python/all lists available packages.

    pip install azure has been deprecated from https://github.com/Azure/azure-sdk-for-python/pulls

    New Program Authorization

    PROTIP: Each Azure service authenticates differently.

  5. Install Azure CLI for MacOS:

    brew install azure-cli

    https://www.cbtnuggets.com/it-training/skills/python3-azure-python-sdk by Michael Levan https://www.youtube.com/watch?v=we1pcMRQwD8

    from azure.cli.core import get_default_cli as azcli
    # Instead of > az vm list -g Dev2
    azcli().invoke(['vm','list','-g', 'Dev2'])
    




Using Digital Blueprints with Terraform and Microsoft Azure

Sets: Day of week Set handling

set([3,2,3,1,5]) # {1, 2, 3, 5}: duplicates are removed; sets are unordered, so don't rely on the display order

day_of_week_en = ["Sun","Mon","Tue","Wed","Thu","Fri","Sat"]
day_of_week_en.append("Luv")
days_in_week=len(day_of_week_en)
print(f"{days_in_week} days a week" )
print(day_of_week_en)
 
for index, day in enumerate(day_of_week_en):
    print("{0}={1}".format(day, index))
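The code above stores the days in a list; a set is the better fit for membership tests and set algebra. A minimal sketch:

```python
days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"}
weekend = {"Sat", "Sun"}

workdays = days - weekend          # difference
print(sorted(workdays))            # sort for stable output, since sets are unordered
print("Luv" in days)               # False: membership test is a hash lookup
print(weekend <= days)             # True: subset test
print(weekend & {"Sun", "Mon"})    # {'Sun'}: intersection
```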

Lists

Use a list instead for a collection of similar objects.

Prefix a list with an asterisk (*) to unpack it into separate arguments, so print() adds a space between each value.

li = [10, 20, 30, 40, 50]
# Alternatively, read space-separated integers typed on standard input:
# li = list(map(int, input().split()))
print(*li)  # 10 20 30 40 50

Tuples

A function returns a single object. So to pass multiple values of various types to or from a function, use a tuple - a fixed-size collection of related items (akin to a “struct” in C or a “record” in Java).

PROTIP: When a tuple holds a single value, include a comma at the end; without it, the parentheses only group an expression, so (50) is just the integer 50:

  1. REMEMBER: The trailing comma, not the parentheses, is what makes a single-value tuple:

    mytuple=(50,)
    type(mytuple)
    
    <class 'tuple'>
  2. Store several items in a single variable:

    person = ('john', 'doe', 40)
    (a, b, c) = person
    person
    a
    person[0::2]  # every 2nd item, starting from the first = ('john', 40)
    person.index(40)  # index of item containing 40 = 2
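A minimal sketch of returning several values from a function as one tuple and unpacking them; min_max is a hypothetical helper:

```python
def min_max(values):
    """Return two results at once, packed into a tuple by the comma."""
    return min(values), max(values)

low, high = min_max([3, 1, 4, 1, 5])   # unpack into two names
pair = min_max([3, 1, 4, 1, 5])        # or keep the tuple itself
print(low, high, pair, type(pair))     # 1 5 (1, 5) <class 'tuple'>
```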
    

Range

range object and property-based unit testing.

myrange=range(3)
type(myrange)   # <class 'range'>
myrange  # range(0, 3)
print(myrange)  # range(0, 3)
list(myrange)   # [0, 1, 2] from zero
myrange=range(1,5)
list(myrange)   # [1, 2, 3, 4] # excluding 5!
myrange=range(3,15,2)
list(myrange)         # [3, 5, 7, 9, 11, 13]  # step by 2
list(myrange)[2]      # 7
print( range(5,15,4)[::-1] )  # range(13, 1, -4)

List comprehension

squares = [x * x for x in range(10)]

would output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Classes and Objects

Encapsulation is a software design practice of bundling the data and the methods that operate on that data.

Methods encode behavior (programmed logic) of an object and are represented by functions.

Attributes encode the state of an object and are represented by variables.

MNEMONIC: Python looks up names in LEGB scope order: Local, Enclosing, Global, Built-in.
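A minimal sketch of LEGB lookup; the function names are illustrative. Each function resolves x (or len) at the first scope, in L, E, G, B order, that defines it:

```python
x = "global"                  # G: module-level name

def outer():
    x = "enclosing"           # E: local to outer, enclosing for the nested functions

    def inner():
        x = "local"           # L: local to inner
        return x

    def inner_no_local():
        return x              # no local x, so Python finds the enclosing one

    return inner(), inner_no_local()

def uses_global():
    return x                  # no local or enclosing x: the global is found

def uses_builtin():
    return len("abc")         # len is found in the built-in scope

results = outer() + (uses_global(), uses_builtin())
print(results)                # ('local', 'enclosing', 'global', 3)
```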

Metaclasses

metaclasses: 18:50

metaclasses(explained): 40:40

Decorators

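A decorator is the line starting with “@” placed before a function definition.

Decorators allow changes in behavior without changing the code of the decorated function.

Decorators take advantage of Python functions being objects created at run time, which can be passed to, wrapped by, and returned from other functions.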
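A minimal sketch of a decorator; count_calls and greet are hypothetical names. functools.wraps copies the wrapped function's name and docstring onto the wrapper:

```python
import functools

def count_calls(func):
    """Decorator that counts how many times func is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # extra behavior added around the original call
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls                          # same as: greet = count_calls(greet)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}"

greet("Ada")
greet("Grace")
print(greet.calls, greet.__name__)    # 2 greet
```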

There are limitations, though.

By default, methods defined within a class receive the instance (“self”) as the first parameter.

VIDEO: However, the decorator @classmethod makes the class (“cls”) the first argument instead:

The @classmethod is used for access to the class object to call other class methods or the constructor.

There is also @staticmethod when access is not needed to class or instance objects.
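A minimal sketch of all three kinds of method in one hypothetical Temperature class; the classmethod works as an alternate constructor because cls is the class itself:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius                 # instance method: receives the instance as self

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        return cls((fahrenheit - 32) * 5 / 9)  # cls is the class, so this calls the constructor

    @staticmethod
    def is_freezing(celsius):
        return celsius <= 0                    # needs neither the class nor an instance

t = Temperature.from_fahrenheit(212)
print(t.celsius, Temperature.is_freezing(-1))  # 100.0 True
```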


Protocols

Generators

generator: 1:04:30

dunders with Context Manager

“For repetitive set up and tear down, use Context Managers”. -VIDEO by Doug Mercer

When a resource such as a file or a network client is opened in Python code, it must be closed as well. A context manager is a language feature of Python that runs set-up code when you enter the with block and tear-down code when you exit it, even if an exception occurs.

with open("myfile.txt", "r") as f:
    contents = f.read()
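A minimal sketch of a custom context manager: a hypothetical Timer class whose __enter__ does the set-up and whose __exit__ does the tear-down, which runs even when the block raises:

```python
import time

class Timer:
    """Measure the wall-clock time spent inside a with block."""

    def __enter__(self):
        self.start = time.perf_counter()       # set-up: runs on entering the block
        return self                            # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start   # tear-down: always runs
        return False                           # False: do not suppress exceptions

with Timer() as t:
    total = sum(range(1_000))

print(total, t.elapsed >= 0)
```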

Special methods have double underscores (“dunders”) before and after each name: __enter__, __exit__,
__init__, __repr__, __len__, __hash__, __add__, __sub__,
__and__, __reversed__, __contains__, __format__, __iter__, __call__.

Magic methods __getitem__, __len__, etc. make your code look like it’s part of the library.

Make it Iterable.
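A minimal sketch of a hypothetical Playlist class that becomes iterable and list-like by defining a few dunder methods, so built-ins such as len(), in, indexing, and for loops work on it:

```python
class Playlist:
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):                 # len(p)
        return len(self._songs)

    def __getitem__(self, index):      # p[0], p[-1], p[1:]
        return self._songs[index]

    def __iter__(self):                # for song in p
        return iter(self._songs)

    def __contains__(self, song):      # "b" in p
        return song in self._songs

    def __repr__(self):                # shown in the REPL and by repr()
        return f"Playlist({self._songs!r})"

p = Playlist(["a", "b", "c"])
print(len(p), p[0], "b" in p, list(p), p)
```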

context manager: 1:22:37


https://www.codementor.io/alibabacloud/how-to-create-and-deploy-a-pre-trained-word2vec-deep-learning-rest-api-oekpbfqpj


Secure coding

https://snyk.io/blog/python-security-best-practices-cheat-sheet/

  1. Always sanitize external data

  2. Scan your code

  3. Be careful when downloading packages

  4. Review your dependency licenses

  5. Do not use the system standard version of Python

  6. Use Python’s capability for virtual environments

  7. Set DEBUG = False in production

  8. Be careful with string formatting

  9. (De)serialize very cautiously

  10. Use Python type annotations

Insecure code in Pygoat

https://awesomeopensource.com/project/guardrailsio/awesome-python-security

https://github.com/mpirnat/lets-be-bad-guys from 2017

https://github.com/fportantier/vulpy from 2020 in Brazil

OWASP’s PyGoat is written using Python with the Django web framework. Its code intentionally contains both traditional web application vulnerabilities (e.g. XSS, SQLi) and OWASP vulnerabilities. The OWASP Top 10 (2017 edition) are:

• A1:2017-Injection
• A2:2017-Broken Authentication
• A3:2017-Sensitive Data Exposure
• A4:2017-XML External Entities (XXE)
• A5:2017-Broken Access Control
• A6:2017-Security Misconfiguration
• A7:2017-Cross-Site Scripting (XSS)
• A8:2017-Insecure Deserialization
• A9:2017-Using Components with Known Vulnerabilities
• A10:2017-Insufficient Logging & Monitoring

Instructions at https://github.com/adeyosemanputra/pygoat

  1. Obtain the Docker image:

    docker pull pygoat/pygoat
    docker run --rm -p 8000:8000 pygoat/pygoat
    
    Watching for file changes with StatReloader
    Performing system checks...
     
    System check identified no issues (0 silenced).
    November 05, 2021 - 14:57:11
    Django version 3.0.14, using settings 'pygoat.settings'
    Starting development server at http://127.0.0.1:8000/
    Quit the server with CONTROL-C.
    
  2. In a browser, open localhost:

    http://127.0.0.1:8000
    

To learn how to code securely, PyGoat has an area where you can see the source code to determine where the mistake was made that caused the vulnerability and allows you to make changes to secure it.

https://owasp.org/www-pdf-archive/OWASP-AppSecEU08-Petukhov.pdf

https://rules.sonarsource.com/python/tag/owasp/RSPEC-4529 3400+ static analysis rules across 27 programming languages

Logging for Monitoring

It is estimated that it can take up to 200 days, and often longer, between an attack and its detection by the victim. In the meantime, attackers can tamper with servers, corrupt databases, and steal confidential information.

“Insufficient Logging and Monitoring” is among the OWASP Top 10 (A10:2017).

The vulnerability includes ineffective integration of security systems, which give attackers a way to pivot to other parts of the system to maintain persistent threats.

Prevent that by emitting a log entry for each activity such as: add, change/update, delete.

Use the Python logging module:

import logging

To emit each log entry, use the logging function for its level so that logs can be filtered by level. In order of severity:

logging.critical("CRITICAL - Can't ... Aborting!") # A serious error. The program itself may be unable to continue running. Displayed even in production runs.
logging.error("ERROR - Program cannot do it!") # A serious problem: the software has not been able to perform some function. Displayed even in production runs.
logging.warning("WARNING - unexpected!")  # The software is still working as expected, but there may be a problem in the near future (e.g. ‘disk space low’).
logging.info("INFO - version xxx")  # Provides confirmation that things are working as expected.
logging.debug('DEBUG - detailed information such as each iteration in a loop used during troubleshooting at the lowest level of detail.')
   

At run-time, specify the minimum severity level to display during that run (the program must parse this option itself):

python3 pylogging.py --log=INFO
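A minimal sketch of how a script such as the pylogging.py above might implement --log, following the pattern in the Python logging HOWTO (look up the numeric level with getattr on the logging module). The function name configure_logging is hypothetical, and force=True (Python 3.8+) replaces any handlers already configured:

```python
import argparse
import logging

def configure_logging(argv=None):
    """Parse --log=LEVEL and configure the root logger; return the numeric level."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log", default="WARNING",
                        help="minimum level to display, such as DEBUG or INFO")
    args = parser.parse_args(argv)
    numeric_level = getattr(logging, args.log.upper(), None)   # "info" -> logging.INFO (20)
    if not isinstance(numeric_level, int):
        raise ValueError(f"Invalid log level: {args.log}")
    logging.basicConfig(level=numeric_level, force=True)
    return numeric_level

configure_logging(["--log=INFO"])
logging.info("INFO - shown because the level is INFO")
logging.debug("DEBUG - hidden at INFO level")
```

In a real script, calling configure_logging() with no argument reads the options from sys.argv.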
   

With the default level, CRITICAL (alias FATAL), ERROR, and WARNING messages are shown.

WARNING (alias WARN) is the default level of the root logger. To change it, pass level= to logging.basicConfig().

Also, provide a run-time option for outputting to a file:

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
   

CAUTION: Be careful to not disclose sensitive information in logs. Encrypt plaintext.

The logging module also allows you to capture the full stack traces in an application.

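By common CLI convention (which your program implements itself, for example with argparse), -q (for --quiet) suppresses INFO headings.

-v (for --verbose) displays DEBUG messages.

-vv displays TRACE messages (Python's logging module has no built-in TRACE level, so the program must define one).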

Use assert only during testing

PROTIP: By default, Python runs with the built-in constant __debug__ set to True, so assert statements are executed. But in production, when the program runs in optimized mode (python -O), __debug__ is False and assert statements are removed.

So avoid code like the sample below, which uses assert as an access check:

def get_clients(user):
    assert is_superuser(user), "user is not a member of superuser group"
    return db.lookup('clients')

In the above code, when run with -O the assert is removed, so any user gets access to the resource without proper authorization checks.

Instead (to remediate), use an if statement that raises an exception when the check fails.
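A minimal sketch of the remediated get_clients(). The helpers are hypothetical stand-ins: is_superuser checks a "groups" list in a dict, and the lookup parameter replaces db.lookup('clients'). Unlike assert, the if statement still runs under python -O:

```python
class PermissionDenied(Exception):
    """Raised when the caller lacks the required role."""

def is_superuser(user):                      # hypothetical helper for illustration
    return "superuser" in user.get("groups", [])

def get_clients(user, lookup=lambda: ["client-a", "client-b"]):   # lookup stands in for db.lookup
    if not is_superuser(user):               # enforced even in optimized mode
        raise PermissionDenied("user is not a member of superuser group")
    return lookup()

print(get_clients({"groups": ["superuser"]}))
```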

https://app.pluralsight.com/library/courses/using-unit-testing-python/table-of-contents

VIDEO: Use the hypothesis library

Concurrency Programming

https://app.pluralsight.com/library/courses/python-concurrency-getting-started

Bit-wise operators

https://app.pluralsight.com/course-player?clipId=5802d30b-69a9-4679-8594-53854739368a

https://techstudyslack.com/ a Slack for people studying tech

Steganography

https://packetstormsecurity.com/files/165102/Stegano-0.10.1.html Stegano implements two methods of hiding: using the red portion of a pixel to hide ASCII messages, and using the Least Significant Bit (LSB) technique. It is possible to use a more advanced LSB method based on integer sets. The sets (Sieve of Eratosthenes, Fermat, Carmichael numbers, etc.) are used to select the pixels used to hide the information.

Parallel Computing

Multithreading, Multiprocessing, Concurrency & Parallel programming in Python for high performance.

Use multiple threads, processes, mutexes, barriers, waitgroups, queues, pipes, condition variables, deadlocks, and more.

https://www.udemy.com/course/parallel-computing-in-python/

On LinkedIn Learning: “Python Parallel and Concurrent Programming” (2h 11m, Part 1 and Part 2, using Python 3.7.3 on Windows PC machines) by Barron Stone and Olivia Chiu Stone, Advanced level

Vectors instead of loops

https://medium.com/codex/say-goodbye-to-loops-in-python-and-welcome-vectorization-e4df66615a52

ODBC

Java programs used JDBC to create databases within Salesforce, Microsoft Dynamics 365, Zoho CRM, etc.

To create and read/write such databases from within Python programs running under 32-bit and 64-bit Windows, macOS, Linux, use ODBC (Open Database Connectivity) API functions in:

Pyodbc by Michael Kleehammer:

Functions:

References

https://python.plainenglish.io/the-easiest-ways-to-generate-a-side-income-with-python-60104ad36998

https://learnpython.com/blog/9-best-python-online-resources-start-learning/

https://github.com/PacktPublishing/Python-for-Security-and-Networking https://learning.oreilly.com/library/view/python-for-security/9781837637553/ Python for Security and Networking - Third Edition by José Manuel Ortega covers the main Python modules to encrypt and decrypt information, including pycryptodome and cryptography. It also covers extracting geolocation and metadata from documents, images, and browsers, and the pcapy and scapy modules to analyze network traffic and sniff packets.


CS50P Harvard

https://cs50.ai/chat

Videos from 15h47m47s of https://cs50.harvard.edu/python/2022:


Cybrary.it

FREE: 2h57m by Joe Perry https://app.cybrary.it/browse/course/python

CS50 Python class at Project STEM

CS Python Fundamentals AFE
Unit 0: Welcome
Unit 1: Beginning in Computer Science
Unit 2: Number Calculations and Data: Division, Built-in Functions, Random Numbers,
Unit 3: Making Decisions: Simple Ifs, Logical Operators, Else, Elif, Algorithm
Unit 4: Repetition and Loops: Loops, Count Variables, End Loop, Range, For Loops, Counting by Other Than 1, Modeling
Unit 5: Programming in EarSketch
Unit 6: Graphics: Color Code, Loops, X&Y Coordinates, Lines, Circles, Animation
Unit 7: Functions: Parameters, return, Tracing,
Unit 8: Lists
Unit 9: 2D Lists: Declaring, Loops, Algorithms, Animating
Unit 10: Programming in EarSketch
Unit 11: Internet: IP address, DNS, Packets & Routers, Web Pages, Cybersecurity, Net Neutrality,
Unit 12: Dictionaries (Extension): Methods, Iterating, Word Frequency Analysis

https://www.youtube.com/playlist?list=PLhQjrBD2T381WAHyx1pq-sBfykqMBI7V4 CS50x 2024 Lectures

https://www.youtube.com/watch?v=8wysIxzqgPI by neetcodeio referencing jointaro.com/r/neetcode

Problem Solving for Developers - A Beginner’s Guide

VIDEO: Python in 100 seconds:

Streamlit

https://www.youtube.com/watch?v=o8p7uQCGD0U Python Interactive Dashboard Development using Streamlit and Plotly by Programming Is Fun

https://www.youtube.com/watch?v=7yAw1nPareM

https://www.youtube.com/watch?v=_Um12_OlGgw Streamlit Elements You Should Know About in 2023 by Mısra Turp

https://www.youtube.com/watch?v=9n4Ch2Dgex0


Docstrings

Google Style Docstrings

Google style uses indentation to separate sections. The basic structure is:

```python
def function(arg1, arg2):
    """Summary line.

    Extended description of function.

    Args:
        arg1 (int): Description of arg1
        arg2 (str): Description of arg2

    Returns:
        bool: Description of return value

    Raises:
        ValueError: Description of when this error is raised

    Examples:
        Examples should be written in doctest format and should illustrate how
        to use the function.

        >>> function(1, 'test')
        True
    """
    return True
```
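Because the Examples section uses doctest format, the standard library doctest module can run those examples as tests. A minimal sketch on a small hypothetical add() function, using DocTestFinder and DocTestRunner to test one object directly:

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    Examples:
        >>> add(2, 3)
        5
    """
    return a + b

# Collect the >>> examples from add's docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add, "add", globs={"add": add}):
    runner.run(test)
results = runner.summarize(verbose=False)   # TestResults(failed=0, attempted=1)
print(results)
```

For a whole module, running python -m doctest -v yourmodule.py does the same for every docstring.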

Compilers

  1. CPython is the standard and most widely used implementation of the Python programming language. It is both an interpreter and a compiler, providing a solid balance between performance and ease of use. CPython translates Python code into bytecode before executing it, which allows for excellent integration with C extensions and libraries.

  2. Pyston is a fork of CPython, with additional optimizations primarily aimed at improving the performance of large applications. It uses JIT techniques similar to PyPy but focuses on maintaining maximal compatibility with CPython.

  3. Nuitka is a Python-to-C++ compiler that translates Python code into optimized C++ executables. It can significantly improve the performance of Python applications by generating faster code while maintaining compatibility with the vast majority of Python libraries.

  4. PyPy is an alternative implementation known for running many pure-Python workloads faster than CPython, thanks to its tracing Just-In-Time (JIT) compiler, which compiles frequently executed code paths to machine code at runtime. Because the JIT needs time to warm up, PyPy is particularly effective for long-running processes.

  5. Jython compiles Python code to Java bytecode, allowing Python programs to run on the Java Virtual Machine (JVM). This makes it a good choice for integrating Python with Java, accessing Java frameworks, and using Java libraries in Python programs. Its stable release implements Python 2.7, not Python 3.

  6. IronPython is tailored for the .NET platform, compiling Python code to .NET Common Intermediate Language (CIL). It lets developers run Python scripts and libraries inside .NET applications and call .NET classes directly from Python.

  7. MicroPython is designed to run on microcontrollers and other constrained environments. It implements Python 3 syntax with a subset of the standard library, plus hardware-specific modules (such as machine, for pins and peripherals), so that Python code can run on hardware with limited RAM and processing power.

  8. Brython (Browser Python) is an implementation of Python 3, written in JavaScript, for client-side web programming. It translates Python code to JavaScript in the browser, so Python scripts can run in web pages and use web APIs such as the DOM as seamlessly as JavaScript does.

  9. Stackless Python enhances Python with support for microthreads, allowing for concurrent programming without traditional thread-related overhead. It’s particularly useful for applications requiring a large number of simultaneously active tasks, like game development or network servers.
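To see the bytecode step described for CPython, and to check which of these implementations is running a script, the standard-library dis, platform, and sys modules are enough. This is a minimal sketch; add_tax and its 8% default rate are hypothetical, chosen only to give dis something to disassemble.

```python
import dis
import platform
import sys


def add_tax(price, rate=0.08):
    """Tiny example function; the 8% default rate is an arbitrary assumption."""
    return price * (1 + rate)


def describe_runtime():
    """Report which Python implementation is executing this script."""
    return {
        "implementation": platform.python_implementation(),  # e.g. 'CPython', 'PyPy'
        "sys_name": sys.implementation.name,                 # e.g. 'cpython', 'pypy'
        "version": platform.python_version(),
    }


if __name__ == "__main__":
    print(describe_runtime())
    # Print the bytecode instructions that add_tax was compiled into.
    dis.dis(add_tax)
    # The same instructions as data: one dis.Instruction per opcode.
    print([ins.opname for ins in dis.get_instructions(add_tax)])
```

Opcode names differ between Python versions. For example, Python 3.11 replaced separate opcodes such as BINARY_ADD and BINARY_MULTIPLY with a single BINARY_OP, so the disassembly is useful for exploration but should not be relied on as a stable interface.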

QUESTION: What coding style would take advantage of compilers or hinder their use?
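One way to explore this question empirically is to time the same script under different implementations. This is a minimal sketch that assumes both CPython and PyPy are installed (for example as python3 and pypy3). The hypothetical sum_of_squares function is a tight pure-Python loop of integer arithmetic, the kind of code a JIT such as PyPy's generally speeds up most. Running the file under each interpreter and comparing the printed times gives a rough, machine-specific comparison, not a definitive benchmark.

```python
import sys
import timeit


def sum_of_squares(n):
    """Pure-Python hot loop: integer arithmetic on local variables only."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(n=100_000, repeat=5, number=10):
    """Return the best wall-clock time in seconds for `number` calls."""
    timings = timeit.repeat(lambda: sum_of_squares(n), repeat=repeat, number=number)
    # The minimum is the least noisy estimate, and it discards slower
    # early runs such as JIT warm-up.
    return min(timings)


if __name__ == "__main__":
    best = benchmark()
    print(f"{sys.implementation.name}: {best:.4f} s per 10 calls (best of 5)")
```

Code that spends most of its time inside C extensions is a different case. PyPy supports many of them through a compatibility layer, which tends to narrow its advantage.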

More about Python

This is one of a series about Python:

  1. Python install on MacOS
  2. Python install on MacOS using Pyenv
  3. Python install on Raspberry Pi for IoT

  4. Python tutorials
  5. Python Examples
  6. Python coding notes
  7. Pulumi controls cloud using Python, etc.
  8. Jupyter Notebooks provide commentary to Python

  9. Python certifications

  10. Test Python using Pytest BDD Selenium framework
  11. Test Python using Robot testing framework
  12. Testing AI uses Python code

  13. Microsoft Azure Machine Learning makes use of Python

  14. Python REST API programming using the Flask library
  15. Python coding for AWS Lambda Serverless programming
  16. Streamlit visualization framework powered by Python
  17. Web scraping using Scrapy, powered by Python
  18. Neo4j graph databases accessed from Python