I am my own experimental subject.
2022-05-05 | #128
Time: 8:40-9:40am
Activity: project management
Reflection: I have forgotten half of pandas... I can probably learn more efficiently if I re-familiarize myself with the pandas syntax.
Motivation level: 4 out of 5
2022-05-03 | #127
Time: 9:40-10:15pm
Activity: Navigated through "fuzzy" requests.
Reflection:
Motivation level: 4 out of 5
2022-05-02 | #126
Time: 8:40-9:15pm
Activity: Continued working on web scraping techniques. Learned how to collect information from web pages with CSS selectors. CSS stands for Cascading Style Sheets, a language used to add style to HTML documents. In a selector, "." specifies a class whereas "#" specifies an id.
Reflection: Where did my last week go?!
Motivation level: 4 out of 5
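A rough sketch of what the "." and "#" selectors match. BeautifulSoup's soup.select() accepts these selectors directly; to stay dependency-free this uses the standard library's html.parser instead. SelectorCollector and select_text are hypothetical helpers that only handle a single .class or #id selector on simple, non-nested elements, and the page is toy HTML.

```python
from html.parser import HTMLParser


class SelectorCollector(HTMLParser):
    """Collect the text of elements matching one '.class' or '#id' selector.

    Hypothetical helper for illustration: no nesting, no combined selectors.
    """

    def __init__(self, selector):
        super().__init__()
        self.kind, self.name = selector[0], selector[1:]
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.kind == ".":
            # the class attribute can hold several space-separated class names
            matched = self.name in (attrs.get("class") or "").split()
        else:  # "#": ids are matched exactly
            matched = attrs.get("id") == self.name
        if matched:
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data.strip()


def select_text(html, selector):
    parser = SelectorCollector(selector)
    parser.feed(html)
    return parser.results


page = """
<h1 id="title">Weather</h1>
<p class="temp big">21</p>
<p class="temp">18</p>
<p>no class here</p>
"""

print(select_text(page, ".temp"))   # ['21', '18']
print(select_text(page, "#title"))  # ['Weather']
```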
2022-04-26 | #125
Time: 1:30-2:00pm
Activity: Basic HTML (HyperText Markup Language) tags and how to extract information with the BeautifulSoup library. Combining the parser with specific tags or find_all() lets me get to specific "branches" of the document. These branches are often tagged with a unique id or a shared class name, which can also be used with the parser.
Reflection: Time is limited but I have to start somewhere.
Motivation level: 5 out of 5
2022-04-24 | #124
Time: 3:30-4:30pm
Activity: SQL practice.
Reflection: well... did I actually procrastinate by doing only the practice sets these days?
Motivation level: 3 out of 5
2022-04-21 | #123
Time: 3:30-4:30pm
Activity: SQL practice.
Reflection: Spent most of my brain power on work today. Decided to practice SQL so that I don't lose the rhythm!
Motivation level: 5 out of 5
2022-04-20 | #122
Time: 8:00-9:00am
Activity: finished 3 sets of SQL practice.
Reflection: Apparently, if the data science learning session is not the first thing I do to start my day, I totally forget about it... 'Let's do it later' just doesn't work for me.
Motivation level: 5 out of 5
2022-04-18 | #121
Time: 8:20-8:50am
Activity: finished 3 sets of SQL practice.
Reflection: mentally ready for more learning.
Motivation level: 5 out of 5
2022-04-17 | #120
Time: 9:30-10:30am
Activity: finished two practice sets of SQL.
Reflection: Tried to regain momentum. Practice sessions always calm me down.
Motivation level: 5 out of 5
2022-04-15 | #119
Time: 8:30-9:00am
Activity: Worked with the Reddit API today. Retrieved the most upvoted posts and comments, and upvoted a specific comment through the API.
Reflection: I honestly had no idea what I was doing... I should read the Reddit API documentation more carefully. The structure became a problem when I had to write the request as a dictionary.
Motivation level: 4 out of 5
2022-04-11 | #118
Time: 5:30-6:30am
Activity: Made authenticated API requests (with a token) and created a repo with a POST request.
Reflection: I had run into the rate-limiting problem years ago and finally realized what went wrong. Can't remember how I obtained the token, but I certainly did something stupid with it (cover my face)...
Motivation level: 4 out of 5
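A minimal sketch of where a token goes in an authenticated POST. The log doesn't name the service; creating a repo suggests GitHub, so this assumes GitHub's REST endpoint for creating a repository for the authenticated user, and TOKEN is a placeholder. It only builds the request with urllib and never sends it.

```python
import json
import urllib.request

# Assumption: GitHub's "create a repository for the authenticated user" endpoint.
API_URL = "https://api.github.com/user/repos"
TOKEN = "YOUR_TOKEN_HERE"  # placeholder; never hard-code a real token


def build_create_repo_request(name, token=TOKEN):
    """Build (but do not send) a POST request that would create a repository."""
    payload = json.dumps({"name": name, "private": True}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"token {token}",  # token-based authentication
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_repo_request("practice-repo")
print(req.get_method())                 # POST
print(req.get_header("Authorization"))  # token YOUR_TOKEN_HERE
# Sending it would be urllib.request.urlopen(req), which needs a valid token.
```

Authenticated requests typically get a higher rate limit than anonymous ones, which may be related to the rate-limiting problem mentioned above.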
2022-04-10 | #117
Time: 7:40-8:45am
Activity: API and JSON. Requested data through an API, checked the content type of the response (.headers), and parsed the JSON body into Python objects (.json()) for further processing.
Reflection: not sure if I fully understand the concept of API but the coding was smooth today.
Motivation level: 5 out of 5
2022-04-08 | #116
Time: 5:15-6:05am
Activity: Operated sqlite3 from Python with and without a cursor.
Reflection: Not exactly sure why to use the cursor method, since you can do the same thing without it. I should read the cursor documentation to figure out what I missed...
Motivation level: 5 out of 5
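A small sketch of both styles with an in-memory database (toy table). In the sqlite3 module, Connection.execute() is a shortcut that creates a cursor internally and returns it, which is why both approaches seem to do the same thing; an explicit cursor is handy when fetching rows in batches or reading attributes like .description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Without an explicit cursor: Connection.execute() creates one and returns it.
conn.execute("CREATE TABLE tracks (name TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("Intro", 95), ("Theme", 240), ("Outro", 130)],
)
count = conn.execute("SELECT COUNT(*) FROM tracks").fetchone()
print(count)  # (3,)

# With an explicit cursor: useful for batch fetching or inspecting the result.
cur = conn.cursor()
cur.execute("SELECT name, seconds FROM tracks ORDER BY seconds DESC")
columns = [col[0] for col in cur.description]
top_two = cur.fetchmany(2)
print(columns)  # ['name', 'seconds']
print(top_two)  # [('Theme', 240), ('Outro', 130)]
```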
2022-04-05 | #115
Time: 10:10-11:10am
Activity: Added rows (values) to an existing table with "INSERT INTO" and updated the values of a column with "UPDATE...SET..."
Reflection: An important concept left out of today's practice is keeping track of each update made to table values by generating a boolean expression.
Motivation level: 5 out of 5
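A quick sketch with Python's sqlite3 and a toy table: INSERT INTO adds rows, a new column comes from ALTER TABLE ... ADD COLUMN, and UPDATE ... SET changes existing values, with WHERE deciding which rows are touched. Checking the cursor's rowcount afterwards is one simple way to keep track of how many rows an update changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, major TEXT)")

# INSERT INTO adds new rows (values) to an existing table.
conn.execute("INSERT INTO students (name, major) VALUES ('Ana', 'Math')")
conn.execute("INSERT INTO students (name, major) VALUES ('Ben', 'Biology')")

# A new column comes from ALTER TABLE ... ADD COLUMN, not from INSERT INTO.
conn.execute("ALTER TABLE students ADD COLUMN year INTEGER")

# UPDATE ... SET changes existing values; WHERE limits which rows change.
cur = conn.execute("UPDATE students SET year = 2 WHERE major = 'Math'")
updated = cur.rowcount
print(updated)  # 1

rows = conn.execute("SELECT name, major, year FROM students ORDER BY name").fetchall()
print(rows)  # [('Ana', 'Math', 2), ('Ben', 'Biology', None)]
```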
2022-04-04 | #114
Time: 8:30-9:20am
Activity: Created tables and assigned primary/foreign keys to certain columns in a database shell.
Reflection: A chaotic week. One thing I did learn last week is that learning should be the first and only thing I do to start my day.
Motivation level: 5 out of 5
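A sketch of primary and foreign keys in SQLite via Python, using hypothetical artists/albums tables. One detail worth knowing: SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

conn.execute("""
    CREATE TABLE artists (
        artist_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE albums (
        album_id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        artist_id INTEGER,
        FOREIGN KEY (artist_id) REFERENCES artists (artist_id)
    )
""")

conn.execute("INSERT INTO artists VALUES (1, 'Nina')")
conn.execute("INSERT INTO albums VALUES (10, 'First Light', 1)")  # parent row exists

try:
    # artist_id 99 is not in artists, so the foreign key rejects this row
    conn.execute("INSERT INTO albums VALUES (11, 'Ghost Album', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(fk_rejected)  # True
```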
2022-03-30 | #113
Time: 4:00-4:45pm
Activity: Finished the SQL project by calculating the percentage of orders that purchased entire albums.
Reflection: The learning routine was interrupted for 4 days. I have to figure out a way not to put work before learning.
Motivation level: 3 out of 5
2022-03-25 | #112
Time: 8:45-10:00am
Activity: Finished the part of the project where I analyzed the customers and the revenue generated by each country.
Reflection: Everything worked out great today. I was able to dissect the complicated request into small, achievable steps. The key is to be patient: write down and check every stage of coding to avoid bugs.
Motivation level: 5 out of 5
2022-03-24 | #111
Time: 8:30-9:30am
Activity: started the intermediate SQL project.
Reflection: coding is pretty smooth today. Happy!
Motivation level: 5 out of 5
2022-03-23 | #110
Time: 8:45-9:45am
Activity: debug
Reflection: The ultimate problem I have is my poor ability to break a complicated task down into small, coherent steps. I also need to be more careful about the precise definition of each syntax, for example, the difference between JOIN and UNION. UNION can only be used when the two queries return the same number of columns, and the matching columns should hold compatible types (int/float with int/float, text with text).
Motivation level: 5 out of 5
2022-03-22 | #109
Time: 8:45-10:00am
Activity: Built nested code.
Reflection: bugs bugs bugs. so many bugs today :(
Motivation level: 5 out of 5
2022-03-21 | #108
Time: 8:45-10:00am
Activity: Practiced how to use 'UNION', 'INTERSECT', and 'EXCEPT'. UNION and INTERSECT are similar to the ideas of full join and inner join, respectively. EXCEPT is a clean way to filter.
Reflection: Hey, I let overthinking take over me again today :(
Motivation level: 5 out of 5
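A toy example of the three set operators in SQLite, run through Python's sqlite3 (hypothetical club tables). Each side must return the same number of columns; UNION also removes duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE club_a (name TEXT);
    CREATE TABLE club_b (name TEXT);
    INSERT INTO club_a VALUES ('Ana'), ('Ben'), ('Cy');
    INSERT INTO club_b VALUES ('Ben'), ('Cy'), ('Dee');
""")


def names(sql):
    return [row[0] for row in conn.execute(sql)]


# UNION: names in either table, duplicates removed
union_names = names("SELECT name FROM club_a UNION SELECT name FROM club_b ORDER BY name")
# INTERSECT: names in both tables
intersect_names = names("SELECT name FROM club_a INTERSECT SELECT name FROM club_b ORDER BY name")
# EXCEPT: names in club_a that are not in club_b
except_names = names("SELECT name FROM club_a EXCEPT SELECT name FROM club_b ORDER BY name")

print(union_names)      # ['Ana', 'Ben', 'Cy', 'Dee']
print(intersect_names)  # ['Ben', 'Cy']
print(except_names)     # ['Ana']
```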
2022-03-18 | #107
Time: 5:40-6:40am
Activity: Practiced the usage of 'WITH' and 'CREATE VIEW'. 'WITH' separates a subquery from the main query temporarily (it only exists for that one statement), whereas 'CREATE VIEW' saves the subquery as a named view that can be reused throughout the script. CREATE VIEW is the way to go if some subqueries will be used repeatedly.
Reflection: It was tough today. Again, I think the issue is that I can't fluently break the task down into small, coherent steps. Basically, I am confusing myself.
Motivation level: 5 out of 5
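A side-by-side sketch with a toy invoices table: the WITH name disappears once its statement finishes, while the view stays in the database and can be queried like a table by later statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (customer TEXT, total REAL);
    INSERT INTO invoices VALUES ('Ana', 10), ('Ana', 5), ('Ben', 7);
""")

# WITH (a common table expression) names the subquery for this statement only.
with_result = conn.execute("""
    WITH per_customer AS (
        SELECT customer, SUM(total) AS spent FROM invoices GROUP BY customer
    )
    SELECT customer FROM per_customer WHERE spent > 8
""").fetchall()
print(with_result)  # [('Ana',)]

# CREATE VIEW stores the query under a name, so later statements can reuse it.
conn.execute("""
    CREATE VIEW per_customer_view AS
    SELECT customer, SUM(total) AS spent FROM invoices GROUP BY customer
""")
view_result = conn.execute(
    "SELECT spent FROM per_customer_view WHERE customer = 'Ben'"
).fetchall()
print(view_result)  # [(7.0,)]
```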
2022-03-17 | #106
Time: 1:30-2:30pm
Activity: Self-joined a table, used pattern matching to filter information, and combined 'if/then' logic to classify the dataset.
Reflection: Had a hard time focusing today.
Motivation level: 3 out of 5
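A sketch of the three pieces together on a hypothetical employees table: a self join (the same table under two aliases), LIKE patterns (% for any run of characters, _ for exactly one), and a CASE expression for classification. Using LIKE for the "pattern" part is an assumption; the log doesn't say which operator was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, title TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ana', 'General Manager', NULL),
        (2, 'Ben', 'Sales Support Agent', 1),
        (3, 'Cy',  'Sales Support Agent', 1),
        (4, 'Dee', 'IT Staff', 2);
""")

# e = the employee row, m = the manager row, both from the same table.
# LEFT JOIN keeps employees without a manager; WHERE keeps names whose 2nd letter is 'e'.
rows = conn.execute("""
    SELECT e.name AS employee,
           m.name AS manager,
           CASE WHEN e.title LIKE '%Sales%' THEN 'sales' ELSE 'other' END AS dept
    FROM employees AS e
    LEFT JOIN employees AS m ON e.manager_id = m.id
    WHERE e.name LIKE '_e%'
    ORDER BY e.id
""").fetchall()
print(rows)  # [('Ben', 'Ana', 'sales'), ('Dee', 'Ben', 'other')]
```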
2022-03-16 | #105
Time: 8:30-9:30am
Activity: Joined multiple tables together via different common columns.
Reflection: writing down the strategy before coding does help.
Motivation level: 5 out of 5
2022-03-15 | #104
Time: 8:30-9:40am
Activity: Combined 'Join' with subqueries and filters.
Reflection: Writing nested code is still a bit challenging for me. I think what I actually need is a procedure for breaking a complex task down.
Motivation level: 5 out of 5
2022-03-14 | #103
Time: 8:50-9:20am
Activity: Learned how to combine two tables with "JOIN", specifically an "INNER" join. The result only keeps rows that satisfy the matching condition (the ON clause). Aliases can be used with the join syntax to keep the code clean. WHERE can also be used to restrict which rows end up in the result.
Reflection: Time was a little bit tight this morning, and it seems to be a recurring issue every Monday. Apparently I have too much fun every Sunday night... Have to fix this.
Motivation level: 3 out of 5
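A toy INNER JOIN with aliases and a WHERE filter (hypothetical customers/orders tables). Note which rows drop out: a customer with no orders and an order whose customer_id has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT, country TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'USA'), (2, 'Ben', 'Canada'), (3, 'Cy', 'USA');
    INSERT INTO orders VALUES (100, 1, 9.9), (101, 1, 4.5), (102, 2, 12.0), (103, 4, 3.0);
""")

# INNER JOIN keeps only row pairs that satisfy ON: Cy (no orders) and
# order 103 (customer 4 does not exist) both drop out.
# c and o are aliases that keep the column references short.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON c.customer_id = o.customer_id
    WHERE c.country = 'USA'
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Ana', 100, 9.9), ('Ana', 101, 4.5)]
```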
2022-03-13 | #102
Time: 10:30-11:30am
Activity: Finished the guided project and the practice questions for SQL. I learned how to run SQL in a Jupyter notebook and extract table information.
Reflection: Had tons of fun using SQL so far. Have to figure out a way to use it in my work.
Motivation level: 5 out of 5
2022-03-11 | #101
Time: 8:30-9:20am
Activity: Learned how to write nested logic with subqueries. 'IN' works like a chain of 'OR' conditions but makes the code more concise.
Reflection: The concept of SQL is quite natural to me but I definitely need some more practice to be fluent.
Motivation level: 5 out of 5
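A small check (toy tracks/genres tables) that chained ORs and IN return the same rows, and that IN can take a subquery instead of a hard-coded list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (name TEXT, genre_id INTEGER);
    CREATE TABLE genres (genre_id INTEGER, genre TEXT);
    INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Pop');
    INSERT INTO tracks VALUES ('A', 1), ('B', 2), ('C', 3), ('D', 1);
""")

# Chained OR conditions ...
with_or = conn.execute(
    "SELECT name FROM tracks WHERE genre_id = 1 OR genre_id = 2 ORDER BY name"
).fetchall()

# ... match IN, and IN can take a subquery instead of a fixed list.
with_in = conn.execute("""
    SELECT name FROM tracks
    WHERE genre_id IN (SELECT genre_id FROM genres WHERE genre IN ('Rock', 'Jazz'))
    ORDER BY name
""").fetchall()

print(with_or == with_in, with_in)  # True [('A',), ('B',), ('D',)]
```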
2022-03-10 | #100
Time: 8:30-9:30am
Activity: Learned the higher-level structure of a query: SELECT > FROM > WHERE > GROUP BY > HAVING > ORDER BY > LIMIT. WHERE filters rows before GROUP BY, whereas HAVING filters the groups after it. I also learned how to apply 'if/then' logic in SQL. An if/then clause starts with CASE, followed by WHEN (condition) THEN 'value', and can be combined with ELSE. The clause ends with END AS 'alias', which adds a new column named by the alias to the result. You can convert a column from integer to float with CAST and set the number of decimals with ROUND(). This step is critical when calculating a ratio: if both operands are integers, SQLite performs integer division and truncates the result (for example, 3/4 returns 0).
Reflection: I am loving SQL so far but have no idea how to 'deposit' data in SQL. Weird....
Motivation level: 5 out of 5
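A toy example (hypothetical majors table) showing the integer-division trap next to the CAST/ROUND fix, plus a CASE ... END AS column built on the same ratio.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE majors (major TEXT, women INTEGER, total INTEGER);
    INSERT INTO majors VALUES ('Art', 3, 4), ('Physics', 1, 4);
""")

rows = conn.execute("""
    SELECT major,
           women / total                         AS int_ratio,  -- integer division truncates
           ROUND(CAST(women AS REAL) / total, 2) AS ratio,
           CASE
               WHEN CAST(women AS REAL) / total >= 0.5 THEN 'majority women'
               ELSE 'minority women'
           END AS category
    FROM majors
    ORDER BY major
""").fetchall()
print(rows)
# [('Art', 0, 0.75, 'majority women'), ('Physics', 0, 0.25, 'minority women')]
```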
2022-03-09 | #99
Time: 8:45-9:45am
Activity: Applied simple functions such as COUNT, MAX, and MIN to specific columns. LENGTH() returns the number of characters in each value of a column. For presentation or comprehension purposes, we can rename a column or the result of a calculation with an alias (AS followed by the new name in quotes). These functions can be nested inside each other. A simple way to concatenate text is to combine a quoted string with ||, for example, 'Major: ' || LOWER(Major).
Reflection: The COUNT function reminds me of value_counts() for pandas dataframes. There are probably more SQL operators worth exploring. P.S. It snowed heavily today. So beautiful!
Motivation level: 5 out of 5
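A quick tour of these functions on a toy students table: aggregates, LENGTH() on values, aliases, and || concatenation nested around LOWER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT, major TEXT, credits INTEGER);
    INSERT INTO students VALUES ('Ana', 'MATH', 30), ('Ben', 'Biology', 45);
""")

summary = conn.execute("""
    SELECT COUNT(*) AS n_students, MAX(credits) AS max_credits, MIN(credits) AS min_credits
    FROM students
""").fetchone()
print(summary)  # (2, 45, 30)

labels = conn.execute("""
    SELECT name, LENGTH(major) AS major_length, 'Major: ' || LOWER(major) AS label
    FROM students ORDER BY name
""").fetchall()
print(labels)  # [('Ana', 4, 'Major: math'), ('Ben', 7, 'Major: biology')]
```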
2022-03-07 | #98
Time: 8:30-9:30am
Activity: Learned the concept of Structured Query Language (SQL) in the context of SQLite: how to write clauses with reserved words, filter the results with operators, and sort them in ascending or descending order.
Reflection: Finally! the SQL lesson I've been waiting for. The content has been quite straightforward so far.
Motivation level: 5 out of 5
2022-03-06 | #97
Time: 4:05am-5:15pm
Activity: The three standard streams: stdin, stdout, and stderr. They can be redirected with > (output to a file) or < (input from a file). Transliteration (tr) replaces characters in strings.
Reflection: I tried to set up my Zettelkasten system today with Neuron. The installation used the command line. It always makes me feel good when I can do something I couldn't do before.
Motivation level: 5 out of 5
2022-03-05 | #96
Time: 10:55am-12:05pm
Activity: Learned how to 'redirect' output to files and how to 'pipe' commands. To create an empty file, I can use either 'echo -n "" > filename' or 'touch filename'. I can discard the output of a command by redirecting it to /dev/null.
Reflection: I like the concepts of the redirection and the pipeline. It makes the code concise and clean.
Motivation level: 5 out of 5
2022-03-04 | #95
Time: 8:30-9:20am
Activity: Formatting text using command line: cat, cut, sort, and prep.
Reflection: What to do if you want to cat two texts with different numbers of columns?
Motivation level: 5 out of 5
2022-03-03 | #94
Time: 8:40-9:40am
Activity: File inspection. Some commands are very similar to their pandas counterparts, like head and tail. Some commands are new to me: column (formats text into aligned columns), wc (counts lines, words, and characters), and shuf (randomly samples lines).
Reflection:
Motivation level: 5 out of 5
2022-03-02 | #93
Time: 9:00-10:00am
Activity: learned how to check manuals using command lines, how to assign a new name for an existing command (alias), and how to navigate in the 'less' environment.
Reflection: Why do '--help', 'whatis', and 'man' coexist? What's the advantage of having all three?
Motivation level: 5 out of 5
2022-03-01 | #92
Time: 7:45-8:30am
Activity: Changed the owner/group of files and overrode access permissions with 'sudo'. Used stat, id, groups, and whoami to get file and user info.
Reflection: Rushed a bit over the concept of root access. Have to find time to revisit it.
Motivation level: 5 out of 5
2022-02-28 | #91
Time: 7:10-8:10am
Activity: change the permission of the files through command lines.
Reflection: what is the difference between 'change' and 'set' in the world of user system?
Motivation level: 5 out of 5
2022-02-27 | #90
Time: 9:00-9:45am
Activity: Practiced wildcards and glob patterns: [] matches one character from a set, \ escapes a special character, ? matches exactly one character, and * matches any number of characters.
Reflection: if life can be this simple ;)
Motivation level: 5 out of 5
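The same wildcard rules can be tried from Python with the standard fnmatch module (toy file names). One difference from the shell: fnmatch has no backslash escape, so to match a literal wildcard you wrap it in brackets, for example [?].

```python
from fnmatch import fnmatchcase

files = ["data1.csv", "data2.csv", "data10.csv", "notes.txt", "report_a.md", "what?.txt"]


def matches(pattern):
    # fnmatchcase = case-sensitive matching, like most Unix shells
    return [f for f in files if fnmatchcase(f, pattern)]


print(matches("*.csv"))         # ['data1.csv', 'data2.csv', 'data10.csv']
print(matches("data?.csv"))     # ['data1.csv', 'data2.csv']  (? = exactly one character)
print(matches("data[12].csv"))  # ['data1.csv', 'data2.csv']  ([] = one character from the set)
print(matches("what[?].txt"))   # ['what?.txt']  (a bracketed wildcard is literal)
```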
2022-02-26 | #89
Time: 2:30-3:10pm
Activity: move, copy, delete, rename folders or files using the command line.
Reflection: today's content is relatively familiar. consider it as a weekend break.
Motivation level: 5 out of 5
2022-02-24 | #88
Time: 7:40-8:40am
Activity: Continued learning the Unix file system, like absolute/relative paths, permissions, etc.
Reflection: It's interesting to realize that I don't actually 'fully' understand many things that I thought I knew. I've been using these commands without actually knowing the design and the concept behind them.
Motivation level: 5 out of 5
2022-02-23 | #87
Time: 10:10-11:10am
Activity: Learning the command line: 'command' + '[options]' + [argument N]. Options can be long or short and can be stacked, and there can be multiple arguments. Commands from the history can be retrieved by their index.
Reflection: I've always liked the terminal environment. Looking forward to learning more.
Motivation level: 5 out of 5
2022-02-22 | #86
Time: 9:00-10:00am
Activity: Finished the guided project: examined the correlation between the ratings of the six Star Wars movies and their box office records (how many people actually watched each movie), and how the ratings may differ by gender.
Reflection: Data cleaning is usually the bottleneck. Once the data structure is organized, the analysis is quite straightforward. I should be as organized as possible when recording anything, including experimental results!
Motivation level: 5 out of 5
2022-02-21 | #85
Time: 6:00-7:10am
Activity: Started my sixth guided project: the behavior of Star Wars fans. Mainly cleaned the data today: examined the structure of the dataframe, converted the yes/no answers to Boolean type, and filled in the NaN values.
Reflection: May the force be with me.
Motivation level: 5 out of 5
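A minimal pandas sketch of these two cleaning steps, assuming made-up column names (seen_any, rank_ep1) and toy rows rather than the actual survey data: map() turns the text answers into booleans, and fillna() fills the numeric gaps with an explicit choice (here, the column mean).

```python
import pandas as pd

survey = pd.DataFrame({
    "seen_any": ["Yes", "No", "Yes", None],  # hypothetical yes/no column
    "rank_ep1": [1.0, None, 3.0, 2.0],       # hypothetical ranking column
})

# Anything not in the mapping dict (here None) becomes NaN.
yes_no = {"Yes": True, "No": False}
survey["seen_any"] = survey["seen_any"].map(yes_no)

# Fill the missing rank with the column mean (one possible choice, not the only one).
survey["rank_ep1"] = survey["rank_ep1"].fillna(survey["rank_ep1"].mean())

print(survey["seen_any"].tolist())  # [True, False, True, nan]
print(survey["rank_ep1"].tolist())  # [1.0, 2.0, 3.0, 2.0]
```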
2022-02-18 | #84
Time: 11:35am-12:55pm
Activity: Finished the data cleaning challenge. Calculated how many times each Avenger has died and returned since 1960.
Reflection: New studying time. Hope it will fit better for the (future) working schedule.
Motivation level: 5 out of 5
2022-02-17 | #83
Time: 8:35-09:55am
Activity: Finished the guided project. I examined the correlations of the average SAT score of NYC high schools with race, gender, and the percentage of AP exam takers.
Reflection: The "preparation" steps in the guided project made me realize that there is still a gap I need to cross before I can handle a real-world project myself. I can probably do individual steps just fine if there are instructions, but adapting these steps to different projects is something I really need to practice. I should find some interesting data on Kaggle to build my own project.
Motivation level: 5 out of 5
2022-02-16 | #82
Time: 9:45-10:45am
Activity: Analyzed the correlation between the average SAT score and the school district.
Reflection: just realized that I've forgotten some of the dataframe indexing syntax. Have to practice.
Motivation level: 5 out of 5
2022-02-15 | #81
Time: 9:30-10:30am
Activity: practice df.merge, df.groupby, and df.fillna.
Reflection: Whew~ It's great to be back! Some of the syntax already feels a bit unfamiliar, but it will only get better from now on.
Motivation level: 5 out of 5 (Hello east coast!)
2022-01-25 | #80
Time: 5:00-6:20am
Activity: cleaning data to make sure that each row of the dataframes that I am going to combine has a unique ID.
Reflection: Exhausted from the crazy schedule these days. It's just temporary. Everything will be more manageable soon. I am learning how to keep my learning routine while taking care of all the other responsibilities in my life.
Motivation level: 4.5 out of 5 (because I barely could keep my eyes open...)
2022-01-20 | #79
Time: 8:00-9:00am
Activity: finished cleaning the dataset and ready to merge them. Learned how to extract location information.
Reflection: I should figure out a way to practice regular expression more.
Motivation level: 5 out of 5
2022-01-19 | #78
Time: 5:00-6:00am
Activity: Resumed the practice of project building. One of the later steps is to combine the datasets collected from different sources. To do so, I first had to make sure that the rows in each dataset have a unique but unified reference code. What I did today was basically check whether such a code exists in each dataset and, most importantly, whether the codes are in the right format. One of the datasets doesn't have the code, so I had to extract information from other columns to generate a code for each row in that dataset.
Reflection: The data cleaning process is getting smoother.
Motivation level: 5 out of 5
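A pandas sketch of the checks described above, using hypothetical reference codes of the form 2 digits + 1 letter + 3 digits (the real format isn't stated in the log): is_unique flags duplicates, str.fullmatch() flags codes in the wrong format, and a missing code is rebuilt from other columns by zero-padding.

```python
import pandas as pd

# Hypothetical codes: "01M015" style = 2 digits, 1 letter, 3 digits
df = pd.DataFrame({"code": ["01M015", "01M019", "1M34", "01M015"]})

is_unique = df["code"].is_unique
bad_format = ~df["code"].str.fullmatch(r"\d{2}[A-Z]\d{3}")

print(is_unique)                            # False (01M015 appears twice)
print(df.loc[bad_format, "code"].tolist())  # ['1M34']

# Generating the code for a dataset that lacks it, from two other columns
parts = pd.DataFrame({"district": [1, 12], "school": ["M015", "K100"]})
parts["code"] = parts["district"].astype(str).str.zfill(2) + parts["school"]
print(parts["code"].tolist())  # ['01M015', '12K100']
```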
2022-01-18 | #77
Time: 6:30-8:00am
Activity: reading the article about encoding: https://kunststube.net/encoding/
Reflection: Such a fun read! I finally understand the difference between UTF-8, Unicode, and ASCII. It certainly created another red pill moment ;P
Motivation level: 5 out of 5
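A tiny Python illustration of the distinction: Unicode assigns code points to characters, UTF-8 is one way to encode those code points as bytes, and ASCII simply has no byte for characters like é. Decoding bytes with the wrong encoding gives garbled text instead of an error.

```python
text = "café"

print(len(text))      # 4 characters (code points)
print(hex(ord("é")))  # 0xe9, i.e. Unicode code point U+00E9

utf8 = text.encode("utf-8")
print(utf8, len(utf8))  # b'caf\xc3\xa9' 5  (é takes two bytes in UTF-8)

# ASCII has no code for é, so it is replaced here instead of raising an error
print(text.encode("ascii", errors="replace"))  # b'caf?'

# Reading UTF-8 bytes as Latin-1 produces mojibake
print(utf8.decode("latin-1"))  # cafÃ©
```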
2022-01-17 | #76
Time: 4:15-5:00am
Activity: learned how to initiate a data science project: come up with your question and approach angle, identify available datasets that are related to the sectors, clean the dataset, and do the analysis. I was also reading the article that introduces the idea of encoding.
Reflection: This is what I have always wanted to do: explore the world with my own questions. I can't wait to be more independent about building projects.
Motivation level: 5 out of 5
2022-01-15 | #75
Time: 6:30-7:30am
Activity: Still working on the missing data. Used the correlations between columns to determine whether we can fill more null entries with estimated values.
Reflection: Lost my learning rhythm a little bit. Need to work on my regular expressions a bit more.
Motivation level: 4 out of 5 (physically tired...)
2022-01-13 | #74
Time: 6:00-7:50am
Activity: Practiced imputation. Imputation is part of the data cleaning process: we identify missing or inconsistent data and decide whether to drop it or fill it in with an estimate (series.mask(cond, replacement)). We can also visualize the missing data and the correlation between columns with seaborn.heatmap.
Reflection: it is the second time that the course circled back to the concept of imputation. This time the process got more sophisticated. Rather than simply filling in the empty slots, we first made and evaluated the 'estimation' before using it to replace the null value.
Motivation level: 5 out of 5
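A minimal pandas sketch of the "estimate, check, then replace" workflow, with made-up columns where total should equal tuition + fees: the estimate is validated on rows where the true value is known, and Series.mask() swaps it in only where the condition (value is missing) is True.

```python
import pandas as pd

df = pd.DataFrame({
    "tuition": [100.0, 120.0, 90.0, 110.0],
    "fees": [10.0, 12.0, 9.0, 11.0],
    "total": [110.0, None, 99.0, None],
})

# 1. Build an estimate from correlated columns (assumption: total ~ tuition + fees).
estimate = df["tuition"] + df["fees"]

# 2. Check the estimate on rows where the true value is known before trusting it.
known = df["total"].notnull()
assert (estimate[known] == df.loc[known, "total"]).all()

# 3. mask() replaces values where the condition is True, here only the missing ones.
df["total"] = df["total"].mask(df["total"].isnull(), estimate)
print(df["total"].tolist())  # [110.0, 132.0, 99.0, 121.0]
```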
2022-01-12 | #73
Time: 7:00-8:15am
Activity: learned how to convert JSON into pandas format (pd.DataFrame) and how to simplify code with "lambda" (including the ternary operator).
Reflection: Finally! I've been using lambda for a while but didn't really know how lambda actually works. I saw it a lot in other people's code and roughly knew (okay, guessed) what it meant and how to use it. It's great to officially learn the definition of lambda.
Motivation level: 5 out of 5
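A plain-Python sketch of lambda and the conditional ("ternary") expression. The lambdas are named here only for illustration; in practice they usually appear inline, e.g. as a sort key or inside series.apply().

```python
# A lambda is an anonymous, single-expression function.
square = lambda x: x * x

# Conditional expression: value_if_true if condition else value_if_false
label = lambda score: "pass" if score >= 60 else "fail"

scores = [72, 45, 60]
print([square(n) for n in [1, 2, 3]])  # [1, 4, 9]
print(list(map(label, scores)))        # ['pass', 'fail', 'pass']

# Typical use: a throwaway key function, e.g. sorting records by a field
records = [{"name": "Ben", "age": 31}, {"name": "Ana", "age": 25}]
youngest = sorted(records, key=lambda r: r["age"])[0]["name"]
print(youngest)  # Ana
```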
2022-01-11 | #72
Time: 4:20-6:00am
Activity: Learned the concept of JavaScript Object Notation (JSON) and list comprehensions. From a Python perspective, JSON can be thought of as a collection of Python objects nested inside each other. json.loads() (with an "s", for string) converts JSON data contained in a string to the equivalent Python objects (the reverse is json.dumps()). json.load() loads JSON data from a file object (the reverse is json.dump()). List comprehensions provide a concise way of creating lists in a single line of code. Three common applications of list comprehensions are transforming, creating, and reducing a list.
Reflection: felt great to get back on the routine...
Motivation level: 5 out of 5
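A stdlib sketch of the four json functions and the three list-comprehension uses. io.StringIO stands in for a real file, and "reducing" is read here as filtering a list down to fewer items.

```python
import io
import json

raw = '{"name": "Ana", "scores": [90, 75, 88], "active": true}'

data = json.loads(raw)  # JSON string -> Python objects (dict, list, bool, ...)
print(type(data), data["active"])  # <class 'dict'> True

back = json.dumps(data)  # Python objects -> JSON string (reverse of loads)
print(back)  # {"name": "Ana", "scores": [90, 75, 88], "active": true}

# load()/dump() do the same with file objects.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
reloaded = json.load(buffer)
print(reloaded == data)  # True

# List comprehensions: transform, reduce (filter), create
curved = [s + 5 for s in data["scores"]]       # [95, 80, 93]
high = [s for s in data["scores"] if s >= 85]  # [90, 88]
squares = [n ** 2 for n in range(4)]           # [0, 1, 4, 9]
```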
2022-01-10 | #71 (Practice)
Time: 5-6am
Activity: practice regular expression
Reflection: regular expression is fun
Motivation level: 5 out of 5
2022-01-09 | #70
Time: 6-7am, 1:40-2:40pm
Activity: Finished the regular expression section. Used str.extract() or str.replace() to obtain information or clean data.
Reflection: Experienced some learning turbulence in the past couple of days. I studied at irregular hours and didn't keep the log. I kept telling myself that the most important thing is that I didn't stop. However, it still left a bad taste in my mouth when my routine was interrupted. Anyway, lesson learned. Regarding regular expressions, it took me much longer to digest the concept, and I still don't feel that I fully grasp the whole picture. That means more practice! Writing a regex pattern is like solving a puzzle and writing a six-word novel at the same time, so I enjoy it.
Motivation level: 5 out of 5
2022-01-05 | #69
Time: 5:50-6:15pm
Activity: Dove deeper into the world of regular expressions. At this point, I am still at the stage of character sets ([set]).
Reflection: can't wait to learn more about regex! I need more time!
Motivation level: 5 out of 5
2022-01-04 | #68 (Practice)
Time: 5:15-6:00pm
Activity: finished half of the problem set.
Reflection: time is not on my side this month...
Motivation level: 4 out of 5
2022-01-03 | #67
Time: 5:15-6:30pm
Activity: Finished my project. Plotted the correlation between the level of job satisfaction and the length of employment. It seems that people who resigned within three years generally have lower job satisfaction. It could be that people are just 'trying out' the job and leave as soon as they know it is not for them. However, I did notice that most of the people who left the company within three years resigned around 2011-2013. In other words, there was a 'peak' of resignations during 2011-2013. Clearly, something critical had happened that pushed people to leave.
Reflection: Have to figure out a way to better use Jupyter Notebook. Loading files took too long.
Motivation level: 4 out of 5
2022-01-02 | #66
Time: 2:00-3:00pm
Activity: Continued building the project: analyzed 'separationtype' (the reason the employment ended) and decided to analyze only the participants who resigned. I was able to apply 'regex' when performing the row indexing. The code looked nice and concise.
Reflection: Can't say it was the best learning session because I had a hard time focusing. I was preoccupied with all the experiments and the moving stuff...
Motivation level: 4 out of 5
2022-01-01 | #65
Time: 7:00-8:30am
Activity: started my 6th project which is to analyze how the length of the employment is related to the satisfaction level. So far, I have been cleaning the datasets, including renaming the columns, checking the missing values, and dropping the irrelevant columns.
Reflection: New year new start. Felt great!
Motivation level: 5 out of 5
2021-12-31 | #64
Time: 2:30-3:30pm
Activity: Learned how to handle missing data and established a data cleaning workflow. Several decision-making steps for handling missing data: (1) How were the missing data generated? (Transformation errors? No entries?) (2) Can we fill in the missing data from another source? (fixed value) (3) Should we drop the columns/rows with missing values? (4) Can we fill in estimated values for the missing data? These decisions should be based on the 'goal' of the project. What information are we trying to dig out here? Will filling in or dropping these missing values affect the analysis?
Reflection: Not the best learning day. I had a hard time focusing. The learning session will go back to the morning tomorrow!
Motivation level: 4 out of 5
2021-12-30 | #63
Time: 8:00-8:10am
Activity: only finished reading the csv...
Reflection: Trying to finish the learning session in a tire repair shop was just not the best idea at all...
Motivation level: 2 out of 5
2021-12-29 | #62 (Practice)
Time: 6:00-7:15am
Activity: Finished one problem set. The problem set covers dataframe operations, including nlargest, series slicing, and pivot_table.
Reflection: currently it takes me an hour to finish a problem set. Have to pick up the speed!
Motivation level: 5 out of 5
2021-12-28 | #61
Time: 7:30-8:30am
Activity: Learned how to define a regular expression with named groups, r"(?P<name>[options]{times})": () defines a 'group', and str.extractall() identifies all the substrings that fit the regex. When concatenating two string series with '+', rows with NaN are skipped (they stay NaN).
Reflection: Two changes really improved my learning efficiency: active recall right after the learning session and another recap in the evening. Will spend tomorrow's learning session on hands-on practice.
Motivation level: 5 out of 5
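A stdlib sketch of named groups with the re module on a made-up string. In pandas, str.extract()/str.extractall() use the same (?P<name>...) syntax and turn the group names into column names; finditer() below plays a similar role to extractall() for a single string.

```python
import re

# (?P<name>...) names a group; {n} repeats the previous token n times.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

text = "Started 2021-12, paused 2022-01, resumed 2022-02."

first = pattern.search(text)
print(first.group("year"), first.groupdict())  # 2021 {'year': '2021', 'month': '12'}

# Every match, similar in spirit to what str.extractall() returns per row
all_matches = [m.groupdict() for m in pattern.finditer(text)]
print(all_matches)
# [{'year': '2021', 'month': '12'}, {'year': '2022', 'month': '01'}, {'year': '2022', 'month': '02'}]
```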
2021-12-27 | #60
Time: 8:30-9:30am
Activity: Learned the differences between df.apply() and the vectorized series.(function) methods. Because df.apply() does not skip null entries, I should be extra careful when I use it to tidy the dataset. I also learned how to use 'regex' (regular expressions) to screen string content.
Reflection: The schedule was a bit tight today. Didn't have time to finish the whole session :(
Motivation level: 5 out of 5
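A small pandas demonstration (toy data) of why apply() needs care with missing values: a vectorized string method passes NaN through, while apply() hands NaN (a float) to the function, so string-only code fails unless it guards against it.

```python
import numpy as np
import pandas as pd

names = pd.Series(["Ana ", " ben", np.nan])

# Vectorized string method: NaN passes through untouched.
stripped = names.str.strip()
print(stripped.tolist())  # ['Ana', 'ben', nan]

# apply() calls the function on every value, including NaN.
try:
    names.apply(lambda s: s.strip())
    raised = False
except AttributeError:  # 'float' object has no attribute 'strip'
    raised = True
print(raised)  # True

# Guarding against non-strings makes apply() safe here.
safe = names.apply(lambda s: s.strip() if isinstance(s, str) else s)
```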
2021-12-26 | #59
Time: 5:30-6:50am
Activity: 'Tidied' data with df.melt(). 'melt' reshapes a wide dataframe into a long (tidy) format. 'series.apply()' and 'series.map()' have a similar function: they loop through the series and apply the specified function to each element. 'df.applymap()' performs the same element-wise task on a whole dataframe instead of a series. Be careful with 'df.apply()' because it applies the specified function to each 'column' (by default) rather than to each element.
Reflection: Had a hard time focusing this morning, probably because I was not fully awake yet. Have to readjust my routine to avoid the foggy brain situation. I've been doing this project for 59 days!
Motivation level: 5 out of 5
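A pandas sketch with toy data: melt() turns year columns into rows, Series.map() works element by element, and DataFrame.apply() receives whole columns by default. (In pandas 2.1+, df.applymap() was renamed DataFrame.map(), so it is left out here.)

```python
import pandas as pd

wide = pd.DataFrame({
    "country": ["A", "B"],
    "2020": [1.5, 2.0],
    "2021": [1.7, 2.4],
})

# melt(): wide -> long, one row per (country, year) observation
tidy = wide.melt(id_vars="country", var_name="year", value_name="rate")
print(tidy)

# Series.map(): apply a function to each element of one column
tidy["year"] = tidy["year"].map(int)

# DataFrame.apply(): the function receives one whole column at a time (axis=0)
col_max = wide[["2020", "2021"]].apply(lambda col: col.max())
print(col_max.tolist())  # [2.0, 2.4]
```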
2021-12-25 | #58 (practice)
Time: 8:00-9:30am
Activity: Went through two problem sets. These two problem sets cover the basic operations for 'list'.
Reflection: Can't seem to find time to 'practice', so I am seriously thinking of spending one or two 'learning' sessions purely on practice. It might be helpful to do solid practice once a week or every 3 days? Still debating. Maybe I will try both and see which one works better. From today on, I will spend one learning session on the practice problem set every three days.
Motivation level: 5 out of 5
2021-12-24 | #57
Time: 8:40-9:40am
Activity: Practiced the differences between 'pd.concat' and 'pd.merge'. There are four ways to combine two dataframes: outer, inner, left, and right. The default join for 'concat' is 'outer'. That means that whether or not the shapes of the two dataframes match, concat will join them and fill the mismatched spaces with NaN. Two things that concat can do but 'merge' can't: (1) concat can join multiple dataframes at the same time, whereas 'merge' can only do two at once; (2) 'concat' can stack dataframes along the 'vertical' axis (axis=0). On the other hand, 'merge' allows you to 'selectively' join two dataframes. You can choose to keep only the rows whose keys match in both, or stick with the keys of the left or the right dataframe. You can relabel overlapping columns in the joined dataframe with the 'suffixes' parameter.
Reflection: how to join the dataframes based on the ranks?
Motivation level: 5 out of 5
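A pandas sketch with toy frames contrasting the two: concat stacks any number of frames (outer join on columns by default), while merge combines two frames on key columns, with how choosing which keys survive and suffixes relabeling overlapping columns.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "value": [20, 30, 40]})

# concat: stacks vertically (axis=0) by default and takes many frames at once
stacked = pd.concat([left, right, left], ignore_index=True)
print(stacked.shape)  # (9, 2)

# Columns present in only one frame are kept and filled with NaN (join="outer")
extra = pd.DataFrame({"key": ["e"], "flag": [True]})
outer_cols = pd.concat([left, extra]).columns.tolist()
print(outer_cols)  # ['key', 'value', 'flag']

# merge: joins two frames on a key; `how` picks which keys survive
inner = left.merge(right, on="key", how="inner", suffixes=("_left", "_right"))
print(inner.columns.tolist())  # ['key', 'value_left', 'value_right']
print(inner["key"].tolist())   # ['b', 'c']

left_join = left.merge(right, on="key", how="left")
print(left_join["key"].tolist())  # ['a', 'b', 'c']  (all keys from the left frame)
```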
2021-12-23 | #56
Time: 11:30am-1pm
Activity: Practiced how to aggregate data with 'df.groupby().agg()' or 'df.pivot_table()'. The idea is to group the data by certain categories, apply function(s), and recombine the results. 'df.groupby().agg()' and 'df.pivot_table()' serve the same function. Personally, I prefer pivot_table because it makes the code more concise and therefore easier to read.
Reflection: These two might be my favorite syntax ;P imagine if you can apply it in real life as a spell or something XD
Motivation level: 5 out of 5
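A toy comparison showing that both spellings produce the same split-apply-combine result; the only difference is that pivot_table returns a DataFrame with the value column, while this groupby call returns a Series.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "amount": [10, 20, 5, 15, 40],
})

# split by region -> apply mean -> combine into one row per region
by_groupby = sales.groupby("region")["amount"].agg("mean")
by_pivot = sales.pivot_table(index="region", values="amount", aggfunc="mean")

print(by_groupby.to_dict())          # {'East': 15.0, 'West': 20.0}
print(by_pivot["amount"].to_dict())  # {'East': 15.0, 'West': 20.0}
```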
2021-12-22 | #55
Time: 8:30-10:30 am, 1:00-1:45pm
Activity: I finished my 5th project!
Reflection: So much fun!!
Motivation level: 5 out of 5
2021-12-21 | #54
Time: 5:30-7:30 am
Activity: Started the fifth project: analyzing the exchange rate between the euro and the US dollar. I cleaned the data and decided what story I wanted to tell: I will analyze the rate changes during three presidencies: Bush, Obama, and Trump. I also learned how to calculate rolling means.
Reflection: I still forget some basic syntax. That means I need more practice!
Motivation level: 5 out of 5
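A minimal rolling-mean sketch on made-up daily rates (not the real euro/dollar data). Each value averages the current day and the previous two; the first two results are NaN because the window isn't full yet. A larger window gives a smoother line.

```python
import pandas as pd

rates = pd.Series(
    [1.10, 1.20, 1.30, 1.40, 1.50],
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)

# 3-day rolling mean: average of the current value and the 2 before it
smooth = rates.rolling(window=3).mean()
print(smooth.round(2).tolist())  # [nan, nan, 1.2, 1.3, 1.4]
```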
2021-12-20 | #53
Time: 5:30-6:30 am
Activity: Generated 'FiveThirtyEight'-style plots (matplotlib.style, style.use(), style.available), redefined the positions of the y-labels (made a dictionary of the labels and their coordinates, then used a for loop over dict.items() to place the y labels), used different colors for positive and negative values (first defined a boolean list, then mapped the color values onto it), and added a signature (ax.text() with a background).
Reflection: I guess it is a process of learning how to polish figures. I enjoyed it and can't wait to start my fifth project tomorrow!
Motivation level: 5 out of 5
2021-12-19 | #52
Time: 6:00-7:00 am
Activity: Learned the Gestalt principles: proximity, similarity, familiarity, and connection, and how we can guide readers' focus to the story of the figures.
Reflection: Gestalt psychology is interesting. The principles can be extended to 'problem solving'. I wanted to read more on this subject. "Gestalt theories of perception are based on human nature being inclined to understand objects as an entire structure rather than the sum of its parts." "productive vs. reproductive thinking"
Motivation level: 5 out of 5
2021-12-18 | #51
Time: 11:15am- 12:38 pm
Activity: Practiced how to 'plot' a story. Conceptually, a story involves something that 'develops' over time (that is, the 'evolution' of the data). In addition to maximizing the data-ink ratio, I should also think about how to display the evolution of the data, for example, by plotting the 'progress'. To achieve this goal, I can combine different plot parameters to emphasize the progress. Today I practiced how to alter the transparency of colors and how to add critical text to the plot to help the audience grasp the message at a glance.
Reflection: I am getting more comfortable with various matplotlib syntax. It's kinda fun, and I love love love beautiful and concise plots!
Motivation level: 5 out of 5
2021-12-17 | #50
Time: 6:00- 7:00 am
Activity: Designed a figure from the reader's point of view. Today I learned how to maximize the data-ink ratio. When plotting a figure, we should always consider what the most important message is and how much 'ink' is used to print this message. In other words, get rid of distractions as much as possible so that the message stands out immediately to the reader. I learned to separate the axes from the canvas: remove the spines (box), remove the ticks, align the tick labels (remove the tick labels and then enter new text at designated locations), change the text color and weight, and add a reference line (ax.axvline()).
Reflection: Be 'mindful' for everything you do.
Motivation level: 5 out of 5
2021-12-16 | #49
Time: 6:30- 8:00 am
Activity: I finished my 4th project! In this project, I tried to find out which factors correlate with heavy traffic on westbound I-94. The logic flow was to first see whether traffic is heavier at certain times of the year, then whether it is heavier on weekdays or weekends (and in which hours of the day), and lastly, whether heavy traffic is always associated with certain weather descriptions. I used correlations, bar plots, and scatter plots to visualize the analysis and learned how to define the size of the subplots.
Reflection: making figures is so fun!
Motivation level: 5 out of 5
2021-12-15 | #48
Time: 5:30-6:30am
Activity: started my fourth project! Cleaned the data and made histograms.
Reflection: (1) Apparently I had difficulty distinguishing 'AND' and 'OR'... (What happened to me...) (2) Had an idea this morning. I need to find a 'real world' project to practice the data analysis skills I've learned so far. Why not record and analyze my own behaviors? Maybe this is the way to find my own blind spots.
Motivation level: 4 out of 5
2021-12-14 | #47
Time: 9:10-9:35pm
Activity: making 'relationship plots' using seaborn. Basically, seaborn allows you to visualize the relationships between different variables. Plot the first two variables as x and y. Color each data point according to the 3rd variable (hue/palette). Vary the size of each data point according to the 4th variable (size/sizes). Change the marker style according to the 5th variable (style/markers). Split the data points into separate panels by the 6th variable (col). Technically, you can investigate the relationships among 6 variables in one figure!
Reflection: Just finished reading the book 'Ultralearning'. Decided to include the practice of 'active recall' in the activity section to boost my learning efficiency.
Motivation level: 5 out of 5
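A sketch of one `seaborn.relplot()` call that maps six variables at once, following the hue/size/style/col breakdown above. The tiny DataFrame and its column names are made up for illustration. With `col="season"`, the figure gets one panel per season.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# A tiny made-up table with six columns (illustrative values only).
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8],
    "y": [2, 4, 3, 6, 5, 8, 7, 9],
    "temp": [10, 12, 15, 11, 20, 22, 18, 25],            # 3rd variable -> colour
    "count": [5, 8, 3, 9, 4, 7, 6, 2],                    # 4th variable -> marker size
    "group": ["a", "b", "a", "b", "a", "b", "a", "b"],    # 5th variable -> marker style
    "season": ["winter"] * 4 + ["summer"] * 4,            # 6th variable -> separate panels
})

g = sns.relplot(
    data=df, x="x", y="y",
    hue="temp", palette="viridis",
    size="count", sizes=(20, 200),
    style="group", markers=["o", "s"],
    col="season",
)
```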
2021-12-13 | #46
Time: 5:00-6:35am
Activity: making subplots, how to 'zip'
Reflection: made a mistake in interpreting causality. Need to find time for practice!
Motivation level: 5 out of 5
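A small sketch of how `zip` pairs up with subplots: `plt.subplots()` returns an array of Axes, and `zip` walks that array together with the titles and data, so one loop fills every panel. The titles and series are made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up series, one per panel.
titles = ["Monday", "Tuesday", "Wednesday"]
series = [[3, 5, 4], [6, 2, 7], [1, 4, 2]]

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(9, 3))
# zip yields (Axes, title, values) triples; it stops at the shortest input.
for ax, title, values in zip(axes, titles, series):
    ax.plot(values)
    ax.set_title(title)
```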
2021-12-12 | #45
Time: 6:30-8 am
Activity: bar plots (vertical or horizontal), plotting histograms and interpreting them (normal, uniform, right/left-skewed).
Reflection: Have to find time for practice. 1.5hr of studying is only enough to finish the lectures.
Motivation level: 5 out of 5
2021-12-11 | #44
Time: 5:50-6:55am, 7:15-7:45pm
Activity: scatter plots, Pearson's correlation. I also finished one practice problem set.
Reflection: I l.o.v.e. scatter plots.
Motivation level: 5 out of 5
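A sketch of Pearson's r written out by hand (covariance divided by the product of the standard deviations; the 1/n factors cancel), checked against pandas' `Series.corr()`, which uses the Pearson method by default. The data are made up.

```python
import pandas as pd

# Made-up paired measurements.
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 55, 61, 64, 70]})

def pearson(x, y):
    """Pearson's r: sum of co-deviations over the root of the product of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

r_manual = pearson(df["hours"].tolist(), df["score"].tolist())
r_pandas = df["hours"].corr(df["score"])  # method="pearson" is the default
```

Values near +1 or -1 mean a strong linear relationship; near 0 means little linear relationship, and in either case r says nothing about causation.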
2021-12-10 | #43
Time: 5:45-7:25am
Activity: practice very basic plot functions in matplotlib.pyplot and finished a set of practice problems.
Reflection: started plotting! finally! so excited. I have to spend time finishing the problem sets because they contain many functions that were not covered in the lectures.
Motivation level: 5 out of 5
2021-12-09 | #42
Time: 5:20-6:40am
Activity: finished project 3. The last part of this project was to investigate whether the resale price of used cars is correlated with the average mileage. My conclusion is no. I also used the remaining time to practice indexing further.
Reflection: should definitely practice more. 15% done and 85% to go! Coding is becoming part of my daily routine and it feels great!
Motivation level: 5 out of 5
2021-12-08 | #41
Time: 8:20-9:50am
Activity: working on the third project. I analyzed the mean price of used cars for the top 20 brands. To calculate the mean, I first had to sort the dataset by brand. Because some cars were sold for free (price == 0), I also calculated the percentage of cars sold for free for each brand and identified the brands with the highest percentage of free cars as well as the brands with none. I learned how to compare the keys of two dictionaries and how to generate an index list.
Reflection: progress is a bit slower than I hoped. Have to pick up the pace.
Motivation level: 5 out of 5
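A sketch of the same per-brand summary using pandas `groupby` with named aggregation, swapped in for the dictionary-based approach described above. The column names `brand` and `price` and the listings are made-up assumptions.

```python
import pandas as pd

# Made-up listings; price 0 means the car was listed for free.
autos = pd.DataFrame({
    "brand": ["vw", "vw", "bmw", "bmw", "audi", "audi", "vw"],
    "price": [5000, 0, 12000, 15000, 9000, 11000, 3000],
})

# Top 20 brands by number of listings (only 3 exist in this toy table).
top_brands = autos["brand"].value_counts().index[:20]
subset = autos[autos["brand"].isin(top_brands)]

summary = subset.groupby("brand")["price"].agg(
    mean_price="mean",
    pct_free=lambda p: (p == 0).mean() * 100,  # share of zero prices, in %
)

# Brands with no car listed for free.
never_free = summary.index[summary["pct_free"] == 0].tolist()
```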
2021-12-07 | #40
Time: 8:30-10:00am
Activity: continued working on the third project. Practiced how to convert text into datetime, how to calculate the percentage of each value in a specified column, and how to rank the data entries by date.
Reflection: got up late today. Still need more practice to increase my coding speed. 2/3 in my third project and 1/3 to go!
Motivation level: 5 out of 5
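A sketch of the three steps from this entry in pandas: `pd.to_datetime()` for text to datetime, `value_counts(normalize=True)` for per-value percentages, and `sort_values()` to order entries by date. The column name `date_crawled` and the timestamps are made-up assumptions.

```python
import pandas as pd

# Made-up timestamps stored as text.
autos = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56",
                     "2016-03-26 18:57:24", "2016-03-12 16:58:10"],
})

# Text -> datetime64 column.
autos["date_crawled"] = pd.to_datetime(autos["date_crawled"])

# Percentage of entries per calendar day.
day_pct = autos["date_crawled"].dt.date.value_counts(normalize=True) * 100

# Entries ordered from earliest to latest.
autos_sorted = autos.sort_values("date_crawled")
```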
2021-12-06 | #39
Time: 4:00-5:00am
Activity: remove outliers in the dataset based on the values of 'price' and 'odometer_km'
Reflection: still debating what the best way is to define 'outliers'. the dataset is obviously very left-skewed. How to avoid bias?
Motivation level: 5 out of 5
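Two ways of filtering outliers, sketched on made-up data. The first keeps rows inside fixed bounds with `Series.between()`; the bounds are illustrative assumptions, not a rule. The second uses the common 1.5 × IQR convention, which derives the bounds from the data instead of choosing them by hand. Neither is claimed to be the right definition for this dataset.

```python
import pandas as pd

# Made-up listings, including an implausible price.
autos = pd.DataFrame({
    "price": [0, 500, 4500, 12000, 99999999, 7000],
    "odometer_km": [150000, 150000, 90000, 30000, 150000, 5000],
})

# Option 1: hand-picked bounds (inclusive on both ends).
fixed_kept = autos[
    autos["price"].between(1, 350000) & autos["odometer_km"].between(5000, 150000)
]

# Option 2: interquartile-range rule on price.
q1, q3 = autos["price"].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_kept = autos[autos["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```

The two options keep different rows here (the IQR rule keeps the free car at price 0), which is one concrete way the choice of definition can introduce bias.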
2021-12-05 | #38
Time: 6:20-7:50am
Activity: started my third project! Hooray! still at the data cleaning stage but it has been smoother than I expected.
Reflection: better practice more! I am getting better at editing the markdown cells.
Motivation level: 5 out of 5
2021-12-04 | #37
Time: 5:00-5:50am, 11:30am-12:20pm
Activity: practice data cleaning in pandas: str.replace, series.astype(), df.rename(dictionary), df.dropna(axis=0 or 1)
Reflection: better practice a bit more before jumping into the project.
Motivation level: 5 out of 5
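A sketch chaining the four cleaning calls from this entry on a made-up table: `df.rename()` with a dictionary, `df.dropna(axis=0)` to drop rows with missing values (axis=1 would drop columns instead), then `Series.str.replace()` and `Series.astype()` to turn text into integers. The column names and values are assumptions. `regex=False` makes `"$"` a literal character rather than a regex anchor, and `.copy()` avoids pandas' chained-assignment warning.

```python
import pandas as pd

# Made-up raw data with symbols, commas and a missing value.
raw = pd.DataFrame({
    "Price": ["$5,000", "$8,500", None],
    "odometer": ["150,000km", "70,000km", "5,000km"],
})

# Rename columns with a dictionary.
df = raw.rename(columns={"Price": "price", "odometer": "odometer_km"})
# axis=0 drops rows containing missing values.
df = df.dropna(axis=0).copy()

# Strip non-numeric characters, then cast text to int.
df["price"] = (df["price"].str.replace("$", "", regex=False)
                          .str.replace(",", "", regex=False)
                          .astype(int))
df["odometer_km"] = (df["odometer_km"].str.replace("km", "", regex=False)
                                      .str.replace(",", "", regex=False)
                                      .astype(int))
```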
2021-12-03 | #36
Time: 4:00-5:35am (I know. I couldn't sleep so I decided to get up and code...)
Activity: practice aggregation and loop in panda"S". (I have kept calling it "panda". What was I thinking?)
Reflection: I spent the time I should have spent learning yesterday debating whether to continue with Dataquest or switch to the Caltech coding bootcamp content. I shouldn't have done that. Nothing is wrong with Dataquest. The original purpose was to learn and to build a coding habit without pressure, and Dataquest is perfect in this regard. I may go over the bootcamp materials whenever time permits. Learning is an iterative process.
Motivation level: 5 out of 5
2021-12-01 | #35
Time: 6:10-7:30am
Activity: practice method-chaining in pandas.
Reflection: panda is life-changing...
Motivation level: 5 out of 5
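A short method-chaining sketch: each pandas method returns a new DataFrame, so the steps can be written as one readable pipeline inside parentheses. The table, the `passed` column and the 80-point cutoff are made up.

```python
import pandas as pd

# Made-up scores, one of them missing.
scores = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [88, None, 95, 72],
})

top = (
    scores
    .dropna(subset=["score"])                      # drop rows with no score
    .assign(passed=lambda d: d["score"] >= 80)     # add a derived column
    .loc[lambda d: d["passed"]]                    # keep rows where passed is True
    .sort_values("score", ascending=False)         # highest score first
    .reset_index(drop=True)                        # clean 0..n-1 index
)
```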
2021-11-30 | #34
Time: 6:45-7:30am
Activity: finish the pandas basics.
Reflection: I like pandas, a lot. Can't wait to learn more about what it can do. Will squeeze out some time for syntax practice today!
Motivation level: 5 out of 5
2021-11-29 | #33
Time: 6:45-7:30am
Activity: use panda to read csv, how to slice pandas objects
Reflection: Rather than counting column positions, pandas allows you to index using strings (labels). This makes everything so much clearer.
Motivation level: 5 out of 5
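A sketch of reading a CSV and indexing by label instead of by position. An in-memory `io.StringIO` stands in for a file on disk, and the columns and values are made up. Note that label slices with `.loc` include the end label, unlike Python list slicing.

```python
import io
import pandas as pd

# A small made-up CSV held in memory instead of a file on disk.
csv_text = """company,revenue,employees
Acme,120,30
Globex,340,85
Initech,95,20
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="company")

revenue = df["revenue"]                        # one column by name
globex = df.loc["Globex"]                      # one row by label
block = df.loc["Acme":"Globex", ["revenue"]]   # end label "Globex" is included
```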
2021-11-28 | #32
Time: 6:10-7:25am
Activity: numpy.genfromtxt() and Boolean array
Reflection: I understand the concept but have not reached the 'fluency' that I hope to achieve. Practice!
Motivation level: 5 out of 5
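A sketch of `numpy.genfromtxt()` plus Boolean indexing: a comparison on an array produces a Boolean array, and using that array as an index keeps only the rows marked True. The CSV content is made up and read from an in-memory buffer.

```python
import io
import numpy as np

csv_text = "year,rating\n2019,3.5\n2020,4.2\n2021,4.8\n"

# delimiter="," splits the columns; skip_header=1 skips the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

ratings = data[:, 1]
is_high = ratings > 4.0          # Boolean array: one True/False per row
high_years = data[is_high, 0]    # keep only the rows where is_high is True
```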
2021-11-27 | #31
Time: 6:35-7:20pm
Activity: learned the concept of vectorization, ndarray slicing, ndarray calculation, function vs. method
Reflection: Do I miss Matlab? Nope.
Motivation level: 5 out of 5
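A sketch contrasting a Python loop with a vectorized NumPy operation, and a function call with a method call. The prices and quantities are made up.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Loop version: one multiplication per Python iteration.
totals_loop = [p * q for p, q in zip(prices, quantities)]

# Vectorized version: the multiplication is applied element-wise to whole arrays.
totals_vec = prices * quantities

# Function vs. method: np.mean() takes the array as an argument,
# while .mean() is called on the array object itself.
avg_func = np.mean(totals_vec)
avg_method = totals_vec.mean()
```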
2021-11-25 | #30
Time: 12:50-1:55am
Activity: I finished my second project. The project tried to understand which types of posts on Hacker News (HN) receive more comments and whether there is a time of day when posts receive more comments.
Reflection: I finished the project without difficulty. In fact, I finally understand how to use 'sorted' properly for lists and dictionaries. Feel good about it. Practice is the only way to get fluent in a language. Today also marks the 30-day anniversary of my Dataquest adventure. 1 step down, 7 steps to go!
Motivation level: 5 out of 5
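A sketch of `sorted()` on a list and on a dictionary. `sorted()` always returns a new list. For a dictionary, sorting its `(key, value)` pairs with a `key=` function ranks the entries by value. The hour-to-average mapping is made up.

```python
# Made-up average number of comments per posting hour (hour -> average).
avg_by_hour = {"09": 5.0, "15": 38.0, "02": 24.0, "13": 15.0}

# Rank the dict entries by value, highest first.
ranked = sorted(avg_by_hour.items(), key=lambda kv: kv[1], reverse=True)
top_hours = [hour for hour, _ in ranked[:2]]

# Iterating a dict yields its keys, so this sorts the keys alphabetically.
hours_sorted = sorted(avg_by_hour)
```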
2021-11-24 | #29
Time: 7:10-8:30am
Activity: started my second project!
Reflection: compared to the first project, I feel more efficient this time. Can't wait to finish it.
Motivation level: 5 out of 5
2021-11-23 | #28
Time: 7:30-8:30am
Activity: differences between module/class/constructor, organize the date entries with datetime.strptime (parsing: string to datetime object) and datetime.strftime (formatting: datetime object to string).
Reflection: coding might be the easiest way to make your life easier...
Motivation level: 5 out of 5
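A sketch of the parse/format round trip and the module/class/constructor distinction. The input string and its `"%m/%d/%Y %H:%M"` layout are assumptions for illustration.

```python
import datetime as dt  # dt is the module; dt.datetime is a class inside it

raw = "8/4/2016 11:52"

# strptime: string -> datetime object, using a format that matches the text.
parsed = dt.datetime.strptime(raw, "%m/%d/%Y %H:%M")

# strftime: datetime object -> string in whatever layout we want.
hour = parsed.strftime("%H")
pretty = parsed.strftime("%Y-%m-%d")

# Calling the class directly is calling its constructor.
same_moment = dt.datetime(2016, 8, 4, 11, 52)
```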
2021-11-22 | #27
Time: 8:00-9:50am
Activity: learned the concept of object-oriented programming. Practiced how to create a new object class, how to assign an attribute at instantiation, and how to define a method inside a class.
Reflection: I still don't fully grasp the point of '__init__()' but I guess it becomes more useful as the number of methods defined in a given class increases.
Motivation level: 5 out of 5
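A small sketch of why `__init__()` matters. It runs once when the object is created and sets up attributes that every later method can rely on. The `Dataset` class and its methods are hypothetical, made up for this example.

```python
class Dataset:
    """Hypothetical toy class wrapping a list of lists whose first row is a header."""

    def __init__(self, rows):
        # Runs once at instantiation: attributes set here are shared by all methods.
        self.header = rows[0]
        self.data = rows[1:]

    def column(self, name):
        """Return all values of one column, looked up by header name."""
        idx = self.header.index(name)
        return [row[idx] for row in self.data]

    def count(self):
        """Number of data rows (header excluded)."""
        return len(self.data)


ds = Dataset([["app", "rating"], ["A", 4.5], ["B", 3.9]])
```

Without `__init__`, each method would have to split the header from the data on its own. Adding more methods reuses that one setup step.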
2021-11-19 | #26
Time: 6:30-7:30am
Activity: practice how to format the text (format doc)
Reflection: Coding is really rewarding. I can see how people get addicted to gaming or cooking. But I can't wait to work on real projects. Need something more challenging.
Motivation level: 5 out of 5
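Assuming 'format the text' here refers to Python's `str.format()`, a short sketch of named placeholders and format specifications. The names and numbers are made up.

```python
# Named placeholders, plus a format spec: ".1%" multiplies by 100 and keeps 1 decimal.
template = "{name} finished {done} of {total} lessons ({pct:.1%})"
msg = template.format(name="Ann", done=3, total=8, pct=3 / 8)

# "," inserts thousands separators.
big = "{:,}".format(1234567)
```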
2021-11-18 | #25
Time: 8:30-9:10pm
Activity: reduce dimensionality (list slicing + string stitching) and dictionary (frequency table).
Reflection: felt that I was basically repeating the same steps as in the last project. Learned a new syntax: isinstance(input, int) (or float).
Motivation level: 4 out of 5 (Side benefit of coding: calming)
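A sketch of `isinstance()` plus list slicing and string stitching. `isinstance()` also accepts a tuple of types, so a single call can check for int or float. The helper `to_number` and the date rows are hypothetical.

```python
def to_number(value):
    """Hypothetical helper: keep ints/floats as they are, convert anything else with float()."""
    # A tuple of types means "int OR float".
    if isinstance(value, (int, float)):
        return value
    return float(value)

# Reducing dimensionality: slice each row, then stitch the pieces into one string.
rows = [["2021", "11", "18", "8:30pm"], ["2021", "11", "17", "5:30pm"]]
dates = ["-".join(row[:3]) for row in rows]
```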
2021-11-17 | #24
Time: 5:30-6:40pm
Activity: data cleaning part II > replace the substring, split the string, convert the string.
Reflection: it is always a good rule of thumb to test a new function on a made-up dataset before applying it to the real dataset. My new life motto: "What is the logic behind your decision?"
Motivation level: 5 out of 5
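A sketch of replace, split, then convert in one small function, tried on a made-up sample first. The helper `clean_size` and the 'MB'/'KB' format are hypothetical.

```python
def clean_size(text):
    """Hypothetical helper: turn strings like '1,024 MB' or '512 KB' into megabytes."""
    number, unit = text.replace(",", "").split()   # replace, then split on whitespace
    value = float(number)                           # convert
    return value / 1024 if unit == "KB" else value

# Try the function on a small made-up sample before running it on real data.
sample = ["1,024 MB", "512 KB", "3 MB"]
cleaned = [clean_size(s) for s in sample]
```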
2021-11-16 | #23
Time: 9:30-10:10pm
Activity: practice the data cleaning basics, e.g., import dataset and replace substrings
Reflection: have not figured out how to iterate through every column of each row in a list of lists. The goal is to write a function that automatically checks for and corrects all the unwanted substrings.
Motivation level: 4 out of 5
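One way to visit every column of every row in a list of lists: an outer loop over rows and an inner loop over the cells of each row. The helper `clean_rows` and its default set of unwanted substrings are hypothetical.

```python
def clean_rows(rows, unwanted=("$", ",", "+")):
    """Hypothetical helper: return a new list of lists with unwanted substrings removed."""
    cleaned = []
    for row in rows:                  # outer loop: each row
        new_row = []
        for cell in row:              # inner loop: each column of that row
            for bad in unwanted:
                cell = cell.replace(bad, "")
            new_row.append(cell)
        cleaned.append(new_row)
    return cleaned

rows = [["$1,000", "10,000+"], ["$250", "500+"]]
result = clean_rows(rows)
```

Building new lists leaves the original `rows` untouched, which makes the function easy to test on a small sample first.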
2021-11-15 | #22
Time: 5:10-6:30am
Activity: executed the plan and it went well.
Reflection: Became better at 'sorting', which is my favorite thing to do haha. I am basically done with my first data science project, but I want to polish my code and share it before moving on to the next stage.
Motivation level: 5 out of 5
2021-11-14 | #21
Time: 9:11-10:14am
Activity: determine the strategy for analyzing the dataset. The goal is to recommend an app genre for developing a new iOS app. My rationale is that the recommendation should be based on 'popularity', 'targeted user age', and 'rating'. I hypothesized that the result may vary depending on whether the analysis is based on the 'cumulative rating count over all versions' or the 'rating count for the current version only', so I am going to separate my analysis.
Reflection: Planning is fun! Can't wait to start doing the analysis.
Motivation level: 5 out of 5
2021-11-13 | #20
Time: 7:40-9:00am
Activity: debug. agh!
Reflection: debugging is fun, but I wonder if it is actually bad that I spent 80% of my time on 'nice to have' features? (the question of my life, I guess...)
Motivation level: 4 out of 5
2021-11-12 | #19
Time: 7:00-9:20am
Activity: removed the non-free apps from the dataset and determined which genre of apps is most popular
Reflection: spent more time than expected on sorting the data. After finishing with this dataset, I want to see if I can compile all the required data-cleaning steps into one function.
Motivation level: 5 out of 5
2021-11-10 | #18
Time: 7:40-9:00pm
Activity: write a function to remove non-English apps in the dataset.
Reflection: tbh I am exhausted but I really want to keep my coding routine. I did it and I am HAPPY!
Motivation level: 5 out of 5
2021-11-09 | #17
Time: 9:10-11:00am
Activity: remove the duplicate rows that have lower review counts (if the review counts are equal, keep the first entry)
Reflection: "AND" and "&" mean two different things. Interesting...
Motivation level: 5 out of 5
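A sketch of both ideas from this entry. The first part keeps, for each name, the row with the most reviews, and a strict `>` ensures ties keep the first entry. The second part shows how `and` differs from `&`: `and` works on single truth values, while `&` is bitwise on integers and element-wise on NumPy/pandas Boolean arrays. The rows and app names are made-up placeholders.

```python
import numpy as np

# Made-up rows: [app_name, review_count]; names are placeholders.
rows = [
    ["Alpha", 100],
    ["Beta", 40],
    ["Alpha", 250],
    ["Beta", 40],
]

best = {}
for row in rows:
    name, reviews = row[0], row[1]
    # Strict '>' means a tie never replaces the stored row, so the first entry wins.
    if name not in best or reviews > best[name][1]:
        best[name] = row
deduped = list(best.values())

# 'and' combines single truth values; '&' on ints combines their bits.
logical = True and False       # False
bitwise = 6 & 3                # 0b110 & 0b011 -> 0b010
# On Boolean arrays, '&' works element-wise (parentheses needed around each comparison);
# 'and' would raise a ValueError for arrays with more than one element.
x = np.array([1, 5, 10])
mask = (x > 2) & (x < 8)
```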
2021-11-08 | #16
Time: 5:00-6:50am
Activity: find duplicated entries in the dataset
Reflection: can't believe that I've already forgotten how to make a frequency table.... It only means that I should practice more!
Motivation level: 4 out of 5
2021-11-07 | #15
Time: 6:00-6:50am, 4:40-5:15pm
Activity: successfully debugged! (yay!) I also wrote a new function that merges the checking and removing steps.
Reflection: It feels great to learn something new in the first hour after waking up!
Motivation level: 5 out of 5
2021-11-06 | #14
Time: 6:30-7:00am, 8:00-9:15pm
Activity: write functions to clean the dataset
Reflection: (1) learning efficiency was low... tried to figure out how to extract a variable's name as a string but have not found a solution so far. (2) not sure why my loop ran twice... (3) decided to change the way I track my progress so I don't get discouraged by my perfectionism. Keep going!
Motivation level: 3 out of 5 (debugging is just not my favorite Saturday activity)
2021-11-04 | #13
Time: 5:30-7:15am
Activity: Started my first project! Defined the goal of the project and prepared the dataset for analysis
Reflection: excited! can't wait to analyze the data.
Motivation level: 4 out of 5 (felt the gravity of other deadlines...)
2021-11-03 | #12
Time: 7:30-8:00pm
Activity: set up Anaconda
Reflection: It was a long day. Lots of things happened but I am happy that I still put in time to continue learning.
Motivation level: 3 out of 5
2021-11-02 | #11
Time: 6:30-8:00am
Activity: (1) learned how to set up Jupyter Notebook (2) I taught myself how to change the variable name for each loop! so excited!
Reflection: (1) efficiency was a bit low but the creativity was high today. (2) didn't figure out why the sequence was inverted when I ran the loop...
Motivation level: 4 out of 5 (physically tired from the intense experiments in the past two days...)
2021-11-01 | #10
Time: 4:00-5:30am
Activity: the concept of tuple, define function with multiple inputs/flexible outputs, global vs local variables, python documentation
Reflection: (1) sleepy but managed to go through the learning session just fine. Excited about learning how to define multiple inputs/outputs for a function. (2) I didn't forget a single ":" today so I counted that as an improvement.
Motivation level: 5 out of 5
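A sketch of the ideas in this entry. Returning several values gives back a tuple, which can be unpacked directly. A parameter with a default makes one input optional. A function needs the `global` keyword to reassign a module-level variable. The functions `summarize` and `bump` are hypothetical.

```python
def summarize(values, precision=2):
    """Hypothetical helper: several outputs at once, packed into a tuple."""
    total = sum(values)                       # local variable: exists only inside the call
    mean = round(total / len(values), precision)
    return total, mean

result = summarize([3, 4, 8])                 # a tuple
total, mean = summarize([3, 4, 8])            # or unpack it directly

counter = 0                                   # global (module-level) variable

def bump():
    global counter                            # without this, 'counter += 1' raises UnboundLocalError
    counter += 1

bump()
bump()
```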
2021-10-31 | #9
Time: 8:30-10:30am
Activity: how to define a function, parameter vs argument, debug
Reflection: (1) Can't get over the excitement that I am writing Python functions! Things are getting more and more interesting! (2) putting in a solid two hours of learning on weekend mornings seems to work well for me. It might be a luxury and sometimes impossible to do the same on weekdays, but I should try to do it every weekend. (3) remember to add ":". remember it remember it remember it!
Motivation level: 5 out of 5
2021-10-30 | #8
Time: 6:30-8:40am
Activity: dictionary, frequency table
Reflection: A good studying day. Morning is best. Felt that my python is improving.
Motivation level: 5 out of 5
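A minimal sketch of a frequency table built with a dictionary, plus the same counts as percentages. The content ratings are made up.

```python
# Made-up content ratings, one per app.
ratings = ["4+", "12+", "4+", "17+", "4+", "12+"]

freq = {}
for r in ratings:
    if r in freq:
        freq[r] += 1      # seen before: increment the count
    else:
        freq[r] = 1       # first time: start the count at 1

# Same table expressed as percentages of all entries.
pct = {k: v / len(ratings) * 100 for k, v in freq.items()}
```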
2021-10-29 | #7
Time: 7:30-8:40pm
Activity: dictionary, if/else/elif (I finished the python training I. Yay!)
Reflection: felt really sleepy after 40min of study. Decided to get up and walk around. Felt much better and made the progress I wanted today! Learned two things: (1) avoid studying at night as much as possible. (2) walking around to refill the brain's oxygen is a great remedy for fatigue.
Motivation level: 4 out of 5 (mostly because I was tired after a day of work...)
2021-10-28 | #6
Time: 4-5:30am
Activity: logic operator
Reflection: (1) I constantly forget to add ":" and the indentation for the logic operator. (2) I didn't study in the past two days because I felt exhausted after the workshop. I don't want to break my study routine for more than two days, so I decided that the first thing I would do when I woke up this morning was study. It worked.
Motivation level: 5 out of 5 (I am almost reaching the end of the first training session. Yay!)
2021-10-25 | #5
Time: 9:50-10:20pm
Activity: finally, the for loop!
Reflection: Another tough day. Happy that I managed to finish learning before bed. Not quite sure if I fully grasp the concept of "open file".
Motivation level: 3 out of 5
2021-10-24 | #4
Time: 7-8pm
Activity: learn how to make a list, open/read csv files, retrieve values from a list (indexing)
Reflection: Had a tough time focusing. Had been at the Janelia Junior Scientist Workshop since 7am and gave a presentation in the afternoon. Physically tired. Decided to stop at the 1hr mark. My learning efficiency is low.
Motivation level: 4 out of 5
2021-10-23 | #3
Time: 5:45-7am
Activity: refresh Python syntax: assign/update variables, int/float/round, arithmetic
Reflection: (1) lost focus after 50min. Maybe it is better to break up my study sessions to stay focused. (2) trying to set up my "focus" routine. Used rain sounds as background music today but felt a bit sleepy after a while. The rain sound seems to calm me down effectively. Current routine: wake up > drink warm water > play the rain sound > study.
Motivation level: 5 out of 5.
2021-10-22 | #2
Time: 5-6am
Activity: (1) start my first lesson on Dataquest. Just basic Python stuff. Good to refresh my memory of the syntax. (2) Read the introductory vignette of Seurat. Installed the R package.
Reflection: (1) I need to block off a specific time slot for efficient learning. (2) I guess I will be using R and python at the same time since Seurat is built on R.
2021-10-21 | #1
Time: 7-9pm
Activity: searched online resources and decided on a game plan for this learning project